reduceByKey vs combineByKey
What's the difference between reduceByKey and combineByKey in Spark?
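There is a huge difference between groupByKey and combineByKey/reduceByKey: groupByKey shuffles every value for a key across the network, whereas combineByKey and reduceByKey first combine values within each partition and shuffle only the partial results. Please refer to the following post for an in-depth explanation.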
The only difference between reduceByKey and combineByKey is the API; internally they work exactly the same way.
|reduceByKey|combineByKey|
|---|---|
|reduceByKey calls combineByKey internally.|combineByKey is the generic API; reduceByKey and aggregateByKey are both built on it.|
|The input type and output type of reduceByKey are the same.|combineByKey is more flexible: the caller can specify the output type, which does not have to be the same as the input type.|
aggregateByKey also calls combineByKey internally.
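
The relationship between the three operations can be sketched without a cluster. The block below is a plain-Python toy model, not Spark code: `combine_by_key`, `reduce_by_key`, `aggregate_by_key`, and the round-robin partitioning are illustrative names and assumptions of this sketch. It only mirrors the three-function contract of combineByKey (createCombiner, mergeValue, mergeCombiners) to show how the other two operations could be expressed through it, and how the output type can differ from the input type.

```python
def combine_by_key(pairs, create_combiner, merge_value, merge_combiners,
                   num_partitions=2):
    """Toy model of combineByKey over a list of (key, value) pairs."""
    # Assumption: round-robin slices stand in for the partitions of an RDD.
    partitions = [pairs[i::num_partitions] for i in range(num_partitions)]

    # Map side: build one combiner per key inside each partition.
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    # Reduce side: merge the per-partition combiners for each key.
    result = {}
    for combiners in per_partition:
        for key, combiner in combiners.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result


def reduce_by_key(pairs, func):
    # The combiner has the same type as the value, so the first value is
    # used as-is and one function serves as both merge steps.
    return combine_by_key(pairs, lambda v: v, func, func)


def aggregate_by_key(pairs, zero_value, seq_op, comb_op):
    # The first value seen for a key is folded into the zero value.
    # (An immutable zero is assumed; a mutable one would need copying.)
    return combine_by_key(pairs, lambda v: seq_op(zero_value, v), seq_op, comb_op)


pairs = [("a", 1), ("b", 4), ("b", 6), ("a", 3), ("a", 5)]

# Same input and output type: int values in, int sums out.
sums = reduce_by_key(pairs, lambda x, y: x + y)

# Different output type: int values in, (sum, count) tuples out.
sum_counts = combine_by_key(
    pairs,
    lambda v: (v, 1),                         # createCombiner
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # mergeValue
    lambda a, b: (a[0] + b[0], a[1] + b[1]),  # mergeCombiners
)
averages = {key: total / count for key, (total, count) in sum_counts.items()}

# aggregateByKey-style call: per-key maximum with an explicit zero value.
maxima = aggregate_by_key(pairs, float("-inf"), max, max)
```

In this model, a per-key average cannot be written with `reduce_by_key` alone, because the intermediate `(sum, count)` pair has a different type from the integer values; that is the flexibility combineByKey adds.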