ReduceByKey vs CombineByKey

What's the difference between reduceByKey and combineByKey in Spark?

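There is a big difference between groupByKey and combineByKey/reduceByKey: groupByKey sends every (key, value) pair through the shuffle, while combineByKey and reduceByKey first combine the values for each key within a partition (a map-side combine), so far less data is shuffled. Please refer to the following post for an in-depth explanation.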
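A toy, single-process Python model (not Spark code) can make the shuffle difference concrete. It assumes an "RDD" is just a list of partitions, each a list of (key, value) pairs; the helper names are hypothetical, and the model ignores serialization and network transfer. It only counts how many records would have to cross the shuffle.

```python
# Hypothetical toy model: an "RDD" is a list of partitions,
# each partition being a list of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("b", 1)],
]

def shuffled_records_group_by_key(parts):
    # groupByKey-style: every (key, value) pair is shuffled as-is.
    return [pair for part in parts for pair in part]

def shuffled_records_reduce_by_key(parts, func):
    # reduceByKey-style: values are combined per key inside each partition
    # first, so at most one record per key per partition is shuffled.
    out = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        out.extend(local.items())
    return out

grouped = shuffled_records_group_by_key(partitions)
reduced = shuffled_records_reduce_by_key(partitions, lambda x, y: x + y)
print(len(grouped), len(reduced))  # 8 4
```

With many repeated keys per partition, the gap between the two counts grows, which is why reduceByKey is usually preferred over groupByKey followed by a reduction.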


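The only difference between reduceByKey and combineByKey is the API; internally they work exactly the same way. reduceByKey is simply combineByKey with the identity function as the combiner-creation step and the same reduce function used both to merge values within a partition and to merge combiners across partitions.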
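To illustrate that relationship, here is a minimal sketch that models combineByKey's three functions (named after PySpark's createCombiner, mergeValue and mergeCombiners arguments) in plain Python and then defines reduceByKey on top of it. The `combine_by_key` and `reduce_by_key` helpers are hypothetical, single-process stand-ins that return a dict; they are not Spark's implementation.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy, single-process model of combineByKey semantics (not Spark's code)."""
    # Map side: build one combiner per key inside each partition.
    per_partition = []
    for part in partitions:
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        per_partition.append(local)
    # Reduce side (after the shuffle): merge combiners for the same key.
    result = {}
    for local in per_partition:
        for key, comb in local.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result

def reduce_by_key(partitions, func):
    # reduceByKey as a special case: identity createCombiner,
    # and the same func for mergeValue and mergeCombiners.
    return combine_by_key(partitions, lambda v: v, func, func)

partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("a", 5)]]
print(reduce_by_key(partitions, lambda x, y: x + y))  # {'a': 9, 'b': 6}
```

Because the input and output of `func` share one type, reduceByKey cannot change the value type; that is exactly the restriction combineByKey lifts.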


reduceByKey vs combineByKey:
- reduceByKey internally calls combineByKey (in recent Spark versions, the shared helper combineByKeyWithClassTag), whereas combineByKey is the generic API used by both reduceByKey and aggregateByKey.
- The input and output value types of reduceByKey must be the same, whereas combineByKey is more flexible and lets you specify the required output type.

With combineByKey, the output type does not have to be the same as the input type.
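The classic case where the output type differs from the input type is a per-key average: the inputs are numbers, but the combiner is a (sum, count) pair. The sketch below reuses a hypothetical, single-process `combine_by_key` model (not Spark's code); in PySpark the same three lambdas could be passed to `rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)`.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy, single-process model of combineByKey semantics (not Spark's code)."""
    per_partition = []
    for part in partitions:
        local = {}
        for key, value in part:
            # First value for a key creates the combiner; later ones are merged in.
            local[key] = merge_value(local[key], value) if key in local else create_combiner(value)
        per_partition.append(local)
    result = {}
    for local in per_partition:
        for key, comb in local.items():
            # Combiners from different partitions are merged after the "shuffle".
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result

partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 6), ("a", 5)]]

sums_counts = combine_by_key(
    partitions,
    lambda v: (v, 1),                         # createCombiner: int -> (sum, count)
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # mergeValue: fold one int into a pair
    lambda x, y: (x[0] + y[0], x[1] + y[1]),  # mergeCombiners: add two pairs
)
averages = {key: s / c for key, (s, c) in sums_counts.items()}
print(sums_counts)  # {'a': (9, 3), 'b': (8, 2)}
print(averages)     # {'a': 3.0, 'b': 4.0}
```

reduceByKey cannot express this directly, because its function must take and return the value type (here, a plain number).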


aggregateByKey also calls combineByKey internally.
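A minimal sketch of how aggregateByKey's (zeroValue, seqOp, combOp) can be mapped onto combineByKey's three functions, assuming the same hypothetical single-process `combine_by_key` model as a stand-in for Spark: createCombiner applies seqOp to a fresh copy of the zero value, seqOp plays the role of mergeValue, and combOp plays the role of mergeCombiners. Copying the zero value matters when seqOp mutates its accumulator, as the set-building example below does.

```python
import copy

def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy, single-process model of combineByKey semantics (not Spark's code)."""
    per_partition = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = merge_value(local[key], value) if key in local else create_combiner(value)
        per_partition.append(local)
    result = {}
    for local in per_partition:
        for key, comb in local.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result

def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    # Hypothetical helper: each key's combiner starts from its own copy of the zero value.
    return combine_by_key(
        partitions,
        lambda v: seq_op(copy.deepcopy(zero_value), v),
        seq_op,
        comb_op,
    )

def add_value(acc, v):
    acc.add(v)  # mutates the accumulator set in place
    return acc

def merge_sets(x, y):
    x |= y  # in-place union of two per-partition sets
    return x

partitions = [[("a", 1), ("b", 2), ("a", 1)], [("b", 3), ("a", 4)]]
distinct = aggregate_by_key(partitions, set(), add_value, merge_sets)
print({key: sorted(vals) for key, vals in distinct.items()})  # {'a': [1, 4], 'b': [2, 3]}
```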