reduceByKey vs combineByKey
What's the difference between reduceByKey and combineByKey in Spark?
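There is a significant difference between groupByKey and combineByKey/reduceByKey: groupByKey shuffles every value for a key without combining on the map side, whereas combineByKey and reduceByKey combine values within each partition before the shuffle. Please refer to the following post for an in-depth explanation.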
The only difference between reduceByKey and combineByKey is the API; internally, reduceByKey delegates to the same combine logic that backs combineByKey.
reduceByKey | combineByKey |
Internally calls combineByKey | The generic API; used by both reduceByKey and aggregateByKey |
The input type and output type are the same | More flexible: the caller specifies the output type, which does not have to match the input type |
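The type difference in the table can be sketched for the values of a single key. This is a plain-Python illustration, not Spark code: `reduce_values` and `combine_values` are hypothetical helper names, and the sketch ignores partitions, so the mergeCombiners step of combineByKey is left out.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

V = TypeVar("V")  # value type stored under a key
C = TypeVar("C")  # combiner (result) type


def reduce_values(values: Iterable[V], func: Callable[[V, V], V]) -> V:
    # reduceByKey style: func takes two V and returns a V,
    # so the result type is forced to be the value type.
    return reduce(func, values)


def combine_values(
    values: Iterable[V],
    create_combiner: Callable[[V], C],
    merge_value: Callable[[C, V], C],
) -> C:
    # combineByKey style: the first value is turned into a combiner of type C,
    # and every later value is merged into it. Assumes at least one value.
    it = iter(values)
    acc = create_combiner(next(it))
    for v in it:
        acc = merge_value(acc, v)
    return acc


# Hypothetical values seen for one key.
values = [1, 3, 5]

# int in, int out.
total = reduce_values(values, lambda x, y: x + y)

# int in, (sum, count) tuple out.
sum_count = combine_values(
    values,
    lambda v: (v, 1),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
)
average = sum_count[0] / sum_count[1]
```

A per-key average is the usual case where the second form is needed: the running state is a (sum, count) pair, which a function of type (V, V) -> V over plain ints cannot carry.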
aggregateByKey also calls combineByKey internally.
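The delegation can be made concrete with a toy model. The sketch below is an assumption-laden stand-in, not Spark's implementation: partitions are plain Python lists, there is no shuffle or partitioner, and the function names `combine_by_key`, `reduce_by_key`, and `aggregate_by_key` are chosen here for illustration. The two wrappers mirror how PySpark passes its arguments through to combineByKey.

```python
import copy


def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy stand-in for combineByKey: combine within partitions, then across them."""
    merged = {}
    for part in partitions:
        local = {}  # one combiner per key inside this partition
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        for key, comb in local.items():
            # Merge this partition's combiner into the cross-partition result.
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged


def reduce_by_key(partitions, func):
    # Identity createCombiner and the same func for both merge steps,
    # which is why the output type must equal the input type.
    return combine_by_key(partitions, lambda v: v, func, func)


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    # The first value of a key in a partition is folded into a fresh copy
    # of the zero value; later values go through seq_op directly.
    return combine_by_key(
        partitions,
        lambda v: seq_op(copy.deepcopy(zero_value), v),
        seq_op,
        comb_op,
    )


# Hypothetical sample data: two "partitions" of (key, int) pairs.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 6)]]

sums = reduce_by_key(partitions, lambda x, y: x + y)

sum_counts = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
```

Here `sums` maps each key to an int, while `sum_counts` maps each key to a (sum, count) tuple, even though both calls run through the same `combine_by_key` function.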