Spark Map Vs mapPartitions
map | mapPartitions |
transformation | transformation |
Can be attached to a MapTask or a ReduceTask | Can be attached to a MapTask or a ReduceTask |
Works on a single row at a time | Works on a whole partition (an iterator over its rows) at a time |
Called once per input row and returns one output value | Called once per partition and returns an iterator over that partition's output |
Doesn't hold the output in memory | Output is held in memory if the function builds the full result before returning; returning a lazy iterator avoids this |
Easy to instantiate a service (reusable object) | Easy to instantiate a service (reusable object) |
No way to know when to shut the service down (no cleanup hook) | Service can be shut down once the partition's rows have been processed |
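
The service rows of the table are easier to see in code. What follows is a minimal sketch rather than a definitive pattern: `LookupService` is a hypothetical stand-in for any expensive reusable object (a database client, a parser, a model), its counters exist only so the behaviour can be observed locally, and `run` assumes the caller already has an RDD. The per-partition function is written as a generator, so it yields rows lazily and does not buffer the partition's output.

```python
class LookupService:
    """Hypothetical stand-in for an expensive reusable object (e.g. a DB client)."""
    created = 0  # counters are for local illustration only; on a cluster
    closed = 0   # they would live on the executors, not on the driver

    def __init__(self):
        LookupService.created += 1

    def lookup(self, row):
        return row * 2  # placeholder for the real work

    def close(self):
        LookupService.closed += 1


def enrich_row(row):
    """For rdd.map: sees one row, so the service lives for one row only."""
    service = LookupService()
    try:
        return service.lookup(row)
    finally:
        service.close()


def enrich_partition(rows):
    """For rdd.mapPartitions: receives an iterator over the partition's rows."""
    service = LookupService()  # created once per partition
    try:
        for row in rows:
            yield service.lookup(row)  # lazy: output is not collected in a list
    finally:
        # Runs once the rows are consumed (or the generator is closed).
        service.close()


def run(rdd):
    """Apply both variants to an RDD supplied by the caller."""
    return rdd.map(enrich_row), rdd.mapPartitions(enrich_partition)
```

With `map`, the function never learns that it has seen the last row, so the only safe place to close the service is inside the per-row call, which means paying the setup cost for every row. With `mapPartitions`, setup and cleanup each happen once per partition.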