MapReduce debugging by taking a JVM heap dump
How to debug a MapReduce program that is randomly being killed due to OOM/GC
To see a histogram of the live objects in the JVM at this point in time:
jmap -histo:live <pid>
To take a heap dump (dumps the JVM heap to a file on disk):
jmap -dump:live,format=b,file=file-name.bin <pid>
Log on to the datanode where the map/reduce JVM is running and run ps -eaf | grep <attempt_id> to get the pid. Run jmap with sudo -u <user>, where <user> is the user that owns the task JVM. Never use the -F (force) option while taking the dump with jmap. To analyse the dump, use jhat:
jhat -port <port-number> <heap-file-path>
What to look for in the jhat analysis:
1. Objects with the largest memory footprint
2. Objects with the highest instance count
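The pid lookup above can be sketched as a small helper. This is only an illustration: the function name pid_for_attempt, the sample ps lines and the attempt id are made up for the example, and the helper assumes the pid is the second column of ps -eaf output.

```shell
#!/bin/sh
# Sketch: pick the pid of a task JVM out of `ps -eaf` style output.
# Reads process lines on stdin; $1 is the attempt id to look for.
pid_for_attempt() {
    # Column 2 of `ps -eaf` is the PID. Skip the grep process itself.
    awk -v id="$1" 'index($0, id) && $0 !~ /grep/ { print $2 }'
}

# Hypothetical sample lines standing in for real `ps -eaf` output.
sample='mapred    4321  4000  5 10:01 ?     00:00:42 /usr/bin/java -Xmx512m org.apache.hadoop.mapred.Child attempt_201301011200_0001_m_000003_0
hdfs      5000  4999  0 10:02 pts/0 00:00:00 grep attempt_201301011200_0001_m_000003_0'

pid=$(printf '%s\n' "$sample" | pid_for_attempt attempt_201301011200_0001_m_000003_0)
echo "$pid"

# On a real datanode (not run here):
#   ps -eaf | pid_for_attempt <attempt_id>
#   sudo -u <user> jmap -histo:live <pid>
```

With the sample input the helper prints 4321; the grep line is filtered out so that only the task JVM is reported.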
Taking a heap dump on OutOfMemoryError using JVM -XX options
Set the following option in the job configuration:
set mapred.child.java.opts '-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof'
This option launches each map/reduce task JVM with the arguments specified, which gives us a handle on various JVM memory-related parameters.
A few things to note:
- -Xmx512m sets the maximum heap size, in MB.
- -XX:+HeapDumpOnOutOfMemoryError dumps the heap to disk when the JVM runs out of memory.
- -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof sets the dump location. @taskid@ is replaced by the Hadoop framework with the actual task id, which is unique.
You need to log on to the datanodes; the heap dump file will be in /tmp, named @taskid@S2sSdebug.hprof (with @taskid@ replaced by the Hadoop framework with the actual task id). jhat can be used to analyse the dump.
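To know which file to look for on a datanode, the @taskid@ substitution can be mimicked with sed. This is a rough sketch, not how Hadoop does it internally: expand_dump_path is a made-up helper, and the attempt id is a hypothetical value assumed to have the usual attempt_... form.

```shell
#!/bin/sh
# Sketch: predict the heap dump file name for a given task.
# Mimics the @taskid@ substitution with sed (an approximation for illustration).
expand_dump_path() {
    # $1 = HeapDumpPath template, $2 = task id
    printf '%s\n' "$1" | sed "s/@taskid@/$2/g"
}

# Hypothetical task id, for illustration only.
path=$(expand_dump_path '/tmp/@taskid@S2sSdebug.hprof' 'attempt_201301011200_0001_m_000003_0')
echo "$path"
```

For the sample id this prints /tmp/attempt_201301011200_0001_m_000003_0S2sSdebug.hprof.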
Taking a heap dump on OutOfMemoryError and collecting the dump files from all datanodes in an HDFS location for further analysis
The option above requires you to log on to each datanode on which a map/reduce task was spawned and run jmap and jhat on those machines. For an MR job with hundreds of map/reduce tasks this becomes very difficult. The option below provides a mechanism to collect all heap dumps in a specified HDFS location.
Make a shell script named dump.sh:
#!/bin/sh
# Turn the task's working directory into a file name
# (this helps in figuring out which heap dump belongs to which task)
text=`echo $PWD | sed 's=/=\\_=g'`
hadoop fs -put heapdump.hprof /user/dirName/hprof/$text.hprof
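To see what name the sed expression in dump.sh produces, here is a minimal sketch that applies the same slash-to-underscore replacement to a sample path. The dump_name helper and the working directory shown are hypothetical; the real directory depends on the cluster's local-dir configuration.

```shell
#!/bin/sh
# Sketch: the file name dump.sh derives from a task working directory.
dump_name() {
    # Replace every "/" in the path with "_" so the result is a flat file name.
    printf '%s\n' "$1" | sed 's=/=_=g'
}

# Hypothetical task working directory.
name=$(dump_name '/tmp/hadoop/taskTracker/jobcache/job_1/attempt_1/work')
echo "$name.hprof"
```

This prints _tmp_hadoop_taskTracker_jobcache_job_1_attempt_1_work.hprof, so each task's dump gets a distinct name as long as the working directories differ.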
- Place the dump.sh script in an HDFS location using hadoop dfs -put dump.sh <hdfs location> (example: /user/dirName/dump.sh)
- Create a directory on HDFS where you want to gather all the heap dumps and give it 777 permission (example: hadoop dfs -chmod -R 777 /user/dirName/hprof)
- Set the following properties in the MR job:
  - set mapred.child.java.opts '-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
  - set mapred.create.symlink 'yes'
  - set mapred.cache.files 'hdfs:///user/dirName/dump.sh#dump.sh'
Run the MR job. An OOME in a task on any datanode will produce a heap dump and place the dump file in the specified HDFS location. You can verify that the script ran correctly in the task's stdout log.
In the stdout log:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to ./heapdump.hprof ...
Heap dump file created [12039655 bytes in 0.081 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="./dump.sh"
#   Executing /bin/sh -c "./dump.sh"...
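Once the dumps are collected, a file name can be mapped back to its task. The sketch below assumes the task's working directory contains the attempt id in the usual attempt_<timestamp>_<job>_<m|r>_<task>_<attempt> form; the attempt_from_name helper and the sample file name are hypothetical.

```shell
#!/bin/sh
# Sketch: recover the attempt id from a collected dump file name.
attempt_from_name() {
    # Print the attempt id embedded in the name, or nothing if there is none.
    printf '%s\n' "$1" |
        sed -n 's/.*\(attempt_[0-9]*_[0-9]*_[mr]_[0-9]*_[0-9]*\).*/\1/p'
}

# Hypothetical dump file name as produced by dump.sh.
file='_data_mapred_jobcache_job_201301011200_0001_attempt_201301011200_0001_m_000003_0_work.hprof'
attempt=$(attempt_from_name "$file")
echo "$attempt"
```

For the sample name this prints attempt_201301011200_0001_m_000003_0, which can then be looked up in the job's task list.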
Using the Hadoop default profiler for profiling and finding issues
set mapred.task.profile 'true'
set mapred.task.profile.params '-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s'
set mapred.task.profile.maps '0-1'
set mapred.task.profile.reduces '0-1'
The profiler provides details of the task JVMs in the specified range. The profiler output is available in the task logs, under the profile.out logs section.