Map Reduce Debugging by taking JVM heap dump

How to debug a MapReduce program whose tasks are randomly killed due to OOM/GC issues.

 

Taking Heap Dump manually:
 

To see a histogram of live objects in the JVM at a point in time:
jmap -histo:live pid

To take a heap dump (writes the JVM heap to a binary file on disk):
jmap -dump:live,format=b,file=file-name.bin pid

    Log on to the datanode where the map/reduce JVM is running, and run ps -eaf | grep attempt_id to get the pid.
    Run jmap via sudo -u <user owning the task JVM> to take the heap dump.
    Never use the -F (force) option while taking the dump with jmap.
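The manual steps above can be sketched as a short shell session. The attempt id and user name below are illustrative placeholders, and the jmap commands are printed rather than executed, so you can review them before running them by hand on the datanode:

```shell
#!/bin/sh
# Hypothetical attempt id -- substitute the real one from the JobTracker UI.
ATTEMPT="attempt_201201010000_0001_m_000000_0"
# Hypothetical user owning the task JVM.
TASK_USER="mapred"

# Find the pid of the task JVM; `grep -v grep` drops the grep process itself.
pid=$(ps -eaf | grep "$ATTEMPT" | grep -v grep | awk '{print $2}' | head -1)

# Live-object histogram, then a binary heap dump for later jhat analysis.
echo "sudo -u $TASK_USER jmap -histo:live $pid"
echo "sudo -u $TASK_USER jmap -dump:live,format=b,file=/tmp/$ATTEMPT.bin $pid"
```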

To analyse the dump, use jhat:
jhat -port port_no heap_file_path

What to look for in the jhat analysis:
    1. Objects with the highest memory footprint
    2. Objects with the highest instance count

Taking a Heap Dump on OutOfMemoryError using JVM -XX options

Set the following option in the job configuration:

set mapred.child.java.opts '-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@.hprof'

This option launches the map/reduce task JVM with the specified values, giving us a handle to control various JVM memory-related parameters.

A few things to note:

  1. -Xmx512m : maximum heap size for the task JVM (512 MB)
  2. -XX:+HeapDumpOnOutOfMemoryError : dump the heap to disk when the JVM runs out of memory
  3. -XX:HeapDumpPath=/tmp/@taskid@.hprof : @taskid@ is replaced by the Hadoop framework with the actual task id, which is unique
One needs to log on to the datanode; the heap dump file will be present in /tmp, named @taskid@.hprof (with @taskid@ replaced by the Hadoop framework with the actual task id). jhat can then be used to analyze the dump.


Taking a Heap Dump on OutOfMemoryError and collecting the dump files across datanodes into an HDFS location for further analysis

The option above requires one to log on to the datanode on which the map/reduce task was spawned and run jmap and jhat on that machine. For a job with hundreds of map/reduce tasks this becomes very cumbersome. The option below provides a mechanism to collect all the heap dumps in a specified HDFS location.

Make a shell script named dump.sh :

#!/bin/sh
# Flatten the task's working directory into a file name, so we can tell
# which heap dump belongs to which task.
text=`echo $PWD | sed 's=/=_=g'`
hadoop fs -put heapdump.hprof /user/dirName/hprof/$text.hprof
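As a quick sanity check of the sed substitution in dump.sh, a made-up task working-directory path flattens into a file name like this:

```shell
# A made-up task working directory path (illustrative only):
dir=/grid/local/taskTracker/jobcache/job_0001/attempt_0001_m_000000_0/work
# Same substitution as in dump.sh: every '/' becomes '_'.
echo "$dir" | sed 's=/=_=g'
# prints: _grid_local_taskTracker_jobcache_job_0001_attempt_0001_m_000000_0_work
```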
  •  Place the dump.sh script in an HDFS location: hadoop fs -put dump.sh /user/dirName/dump.sh
  •  Create a directory on HDFS where you want to gather all the heap dumps and give it 777 permissions: hadoop fs -chmod -R 777 /user/dirName/hprof
  •  Set the following properties in the MR job:
  •  set mapred.child.java.opts '-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
  •  set mapred.create.symlink 'yes'
  •  set mapred.cache.files 'hdfs:///user/dirName/dump.sh#dump.sh'

 

Run the MR job; an OOM in any task on any datanode will trigger a heap dump, and the dump file will be placed in the specified HDFS location. One can verify that the script executed correctly in the task's stdout log.

In the stdout log:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to ./heapdump.hprof ...
Heap dump file created [12039655 bytes in 0.081 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="./dump.sh"
#   Executing /bin/sh -c "./dump.sh"...
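Once dumps have been collected, they can be pulled out of HDFS and browsed with jhat. A sketch, where the HDFS directory matches the one configured above but the dump file name and port are made up; the commands are printed rather than executed:

```shell
#!/bin/sh
# HDFS directory configured in the steps above; the file name is illustrative.
HPROF_DIR=/user/dirName/hprof
DUMP=_grid_local_taskdir.hprof

echo "hadoop fs -ls $HPROF_DIR"
echo "hadoop fs -get $HPROF_DIR/$DUMP /tmp/$DUMP"
echo "jhat -port 7000 /tmp/$DUMP"
```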

Use Hadoop Default profiler for profiling and finding issues

set mapred.task.profile 'true';
set mapred.task.profile.params '-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s';
set mapred.task.profile.maps '0-1';
set mapred.task.profile.reduces '0-1';

The profiler will profile the task JVMs in the specified ranges. The profiler output will be available in the task logs, under the profile.out section.