JVM dissection to Tune GC (JDK 7 )
Understanding Java GC process
Java : http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html , wget -c -O jdk-7u80-linux-x64.tar.gz –no-check-certificate –no-cookies –header “Cookie: oraclelicense=accept-securebackup-cookie” “http://download.oracle.com/otn-pub/java/jdk/7u80-b15/jdk-7u80-linux-x64.tar.gz“
Things to know and think about :
- Any object created in JAVA, is created in Young Gen.
- Before object is transfered to Tenure Gen it is just kept moving between so and s1 Gc params are there to move object directly to Tenure gen (both num cycles and sizes are configurable ).
- All minor GC algo are stop the world , only in case of Major Gc “stop the world can be avoided” .
- jconsole/jvisualm are part of jdk so please use it to see realtime working of GC jstat , jmap , Gc logging can be used to see the GC nos.
- Gc pause of ms is ok , if one is seeing pause for seconds(2+ ) its time to tune your GC . As gc directly impacts application responsiveness.
- Gc is a problem of reference counting and not how much memory(big chunks ) has been allocated ( Allocating larger chunk does not ensure less GC pressure on your application )
How much memory is at disposable of a process : A process can has RAM + Virtual memory at its disposal . Be it a C , perl , python , Java the whole of memory is available for it .
Whats so special About Java Heap : Unlike C process one one does not have to worry about memory deallocation (memory leaks issue) as Java manages the object deallocation on heap . This dosen’t restrict one to use memory available outside the Heap in Java . One can use the Direct Memory allocation available in NIO package to allocate memory off the Heap .
Java Virtual Machine
The Java Virtual Machine (JVM) is an abstract computing machine. The JVM is a program that looks like a machine to the programs written to execute in it. This way, Java programs are written to the same set of interfaces and libraries. Each JVM implementation for a specific operating system, translates the Java programming instructions into instructions and commands that run on the local operating system. This way, Java programs achieve platform independence.
The first prototype implementation of the Java virtual machine, done at Sun Microsystems, Inc., emulated the Java virtual machine instruction set in software hosted by a handheld device that resembled a contemporary Personal Digital Assistant (PDA). Oracle’s current implementations emulate the Java virtual machine on mobile, desktop and server devices, but the Java virtual machine does not assume any particular implementation technology, host hardware, or host operating system. It is not inherently interpreted, but can just as well be implemented by compiling its instruction set to that of a silicon CPU. It may also be implemented in microcode or directly in silicon.
The Java virtual machine knows nothing of the Java programming language, only of a particular binary format, the class file format. A class file contains Java virtual machine instructions (or bytecodes) and a symbol table, as well as other ancillary information.
For the sake of security, the Java virtual machine imposes strong syntactic and structural constraints on the code in a class file. However, any language with functionality that can be expressed in terms of a valid class file can be hosted by the Java virtual machine. Attracted by a generally available, machine-independent platform, implementors of other languages can turn to the Java virtual machine as a delivery vehicle for their languages.
Understanding JVM Memory Model is very important if you want to understand the working of Java Garbage Collection. We will look into different parts of JVM memory and how to monitor and perform garbage collection tuning.
Java (JVM) Memory Model
As you can see in the above image, JVM memory is divided into separate parts. At broad level, JVM Heap memory is physically divided into two parts – Young Generation and Old Generation.
Before one take a deep dive about Gc , Please run jconsole/jvisualm(part of Jdk) to see realtime Gc in action :
Start from what you see in a JDK tool like jstat or jvisualvm and its visualgc plugin:
The Java heap is made up of the Perm, Old and New (sometimes called Young) generations. The New generation is further made up of Eden space where objects are created and Survivor spaces S0 and S1 where they are kept later for a limited number of New generation garbage collection cycles. If you want more details, you might want to read Sun/Oracle’s whitepaper “Memory Management in the Java HotSpot Virtual Machine”.
Young generation is the place where all the new objects are created. When young generation is filled, garbage collection is performed. This garbage collection is called Minor GC. Young Generation is divided into three parts – Eden Memory and two Survivor Memory spaces.
Important Points about Young Generation Spaces:
- Most of the newly created objects are located in the Eden memory space.
- When Eden space is filled with objects, Minor GC is performed and all the survivor objects are moved to one of the survivor spaces.
- Minor GC also checks the survivor objects and move them to the other survivor space. So at a time, one of the survivor space is always empty.
- Objects that are survived after many cycles of GC, are moved to the Old generation memory space. Usually it’s done by setting a threshold for the age of the young generation objects before they become eligible to promote to Old generation.
Old Generation memory contains the objects that are long lived and survived after many rounds of Minor GC. Usually garbage collection is performed in Old Generation memory when it’s full. Old Generation Garbage Collection is called Major GC and usually takes longer time.
Permanent Generation or “Perm Gen” contains the application metadata required by the JVM to describe the classes and methods used in the application. Note that Perm Gen is not part of Java Heap memory.
Perm Gen is populated by JVM at runtime based on the classes used by the application. Perm Gen also contains Java SE library classes and methods. Perm Gen objects are garbage collected in a full garbage collection.
Types of GC (Logical ) : http://www.javacodegeeks.com/2012/01/practical-garbage-collection-part-1.html
Basic stop-the-world, mark, sweep, resume
Compacting vs. non-compacting garbage collection
Generational garbage collection
Stop the World Event
All the Garbage Collections are “Stop the World” events because all application threads are stopped until the operation completes. Since Young generation keeps short-lived objects, Minor GC is very fast and the application doesn’t get affected by this.
However Major GC takes longer time because it checks all the live objects. Major GC should be minimized because it will make your application unresponsive for the garbage collection duration. So if you have a responsive application and there are a lot of Major Garbage Collection happening, you will notice timeout errors.
The duration taken by garbage collector depends on the strategy used for garbage collection. That’s why it’s necessary to monitor and tune the garbage collector to avoid timeouts in the highly responsive applications.
Java Garbage Collection Types
According to JDK 7, there are 5 GC types.
- Serial GC
- Parallel GC
- Parallel Old GC (Parallel Compacting GC)
- Concurrent Mark & Sweep GC (or “CMS”)
- Garbage First (G1) GC
Among these, the serial GC must not be used on an operating server. This GC type was created when there was only one CPU core on desktop computers. Using this serial GC will drop the application performance significantly.
- Serial GC (-XX:+UseSerialGC): Serial GC uses the simple mark-sweep-compactapproach for young and old generations garbage collection i.e Minor and Major GC.Serial GC is useful in client-machines such as our simple stand alone applications and machines with smaller CPU. It is good for small applications with low memory footprint.
- Parallel GC (-XX:+UseParallelGC): Parallel GC is same as Serial GC except that is spawns N threads for young generation garbage collection where N is the number of CPU cores in the system. We can control the number of threads using
-XX:ParallelGCThreads=nJVM option.Parallel Garbage Collector is also called throughput collector because it uses multiple CPUs to speed up the GC performance. Parallel GC uses single thread for Old Generation garbage collection.
- Parallel Old GC (-XX:+UseParallelOldGC): This is same as Parallel GC except that it uses multiple threads for both Young Generation and Old Generation garbage collection.
- Concurrent Mark Sweep (CMS) Collector (-XX:+UseConcMarkSweepGC only for old ): CMS Collector is also referred as concurrent low pause collector. It does the garbage collection for Old generation. CMS collector tries to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.CMS collector on young generation uses the same algorithm as that of the parallel collector. This garbage collector is suitable for responsive applications where we can’t afford longer pause times. We can limit the number of threads in CMS collector using
- G1 Garbage Collector (-XX:+UseG1GC): The Garbage First or G1 garbage collector is available from Java 7 and it’s long term goal is to replace the CMS collector. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.Garbage First Collector doesn’t work like other collectors and there is no concept of Young and Old generation space. It divides the heap space into multiple equal-sized heap regions. When a garbage collection is invoked, it first collects the region with lesser live data, hence “Garbage First”. You can find more details about it at Garbage-First Collector Oracle Documentation.
Collecting garbage from Young space (consisting of Eden and Survivor spaces) is called a Minor GC. This definition is both clear and uniformly understood. But there are still some interesting take-aways you should be aware of when dealing with Minor Garbage Collection events:
- Minor GC is always triggered when JVM is unable to allocate space for a new Object, e.g. the Eden is getting full. So the higher the allocation rate, the more frequently Minor GC is executed.
- Whenever the pool is filled, its entire content is copied and the pointer can start tracking the free memory from zero again. So instead of classical Mark, Sweep and Compact, cleaning Eden and Survivor spaces is carried out with Mark and Copy instead. So, no fragmentation actually takes place inside Eden or Survivor spaces. The write pointer is always residing on the top of the used pool.
- During a Minor GC event, Tenured generation is effectively ignored. References from tenured generation to young generation are considered de facto GC roots. References from young generation to Tenured generation are simply ignored during the markup phase.
- Against common belief, all Minor GCs do trigger stop-the-world pauses, stopping the application threads. For most of the applications, the length of the pauses is negligible latency-wise. This is true if most of the objects in Eden can be considered garbage and are never copied to Survivor/Old spaces. If the opposite is true and most of the newborn objects are not eligible for GC, Minor GC pauses start taking considerably more time.
So with Minor GC the situation was rather clear – every Minor GC cleans the Young generation.
Major GC vs Full GC
One should notice that there is no formal definitions present for those terms. Neither in JVM specification nor in the Garbage Collection research papers. But on the first glance, building these definitions on top of what we know to be true about Minor GC cleaning Young space should be simple:
- Major GC is cleaning the Tenured space.
- Full GC is cleaning the entire Heap – both Young and Tenured spaces.
Unfortunately it is a bit more complex and confusing. To start with – many Major GCs are triggered by Minor GCs, so separating the two is impossible in many cases. On the other hand – many modern garbage collections perform cleaning the Tenured space partially, so again, using the term “cleaning” is only partially correct.
This leads us to the point where instead of worrying whether the GC is called Major or Full GC, you should focus to finding out whether the GC at hand stopped all the application threads or was it able to progress concurrently with the application threads.
Gc algo in details : http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
JVM params to control GC : http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html
Gc analysis tools : http://www.cubrid.org/blog/dev-platform/how-to-monitor-java-garbage-collection/
Commands to print Gc dump :
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure