Learning Garbage Collection Practically in Java (Part 1)
I am writing this article to explain garbage collection in Java in detail. There are hundreds of articles on this topic, but when I read them (or watched videos), the difficulty I faced was that they were heavy on theory and did not explain how to apply that learning in real work. Questions like which commands to use and which steps to follow were not fully answered.
So, in this article, I will explain each concept in detail. I will add code samples that you can run yourself (I will provide the exact commands), which should help you get clarity on the concepts.
- What is Garbage Collection? Is it specific to Java?
- Why is it necessary? Can I avoid Garbage Collection completely?
- Where does Garbage Collection happen?
- What are the types of Garbage Collection, and how do they differ from each other at a high level?
- Does the Garbage Collection Policy vary by Java version? What determines the default Garbage Collection Policy for a Java process?
- How do I find which Garbage Collection Policy is set for a process?
- How does each Garbage Collection Policy differ in detail?
- How do I determine which Garbage Collection Policy is best for my application?
- How do I analyze the performance of Garbage Collection? What tools can I use? Are they free or paid?
- What is the latest in Garbage Collection?
What is Garbage Collection? Is it specific to Java?
Garbage collection is the process of automatically reclaiming memory occupied by objects that are no longer in use, so that it can be reused for new objects. It is not specific to Java: other managed runtimes, such as .NET, also use garbage collection. In C, by contrast, there is no garbage collector; memory is allocated and released explicitly with the malloc() and free() functions. In languages like Java, memory management is handled by the JVM (Java Virtual Machine).
What is a JVM? A JVM is simply a process. When you build Java code, the compiler produces class files, which are usually packaged into a JAR file. When you execute this JAR file, it runs as a process on your computer, and that process is an instance of the JVM. Every Java program starts with at least one thread, the main thread. The same applies to complex applications. For example, a JBoss server is just a JVM process started with java arguments like these (the two GC logging flags work on JDK 8 and earlier; JDK 9 and later configure GC logging with -Xlog:gc* instead):
java -Xmx4000M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -jar "org\jboss7\bin\StartServer.jar"
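To see that a running Java program really is just an operating-system process started with JVM arguments, the small program below prints its own process ID, the name of its starting thread, and the JVM options it was launched with. It is a sketch that assumes JDK 9 or later (for ProcessHandle); the class name JvmInfo is just for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

public class JvmInfo {
    // OS process ID of this JVM (ProcessHandle requires JDK 9+).
    static long pid() {
        return ProcessHandle.current().pid();
    }

    // JVM options this process was started with, e.g. -Xmx512M.
    // Arguments passed to main() are NOT included.
    static List<String> jvmArguments() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("PID: " + pid());
        System.out.println("Current thread: " + Thread.currentThread().getName());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

Compile it with javac JvmInfo.java and run it as java -Xmx512M JvmInfo: the -Xmx512M option shows up in the printed argument list, while any program arguments placed after the class name do not.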
Why is it necessary? Can I avoid Garbage Collection completely?
Every computer or server has a limited amount of memory (RAM). So, as new objects keep being allocated, old objects that are no longer referenced need to be removed from memory so that the space can be reused for new objects.
No, garbage collection cannot be avoided completely. Several kinds of collections happen as part of garbage collection, such as young-generation collections and old-generation (full) collections. Some of them, particularly full collections, can be postponed for long periods by managing objects carefully, and collectors like G1 are designed to reduce how often they are needed. But in practice, a Java application can never avoid GC 100%.
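To see that collections happen even in a trivial program, the sketch below allocates roughly 2 GB of short-lived 1 KB arrays and compares the total collection count reported by the JVM's GarbageCollectorMXBeans before and after. The class name GcCountDemo and the sizes are arbitrary choices for illustration; how many collections you actually see depends on your heap size and collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCountDemo {
    // Static field so the JIT cannot prove the arrays are unused and optimize the allocations away.
    static volatile byte[] sink;

    // Sums the number of collections reported by every collector in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {          // -1 means the count is undefined for this collector
                total += count;
            }
        }
        return total;
    }

    // Allocates roughly `megabytes` MB of 1 KB arrays; each one becomes garbage
    // as soon as the next allocation replaces it in `sink`.
    static void allocateGarbage(int megabytes) {
        for (int i = 0; i < megabytes * 1024; i++) {
            sink = new byte[1024];
        }
    }

    public static void main(String[] args) {
        long before = totalCollections();
        allocateGarbage(2048);
        long after = totalCollections();
        System.out.println("Collections during the loop: " + (after - before));
    }
}
```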
Where does Garbage Collection happen?
To understand garbage collection, we should understand the different memory areas. When a Java program runs, each thread has a stack that holds method frames, local variables, and object references, while the objects themselves are allocated on the heap. Garbage collection operates on the heap. Non-heap areas such as thread stacks are not garbage collected: a stack frame is released as soon as its method returns.
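The split between stack and heap can be observed from inside a running program. The sketch below (the class name HeapUsage is just for illustration) allocates an array, whose reference sits in main's stack frame while the array object itself lives on the heap, and prints the heap and non-heap usage reported by the standard MemoryMXBean. Exact numbers depend on your JVM, collector, and heap settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsage {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Usage of the heap, the area the garbage collector manages.
    static MemoryUsage heap() {
        return MEMORY.getHeapMemoryUsage();
    }

    // Usage of the non-heap areas this bean reports (for example Metaspace and the code cache).
    // Thread stacks are not part of this figure.
    static MemoryUsage nonHeap() {
        return MEMORY.getNonHeapMemoryUsage();
    }

    public static void main(String[] args) {
        // The local variable `numbers` (a reference) lives in main's stack frame;
        // the int[] object it points to is allocated on the heap.
        int[] numbers = new int[1_000_000];
        System.out.println("Allocated an array of " + numbers.length + " ints");
        System.out.println("Heap:     " + heap());
        System.out.println("Non-heap: " + nonHeap());
    }
}
```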
What are the types of Garbage Collection, and how do they differ from each other at a high level?
There are four popular GC types:
- Serial GC or Serial Collector
- Parallel GC or Parallel Collector or Throughput Collector
- CMS GC
- G1 GC
Serial GC - Single-threaded - When the GC thread runs, all application threads are stopped. The JVM selects this policy by default only on machines it does not treat as server-class, such as machines with a single CPU (and historically with the 32-bit client JVM).
Parallel GC - Multi-threaded - Like Serial GC, it stops all application threads when a GC starts, but it performs the collection with multiple GC threads.
CMS GC (Concurrent Mark Sweep) - Multi-threaded - Some phases of this collector run concurrently with the application threads (without stopping them), but other phases still stop the application threads.
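G1 GC - Multi-threaded - G1 does most of its marking work concurrently with the application, but it still stops application threads for evacuation pauses; its goal is to keep those pauses short and predictable. It also organizes the heap differently, as a set of equal-sized regions. We will explain this in detail later in the article.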
Each of these GC policies has its own advantages and disadvantages, which we will explain later.
Does the Garbage Collection Policy vary by Java version? What determines the default Garbage Collection Policy for a Java process?
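Yes, the default garbage collection policy depends on the JDK version, and also on whether the JVM treats the machine as server-class. On server-class machines, the defaults are: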
- Java 7 - Parallel GC
- Java 8 - Parallel GC
- Java 9 - G1
- Java 10 and later - G1
We can explicitly specify the GC Policy as well.
How do we change it? When we start a JVM process, we pass arguments that specify which GC policy to use:
-XX:+UseSerialGC - Serial GC
-XX:+UseParallelGC - Parallel GC
-XX:+UseConcMarkSweepGC - CMS (deprecated in JDK 9 and removed in JDK 14)
-XX:ParallelCMSThreads=<n> - CMS Collector: number of threads to use (newer JDKs use -XX:ConcGCThreads=<n>)
-XX:+UseG1GC - G1 Garbage Collector
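To compare these policies on your own machine, you can run the same program under each flag. The sketch below is a hypothetical workload (the class name GcWorkload, the sizes, and the retention pattern are arbitrary assumptions, not a benchmark) that mixes short-lived and longer-lived objects, then prints each collector's collection count and the total collection time the JVM reports.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.Deque;

public class GcWorkload {
    // Creates many short-lived 1 KB arrays and keeps a bounded queue of longer-lived ones.
    // Returns how many arrays are still retained at the end.
    static int runWorkload(int iterations, int maxRetained) {
        Deque<byte[]> longLived = new ArrayDeque<>();
        for (int i = 0; i < iterations; i++) {
            byte[] data = new byte[1024];            // most of these become garbage immediately
            if (i % 100 == 0) {
                longLived.addLast(data);             // every 100th array survives for a while
                if (longLived.size() > maxRetained) {
                    longLived.removeFirst();         // the oldest survivor becomes garbage
                }
            }
        }
        return longLived.size();
    }

    // Sum of the accumulated collection time (ms) reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {                          // -1 means the time is undefined
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        int retained = runWorkload(5_000_000, 20_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
        System.out.println("Retained arrays: " + retained);
        System.out.println("Elapsed: " + elapsedMs + " ms, reported GC time: " + gcMs + " ms");
    }
}
```

For example, run java -XX:+UseSerialGC GcWorkload, then java -XX:+UseParallelGC GcWorkload, then java -XX:+UseG1GC GcWorkload. Numbers from a single short run are noisy, so treat them as a rough comparison only.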
How do I find which Garbage Collection Policy is set for a process?
If you are not sure which Java processes are running on a machine, navigate to the JDK bin directory and type
jcmd
This will list all the Java processes running on the server along with their process IDs.
Now, type
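jmap -heap PID
This will show the GC policy currently set for that process. jmap -heap works on JDK 8 and earlier; on JDK 9 and later, use jhsdb jmap --heap --pid PID instead, or run jcmd PID VM.flags and look for a flag such as -XX:+UseG1GC in the output.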
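You can also check the collector from inside the application. The sketch below maps the names of the JVM's GarbageCollectorMXBeans to the four policies described earlier. The name-to-policy table reflects the bean names HotSpot reports (for example PS Scavenge and PS MarkSweep for Parallel GC); other JVM implementations may use different names, and the class name CollectorDetector is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashSet;
import java.util.Set;

public class CollectorDetector {
    // Maps a HotSpot collector MXBean name to the GC policy it belongs to.
    static String policyFor(String beanName) {
        switch (beanName) {
            case "Copy":                 // Serial young generation
            case "MarkSweepCompact":     // Serial old generation
                return "Serial GC";
            case "PS Scavenge":          // Parallel young generation
            case "PS MarkSweep":         // Parallel old generation
                return "Parallel GC";
            case "ParNew":               // young generation used together with CMS
            case "ConcurrentMarkSweep":
                return "CMS GC";
            default:
                // G1 beans are named "G1 Young Generation", "G1 Old Generation", etc.
                return beanName.startsWith("G1 ") ? "G1 GC" : "Other (" + beanName + ")";
        }
    }

    // Returns the policies of all collectors active in this JVM, without duplicates.
    static Set<String> activePolicies() {
        Set<String> policies = new LinkedHashSet<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            policies.add(policyFor(gc.getName()));
        }
        return policies;
    }

    public static void main(String[] args) {
        System.out.println("Active GC policy: " + activePolicies());
    }
}
```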
How does each Garbage Collection Policy differ in detail?
At a very high level, the heap is divided into two generations: the young generation and the old generation. New objects are allocated in the young generation, and objects that survive a few GCs are moved (promoted) from the young generation to the old generation. (For each GC policy, we will see in detail later how these memory areas are laid out and how objects move between them.)
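The generations can be seen as separate memory pools through the standard MemoryPoolMXBean API. This sketch (the class name HeapPools is illustrative) lists the heap pools together with the collectors that manage them. Under G1 you should see pools such as G1 Eden Space and G1 Old Gen; under Parallel GC, PS Eden Space and PS Old Gen. Running it with different GC flags shows how the layout changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeapPools {
    // Heap pools only; non-heap pools such as Metaspace are skipped.
    static List<MemoryPoolMXBean> heapPools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : heapPools()) {
            MemoryUsage usage = pool.getUsage();     // null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.printf("%-20s used=%,d bytes, managed by %s%n",
                    pool.getName(), used, Arrays.toString(pool.getMemoryManagerNames()));
        }
    }
}
```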
As objects are moved to the old generation, it too will eventually fill up, and the JVM will need to find the objects in the old generation that are no longer in use and discard them. This is where GC algorithms differ the most.
The simpler algorithms stop all application threads, find the unused objects and free their memory, and then compact the heap. This process is called a full GC, and it generally causes a long pause for the application threads.
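To make the steps of a full GC (find unused objects, free their memory, compact) concrete, here is a toy model over a simulated heap. It is not how HotSpot implements collection: the heap is a list of hypothetical Obj records that refer to each other by index, marking is a depth-first traversal from the roots, and compaction slides the surviving objects to the front while rewriting their references and the roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ToyMarkCompact {
    // A simulated heap object: a name plus the heap indices it references.
    static final class Obj {
        final String name;
        final int[] refs;
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    // Collects the simulated heap; `roots` is updated in place to the new indices.
    static List<Obj> collect(List<Obj> heap, int[] roots) {
        // Mark phase: everything reachable from the roots is live.
        boolean[] marked = new boolean[heap.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i]) continue;                  // already visited (handles cycles)
            marked[i] = true;
            for (int ref : heap.get(i).refs) stack.push(ref);
        }

        // Compute forwarding addresses: live objects keep their order but move to the front.
        int[] forward = new int[heap.size()];
        Arrays.fill(forward, -1);
        int next = 0;
        for (int i = 0; i < heap.size(); i++) {
            if (marked[i]) forward[i] = next++;
        }

        // Compaction phase: copy live objects with rewritten references; unmarked slots are reclaimed.
        List<Obj> compacted = new ArrayList<>();
        for (int i = 0; i < heap.size(); i++) {
            if (!marked[i]) continue;
            Obj o = heap.get(i);
            int[] newRefs = new int[o.refs.length];
            for (int k = 0; k < o.refs.length; k++) newRefs[k] = forward[o.refs[k]];
            compacted.add(new Obj(o.name, newRefs));
        }
        for (int k = 0; k < roots.length; k++) roots[k] = forward[roots[k]];
        return compacted;
    }

    public static void main(String[] args) {
        List<Obj> heap = Arrays.asList(
                new Obj("A", 2),   // 0: root, references C
                new Obj("B"),      // 1: unreachable
                new Obj("C", 4),   // 2: reachable via A
                new Obj("D", 1),   // 3: unreachable (references B, but nothing references D)
                new Obj("E"));     // 4: reachable via C
        int[] roots = {0};
        for (Obj o : collect(heap, roots)) {
            System.out.println(o.name + " -> " + Arrays.toString(o.refs));
        }
    }
}
```

Running main prints A -> [1], C -> [2] and E -> []: B and D were unreachable, so their slots are reclaimed, and the survivors slide to the front with their references rewritten.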
On the other hand, it is possible, though computationally more complex, to find unused objects while application threads are running; CMS and G1 both take that approach.
Because the phase where they scan for unused objects can occur without stopping application threads, CMS and G1 are called concurrent collectors. They are also called low-pause (and sometimes, incorrectly, pauseless) collectors, since they minimize the need to stop all the application threads. Concurrent collectors also take different approaches to compacting the old generation.
The benefit of avoiding long pause times with a concurrent collector, however, comes at the expense of extra CPU usage.
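One simple way to start quantifying this trade-off is the share of elapsed time spent in collections: overhead = GC time / uptime, and throughput = 1 - overhead. The sketch below (class name GcOverhead is illustrative) computes that from the collection time the JVM reports. Treat it as a rough indicator only: for concurrent collectors, some beans may include time spent in concurrent phases that do not stop the application, and the extra CPU used by concurrent GC threads is not captured by this ratio.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Fraction of wall-clock time spent in reported collections, capped at 1.0.
    // Throughput is then 1 - overhead.
    static double overhead(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    // Sum of the accumulated collection time (ms) reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {                  // -1 means the time is undefined
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double o = overhead(totalGcMillis(), uptime);
        System.out.printf("GC overhead: %.2f%% of uptime, throughput: %.2f%%%n", o * 100, (1 - o) * 100);
    }
}
```

As a worked example, 300 ms of collection time over 60,000 ms of uptime gives an overhead of 0.5% and a throughput of 99.5%. In a real application you would compute this periodically from long-running code rather than right after startup.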
Let's look at each GC policy in detail.
Parallel GC or Parallel Collector or Throughput Collector
Things to remember about Garbage Collection Settings