How Garbage Collection Works Internally

Garbage Collection in Java

Garbage Collection (GC) is the process by which the Java Virtual Machine (JVM) automatically reclaims memory by identifying and discarding objects that the program can no longer reach. Because the JVM handles deallocation, developers do not free memory manually as they would in C or C++, which removes whole classes of bugs such as dangling pointers and double frees. It does not rule out memory leaks entirely: an object that is still reachable, for example through a forgotten entry in a static collection, is never collected.

Here’s how Garbage Collection (GC) works in Java:

1. JVM Memory Structure

For the purposes of garbage collection, the two JVM memory areas that matter most are the Heap and the Stack.

  • Stack: Each thread has its own stack. Its method frames hold local primitive values and references to objects in the Heap.
  • Heap: Shared by all threads. It stores the objects and arrays that are allocated during program execution.

The Heap is where Garbage Collection operates. Objects are created on the Heap, and the garbage collector is responsible for cleaning up unreferenced objects.
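The split between the two areas can be illustrated with a minimal sketch. The class and method names below are hypothetical, and the comments describe the conceptual model; a JIT compiler is free to optimize some allocations away.

```java
class StackVsHeapDemo {
    static int sumOfLengths(String[] words) {
        int total = 0;                 // primitive local: lives in this method's stack frame
        for (String w : words) {       // 'w' is a reference in the frame; the String it points to is on the heap
            total += w.length();
        }
        return total;                  // the frame (total, w, the 'words' reference) disappears on return
    }

    public static void main(String[] args) {
        String[] words = {"heap", "stack"};        // the array and both Strings are heap objects
        System.out.println(sumOfLengths(words));   // prints 9
    }
}
```

When sumOfLengths returns, its frame is popped without any help from the garbage collector. Only the heap objects (the array and the strings) ever become the collector's concern.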

2. Object Lifecycle in Java

Java objects are created in the Heap, and they go through the following stages:

  1. Object Creation: Objects are allocated memory in the Heap when created via the new keyword.
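  2. In Use: While variables, fields, arrays, etc. hold references to the object, it stays alive. The JVM does not keep a per-object reference count; it traces references starting from a set of roots.
  3. Object becomes unreachable: When no chain of references from a GC root leads to an object, it becomes unreachable and is considered “garbage.” This holds even when unreachable objects still reference each other.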
  4. Garbage Collection: The garbage collector automatically identifies unreachable objects and reclaims their memory.
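The lifecycle above can be observed, with some caveats, using a WeakReference as a probe. This is a minimal sketch: LifecycleDemo and isAlive are hypothetical names, and System.gc() is only a request that the JVM may ignore, so the final line of output is not guaranteed.

```java
import java.lang.ref.WeakReference;

class LifecycleDemo {
    /** Returns true while the object behind the weak reference has not been collected. */
    static boolean isAlive(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) throws InterruptedException {
        Object payload = new byte[1024 * 1024];                      // stage 1: allocated on the heap
        WeakReference<Object> probe = new WeakReference<>(payload);  // a weak reference does not keep it alive
        System.out.println("strongly referenced, alive: " + isAlive(probe));

        payload = null;                                              // stage 3: last strong reference dropped
        for (int i = 0; i < 10 && isAlive(probe); i++) {
            System.gc();                                             // stage 4: a hint only, not a command
            Thread.sleep(50);
        }
        System.out.println("after GC requests, alive: " + isAlive(probe));
    }
}
```

On a typical HotSpot JVM the second line reports false, but code should never depend on when, or whether, a particular object is collected.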

3. The Roots of Reachability

Java determines which objects are “alive” and should not be collected based on GC roots. The following are considered roots:

  • Local variables in active threads
  • Static variables in loaded classes
  • JNI (Java Native Interface) references
  • Active threads themselves

Objects directly referenced by the roots are considered “reachable.” Any object that is reachable from another reachable object is also considered live and will not be collected.

4. Phases of Garbage Collection

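The basic algorithm has two main phases, mark and sweep. Many collectors also compact or copy the surviving objects so that free memory does not become fragmented.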

Mark Phase

In the Mark Phase, the garbage collector identifies which objects are still in use:

  • Starting from the GC roots, the collector traverses all object references.
  • It marks all objects that are reachable (i.e., still being referenced).
  • Unmarked objects are considered garbage and eligible for collection.

Sweep Phase

In the Sweep Phase, the garbage collector reclaims the memory of unmarked objects:

  • It deallocates the memory used by objects that were not marked in the previous phase.
  • This freed memory is returned to the Heap, allowing new objects to be allocated in that space.
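The two phases can be sketched on a simulated heap. This is a toy model under stated assumptions, not how HotSpot is implemented: Node, mark, sweep, and demo are hypothetical names, and the "heap" is just a list of nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ToyMarkSweep {
    /** A simulated heap object: a name, its outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the roots, using a worklist. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;      // already visited; this also stops cycles from looping forever
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: remove unmarked nodes and reset the mark on survivors. Returns the freed names. */
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;        // clear the bit for the next collection cycle
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    /** Heap layout: root -> a -> b, plus an island c <-> d that no root can reach. */
    static List<String> demo() {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        Node c = new Node("c"), d = new Node("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(List.of(root, a, b, c, d));
        mark(List.of(root));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("swept: " + demo());   // prints swept: [c, d]
    }
}
```

Nodes c and d reference each other, yet both are swept because nothing reachable from the root points at them. This is why a tracing collector has no trouble with cyclic garbage.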

5. Generational Garbage Collection

Java uses a generational garbage collection model, which divides the Heap into different regions based on the age of objects:

Young Generation

  • Eden Space: Newly created objects are first allocated in the Eden space.
  • Survivor Spaces (S0, S1): Objects that survive a Minor GC are copied from Eden into one of the two survivor spaces. On later Minor GCs the survivors are copied between S0 and S1, and their age increases each time.
  • Minor GC: Garbage collection in the Young Generation is called Minor GC and occurs frequently. Objects that are still alive are moved to the next stage (Survivor Spaces or Old Generation).

Old Generation

  • Objects that survive several rounds of Minor GC are promoted to the Old Generation (also known as the Tenured Generation).
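  • Major GC and Full GC: Garbage collection in the Old Generation is usually called Major GC, while a Full GC collects the entire Heap (Young and Old Generations). The terms are often used interchangeably. Both happen less frequently than Minor GC but are more expensive and can cause long application pauses (Stop-the-World events).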
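Aging and promotion can be sketched with a small simulation. Everything here is hypothetical (ToyGenerations, the live flag that stands in for reachability, and the threshold of 3); in a real HotSpot JVM the age limit is tunable through -XX:MaxTenuringThreshold and the collector may promote earlier.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ToyGenerations {
    static final int TENURING_THRESHOLD = 3;   // hypothetical value chosen for the demo

    static final class Obj {
        final String name;
        boolean live = true;   // stands in for "still reachable from a GC root"
        int age = 0;           // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);          // new objects always start in the Young Generation
        return o;
    }

    /** Simulated Minor GC: drop dead young objects, age the survivors, promote the old enough ones. */
    void minorGc() {
        for (Iterator<Obj> it = young.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!o.live) {
                it.remove();               // garbage: reclaimed immediately
                continue;
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                it.remove();
                old.add(o);                // promoted to the Old Generation
            }
        }
    }

    static String demo() {
        ToyGenerations heap = new ToyGenerations();
        heap.allocate("cache");            // long-lived object
        Obj temp = heap.allocate("temp");  // short-lived object
        temp.live = false;
        for (int i = 0; i < TENURING_THRESHOLD; i++) {
            heap.minorGc();
        }
        return "young=" + heap.young.size() + " old=" + heap.old.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());        // prints young=0 old=1
    }
}
```

The short-lived object disappears in the first Minor GC, while the long-lived one is promoted after surviving three. The design rests on the observation that most objects die young, so collecting the Young Generation often is cheap.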

Permanent Generation/Metaspace

  • Older versions of Java (pre-Java 8) had a Permanent Generation (PermGen) space that held class metadata (like class structures and method data).
  • From Java 8 onwards, Metaspace replaced PermGen and is located in native memory, outside the Heap. It grows and shrinks with the metadata requirements and is unbounded by default unless capped with -XX:MaxMetaspaceSize.
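To see how a running JVM actually lays out these regions, the standard java.lang.management API can list its memory pools. This is a small sketch with hypothetical class and method names; the pool names it prints depend on the JVM vendor, version, and selected collector, so no particular output is implied.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    /** One line per memory pool: type (HEAP or NON_HEAP), name, and current usage. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // may be null if the pool is no longer valid
            long usedKiB = (usage == null) ? -1 : usage.getUsed() / 1024;
            lines.add(pool.getType() + " | " + pool.getName() + " | used " + usedKiB + " KiB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

With a generational collector, the HEAP entries usually correspond to Eden, Survivor, and Old spaces, while Metaspace shows up as a NON_HEAP pool.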

6. Garbage Collection Algorithms

Java provides different GC algorithms to handle memory reclamation in different environments and use cases. Some of the key algorithms include:

1. Serial GC:

  • Uses a single thread to perform all garbage collection work.
  • It copies live objects within the Young Generation and uses mark-sweep-compact in the Old Generation, pausing all application threads during garbage collection (Stop-the-World).
  • Suitable for small applications or single-core systems.

2. Parallel GC (Throughput GC):

  • Uses multiple threads for garbage collection.
  • Focuses on maximizing throughput, that is, minimizing the total share of run time spent in GC, and accepts longer individual pauses in return.
  • Suitable for applications where high throughput is more critical than low latency.

3. CMS (Concurrent Mark-Sweep) GC:

  • Minimizes application pauses by performing some garbage collection work concurrently with the application threads.
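  • Divides the collection process into short Stop-the-World steps (initial mark and remark) and longer steps that run concurrently with the application (concurrent mark and concurrent sweep).
  • Was aimed at applications with low-latency requirements. CMS was deprecated in Java 9 and removed in Java 14, so newer applications use G1, ZGC, or Shenandoah instead.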

4. G1 GC (Garbage First):

  • Designed to be a more balanced GC with predictable pause times.
  • Divides the heap into smaller regions and collects the regions with the most garbage first.
  • It handles both Young and Old Generation collections, is recommended for large heaps, and has been the default collector since Java 9.

5. ZGC (Z Garbage Collector):

  • A low-latency GC introduced as an experimental feature in Java 11 and declared production-ready in Java 15.
  • Designed to handle very large heaps with pause times of a few milliseconds or less.
  • Suitable for highly scalable applications requiring low latency.

6. Shenandoah GC:

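  • Another low-latency collector similar to ZGC, introduced as an experimental feature in Java 12 and declared production-ready in Java 15.
  • It performs most of its work, including compaction, concurrently with the application, which keeps pauses short.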
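A collector is chosen at JVM startup with a flag such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC, -XX:+UseZGC, or -XX:+UseShenandoahGC (availability depends on the JDK version and build). The sketch below, using hypothetical class and method names, asks the running JVM which collectors are active through the standard GarbageCollectorMXBean API. The bean names vary between collectors and JDK versions, so no output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    /** Names of the collector beans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means the value is undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the same program with different -XX:+Use... flags is a quick way to confirm which collector a given deployment actually uses.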

7. Stop-the-World Events

During garbage collection, the JVM sometimes needs to halt all application threads to safely reclaim memory, known as a Stop-the-World (STW) event. The application pauses for the duration of the event, which hurts latency and responsiveness.

Modern garbage collectors (like G1, ZGC, and Shenandoah) are designed to minimize the duration of these pauses by doing more of their work incrementally or concurrently with the application.

8. Optimizing Garbage Collection

1. Tuning the Heap Size

  • Configure the heap size using JVM options like -Xms (initial heap size) and -Xmx (maximum heap size).
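  • A heap that is too small leads to frequent garbage collection or OutOfMemoryError, while an oversized heap wastes memory and can lengthen individual pauses with some collectors.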
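To check what the heap options resolved to at run time, a program can query java.lang.Runtime. This is a minimal sketch; HeapSettings and toMiB are hypothetical names, and the reported maximum only roughly matches -Xmx because some collectors reserve part of the heap.

```java
class HeapSettings {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (roughly -Xmx): " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("committed heap:          " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free within committed:   " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

For example, launching it as java -Xms256m -Xmx512m HeapSettings.java should report a maximum close to 512 MiB and a committed size that starts near 256 MiB.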

2. Choosing the Right GC Algorithm

  • Select the appropriate garbage collector for your application based on your latency and throughput requirements.
  • For low-latency systems, G1 or ZGC may be a better choice, while Parallel GC works for throughput-focused applications.

3. Object Lifetime Management

  • Avoid creating unnecessary objects, especially in performance-critical sections of the code.
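  • Use object pooling or reuse patterns for objects that are expensive to create (such as connections or large buffers) to reduce pressure on the Young Generation. Pooling small, short-lived objects usually costs more than letting Minor GC reclaim them.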
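A common instance of avoiding unnecessary objects is building strings in a loop. The sketch below uses hypothetical names (ReuseDemo, joinNaive, joinReusing) and compares two ways of producing the same result. It illustrates the allocation pattern only and makes no claim about measured speed.

```java
class ReuseDemo {
    /** Allocation-heavy: every += produces a new intermediate String that soon becomes garbage. */
    static String joinNaive(String[] parts) {
        String out = "";
        for (String p : parts) {
            out += p + ",";
        }
        return out;
    }

    /** One StringBuilder accumulates the text; only the final String is allocated at the end. */
    static String joinReusing(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinNaive(parts));     // prints a,b,c,
        System.out.println(joinReusing(parts));   // prints a,b,c,
    }
}
```

Both methods return the same text, but the first leaves one discarded String per iteration for the Young Generation to clean up. In a hot loop over many elements that difference adds up to more frequent Minor GCs.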

Conclusion

Garbage collection in Java is a crucial feature that automatically manages memory, allowing developers to focus on coding without worrying about manual memory management. With various GC algorithms available, Java provides flexible and customizable solutions for different types of applications, from low-latency systems to high-throughput applications. Proper tuning and understanding of how garbage collection works can help ensure optimal performance in Java applications.
