Explain garbage collection in Java.
Garbage Collection (GC) in Java is an automatic memory management process that identifies and removes objects that are no longer referenced by the application. Its primary goal is to free the memory occupied by these unused objects, which prevents most memory leaks and simplifies memory management for developers.
What is Garbage Collection?
In Java, memory management is largely handled by the Java Virtual Machine (JVM). Unlike languages like C or C++, where developers must manually allocate and deallocate memory, Java's GC automatically reclaims memory from objects that are no longer needed. This significantly reduces the burden on developers and helps prevent common memory-related errors.
How Does Garbage Collection Work?
The core principle of Java's garbage collector is to identify 'unreachable' objects. An object is considered unreachable if there's no way for the running application to access it. Once an object becomes unreachable, it is eligible for collection, and the garbage collector eventually reclaims the memory it occupied.
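Reachability can be observed from inside a program with a weak reference, which does not keep its target alive. The following is a minimal sketch; the class and method names are illustrative, and because System.gc() is only a hint, the second line of output may differ between runs and JVMs.

```java
import java.lang.ref.WeakReference;

final class ReachabilityDemo {
    // While a strong reference exists, the object is reachable and cannot be collected.
    static boolean visibleWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint; the object must survive either way
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("while strongly held: " + visibleWhileStronglyHeld());

        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null; // the object is now only weakly reachable, so it is eligible for GC
        System.gc();   // request a collection; the JVM may ignore the request
        System.out.println(weak.get() == null ? "collected" : "not collected yet");
    }
}
```

The first check is deterministic because the strong reference is still in use; the second depends on whether the JVM actually ran a collection.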
The Mark-Sweep-Compact Algorithm (Conceptual)
Many GC algorithms are variations of the Mark-Sweep-Compact approach:
- Mark Phase: The GC traverses all reachable objects from a set of 'GC roots' (e.g., local variables, static fields). All reachable objects are marked as 'live'.
- Sweep Phase: The GC then scans the heap, identifying and deleting all objects that were not marked during the Mark Phase (i.e., the unreachable objects). The memory is added to a free list.
- Compact Phase: After sweeping, memory can become fragmented (small, non-contiguous free blocks). The compact phase moves live objects together, defragmenting the heap and making larger contiguous blocks available for new object allocations. Not every collector has this phase, and some run it only occasionally.
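The three phases can be sketched over a simulated heap. This is a toy model for illustration only, not how any real JVM collector is implemented; the Node class, the array used as a "heap", and all method names are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ToyMarkSweepCompact {
    /** A simulated heap object: a name, outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue; // already visited (also makes cycles safe)
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: clear every slot holding an unmarked node; returns the number freed. */
    static int sweep(Node[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
                freed++;
            }
        }
        return freed;
    }

    /** Compact phase: slide survivors to the front and reset their mark bits; returns the live count. */
    static int compact(Node[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Node n = heap[i];
            if (n != null) {
                heap[i] = null;
                n.marked = false;
                heap[next++] = n;
            }
        }
        return next;
    }

    /** Builds a small graph, runs one full cycle, and describes the resulting heap layout. */
    static String demo() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(c); // A -> C, reachable from the root
        b.refs.add(d); // B <-> D form a cycle that no root reaches
        d.refs.add(b);
        Node[] heap = {a, b, c, d};

        mark(List.of(a)); // A is the only GC root
        int freed = sweep(heap);
        int live = compact(heap);

        StringBuilder layout = new StringBuilder();
        for (int i = 0; i < heap.length; i++) {
            if (i > 0) layout.append(',');
            layout.append(heap[i] == null ? "-" : heap[i].name);
        }
        return "freed=" + freed + " live=" + live + " layout=" + layout;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "freed=2 live=2 layout=A,C,-,-"
    }
}
```

Note that B and D reference each other yet are still freed: a tracing collector decides by reachability from the roots, so reference cycles do not keep objects alive.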
Generational Garbage Collection
Most Java applications adhere to the 'Weak Generational Hypothesis', which states that most objects are short-lived, and a few objects are long-lived. To optimize GC performance, the heap is divided into generations:
- Young Generation: This is where new objects are initially allocated. It's further divided into Eden space and two Survivor spaces (S0 and S1). Minor GCs (or Young GCs) happen here frequently, are typically fast, and collect most objects.
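- Old Generation (Tenured Generation): Objects that survive multiple Minor GCs (i.e., are long-lived) are promoted to the Old Generation. Major GCs happen here; they are less frequent but typically take longer. The term Full GC is often used interchangeably, although strictly it refers to a collection of the entire heap (young and old generations together).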
- Permanent Generation / Metaspace: Historically, metadata like class definitions and method information resided in the Permanent Generation. In Java 8 and later, this was replaced by Metaspace, which uses native memory.
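The memory areas of a running JVM can be listed through the standard java.lang.management API. The sketch below prints each pool; the class name is illustrative, and the pool names in the output depend on the JVM and the collector in use (with G1 on HotSpot they typically include Eden, Survivor, Old Gen, and Metaspace entries).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;

final class MemoryPools {
    /** Returns the memory pools the running JVM exposes. */
    static List<MemoryPoolMXBean> pools() {
        return ManagementFactory.getMemoryPoolMXBeans();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : pools()) {
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKiB = usage == null ? 0 : usage.getUsed() / 1024;
            System.out.printf("%-36s %-8s used=%d KiB%n", pool.getName(), kind, usedKiB);
        }
    }
}
```

Metaspace appears as a non-heap pool, which matches the description above: class metadata lives in native memory rather than in the Java heap.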
Types of Garbage Collectors
The JVM offers several GC algorithms, each with different performance characteristics and goals (throughput vs. low latency):
- Serial GC: Single-threaded, 'stop-the-world' (STW) collector. Suitable for small applications or single-processor machines.
- Parallel GC: Default GC in Java 8. Multi-threaded version of Serial GC, optimized for throughput. Both minor and major collections are still STW.
- CMS (Concurrent Mark Sweep) GC: Designed for low pause times. It tries to do most of its work concurrently with the application threads, but can suffer from fragmentation because it does not compact the heap. It was deprecated in Java 9 and removed in Java 14.
- G1 (Garbage-First) GC: Default in Java 9+. A generational, concurrent, parallel, and compacting collector that partitions the heap into regions. It aims for a balance of throughput and predictable pause times.
- ZGC & Shenandoah GC: Designed for very low pause times (on the order of a millisecond or less) even for very large heaps. Both are highly concurrent; they were introduced as experimental features and became production-ready in JDK 15.
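A collector is normally selected with a JVM flag such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC, -XX:+UseZGC, or -XX:+UseShenandoahGC (availability depends on the JDK version and build). To check which collector a process is actually running, the management API can be queried. This is a small sketch with an illustrative class name; the reported names vary by JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Returns the names of the collectors registered in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means the value is undefined.
            System.out.printf("%s: collections=%d, total time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running the same class with different flags, for example java -XX:+UseParallelGC ActiveCollectors.java, should show different collector names.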
When Does Garbage Collection Occur?
The exact timing of garbage collection is determined by the JVM, which monitors memory usage and triggers GC when necessary (e.g., when the heap is running low on space). Developers cannot explicitly force garbage collection; System.gc() is merely a hint to the JVM that a garbage collection might be beneficial, but there's no guarantee it will run immediately or at all.
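The effect of the hint can be observed with the Runtime API. The following sketch (class name and allocation sizes are arbitrary choices for illustration) prints heap usage before and after calling System.gc(). No particular numbers are guaranteed, since the JVM decides whether and when to collect.

```java
final class HeapUsage {
    /** Approximate number of heap bytes currently in use. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap: " + rt.maxMemory() / (1024 * 1024) + " MiB");
        System.out.println("used at start: " + usedBytes() / 1024 + " KiB");

        // Allocate about 4 MiB that becomes garbage as soon as the reference is dropped.
        byte[][] garbage = new byte[16][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[256 * 1024];
        }
        System.out.println("used after allocating: " + usedBytes() / 1024 + " KiB");

        garbage = null; // the arrays are now unreachable
        System.gc();    // a request, not a command
        System.out.println("used after System.gc() hint: " + usedBytes() / 1024 + " KiB");
    }
}
```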
Advantages and Disadvantages
| Aspect | Description |
|---|---|
| Advantages | Automatic memory management, significantly reduces memory leaks and dangling pointers, simplifies development by removing manual memory handling. |
| Disadvantages | Can introduce performance overhead (pauses during 'stop-the-world' events), unpredictable timing can affect real-time applications, developers have less control over memory reclamation. |
Important Points and Best Practices
- Avoid creating unnecessary objects to reduce GC workload.
- Set object references to null if they are no longer needed (especially in long-running scopes) to make objects eligible for GC sooner.
- Understand your application's memory footprint and choose the appropriate GC algorithm.
- Monitor GC logs (-Xlog:gc*) to tune JVM parameters and troubleshoot performance issues.
- Consider using memory profilers to identify memory leaks or inefficient object usage.
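Clearing references matters most in classes that manage their own storage, where a stale slot can keep an object reachable indefinitely. The sketch below shows this with a hypothetical array-backed stack; SimpleStack and its slotCleared helper exist only for this illustration. For ordinary local variables, explicit nulling is rarely needed because the reference disappears when the method returns.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

final class SimpleStack<T> {
    private Object[] elements = new Object[8];
    private int size;

    void push(T item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow when full
        }
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        T item = (T) elements[--size];
        // Without this line the array would still reference the popped object,
        // keeping it reachable even though the stack no longer "contains" it.
        elements[size] = null;
        return item;
    }

    int size() {
        return size;
    }

    /** Demo helper: reports whether the backing slot at the given index holds no reference. */
    boolean slotCleared(int index) {
        return elements[index] == null;
    }
}
```

A short usage example: after pushing "a" and "b" and popping once, pop() returns "b", size() is 1, and slotCleared(1) is true, so "b" is no longer held by the stack.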
Example: Object Eligibility for GC
class MyObject {
    String name;

    public MyObject(String name) {
        this.name = name;
    }

    // finalize() is deprecated since Java 9 and is used here only to make collection visible.
    // The JVM may never call it, so real code should not rely on it.
    @Override
    protected void finalize() throws Throwable {
        System.out.println(this.name + " collected!");
    }
}

public class GcExample {
    public static void main(String[] args) {
        MyObject obj1 = new MyObject("Object 1"); // Reachable
        MyObject obj2 = new MyObject("Object 2"); // Reachable

        obj1 = null; // Object 1 is now eligible for GC
        // System.gc(); // Hint for GC, not guaranteed to run

        MyObject obj3 = new MyObject("Object 3"); // Reachable
        obj2 = obj3; // The original "Object 2" is now eligible for GC
        // "Object 3" is still reachable via both obj2 and obj3

        System.out.println("End of main method. Objects eligible for GC: Object 1, original Object 2");
    }
}
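Because finalize() is deprecated, cleanup in newer code is usually written with java.lang.ref.Cleaner (available since Java 9) combined with AutoCloseable. The following is a minimal sketch of that pattern rather than a drop-in replacement for the example above; the Resource class, its constructor parameters, and the releasedAfterClose helper are assumptions made for illustration.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

final class CleanerExample {
    private static final Cleaner CLEANER = Cleaner.create();

    static final class Resource implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;

        Resource(String name, AtomicBoolean released) {
            // The action captures only the constructor parameters, never 'this';
            // capturing 'this' would keep the Resource reachable forever.
            this.cleanable = CLEANER.register(this, () -> {
                released.set(true);
                System.out.println(name + " released");
            });
        }

        @Override
        public void close() {
            cleanable.clean(); // runs the action at most once, immediately
        }
    }

    /** Demo helper: opens and closes a resource, then reports whether cleanup ran. */
    static boolean releasedAfterClose() {
        AtomicBoolean released = new AtomicBoolean();
        try (Resource r = new Resource("Object 1", released)) {
            // use the resource here
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("released = " + releasedAfterClose());
    }
}
```

Explicit close() gives deterministic cleanup. If a Resource is dropped without being closed, the Cleaner runs the same action some time after the object becomes unreachable, with the same lack of timing guarantees as garbage collection itself.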