☕ Java Q40 / 105

How does HashMap work internally?

AI-Powered Answer ✓ Answered

Java's HashMap is a cornerstone data structure for storing key-value pairs, renowned for its average O(1) time complexity for `put`, `get`, and `remove` operations. This efficiency is achieved through clever use of hashing. This document delves into the internal mechanics of how HashMap works, from its core components to its collision resolution and resizing strategies.

Core Components of HashMap

At its foundation, a HashMap comprises an array of 'buckets' (often referred to as a table in the source code). Each bucket is capable of holding the first entry in a chain of key-value pairs that happen to hash to the same array index.

Each key-value pair is encapsulated within an Entry object (or Node in Java 8 and later). These Node objects typically store the key, its associated value, the hash code of the key, and a reference to the next Node in the bucket's chain (essential for collision handling).

  • Array/Table: The primary data structure, an array of Node references.
  • Nodes (Entries): Objects that store (hash, key, value, next).
  • hashCode(): Method called on the key object to generate an integer hash value.
  • equals(): Method called on key objects to determine if two keys are logically identical.
  • Load Factor: A threshold value (default 0.75) that determines when the HashMap should resize.
  • Threshold: The maximum number of entries before resizing (capacity * load factor).

The `put` Operation

When you invoke map.put(key, value), HashMap executes a series of steps to store the entry:

  • Hash Calculation: The hashCode() method of the key is called. This hash value is then often processed internally (e.g., by XORing with its higher bits) to minimize collisions and ensure a more uniform distribution across the buckets.
  • Index Determination: The processed hash code is used to compute the index within the internal array where the entry should reside. This is typically done via (n - 1) & hash, where n is the array's length.
  • Collision Handling: If the target bucket is empty, the new Node is placed directly. If the bucket already contains entries (a collision), the HashMap traverses the linked list (or Red-Black tree, in Java 8+) at that bucket. It checks if an existing key is equal to the new key. If a match is found, the old value is replaced. Otherwise, the new Node is appended to the end of the list/tree.
  • Resizing Check: After successfully adding or updating an entry, the HashMap verifies if its total size has exceeded the threshold. If it has, a resize operation (rehashing) is triggered.
java
// Simplified conceptual put method
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value); // Special handling for null key
    int hash = hash(key); // Compute hash and potentially mix it
    int i = indexFor(hash, table.length); // Get bucket index

    for (Node<K,V> e = table[i]; e != null; e = e.next) {
        if (e.hash == hash && ((e.key == key) || key.equals(e.key))) {
            V oldValue = e.value;
            e.value = value; // Update existing value
            return oldValue;
        }
    }

    addEntry(hash, key, value, i); // Add new entry or append to list/tree
    return null;
}

The `get` Operation

Retrieving a value using map.get(key) follows a similar hashing process:

  • Hash and Index Calculation: The hashCode() of the provided key is computed, and the corresponding array index is determined using the exact same logic as the put operation.
  • Bucket Traversal: The HashMap navigates to the identified bucket. It then iterates through the linked list or tree structure within that bucket, comparing the supplied key with each entry's key. This comparison first uses the hash code (for a quick filter) and then equals() (for a definitive match).
  • Value Retrieval: Upon finding a matching key via equals(), its associated value is returned. If the traversal completes without finding a matching key, null is returned.
java
// Simplified conceptual get method
public V get(Object key) {
    if (key == null)
        return getForNullKey(); // Special handling for null key
    int hash = hash(key);
    int i = indexFor(hash, table.length);

    for (Node<K,V> e = table[i]; e != null; e = e.next) {
        if (e.hash == hash && ((e.key == key) || key.equals(e.key))) {
            return e.value; // Found!
        }
    }
    return null; // Not found
}

Collision Resolution

Collisions, where distinct keys produce the same hash code and thus map to the same array index, are handled efficiently to maintain performance:

Separate Chaining (Linked List): Historically and by default for shorter chains, each bucket points to the head of a linked list. All Node entries that hash to the same bucket are chained together. When searching, this linked list is traversed sequentially until the equals() method confirms a key match.

Treefication (Red-Black Trees - Java 8+): To prevent worst-case O(n) performance when a large number of keys hash to a single bucket, Java 8 introduced a feature to convert long linked lists into Red-Black trees. If the number of nodes in a single bucket exceeds a TREEIFY_THRESHOLD (typically 8), the linked list is transformed into a balanced binary search tree, improving search, insertion, and deletion operations within that bucket to O(log n).

Resizing (Rehashing)

To preserve its average O(1) performance, HashMap dynamically expands its internal array when it becomes too densely populated. This crucial process is known as resizing or rehashing.

The decision to resize is governed by the load factor (defaulting to 0.75) and the threshold. When the number of entries (size) surpasses the threshold (calculated as capacity * load factor), the HashMap doubles its array capacity. All existing entries must then be rehashed and meticulously redistributed into the new, larger array, as their bucket indices will likely change due to the new array size.

This rehashing is an O(n) operation, meaning its cost is proportional to the number of entries. Therefore, it's performed sparingly to prevent performance degradation. Carefully choosing an appropriate initial capacity can help minimize the frequency of resizing.

Default Configuration Parameters

ParameterDefault ValueDescription
Initial Capacity16The default starting size of the internal array.
Load Factor0.75fA float value determining when the HashMap will be resized (capacity * load factor).
TREEIFY_THRESHOLD8Number of nodes in a single bucket that triggers conversion to a Red-Black tree.
UNTREEIFY_THRESHOLD6Number of nodes in a tree bucket that triggers conversion back to a linked list.
MIN_TREEIFY_CAPACITY64Minimum table capacity required for treefying (avoids treefying in small HashMaps).

Important Considerations

  • Mutability of Keys: Keys used in HashMap should ideally be immutable (e.g., String, Integer). If a mutable object is used as a key and its state changes in a way that affects its hashCode() after it's been put into the map, the entry might become unretrievable.
  • null Keys and Values: HashMap permits exactly one null key and multiple null values. The null key is always mapped to index 0 of the internal array.
  • Thread-Safety: HashMap is *not* inherently thread-safe. For concurrent access in multi-threaded environments, ConcurrentHashMap should be used to avoid data corruption and race conditions.
  • Performance of hashCode() and equals(): The overall efficiency of HashMap critically depends on robust and well-implemented hashCode() and equals() methods for key objects. Poor implementations can lead to excessive collisions, forcing lengthy bucket traversals and degrading performance towards O(n).
  • Fail-Fast Iterators: HashMap's iterators are designed to be fail-fast. If the map's structure is modified (e.g., by adding or removing entries, but not just changing a value) after an iterator has been created, a ConcurrentModificationException will be thrown. This mechanism is primarily for detecting bugs rather than providing thread-safe behavior.