Rambling About Java Collections

 

Rough classification: List, Set, Queue, Map

Iterable

The Collection interface extends the Iterable interface. Iterable is what makes the for-each loop possible, and its iterator() method returns an Iterator object:

public interface Iterable<T> {

    Iterator<T> iterator();

    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }

    default Spliterator<T> spliterator() {
        return Spliterators.spliteratorUnknownSize(iterator(), 0);
    }
}

Let's take an example to see what this means in practice:

LinkedList<Integer> linkedList = new LinkedList<>();
linkedList.add(1);
linkedList.add(2);
linkedList.add(3);

for (Integer integer : linkedList) {
    System.out.println(integer);
}

After decompiling, the for-each loop turns into an explicit Iterator loop:

LinkedList<Integer> linkedList = new LinkedList();
linkedList.add(1);
linkedList.add(2);
linkedList.add(3);
Iterator var4 = linkedList.iterator();

while(var4.hasNext()) {
    Integer integer = (Integer)var4.next();
    System.out.println(integer);
}

Iterator

This is the Iterator that the Iterable interface hands out:

public interface Iterator<E> {
 
    boolean hasNext();

    E next();

    default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}

Its main purpose is to unify traversal and to decouple a collection's data structure from the way it is accessed.

Let's look at the Itr inner class of the most commonly used implementation, ArrayList:

private class Itr implements Iterator<E> {
    int cursor;       // index of the next element to return
    int lastRet = -1; // index of the last element returned; -1 if none
    int expectedModCount = modCount; // expected modification count, for fail-fast checks

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        // update once at end of iteration to reduce heap write traffic
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

As we all know, removing elements from an ArrayList inside a for-each loop throws a ConcurrentModificationException:

ArrayList<Integer> arrayList = new ArrayList<>();
arrayList.add(1);
arrayList.add(1);
arrayList.add(1);

for (Integer integer : arrayList) {
    arrayList.remove(integer);
}

Exception in thread "main" java.util.ConcurrentModificationException

When we remove through the Iterator instead, the problem goes away, because Iterator.remove() resynchronizes expectedModCount:

public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    checkForComodification();

    try {
        ArrayList.this.remove(lastRet);
        cursor = lastRet;
        lastRet = -1;
        expectedModCount = modCount;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}
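
Here is a minimal sketch of the safe pattern: removing through the Iterator keeps expectedModCount in sync with modCount, so no exception is thrown.

import java.util.ArrayList;
import java.util.Iterator;

ArrayList<Integer> arrayList = new ArrayList<>();
arrayList.add(1);
arrayList.add(2);
arrayList.add(3);

Iterator<Integer> it = arrayList.iterator();
while (it.hasNext()) {
    it.next();   // next() must be called before remove()
    it.remove(); // internally resets expectedModCount = modCount
}
System.out.println(arrayList); // []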

List

ArrayList

  • Backed by a dynamic array
  • Not thread safe
  • null elements are allowed
  • Implements List, RandomAccess, Cloneable, Serializable
  • Contiguous memory
  • Both adding and removing change modCount
  • By default it grows by half its current capacity (1.5x), as shown in the sketch below
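
For reference, here is the core of JDK 8's ArrayList.grow(int minCapacity), abridged (the overflow checks are omitted), showing the 1.5x expansion:

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1); // grow by half the old capacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    elementData = Arrays.copyOf(elementData, newCapacity); // copy into the larger array
}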

Vector

  • Thread safe
  • Doubles its capacity when it grows (or grows by capacityIncrement if one was specified)
  • Also uses modCount
  • synchronized is added to every method that operates on the underlying array, as shown below
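
For example, JDK 8's Vector.add is synchronized at the method level, and its growth rule (abridged from Vector.grow()) doubles the capacity unless a capacityIncrement was given:

public synchronized boolean add(E e) {
    modCount++;
    ensureCapacityHelper(elementCount + 1); // may trigger grow()
    elementData[elementCount++] = e;
    return true;
}

// Inside grow():
// int newCapacity = oldCapacity + ((capacityIncrement > 0) ? capacityIncrement : oldCapacity);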

CopyOnWriteArrayList

  • Copies the underlying array and takes a lock on every write
  • High memory overhead
  • Weak read consistency: readers may briefly see stale data
  • Never throws ConcurrentModificationException
  • Best suited to small data sets that are read far more often than written
  • Writes are guarded by a ReentrantLock (in JDK 8), as shown below
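
Here is JDK 8's CopyOnWriteArrayList.add, which shows the lock-copy-swap pattern:

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1); // copy the whole array
        newElements[len] = e;                                    // append to the copy
        setArray(newElements);                                   // swap in the new array
        return true;
    } finally {
        lock.unlock();
    }
}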

Collections.synchronizedList

  • Wraps every call in a synchronized block
  • The lock object (mutex) can be passed in as a constructor argument; otherwise the wrapper itself serves as the lock
  • A List instance must be passed in:

SynchronizedList(List<E> list) {
    super(list);
    this.list = list;
}
SynchronizedList(List<E> list, Object mutex) {
    super(list, mutex);
    this.list = list;
}
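
A quick usage sketch; note that, per the javadoc, iterating over the returned list is a compound operation and must still be synchronized manually:

List<Integer> syncList = Collections.synchronizedList(new ArrayList<>());
syncList.add(1); // each individual call is synchronized on the mutex

synchronized (syncList) {           // hold the lock for the whole traversal
    for (Integer i : syncList) {
        System.out.println(i);
    }
}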

LinkedList

  • ArrayList is fast at random access but slow at insertion and deletion; LinkedList is the opposite
  • Implemented as a doubly linked list
  • When accessing by index, it walks from the head or the tail depending on whether the index falls in the first or second half of the list, as shown below
  • modCount changes on every add or remove
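
This is the node(int index) method from the JDK's LinkedList, which implements that head-or-tail traversal:

Node<E> node(int index) {
    if (index < (size >> 1)) {          // index in the first half: walk from the head
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {                            // index in the second half: walk from the tail
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}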

Map

Four common implementation classes

  • HashMap
  • Hashtable
  • LinkedHashMap
  • TreeMap

HashMap

HashMap is implemented as an array + linked lists + red-black trees (red-black trees were added in JDK 1.8). Its core fields:

transient Node<K,V>[] table;
// Number of key-value mappings actually stored
transient int size;
// Resize threshold: when size exceeds this value, the table must be expanded
int threshold;
// Load factor; threshold = loadFactor * table.length
final float loadFactor;

The default table length is 16 and the default loadFactor is 0.75, so by default the resize threshold is 16 * 0.75 = 12.

Next, look at the Node data structure:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

Determining the bucket index in the hash table array:

Method 1:
static final int hash(Object key) {   // jdk1.8 & jdk1.7
     int h;
     // Step 1: take the key's hashCode value
     // Step 2: h ^ (h >>> 16) mixes the high bits into the low bits
     return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Method 2:
static int indexFor(int h, int length) {  // jdk1.7 source; jdk1.8 inlines this body instead of defining the method
     return h & (length-1);  // Step 3: the modulo operation
}

In JDK 1.8:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // here: (n - 1) & hash computes the bucket index
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // ...
}

The hash algorithm here boils down to three steps: take the key's hashCode, mix in the high bits, then do the modulo operation.

The modulo operation is h & (length - 1), which is equivalent to h % length because length is always a power of two, and & is cheaper than %.

(h = key.hashCode()) ^ (h >>> 16) XORs the key's hashCode with its upper 16 bits.

Why do this? When the table array is relatively small, the high-order bits of the key's hashCode would otherwise be thrown away by the index mask, which increases collisions among the low bits. The XOR folds the high-order information back in.

So why XOR, rather than one of the other binary bitwise operators?

Method 1 is what's known as a perturbation function. XORing the high and low halves of the hashCode mixes them together, which increases the randomness of the low bits while retaining some of the high-order information in disguise. After the perturbation, hash collisions are noticeably reduced.

As for why XOR specifically: among the binary bitwise operators (&, |, ^), XOR mixes best, because each output bit is 0 or 1 with equal (50/50) probability, whereas & is biased toward 0 and | toward 1.
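
A minimal worked sketch of the three steps, assuming a default table of 16 buckets (the key string here is just an example):

int h = "example-key".hashCode(); // step 1: take the raw hashCode
int hash = h ^ (h >>> 16);        // step 2: perturbation, mixing the high 16 bits into the low 16
int index = hash & (16 - 1);      // step 3: modulo via mask, giving an index in [0, 15]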

On the circular linked list problem caused by resizing in JDK 1.7

Here is the JDK 1.7 resize code:

void resize(int newCapacity) {   // incoming new capacity
    Entry[] oldTable = table;    // reference to the Entry array before resizing
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {  // if the old size already hit the maximum (2^30)
        threshold = Integer.MAX_VALUE;      // set the threshold to Integer.MAX_VALUE (2^31-1) so no further resize happens
        return;
    }

    Entry[] newTable = new Entry[newCapacity];  // initialize a new Entry array
    transfer(newTable);                         // !! move the data into the new Entry array
    table = newTable;                           // point HashMap's table at the new Entry array
    threshold = (int)(newCapacity * loadFactor);// update the threshold
}

void transfer(Entry[] newTable) {
    Entry[] src = table;                   // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // iterate over the old Entry array
        Entry<K,V> e = src[j];             // take each bucket of the old Entry array
        if (e != null) {
            src[j] = null; // release the old array's reference (after the loop the old array references nothing)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the new array
                e.next = newTable[i]; // mark [1]: head insertion
                newTable[i] = e;      // place the element at the head of the bucket
                e = next;             // move on to the next Entry in the chain
            } while (e != null);
        }
    }
}

Let's walk through the example from the Meituan tech blog.

In a single-threaded environment the resize completes normally, but notice that the chain has been inverted: key7 now sits in front of key3. That inversion, a consequence of head insertion at mark [1], is the critical point.

Now let's look at how the circular linked list arises under multithreading.

In essence, the circular linked list happens because the chain is inverted during resizing.

JDK 1.8 fixes this by using head/tail pointer pairs so the chain keeps its order during resizing:

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;

By appending at the tail through these head/tail pairs, JDK 1.8 preserves the chain's order during resizing, which eliminates the circular-linked-list problem. A sketch of the relevant resize loop follows.
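
Abridged from JDK 8's HashMap.resize(): each bucket's chain is split into a "lo" list (which stays at index j) and a "hi" list (which moves to index j + oldCap), and nodes are appended at the tail, so their relative order is preserved:

do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {   // stays in the low bucket
        if (loTail == null) loHead = e;
        else loTail.next = e;
        loTail = e;                 // tail insertion: order preserved
    } else {                        // moves to the high bucket
        if (hiTail == null) hiHead = e;
        else hiTail.next = e;
        hiTail = e;
    }
} while ((e = next) != null);
if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }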

Even so, HashMap is still not thread safe: concurrent writes can lose data, e.g. key5 is lost in the example above.

Hashtable

A legacy class whose functionality largely overlaps with HashMap, except that it is thread safe. However, only one thread can write to a Hashtable at a time, so its concurrency is far worse than ConcurrentHashMap, which uses segment locking (in JDK 1.7). Not recommended.

LinkedHashMap

LinkedHashMap extends HashMap. On top of HashMap it maintains a doubly linked list through its entries, which solves the problem that HashMap cannot guarantee that traversal order matches insertion order.

It overrides HashMap's newNode method.

It also overrides afterNodeInsertion, which is an empty hook method in HashMap:

void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

removeEldestEntry always returns false in LinkedHashMap. We can override it to implement an LRU cache, as sketched after the accessOrder field below.

/**
 * The iteration ordering method for this linked hash map: <tt>true</tt>
 * for access-order, <tt>false</tt> for insertion-order.
 *
 * @serial
 */
final boolean accessOrder;

Its default value is false (insertion order); when true, traversal follows access order.
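
Putting removeEldestEntry and accessOrder together, here is a minimal LRU cache sketch (the LruCache class name and capacity field are this example's own, not part of the JDK):

import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity; // maximum number of entries to keep

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: reads move entries to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used (head) entry when full
    }
}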

TreeMap

static final class Entry<K,V> implements Map.Entry<K,V> {
    K key;
    V value;
    Entry<K,V> left;
    Entry<K,V> right;
    Entry<K,V> parent;
    boolean color = BLACK;
}

TreeMap is implemented on top of a red-black tree, as this Entry structure shows: each entry carries left/right/parent links and a color bit.

Set

There's not much to say here: the common Set implementations are thin wrappers over the corresponding Maps (HashSet over HashMap, LinkedHashSet over LinkedHashMap, TreeSet over TreeMap), with the element stored as the key.

Queue

PriorityQueue

A min-heap by default. Its sift-up/sift-down logic is a good companion to studying heap sort, one of the eight classic sorting algorithms.

public boolean offer(E e) {
    if (e == null)
        throw new NullPointerException();
    modCount++;
    int i = size;
    if (i >= queue.length)
        grow(i + 1);
    size = i + 1;
    if (i == 0)
        queue[0] = e;
    else
        siftUp(i, e);
    return true;
}
public boolean add(E e) {
    return offer(e);
}
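
A quick usage sketch: offer/poll give smallest-first order by default; pass a reversed Comparator for a max-heap:

PriorityQueue<Integer> minHeap = new PriorityQueue<>();
minHeap.offer(3);
minHeap.offer(1);
minHeap.offer(2);
System.out.println(minHeap.poll()); // 1: the smallest element comes out first

PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());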

 
