Reads Often Outnumber Writes: Using ReadWriteLock to Implement a General-Purpose Cache

Abstract: In this article, we will look at how to use ReadWriteLock to implement a general-purpose cache.

This article is shared from the Huawei Cloud community post "[High Concurrency] It turns out ReadWriteLock can also build a high-performance cache. After reading this, I can have a good chat with the interviewer!", by glacier.

In practice, a very common concurrency scenario is one where reads far outnumber writes. In this scenario, to optimize program performance, we often use a cache to improve the application's access performance, because caching is well suited to read-heavy, write-light workloads. For concurrent code, the Java SDK provides ReadWriteLock to serve exactly this scenario. In this article, we will look at how to use ReadWriteLock to implement a general-purpose cache.

The knowledge points covered in this article include read-write locks, cache implementation, full cache loading, on-demand cache loading, lock downgrading, and data synchronization.

Read-write locks

Read-write locks should be familiar to most readers. Generally speaking, a read-write lock follows these principles:

  • A shared variable can be read by multiple read threads at the same time.
  • A shared variable can only be written by one write thread at a time.
  • While a write thread is writing a shared variable, no read thread can read it.

Note that an important difference between a read-write lock and a mutex is that a read-write lock allows multiple threads to read a shared variable at the same time, while a mutex does not. Therefore, in high-concurrency, read-heavy scenarios, a read-write lock outperforms a mutex. Write operations on a read-write lock, however, remain mutually exclusive: while a write thread is writing a shared variable, no read thread can read it.

Read-write locks support both fair and non-fair modes, controlled by a boolean passed to the ReentrantReadWriteLock constructor:

public ReentrantReadWriteLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync();
    readerLock = new ReadLock(this);
    writerLock = new WriteLock(this);
}
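
For example, a minimal usage sketch (the variable names are illustrative):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Fair mode: threads acquire the lock roughly in request order.
ReadWriteLock fairLock = new ReentrantReadWriteLock(true);
// Non-fair mode (the default): typically higher throughput.
ReadWriteLock defaultLock = new ReentrantReadWriteLock();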

Also note that, in a read-write lock, calling newCondition() on the read lock throws an UnsupportedOperationException; in other words, the read lock does not support condition variables, while the write lock does.
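
A quick standalone sketch (not from the original article) illustrates this:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
// The write lock supports condition variables:
Condition writeCondition = rwl.writeLock().newCondition();
// The read lock does not; this line throws UnsupportedOperationException:
Condition readCondition = rwl.readLock().newCondition();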

Cache implementation

Here, we use ReadWriteLock to quickly implement a general-purpose cache utility class. The complete code is shown below.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();
    // Read cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }
    // Write cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

As you can see, ReadWriteLockCache declares two type parameters: K is the type of the cache key and V is the type of the cached value. Internally, the class uses a Map to hold the cached data. As everyone knows, HashMap is not a thread-safe class, so we use the read-write lock to guarantee thread safety. For example, the get() method acquires the read lock, so multiple threads can read at the same time; the put() method acquires the write lock, so only one thread at a time can write to the cache.

Note that, for both the read lock and the write lock, the unlock operation must be placed in a finally block.
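
For example, a quick usage sketch (the key and value here are purely illustrative):

ReadWriteLockCache<String, String> cache = new ReadWriteLockCache<>();
cache.put("userName", "glacier");
String userName = cache.get("userName"); // "glacier"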

In my experience, there are two ways to load data into a cache: load the full data set into the cache when the project starts, or load the required data on demand while the project is running.

Next, let's look at full loading and on-demand loading in turn.

Full cache loading

Full loading is relatively simple: when the project starts, all the data is loaded into the cache at once. This approach suits scenarios where the amount of cached data is small and the data changes infrequently; for example, data dictionaries and similar reference data in a system can be cached this way.

After the full data set has been loaded, subsequent requests read the corresponding data directly from the cache.

The code for full loading is relatively simple; the following implementation demonstrates it.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();

    public ReadWriteLockCache(){
        // Query the database (the query itself is elided in the original;
        // Field is the author's placeholder entity type)
        List<Field<K, V>> list = .....;
        // CollectionUtils.isEmpty: e.g. from Spring or Apache Commons
        if(!CollectionUtils.isEmpty(list)){
            // HashMap is not thread-safe, so load it sequentially here,
            // before the cache is published to other threads
            list.forEach(f -> m.put(f.getK(), f.getV()));
        }
    }
    // Read cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }
    // Write cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

On-demand cache loading

On-demand loading, also called lazy loading, means data is loaded into the cache only when it is actually needed. When the program starts, no data is preloaded. At runtime, when some data needs to be queried, we first check whether it exists in the cache: if it does, we read it from the cache directly; if it does not, we query the database and write the result into the cache. Subsequent reads of the same data can then be served directly from the cache.

This query-then-cache approach suits most caching scenarios.

The following code sketches the on-demand caching logic (loadFromDatabase() below is a hypothetical stand-in for the real database query).

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    V get(K key) {
        V v = null;
        // Read from the cache
        r.lock();
        try {
            v = m.get(key);
        } finally {
            r.unlock();
        }
        // Cache hit: return the value
        if (v != null) {
            return v;
        }
        // Cache miss: query the database
        w.lock();
        try {
            // Check the cache again: another thread may have loaded
            // the value while we were waiting for the write lock
            v = m.get(key);
            if (v == null) {
                // Query the database
                v = loadFromDatabase(key);
                m.put(key, v);
            }
        } finally {
            w.unlock();
        }
        return v;
    }

    // Hypothetical helper standing in for the real database query
    private V loadFromDatabase(K key) {
        // The actual query was elided in the original article
        return null;
    }
}

In the get() method, we first read from the cache under the read lock and release it once the query returns. If the value returned from the cache is not null, we return it directly. If it is null, we acquire the write lock and then read from the cache again; if the cache still has no data, we query the database, write the result into the cache, release the write lock, and finally return the result.

At this point, a reader may ask: since the write lock has already been acquired, why query the cache again inside the write lock?

This is because, in high-concurrency scenarios, multiple threads may compete for the write lock. For example, when the get() method runs for the first time, the cache is empty. If three threads call get() at the same time and all reach the w.lock() line, then, because the write lock is exclusive, only one thread acquires it while the other two block at w.lock(). The thread that holds the write lock queries the database, writes the data into the cache, and releases the write lock.

The other two threads then compete for the write lock, and one of them acquires it and continues. If there were no second v = m.get(key) check after w.lock(), this thread would query the database again, write the data into the cache, and release the write lock; the last thread would then do the same.

In fact, the first thread has already queried the database and written the data into the cache, so the other two threads do not need to hit the database again; they can read the value straight from the cache. Re-checking the cache with v = m.get(key) after w.lock() therefore effectively avoids redundant database queries under high concurrency and improves system performance.

Upgrading and downgrading read-write locks

Regarding lock escalation, note that ReadWriteLock does not support lock upgrading: if a thread tries to acquire the write lock while it still holds the read lock, the write-lock acquisition waits forever and the thread blocks without ever being woken up.
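
A minimal standalone sketch (illustrative, not from the original article) shows the problem:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UpgradeDeadlock {
    public static void main(String[] args) {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        // Attempted upgrade: blocks forever, because the write lock
        // cannot be granted while this thread still holds the read lock.
        rwl.writeLock().lock();
        System.out.println("never reached");
    }
}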

Although lock upgrading is not supported, ReadWriteLock does support lock downgrading. As an example, take the official example from the ReentrantReadWriteLock documentation, shown below.

class CachedData {
    Object data;
    volatile boolean cacheValid;
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    void processCachedData() {
        rwl.readLock().lock();
        if (!cacheValid) {
            // Must release read lock before acquiring write lock
            rwl.readLock().unlock();
            rwl.writeLock().lock();
            try {
                // Recheck state because another thread might have
                // acquired write lock and changed state before we did.
                if (!cacheValid) {
                    data = ...
                    cacheValid = true;
                }
                // Downgrade by acquiring read lock before releasing write lock
                rwl.readLock().lock();
            } finally {
                rwl.writeLock().unlock(); // Unlock write, still hold read
            }
        }

        try {
            use(data);
        } finally {
            rwl.readLock().unlock();
        }
    }
}

Data synchronization problems

First of all, data synchronization here refers to keeping the data source and the cache consistent; more directly, keeping the database and the cache in sync.

There are three solutions to the data synchronization problem: a timeout mechanism, scheduled cache updates, and real-time cache updates, described below.

Timeout mechanism

This one is easy to understand: when data is written to the cache, it is given a timeout. Once an entry times out, it is automatically removed from the cache. The next time the program accesses that data, the cache has no corresponding entry, so the program queries the database and writes the result back into the cache. With this scheme, we need to pay attention to the cache penetration problem.
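
Below is a minimal sketch of the timeout mechanism, assuming a simple per-entry TTL (the class and method names are illustrative, not from the original article):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ExpiringCache<K, V> {
    // Each entry carries the time at which it expires.
    private static class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long ttlMillis) {
            this.value = value;
            this.expiresAt = System.currentTimeMillis() + ttlMillis;
        }
    }

    private final Map<K, Entry<V>> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();

    // Read the cache; an expired entry is treated as a miss, so the
    // caller falls back to the database and re-populates the cache.
    public V get(K key) {
        rwl.readLock().lock();
        try {
            Entry<V> e = m.get(key);
            if (e == null || e.expiresAt < System.currentTimeMillis()) {
                return null;
            }
            return e.value;
        } finally {
            rwl.readLock().unlock();
        }
    }

    // Write the cache with a time-to-live in milliseconds.
    public void put(K key, V value, long ttlMillis) {
        rwl.writeLock().lock();
        try {
            m.put(key, new Entry<>(value, ttlMillis));
        } finally {
            rwl.writeLock().unlock();
        }
    }
}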

Scheduled cache updates

This scheme is an enhanced version of the timeout mechanism. When data is written to the cache, a timeout is still assigned, but in addition a dedicated background thread periodically queries the database and writes the data back into the cache, refreshing entries before they expire. This avoids cache penetration to some extent.
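
Below is a minimal sketch of the scheduled-refresh idea, reusing the ReadWriteLockCache from earlier; loadAllFromDatabase() is a hypothetical placeholder for the real query:

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledCacheRefresher<K, V> {
    private final ReadWriteLockCache<K, V> cache;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public ScheduledCacheRefresher(ReadWriteLockCache<K, V> cache) {
        this.cache = cache;
    }

    // Reload the cache from the database every `periodMinutes` minutes.
    public void start(long periodMinutes) {
        scheduler.scheduleAtFixedRate(() -> {
            Map<K, V> fresh = loadAllFromDatabase();
            fresh.forEach(cache::put);
        }, 0, periodMinutes, TimeUnit.MINUTES);
    }

    // Hypothetical placeholder for the real database query.
    private Map<K, V> loadAllFromDatabase() {
        return Map.of();
    }
}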

Real-time cache updates

This scheme synchronizes the data in the database with the cache in real time. For example, Alibaba's open-source Canal framework can be used to keep a MySQL database and the cache synchronized in real time.
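
Canal's own API is beyond the scope of this article, but the general shape of real-time synchronization can be sketched as a change-event callback. The following is a generic illustration only, not Canal's actual API:

public class RealTimeCacheUpdater<K, V> {
    private final ReadWriteLockCache<K, V> cache;

    public RealTimeCacheUpdater(ReadWriteLockCache<K, V> cache) {
        this.cache = cache;
    }

    // Invoked whenever a row changes in the database, for example by a
    // binlog subscriber such as Canal wired up elsewhere.
    public void onRowChanged(K key, V newValue) {
        if (newValue == null) {
            // Row deleted: a production cache would also need a remove() method.
            return;
        }
        cache.put(key, newValue);
    }
}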

 

Click "follow" to learn about Huawei cloud new technologies at the first time~

Posted by echoninja on Wed, 01 Jun 2022 19:44:54 +0530