Implementing a distributed two-level cache based on Spring Cache

1. What is a hard-coded cache?

Before learning Spring Cache, I often used the cache in a hard-coded way.

Let's take a practical example. To speed up user lookups, we cache user information. The sample code is as follows:

    @Autowired
    private UserMapper userMapper;

    @Autowired
    private RedisCache redisCache;

    // Query a user by id
    public User getUserById(Long userId) {
        // Define the cache key
        String cacheKey = "userId_" + userId;
        // Query the Redis cache first
        User user = redisCache.get(cacheKey);
        // Cache hit: return directly without querying the database
        if (user != null) {
            return user;
        }
        // Cache miss: query the database
        user = userMapper.getUserById(userId);

        // Store the result in the cache so that the next query is served from it
        // (assumes the project's RedisCache wrapper exposes set(key, value))
        if (user != null) {
            redisCache.set(cacheKey, user);
        }

        return user;
    }

I believe many readers have written code in this style. It follows procedural thinking and is very easy to understand, but it has some disadvantages:

The code is not elegant. Business logic has four typical actions: "store, read, modify, delete". Each one has to define the cache key and call the cache API, which produces a lot of repetitive code.

The cache operations are tightly coupled to the business logic and are highly intrusive. The intrusion shows up mainly in the following two ways:

1. During joint debugging in development, the cache sometimes needs to be turned off. The only option is to comment out or temporarily delete the cache code, which is error-prone.
2. In some scenarios the cache component needs to be replaced. Each cache component has its own API, so the replacement cost is quite high.

Wouldn't it be more elegant if it looked like the following?

@Mapper
public interface UserMapper  {
    
    /**
     * Get user information based on user id
     *
     * If the cache holds the data, it is returned directly; otherwise the database is queried and the result is put into the cache. The cache key is the prefix cache_user_id_ plus the incoming user ID. The cache name userCache is illustrative: Spring requires at least one cache name per cache operation
     */
    @Cacheable(value = "userCache", key = "'cache_user_id_' + #userId")
    User getUserById(Long userId);
}

Now look at the implementation class:

    @Autowired
    private UserMapper userMapper;

    //Query users
    public User getUserById(Long userId) {
        return userMapper.getUserById(userId);
    }

In this way, the business code is completely decoupled from the cache. If the cache needs to be turned off during joint debugging, just comment out the annotation. Isn't that perfect?

None of this has to be written by hand. Spring Cache already defines the relevant annotations and interfaces for us, so we can implement the functionality above easily.

2. Introduction to Spring Cache

Spring Cache is an annotation-based caching abstraction provided in the spring-context module. It defines a few standard interfaces; once these are implemented, caching is enabled simply by adding annotations to methods.

This avoids coupling cache code with business logic.

Spring Cache has only two core interfaces: Cache and CacheManager.

1. Cache interface

This interface defines the concrete cache operations, such as putting, reading, and clearing entries:

package org.springframework.cache;
import java.util.concurrent.Callable;

public interface Cache {

 // The name of the cache. In the default implementations, the CacheManager passes the cacheName in when it creates the Cache bean
 String getName();

 // Returns the underlying native cache provider, such as Ehcache
 Object getNativeCache();

 // Looks up the cached value by key. The result is a ValueWrapper so that a cached null can be told apart from a missing entry (for which null is returned); call get() on the wrapper to obtain the actual value
 ValueWrapper get(Object key);

 // Looks up the cached value by key and returns the actual value cast to the given type, that is, the return type of the cached method
 <T> T get(Object key, Class<T> type);

 // Looks up the cached value by key; on a miss, valueLoader.call() invokes the method annotated with @Cacheable and the result is cached. This method is used when the sync attribute of @Cacheable is true.
 // The implementation must therefore synchronize loading, so that concurrent misses on the same key do not all go back to the database when the cache entry is invalid
 <T> T get(Object key, Callable<T> valueLoader);

 // Puts the data returned by the @Cacheable method into the cache
 void put(Object key, Object value);

 // Puts the value into the cache only if the key is absent. Returns the existing value when the key is already present, otherwise null
 ValueWrapper putIfAbsent(Object key, Object value);

 // Removes the entry for the key
 void evict(Object key);

 // Removes all entries from the cache
 void clear();

 // Wrapper for cached values
 interface ValueWrapper {

  // Returns the actual cached object
  Object get();
 }
}
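
To make the contract of get(Object key, Callable<T> valueLoader) more concrete, here is a minimal sketch that uses only the JDK. LoadingMapCache is a hypothetical stand-in written for this article, not a Spring class. It assumes that ConcurrentHashMap.computeIfAbsent is an acceptable way to make concurrent callers for the same key run the loader only once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Cache implementation; not part of Spring.
class LoadingMapCache {
    private final ConcurrentMap<Object, Object> store = new ConcurrentHashMap<>();

    // Mirrors the idea of Cache.get(key, valueLoader): return the cached value,
    // or run the loader once and cache its result. computeIfAbsent makes other
    // callers for the same key wait while the loader runs. A null result from
    // the loader is returned but not cached in this sketch.
    @SuppressWarnings("unchecked")
    <T> T get(Object key, Callable<T> valueLoader) {
        return (T) store.computeIfAbsent(key, k -> {
            try {
                return valueLoader.call();
            } catch (Exception e) {
                throw new IllegalStateException("Value loader failed for key " + k, e);
            }
        });
    }
}

class LoadingMapCacheDemo {
    public static void main(String[] args) {
        LoadingMapCache cache = new LoadingMapCache();
        AtomicInteger dbCalls = new AtomicInteger();
        // The loader plays the role of the @Cacheable method body.
        Callable<String> loader = () -> "user-" + dbCalls.incrementAndGet();
        System.out.println(cache.get("user01", loader));
        System.out.println(cache.get("user01", loader));
        System.out.println("loader calls: " + dbCalls.get());
    }
}
```

The second lookup is served from the map, so the loader (the "database") is only consulted on the first call.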

2. CacheManager interface

It is mainly responsible for creating Cache implementation beans. Each application can isolate caches by cacheName, and each cacheName corresponds to one Cache implementation.

package org.springframework.cache;
import java.util.Collection;

public interface CacheManager {

 // Gets, or creates, the Cache implementation bean for a cacheName. An implementation needs to store the Cache beans it creates, both to avoid repeated creation and to avoid losing the cached content when an in-memory cache object (such as Caffeine) would otherwise be recreated.
 Cache getCache(String name);

 // return all cacheName
 Collection<String> getCacheNames();
}
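
The "create once, then reuse" requirement can be sketched with a small registry. This is an illustration with only the JDK; NamedCache and NamedCacheRegistry are hypothetical names, not Spring types, and a real CacheManager would return Spring Cache instances instead.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal cache type standing in for a Cache implementation.
class NamedCache {
    private final String name;
    private final Map<Object, Object> store = new ConcurrentHashMap<>();

    NamedCache(String name) { this.name = name; }

    String getName() { return name; }
    Object get(Object key) { return store.get(key); }
    void put(Object key, Object value) { store.put(key, value); }
}

// Sketch of the CacheManager idea: one cache per cacheName, created lazily
// and reused afterwards so that cached content survives repeated lookups.
class NamedCacheRegistry {
    private final ConcurrentHashMap<String, NamedCache> caches = new ConcurrentHashMap<>();

    NamedCache getCache(String name) {
        // computeIfAbsent creates the cache at most once per name
        return caches.computeIfAbsent(name, NamedCache::new);
    }

    Collection<String> getCacheNames() {
        return Collections.unmodifiableSet(caches.keySet());
    }
}
```

Because the registry hands back the same instance for the same name, data put through one lookup is visible through the next one.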

3. Commonly used annotations

@Cacheable: Mainly applied to methods that query data.

public @interface Cacheable {

    // cacheNames: the CacheManager looks up or creates the corresponding Cache implementation bean by this name
 @AliasFor("cacheNames")
 String[] value() default {};

 @AliasFor("value")
 String[] cacheNames() default {};

    // The cache key, a SpEL expression. By default the key is derived from all method parameters, which are wrapped in a SimpleKey object when there is more than one
 String key() default "";

 // Cache key generator, the default implementation is SimpleKeyGenerator
 String keyGenerator() default "";

 // Specify which CacheManager to use, if there is only one, you can not specify
 String cacheManager() default "";

 // cache resolver
 String cacheResolver() default "";

 // Caching condition, a SpEL expression. The method is cached only when it evaluates to true. It is evaluated before the method is called, so it cannot refer to the result
 String condition() default "";

 // Vetoes caching when this SpEL expression evaluates to true. It is evaluated after the method is called, so it can refer to the result
 String unless() default "";

 // Whether to synchronize the invocation of the underlying method when several threads load the same key. If false, Cache.get(key) is called; if true, Cache.get(key, Callable) is called
 boolean sync() default false;

}

@CacheEvict: Clears the cache; mainly applied to methods that delete data. Compared with @Cacheable, it has two more attributes:

public @interface CacheEvict {

 // ... For the attributes shared with @Cacheable, see the descriptions there
 // Whether to clear all cached data. When false, Cache.evict(key) is called; when true, Cache.clear() is called
 boolean allEntries() default false;

 // Whether to clear the cache before the method is called (true) or after it (false)
 boolean beforeInvocation() default false;
}

@CachePut: Puts the result into the cache; mainly used on methods that update data. For the attributes, see @Cacheable.

@Caching: Used to configure multiple cache annotations on one method.

@EnableCaching: The master switch that enables Spring Cache. It must be added to the Spring Boot startup class or a configuration class to take effect.

3. Issues to consider when using a two-level cache

We know that the data of a relational database (MySQL) ultimately lives on disk. If every read goes to the database, disk IO limits the read speed, which is why in-memory caches such as Redis exist.

An in-memory cache does improve query speed greatly. But if the same query is issued with very high concurrency, the frequent round trips to Redis also consume noticeable network IO. For such very frequently queried data (hotspot keys), we can consider storing it in an in-app cache such as Caffeine.

When the in-app cache holds the requested data, it is used directly instead of being fetched from Redis over the network. This forms a two-level cache.

The in-app cache is called the first-level cache, and the remote cache (such as Redis) is called the second-level cache.

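The whole process is as follows: a query first checks the first-level cache; on a miss it checks the second-level cache; if that also misses, it queries the database and writes the result back to both caches.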
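
The lookup order can be sketched in a few lines. This is only an illustration under simplifying assumptions: both levels are plain in-memory maps (l1 stands in for an in-app cache such as Caffeine, l2 for a remote cache such as Redis), and TwoLevelLookup is a hypothetical class name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a two-level lookup; both levels are plain maps here.
class TwoLevelLookup<K, V> {
    final Map<K, V> l1 = new ConcurrentHashMap<>(); // stands in for the in-app cache
    final Map<K, V> l2 = new ConcurrentHashMap<>(); // stands in for the remote cache
    private final Function<K, V> database;

    TwoLevelLookup(Function<K, V> database) {
        this.database = database;
    }

    V get(K key) {
        V value = l1.get(key);               // 1. in-app cache, no network hop
        if (value != null) {
            return value;
        }
        value = l2.get(key);                 // 2. remote cache
        if (value == null) {
            value = database.apply(key);     // 3. source of truth
            if (value == null) {
                return null;                 // null handling is a separate decision
            }
            l2.put(key, value);
        }
        l1.put(key, value);                  // back-fill the first level
        return value;
    }
}
```

After the first call for a key, both levels hold the value; if only the first level loses it, the next call is served from the second level without touching the database.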

The process looks very clean, but in practice a two-level cache still has many points to consider.

1. How to keep the first-level caches of distributed nodes consistent?

The first-level cache lives inside the application. When the project is deployed on multiple nodes, how do you make sure that, when a key is modified or deleted on one node, the first-level caches of the other nodes stay consistent?

2. Are null values allowed?

This does need to be considered. If a queried key exists neither in the cache nor in the database, every such query goes through to the database, and frequent queries of this kind can bring the database down. This is what we usually call cache penetration.

But if null values are stored, a large number of them may accumulate and bloat the cache. It is therefore best to make this configurable and decide per business case whether to enable it.
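
A minimal sketch of the trade-off, using only the JDK: "not found" results are stored as an empty Optional so that repeated lookups for a missing key stop reaching the database. NullAwareCache is a hypothetical class, and its allowNullValues flag only mirrors the idea of making the behaviour configurable.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: optionally cache "not found" results to guard against cache penetration.
class NullAwareCache<K, V> {
    private final Map<K, Optional<V>> store = new ConcurrentHashMap<>();
    private final Function<K, V> database;
    private final boolean allowNullValues;

    NullAwareCache(Function<K, V> database, boolean allowNullValues) {
        this.database = database;
        this.allowNullValues = allowNullValues;
    }

    V get(K key) {
        Optional<V> cached = store.get(key);
        if (cached != null) {
            return cached.orElse(null);      // hit, possibly a cached "not found"
        }
        V loaded = database.apply(key);
        if (loaded != null || allowNullValues) {
            store.put(key, Optional.ofNullable(loaded));
        }
        return loaded;
    }

    int size() {
        return store.size();
    }
}
```

With the flag on, a missing key costs one database query and one cache entry; with it off, every lookup for that key queries the database again.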

3. Does the cache need to be warmed up?

Some keys are known to be hot from the very beginning (hot data). Can we load them into the cache in advance to avoid cache breakdown?

4. What is the upper limit of the first-level cache?

Since the first-level cache lives inside the application, consider capping the amount of data it may hold, to avoid an OOM caused by caching too much.

5. What is the eviction strategy of the first-level cache?

Redis, the second-level cache, manages memory through eviction policies (see the 8 eviction policies of Redis). What about the first-level cache? Suppose you cap it at 5000 entries: what happens when the 5001st entry arrives? Is it simply not stored, or is older data evicted by a custom LRU or LFU algorithm?

6. How are expired first-level cache entries cleared?

Redis, the second-level cache, has its own expiration strategies (timed, periodic, and lazy deletion). What about the first-level cache: how are entries cleared when they expire?
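
To show what doing this by hand on top of a Map involves, here is a sketch of lazy expiration only: an entry is checked, and removed if stale, when it is read. LazyExpiringCache is a hypothetical class. The clock is passed in so the behaviour is easy to test; System::currentTimeMillis would be the usual choice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Sketch of lazy expiration on top of a plain map.
class LazyExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    LazyExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            store.remove(key, entry);        // lazy cleanup, only on access
            return null;
        }
        return entry.value;
    }

    int size() {
        return store.size();
    }
}
```

The weakness of a purely lazy scheme is visible in size(): an expired entry that nobody reads again stays in memory, which is why a periodic sweep or a size cap is needed as well.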

Points 4, 5, and 6 are clearly laborious to implement with a traditional Map, but there is now a better first-level cache library: Caffeine.

4. Introduction to Caffeine

Caffeine is a high-performance caching library for Java.

A fundamental difference between a cache and a Map is that a cache evicts stored items.

1. Cache population strategies

Caffeine has three population strategies: manual, synchronous loading, and asynchronous loading.

2. Eviction strategies for cached values

Caffeine has three eviction strategies for cached values: size-based, time-based, and reference-based.

Size-based: Eviction occurs when the cache size exceeds the configured limit.

Time-based:

1. Expire after write: an entry expires a fixed time after it was last written.
2. Expire after access: an entry expires a fixed time after it was last read or written.
3. Custom: the expiration time is computed per entry by an Expiry implementation.

Reference-based: Keys can be held through weak references and values through weak or soft references, so that the garbage collector can reclaim entries.

There are four types of references in Java: strong, soft, weak, and phantom. Caffeine can wrap values in weak or soft references.
Soft reference: An object that is only softly reachable is not reclaimed while memory is sufficient; when memory runs short, the garbage collector reclaims it.
Weak reference: Once the garbage collector finds an object that is only weakly reachable, it reclaims it, regardless of whether memory is sufficient.

3. Statistics

Caffeine can record cache usage statistics. They make it possible to monitor the cache in real time, evaluate its health and hit rate, and tune parameters later.
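
What "hit rate" means can be shown with two counters. The sketch below is a hand-rolled illustration with only the JDK, and CountingCache is a hypothetical class; in Caffeine itself, statistics are switched on with recordStats() on the builder and read through the cache's stats() method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch: count hits and misses to derive a hit rate.
class CountingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void put(K key, V value) {
        store.put(key, value);
    }

    V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits.increment();
        } else {
            misses.increment();
        }
        return value;
    }

    long hitCount() { return hits.sum(); }

    long missCount() { return misses.sum(); }

    // Fraction of lookups served from the cache; 1.0 when nothing was requested yet.
    double hitRate() {
        long total = hits.sum() + misses.sum();
        return total == 0 ? 1.0 : (double) hits.sum() / total;
    }
}
```

A persistently low hit rate is a signal that the data does not belong in the first-level cache, or that the size and expiration settings need adjusting.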

4. Efficient eviction algorithm

The job of a cache eviction algorithm is to identify, within limited resources, which data is most likely to be reused soon, and thereby raise the hit rate. Commonly used eviction algorithms include LRU, LFU, and FIFO.

FIFO: First in, first out. The data that entered first is evicted first.
LRU: Least recently used. The data that has gone unused the longest is evicted first.
LFU: Least frequently used. The data used least often within a period of time is evicted first.

The LRU (Least Recently Used) algorithm assumes that recently accessed data is more likely to be accessed again in the future.
LRU is usually implemented with a linked list. When data is added or accessed, it is moved to the head of the list. The head holds hot data and the tail holds cold data. When the cache is full, the data at the tail is evicted.
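
In Java, a small LRU cache can be sketched on top of LinkedHashMap in access order. Note the orientation: LinkedHashMap keeps the most recently used entry at the tail of its internal list and evicts from the head, the mirror image of the description above, but the policy is the same. LruCache is a hypothetical class name, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU sketch built on LinkedHashMap; not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, not by insertion
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true removes
    // the eldest entry, which in access order is the least recently used one.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting a and b, reading a, and then putting c evicts b, because a was touched more recently.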

The LFU (Least Frequently Used) algorithm evicts data based on historical access frequency. Its core idea is that "if data has been accessed many times in the past, it will be accessed more frequently in the future." Implementing LFU requires extra storage for the access count of every element, which costs additional memory.
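
The extra bookkeeping is easy to see in a naive sketch. The hypothetical LfuCache below keeps a second map with one counter per key and scans it on eviction, so eviction is O(n). It is meant only to illustrate the idea and is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Naive LFU sketch: a second map holds the access count of every key.
class LfuCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    LfuCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        V value = values.get(key);
        if (value != null) {
            counts.merge(key, 1, Integer::sum);   // record the access
        }
        return value;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= maxEntries) {
            evictLeastFrequentlyUsed();
        }
        values.put(key, value);
        counts.merge(key, 1, Integer::sum);
    }

    boolean containsKey(K key) {
        return values.containsKey(key);
    }

    // Linear scan for the key with the smallest access count.
    private void evictLeastFrequentlyUsed() {
        K victim = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<K, Integer> e : counts.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }
        values.remove(victim);
        counts.remove(victim);
    }
}
```

Besides the memory for the counters, plain LFU also adapts slowly: a key that was hot long ago keeps a high count and is hard to evict.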

Caffeine uses W-TinyLFU, an algorithm that combines the advantages of LRU and LFU and features a high hit rate and low memory usage.

5. Other notes

Caffeine stores its data in a ConcurrentHashMap. Caffeine targets JDK 8, and since JDK 8 ConcurrentHashMap converts long collision chains into red-black trees, so read performance stays good even under severe hash collisions.

5. Implementing a two-level cache (Caffeine + Redis) based on Spring Cache

As mentioned earlier, using the Redis cache also costs some network transfer, which is why an in-app cache is considered. But one thing is very important to remember:

The in-app cache should be regarded as a scarcer resource than the Redis cache. Caffeine is therefore not suitable for business scenarios with a large amount of data and a very low cache hit rate, such as per-user caches.

The project runs the application on multiple nodes, and the first-level cache lives inside each application instance, so when data is updated or removed, all nodes must be notified to clear their caches.

There are many ways to achieve this, such as ZooKeeper or MQ. But since Redis is already in use and supports publish/subscribe itself, no other component is needed: a Redis channel is used directly to notify the other nodes to clean up their caches.

When a key is updated or deleted, publishing a message is enough to make the other nodes delete their local first-level copy of that key.
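
The notification flow can be simulated inside one JVM. In the sketch below, InvalidationChannel is a hypothetical in-process stand-in for a Redis pub/sub channel, and a shared map plays the role of Redis; the real starter would publish to and subscribe from an actual Redis topic instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-process stand-in for a Redis pub/sub channel (for illustration only).
class InvalidationChannel {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    void publish(String key) {
        subscribers.forEach(s -> s.accept(key));
    }
}

// One application node: a local first-level cache plus a shared second level.
class CacheNode {
    final Map<String, String> l1 = new ConcurrentHashMap<>();
    private final Map<String, String> sharedL2;
    private final InvalidationChannel channel;

    CacheNode(Map<String, String> sharedL2, InvalidationChannel channel) {
        this.sharedL2 = sharedL2;
        this.channel = channel;
        // Drop the local copy whenever any node announces a change to the key.
        channel.subscribe(key -> l1.remove(key));
    }

    String get(String key) {
        // Serve from the local cache, falling back to the shared second level.
        return l1.computeIfAbsent(key, sharedL2::get);
    }

    void update(String key, String value) {
        sharedL2.put(key, value);
        channel.publish(key); // every node, including this one, evicts its local copy
    }
}
```

Only the key travels over the channel, not the value: each node simply discards its local copy and reloads it from the second level on the next read.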

The project code itself is not pasted here; only how to use the starter package is shown.

1. Add the Maven dependency

    <dependency>
        <groupId>com.jincou</groupId>
        <artifactId>redis-caffeine-cache-starter</artifactId>
        <version>1.0.0</version>
    </dependency>

2. application.yml

Add the two-level cache configuration

# Two-level cache configuration
# Note: Caffeine is not suitable for business scenarios with a large amount of data and a very low cache hit rate, such as per-user caches. Please choose carefully.
l2cache:
  config:
    # Whether to store empty values, the default is true, to prevent cache penetration
    allowNullValues: true
    # Combined cache configuration
    composite:
      # Whether to enable all first-level caches, the default is false
      l1AllOpen: false
      # Whether to manually enable the first-level cache, the default is false
      l1Manual: true
      # Manually configure the cache key set for the first-level cache, for a single key dimension
      l1ManualKeySet:
      - userCache:user01
      - userCache:user02
      - userCache:user03
      # Manually configure the set of cache names for the first-level cache, for the cacheName dimension
      l1ManualCacheNameSet:
      - userCache
      - goodsCache
    # L1 cache
    caffeine:
      # Whether to automatically refresh expired cache entries (true or false)
      autoRefreshExpireCache: false
      # The size of the thread pool that schedules cache refreshes
      refreshPoolSize: 2
      # Cache refresh frequency (seconds)
      refreshPeriod: 10
      # Expiration time after writing (seconds)
      expireAfterWrite: 180
      # Expiration time after access (seconds)
      expireAfterAccess: 180
      # initial size
      initialCapacity: 1000
      # The maximum number of cached objects; when it is exceeded, previously stored entries are evicted
      maximumSize: 3000

    # L2 cache
    redis:
      # The global expiration time in milliseconds; by default entries do not expire
      defaultExpiration: 300000
      # The expiration time of each cacheName, in milliseconds, with a higher priority than defaultExpiration
      expires: {userCache: 300000,goodsCache: 50000}
      # The topic name that notifies other nodes when the cache is updated. Default cache:redis:caffeine:topic
      topic: cache:redis:caffeine:topic

3. Add @EnableCaching to the startup class

/**
 *  startup class
 */
@EnableCaching
@SpringBootApplication
public class CacheApplication {

 public static void main(String[] args) {
  SpringApplication.run(CacheApplication.class, args);
 }

}

4. Add the @Cacheable annotation to the method that needs to be cached

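import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

/**
 *  Test service (UserDTO is the project's own DTO class)
 */
@Service
public class CaffeineCacheService {

    private final Logger logger = LoggerFactory.getLogger(CaffeineCacheService.class);

    /**
     * Simulates the database
     */
    private static final Map<String, UserDTO> userMap = new HashMap<>();

    // Static initializer: fill the shared map once, not once per instance
    static {
        userMap.put("user01", new UserDTO("1", "Zhang San"));
        userMap.put("user02", new UserDTO("2", "Li Si"));
        userMap.put("user03", new UserDTO("3", "Wang Wu"));
        userMap.put("user04", new UserDTO("4", "Zhao Liu"));
    }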

    /**
     * Get the cached item, or load it on a miss
     */
    @Cacheable(key = "'cache_user_id_' + #userId", value = "userCache")
    public UserDTO queryUser(String userId) {
        UserDTO userDTO = userMap.get(userId);
        try {
            Thread.sleep(1000); // Simulate the time it takes to load data
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Restore the interrupt flag
        }
        logger.info("Loaded data: {}", userDTO);
        return userDTO;
    }


    /**
     * Get the cached item, or load it on a miss
     * <p>
     * Note: the first-level cache is built on Caffeine, so Caffeine's own synchronization mechanism is used.
     * sync=true means that cache items are loaded synchronously under concurrency.
     * With sync=true, the cache item is obtained or loaded through get(Object key, Callable<T> valueLoader). The valueLoader (the logic that loads the cache item) is retained, so when CaffeineCache periodically refreshes expired entries, the expired item is reloaded.
     * With sync=false, the cache item is obtained through get(Object key). Since there is no valueLoader, when CaffeineCache periodically refreshes expired entries, the expired item is evicted.
     */
    @Cacheable(value = "userCache", key = "#userId", sync = true)
    public List<UserDTO> queryUserSyncList(String userId) {
        UserDTO userDTO = userMap.get(userId);
        List<UserDTO> list = new ArrayList<>();
        list.add(userDTO);
        logger.info("Loaded data: {}", list);
        return list;
    }

    /**
     * Refresh the cache entry
     */
    @CachePut(value = "userCache", key = "#userId")
    public UserDTO putUser(String userId, UserDTO userDTO) {
        return userDTO;
    }

    /**
     * Evict the cache entry
     */
    @CacheEvict(value = "userCache", key = "#userId")
    public String evictUserSync(String userId) {
        return userId;
    }
}

Project source code: https://github.com/yudiandemingzi/springboot-redis-caffeine-cache


Posted by Who27 on Sat, 04 Feb 2023 09:38:11 +0530