Redis persistence principles and cache problem solutions

Redis persistence

When Redis starts, it loads the persistence file (except on the very first startup, when no such file exists yet). While it runs, data is written into memory, and at certain points Redis writes the in-memory data to disk, generating a persistence file.

1.RDB persistence principle

The principle: Redis forks a child process identical to the current process, and this child performs the persistence. All of the child's data (variables, environment variables, program counter, etc.) is identical to the parent's. The data is first written to a temporary file, and once persistence completes, the temporary file replaces the previous persistence file. Throughout the whole procedure the main process performs no disk I/O, which ensures high performance.

  • Where is this persistence file?
  • dir ./ (by default it is relative to the startup location; it is better to configure an explicit path)
  • A dump.rdb file is generated
  • When is the child process forked, i.e. when does the RDB persistence mechanism trigger?
  • 1. On shutdown, if AOF is not enabled.
  • 2. The default snapshot settings in the configuration file:
  • save 900 1 (within 900 seconds, at least 1 change triggers a snapshot)
  • save 300 100 (within 300 seconds, at least 100 changes)
  • save 60 10000 (within 60 seconds, at least 10000 changes; optimization: delete the 300 and 60 rules. RDB cannot be disabled in a cluster environment)
  • 3. Executing save (runs in the main process and blocks it) or bgsave (forks a child process and takes the snapshot asynchronously in the background); see the sketch after this list.
  • 4. Executing flushall clears the in-memory data and triggers a snapshot; since the dataset is now empty, the snapshot is empty and meaningless (it overwrites the data on disk, so that data is lost).
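
To see trigger 3 in action, here is a minimal sketch (assuming the Jedis client; the host, port and polling loop are illustrative) that requests a background snapshot and waits for it to complete:

    import redis.clients.jedis.Jedis;

    public class RdbDemo {
        public static void main(String[] args) throws InterruptedException {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                long before = jedis.lastsave();   // unix timestamp of the last completed save
                jedis.bgsave();                   // redis forks a child process and snapshots in the background
                while (jedis.lastsave() == before) {
                    Thread.sleep(100);            // poll until the background snapshot finishes
                }
                System.out.println("dump.rdb updated, lastsave = " + jedis.lastsave());
            }
        }
    }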

Disadvantages of RDB:
1. An unexpected crash can lose the data written since the last snapshot, because the RDB persistence mechanism has no chance to trigger.

2.AOF persistence principle

The principle is to append every write operation of Redis to a log file; read operations are not recorded.

  • Where is the persistence file?
  • appendonly yes enables AOF
  • dir ./ (by default it is relative to the startup location; it is better to configure an explicit path)
  • An appendonly.aof file is generated
  • Trigger mechanism (according to the configuration items in the configuration file)
  • appendfsync no: the operating system decides when to sync data to disk (batched; fast, but persistence is not guaranteed)
  • appendfsync always: synchronous persistence; every data change is immediately written to disk (slow but safe)
  • appendfsync everysec: sync once per second (the default; fast, but up to one second of data may be lost)
  • Data flow: main process -> buffer -> AOF file (disk)
  • Data first enters the buffer from the main process and is then written to the AOF file:
  • no: data enters the buffer and waits to be written to the AOF file in batches
  • always: data enters the buffer and is written to the AOF file immediately
  • everysec: data enters the buffer and is flushed to the AOF file once per second
  • AOF rewriting mechanism (solves the problem of the AOF file growing indefinitely)
  • Rewriting forks a child process, which generates an AOF snapshot from the current in-memory data.
  • Redis 4.0 enables mixed persistence, a further optimization of rewriting:
  • aof-use-rdb-preamble yes (whether the rewritten AOF file starts with an RDB preamble)
  • Manual trigger: the bgrewriteaof command (saves the data in RDB format into the AOF file); see the sketch after this list

  • Automatic rewrite trigger:

    Growth rate (100%):

    auto-aof-rewrite-percentage 100

    When the AOF file grows past a certain size, Redis calls bgrewriteaof to rewrite the log file.

    auto-aof-rewrite-min-size 64mb (optimization point: this must be raised in production, generally to 5GB or above)

    For example:
    first trigger: 64MB; second trigger: 64MB + 64MB * 100% = 128MB
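
As a minimal sketch (assuming the Jedis client; host and port are illustrative), the manual rewrite command described above can also be triggered from code:

    import redis.clients.jedis.Jedis;

    public class AofRewriteDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                // Equivalent to running BGREWRITEAOF in redis-cli: redis forks a child
                // process and compacts the AOF file in the background
                System.out.println(jedis.bgrewriteaof());
            }
        }
    }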

Summary

| type | forks a child process | shortcoming | advantage |
| --- | --- | --- | --- |
| rdb | yes | the data written since the last snapshot may be lost | loading the persisted data from disk at startup is very fast |
| aof | no | loading the persisted data from disk at startup is slower than rdb | loses at most about 2 seconds of data |
| aof rewrite | yes | | |

Redis cache problems

1. Cache penetration

Cache penetration refers to querying data that exists neither in the cache nor in the database.

Solutions:

  • 1. Cache empty objects (simple code, limited effect)
  • Disadvantages:
  • 1. It only guards repeated queries for the same key; it cannot guard against a large number of different keys.
  • 2. Redis ends up holding a large number of empty entries, which wastes Redis memory.

    // Get from cache
    obj = redis.getKey(id);
    if (obj != null) {
        // (cache empty objects) a cached empty object means "no such record"
        if (obj instanceof EmptyObject) {
            return "No data found";
        }
        return "query was successful" + obj;
    }

    // Get from database
    obj = dao.select(id);
    if (obj != null) {
        redis.setKey(id, obj);
        return "query was successful" + obj;
    } else {
        // (cache empty objects) cache a placeholder so later misses for this id hit the cache
        redis.setKey(id, new EmptyObject());
    }

    return "No data found";
  • 2. Bloom filter (more complex code, better effect)

    // Check the bloom filter first: if it says the id does not exist, it definitely does not
    if (!bloomFilter.mightContain(id)) {
        return "No data found";
    }

    // Get from cache
    obj = redis.getKey(id);
    if (obj != null) {
        return "query was successful" + obj;
    }

    // Get from database
    obj = dao.select(id);
    if (obj != null) {
        // Add to the bloom filter
        bloomFilter.put(id);
        return "query was successful" + obj;
    }

    return "No data found";
  • Shortcomings
  • 1. Maintaining a Bloom filter is troublesome: elements cannot be deleted; the only option is to build a new filter from scratch.
  • 2. The Bloom filter must be initialized with the existing keys first; otherwise even data that exists will be reported as absent on the first query.
  • Principle

A Bloom filter is backed by a bit array. Each key is run through several hash functions, and the resulting positions in the bit array are set to 1. A false positive occurs when a key that was never added happens to hash to positions that are all already set by other keys, so the filter wrongly judges it as present.

  • Distributed bloom filter

    <!-- guava dependency -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>27.0.1-jre</version>
    </dependency>

    private static int size = 1000000;

    /**
     * size: how much data is expected to be inserted
     * fpp: fault tolerance rate -> the probability of a false positive
     * A bit array in the JVM holds at most about 2.1 billion bits (~256MB) and is not persisted;
     * a Redis bitmap holds about 4.2 billion bits (~512MB) and is persisted by Redis.
     */
    private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, 0.001);

    public static void main(String[] args) {
        for (int i = 1; i <= size; i++) {
            bloomFilter.put(i);
        }
        List<Integer> list = new ArrayList<>(10000);
        // Deliberately test 10000 values that are not in the filter, and see how many it reports as present
        for (int i = size + 10000; i < size + 20000; i++) {
            if (bloomFilter.mightContain(i)) {
                // Misjudgment: the value clearly does not exist but is judged to exist
                list.add(i);
            }
        }
        System.out.println("Number of misjudgments: " + list.size());
    }

2. Cache breakdown

There is data in the database but none in the cache (either the data has never been accessed, or its cache entry has just expired). Concurrent access then sends every query to the database.

Cache breakdown means the cache has no entry while the database does (generally because the cached entry expired). With many concurrent users, requests miss the cache at the same moment and all fetch from the database at once, so the database pressure spikes instantly.

As we know, when the cache misses we fall back to the database. For a hot key the traffic is very high, so while the cache is being rebuilt many threads rebuild it at the same time; under high concurrency the hot key keeps being rebuilt before any rebuild completes, and the server slows down because so many threads are busy rebuilding the cache.

Solutions:

  • 1. Mutex lock

The first request to miss the cache acquires a lock, queries the database, and rebuilds the cache. A request arriving in the meantime finds the lock and waits; once the rebuild completes and the lock is released, it fetches again and hits the cache.

public String getKey(String key) {
    String value = redis.get(key);
    if (value == null) {
        String mutexKey = "mutex:key:" + key; // Key of the mutex
        // nx: set only if the key does not exist, so only one thread acquires the lock
        // ex 180: the lock expires after 180 seconds, so a crashed holder cannot block others forever
        if (redis.set(mutexKey, "1", "ex 180", "nx")) {
            value = db.get(key);
            redis.set(key, value);
            redis.delete(mutexKey);
        } else {
            // Other threads sleep 100 milliseconds and retry
            Thread.sleep(100);
            return getKey(key);
        }
    }
    return value;
}

Mutex locks are simple and preserve consistency. However, they also have problems: a large number of threads may be left waiting, and there is a risk of deadlock.

  • 2. Distributed lock

The Redis distributed lock provided by the Redisson framework can be used, as in the sketch below.
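
A minimal sketch (assuming the Redisson framework; the address, lock name and rebuild step are illustrative) of rebuilding a cache entry under a Redisson distributed lock:

    import org.redisson.Redisson;
    import org.redisson.api.RLock;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class RedissonLockDemo {
        public static void main(String[] args) {
            Config config = new Config();
            config.useSingleServer().setAddress("redis://127.0.0.1:6379");
            RedissonClient redisson = Redisson.create(config);

            // One lock per cache key, so rebuilds of different keys do not block each other
            RLock lock = redisson.getLock("mutex:key:hotKey");
            lock.lock(); // blocks until acquired; Redisson keeps renewing it while held
            try {
                // query the database and rebuild the cache entry here
            } finally {
                lock.unlock();
            }
            redisson.shutdown();
        }
    }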

3. Cache avalanche problem

Cache avalanche means that a cache machine goes down, or that the cache was set with the same expiration time everywhere, so a large portion of the cache fails at the same moment; all requests are then forwarded to the DB, which comes under excessive pressure.

Solutions:

  • 1. After the cache expires, control the number of threads that read the database and rewrite the cache by locking or queueing. For example, for a given key, allow only one thread to query the data and write the cache while the other threads wait.
  • 2. Use a second-level cache: A1 is the original cache and A2 the copy. When A1 has expired, A2 can still be accessed; A1's expiration time is set short and A2's long.
  • 3. Set different expiration times for different keys (see the sketch after this list), so that cache expirations are spread as evenly as possible.
  • 4. If the cache database is distributed, spread the hotspot data evenly across the different cache databases.
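
A minimal sketch of option 3 (assuming the Jedis client; the base TTL and jitter window are illustrative): add a random offset to each key's expiration so that keys written together do not all expire together.

    import java.util.concurrent.ThreadLocalRandom;
    import redis.clients.jedis.Jedis;

    public class TtlJitterDemo {
        public static void setWithJitter(Jedis jedis, String key, String value) {
            int baseTtlSeconds = 3600;
            // a random 0-600 second offset spreads expirations over a 10 minute window
            int jitter = ThreadLocalRandom.current().nextInt(0, 600);
            jedis.setex(key, baseTtlSeconds + jitter, value);
        }
    }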
4. Consistency between cache and database data

5. Cache granularity control

Generally speaking, cache granularity is the question of whether to cache the full data or only part of it when using a cache.

| data cached | generality | space occupied (memory + network bandwidth) | code maintenance |
| --- | --- | --- | --- |
| all data | high | large | simple |
| partial data | low | small | more complicated |

Cache granularity is an easily overlooked problem. Used improperly, it can waste memory, waste network bandwidth, and make the code less general. You must learn to balance the three evaluation factors: data generality, space occupied, and code maintainability.
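
As an illustration (assuming the Jedis client; the user key and field names are hypothetical), caching all data versus caching only the fields the hot path reads:

    import java.util.HashMap;
    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class GranularityDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                // All data: cache the whole serialized object -- highly general, but large
                jedis.set("user:1001", "{\"id\":1001,\"name\":\"taibai\",\"bio\":\"...\",\"address\":\"...\"}");

                // Partial data: cache only the hot fields -- small, but less general
                Map<String, String> hotFields = new HashMap<>();
                hotFields.put("id", "1001");
                hotFields.put("name", "taibai");
                jedis.hset("user:1001:hot", hotFields);
            }
        }
    }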

Redis cluster construction

1. Install redis under / opt/myredis/redis

2. Create new redis7000, redis7001, redis7002, redis7003, redis7004, and redis7005 folders

3. Copy the redis.conf file to redis7000

4. Modify the redis.conf file under the redis7000 directory

vi redis7000/redis.conf

1.bind 127.0.0.1 (specify the machine's own ip, i.e. the intranet ip)

2.port 7000

3.daemonize yes

4.pidfile /opt/myredis/redis7000/redis_7000.pid

5.logfile /opt/myredis/redis7000/redis_7000.log

6.dir /opt/myredis/redis7000

7.requirepass 123456

(Cluster environment parameters)
8.cluster-enabled yes (enable cluster mode)

9.cluster-config-file nodes-7000.conf (the number here is best kept matching the port)

10.cluster-node-timeout 15000

11.appendonly yes

5. Copy the redis7000/redis.conf file to redis7001, redis7002, redis7003, redis7004 and redis7005

Copy redis7000/redis.conf to redis7001 while batch-replacing 7000 with 7001, and likewise for the others:

sed 's/7000/7001/g' redis7000/redis.conf > redis7001/redis.conf 

sed 's/7000/7002/g' redis7000/redis.conf > redis7002/redis.conf 

sed 's/7000/7003/g' redis7000/redis.conf > redis7003/redis.conf 

sed 's/7000/7004/g' redis7000/redis.conf > redis7004/redis.conf 

sed 's/7000/7005/g' redis7000/redis.conf > redis7005/redis.conf 

6. Start the 7000, 7001, 7002, 7003, 7004 and 7005 instances respectively

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7000/redis.conf 

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7001/redis.conf

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7002/redis.conf 

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7003/redis.conf 

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7004/redis.conf 

/opt/myredis/redis/bin/redis-server /opt/myredis/redis7005/redis.conf  

#Check whether the startup is successful 
ps -ef | grep redis

7. Build the cluster: allocate masters and slaves and assign slots

/opt/myredis/redis/bin/redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Six addresses are listed: the first three become master nodes and the last three become slaves.
--cluster-replicas 1 means a master-to-slave ratio of 1:1 (3 masters : 3 slaves).
A value of 2 would mean a ratio of 2:4 (2 masters : 4 slaves), but a cluster must have at least 3 master nodes, so that reports an error.

The 16384 hash slots are distributed evenly across the master nodes.

8. Verify the cluster

1.Connect with any client (-c tells redis-cli to compute the slot automatically and redirect to the corresponding redis node):
/opt/myredis/redis/bin/redis-cli -c -h 127.0.0.1 -p 7000  (-c indicates cluster mode; -h and -p specify the ip address and port)

2.Verify:
cluster info (view cluster information); cluster nodes (view the node list)

3.Perform data operation verification
set key llsydn

4.To shut down the cluster, the nodes must be stopped one by one, using the command:
/opt/myredis/redis/bin/redis-cli -c -h 127.0.0.1 -p 700* shutdown 
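
To access the cluster from Java, here is a minimal sketch (assuming the Jedis client; the address, timeouts and password follow the configuration above). Like redis-cli -c, JedisCluster computes the slot for each key and follows redirections automatically:

    import java.util.Collections;
    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.JedisCluster;
    import redis.clients.jedis.JedisPoolConfig;

    public class ClusterDemo {
        public static void main(String[] args) {
            // connection timeout, socket timeout, max attempts, password, pool config
            JedisCluster cluster = new JedisCluster(
                    Collections.singleton(new HostAndPort("127.0.0.1", 7000)),
                    2000, 2000, 5, "123456", new JedisPoolConfig());
            cluster.set("key", "llsydn");           // routed to the node that owns the slot
            System.out.println(cluster.get("key"));
            cluster.close();
        }
    }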

Redis content extensions

1.Pipeline

Note: operations issued through a Pipeline are not atomic; the pipeline only batches commands to save network round trips.
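
A minimal sketch (assuming the Jedis client; key names are illustrative): commands are buffered locally and sent in one round trip when sync() is called, and the Response handles become readable only after that:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Pipeline;
    import redis.clients.jedis.Response;

    public class PipelineDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                Pipeline pipeline = jedis.pipelined();
                Response<String> setReply = pipeline.set("k1", "v1");
                Response<String> getReply = pipeline.get("k1");
                pipeline.sync(); // one network round trip for both commands
                System.out.println(setReply.get() + " / " + getReply.get());
            }
        }
    }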

2.GEO

GEOADD locations 116.419217 39.921133 beijin

GEOPOS locations beijin

GEODIST locations member1 member2 km (calculate the distance between two members; member1/member2 stand for two previously added member names)

GEORADIUSBYMEMBER locations beijin 150 km (find the members within the given distance of a member)

Note: there is no dedicated delete command; a GEO key is essentially a zset (type locations returns zset),

so zrem key member can be used to delete an element.

ZRANGE key 0 -1 returns all members of the specified set.

3.hyperLogLog

Redis added the HyperLogLog structure in version 2.8.9.

Redis HyperLogLog is an algorithm for cardinality statistics. Its advantage is that even when the number or volume of input elements is very large, the space needed to compute the cardinality stays fixed and small.

In Redis, each HyperLogLog key needs only 12 KB of memory to count the cardinality of nearly 2^64 different elements. This contrasts sharply with a set, which consumes more memory the more elements it holds when computing cardinality.

PFADD 2017_03_06:taibai 'yes' 'yes' 'yes' 'yes' 'no'

PFCOUNT 2017_03_06:taibai (count how many distinct values taibai has)

1.PFADD 2017_09_08:taibai uuid9 uuid10 uu11

2.PFMERGE 2016_03_06:taibai 2017_09_08:taibai (merge into the destination key)

Note: the underlying type is still a string, and there is an error rate; the official figure is 0.81%.
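
The same idea from Java, as a minimal sketch (assuming the Jedis client; key and element names are illustrative):

    import redis.clients.jedis.Jedis;

    public class HllDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                jedis.pfadd("2017_03_06:taibai", "yes", "yes", "yes", "yes", "no");
                // duplicates are ignored, so the estimated cardinality is 2 ("yes" and "no")
                System.out.println(jedis.pfcount("2017_03_06:taibai"));
            }
        }
    }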

4.bitmaps

setbit taibai 5000 0 (set bit 5000 of the key taibai to 0)

getbit taibai 5000 (get bit 5000 of the key taibai)

bitcount taibai (count the number of bits set to 1)

A bitmap is essentially a string: a sequence of consecutive binary digits (0 or 1), where each bit's position is its offset.
The maximum length of a string (and thus a bitmap) is 512 MB, so it can represent 2^32 = 4294967296 different bits.
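
A common use of bitmaps is sign-in tracking with one bit per user id; a minimal sketch (assuming the Jedis client; the key name and user ids are illustrative):

    import redis.clients.jedis.Jedis;

    public class BitmapDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                // Mark users 7 and 5000 as signed in on this day: one bit per user id
                jedis.setbit("signin:2022-08-27", 7, true);
                jedis.setbit("signin:2022-08-27", 5000, true);
                System.out.println(jedis.getbit("signin:2022-08-27", 5000)); // true
                System.out.println(jedis.bitcount("signin:2022-08-27"));     // 2 users signed in
            }
        }
    }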
