Preface
If you want to handle high concurrency well, you inevitably have to get the underlying cache right. Take an e-commerce product-details page under truly high concurrency, with hundreds of thousands or even millions of requests per second (QPS): redis alone is not enough, but redis is a very important link in the overall large cache architecture that supports such high concurrency. First, your underlying caching middleware and caching system must itself be able to support what we call high concurrency. Second, on top of that, the overall cache architecture needs to be designed well (for example, a multi-level cache architecture and hotspot caching).
Where is the bottleneck that prevents redis from supporting high concurrency?
A single machine (stand-alone redis)
What should redis do if it wants to support 100,000+ concurrent requests per second?
It is almost impossible for a single redis instance to reach 100,000+ QPS, except in special circumstances: the machine's performance is particularly good, the configuration particularly high, the physical machine and its maintenance particularly good, and your operations are not too complex.
Read/write separation. Generally speaking, the cache is used to support high read concurrency; there are relatively few write requests, perhaps one or two thousand per second, while the vast majority of requests are reads, say 200,000 reads per second.
Redis replication and the significance of master persistence for the security of the master-slave architecture
[Figure: basic principle of redis replication]
The core mechanism of redis replication
- Redis replicates data to the slave nodes asynchronously. However, since redis 2.8, slave nodes periodically acknowledge how much data they have replicated
- A master node can be configured with multiple slave nodes
- A slave node can also connect to other slave nodes
- When a slave node replicates, it does not block the normal operation of the master node
- During replication, the slave node does not block its own query operations; it serves requests with the old data set. However, when replication completes, the old data set has to be deleted and the new one loaded, and at that point external service is briefly suspended
- Slave nodes are mainly used for horizontal scaling and read/write separation; the added slave nodes can improve read throughput
The significance of master persistence for the security of master-slave architecture
If you adopt the master-slave architecture, it is recommended to enable persistence on the master node. Do not rely on the slave nodes as the master's hot data backup, because in that case, if you turn off the master's persistence, the master's data set may be empty when it crashes and restarts, and after replication the slave nodes' data will be lost as well.
If both RDB and AOF are turned off on the master, all data lives only in memory. When the master goes down and restarts, there is no local data to recover from, so it considers its own data set to be empty. The master then synchronizes this empty data set to the slaves, all slave data is cleared, and 100% of the data is lost.
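As a reference, here is a minimal sketch of what enabling persistence on the master could look like in redis.conf; the save intervals shown are just the common defaults, not values prescribed by this article:

```
# RDB: write a snapshot if at least 1 key changed in 900s, 10 keys in 300s, or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000
# AOF: log every write command to the append-only file
appendonly yes
# fsync the AOF about once per second (a common durability/performance trade-off)
appendfsync everysec
```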
Core principles of master-slave architecture
- When a slave node starts, it sends a PSYNC command to the master node (replication can also be attached at runtime; see the sketch after this list)
- If the slave node is reconnecting to a master it has replicated from before, the master only copies the missing data to the slave; if the slave node is connecting to the master for the first time, a full resynchronization is triggered
- When a full resynchronization starts, the master forks a background process to generate an RDB snapshot file, and at the same time caches all write commands newly received from clients in memory. After the RDB file is generated, the master sends it to the slave; the slave first writes it to local disk and then loads it from disk into memory. The master then sends the write commands cached in memory to the slave, and the slave replays them to catch up
- If a slave node loses its connection to the master because of a network failure, it reconnects automatically. If the master finds several slave nodes reconnecting at once, it starts only one rdb save operation and serves all of those slaves with that single copy of data
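For illustration only, replication can also be attached and detached at runtime instead of via redis.conf; a small sketch using the master address from the setup later in this article (192.168.1.132:6379):

```
# On the slave: start replicating from the master (equivalent to slaveof in redis.conf)
127.0.0.1:6379> slaveof 192.168.1.132 6379
OK
# Watch the initial synchronization from the slave side
127.0.0.1:6379> info replication
# ... master_sync_in_progress:1 while the rdb is being transferred,
# ... master_link_status:up once it has been loaded
# Detach the slave and turn it back into a standalone master
127.0.0.1:6379> slaveof no one
OK
```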
Resumable master-slave replication (breakpoint continuation)
Since redis 2.8, master-slave replication supports resuming from a breakpoint: if the network connection drops in the middle of replication, it can continue from where it left off instead of copying everything again from the beginning.
The master node keeps a backlog in memory. Both the master and the slave store a replica offset and the master's run id; the offset is tracked against the backlog. If the network between master and slave is broken, the slave asks the master to continue replicating from the last replica offset. If the corresponding offset cannot be found in the backlog, a full resynchronization is performed instead.
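Both the backlog size and how long it is kept after all slaves disconnect are tunable in redis.conf; a minimal sketch (the values shown here are the defaults):

```
# Size of the replication backlog used for partial resynchronization (default 1mb)
repl-backlog-size 1mb
# Seconds to keep the backlog after the last slave disconnects; 0 means never release it
repl-backlog-ttl 3600
```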
Diskless replication
The master generates the rdb directly in memory and sends it to the slaves, without writing it to its local disk first.
```
# Whether to enable diskless replication. yes: enabled, no: disabled (default: no)
repl-diskless-sync no
# Wait a certain amount of time before starting the transfer, so that more slaves can reconnect
# and be served by the same rdb. Default is 5, unit is seconds
repl-diskless-sync-delay 5
```
Expired key processing
The slave does not expire keys itself; it only waits for the master to expire them.
When the master expires a key (or evicts one via LRU), it simulates a DEL command and sends it to the slave.
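A small redis-cli sketch of this behaviour, assuming the master/slave pair configured later in this article (192.168.1.132 as master, 192.168.1.133 as slave); the key name is made up for the example:

```
# On the master: set a key with a 5 second TTL
192.168.1.132:6379> set session:token abc EX 5
OK
# On the slave: the key and its TTL are replicated
192.168.1.133:6379> ttl session:token
(integer) 3
# Once the master expires the key, it propagates a DEL, and the key disappears on the slave too
192.168.1.133:6379> get session:token
(nil)
```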
Complete running process of redis replication
Complete process of replication
- When the slave node starts, it only saves the master node's information, including the master's host and port, but the replication process has not started yet
[Note: where do the master host and port come from? From the slaveof setting in redis.conf]
- The slave node sends a ping command to the master node
- Password authentication: if requirepass is set on the master, the slave node must send the password configured in masterauth for authentication
- The master node performs full replication for the first time and sends all data to the slave node
- Subsequent continuous write commands of the master node are asynchronously copied to the slave node
Core mechanisms related to data synchronization
It mainly refers to the full replication performed when the slave connects to the master for the first time, and some detailed mechanisms in the process
- Both master and slave maintain an offset
The master keeps accumulating an offset on its own side, and the slave keeps accumulating an offset on its own side; the slave reports its own offset to the master every second, and the master also records each slave's offset. This is not used only for full replication; the main point is that both master and slave need to know their own data offset so that they can tell how much their data differs
- backlog
The master node has a backlog, 1MB by default. When the master node replicates data to the slave nodes, it also writes a copy of that data into the backlog; the backlog is mainly used for incremental replication after a full replication is interrupted
- master run id
The master run id can be seen with info server
```
127.0.0.1:6379> info server
# Server
redis_version:3.2.8
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:ee73a8e0c4c779ea
redis_mode:standalone
os:Linux 3.10.0-957.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:7842
run_id:0f142c4912fd93a030c02237fad7bde72a3d5b66
tcp_port:6379
uptime_in_seconds:21
uptime_in_days:0
hz:10
lru_clock:9753411
executable:/usr/local/bin/redis-server
config_file:/etc/redis/6379.conf
```
Identifying the master node purely by host + port is not reliable. If the master node restarts or its data changes, the slave should distinguish it by the run id and perform a full replication when the run id differs. If you need to restart redis without changing the run id, you can use the redis-cli debug reload command.
- psync
The slave node replicates from the master using psync: it sends psync runid offset, and the master returns a response depending on its own situation. It may return FULLRESYNC runid offset to trigger full replication, or CONTINUE to trigger incremental replication (see the sketch below).
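A rough sketch of what this handshake looks like (simplified; the run id and offset here are the ones from the examples in this article):

```
# First connection: the slave has no cached master, so it sends "?" and -1
slave  -> master : PSYNC ? -1
master -> slave  : +FULLRESYNC 0f142c4912fd93a030c02237fad7bde72a3d5b66 1751
                   (followed by the rdb snapshot, then the write commands buffered during the dump)

# Reconnection after a short break: the slave sends the cached run id and its current offset
slave  -> master : PSYNC 0f142c4912fd93a030c02237fad7bde72a3d5b66 1751
master -> slave  : +CONTINUE
                   (followed only by the commands after offset 1751, taken from the backlog)
```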
Full replication
① The master executes bgsave to generate an rdb snapshot file locally
② The master node sends the rdb snapshot file to the slave node. If the rdb transfer takes more than 60 seconds (repl-timeout), the slave node considers the replication failed; you can increase this parameter appropriately (see the config sketch after this list)
③ For a machine with a gigabit network card, about 100MB is typically transferred per second, so a 6GB file can easily exceed 60s
④ While the master node is generating the rdb, it caches all new write commands in memory; after the slave node has saved the rdb, the master copies those new write commands to the slave node
⑤ client-output-buffer-limit slave 256MB 64MB 60
If, during replication, the memory output buffer stays above 64MB continuously (for 60 seconds), or exceeds 256MB at once, replication is stopped and fails
⑥ After receiving the rdb, the slave node clears its old data and loads the rdb into its own memory, while continuing to serve external requests based on the old data version
⑦ If AOF is enabled on the slave node, it immediately executes BGREWRITEAOF to rewrite the AOF
[Note: rdb generation, rdb copying through the network, slave old data cleaning, and slave aof rewrite are time-consuming]
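A sketch of the two redis.conf knobs mentioned in steps ② and ⑤ above; the values shown are the ones discussed in this section and should be treated as starting points:

```
# If the slave receives nothing from the master for 60 seconds during sync,
# it considers the replication failed; raise this for large datasets or slow networks
repl-timeout 60
# Abort replication if the slave's output buffer exceeds 256mb outright,
# or stays above 64mb for 60 seconds
client-output-buffer-limit slave 256mb 64mb 60
```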
Incremental replication
① If the master-slave network connection is broken during full replication, incremental replication is triggered when the slave reconnects to the master
② The master directly obtains some lost data from its own backlog and sends it to the slave node. The default backlog is 1MB
③ The master obtains data from the backlog according to the offset in psync sent by the slave
Heartbeat
Both master and slave nodes send heartbeat information to each other. By default, the master sends heartbeat every 10 seconds, and the slave node sends heartbeat every 1 second
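Only the master-side interval is configurable; a one-line sketch (10 seconds is the default, the slave's 1-second REPLCONF ACK interval is fixed):

```
# How often, in seconds, the master pings its slaves
repl-ping-slave-period 10
```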
Asynchronous replication
Each time the master receives a write command, it writes data internally and sends it asynchronously to the slave node
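Because replication is asynchronous, a write acknowledged by the master may not have reached a slave yet. If a particular write must be on at least one slave before you proceed, the WAIT command can be used; a hedged sketch with a made-up key:

```
192.168.1.132:6379> set order:1001 paid
OK
# Block for up to 100 ms until at least 1 slave has acknowledged this write;
# the reply is the number of slaves that actually acknowledged it
192.168.1.132:6379> wait 1 100
(integer) 1
```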
Read/write separation configuration
(take two virtual machines as an example)
Prepare two virtual machines
```
# Master virtual machine ip
192.168.1.132
# Slave virtual machine ip
192.168.1.133
```
Configure redis.conf on the master node
```
# Only the main configuration is shown here (anything not listed keeps its default)
# Since this is the master node, the persistence mechanism must be configured

# Run redis in the background
daemonize yes

# The following items (file locations and port) can be adjusted to your own setup
# Location of the redis pid file
pidfile /var/run/redis_6379.pid
# Port redis listens on
port 6379
# Directory where persistence files are stored
dir /var/redis/6379

# bind must be set to this machine's own ip; do not use 127.0.0.1 and do not comment it out
bind 192.168.1.132
```
Configure redis.conf on the slave node
```
# Only the main configuration is shown here (anything not listed keeps its default)

# Run redis in the background
daemonize yes

# The following items (file locations and port) can be adjusted to your own setup
# Location of the redis pid file
pidfile /var/run/redis_6379.pid
# Port redis listens on
port 6379
# Directory where persistence files are stored
dir /var/redis/6379

# bind must be set to this machine's own ip; do not use 127.0.0.1 and do not comment it out
bind 192.168.1.133

# Configure the master node's ip and redis port
slaveof 192.168.1.132 6379

# Enforce read/write separation
# A redis slave node is read-only by default
# With read-only enabled, the slave rejects all write operations, which enforces the read/write separation architecture
slave-read-only yes
```
Turn off the firewall so the nodes can reach each other (you can verify connectivity with telnet)
```
systemctl status firewalld.service
systemctl stop firewalld.service
systemctl start firewalld.service
```
Start-up
Start the master node first, then the slave node
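A sketch of the startup commands, assuming the config files live at /etc/redis/6379.conf as in the info server output earlier; adjust the paths to your own setup:

```
# On the master (192.168.1.132)
redis-server /etc/redis/6379.conf
# Then on the slave (192.168.1.133)
redis-server /etc/redis/6379.conf
# Connect a client to verify
redis-cli -h 192.168.1.132
```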
Operations on the master node (for reference only; adjust to your actual environment)
```
# Start redis first, then the client
redis-cli -h 192.168.1.132

192.168.1.132:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.133,port=6379,state=online,offset=1751,lag=0
master_repl_offset:1751
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1750

######################################
# Can be compared for reference
192.168.1.132:6379> keys *
1) "aaa"
192.168.1.132:6379> set bbb 222
OK
192.168.1.132:6379> get bbb
"222"
```
Operations on the slave node (for reference only; adjust to your actual environment)
```
# Start redis first, then the client
redis-cli -h 192.168.1.133

192.168.1.133:6379> info replication
# Replication
role:slave
master_host:192.168.1.132
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_repl_offset:29
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

######################################
# Can be compared for reference
192.168.1.133:6379> keys *
1) "aaa"
# The master node writes and the slave node reads
192.168.1.133:6379> get bbb
"222"
# The slave node can only read, not write
192.168.1.133:6379> set ccc 123
(error) READONLY You can't write against a read only slave.
```
Horizontally scale redis read nodes to improve read throughput
Additional redis slave nodes can be deployed on other servers (configured the same way as the slave node above). A single slave node can serve roughly 50,000 read QPS; with two redis slave nodes, read requests are spread across the two machines, and the read QPS of the whole cluster reaches 100,000+.
Redis load testing with redis-benchmark (just for awareness)
If you want to run a benchmark against your newly built redis to measure its performance and QPS (queries per second), the redis-benchmark tool that ships with redis is the fastest and most convenient way.
```
# In the src directory of redis, run:
./redis-benchmark -h 192.168.1.132

# Available parameters:
redis-benchmark [-h <host>] [-p <port>] [-c <clients>]
 -h <hostname>   Server hostname (default 127.0.0.1)
 -p <port>       Server port (default 6379)
 -s <socket>     Server socket
 -a <password>   Redis authentication password
 -c <clients>    Number of parallel connections (default 50)
 -n <requests>   Total number of requests (default 100000)
 -d <size>       Data size of SET/GET value in bytes (default 2)
 --dbnum <db>    Select the specified database number (default 0)
 -q              Quiet mode, only show query/sec values
 ......

# Example output: 20 parallel clients, 100000 total requests
###########################
====== PING_INLINE ======
  100000 requests completed in 0.76 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

100.00% <= 0 milliseconds
131926.12 requests per second

====== PING_BULK ======
  100000 requests completed in 0.73 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

100.00% <= 0 milliseconds
136239.78 requests per second

====== SET ======
  100000 requests completed in 1.08 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

99.94% <= 1 milliseconds
99.96% <= 4 milliseconds
99.98% <= 7 milliseconds
100.00% <= 7 milliseconds
93023.25 requests per second

====== GET ======
  100000 requests completed in 0.77 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

100.00% <= 0 milliseconds
130208.34 requests per second
......
```
How to achieve 99.99% high availability under Redis master-slave architecture
[Figure: illustration of redis being unavailable]
What is 99.99% high availability
In a word: over one year (365 days), your system can provide service to the outside world 99.99% of the time; that is what high availability means. 99.99% availability corresponds to roughly 365 × 24 × 60 × 0.01% ≈ 52.6 minutes of allowed downtime per year.