foreword
Both the array and linked list structure in redis can implement the list type. The linked list is easy to understand. We have already talked about it before, so I won’t repeat it. How the list is realized through the array, you can read the article with this question;
Let me talk about the disadvantages of linked list first:
1) Each node in the linked list occupies an independent piece of memory, resulting in excessive memory fragmentation;
2) The front and back node pointers in the linked list nodes occupy too much extra memory;
Those who have studied data structures can know that arrays can better solve the above two problems, and ziplist is a compact linked list format similar to arrays. It will apply for a whole block of memory, and store all the data of the linked list in this memory.
definition
The ziplist structure is as follows:
zlbytes: uint32_t, records the number of memory bytes occupied by the entire compressed list, including the 4 bytes occupied by zlbtes;
zltail: uint32_t, records the number of bytes from the start address of the "tail" node of the compressed list, that is, the offset of the end of the list, which is used to support pop-up from the tail of the linked list or reverse (from tail to head) variable linked list.
zllen: uint16_t, record the number of nodes contained in the compressed list;
zlend: mark the end point of the compressed list, the fixed value is 0xFF (decimal 255).
The entry is the node saved in the ziplist, and the format of the entry is as follows:
data: records the length of the "previous node";
prevlen: record the length of the predecessor node, the unit is byte, the attribute length is 1 byte or 5 bytes.
1) If the predecessor node is less than 254, use 1 byte to store the length of the predecessor node;
2) Otherwise, use 5 bytes, and the first byte is fixed at 254, and the remaining 4 bytes store the length of the predecessor node;
encoding: represents the encoding format of the current node element, including encoding type and node length. In a ziplist, the encoding formats of different node elements can be different. The encoding format specification is as follows:
① 00pppppp (pppppp represents the lower 6 bits of encoding, the same below): String encoding, the length is less than or equal to 63 (26-1), and the length is stored in the lower 6 bits of encoding.
② 01pppppp: String encoding, the length is less than or equal to 16383 (214-1), and the length is stored in the last 6 bits of encoding and the last 1 byte of encoding.
③ 10000000: String encoding, the length is greater than 16383 (214-1), and the length is stored in the last 4 bytes of encoding.
④ 11000000: Numerical encoding, the type is int16_t, occupying 2 bytes.
⑤ 11010000: Numerical encoding, the type is int32_t, occupying 4 bytes.
⑥ 11100000: Numerical encoding, the type is int64_t, occupying 8 bytes.
⑦ 11110000: Numerical encoding, using 3 bytes to store an integer.
⑧ 11111110: Numerical encoding, using 1 byte to store an integer.
⑨ 1111xxxx: Use the lower 4 bits of encoding to store an integer, and the stored value range is 0~12. The available range of the lower 4 bits of the encoding under this encoding is 0001 to 1101, and the lower 4 bits of the encoding minus 1 is the actual stored value.
⑩ 11111111: 255, ziplist end node.
Note that in the ② and ③ encoding formats, in addition to the encoding attribute, additional space is required to store the length of the node element. The ninth format is also quite special, and the node elements are directly stored on the encoding attribute. The encoding is optimized for small numbers. At this time, entry-data is empty.
byte order
The encoding attribute uses multiple bytes to store the length of node elements. The byte order of this multi-byte data stored in computer memory or transmitted over the network is called endian. There are two types of endian: big endian order and little endian.
ziplist adopts the little-endian byte order, the CPU can read and process the low-order bytes first, and it is more efficient to perform calculation borrow and carry operations.
Source code analysis is the primary source of information =
Source code analysis
Note: The source code is in ziplist.h and ziplist.c
Find elements in ziplist using the ziplistFind function:
/* Find pointer to the entry equal to the specified entry. Skip 'skip' entries * between every comparison. Returns NULL when the field could not be found. */ unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) { int skipcnt = 0; unsigned char vencoding = 0; long long vll = 0; while (p[0] != ZIP_END) { unsigned int prevlensize, encoding, lensize, len; unsigned char *q; ZIP_DECODE_PREVLENSIZE(p, prevlensize); ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len); q = p + prevlensize + lensize; if (skipcnt == 0) { /* Compare current entry with specified entry */ if (ZIP_IS_STR(encoding)) { if (len == vlen && memcmp(q, vstr, vlen) == 0) { return p; } } else { /* Find out if the searched field can be encoded. Note that * we do it only the first time, once done vencoding is set * to non-zero and vll is set to the integer value. */ if (vencoding == 0) { if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) { /* If the entry can't be encoded we set it to * UCHAR_MAX so that we don't retry again the next * time. */ vencoding = UCHAR_MAX; } /* Must be non-zero by now */ assert(vencoding); } /* Compare current entry with specified entry, do it only * if vencoding != UCHAR_MAX because if there is no encoding * possible for the field it can't be a valid integer. */ if (vencoding != UCHAR_MAX) { long long ll = zipLoadInteger(q, encoding); if (ll == vll) { return p; } } } /* Reset skip count */ skipcnt = skip; } else { /* Skip entry */ skipcnt--; } /* Move to next entry */ p = q + len; } return NULL; }
Parameter Description:
p: Specify which node to start searching from ziplist
vstr, vlen: the content and length of the element to be found
skip: the number of bytes to perform an element comparison operation
Insert nodes in ziplist:
/* Insert item at "p". */ unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) { size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen; unsigned int prevlensize, prevlen = 0; size_t offset; int nextdiff = 0; unsigned char encoding = 0; long long value = 123456789; /* initialized to avoid warning. Using a value that is easy to see if for some reason we use it uninitialized. */ zlentry tail; /* Find out prevlen for the entry that is inserted. */ if (p[0] != ZIP_END) { ZIP_DECODE_PREVLEN(p, prevlensize, prevlen); } else { unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl); if (ptail[0] != ZIP_END) { prevlen = zipRawEntryLength(ptail); } } /* See if the entry can be encoded */ if (zipTryEncoding(s,slen,&value,&encoding)) { /* 'encoding' is set to the appropriate integer encoding */ reqlen = zipIntSize(encoding); } else { /* 'encoding' is untouched, however zipStoreEntryEncoding will use the * string length to figure out how to encode it. */ reqlen = slen; } /* We need space for both the length of the previous entry and * the length of the payload. */ reqlen += zipStorePrevEntryLength(NULL,prevlen); reqlen += zipStoreEntryEncoding(NULL,encoding,slen); /* When the insert position is not equal to the tail, we need to * make sure that the next entry can hold this entry's length in * its prevlen field. */ int forcelarge = 0; nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0; if (nextdiff == -4 && reqlen < 4) { nextdiff = 0; forcelarge = 1; } /* Store offset because a realloc may change the address of zl. */ offset = p-zl; zl = ziplistResize(zl,curlen+reqlen+nextdiff); p = zl+offset; /* Apply memory move when necessary and update tail offset. */ if (p[0] != ZIP_END) { /* Subtract one because of the ZIP_END bytes */ memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff); /* Encode this entry's raw length in the next entry. */ if (forcelarge) zipStorePrevEntryLengthLarge(p+reqlen,reqlen); else zipStorePrevEntryLength(p+reqlen,reqlen); /* Update offset for tail */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ zipEntry(p+reqlen, &tail); if (p[reqlen+tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } } else { /* This element will be the new tail. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl); } /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist */ if (nextdiff != 0) { offset = p-zl; zl = __ziplistCascadeUpdate(zl,p+reqlen); p = zl+offset; } /* Write the entry */ p += zipStorePrevEntryLength(p,prevlen); p += zipStoreEntryEncoding(p,encoding,slen); if (ZIP_IS_STR(encoding)) { memcpy(p,s,slen); } else { zipSaveInteger(p,value,encoding); } ZIPLIST_INCR_LENGTH(zl,1); return zl; }
Parameter Description:
zl: to be inserted into the ziplist
p: the follower node pointing to the insertion position
s, slen: the content and length of the element to be inserted
Three important functions zipStorePrevEntryLength, zipStoreEntryEncoding and zipPrevLenByteDiff are used in the insert function
The zipStorePrevEntryLength function calculates the length of the prelen attribute (1 byte or 5 bytes)
/* Encode the length of the previous entry and write it to "p". Return the * number of bytes needed to encode this length if "p" is NULL. */ unsigned int zipStorePrevEntryLength(unsigned char *p, unsigned int len) { if (p == NULL) { return (len < ZIP_BIG_PREVLEN) ? 1 : sizeof(len)+1; } else { if (len < ZIP_BIG_PREVLEN) { p[0] = len; return 1; } else { return zipStorePrevEntryLengthLarge(p,len); } } }
The zipStoreEntryEncoding function calculates the number of bytes required to store additional byte element lengths (2nd and 3rd formats in the encoding); the value of the reqlen variable is added to the return value of these two functions and becomes the length of the inserted node;
/* Write the encoding header of the entry in 'p'. If p is NULL it just returns * the amount of bytes required to encode such a length. Arguments: * * 'encoding' is the encoding we are using for the entry. It could be * ZIP_INT_* or ZIP_STR_* or between ZIP_INT_IMM_MIN and ZIP_INT_IMM_MAX * for single-byte small immediate integers. * * 'rawlen' is only used for ZIP_STR_* encodings and is the length of the * string that this entry represents. * * The function returns the number of bytes used by the encoding/length * header stored in 'p'. */ unsigned int zipStoreEntryEncoding(unsigned char *p, unsigned char encoding, unsigned int rawlen) { unsigned char len = 1, buf[5]; if (ZIP_IS_STR(encoding)) { /* Although encoding is given it may not be set for strings, * so we determine it here using the raw length. */ if (rawlen <= 0x3f) { if (!p) return len; buf[0] = ZIP_STR_06B | rawlen; } else if (rawlen <= 0x3fff) { len += 1; if (!p) return len; buf[0] = ZIP_STR_14B | ((rawlen >> 8) & 0x3f); buf[1] = rawlen & 0xff; } else { len += 4; if (!p) return len; buf[0] = ZIP_STR_32B; buf[1] = (rawlen >> 24) & 0xff; buf[2] = (rawlen >> 16) & 0xff; buf[3] = (rawlen >> 8) & 0xff; buf[4] = rawlen & 0xff; } } else { /* Implies integer encoding, so length is always 1. */ if (!p) return len; buf[0] = encoding; } /* Store this length at p. */ memcpy(p,buf,len); return len; }
The zipPrevLenByteDiff function calculates how many bytes the length of the prevlen attribute of the rear drive node needs to be adjusted, and the result is stored in the nextdiff variable;
/* Given a pointer 'p' to the prevlen info that prefixes an entry, this * function returns the difference in number of bytes needed to encode * the prevlen if the previous entry changes of size. * * So if A is the number of bytes used right now to encode the 'prevlen' * field. * * And B is the number of bytes that are needed in order to encode the * 'prevlen' if the previous element will be updated to one of size 'len'. * * Then the function returns B - A * * So the function returns a positive number if more space is needed, * a negative number if less space is needed, or zero if the same space * is needed. */ int zipPrevLenByteDiff(unsigned char *p, unsigned int len) { unsigned int prevlensize; ZIP_DECODE_PREVLENSIZE(p, prevlensize); return zipStorePrevEntryLength(NULL, len) - prevlensize; }
Cascade update: __ziplistCascadeUpdate function
Definition: Because the length of the previous element is saved in zlEntry, and this length is an integer of side length, when the previous element of a zlEntry changes (the content changes or is deleted, or a new element is inserted), it may Causes a change in the length of prevlen, and a change in the length of prevlen will change the length of the entire zlEntry. When the length of the current zlEntry changes, it may cause a change in the length of the next zlEntry, which causes a cascade update. If The impact of cascading updates caused by many elements in ziplist may be great, affecting the entire redis service, so ziplist is not suitable for storing too many elements.
/* When an entry is inserted, we need to set the prevlen field of the next * entry to equal the length of the inserted entry. It can occur that this * length cannot be encoded in 1 byte and the next entry needs to be grow * a bit larger to hold the 5-byte encoded prevlen. This can be done for free, * because this only happens when an entry is already being inserted (which * causes a realloc and memmove). However, encoding the prevlen may require * that this entry is grown as well. This effect may cascade throughout * the ziplist when there are consecutive entries with a size close to * ZIP_BIG_PREVLEN, so we need to check that the prevlen can be encoded in * every consecutive entry. * * Note that this effect can also happen in reverse, where the bytes required * to encode the prevlen field can shrink. This effect is deliberately ignored, * because it can cause a "flapping" effect where a chain prevlen fields is * first grown and then shrunk again after consecutive inserts. Rather, the * field is allowed to stay larger than necessary, because a large prevlen * field implies the ziplist is holding large entries anyway. * * The pointer "p" points to the first entry that does NOT need to be * updated, i.e. consecutive fields MAY need an update. */ unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) { size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize; size_t offset, noffset, extra; unsigned char *np; zlentry cur, next; while (p[0] != ZIP_END) { zipEntry(p, &cur); rawlen = cur.headersize + cur.len; rawlensize = zipStorePrevEntryLength(NULL,rawlen); /* Abort if there is no next entry. */ if (p[rawlen] == ZIP_END) break; zipEntry(p+rawlen, &next); /* Abort when "prevlen" has not changed. */ if (next.prevrawlen == rawlen) break; if (next.prevrawlensize < rawlensize) { /* The "prevlen" field of "next" needs more bytes to hold * the raw length of "cur". */ offset = p-zl; extra = rawlensize-next.prevrawlensize; zl = ziplistResize(zl,curlen+extra); p = zl+offset; /* Current pointer and offset for next element. */ np = p+rawlen; noffset = np-zl; /* Update tail offset when next element is not the tail element. */ if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra); } /* Move the tail to the back. */ memmove(np+rawlensize, np+next.prevrawlensize, curlen-noffset-next.prevrawlensize-1); zipStorePrevEntryLength(np,rawlen); /* Advance the cursor */ p += rawlen; curlen += extra; } else { if (next.prevrawlensize > rawlensize) { /* This would result in shrinking, which we want to avoid. * So, set "rawlen" in the available bytes. */ zipStorePrevEntryLengthLarge(p+rawlen,rawlen); } else { zipStorePrevEntryLength(p+rawlen,rawlen); } /* Stop here, as the raw length of "next" has not changed. */ break; } } return zl; }
Notice:
Redis aimed at the deficiencies in the design of the compressed list. In later versions, two new data structures were designed: quicklist (introduced in Redis 3.2) and listpack (introduced in Redis 5.0). The design goal of these two data structures is to keep the memory-saving advantage of the compressed list as much as possible, and at the same time solve the problem of "chain update" of the compressed list.
Related configuration:
1.hash-max-ziplist-entries 512 when hash Use when the number of elements in is less than 512 ziplist save data,used when exceeding dict 2.hash-max-ziplist-value 64 when hash Use when the byte occupancy of the largest element does not exceed 64 ziplist save data,used when exceeding dict