map
1, Simple structure
A variable of map type is essentially a pointer to an hmap structure, and the key-value pairs are stored in a hash table, as shown in the following figure:
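Because the map variable is only a pointer to the underlying hmap, copying it (for example, by passing it to a function) copies the pointer, not the table. A minimal sketch of this behavior:

```go
package main

import "fmt"

// addKey receives a copy of the map variable. Since that copy is a
// pointer to the same hmap, the insertion is visible to the caller.
func addKey(m map[string]int) {
	m["b"] = 2
}

func main() {
	m := map[string]int{"a": 1}
	addKey(m)
	fmt.Println(m) // map[a:1 b:2]: both variables share one hash table
}
```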
2, Design of hash table
1. Structure
The hash table starts with a contiguous block of memory for storing data; it can be regarded as an array. As shown in the figure below, for each entry we store the key, the value, and the hash of the key in this contiguous space.
2. Addressing
So where should a given key-value pair be stored? There are two mainstream approaches:
1. Modulo operation: hash % m
2. Bit operation: hash & (m-1)
The bit operation is certainly faster, but it requires m to be an integer power of 2. Go uses the bit operation for addressing.
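The two expressions select the same slot whenever m is a power of two, which is exactly why the bucket count is constrained this way. A small demonstration (illustrative only, not runtime code):

```go
package main

import "fmt"

func main() {
	const m = 8 // the mask trick requires a power of 2
	for _, hash := range []uint64{7, 12, 255, 1023} {
		// hash % m and hash & (m-1) compute the same bucket index.
		fmt.Printf("%4d: %d == %d\n", hash, hash%m, hash&(m-1))
	}
}
```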
3. Hash conflicts
What happens when two of our keys collide? Again there are two common methods:
1. Zipper method (chaining): key-value pairs with conflicting hashes are organized into a linked list. As shown in the following figure, when a hash conflict occurs, the entry inserted later is linked behind the conflicting one.
2. Open addressing: as in the figure below, if slot 0 is already occupied, we put the data in slot 1; if slot 1 is also occupied, we keep probing until we find a free slot.
When looking up, we start at slot 0; if it does not hold the key we want, we keep scanning forward until the last slot. Clearly this method is unsuitable when the load factor (described later) gets too large.
Comparing the two: the zipper method is friendlier to lookups, since the number of probes stays small; open addressing never produces fragmented memory, since every allocation is one contiguous block. Go adopts something similar to the zipper method. Why only "similar"? That will become clear when we analyze the data structure of map in detail later.
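To make the probing concrete, here is a minimal linear-probing lookup under open addressing. This is a generic sketch (the table, strhash and probe here are illustrative and have nothing to do with the runtime):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func strhash(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// probe returns the index of key in table, or -1 if absent.
// An empty string marks a free slot; len(table) must be a power of 2.
func probe(table []string, key string) int {
	m := uint64(len(table))
	for i := uint64(0); i < m; i++ {
		idx := (strhash(key) + i) & (m - 1) // linear probing with a mask
		switch table[idx] {
		case key:
			return int(idx) // found
		case "":
			return -1 // a free slot: the key cannot be further along
		}
	}
	return -1 // table is full and the key is absent
}

func main() {
	table := make([]string, 8)
	table[strhash("go")&7] = "go"
	fmt.Println(probe(table, "go") >= 0, probe(table, "rust") >= 0) // true false
}
```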
4. Capacity expansion
When we store far more than 8 entries, the zipper method degenerates the hash table into a linked list, and open addressing degrades even more. This brings us to expansion, for which we need to answer two questions: when to expand, and how.
When to expand capacity
Remember the load factor mentioned above? The load factor is the parameter that decides when to expand: it is the ratio of the number of key-value pairs stored in the hash table to the number of buckets. Usually a maximum load factor is set, and crossing it triggers expansion:
loadFactor = keyCount / bucketCount
Suppose our maximum load factor is 50%. When a new key (key 12 in the figure) arrives and pushes the ratio past that threshold, expansion is triggered.
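As a sketch of the trigger check (the 50% here is purely illustrative; Go's actual overLoadFactor uses a threshold of 6.5 entries per bucket):

```go
package main

import "fmt"

// overloaded reports whether inserting one more entry would exceed
// the maximum load factor (0.5 in this illustration).
func overloaded(keyCount, bucketCount int) bool {
	return float64(keyCount+1)/float64(bucketCount) > 0.5
}

func main() {
	fmt.Println(overloaded(10, 24)) // false: 11/24 is still under 0.5
	fmt.Println(overloaded(12, 24)) // true: 13/24 crosses 0.5, time to grow
}
```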
How to expand capacity
The straightforward idea is to double the number of buckets once the load factor is reached, then copy all the data to the new address space, as shown below.
But this migration has a problem: what if the old buckets hold a lot of data? Moving everything at once would take a long time and stall the main program. For this reason, almost all languages adopt incremental expansion.
With incremental expansion, the key-value pairs in the old buckets are moved to the new buckets over several operations. After expansion is triggered, the new buckets are allocated first and the hash table is marked as growing; every subsequent read or write checks this mark, and if the table is growing, migrates a few key-value pairs to the new buckets. The time consumed by expansion is thus spread across many operations.
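The pattern looks roughly like this. All names below (myMap, entry, evacuate, bucketIndex, nevacuate) are hypothetical, a sketch of the idea rather than the runtime code:

```go
package sketch

type entry struct {
	key string
	val int
}

// myMap is a toy map that keeps its old bucket array around during growth.
type myMap struct {
	buckets    [][]entry // new bucket array
	oldBuckets [][]entry // non-nil while a growth is in progress
	nevacuate  int       // next old bucket to migrate in order
}

// evacuate would move one old bucket's entries into the new array.
func (m *myMap) evacuate(oldIdx int) { /* ... */ }

// bucketIndex would compute hash(key) & mask.
func (m *myMap) bucketIndex(key string) int { return 0 }

// assign shows where migration work is piggybacked onto every write.
func (m *myMap) assign(key string, val int) {
	if m.oldBuckets != nil { // a growth is in progress
		m.evacuate(m.bucketIndex(key)) // migrate the bucket this key maps to
		m.evacuate(m.nevacuate)        // plus one more, to guarantee progress
	}
	// ... normal insertion into m.buckets ...
}
```

This mirrors what Go's growWork does, as we will see in the mapassign source below.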
We have now covered the general features and problems of hash tables. Next, let's look at what map in Go actually is.
3, Data structure of map
Structure diagram
The hash table underlying map selects buckets with an AND operation, so hmap does not record the number of buckets directly, but its exponent B: the bucket count is 2^B. Now let's look at the bucket map uses. Recall the basic structure diagram above: a bucket in Go is actually a bmap.
The following figure shows the data structure of a bmap. As you can see, a bmap is really just a contiguous chunk of memory.
To pack memory more tightly, the 8 keys are stored together, followed by the 8 values. For each key, only the top 8 bits of its hash value (the tophash) are kept. The tophash, key and value of each key-value pair line up by index.
You can see this is similar to the zipper method described above: on a hash conflict, the entry is simply written further along in the bmap, which keeps the number of probes small while also keeping memory contiguous.
What if all eight slots fill up? To reduce the number of expansions, overflow buckets are introduced. An overflow bucket has the same memory layout as a regular bucket. If the hash table is created with at least 2^4 buckets (B ≥ 4), 2^(B-4) overflow buckets are preallocated as spares. Regular buckets and overflow buckets are contiguous in memory: the first 2^B are used as regular buckets, and the ones after them as overflow buckets.
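For reference, the tophash is just the high byte of the key's hash. A simplified version of the runtime's helper (the real constants reserve the smallest values for marker states such as emptyRest, which is why small results are bumped up):

```go
package sketch

import "unsafe"

const minTopHash = 5 // tophash values below this are reserved for markers

// tophash extracts the top 8 bits of a hash value. Results that would
// collide with the reserved marker values are shifted past minTopHash.
func tophash(hash uintptr) uint8 {
	top := uint8(hash >> (unsafe.Sizeof(uintptr(0))*8 - 8))
	if top < minTopHash {
		top += minTopHash
	}
	return top
}
```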
Source code analysis
Let's take a look at the source code:
```go
type hmap struct {
	count      int            // number of key-value pairs in the map
	flags      uint8          // operation flags (e.g. a write in progress)
	B          uint8          // log2 of the bucket count: there are 2^B regular buckets
	noverflow  uint16         // approximate number of overflow buckets
	hash0      uint32         // hash seed
	buckets    unsafe.Pointer // pointer to the bucket array: 2^B bmaps
	oldbuckets unsafe.Pointer // pointer to the old bucket array (used during growth)
	nevacuate  uintptr        // next bucket to migrate during growth
	extra      *mapextra      // pointers to all overflow buckets
}

type mapextra struct {
	overflow     *[]*bmap // overflow buckets already in use
	oldoverflow  *[]*bmap // overflow buckets of the old buckets during incremental growth
	nextOverflow *bmap    // next unused, preallocated overflow bucket
}

type bmap struct {
	tophash [bucketCnt]uint8 // the tophash value of each stored key
}
```
As you can see, when the current bucket is full, the runtime checks whether hmap.extra.nextOverflow still has an available overflow bucket, chains that overflow bucket behind the full bucket, and then continues inserting into it.
4, Create map
We know a map is usually created with make. Let's see how that works. The figure below shows the main steps of creating a map:
- Determine the value of B from the requested initial size
- If B ≥ 4, determine the number of overflow buckets
- Allocate memory for the buckets and overflow buckets
Source code analysis:
```go
// makemap creates a map and returns a pointer to it.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()

	// Initialize the bucket size according to the hint: pick the
	// smallest B such that hint entries fit within the load factor.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B

	// If the bucket count is greater than 0, allocate the bucket memory now.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}
	return h
}

// makeBucketArray allocates memory for the map's buckets.
// @param t          map type
// @param b          the bucket count is 2^b
// @param dirtyalloc if non-nil, reuse this memory for the returned buckets
// @return buckets      address of the bucket array
// @return nextOverflow address of the first preallocated overflow bucket
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
	base := bucketShift(b)
	nbuckets := base // number of buckets
	// When b >= 4, preallocate overflow buckets: 2^(b-4) of them.
	if b >= 4 {
		nbuckets += bucketShift(b - 4)
	}

	// Allocate (or reuse) the bucket memory.
	if dirtyalloc == nil {
		buckets = newarray(t.bucket, int(nbuckets))
	} else {
		buckets = dirtyalloc
		size := t.bucket.size * nbuckets
		if t.bucket.ptrdata != 0 {
			memclrHasPointers(buckets, size)
		} else {
			memclrNoHeapPointers(buckets, size)
		}
	}

	// If overflow buckets were preallocated, point nextOverflow at the first one.
	if base != nbuckets {
		nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
		last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
		last.setoverflow(t, (*bmap)(buckets))
	}
	return buckets, nextOverflow
}
```
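A usage note (the numbers assume the runtime's 6.5 load factor): passing a size hint to make lets makemap pick B up front, so the initial inserts never trigger a growth:

```go
package main

import "fmt"

func main() {
	// The hint 1000 reaches makemap, which picks the smallest B with
	// 1000 <= 6.5 * 2^B, i.e. B = 8 (256 buckets), so inserting the
	// first 1000 entries never has to grow the table.
	m := make(map[int]string, 1000)
	for i := 0; i < 1000; i++ {
		m[i] = fmt.Sprint(i)
	}
	fmt.Println(len(m)) // 1000
}
```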
5, Get val through key
What happens when we write `val, ok := m[key]`? As before, let's first understand the main flow through a flow chart:
- Check whether the map is empty
- Check whether a concurrent write is in progress
- Locate the bucket for the key; if the map is growing (doubling or same-size growth) and that bucket has not been migrated yet, locate it in the old buckets instead
- Walk each tophash and key of the bucket (and its overflow chain) until an entry equal to the given key is found
Let's look at the source code:
```go
// mapaccess1 returns the val for a key.
// @param t map type
// @param h map data
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// If the map is nil or empty, return the zero value.
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0)
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	// @Step1 if another goroutine is writing, throw (an unrecoverable fatal error)
	if h.flags&hashWriting != 0 {
		throw("concurrent map read and map write")
	}
	// @Step2 hash the key and locate its bucket
	hash := t.hasher(key, uintptr(h.hash0))
	m := bucketMask(h.B)
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	// @Step3 check whether old buckets exist (a growth is in progress).
	// If this key's old bucket has not been evacuated yet, read from the old bucket.
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// Not a same-size growth (same-size growth re-packs the data after
			// many deletions, to avoid wasting too much space), so the old
			// table had half as many buckets.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	// Take the top 8 bits of the hash.
	top := tophash(hash)
	// @Step4 scan the bucket chain for the key, comparing tophash first.
bucketloop:
	for ; b != nil; b = b.overflow(t) { // follow the overflow buckets
		for i := uintptr(0); i < bucketCnt; i++ {
			// Compare the tophash first.
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					// emptyRest means there is no data after this slot.
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			// If the keys are equal, return the corresponding val.
			if t.key.equal(key, k) {
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0]) // not found: return the zero value
}
```
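Note the two places that return &zeroVal[0]: looking up a missing key yields the zero value, which is why the comma-ok form is needed to distinguish "absent" from "stored zero":

```go
package main

import "fmt"

func main() {
	m := map[string]int{"stored": 0}
	fmt.Println(m["missing"]) // 0: mapaccess1 returned the zero value
	v, ok := m["missing"]     // the two-result form compiles to mapaccess2
	fmt.Println(v, ok)        // 0 false
	v, ok = m["stored"]
	fmt.Println(v, ok)        // 0 true: present, the value just happens to be 0
}
```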
6, Add key value
What happens when we assign `m[key] = val`? As before, let's look at the flow chart first:
- Check whether the map is nil and whether a concurrent write is in progress
- If buckets is empty, allocate a bucket array of length 1
- Locate the bucket corresponding to the given key
- If the map is growing and that bucket has not been migrated, migrate it
- Walk the bucket: if the same key is found, return the address of its val; otherwise remember the first empty slot, and if the key is never found, write tophash and key there and return that slot's val address
- Check whether the map needs to grow; if so, grow it and go back to the bucket-locating step above
- If neither the current bucket nor its overflow chain has a free slot, attach a new overflow bucket, write tophash and key into its first slot, and return that slot's val address
Source code analysis:
```go
// mapassign adds or replaces a key: m[key] = val.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	// @Step1 assigning into a nil map panics.
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	// Check whether another goroutine is writing to the map.
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))
	// Set the flag marking a write in progress.
	h.flags ^= hashWriting

	// @Step2 buckets is nil: allocate a single bucket.
	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	// @Step3 locate the bucket for this key's hash.
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		// @Step4 a growth is in progress: migrate this bucket first.
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
	// @Step5 look for the key in the map:
	//   5-1 if it exists, update it and return the address of its val
	//   5-2 if not, remember the first empty tophash/key/value slot
	//     5-2-1 if the bucket is not full, move on to the next step
	//     5-2-2 if it is full, follow the overflow bucket and repeat bucketloop
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				// Record the first empty slot, in case the key is never found.
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if !t.key.equal(key, k) {
				continue
			}
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			// Found the key.
			goto done
		}
		// Not found in this bucket: continue in the overflow bucket.
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}

	// @Step6 if adding one element would push the load factor past 6.5,
	// or there are too many overflow buckets, grow the table and retry
	// bucketloop above to find the val location again.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again
	}

	// @Step7 the bucket and its overflow chain are full: attach a new overflow bucket.
	if inserti == nil {
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		elem = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// Store the key and tophash, and bump h.count.
	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectelem() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(elem) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	h.count++

	// Return the address where val should be stored.
done:
	if h.flags&hashWriting == 0 {
		throw("concurrent map writes")
	}
	h.flags &^= hashWriting
	if t.indirectelem() {
		elem = *((*unsafe.Pointer)(elem))
	}
	return elem
}
```
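A side note on the two throw("concurrent map writes") checks: throw is unrecoverable (unlike panic), so unsynchronized writers crash the whole program. A program like the following will typically die with that message, although the detection is best-effort rather than guaranteed:

```go
package main

import "sync"

func main() {
	m := make(map[int]int)
	var wg sync.WaitGroup
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				m[i] = g // unsynchronized writes: mapassign may see hashWriting already set
			}
		}(g)
	}
	wg.Wait() // usually never reached: fatal error: concurrent map writes
}
```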
7, Delete key value
What happens when a key is deleted? As before, let's look at the flow chart first:
Source code analysis:
```go
// mapdelete removes a key and its val: delete(m, key).
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
	// Check whether another goroutine is writing to the map.
	if h.flags&hashWriting != 0 {
		throw("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))
	// Set the flag marking a write in progress.
	h.flags ^= hashWriting

	// 1 locate the bucket for this key;
	//   if a growth is in progress, migrate this bucket first.
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	bOrig := b
	top := tophash(hash)
	// 2
	//   2-1 find the tophash/key/val slot
	//   2-2 mark the tophash emptyOne (this slot is empty, but slots after it hold data)
	//   2-3 if nothing follows, mark it emptyRest (this slot and everything after are empty)
search:
	for ; b != nil; b = b.overflow(t) {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == emptyRest {
					// emptyRest: no data after this slot, stop searching.
					break search
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			k2 := k
			if !t.key.equal(key, k2) {
				continue
			}
			e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			// Clear the val.
			if t.indirectelem() {
				*(*unsafe.Pointer)(e) = nil
			} else if t.elem.ptrdata != 0 {
				memclrHasPointers(e, t.elem.size)
			} else {
				memclrNoHeapPointers(e, t.elem.size)
			}
			// Mark this slot emptyOne (empty, but there may be data after it).
			b.tophash[i] = emptyOne
			// 3 check whether the next slot (possibly the first slot of the
			//   overflow bucket) is emptyRest:
			//   3-1 if not, jump to notLast and finish
			//   3-2 if so, walk backwards turning every trailing emptyOne into emptyRest
			if i == bucketCnt-1 {
				if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
					goto notLast
				}
			} else {
				if b.tophash[i+1] != emptyRest {
					goto notLast
				}
			}
			for {
				b.tophash[i] = emptyRest
				if i == 0 {
					if b == bOrig {
						break // beginning of the initial bucket: done
					}
					// Find the previous bucket in the chain and continue at its last slot.
					c := b
					for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
					}
					i = bucketCnt - 1
				} else {
					i--
				}
				if b.tophash[i] != emptyOne {
					break
				}
			}
		notLast:
			// Decrement the count.
			h.count--
			break search
		}
	}
}
```
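One thing worth noticing in this code: delete only clears the slot and adjusts the tophash markers; it never frees buckets or shrinks B. A sketch of the consequence:

```go
package main

import "fmt"

func main() {
	m := make(map[int]int)
	for i := 0; i < 1_000_000; i++ {
		m[i] = i
	}
	for i := 0; i < 1_000_000; i++ {
		delete(m, i) // slots become emptyOne/emptyRest, buckets stay allocated
	}
	fmt.Println(len(m)) // 0, yet the bucket array is still sized for a million entries
}
```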
8, Iterate key value
Iterating over the keys and values is relatively simple: a random starting bucket startBucket is chosen, along with a random offset within each bucket, and from there the iteration visits every entry of the map:
Source code analysis:
```go
// mapiterinit initializes the iterator.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
	// 1 initialize the iterator
	it.t = t
	if h == nil || h.count == 0 {
		return
	}
	it.h = h
	it.B = h.B
	it.buckets = h.buckets
	// 2 use a random number to decide which bucket to start from,
	//   and at which offset to start inside each bucket
	r := uintptr(fastrand())
	if h.B > 31-bucketCntBits {
		r += uintptr(fastrand()) << 31
	}
	it.startBucket = r & bucketMask(h.B)          // starting bucket
	it.offset = uint8(r >> h.B & (bucketCnt - 1)) // offset within each bucket
	it.bucket = it.startBucket
	// ... (flag bookkeeping and the first mapiternext call are elided here)
}

// mapiternext advances the iterator:
// 1 starting from the random bucket, visit every bucket in a cycle
// 2 within each bucket, start from the random offset
// 3 the logic for iterating while a growth is in progress is elided
//   below; see the full runtime source for details
func mapiternext(it *hiter) {
	h := it.h
	if h.flags&hashWriting != 0 { // another goroutine is writing
		throw("concurrent map iteration and map write")
	}
	t := it.t
	bucket := it.bucket
	b := it.bptr
	i := it.i
next:
	if b == nil {
		// The current bucket pointer is nil: this bucket is done, move to the next.
		if bucket == it.startBucket && it.wrapped {
			// Back at the starting bucket after wrapping: iteration is complete.
			it.key = nil
			it.elem = nil
			return
		}
		b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
		bucket++
		if bucket == bucketShift(it.B) {
			bucket = 0
			it.wrapped = true
		}
		i = 0
	}
	for ; i < bucketCnt; i++ {
		offi := (i + it.offset) & (bucketCnt - 1) // start position inside the bucket
		if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty { // no data
			continue
		}
		k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
		e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
		// Hand this key/elem pair to the caller and remember our position.
		it.key = k
		it.elem = e
		it.bucket = bucket
		it.bptr = b
		it.i = i + 1
		return
	}
	// This bucket is exhausted: continue in its overflow bucket.
	b = b.overflow(t)
	i = 0
	goto next
}
```
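The random startBucket and offset chosen in mapiterinit are why two traversals of the same map can visit keys in different orders:

```go
package main

import "fmt"

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}
	// mapiterinit picks a fresh random start bucket and in-bucket offset
	// for every range loop, so these two lines may print different orders.
	for k := range m {
		fmt.Print(k, " ")
	}
	fmt.Println()
	for k := range m {
		fmt.Print(k, " ")
	}
	fmt.Println()
}
```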