Go source code analysis: map

map

1, Simple structure

A variable of map type is essentially a pointer to an hmap struct; key-value pairs are stored in a hash table. See the following figure:

2, Design of hash table

1. Structure

The hash table starts with a contiguous chunk of memory for storing data (it can be regarded as an array), as shown in the figure below. For each entry we store the key, the value, and the hash of the key in this contiguous space.

2. Addressing

Next, where in this space should a given key-value pair be stored? There are two mainstream approaches:

1. Modulo operation: hash % m

2. Bit operation: hash & (m-1)

The bit operation is clearly more efficient, but it requires m to be an integer power of 2. Go takes the bit-operation approach.

3. hash conflict

How do we handle a hash conflict between two keys? Again, there are two common methods:

1. Chaining (the "zipper" method): key-value pairs with conflicting hashes are organized as a linked list. As shown in the figure below, when a hash conflict occurs, the later insertion is linked behind the conflicting entry.


2. Open addressing: in the figure below, if position 0 is already occupied, the data is placed at position 1; if position 1 is also occupied, we keep probing until a free slot is found.

When looking up, we start at position 0; if that slot does not hold the key we want, we keep scanning forward until the key or an empty slot is found. Clearly this method is not suitable when the load factor (described later) is too large.

Comparing the two: chaining is friendlier for lookups, since the number of probes stays small; open addressing never fragments memory, because each allocation is one contiguous block. Go adopts something close to chaining; why only "close" will become clear when we analyze the map data structure in detail later.

4. Capacity expansion

With far more entries than buckets, chaining degenerates the hash table into a linked list, and open addressing fares even worse. This brings us to capacity expansion, for which we need to answer two questions: when to expand, and how.

When to expand capacity

Remember the load factor mentioned above? The load factor tells us when to expand: it is the ratio of the number of key-value pairs stored in the hash table to the number of buckets. Usually a maximum load factor is set, and exceeding it triggers expansion:

loadFactor = keyCount / bucketCount

Suppose the maximum load factor is 50%. When inserting a new key would push the ratio past that limit, expansion is triggered.


How to expand capacity

The straightforward idea: when the load-factor limit is reached, double the number of buckets, then copy all the data into the new address space, as shown below.


This migration has a problem: what if the old buckets hold a lot of data? Migrating it all at once would take a long time and stall the main program. For this reason, almost all languages adopt incremental expansion.

Incremental expansion moves the key-value pairs from the old buckets to the new ones over several steps. After expansion is triggered, the new buckets are allocated first and the hash table is marked as growing; each subsequent read or write checks this mark and, if the table is growing, migrates a few key-value pairs to the new buckets. The expansion cost is thus spread across many operations.


With the general features and problems of hash tables covered, let's see what Go's map actually looks like.

3, Data structure of map

Graphic structure

The hash table underlying map selects buckets with an AND operation, so hmap does not record the bucket count directly but its base-2 logarithm B (the count is 2^B). Now let's look at the buckets a map uses. Recall the basic structure diagram above: a bucket in Go is actually a bmap.


The following figure shows the data structure of a bmap. We can see that bmap is actually a continuous memory space.


To pack memory more tightly, the 8 keys are stored together, followed by the 8 values. For each key, only the upper 8 bits of its hash value (the tophash) are kept. The tophash, key, and value of each key-value pair share the same index.

This is close to the chaining method described above: on a hash conflict, the entry is written further along in the bmap, which keeps the number of probes small while keeping memory largely contiguous.

What if all eight slots are full? To reduce the number of expansions, overflow buckets are introduced. An overflow bucket has the same memory layout as a regular one. If the hash table needs more than 2^4 buckets, 2^(B-4) overflow buckets are pre-allocated as spares. Regular and overflow buckets are contiguous in memory: the first 2^B serve as regular buckets, and the rest as overflow buckets.

Source code analysis

Let's take a look at the source code:

type hmap struct {
  count     int    // number of key-value pairs in the map
  flags     uint8  // operation flags (e.g. a write is in progress)
  B         uint8  // log2 of the number of regular buckets (count = 2^B)
  noverflow uint16 // approximate number of overflow buckets
  hash0     uint32 // hash seed

  buckets    unsafe.Pointer // pointer to the bucket array of 2^B bmaps
  oldbuckets unsafe.Pointer // pointer to the old buckets (used during expansion)
  nevacuate  uintptr        // next bucket to migrate during expansion

  extra *mapextra // overflow bucket pointers
}


type mapextra struct {
  overflow     *[]*bmap // overflow buckets already in use
  oldoverflow  *[]*bmap // overflow buckets of the old buckets, kept during incremental expansion
  nextOverflow *bmap    // next unused overflow bucket
}

type bmap struct {
  tophash [bucketCnt]uint8 // tophash values; keys, values, and the overflow pointer follow in memory
}

You can see that when the current bucket is full, the runtime checks whether hmap.extra.nextOverflow still has an available overflow bucket; if so, it links that overflow bucket behind the current one and continues writing into it.

4, Create map

We know a map is usually created with make(map[K]V, hint). Let's see how that works; the figure below shows the main steps when creating a map:

  1. Determine the value of B from the requested initial size
  2. If B ≥ 4, determine the number of overflow buckets
  3. Allocate memory for the regular buckets and overflow buckets


Source code analysis:

// make map returns a pointer
func makemap(t *maptype, hint int, h *hmap) *hmap {
  if h == nil {
    h = new(hmap)
  }
  h.hash0 = fastrand()
  // Initialize the bucket size according to hint
  B := uint8(0)
  for overLoadFactor(hint, B) {
    B++
  }
  h.B = B

  // If the number of buckets is greater than 0, initialize the memory of buckets
  if h.B != 0 {
    var nextOverflow *bmap
    h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
    if nextOverflow != nil {
      h.extra = new(mapextra)
      h.extra.nextOverflow = nextOverflow
    }
  }
  return h
}

// makeBucketArray allocates memory for the map's buckets
//  @param t type of the map
//  @param b log2 of the bucket count (2^b buckets)
//  @param dirtyalloc if non-nil, reuse this memory for the returned buckets
//  @return buckets address of the bucket array
//  @return nextOverflow address of the first overflow bucket
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
  base := bucketShift(b)
  nbuckets := base // number of buckets

  // when b >= 4, add 2^(b-4) overflow buckets
  if b >= 4 {
    nbuckets += bucketShift(b - 4)
  }

  // allocate the bucket space
  if dirtyalloc == nil {
    buckets = newarray(t.bucket, int(nbuckets))
  } else {
    buckets = dirtyalloc
    size := t.bucket.size * nbuckets
    if t.bucket.ptrdata != 0 {
      memclrHasPointers(buckets, size)
    } else {
      memclrNoHeapPointers(buckets, size)
    }
  }

  // there are overflow buckets: nextOverflow points to the first one
  if base != nbuckets {
    nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
    last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
    last.setoverflow(t, (*bmap)(buckets))
  }
  return buckets, nextOverflow
}

5, Get val through key

What happens when we write val, ok := m[key]? As before, let's first get the main flow from a flow chart:

  1. Check whether the map is empty
  2. Check whether a concurrent write is in progress
  3. Locate the bucket for the key; if the map is growing and that bucket has not been migrated yet, locate the old bucket instead
  4. Traverse the tophash values and keys of the bucket until an entry equal to the given key is found


Let's look at the source code:

// mapaccess1 returns the val for a given key
//  @param t type of the map
//  @param h the map's data
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
   //If the map is nil or there is no data, zero value is returned
  if h == nil || h.count == 0 {
    if t.hashMightPanic() {
      t.hasher(key, 0) 
    }
    return unsafe.Pointer(&zeroVal[0])
  }

  //@Step 1 another goroutine is writing the map: throw a fatal error
  if h.flags&hashWriting != 0 {
    throw("concurrent map read and map write")
  }

  //@Step 2 calculate the hash of the key and locate the hash bucket position of the current key
  hash := t.hasher(key, uintptr(h.hash0))
  m := bucketMask(h.B)
  b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))

  //@Step3 check whether old buckets exist (a migration is in progress)
  //  if the old bucket has not been migrated yet, look up the key in the old bucket
  if c := h.oldbuckets; c != nil {
    if !h.sameSizeGrow() { // not a same-size grow (a same-size grow re-packs sparse data, e.g. after many deletions, without doubling)
      m >>= 1
    }
    oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
    if !evacuated(oldb) {
      b = oldb
    }
  }

  //Take top hash
  top := tophash(hash)

  //@Step4 find the val corresponding to the key from the bucket and cycle through the top hash
bucketloop:
  for ; b != nil; b = b.overflow(t) { // follow the chain of overflow buckets
    for i := uintptr(0); i < bucketCnt; i++ {
      // compare the tophash first
      if b.tophash[i] != top {
        if b.tophash[i] == emptyRest { // emptyRest means there is no data from here on
          break bucketloop
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      if t.indirectkey() {
        k = *((*unsafe.Pointer)(k))
      }
      // keys equal? if so, return the corresponding val
      if t.key.equal(key, k) {
        e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
        if t.indirectelem() {
          e = *((*unsafe.Pointer)(e))
        }
        return e
      }
    }
  }
  return unsafe.Pointer(&zeroVal[0]) // not found: return the zero value
}

6, Add key value

What happens when adding a key-value pair? As before, let's first look at the flow chart:

  1. Check whether the map is nil and whether a concurrent write is in progress
  2. If buckets is nil, allocate a bucket array of length 1
  3. Find the bucket corresponding to the key
  4. If the map is growing and that bucket has not been migrated, migrate it
  5. Traverse the bucket; if the same key is found, return the location of its val. Otherwise remember the first empty slot, assign tophash and key there, and return the location of val
  6. Check whether the map needs to grow; if so, grow it and go back to step 3
  7. If neither the bucket nor its overflow buckets have a free slot, add a new overflow bucket, assign tophash and key to its first slot, and return the location of val


Source code analysis:

// Add or replace key val m[key]=val
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
  //@Step1 if map is nil, panic
  if h == nil {
    panic(plainError("assignment to entry in nil map"))
  }

  // check whether another goroutine is writing the map
  if h.flags&hashWriting != 0 {
    throw("concurrent map writes")
  }
  hash := t.hasher(key, uintptr(h.hash0))

  // Set the flag for map writing
  h.flags ^= hashWriting

  //@Step 2 buckets is nil, new one
  if h.buckets == nil {
    h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
  }

again:
  //@Step3 find the bucket corresponding to the key hash
  bucket := hash & bucketMask(h.B)
  if h.growing() {
    //  @Step4 if the bucket needs to be migrated, migrate the old bucket
    growWork(t, h, bucket)
  }
  b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
  top := tophash(hash)

  var inserti *uint8
  var insertk unsafe.Pointer
  var elem unsafe.Pointer
  //@Step5 search the map for the key
  //  5-1 if it exists, update its value and return the location of val
  //  5-2 if it does not exist, remember the tophash, key, and value positions of the first empty slot in the bucket
  //    5-2-1 if the bucket has no overflow bucket, move on to the next step
  //    5-2-2 if it has one, continue the bucketloop in that overflow bucket
bucketloop:
  for {
    for i := uintptr(0); i < bucketCnt; i++ {
      if b.tophash[i] != top {
        // if the slot is empty, remember the first empty position (so we can insert there if the key is never found)
        if isEmpty(b.tophash[i]) && inserti == nil {
          inserti = &b.tophash[i]
          insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
          elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
        }
        if b.tophash[i] == emptyRest {
          break bucketloop
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      if !t.key.equal(key, k) {
        continue
      }
      elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
      //Found key
      goto done
    }
    // not found in this bucket; continue in the overflow bucket
    ovf := b.overflow(t)
    if ovf == nil {
      break
    }
    b = ovf
  }

  //@Step6 adding an element would exceed the load factor (6.5), or there are too many overflow buckets:
  //  grow the table, then go back to the bucketloop above to find the val location again
  if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
    hashGrow(t, h)
    goto again 
  }

  //@Step7 the bucket and its overflow buckets are all full: add a new overflow bucket
  if inserti == nil {
    newb := h.newoverflow(t, b)
    inserti = &newb.tophash[0]
    insertk = add(unsafe.Pointer(newb), dataOffset)
    elem = add(insertk, bucketCnt*uintptr(t.keysize))
  }

  // store the key and tophash; increment h.count
  if t.indirectkey() {
    kmem := newobject(t.key)
    *(*unsafe.Pointer)(insertk) = kmem
    insertk = kmem
  }
  if t.indirectelem() {
    vmem := newobject(t.elem)
    *(*unsafe.Pointer)(elem) = vmem
  }
  typedmemmove(t.key, insertk, key)
  *inserti = top
  h.count++

  // Return the location of val
done:
  if h.flags&hashWriting == 0 {
    throw("concurrent map writes")
  }
  h.flags &^= hashWriting
  if t.indirectelem() {
    elem = *((*unsafe.Pointer)(elem))
  }
  return elem
}

7, Delete key value

What happens when deleting a key? As before, let's first look at the flow chart:


Source code analysis:

// Delete key and val
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
  //Judge whether the map is in write state
  if h.flags&hashWriting != 0 {
    throw("concurrent map writes")
  }

  hash := t.hasher(key, uintptr(h.hash0))
  //Set map to write state
  h.flags ^= hashWriting

  //1 find the bucket for the key
  //  if a migration is in progress, migrate this bucket
  bucket := hash & bucketMask(h.B)
  if h.growing() {
    growWork(t, h, bucket)
  }
  b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
  bOrig := b
  top := tophash(hash)

  //2
  //  2-1 find the position of the tophash, key, and val
  //  2-2 set the tophash to emptyOne (this slot is empty, but there are elements after it)
  //  2-3 if no elements follow the current slot, set it to emptyRest (this slot is empty and nothing follows)
search:
  for ; b != nil; b = b.overflow(t) {
    for i := uintptr(0); i < bucketCnt; i++ {
      if b.tophash[i] != top {
        if b.tophash[i] == emptyRest { // emptyRest means there is no data from this point on
          break search
        }
        continue
      }
      k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
      k2 := k
      if !t.key.equal(key, k2) {
        continue
      }
      e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
      //Clear val
      if t.indirectelem() {
        *(*unsafe.Pointer)(e) = nil
      } else if t.elem.ptrdata != 0 {
        memclrHasPointers(e, t.elem.size)
      } else {
        memclrNoHeapPointers(e, t.elem.size)
      }
      // Set the tophash to emptyOne (the current position is empty, but there are elements behind it)
      b.tophash[i] = emptyOne
      //3 check whether the next slot's tophash (possibly the first slot of the overflow bucket) is emptyRest
      //  3-1 if not, jump to notLast to finish
      //  3-2 if yes, scan backward and turn every trailing emptyOne into emptyRest
      if i == bucketCnt-1 {
        if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
          goto notLast
        }
      } else {
        if b.tophash[i+1] != emptyRest {
          goto notLast
        }
      }
      for {
        b.tophash[i] = emptyRest
        if i == 0 {
          if b == bOrig {
            break 
          }
          c := b
          for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
          }
          i = bucketCnt - 1
        } else {
          i--
        }
        if b.tophash[i] != emptyOne {
          break
        }
      }
      
    notLast:
      // decrement the element count
      h.count--
      break search
    }
  }
}

8, Iterating over key-value pairs

Traversing key-value pairs is relatively simple: a random starting bucket startBucket and a random offset within each bucket are chosen, and iteration walks every entry of the map from there:


Source code analysis:

// Initialize iterator
func mapiterinit(t *maptype, h *hmap, it *hiter) {
  //1 initialize iterator
  it.t = t
  if h == nil || h.count == 0 {
    return
  }
  it.h = h
  it.B = h.B
  it.buckets = h.buckets

  //2 use a random number to pick the starting bucket and the starting slot within each bucket
  r := uintptr(fastrand())
  if h.B > 31-bucketCntBits {
    r += uintptr(fastrand()) << 31
  }
  it.startBucket = r & bucketMask(h.B)          // which bucket to start from
  it.offset = uint8(r >> h.B & (bucketCnt - 1)) // which slot within each bucket to start from

  it.bucket = it.startBucket
}

// mapiternext advances the iterator
//  1 starting from the random bucket, iterate over each bucket's data
//  2 within each bucket, traverse from the random offset
//  3 the extra logic for iterating while an expansion is in progress is omitted in this excerpt; see the full source
func mapiternext(it *hiter) {
  h := it.h
  if h.flags&hashWriting != 0 { // another goroutine is writing
    throw("concurrent map iteration and map write")
  }
  t := it.t
  bucket := it.bucket
  b := it.bptr
  i := it.i
  checkBucket := it.checkBucket
  _ = checkBucket // used by the omitted expansion logic

next:
  if b == nil { // current bucket fully traversed; move on to the next bucket
    if bucket == it.startBucket && it.wrapped { // back at the starting bucket: iteration is done
      it.key = nil
      it.elem = nil
      return
    }
    bucket++
    if bucket == bucketShift(it.B) {
      bucket = 0
      it.wrapped = true
    }
    b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize))) // (simplified: migration checks omitted)
    it.bucket = bucket
    i = 0
  }
  for ; i < bucketCnt; i++ {
    offi := (i + it.offset) & (bucketCnt - 1) // start from the random offset within the bucket
    if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty { // no data in this slot
      continue
    }
    k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
    e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
    it.key = k
    it.elem = e
    it.bptr = b
    it.i = i + 1
    return
  }
  b = b.overflow(t) // regular bucket exhausted: continue in its overflow bucket
  i = 0
  goto next
}


References:

https://github.com/golang/go/...


Posted by akufen on Wed, 03 Aug 2022 21:53:55 +0530