Dictionaries and Collections

Dictionaries perform faster than lists and tuples, especially when added, modified, deleted. Dictionaries can be completed quickly. The difference between collections and dictionaries is that collections have no key-value pairing. They are an out-of-order, unique combination of elements.

Create Dictionary

d1 = {"name": "wp", "age": 18}
d2 = dict({'name': "wp", "age": 18})
d3 = dict([("name", "wp"), ("age", 18)])
d4 = dict(name="wp", age=18)
if d1 == d2 == d3 == d4:
# ok

Create a collection

    s1 = {1,2,3}
    s2 = set([1,2,3])
    if s1 == s2:

Dictionaries and collections in python, whether key or value, can be mixed types

s3 = {1, "wp", 18.5}

Element Access

    d1 = {"name": "wp", "age": 18}
    d2 = dict({'name': "wp", "age": 18})
    d3 = dict([("name", "wp"), ("age", 18)])
    d4 = dict(name="wp", age=18)
    # wp

If there is no key, the exception will be reported directly. Of course, the get(key,default) function can also be used for indexing. When a key does not exist, we can define a default value for it.

    # 18
    # null

Collections do not support indexing because the nature of a collection is a hash table, which is different from a list. So elements cannot be accessed using s[0]. If you want to determine whether an element is in a collection, you can only use value in dict/set to determine

Element Additions, Deletions and Changes

    d1 = {"name": "wp", "age": 18}
    d1['gender'] = "male"
    # {'name': 'wp', 'age': 18, 'gender': 'male'}
    # {'name': 'wp', 'gender': 'male'}
    d1['name'] = "wp1"
    #{'name': 'wp1', 'gender': 'male'}
    s1 = {1, 2, 3}
    # {1, 2, 3, 4}
    # {1, 2, 3}

Sort *

    d = {'b': 1, "c": 3, "a": 10}
    d_sort_by_Key = sorted(d.items(), key=lambda x: x[0])
    # [('a', 10), ('b', 1), ('c', 3)]
    d_sort_by_value = sorted(d.items(),key=lambda x:x[1])
    # [('a', 10), ('b', 1), ('c', 3)]

This returns a list of elements, each of which is a tuple of key and value from the original dictionary

However, the sorting of collections is much simpler, simply use the sorted() function

    s1 = {1,4,2,6}
    # {1, 2, 4, 6}

Performance comparison of dictionaries and collections

Dictionaries and collections are data structures that are highly optimized for performance, but it is still necessary to see how they compare in terms of a certain amount of data

Assume that there are now another 100,000 element goods and calculate the run time for using lists and collections to count the quantity of product prices

# list
def find_unique_price_using_list(products):
    unique_price_list = []
    for _, price in products:  # A
        if price not in unique_price_list:
        return len(unique_price_list)

# aggregate
def find_unique_price_using_set(products):
    unique_price_set = set()
    for _, price in products:
    return len(unique_price_set)

if __name__ == '__main__':
    id = [x for x in range(0, 100000)]
    price = [x for x in range(200000, 300000)]
    produces = list(zip(id, price))
    start_using_list = time.perf_counter()
    end_using_list = time.perf_counter()
    print("list time {}".format(end_using_list - start_using_list))
    # list time 4.9999999999980616e-06

    start_using_list1 = time.perf_counter()
    end_using_list1 = time.perf_counter()
    print("set time {}".format(end_using_list1 - start_using_list1))
    # set time 0.008161099999999998

* You can see that there is such a large performance difference with 10 weeks of data

How Dictionaries and Collections Work

Dictionary and data structure inside collection are both a hash table

  • For dictionaries, this table stores hash values, keys, and values as three elements\
  • For collections, the difference is that there is no key-value pairing in the table, only a single element

Old version hash table structure⬇

  | Hash value(hash)  key(key)  value(value)
0 |    hash0      key0    value0
1 |    hash1      key1    value1
2 |    hash2      key2    value2
. |           ...

The biggest disadvantage of this type of table is that as the hash table expands, it becomes more sparse, if the data is

{'name': 'wp', 'dob': '1999-01-01', 'gender': 'male'}

So that's the structure of his hash table

entries = [
['--', '--', '--']
[-230273521, 'dob', '1999-01-01'],
['--', '--', '--'],
['--', '--', '--'],
[1231236123, 'name', 'wp'],
['--', '--', '--'],
[9371539127, 'gender', 'male']

Such a design is a waste of space, so in order to improve the storage space utilization now, in addition to the structure of the dictionary itself, the hash table now separates the index from the hash value, the healthy value and the value separately

None | index | None | None | index | None | index ...

hash0   key0  value0
hash1   key1  value1
hash2   key2  value2
indices = [None, 1, None, None, 0, None, 2]
entries = [
[1231236123, 'name', 'wp'],
[-230273521, 'dob', '1999-01-01'],
[9371539127, 'gender', 'male']

Insert operation

Each time an element is inserted into a dictionary or collection, Python first calculates the hash(key) of the key, then does and operates with mask = PyDicMinSize - 1, calculating the location index = hash(key) where the element should be inserted into the hash table& mask. If this position in the hash table is empty, the element is inserted into it. If this position is occupied, Python compares whether the hash values and keys of the two elements are equal.

  • If both are equal, the element already exists, and if the values are different, the value is updated.
  • If one of the two elements is not equal, this is often referred to as a hash collision, meaning that the keys of the two elements are not equal, but the hash values are equal.

In this case, Python will continue to look for empty spaces in the table until it finds them. It is worth noting that, in general, the easiest way to do this is to look for them linearly, starting from this location and going back one by one. This step is more efficient.

Find operation

Similar to previous insert operations, Python finds where it should be based on a hash value; then compares the hash values and keys of the elements in this location of the hash table to be equal to the elements that need to be found. If equal, it returns directly; if not, it continues until a space is found or an exception is thrown.

Delete operation

Delete For Delete, Python temporarily assigns a special value to elements in this location until the hash table is resized.

It is not difficult to understand that hash conflicts tend to slow down dictionary and collection operations. Therefore, to ensure their efficiency, hash tables within dictionaries and collections are usually guaranteed to have at least a third of their remaining space. Python keeps inserting elements when the remaining space is less than a thirdThe hash table will be expanded by reclaiming more memory space. However, in this case, all element positions in the table will be re-drained. Although hash conflicts and adjustments to the size of the hash table will cause slowdown, this happens very rarely.

Tags: Python

Posted by koenigsbote on Mon, 20 Sep 2021 13:40:37 +0530