Dictionaries perform faster than lists and tuples, especially when added, modified, deleted. Dictionaries can be completed quickly. The difference between collections and dictionaries is that collections have no key-value pairing. They are an out-of-order, unique combination of elements.
Create Dictionary
d1 = {"name": "wp", "age": 18} d2 = dict({'name': "wp", "age": 18}) d3 = dict([("name", "wp"), ("age", 18)]) d4 = dict(name="wp", age=18) if d1 == d2 == d3 == d4: print('ok') # ok
Create a collection
s1 = {1,2,3} s2 = set([1,2,3]) if s1 == s2: print('ook')
Dictionaries and collections in python, whether key or value, can be mixed types
s3 = {1, "wp", 18.5}
Element Access
d1 = {"name": "wp", "age": 18} d2 = dict({'name': "wp", "age": 18}) d3 = dict([("name", "wp"), ("age", 18)]) d4 = dict(name="wp", age=18) print(d1["name"]) # wp
If there is no key, the exception will be reported directly. Of course, the get(key,default) function can also be used for indexing. When a key does not exist, we can define a default value for it.
print(d1.get("age")) # 18 print(d1.get('wwww',"null")) # null
Collections do not support indexing because the nature of a collection is a hash table, which is different from a list. So elements cannot be accessed using s[0]. If you want to determine whether an element is in a collection, you can only use value in dict/set to determine
Element Additions, Deletions and Changes
d1 = {"name": "wp", "age": 18} d1['gender'] = "male" print(d1) # {'name': 'wp', 'age': 18, 'gender': 'male'} d1.pop("age") print(d1) # {'name': 'wp', 'gender': 'male'} d1['name'] = "wp1" print(d1) #{'name': 'wp1', 'gender': 'male'}
s1 = {1, 2, 3} s1.add(4) print(s1) # {1, 2, 3, 4} s1.remove(4) print(s1) # {1, 2, 3}
Sort *
d = {'b': 1, "c": 3, "a": 10} d_sort_by_Key = sorted(d.items(), key=lambda x: x[0]) print(d_sort_by_Key) # [('a', 10), ('b', 1), ('c', 3)] d_sort_by_value = sorted(d.items(),key=lambda x:x[1]) print(d_sort_by_Key) # [('a', 10), ('b', 1), ('c', 3)]
This returns a list of elements, each of which is a tuple of key and value from the original dictionary
However, the sorting of collections is much simpler, simply use the sorted() function
s1 = {1,4,2,6} sorted(s1) print(s1) # {1, 2, 4, 6}
Performance comparison of dictionaries and collections
Dictionaries and collections are data structures that are highly optimized for performance, but it is still necessary to see how they compare in terms of a certain amount of data
Assume that there are now another 100,000 element goods and calculate the run time for using lists and collections to count the quantity of product prices
# list def find_unique_price_using_list(products): unique_price_list = [] for _, price in products: # A if price not in unique_price_list: unique_price_list.append(price) return len(unique_price_list) # aggregate def find_unique_price_using_set(products): unique_price_set = set() for _, price in products: unique_price_set.add(price) return len(unique_price_set) if __name__ == '__main__': id = [x for x in range(0, 100000)] price = [x for x in range(200000, 300000)] produces = list(zip(id, price)) start_using_list = time.perf_counter() find_unique_price_using_list(produces) end_using_list = time.perf_counter() print("list time {}".format(end_using_list - start_using_list)) # list time 4.9999999999980616e-06 start_using_list1 = time.perf_counter() find_unique_price_using_set(produces) end_using_list1 = time.perf_counter() print("set time {}".format(end_using_list1 - start_using_list1)) # set time 0.008161099999999998
* You can see that there is such a large performance difference with 10 weeks of data
How Dictionaries and Collections Work
Dictionary and data structure inside collection are both a hash table
- For dictionaries, this table stores hash values, keys, and values as three elements\
- For collections, the difference is that there is no key-value pairing in the table, only a single element
Old version hash table structure⬇
--+-------------------------------+ | Hash value(hash) key(key) value(value) --+-------------------------------+ 0 | hash0 key0 value0 --+-------------------------------+ 1 | hash1 key1 value1 --+-------------------------------+ 2 | hash2 key2 value2 --+-------------------------------+ . | ... __+_______________________________+
The biggest disadvantage of this type of table is that as the hash table expands, it becomes more sparse, if the data is
{'name': 'wp', 'dob': '1999-01-01', 'gender': 'male'}
So that's the structure of his hash table
entries = [ ['--', '--', '--'] [-230273521, 'dob', '1999-01-01'], ['--', '--', '--'], ['--', '--', '--'], [1231236123, 'name', 'wp'], ['--', '--', '--'], [9371539127, 'gender', 'male'] ]
Such a design is a waste of space, so in order to improve the storage space utilization now, in addition to the structure of the dictionary itself, the hash table now separates the index from the hash value, the healthy value and the value separately
Indices ---------------------------------------------------- None | index | None | None | index | None | index ... ---------------------------------------------------- Entries -------------------- hash0 key0 value0 --------------------- hash1 key1 value1 --------------------- hash2 key2 value2 --------------------- ... ---------------------
indices = [None, 1, None, None, 0, None, 2] entries = [ [1231236123, 'name', 'wp'], [-230273521, 'dob', '1999-01-01'], [9371539127, 'gender', 'male'] ]
Insert operation
Each time an element is inserted into a dictionary or collection, Python first calculates the hash(key) of the key, then does and operates with mask = PyDicMinSize - 1, calculating the location index = hash(key) where the element should be inserted into the hash table& mask. If this position in the hash table is empty, the element is inserted into it. If this position is occupied, Python compares whether the hash values and keys of the two elements are equal.
- If both are equal, the element already exists, and if the values are different, the value is updated.
- If one of the two elements is not equal, this is often referred to as a hash collision, meaning that the keys of the two elements are not equal, but the hash values are equal.
In this case, Python will continue to look for empty spaces in the table until it finds them. It is worth noting that, in general, the easiest way to do this is to look for them linearly, starting from this location and going back one by one. This step is more efficient.
Find operation
Similar to previous insert operations, Python finds where it should be based on a hash value; then compares the hash values and keys of the elements in this location of the hash table to be equal to the elements that need to be found. If equal, it returns directly; if not, it continues until a space is found or an exception is thrown.
Delete operation
Delete For Delete, Python temporarily assigns a special value to elements in this location until the hash table is resized.
It is not difficult to understand that hash conflicts tend to slow down dictionary and collection operations. Therefore, to ensure their efficiency, hash tables within dictionaries and collections are usually guaranteed to have at least a third of their remaining space. Python keeps inserting elements when the remaining space is less than a thirdThe hash table will be expanded by reclaiming more memory space. However, in this case, all element positions in the table will be re-drained. Although hash conflicts and adjustments to the size of the hash table will cause slowdown, this happens very rarely.