

下载APP



关闭

讲堂

算法训练营

前端训练营

客户端下载

兑换中心

企业版

渠道合作

推荐作者

04 | 字典、集合，你真的了解吗？

2019-05-17 景霄

Python核心技术与实战

进入课程



讲述：冯永吉

时长09:56大小9.10M



你好，我是景霄。

前面的课程，我们学习了 Python 中的列表和元组，了解了他们的基本操作和性能比较。这节课，我们再来学习两个同样很常见并且很有用的数据结构：字典（dict）和集合（set）。字典和集合在 Python 被广泛使用，并且性能进行了高度优化，其重要性不言而喻。

字典和集合基础

那究竟什么是字典，什么是集合呢？字典是一系列由键（key）和值（value）配对组成的元素的集合，在 Python3.7+，字典被确定为有序（注意：在 3.6 中，字典有序是一个 implementation detail，在 3.7 才正式成为语言特性，因此 3.6 中无法 100% 确保其有序性），而 3.6 之前是无序的，其长度大小可变，元素可以任意地删减和改变。

相比于列表和元组，字典的性能更优，特别是对于查找、添加和删除操作，字典都能在常数时间复杂度内完成。

而集合和字典基本相同，唯一的区别，就是集合没有键和值的配对，是一系列无序的、唯一的元素组合。

首先我们来看字典和集合的创建，通常有下面这几种方式：

 d1 = {'name': 'jason', 'age': 20, 'gender': 'male'}
d2 = dict({'name': 'jason', 'age': 20, 'gender': 'male'})
d3 = dict([('name', 'jason'), ('age', 20), ('gender', 'male')])
d4 = dict(name='jason', age=20, gender='male') 
d1 == d2 == d3 ==d4
True
 
s1 = {1, 2, 3}
s2 = set([1, 2, 3])
s1 == s2
True复制代码

这里注意，Python 中字典和集合，无论是键还是值，都可以是混合类型。比如下面这个例子，我创建了一个元素为1，'hello'，5.0的集合：

 s = {1, 'hello', 5.0}复制代码

再来看元素访问的问题。字典访问可以直接索引键，如果不存在，就会抛出异常：

 d = {'name': 'jason', 'age': 20}
d['name']
'jason'
d['location']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'location'复制代码

也可以使用 get(key, default) 函数来进行索引。如果键不存在，调用 get() 函数可以返回一个默认值。比如下面这个示例，返回了'null'。

 d = {'name': 'jason', 'age': 20}
d.get('name')
'jason'
d.get('location', 'null')
'null'复制代码

说完了字典的访问，我们再来看集合。

首先我要强调的是，集合并不支持索引操作，因为集合本质上是一个哈希表，和列表不一样。所以，下面这样的操作是错误的，Python 会抛出异常：

 s = {1, 2, 3}
s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing复制代码

想要判断一个元素在不在字典或集合内，我们可以用 value in dict/set 来判断。

 s = {1, 2, 3}
1 in s
True
10 in s
False
 
d = {'name': 'jason', 'age': 20}
'name' in d
True
'location' in d
False复制代码

当然，除了创建和访问，字典和集合也同样支持增加、删除、更新等操作。

 d = {'name': 'jason', 'age': 20}
d['gender'] = 'male' # 增加元素对'gender': 'male'
d['dob'] = '1999-02-01' # 增加元素对'dob': '1999-02-01'
d
{'name': 'jason', 'age': 20, 'gender': 'male', 'dob': '1999-02-01'}
d['dob'] = '1998-01-01' # 更新键'dob'对应的值 
d.pop('dob') # 删除键为'dob'的元素对
'1998-01-01'
d
{'name': 'jason', 'age': 20, 'gender': 'male'}
 
s = {1, 2, 3}
s.add(4) # 增加元素 4 到集合
s
{1, 2, 3, 4}
s.remove(4) # 从集合中删除元素 4
s
{1, 2, 3}复制代码

不过要注意，集合的 pop() 操作是删除集合中最后一个元素，可是集合本身是无序的，你无法知道会删除哪个元素，因此这个操作得谨慎使用。

实际应用中，很多情况下，我们需要对字典或集合进行排序，比如，取出值最大的 50 对。

对于字典，我们通常会根据键或值，进行升序或降序排序：

 d = {'b': 1, 'a': 2, 'c': 10}
d_sorted_by_key = sorted(d.items(), key=lambda x: x[0]) # 根据字典键的升序排序
d_sorted_by_value = sorted(d.items(), key=lambda x: x[1]) # 根据字典值的升序排序
d_sorted_by_key
[('a', 2), ('b', 1), ('c', 10)]
d_sorted_by_value
[('b', 1), ('a', 2), ('c', 10)]复制代码

这里返回了一个列表。列表中的每个元素，是由原字典的键和值组成的元组。

而对于集合，其排序和前面讲过的列表、元组很类似，直接调用 sorted(set) 即可，结果会返回一个排好序的列表。

 s = {3, 4, 2, 1}
sorted(s) # 对集合的元素进行升序排序
[1, 2, 3, 4]复制代码

字典和集合性能

文章开头我就说到了，字典和集合是进行过性能高度优化的数据结构，特别是对于查找、添加和删除操作。那接下来，我们就来看看，它们在具体场景下的性能表现，以及与列表等其他数据结构的对比。

比如电商企业的后台，存储了每件产品的 ID、名称和价格。现在的需求是，给定某件商品的 ID，我们要找出其价格。

如果我们用列表来存储这些数据结构，并进行查找，相应的代码如下：

 def find_product_price(products, product_id):
    for id, price in products:
        if id == product_id:
            return price
    return None 
     
products = [
    (143121312, 100), 
    (432314553, 30),
    (32421912367, 150) 
]
 
print('The price of product 432314553 is {}'.format(find_product_price(products, 432314553)))
 
# 输出
The price of product 432314553 is 30复制代码

假设列表有 n 个元素，而查找的过程要遍历列表，那么时间复杂度就为 O(n)。即使我们先对列表进行排序，然后使用二分查找，也会需要 O(logn) 的时间复杂度，更何况，列表的排序还需要 O(nlogn) 的时间。

但如果我们用字典来存储这些数据，那么查找就会非常便捷高效，只需 O(1) 的时间复杂度就可以完成。原因也很简单，刚刚提到过的，字典的内部组成是一张哈希表，你可以直接通过键的哈希值，找到其对应的值。

 products = {
  143121312: 100,
  432314553: 30,
  32421912367: 150
}
print('The price of product 432314553 is {}'.format(products[432314553])) 
 
# 输出
The price of product 432314553 is 30复制代码

类似的，现在需求变成，要找出这些商品有多少种不同的价格。我们还用同样的方法来比较一下。

如果还是选择使用列表，对应的代码如下，其中，A 和 B 是两层循环。同样假设原始列表有 n 个元素，那么，在最差情况下，需要 O(n^2) 的时间复杂度。

 # list version
def find_unique_price_using_list(products):
    unique_price_list = []
    for _, price in products: # A
        if price not in unique_price_list: #B
            unique_price_list.append(price)
    return len(unique_price_list)
 
products = [
    (143121312, 100), 
    (432314553, 30),
    (32421912367, 150),
    (937153201, 30)
]
print('number of unique price is: {}'.format(find_unique_price_using_list(products)))
 
# 输出
number of unique price is: 3复制代码

但如果我们选择使用集合这个数据结构，由于集合是高度优化的哈希表，里面元素不能重复，并且其添加和查找操作只需 O(1) 的复杂度，那么，总的时间复杂度就只有 O(n)。

 # set version
def find_unique_price_using_set(products):
    unique_price_set = set()
    for _, price in products:
        unique_price_set.add(price)
    return len(unique_price_set)        
 
products = [
    (143121312, 100), 
    (432314553, 30),
    (32421912367, 150),
    (937153201, 30)
]
print('number of unique price is: {}'.format(find_unique_price_using_set(products)))
 
# 输出
number of unique price is: 3复制代码

可能你对这些时间复杂度没有直观的认识，我可以举一个实际工作场景中的例子，让你来感受一下。

下面的代码，初始化了含有 100,000 个元素的产品，并分别计算了使用列表和集合来统计产品价格数量的运行时间：

 import time
id = [x for x in range(0, 100000)]
price = [x for x in range(200000, 300000)]
products = list(zip(id, price))
 
# 计算列表版本的时间
start_using_list = time.perf_counter()
find_unique_price_using_list(products)
end_using_list = time.perf_counter()
print("time elapse using list: {}".format(end_using_list - start_using_list))
## 输出
time elapse using list: 41.61519479751587
 
# 计算集合版本的时间
start_using_set = time.perf_counter()
find_unique_price_using_set(products)
end_using_set = time.perf_counter()
print("time elapse using set: {}".format(end_using_set - start_using_set))
# 输出
time elapse using set: 0.008238077163696289复制代码

你可以看到，仅仅十万的数据量，两者的速度差异就如此之大。事实上，大型企业的后台数据往往有上亿乃至十亿数量级，如果使用了不合适的数据结构，就很容易造成服务器的崩溃，不但影响用户体验，并且会给公司带来巨大的财产损失。

字典和集合的工作原理

我们通过举例以及与列表的对比，看到了字典和集合操作的高效性。不过，字典和集合为什么能够如此高效，特别是查找、插入和删除操作？

这当然和字典、集合内部的数据结构密不可分。不同于其他数据结构，字典和集合的内部结构都是一张哈希表。

对于字典而言，这张表存储了哈希值（hash）、键和值这 3 个元素。
而对集合来说，区别就是哈希表内没有键和值的配对，只有单一的元素了。

我们来看，老版本 Python 的哈希表结构如下所示：

 --+-------------------------------+
  | 哈希值 (hash)  键 (key)  值 (value)
--+-------------------------------+
0 |    hash0      key0    value0
--+-------------------------------+
1 |    hash1      key1    value1
--+-------------------------------+
2 |    hash2      key2    value2
--+-------------------------------+
. |           ...
__+_______________________________+
 复制代码

不难想象，随着哈希表的扩张，它会变得越来越稀疏。举个例子，比如我有这样一个字典：

 {'name': 'mike', 'dob': '1999-01-01', 'gender': 'male'}复制代码

那么它会存储为类似下面的形式：

 entries = [
['--', '--', '--']
[-230273521, 'dob', '1999-01-01'],
['--', '--', '--'],
['--', '--', '--'],
[1231236123, 'name', 'mike'],
['--', '--', '--'],
[9371539127, 'gender', 'male']
]复制代码

这样的设计结构显然非常浪费存储空间。为了提高存储空间的利用率，现在的哈希表除了字典本身的结构，会把索引和哈希值、键、值单独分开，也就是下面这样新的结构：

 Indices
----------------------------------------------------
None | index | None | None | index | None | index ...
----------------------------------------------------
 
Entries
--------------------
hash0   key0  value0
---------------------
hash1   key1  value1
---------------------
hash2   key2  value2
---------------------
        ...
---------------------复制代码

那么，刚刚的这个例子，在新的哈希表结构下的存储形式，就会变成下面这样：

 indices = [None, 1, None, None, 0, None, 2]
entries = [
[1231236123, 'name', 'mike'],
[-230273521, 'dob', '1999-01-01'],
[9371539127, 'gender', 'male']
]复制代码

我们可以很清晰地看到，空间利用率得到很大的提高。

清楚了具体的设计结构，我们接着来看这几个操作的工作原理。

插入操作

每次向字典或集合插入一个元素时，Python 会首先计算键的哈希值（hash(key)），再和 mask = PyDicMinSize - 1 做与操作，计算这个元素应该插入哈希表的位置 index = hash(key) & mask。如果哈希表中此位置是空的，那么这个元素就会被插入其中。

而如果此位置已被占用，Python 便会比较两个元素的哈希值和键是否相等。

若两者都相等，则表明这个元素已经存在，如果值不同，则更新值。
若两者中有一个不相等，这种情况我们通常称为哈希冲突（hash collision），意思是两个元素的键不相等，但是哈希值相等。这种情况下，Python 便会继续寻找表中空余的位置，直到找到位置为止。

值得一提的是，通常来说，遇到这种情况，最简单的方式是线性寻找，即从这个位置开始，挨个往后寻找空位。当然，Python 内部对此进行了优化（这一点无需深入了解，你有兴趣可以查看源码，我就不再赘述），让这个步骤更加高效。

查找操作

和前面的插入操作类似，Python 会根据哈希值，找到其应该处于的位置；然后，比较哈希表这个位置中元素的哈希值和键，与需要查找的元素是否相等。如果相等，则直接返回；如果不等，则继续查找，直到找到空位或者抛出异常为止。

删除操作

对于删除操作，Python 会暂时对这个位置的元素，赋于一个特殊的值，等到重新调整哈希表的大小时，再将其删除。

不难理解，哈希冲突的发生，往往会降低字典和集合操作的速度。因此，为了保证其高效性，字典和集合内的哈希表，通常会保证其至少留有 1/3 的剩余空间。随着元素的不停插入，当剩余空间小于 1/3 时，Python 会重新获取更大的内存空间，扩充哈希表。不过，这种情况下，表内所有的元素位置都会被重新排放。

虽然哈希冲突和哈希表大小的调整，都会导致速度减缓，但是这种情况发生的次数极少。所以，平均情况下，这仍能保证插入、查找和删除的时间复杂度为 O(1)。

总结

这节课，我们一起学习了字典和集合的基本操作，并对它们的高性能和内部存储结构进行了讲解。

字典在 Python3.7+ 是有序的数据结构，而集合是无序的，其内部的哈希表存储结构，保证了其查找、插入、删除操作的高效性。所以，字典和集合通常运用在对元素的高效查找、去重等场景。

思考题

1. 下面初始化字典的方式，哪一种更高效？

 # Option A
d = {'name': 'jason', 'age': 20, 'gender': 'male'}
 
# Option B
d = dict({'name': 'jason', 'age': 20, 'gender': 'male'})复制代码

2. 字典的键可以是一个列表吗？下面这段代码中，字典的初始化是否正确呢？如果不正确，可以说出你的原因吗？

 d = {'name': 'jason', ['education']: ['Tsinghua University', 'Stanford University']}复制代码

欢迎留言和我分享，也欢迎你把这篇文章分享给你的同事、朋友。

03 | 列表和元组，到底用哪一个？

05 | 深入浅出字符串

 写留言

精选留言(79)

pyhhou

2019-05-17

 91

思考题 1：
第一种方法更快，原因感觉上是和之前一样，就是不需要去调用相关的函数，而且像老师说的那样 {} 应该是关键字，内部会去直接调用底层C写好的代码

思考题 2:
用列表作为 Key 在这里是不被允许的，因为列表是一个动态变化的数据结构，字典当中的 key 要求是不可变的，原因也很好理解，key 首先是不重复的，如果 Key 是可以变化的话，那么随着 Key 的变化，这里就有可能就会有重复的 Key，那么这就和字典的定义相违背；如果把这里的列表换成之前我们讲过的元组是可以的，因为元组不可变

展开

作者回复: 正解
燕儿衔泥

2019-05-17

 21

1.直接｛｝的方式，更高效。可以使用dis分析其字节码
2.字典的键值，需要不可变，而列表是动态的，可变的。可以改为元组

作者回复: 使用dis分析其字节码很赞
随风の

2019-05-17

 16

文中提到的新的哈希表结构有点不太明白 None 1 None None 0 None 2 是什么意思？ index是索引的话为什么中间会出现两个None

作者回复: 这只是一种表示。None表示indices这个array上对应的位置没有元素，index表示有元素，并且对应entries这个array index位置上的元素。你看那个具体的例子就能看懂了
Python高...

2019-05-17

 15

Python 3.7 以后插入有序变为字典的特性。构造新字典的方式：
1. double star
>>> d1 = {'name': 'jason', 'age': 20, 'gender': 'male'}
>>> d2 = {'hobby': 'swim', **d1}
2. update 函数：
>>> d3 = {'hobby': 'swim'}
>>> d3.update(d1)
我们可以看到这两种方式得到的字典是满足插入有序的：
>>> d3
{'hobby': 'swim', 'name': 'jason', 'age': 20, 'gender': 'male'}
在我的电脑上第一种方式的性能要好一些。
double star:
229 ns ± 22.8 ns per loop
update:
337 ns ± 49.7 ns per loop

展开
charily

2019-05-18

 6

老师，你好！有几个让我困惑的地方想跟您确认一下，问题有点多，希望不吝赐教！
1. 为了提高哈希表的空间利用率，于是使用了Indices、Entries结构分开存储（index）和（hashcode、key、value），这里的index是否就是Entries列表的下标？
2、如果问题1成立，通过hash(key) & (PyDicMinSize - 1)计算出来的是否为Indices列表的下标？
3、如果问题2成立，PyDicMinSize是什么？为什么要使用hashcode与(PyDicMinSize - 1)做与运算，相比起直接用hashcode作为Indices列表的下标会有什么好处？
4、如果问题2成立，在往字典插入新元素的时候，通过hash(key) & mask计算出Indices的下标，如果Indices对应的元素位置值为None，则是否会将其值(index)修改为len(Entries)，然后在Entries列表追加一行新的记录（hashcode，key，value）？
5、如果问题2成立，在往字典插入新元素的时候，通过hash(key) & mask计算出Indices的下标，如果Indices对应的元素位置已经有值，则会跟Entries表中对应位置的key进行hash比较及相等比较来决定是进行value的更新处理还是hash冲突处理？

展开
许山山

2019-05-17

 6

>>> dis.dis(lambda : dict())
  1 0 LOAD_GLOBAL 0 (dict)
              3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
              6 RETURN_VALUE

>>> dis.dis(lambda : {})
  1 0 BUILD_MAP 0
              3 RETURN_VALUE

>>> %timeit dict()
133 ns ± 2.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>> %timeit {}
74.6 ns ± 3.07 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

{}, [], () 都比 dict(), list(), tuple() 初始化列表的性能要好，因为后者需要函数调用消耗了更多的时间。

展开
许山山

2019-05-17

 4

老师我明白了，(hash, key, val) 都是存在 entries 里面的，通过 indices[index] 找到 entry 再做比较就好了。
farFlight

2019-05-17

 4

老师好，在王争老师的数据结构课程中提到哈希表常与链表一起使用，譬如用来解决哈希冲突。请问python底层对字典和集合的实现是否也是这样的呢？

作者回复: 这个就是文中所说的线性寻找了，但是Python底层解决哈希冲突还有更好的方法，线性寻找是最简单的，但是不是最高效的
小狼

2019-05-17

 3

s2 = Set([1, 2, 3])
# Set 大写会报错：
NameError: name 'Set' is not defined
改成小写问题解决
鱼腐

2019-05-17

 3

Indices:none | one | none | index | none | index 是什么意思？能补充讲解下吗

作者回复: 这只是一种表示。None表示indices这个array上对应的位置没有元素，index表示有元素，并且对应entries这个array index位置上的元素。你看那个具体的例子就能看懂了
Hoo-Ah

2019-05-17

 3

1. 直接使用大括号更高效，避免了使用类生成实例其他不必要的操作；
2. 列表不可以作为key，因为列表是可变类型，可变类型不可hash。
问题：为什么在旧哈希表中元素会越来越稀？

展开

作者回复: 你比较一下旧哈希表和新哈希表的存储结构就会发现，旧哈希表的空间利用率很低，一个位置要同时分配哈希值，键和值的空间，但是新哈希表把indices和entries分开后，空间利用率大大提高。

看文中的例子，这是旧哈希表存储示意图
entries = [
['--', '--', '--']
[-230273521, 'dob', '1999-01-01'],
['--', '--', '--'],
['--', '--', '--'],
[1231236123, 'name', 'mike'],
['--', '--', '--'],
[9371539127, 'gender', 'male']
]
VS
新哈希表存储示意图：
indices = [None, 1, None, None, 0, None, 2]
entries = [
[1231236123, 'name', 'mike'],
[-230273521, 'dob', '1999-01-01'],
[9371539127, 'gender', 'male']
]

你数一下两个版本中空着的元素的个数，就很清晰了。
Danpier

2019-05-18

 2

老师，对于集合插值有个疑问：
s={1, 2.0}
s.add(1.0)
s
# 输出
{1, 2.0}
上面这个示例，是不是因为1和1.0的哈希值相同，所以判定已存在元素，不插入。
关于商品有多少种不同价格的时间复杂度也有疑问：
第一个例子的时间复杂度是O(n^2)，但我的理解是，最差的情况是每种商品价格不同。A层循环遍历时间复杂度是O(n)，B层的时间复杂度由添加和遍历组成，添加的总时间复杂度是O(n)，遍历时间复杂度是递增的：O(0+1+……+n-1),B层总的时间复杂度是O((n^2+n)/2)，那list的时间复杂度不就是A+B：O((n^2+3n)/2)吗？怎么想都和两层for循环嵌套的时间复杂度不一样。
第二个例子的时间复杂度是O(n)，而我的理解是A层循环遍历时间复杂度是O(n)，B层的添加查找的总时间复杂度是O(n)，那set的时间复杂度不应该是A+B：O(2n)吗？
不知道哪里理解出了问题，望老师指点。

展开
趁早

2019-05-18

 2

最后的例子很有代表性，举例很好

展开
刘朋

2019-05-17

 2

插入操作,
mask = PyDicMinSize -1
index = hash(key) & mask

能否有个例子,想详细了解一下细节

展开
山石尹口

2019-06-02

 1

list做key的问题有点疑惑，即使list是可变的，它在内存中的地址是不变的，对地址是可以hash的吧？不然所有引用类型都没法做key了。当然，用引用类型做key不一定是好的做法，可能会带来混淆。

展开
美美

2019-05-26

 1

1.文中说字典是无序的不对吧？Python3.6 后字典是有序的
2.还有示例代码用的是 format 方法，而 Python3.6 后的 f-string比原来的format方法易读多了

展开
William

2019-05-24

 1

老师请问，key、hash值、indice三者的联系是啥？一直以为hash(key)就是内存地址
taoist

2019-05-18

 1

思考题:1
Option A: python3 -m timeit -n 1000000 "d = {'name': 'jason', 'age': 20, 'gender': 'male'}"
1000000 loops, best of 5: 76.2 nsec per loop
Option B: python3 -m timeit -n 1000000 "d = dict({'name': 'jason', 'age': 20, 'gender': 'male'})"
1000000 loops, best of 5: 248 nsec per loop
环境: Linux Python3.7.3 , Option A 比 Option B 快了３倍

思考题２:不可以，列表是动态变化的，字典的key要求不可变

Python3.6 引入字典有序性，Python3.7后字典有序性已经可以依赖了。https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6

展开
许山山

2019-05-17

 1

每次向字典或集合插入一个元素时，Python 会首先计算键的哈希值（hash(key)），再和 mask = PyDicMinSize - 1 做与操作，计算这个元素应该插入哈希表的位置 index = hash(key) & mask。如果哈希表中此位置是空的，那么这个元素就会被插入其中。

而如果此位置已被占用，Python 便会比较两个元素的哈希值和键是否相等。

老师这里比较两个元素是比较哪两个元素？我理解的应该是比较现在插入的这个 key 和原本已经在表中的 key 是否相同，请问这个 key 的比较是怎么做的呢？index 是哈希表（Indices）的索引，里面存储了指向 value的指针？根据实现字典里面 key 和 value 应该是有映射的，那 key 的信息是存在哪里呢？

展开
天凉好个秋

2019-05-17

 1

不难想象，随着哈希表的扩张，它会变得越来越稀疏。
后面例子中解释的原因没看懂，能详细说说吗？

作者回复: 哈希表为了保证其操作的有效性（查找，添加，删除等等），都会overallocate（保留至少1/3的剩余空间），但是很多空间其实都没有被利用，因此很稀疏

	d1 = {'name': 'jason', 'age': 20, 'gender': 'male'}
	d2 = dict({'name': 'jason', 'age': 20, 'gender': 'male'})
	d3 = dict([('name', 'jason'), ('age', 20), ('gender', 'male')])
	d4 = dict(name='jason', age=20, gender='male')
	d1 == d2 == d3 ==d4
	True

	s1 = {1, 2, 3}
	s2 = set([1, 2, 3])
	s1 == s2
	True

	d = {'name': 'jason', 'age': 20}
	d['name']
	'jason'
	d['location']
	Traceback (most recent call last):
	File "<stdin>", line 1, in <module>
	KeyError: 'location'

	d = {'name': 'jason', 'age': 20}
	d.get('name')
	'jason'
	d.get('location', 'null')
	'null'

	s = {1, 2, 3}
	s[0]
	Traceback (most recent call last):
	File "<stdin>", line 1, in <module>
	TypeError: 'set' object does not support indexing

	s = {1, 2, 3}
	1 in s
	True
	10 in s
	False

	d = {'name': 'jason', 'age': 20}
	'name' in d
	True
	'location' in d
	False

	d = {'name': 'jason', 'age': 20}
	d['gender'] = 'male' # 增加元素对'gender': 'male'
	d['dob'] = '1999-02-01' # 增加元素对'dob': '1999-02-01'
	d
	{'name': 'jason', 'age': 20, 'gender': 'male', 'dob': '1999-02-01'}
	d['dob'] = '1998-01-01' # 更新键'dob'对应的值
	d.pop('dob') # 删除键为'dob'的元素对
	'1998-01-01'
	d
	{'name': 'jason', 'age': 20, 'gender': 'male'}

	s = {1, 2, 3}
	s.add(4) # 增加元素 4 到集合
	s
	{1, 2, 3, 4}
	s.remove(4) # 从集合中删除元素 4
	s
	{1, 2, 3}

	d = {'b': 1, 'a': 2, 'c': 10}
	d_sorted_by_key = sorted(d.items(), key=lambda x: x[0]) # 根据字典键的升序排序
	d_sorted_by_value = sorted(d.items(), key=lambda x: x[1]) # 根据字典值的升序排序
	d_sorted_by_key
	[('a', 2), ('b', 1), ('c', 10)]
	d_sorted_by_value
	[('b', 1), ('a', 2), ('c', 10)]

	s = {3, 4, 2, 1}
	sorted(s) # 对集合的元素进行升序排序
	[1, 2, 3, 4]

	def find_product_price(products, product_id):
	for id, price in products:
	if id == product_id:
	return price
	return None

	products = [
	(143121312, 100),
	(432314553, 30),
	(32421912367, 150)
	]

	print('The price of product 432314553 is {}'.format(find_product_price(products, 432314553)))

	# 输出
	The price of product 432314553 is 30

	products = {
	143121312: 100,
	432314553: 30,
	32421912367: 150
	}
	print('The price of product 432314553 is {}'.format(products[432314553]))

	# 输出
	The price of product 432314553 is 30

	# list version
	def find_unique_price_using_list(products):
	unique_price_list = []
	for _, price in products: # A
	if price not in unique_price_list: #B
	unique_price_list.append(price)
	return len(unique_price_list)

	products = [
	(143121312, 100),
	(432314553, 30),
	(32421912367, 150),
	(937153201, 30)
	]
	print('number of unique price is: {}'.format(find_unique_price_using_list(products)))

	# 输出
	number of unique price is: 3

	# set version
	def find_unique_price_using_set(products):
	unique_price_set = set()
	for _, price in products:
	unique_price_set.add(price)
	return len(unique_price_set)

	products = [
	(143121312, 100),
	(432314553, 30),
	(32421912367, 150),
	(937153201, 30)
	]
	print('number of unique price is: {}'.format(find_unique_price_using_set(products)))

	# 输出
	number of unique price is: 3

	import time
	id = [x for x in range(0, 100000)]
	price = [x for x in range(200000, 300000)]
	products = list(zip(id, price))

	# 计算列表版本的时间
	start_using_list = time.perf_counter()
	find_unique_price_using_list(products)
	end_using_list = time.perf_counter()
	print("time elapse using list: {}".format(end_using_list - start_using_list))
	## 输出
	time elapse using list: 41.61519479751587

	# 计算集合版本的时间
	start_using_set = time.perf_counter()
	find_unique_price_using_set(products)
	end_using_set = time.perf_counter()
	print("time elapse using set: {}".format(end_using_set - start_using_set))
	# 输出
	time elapse using set: 0.008238077163696289

	--+-------------------------------+
	\| 哈希值 (hash) 键 (key) 值 (value)
	--+-------------------------------+
	0 \| hash0 key0 value0
	--+-------------------------------+
	1 \| hash1 key1 value1
	--+-------------------------------+
	2 \| hash2 key2 value2
	--+-------------------------------+
	. \| ...
	__+_______________________________+

	entries = [
	['--', '--', '--']
	[-230273521, 'dob', '1999-01-01'],
	['--', '--', '--'],
	['--', '--', '--'],
	[1231236123, 'name', 'mike'],
	['--', '--', '--'],
	[9371539127, 'gender', 'male']
	]

	Indices
	----------------------------------------------------
	None \| index \| None \| None \| index \| None \| index ...
	----------------------------------------------------

	Entries
	--------------------
	hash0 key0 value0
	---------------------
	hash1 key1 value1
	---------------------
	hash2 key2 value2
	---------------------
	...
	---------------------

	indices = [None, 1, None, None, 0, None, 2]
	entries = [
	[1231236123, 'name', 'mike'],
	[-230273521, 'dob', '1999-01-01'],
	[9371539127, 'gender', 'male']
	]

	# Option A
	d = {'name': 'jason', 'age': 20, 'gender': 'male'}

	# Option B
	d = dict({'name': 'jason', 'age': 20, 'gender': 'male'})