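Source Code Analysis « Jiangye blog

TCP Three-Way Handshake and Four-Way Teardown Explained

2025-7-7 Comments (0) Category: Artificial Intelligence Tags: Sharing, Source Code Analysis, Programming

Preface

Today we use the Wireshark packet capture tool to analyze a TCP exchange in detail. Working from real captured packets, we walk through the three-way handshake, the data transfer, and the connection teardown.

Wireshark filter configuration

Use the following display filter to show the packets that belong to the TCP connection: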

(tcp.flags.syn == 1 || tcp.flags.fin == 1 || tcp.flags.reset == 1 || tcp.flags.ack == 1 || tcp.len > 0 || tcp.stream eq 0) && tcp.port == 8090

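Filter breakdown:

tcp.flags.syn == 1: matches SYN packets
tcp.flags.fin == 1: matches FIN packets
tcp.flags.reset == 1: matches RST packets
tcp.flags.ack == 1: matches ACK packets
tcp.len > 0: matches packets that carry data
tcp.stream eq 0: matches the first TCP stream in the capture
tcp.port == 8090: restricts everything to port 8090

Full packet log

The complete TCP exchange as captured: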

104  16.836236  127.0.0.1  →  127.0.0.1  TCP  68  65297 → 8090 [SYN] Seq=0 Win=65535 Len=0
105  16.836315  127.0.0.1  →  127.0.0.1  TCP  68  8090 → 65297 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0
106  16.836324  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [ACK] Seq=1 Ack=1 Win=408256 Len=0
107  16.836330  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=1 Ack=1 Win=408256 Len=0
108  16.836772  127.0.0.1  →  127.0.0.1  TCP  61  65297 → 8090 [PSH, ACK] Seq=1 Ack=1 Win=408256 Len=5
109  16.836892  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=1 Ack=6 Win=408256 Len=0
110  16.836992  127.0.0.1  →  127.0.0.1  TCP  62  8090 → 65297 [PSH, ACK] Seq=1 Ack=6 Win=408256 Len=6
111  16.837015  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [ACK] Seq=6 Ack=7 Win=408256 Len=0
112  16.837073  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [FIN, ACK] Seq=6 Ack=7 Win=408256 Len=0
113  16.837093  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=7 Ack=7 Win=408256 Len=0
114  16.837147  127.0.0.1  →  127.0.0.1  TCP  62  8090 → 65297 [PSH, ACK] Seq=7 Ack=7 Win=408256 Len=6
115  16.837181  127.0.0.1  →  127.0.0.1  TCP  44  65297 → 8090 [RST] Seq=7 Win=0 Len=0

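1. The TCP three-way handshake in detail

1.1 First handshake (SYN)

104  16.836236  127.0.0.1  →  127.0.0.1  TCP  68  65297 → 8090 [SYN] Seq=0 Win=65535 Len=0

Packet 104 analysis:

Direction: 127.0.0.1:65297 → 127.0.0.1:8090
Flags: [SYN]
Role: the client sends a SYN to request a connection
Sequence number: Seq=0, the initial sequence number (Wireshark shows it relative to the real initial value)
Window size: Win=65535, the client's receive window
MSS: 16344, the maximum segment size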

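1.2 Second handshake (SYN+ACK)

105  16.836315  127.0.0.1  →  127.0.0.1  TCP  68  8090 → 65297 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0

Packet 105 analysis:

Direction: 127.0.0.1:8090 → 127.0.0.1:65297
Flags: [SYN, ACK]
Role: the server answers with SYN+ACK
Sequence numbers: Seq=0 Ack=1, acknowledging the client's SYN (the SYN consumes one sequence number)
Window size: Win=65535, the server's receive window

1.3 Third handshake (ACK)

106  16.836324  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [ACK] Seq=1 Ack=1 Win=408256 Len=0

Packet 106 analysis:

Direction: 127.0.0.1:65297 → 127.0.0.1:8090
Flags: [ACK]
Role: the client sends an ACK to confirm the connection
Sequence numbers: Seq=1 Ack=1, acknowledging the server's SYN+ACK
State: the TCP connection is now established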

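2. Connection maintenance phase

2.1 Window update (packet 107)

107  16.836330  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=1 Ack=1 Win=408256 Len=0

Packet 107 analysis:

Direction: server → client
Flags: [ACK]
Role: TCP window update
Window change: the advertised window goes from 65535 bytes in the SYN+ACK to 408256 bytes. The window field in SYN segments is never scaled, so the jump reflects window scaling taking effect once the handshake is done (the client's packet 106 already shows the same value)
Meaning: the client may keep more data in flight, which improves throughput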

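2.2 Client data transfer (packet 108)

108  16.836772  127.0.0.1  →  127.0.0.1  TCP  61  65297 → 8090 [PSH, ACK] Seq=1 Ack=1 Win=408256 Len=5

Packet 108 analysis:

Direction: client → server
Flags: [PSH, ACK]
Data length: Len=5
PSH flag: asks the receiver to "push this data to the application right away"
Payload: decoding the hex bytes shows that the client sent "study"

Decoding the payload:

Hex: 73 74 75 64 79
ASCII conversion:
73 = 's'
74 = 't'
75 = 'u'
64 = 'd'
79 = 'y'
Result: "study"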
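The manual conversion above can be checked with a couple of lines of Python. This is only a small sketch; the variable names are illustrative and the hex string is copied from the table above.

```python
# Decode the 5 payload bytes captured in packet 108.
payload = bytes.fromhex("73 74 75 64 79")

# Each byte is below 0x80, so plain ASCII is enough here.
text = payload.decode("ascii")

for byte, char in zip(payload, text):
    print(f"{byte:02x} = {char!r}")
print(text)
```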

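2.3 Server acknowledges the data (packet 109)

109  16.836892  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=1 Ack=6 Win=408256 Len=0

Packet 109 analysis:

Direction: server → client
Flags: [ACK]
Role: acknowledges receipt of the data
Sequence change: Ack=6 means bytes 1-5 have arrived and the next byte expected from the client has sequence number 6
Window: unchanged at Win=408256 bytes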

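2.4 Server response data (packet 110)

110  16.836992  127.0.0.1  →  127.0.0.1  TCP  62  8090 → 65297 [PSH, ACK] Seq=1 Ack=6 Win=408256 Len=6

Packet 110 analysis:

Direction: server → client
Flags: [PSH, ACK]
Data length: Len=6
Role: the server sends its response data

Decoding the payload:

Raw hexadecimal data:

02 00 00 00 45 00 00 3a 00 00 40 00 40 06 00 00
7f 00 00 01 7f 00 00 01 1f 9a ff 11 bd 77 4d c9
a8 0a cc 93 80 18 18 eb fe 2e 00 00 01 01 08 0a
27 84 1a 14 b2 e5 29 71 e4 bd a0 e5 a5 bd

Packet structure

Loopback header (first 4 bytes):

02 00 00 00: not an Ethernet header; on a loopback capture this is the 4-byte null/loopback link-layer header, and the value 2 is the address family (AF_INET, i.e. IPv4)

IP header (next 20 bytes):

45: IP version 4, header length 5×4=20 bytes
00: type of service
00 3a: total length, 58 bytes
00 00: identification
40 00: flags and fragment offset (Don't Fragment is set)
40 06: TTL=64, protocol=6 (TCP)
00 00: header checksum (zero in this capture)
7f 00 00 01: source IP 127.0.0.1
7f 00 00 01: destination IP 127.0.0.1

TCP header:

1f 9a: source port 8090
ff 11: destination port 65297
bd 77 4d c9: sequence number (raw value; Wireshark displays the relative Seq=1)
a8 0a cc 93: acknowledgment number (raw value; displayed as Ack=6)
80 18: data offset 8×4=32 bytes, flags 0x18 (PSH+ACK)
18 eb: window field, 6379; multiplied by a scale factor of 64 it gives the displayed Win=408256
fe 2e: checksum
00 00: urgent pointer

TCP options (12 bytes):

01 01: two NOP padding bytes
08 0a 27 84 1a 14 b2 e5 29 71: TCP timestamp option (kind 8, length 10)

Application data, the part that matters (the actual payload):

e4 bd a0 e5 a5 bd

Converted to UTF-8 characters:

e4 bd a0 = 你
e5 a5 bd = 好

Conclusion: the Chinese text carried by this packet is "你好" ("hello")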
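The field-by-field reading above can be reproduced with Python's struct module. The sketch below is not how Wireshark dissects the frame; it assumes the capture uses the 4-byte loopback header with the address family stored little-endian, and the function and key names are made up for this illustration.

```python
import socket
import struct

# Frame bytes of packet 110, copied from the hex dump above.
FRAME_HEX = """
02 00 00 00 45 00 00 3a 00 00 40 00 40 06 00 00
7f 00 00 01 7f 00 00 01 1f 9a ff 11 bd 77 4d c9
a8 0a cc 93 80 18 18 eb fe 2e 00 00 01 01 08 0a
27 84 1a 14 b2 e5 29 71 e4 bd a0 e5 a5 bd
"""


def parse_loopback_frame(frame: bytes) -> dict:
    """Split a loopback frame into the IP/TCP fields discussed above."""
    # 4-byte loopback header: address family (assumed little-endian).
    (family,) = struct.unpack_from("<I", frame, 0)
    ip = frame[4:]
    ip_header_len = (ip[0] & 0x0F) * 4
    (total_len,) = struct.unpack_from("!H", ip, 2)
    tcp = ip[ip_header_len:total_len]
    # Fixed part of the TCP header up to and including the window field.
    sport, dport, seq, ack, off_flags, window = struct.unpack_from("!HHIIHH", tcp, 0)
    data_offset = (off_flags >> 12) * 4
    return {
        "family": family,
        "protocol": ip[9],
        "src": socket.inet_ntoa(ip[12:16]),
        "dst": socket.inet_ntoa(ip[16:20]),
        "sport": sport,
        "dport": dport,
        "flags": off_flags & 0x01FF,
        "window": window,
        "payload": tcp[data_offset:],
    }


frame = bytes.fromhex("".join(FRAME_HEX.split()))
fields = parse_loopback_frame(frame)
print(fields["sport"], "->", fields["dport"], hex(fields["flags"]))
print(fields["payload"].decode("utf-8"))
```

The raw window value multiplied by 64 matches the Win=408256 that Wireshark displays, which is a convenient cross-check of the parse.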

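2.5 Client acknowledges the response (packet 111)

111  16.837015  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [ACK] Seq=6 Ack=7 Win=408256 Len=0

Packet 111 analysis:

Direction: client → server
Flags: [ACK]
Role: acknowledges the server's response
Sequence numbers: Ack=7 means the 6 response bytes arrived and the next byte expected from the server has sequence number 7
Done: one full request/response exchange has been acknowledged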
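The Ack values in packets 105 to 113 all follow one rule: the acknowledgment number is the sender's Seq plus the payload length, plus one for a SYN or a FIN. A minimal sketch of that arithmetic, with a hypothetical helper name, checked against the relative numbers from the capture:

```python
def expected_ack(seq: int, length: int, syn: bool = False, fin: bool = False) -> int:
    """Ack number the peer should send after receiving this segment.

    SYN and FIN each consume one sequence number even though Len=0.
    """
    return seq + length + int(syn) + int(fin)


# (packet, Seq, Len, SYN, FIN) taken from the capture above.
segments = [
    (104, 0, 0, True, False),   # client SYN      -> packet 105 has Ack=1
    (108, 1, 5, False, False),  # "study"         -> packet 109 has Ack=6
    (110, 1, 6, False, False),  # server response -> packet 111 has Ack=7
    (112, 6, 0, False, True),   # client FIN      -> packet 113 has Ack=7
]

for number, seq, length, syn, fin in segments:
    print(f"packet {number}: peer should reply with Ack={expected_ack(seq, length, syn, fin)}")
```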

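3. TCP four-way teardown analysis

3.1 First step (FIN)

112  16.837073  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [FIN, ACK] Seq=6 Ack=7 Win=408256 Len=0

Packet 112 analysis:

Direction: client → server
Flags: [FIN, ACK]
Role: the client sends a FIN to request that the connection be closed
Sequence numbers: Seq=6 Ack=7

3.2 Second step (ACK)

113  16.837093  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=7 Ack=7 Win=408256 Len=0

Packet 113 analysis:

Direction: server → client
Flags: [ACK]
Role: the server acknowledges the FIN
Sequence numbers: Seq=7 Ack=7; Ack=7 covers the FIN, which consumes one sequence number (6 + 1)

3.3 Third step: a FIN is expected, but the server sends data

114  16.837147  127.0.0.1  →  127.0.0.1  TCP  62  8090 → 65297 [PSH, ACK] Seq=7 Ack=7 Win=408256 Len=6

Packet 114 analysis:

Direction: server → client
Flags: [PSH, ACK] (there is no FIN flag)
Role: in a normal teardown the server would send its own FIN here. Instead it sends 6 more bytes of data. This is consistent with the test server's loop: after the client's FIN, recv() returns an empty result and the loop calls send('你好') once more
Sequence numbers: Seq=7 Ack=7

3.4 Abnormal close (RST)

115  16.837181  127.0.0.1  →  127.0.0.1  TCP  44  65297 → 8090 [RST] Seq=7 Win=0 Len=0

Packet 115 analysis:

Direction: client → server
Flags: [RST]
Abnormal case: the client sends an RST instead of the usual ACK
Cause: the client has already closed its socket, so when the data in packet 114 arrives there is nothing left to receive it and the client's TCP stack answers with a reset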

4. Test code

4.1 Server code

python
import socket

socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(f'socket object: {socket_server}')
socket_server.bind(('127.0.0.1', 8090))
socket_server.listen(5)
socket_client, client_info = socket_server.accept()

while True:
    recv = socket_client.recv(1024)
    if not recv:
        # An empty result means the client has closed its side.
        # Without this check the loop keeps calling send(), which is
        # consistent with packets 114 and 115 in the capture above.
        break
    print(f'server received: {recv}')
    socket_client.send('你好'.encode())

socket_client.close()
socket_server.close()

4.2 Client code

python
import socket

socket_client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(f'socket object: {socket_client}')
socket_client.connect(('127.0.0.1', 8090))
socket_client.send('study'.encode('utf-8'))
recv = socket_client.recv(1024)
print(f'received data: {recv.decode()}')
socket_client.close()

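5. Going deeper: why are sequence numbers (Seq) needed?

5.1 Keeping data in order

The problem without sequence numbers:

Sender: [data1] [data2] [data3]
Network: segments may arrive out of order
Receiver: [data3] [data1] [data2] ❌ wrong order

The solution with sequence numbers:

Sender: [data1-Seq=100] [data2-Seq=150] [data3-Seq=200]
Network: segments may arrive out of order
Receiver: [data3-Seq=200] [data1-Seq=100] [data2-Seq=150]
After reassembly: [data1] [data2] [data3] ✓ correct order

5.2 Detecting lost data

Sender sends: Seq=100 (50 bytes) → Seq=150 (50 bytes) → Seq=200 (50 bytes)
Receiver gets: Seq=100 → Seq=200 (the segment starting at Seq=150 is missing!)
Receiver responds: it keeps sending Ack=150 ("I have everything up to 149, still waiting for 150"), so the sender knows what to retransmit

5.3 Discarding duplicate data

Duplicates caused by network delay:
Sender: sends Seq=100 → times out and resends Seq=100
Receiver: gets Seq=100 twice → drops the duplicate

5.4 Reliable delivery

Sender: sends Seq=100-149 (50 bytes)
Receiver: sends Ack=150 (confirms 100-149, expects 150 next)
Sender: knows the data arrived safely and can send the next batch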
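The four roles of the sequence number (ordering, loss detection, duplicate removal, acknowledgment) can be seen together in a toy receiver. This is a simplified sketch, not how a real TCP stack is written: it assumes segments never partially overlap, and the function name and the tuple format are invented for the example.

```python
def receive(segments, initial_seq):
    """Reassemble (seq, data) segments given in arrival order.

    Returns the in-order byte stream and the cumulative Ack sent
    after each arriving segment.
    """
    expected = initial_seq   # next byte the receiver is waiting for
    buffered = {}            # out-of-order segments keyed by their Seq
    stream = bytearray()
    acks = []
    for seq, data in segments:
        if seq + len(data) <= expected:
            # Duplicate of data already delivered: drop it, repeat the Ack.
            acks.append(expected)
            continue
        buffered.setdefault(seq, data)
        # Deliver every segment that is now contiguous with the stream.
        while expected in buffered:
            chunk = buffered.pop(expected)
            stream += chunk
            expected += len(chunk)
        acks.append(expected)
    return bytes(stream), acks


arrivals = [
    (200, b"C" * 50),  # arrives early: buffered, Ack stays at 100
    (100, b"A" * 50),  # fills the start: Ack moves to 150
    (100, b"A" * 50),  # retransmitted duplicate: dropped
    (150, b"B" * 50),  # fills the gap: Ack jumps to 250
]
data, acks = receive(arrivals, initial_seq=100)
print(acks)
```

The repeated Ack=150 is the receiver's way of saying that the segment starting at 150 is still missing.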

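6. Key observations

Loopback connection: all traffic stays on 127.0.0.1
Ports: the client uses ephemeral port 65297, the server listens on port 8090
Abnormal close: the last packet is an RST instead of the usual ACK
Window management: the advertised window changes from 65535 to 408256 once the handshake is done
Sequence tracking: the growth of the Seq and Ack values shows the order of the data exchange

The capture covers the whole life of a TCP connection, except that it ends with an RST instead of completing the normal four-way teardown. A real case like this makes the workings of TCP and its data transfer mechanism much easier to understand.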

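TCP and QUIC Interview Questions

Part 6: Common interview questions in depth

The TCP three-way handshake and four-way teardown come up constantly in network programming interviews. The questions below are answered with reference to the capture above:

Three-way handshake questions

1. Basic concepts

Q1: What is the TCP three-way handshake? Describe the process in detail.

Reference answer based on the capture:

The capture shows the complete three-way handshake:

# First handshake (SYN)
104  16.836236  127.0.0.1  →  127.0.0.1  TCP  68  65297 → 8090 [SYN] Seq=0 Win=65535 Len=0

# Second handshake (SYN+ACK)
105  16.836315  127.0.0.1  →  127.0.0.1  TCP  68  8090 → 65297 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0

# Third handshake (ACK)
106  16.836324  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [ACK] Seq=1 Ack=1 Win=408256 Len=0

Step by step:

First handshake: the client sends a SYN (Seq=0) and enters SYN_SENT
Second handshake: the server replies with SYN+ACK (Seq=0, Ack=1) and enters SYN_RCVD
Third handshake: the client sends an ACK (Seq=1, Ack=1) and enters ESTABLISHED; the server enters ESTABLISHED when that ACK arrives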

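Q2: Why are three handshakes needed? Wouldn't two be enough?

Case analysis: looking at packets 104-106:

Packet 104: the client shows it can send
Packet 105: the server shows it can receive and send
Packet 106: the client shows it can receive

With only two handshakes, the server could not confirm that the client received its SYN+ACK, or that the request is current rather than stale, which can lead to:

An old connection request arriving late after being delayed in the network
The server mistaking it for a new connection request
Wasted resources and confused connection state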

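Q3: What do the SYN, ACK, seq and ack fields mean in the three-way handshake?

Explained with the capture:

From packet 104:

SYN=1: the synchronize flag, a request to open a connection
Seq=0: the client's initial sequence number
ACK=0: nothing to acknowledge yet

From packet 105:

SYN=1, ACK=1: both the synchronize and acknowledge flags are set
Seq=0: the server's initial sequence number
Ack=1: the client's sequence number plus 1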

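2. Deeper understanding

Q4: What happens if the third handshake fails?

Based on the capture: if packet 106 were lost:

The client considers the connection established (it has sent its ACK)
The server stays in SYN_RCVD (it never received the ACK) and retransmits its SYN+ACK
If the client sends data, that segment also carries the acknowledgment, so it can complete the handshake on the server side
If the server gives up after its SYN+ACK retries and discards the half-open connection, later data from the client is answered with an RST, similar to the RST in packet 115 of our capture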

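Q5: What is a SYN flood attack, and how can it be mitigated?

How the attack works: a malicious client sends large numbers of SYN packets (like packet 104) but never completes the third handshake, so that on the server:

The half-open connection queue fills up
Legitimate connection requests can no longer be handled

Mitigations:

bash

# Enlarge the half-open (SYN) queue
net.ipv4.tcp_max_syn_backlog = 2048

# Enable SYN cookies
net.ipv4.tcp_syncookies = 1

# Reduce the number of SYN+ACK retransmissions
net.ipv4.tcp_synack_retries = 2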

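3. State transitions

Q6: Describe how the client and server states change while a TCP connection is being established.

Based on the capture timeline:

Timeline: 16.836236 → 16.836315 → 16.836324

Client state changes:
CLOSED → SYN_SENT (sends packet 104) → ESTABLISHED (sends packet 106)

Server state changes:
LISTEN → SYN_RCVD (sends packet 105) → ESTABLISHED (receives packet 106)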

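Four-way teardown questions

1. Basic concepts

Q7: What is the TCP four-way teardown? Describe the process in detail.

Based on the capture:

# First step (FIN)
112  16.837073  127.0.0.1  →  127.0.0.1  TCP  56  65297 → 8090 [FIN, ACK] Seq=6 Ack=7 Win=408256 Len=0

# Second step (ACK)
113  16.837093  127.0.0.1  →  127.0.0.1  TCP  56  8090 → 65297 [ACK] Seq=7 Ack=7 Win=408256 Len=0

# Third step (normally a FIN): packet 114 actually carries data
114  16.837147  127.0.0.1  →  127.0.0.1  TCP  62  8090 → 65297 [PSH, ACK] Seq=7 Ack=7 Win=408256 Len=6

# Fourth step goes wrong (RST)
115  16.837181  127.0.0.1  →  127.0.0.1  TCP  44  65297 → 8090 [RST] Seq=7 Win=0 Len=0

Note: our capture shows an abnormal case. The server never sends its FIN, and the exchange ends with an RST instead of the usual final ACK.

Q8: Why does opening a connection take three steps while closing it takes four?

The key difference:

Opening: SYN and ACK can be combined in a single packet (packet 105)
Closing: FIN and ACK usually have to be sent separately, because:
- the received FIN should be acknowledged right away (packet 113)
- the receiver may still have data to send (packet 114 carries 6 bytes)
- it can only send its own FIN once all of its data has been sent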

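2. TIME_WAIT questions

Q9: What is the TIME_WAIT state, and why is it needed?

The abnormal case in our capture: packet 115 shows the connection being torn down by an RST. In a normal close:

The client enters TIME_WAIT after sending the final ACK
It stays there for 2MSL (twice the Maximum Segment Lifetime)
This makes sure the peer receives the final ACK

What TIME_WAIT is for:

Making sure the final ACK gets through: if the ACK is lost, the peer resends its FIN
Keeping packets from the old connection out of a new one: it waits for old packets to disappear from the network

Q10: What problems do too many TIME_WAIT connections cause, and how can they be solved?

Problems:

Each TIME_WAIT connection keeps holding its local port (65297 in our example)
A large number of them can exhaust the ephemeral port range
They also consume memory

Solutions:

bash

# Allow TIME_WAIT sockets to be reused for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Shorten the FIN_WAIT_2 timeout (this setting does not change how long TIME_WAIT lasts)
net.ipv4.tcp_fin_timeout = 30

# In application code, set SO_REUSEADDR on the socket object before bind(), e.g. in Python:
# server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)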
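For the SO_REUSEADDR option, note that setsockopt is a method of a socket object and has to be called before bind(). A minimal sketch, assuming a throwaway listener on the loopback interface; the function name is illustrative, and port 0 asks the OS for any free port, whereas the test server above would pass 8090.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket that may rebind an address in TIME_WAIT."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); otherwise a restart shortly after a close
    # can fail with "Address already in use".
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    return server


listener = make_listener()
print("listening on", listener.getsockname())
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
listener.close()
```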

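3. Abnormal cases

Q11: What is an RST packet, and when is one sent?

Based on the RST in packet 115:

115  16.837181  127.0.0.1  →  127.0.0.1  TCP  44  65297 → 8090 [RST] Seq=7 Win=0 Len=0

Characteristics of an RST:

Win=0: the window size is 0
The connection is closed immediately, without going through TIME_WAIT
It signals that the connection ended abnormally

Common reasons for an RST:

Port not listening: a connection attempt to a closed port
Connection failure: the application crashed
Protocol error: an unexpected packet was received
Forced close: the application aborts the connection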

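IV. TCP vs QUIC comparison questions

Basic comparison

Q11: How does TCP's three-way handshake differ from the QUIC handshake?

TCP three-way handshake:

Takes up to 3 RTTs before data can flow once the TLS handshake is included
Order: TCP handshake → TLS handshake → data transfer

QUIC handshake:

First connection: 1-RTT
Repeat connection: 0-RTT
Integration: the transport and security handshakes are combined

Q12: Why can QUIC establish a connection in 0-RTT?

How QUIC 0-RTT works:

Session resumption: reuses cryptographic parameters from an earlier connection
Pre-shared key: uses a PSK (Pre-Shared Key)
Risk control: restricts what kind of data may be sent as 0-RTT, to limit replay attacks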

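Performance comparison

Q13: What advantage does QUIC have over TCP+TLS in connection setup speed?

TCP + TLS handshake time:

TCP three-way handshake: 1 RTT
TLS handshake: 1-2 RTT
Total: 2-3 RTT

QUIC handshake time:

First connection: 1 RTT
Repeat connection: 0 RTT
Saving: 1-3 RTT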
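To make the round-trip counts concrete, the short sketch below turns them into setup delay for a given RTT. The RTT counts come from the comparison above; the 100 ms figure and all names are assumptions chosen for illustration.

```python
# Handshake round trips taken from the comparison above.
HANDSHAKE_RTTS = {
    "TCP + TLS (2-RTT TLS handshake)": 1 + 2,
    "TCP + TLS (1-RTT TLS handshake)": 1 + 1,
    "QUIC, first connection": 1,
    "QUIC, repeat connection (0-RTT)": 0,
}


def setup_delay_ms(rtt_ms: float) -> dict:
    """Time spent on handshakes before the first request can be sent."""
    return {name: rtts * rtt_ms for name, rtts in HANDSHAKE_RTTS.items()}


# Assumed example: a 100 ms round trip, e.g. a long-distance link.
delays = setup_delay_ms(100)
for name, delay in delays.items():
    print(f"{name}: {delay} ms")
```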

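Practical performance gains:

Lower latency: the benefit is most visible on high-latency networks
Mobile networks: connections recover faster when the network changes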

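Connection management comparison

Q14: How does QUIC's connection close differ from TCP's four-way teardown?

TCP four-way teardown:

Normally takes 4 packets
Has a TIME_WAIT state
May end abnormally with an RST (as in our example)

QUIC connection close:

CONNECTION_CLOSE frame: graceful close
No TIME_WAIT as such: after CONNECTION_CLOSE an endpoint keeps only minimal state for a short closing/draining period
Connection migration: connections are identified by connection IDs, which may change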

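Multiplexing comparison

Q15: What is TCP's head-of-line blocking problem?

TCP head-of-line blocking:

Send order: [packet 1] [packet 2] [packet 3]
Network: packet 1 is lost, packets 2 and 3 arrive
TCP behavior: it must wait for packet 1 to be retransmitted; packets 2 and 3 are held back

QUIC's multiplexing solution:

Stream 1: [packet 1 lost] - affects only Stream 1
Stream 2: [packet 2 arrives] - processed immediately
Stream 3: [packet 3 arrives] - processed immediately

Note: TCP applications can multiplex too (by opening several TCP connections), but the real reason QUIC avoids head-of-line blocking is that its streams are independent of each other, not multiplexing as such.

TCP head-of-line blocking happens in the transport layer:

TCP guarantees in-order delivery of a single byte stream
Even if the application has several logical streams, they all share the byte stream of one TCP connection
When a packet is lost, TCP must wait for the retransmission and reassemble in order, which blocks the whole connection

Data stream inside a TCP connection:
[Stream1 data][Stream2 data][lost packet][Stream3 data]
         ↓
Because of the lost packet, the Stream3 data cannot be read by the application

QUIC's solution:

QUIC, a transport protocol that runs over UDP and is usually implemented in user space, provides real stream independence:

Independent stream reassembly: each QUIC stream has its own receive buffer and reassembly logic
Per-stream flow control: each stream is flow-controlled separately
Loss recovery per packet: acknowledgments refer to packet numbers, and only the stream data carried in a lost packet is retransmitted, so other streams are not affected

QUIC multi-stream transfer:
Stream 1: [packet 1 lost] - affects only Stream 1, which waits for the retransmission
Stream 2: [packet 2 arrives] - processed and delivered to the application immediately
Stream 3: [packet 3 arrives] - processed and delivered to the application immediately

Why multiplexing over TCP does not solve the problem:

The multiple TCP connections used with HTTP/1.1 do ease the problem, but:

High resource cost: every connection has its own congestion control and slow start
Complex connection management: several TCP state machines have to be maintained
Local blocking remains: requests on the same connection still block each other

HTTP/2 multiplexes over a single TCP connection, but head-of-line blocking at the TCP layer is still there.

So QUIC's core advantage is real stream-level independence inside a single connection, an architectural problem that cannot be solved at the TCP layer.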
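The difference between one ordered byte stream and independent streams can be simulated in a few lines. The sketch below is a toy model rather than either protocol: each packet carries one chunk of one stream, packet 0 is lost and arrives last as a retransmission, and every name in it is invented for the example.

```python
from collections import namedtuple

# conn_seq: position in the connection-wide order; stream_seq: position within its stream.
Packet = namedtuple("Packet", "conn_seq stream stream_seq")


def delivery_times(arrivals, lane_of):
    """Return {packet: arrival step at which the application can read it}.

    lane_of(packet) -> (lane, index); packets in the same lane must be
    delivered in index order starting from 0.
    """
    next_index, waiting, delivered_at = {}, {}, {}
    for step, packet in enumerate(arrivals):
        lane, index = lane_of(packet)
        waiting[(lane, index)] = packet
        nxt = next_index.get(lane, 0)
        while (lane, nxt) in waiting:
            delivered_at[waiting.pop((lane, nxt))] = step
            nxt += 1
        next_index[lane] = nxt
    return delivered_at


p0, p1, p2 = Packet(0, 1, 0), Packet(1, 2, 0), Packet(2, 3, 0)
arrivals = [p1, p2, p0]  # p0 was lost and only arrives after retransmission

# TCP-like: a single lane ordered by the connection-wide sequence.
tcp_like = delivery_times(arrivals, lambda p: ("conn", p.conn_seq))
# QUIC-like: one lane per stream, ordered only within that stream.
quic_like = delivery_times(arrivals, lambda p: (p.stream, p.stream_seq))

print("TCP-like :", [tcp_like[p] for p in (p0, p1, p2)])
print("QUIC-like:", [quic_like[p] for p in (p0, p1, p2)])
```

In the TCP-like lane everything is readable only at the last step, while in the per-stream model streams 2 and 3 are readable as soon as their packets arrive.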

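V. Real-world scenarios

Deployment considerations

Q16: In which scenarios does QUIC have an advantage over TCP?

Where QUIC fits:

Mobile networks: frequent network changes
High-latency networks: satellite links, intercontinental connections
Real-time applications: online games, live video
Web applications: HTTP/3, especially pages that load many resources

Where TCP fits:

Corporate intranets: stable network conditions
Bulk transfer: large file transfers
Legacy applications: the existing TCP application ecosystem

Protocol evolution

Q17: Why did HTTP/3 choose QUIC instead of improving TCP?

Limits on improving TCP:

Kernel implementation: TCP lives in the kernel and is hard to change
Middleboxes: compatibility problems with firewalls and NAT devices
Backward compatibility: existing TCP applications must not break

QUIC's advantages:

User-space implementation: easy to update and tune
Built on UDP: sidesteps the TCP-specific restrictions of middleboxes
Fast iteration: new features can be deployed quickly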

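Summary and reflections

This Wireshark capture gave us a close look at the core mechanisms of TCP:

Key findings

Three-way handshake: makes sure both sides can send and receive
Data transfer: sequence numbers provide reliability
Four-way teardown: closes the connection gracefully (in this example it ended with an abnormal RST)
Sequence numbers: the foundation of reliable delivery in TCP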

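A Detailed Source Walkthrough of Python's Memory Structure Model

2025-6-28 Comments (0) Category: Artificial Intelligence Tags: Python, Source Code Analysis

1. Core object model (Include/object.h)

1.1 The base object structure PyObject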

// Include/object.h
typedef struct _object {
    _PyObject_HEAD_EXTRA    // extra fields in debug builds
    Py_ssize_t ob_refcnt;   // reference count
    PyTypeObject *ob_type;  // pointer to the type object
} PyObject;

// PyVarObject is the base structure for variable-size objects
// (lists, tuples, bytes and so on). It extends PyObject with
// ob_size, the number of items the object holds.
typedef struct {
    PyObject ob_base;       // base object header
    Py_ssize_t ob_size;     // number of items
} PyVarObject;

Source location: Include/object.h:106-120

Memory layout:

PyObject:
┌─────────────┬──────────────┬─────────────┐
│ ob_refcnt   │   ob_type    │   data...   │
│   8 bytes   │   8 bytes    │  variable   │
└─────────────┴──────────────┴─────────────┘

1.2 The type object PyTypeObject

// Include/cpython/object.h
typedef struct _typeobject {
    PyVarObject ob_base;
    const char *tp_name;             // type name
    Py_ssize_t tp_basicsize;         // size of the fixed part
    Py_ssize_t tp_itemsize;          // size of each item

    // destructor and vectorcall offset (the slot formerly used by tp_print)
    destructor tp_dealloc;
    Py_ssize_t tp_vectorcall_offset;

    // standard methods
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    PyAsyncMethods *tp_as_async;
    reprfunc tp_repr;

    // number, sequence and mapping method suites
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    // more fields...
} PyTypeObject;

2. Integer object memory model (Objects/longobject.c)

2.1 The long integer structure

// Include/cpython/longintrepr.h
struct _longobject {
    PyVarObject ob_base;
    digit ob_digit[1];      // array of digits
};

typedef struct _longobject PyLongObject;

// a digit is a 30-bit or 15-bit unsigned integer
#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit;
typedef unsigned long twodigits;
#endif

Small integer cache:

// Objects/longobject.c
#define IS_SMALL_INT(ival) (-_PY_NSMALLNEGINTS <= (ival) && (ival) < _PY_NSMALLPOSINTS)
#define IS_SMALL_UINT(ival) ((ival) < _PY_NSMALLPOSINTS)

// Objects/longobject.c
#define NSMALLPOSINTS           257
#define NSMALLNEGINTS           5

// pool of small integer objects covering [-5, 256]
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

PyObject *
PyLong_FromLong(long ival)
{
    // small values are served from the cache
    if (IS_SMALL_INT(ival)) {
        return get_small_int((sdigit)ival);
    }
    // otherwise a new long object is created
    // (simplified here; the real function fills in the digits itself)
    return PyLong_FromLongLong(ival);
}
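The effect of the small integer pool is visible from Python through object identity. The sketch below relies on a CPython implementation detail (the cached range [-5, 256] shown above), not on a language guarantee; int("...") is used so that each value is built at run time rather than shared as a compile-time constant.

```python
# Values inside the cached range come back as the same object every time.
a, b = int("256"), int("256")
low_a, low_b = int("-5"), int("-5")

# Values just outside the range are separate objects with equal value.
c, d = int("257"), int("257")
neg_c, neg_d = int("-6"), int("-6")

print(a is b, low_a is low_b)   # identity inside the cache
print(c is d, neg_c is neg_d)   # identity outside the cache
print(c == d)                   # equality still holds
```

This is also why integers should be compared with == rather than is.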

2.2 Memory layout examples

Small integer (42):
┌───────────┬──────────┬────────┬──────────┐
│ ob_refcnt │ ob_type  │ob_size │ ob_digit │
│     ?     │  &Long   │   1    │    42    │
└───────────┴──────────┴────────┴──────────┘

Large integer (2^100):
┌───────────┬──────────┬────────┬─────────────────────┐
│ ob_refcnt │ ob_type  │ob_size │    ob_digit[]       │
│     1     │  &Long   │   4    │ [low ... high]      │
└───────────┴──────────┴────────┴─────────────────────┘

3. String object memory model (Objects/unicodeobject.c)

3.1 The Unicode object structure

// Include/cpython/unicodeobject.h
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;          // string length in code points
    Py_hash_t hash;            // cached hash value
    struct {
        unsigned int interned:2;    // interning state
        unsigned int kind:3;        // storage width of each character
        unsigned int compact:1;     // compact storage or not
        unsigned int ascii:1;       // pure ASCII or not
        unsigned int ready:1;       // ready or not
        unsigned int :24;
    } state;
    wchar_t *wstr;             // wide-character representation (optional)
} PyASCIIObject;

// compact Unicode object
typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;    // length of the UTF-8 form
    char *utf8;                // cached UTF-8 form
    Py_ssize_t wstr_length;    // length of the wide-character form
} PyCompactUnicodeObject;

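3.2 String interning

// Objects/unicodeobject.c
static PyObject *interned = NULL;  // dictionary of interned strings

void
PyUnicode_InternInPlace(PyObject **p)
{
    PyObject *s = *p;
    if (PyUnicode_READY(s) == -1) {
        return;
    }

    // already interned: nothing to do
    if (PyUnicode_CHECK_INTERNED(s)) {
        return;
    }

    // insert into the interned dictionary, or fetch the existing entry
    PyObject *t = PyDict_SetDefault(interned, s, s);
    if (t == NULL) {
        PyErr_Clear();
        return;
    }
    if (t != s) {
        // an equal string is already interned: point *p at that one
        Py_SETREF(*p, Py_NewRef(t));
        return;
    }
    // s itself is now the interned copy; record that in its state bits
    // (the real function also adjusts the refcount for the dictionary's references)
    _PyUnicode_STATE(s).interned = SSTATE_INTERNED_MORTAL;
}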

4. List object memory model (Objects/listobject.c)

4.1 The list structure

// Include/cpython/listobject.h
typedef struct {
    PyVarObject ob_base;
    PyObject **ob_item;        // pointer to the array of element pointers
    Py_ssize_t allocated;      // number of slots allocated
} PyListObject;

4.2 Dynamic growth

// Objects/listobject.c (simplified: the fast path that skips reallocation
// when allocated >= newsize >= allocated/2 is omitted)
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    // growth policy: over-allocate by about 1/8 plus 6, rounded down to a
    // multiple of 4 (older versions used newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6))
    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

    if (newsize == 0) {
        PyMem_FREE(self->ob_item);
        self->ob_item = NULL;
        self->allocated = 0;
        Py_SET_SIZE(self, 0);
        return 0;
    }

    // reallocate the element array
    items = (PyObject **)PyMem_Realloc(self->ob_item,
                                       new_allocated * sizeof(PyObject *));
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }

    self->ob_item = items;
    Py_SET_SIZE(self, newsize);
    self->allocated = new_allocated;
    return 0;
}
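The growth formula in list_resize can be replayed in Python to see which capacities a list passes through as elements are appended one at a time. This sketch only mirrors the formula shown above; the function names are illustrative, and other CPython versions may use a different formula.

```python
def new_allocated(newsize: int) -> int:
    """Capacity chosen by the formula above when the list must grow to newsize."""
    return (newsize + (newsize >> 3) + 6) & ~3


def capacities_while_appending(count: int) -> list:
    """Capacities a list passes through during `count` single appends."""
    allocated = 0
    seen = []
    for size in range(1, count + 1):
        if size > allocated:          # out of room: a resize is triggered
            allocated = new_allocated(size)
            seen.append(allocated)
    return seen


growth = capacities_while_appending(70)
print(growth)
```

Seventy appends trigger only nine reallocations, which is why append is amortized O(1).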

4.3 Memory layout

Empty list:
┌───────────┬──────────┬────────┬──────────┬───────────┐
│ ob_refcnt │ ob_type  │ob_size │ ob_item  │ allocated │
│     1     │  &List   │   0    │   NULL   │     0     │
└───────────┴──────────┴────────┴──────────┴───────────┘

A list holding [1, 2, 3], built by appending one element at a time:
┌───────────┬──────────┬────────┬──────────┬───────────┐
│ ob_refcnt │ ob_type  │ob_size │ ob_item  │ allocated │
│     1     │  &List   │   3    │   ptr    │     4     │
└───────────┴──────────┴────────┴──────────┴───────────┘
                                      │
                                      ▼
                               ┌──────────────────┐
                               │ PyObject *[4]    │
                               │ [0]: &int(1)     │
                               │ [1]: &int(2)     │
                               │ [2]: &int(3)     │
                               │ [3]: NULL        │
                               └──────────────────┘

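5. Dict object memory model (Objects/dictobject.c)

5.1 The dict structure (compact representation)

// Objects/dictobject.c
struct _dictkeysobject {
    Py_ssize_t dk_refcnt;      // reference count of the keys object
    Py_ssize_t dk_size;        // size of the hash table
    dict_lookup_func dk_lookup; // lookup function
    Py_ssize_t dk_usable;      // number of usable entries left
    Py_ssize_t dk_nentries;    // number of entries used
    char dk_indices[];         // index array, followed by the entry array
};

typedef struct {
    PyObject_HEAD
    Py_ssize_t ma_used;        // number of items in the dict
    uint64_t ma_version_tag;   // version tag
    PyDictKeysObject *ma_keys; // keys object
    PyObject **ma_values;      // values array (split-table representation only)
} PyDictObject;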

5.2 Resolving hash collisions

// Objects/dictobject.c (simplified: only the identity check on the key is
// shown; the real lookup also compares hashes and falls back to equality)
static Py_ssize_t
lookdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr)
{
    PyDictKeysObject *dk = mp->ma_keys;
    size_t mask = DK_MASK(dk);
    size_t perturb = hash;
    size_t i = (size_t)hash & mask;

    // collisions are resolved with open addressing
    for (;;) {
        Py_ssize_t ix = dk_get_index(dk, i);
        if (ix == DKIX_EMPTY) {
            *value_addr = NULL;
            return ix;
        }

        PyDictKeyEntry *ep = &DK_ENTRIES(dk)[ix];
        assert(ep->me_key != NULL);

        if (ep->me_key == key) {
            *value_addr = ep->me_value;
            return ix;
        }

        // perturbed probing: mix higher hash bits into the next slot index
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + 1 + perturb) & mask;
    }
}
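The probing recurrence at the end of the loop is easy to experiment with in Python. The sketch below reproduces only the slot sequence, not the lookup itself; it assumes PERTURB_SHIFT is 5 and a power-of-two table size, and the function name is invented for the example.

```python
PERTURB_SHIFT = 5  # assumed value of the constant used in the loop above


def probe_sequence(hash_value: int, table_size: int, limit: int) -> list:
    """First `limit` slot indices probed for hash_value in a table of table_size slots."""
    mask = table_size - 1                 # table_size must be a power of two
    perturb = hash_value & (2**64 - 1)    # treat the hash as an unsigned 64-bit value
    i = perturb & mask
    slots = [i]
    while len(slots) < limit:
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + 1 + perturb) & mask
        slots.append(i)
    return slots


order = probe_sequence(0, 8, 8)
print(order)
```

Once perturb has shifted down to zero, the recurrence i = 5*i + 1 (mod table size) visits every slot, so a probe always finds an empty slot if one exists.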

6. Memory management

6.1 The object allocator (Objects/obmalloc.c)

// Objects/obmalloc.c

// pool header structure
struct pool_header {
    union { block *_padding;
            uint count; } ref;          // number of allocated blocks
    block *freeblock;                  // head of the free block list
    struct pool_header *nextpool;      // next pool
    struct pool_header *prevpool;      // previous pool
    uint arenaindex;                   // index of the owning arena
    uint szidx;                        // size class index
    uint nextoffset;                   // offset of the next never-used block
    uint maxnextoffset;                // largest valid nextoffset
};

// PyObject_Malloc (a simplified sketch of the allocation path, not the verbatim source)
void *
PyObject_Malloc(size_t size)
{
    if (size > SMALL_REQUEST_THRESHOLD) {
        return PyMem_RawMalloc(size);
    }

    // small request: use the pool allocator
    uint pool_idx = size_to_pool_idx(size);
    struct pool_header *pool = usedpools[pool_idx];

    if (pool != NULL) {
        // allocate from an existing pool
        block *bp = pool->freeblock;
        if (bp != NULL) {
            pool->freeblock = *(block **)bp;
            return (void *)bp;
        }
    }

    // create a new pool, or fall back to the system allocator
    return allocate_from_new_pool(size);
}
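How a request size maps to a pool size class can be sketched as follows. The two constants are assumptions about typical 64-bit builds of recent CPython (16-byte alignment, 512-byte small-request threshold) and are not taken from the listing above; the function names are illustrative.

```python
ALIGNMENT_SHIFT = 4            # assumption: blocks are multiples of 16 bytes
SMALL_REQUEST_THRESHOLD = 512  # assumption: larger requests bypass the pools


def size_class_index(nbytes: int):
    """Size class for a request, or None if the pools do not serve it."""
    if nbytes == 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes - 1) >> ALIGNMENT_SHIFT


def block_size(nbytes: int):
    """Bytes actually reserved for a request of nbytes."""
    index = size_class_index(nbytes)
    return None if index is None else (index + 1) << ALIGNMENT_SHIFT


for request in (1, 16, 17, 100, 512, 513):
    print(request, size_class_index(request), block_size(request))
```

All blocks in one pool share a size class, which is what lets a pool hand out blocks from a simple free list.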

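6.2 Garbage collection (Modules/gcmodule.c)

// Modules/gcmodule.c

// GC header (layout used up to Python 3.7; newer versions pack it
// into two uintptr_t fields)
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy;  // forces alignment
} PyGC_Head;

// main collection function (condensed outline; the full version follows)
static Py_ssize_t
gc_collect_main(PyThreadState *tstate, int generation, Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable, int nofail)
{
    Py_ssize_t m = 0; // number of objects collected
    Py_ssize_t n = 0; // number of objects that could not be collected
    PyGC_Head *young; // the generation being collected
    PyGC_Head *old; // the next older generation
    PyGC_Head unreachable; // unreachable objects
    PyGC_Head finalizers; // objects with finalizers

    // cycle detection, in the spirit of mark and sweep
    // 1. copy each reference count into gc_refs
    update_refs(young);

    // 2. subtract references that come from inside the generation
    subtract_refs(young);

    // 3. keep what is still reachable, move the rest to unreachable
    move_unreachable(young, &unreachable);

    // 4. set aside objects with legacy finalizers
    move_legacy_finalizers(&unreachable, &finalizers);

    // 5. delete the unreachable objects
    delete_garbage(tstate, &unreachable, generation);

    return n+m;
}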
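Steps 1 to 3 of the outline above can be tried on a toy object graph. The sketch below is a model of the idea only: objects are names in a dictionary, reference counts are computed from the graph, and the function and parameter names are invented; the real collector works on linked lists of GC headers.

```python
def find_unreachable(objects: dict, external_refs: dict) -> set:
    """objects: {name: [names it references]}.
    external_refs: {name: number of references from outside the tracked set}.
    """
    # Reference count = references from tracked objects + external references.
    refcount = {name: external_refs.get(name, 0) for name in objects}
    for targets in objects.values():
        for target in targets:
            refcount[target] += 1

    # 1. update_refs: copy every reference count into gc_refs.
    gc_refs = dict(refcount)

    # 2. subtract_refs: remove references that come from tracked objects.
    for targets in objects.values():
        for target in targets:
            gc_refs[target] -= 1

    # 3. move_unreachable: anything with gc_refs > 0 is referenced from outside,
    #    and everything reachable from such an object is alive as well.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in reachable:
            reachable.add(name)
            stack.extend(objects[name])
    return set(objects) - reachable


graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(graph, external_refs={"c": 1})
print(sorted(garbage))
```

Both pairs form a cycle, but only the a/b pair is garbage, because c is still referenced from outside the tracked set.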

// Modules/gcmodule.c, full version
static Py_ssize_t
gc_collect_main(PyThreadState *tstate, int generation,
                Py_ssize_t *n_collected, Py_ssize_t *n_uncollectable,
                int nofail)
{
    int i;
    Py_ssize_t m = 0; /* # objects collected */
    Py_ssize_t n = 0; /* # unreachable objects that couldn't be collected */
    PyGC_Head *young; /* the generation we are examining */
    PyGC_Head *old; /* next older generation */
    PyGC_Head unreachable; /* non-problematic unreachable trash */
    PyGC_Head finalizers;  /* objects with, & reachable from, __del__ */
    PyGC_Head *gc;
    _PyTime_t t1 = 0;   /* initialize to prevent a compiler warning */
    GCState *gcstate = &tstate->interp->gc;

    // gc_collect_main() must not be called before _PyGC_Init
    // or after _PyGC_Fini()
    assert(gcstate->garbage != NULL);
    assert(!_PyErr_Occurred(tstate));

    if (gcstate->debug & DEBUG_STATS) {
        PySys_WriteStderr("gc: collecting generation %d...\n", generation);
        show_stats_each_generations(gcstate);
        t1 = _PyTime_GetPerfCounter();
    }

    if (PyDTrace_GC_START_ENABLED())
        PyDTrace_GC_START(generation);

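    /* Annotation: reset the counters of the current and all younger
     * generations and bump the counter of the next older one. The current
     * counter is reset so that a still-high count does not trigger another
     * collection right after this one finishes; the next generation's counter
     * is incremented so that older generations eventually get collected too. */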
        
    /* update collection and allocation counters */
    if (generation+1 < NUM_GENERATIONS)
        gcstate->generations[generation+1].count += 1;
    for (i = 0; i <= generation; i++)
        gcstate->generations[i].count = 0;

    /* Annotation: this is the key to generational collection. Collecting
     * generation N also collects every younger generation (0 to N-1). */
    /* merge younger generations with one we are currently collecting */
    for (i = 0; i < generation; i++) {
        gc_list_merge(GEN_HEAD(gcstate, i), GEN_HEAD(gcstate, generation));
    }

    /* Annotation: young points at the generation being collected; old points
     * at the next older generation (or at young itself for the oldest one). */
    /* handy references */
    young = GEN_HEAD(gcstate, generation);
    if (generation < NUM_GENERATIONS-1)
        old = GEN_HEAD(gcstate, generation+1);
    else
        old = young;
    validate_list(old, collecting_clear_unreachable_clear);

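    /* Annotation: split young into reachable and unreachable objects.
     * deduce_unreachable (which uses move_unreachable internally) moves objects
     * whose gc_refs dropped to 0 and that no live object references into the
     * unreachable list; objects with gc_refs > 0 stay in young. */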
    deduce_unreachable(young, &unreachable);

    untrack_tuples(young);
    /* Move reachable objects to next generation. */
    if (young != old) {
        if (generation == NUM_GENERATIONS - 2) {
            gcstate->long_lived_pending += gc_list_size(young);
        }
        gc_list_merge(young, old);
    }
    else {
        /* We only un-track dicts in full collections, to avoid quadratic
           dict build-up. See issue #14775. */
        untrack_dicts(young);
        gcstate->long_lived_pending = 0;
        gcstate->long_lived_total = gc_list_size(young);
    }

    /* Annotation: create an empty list for objects with finalizers. It will
     * hold the "problem" objects that cannot be deleted right away. */
    /* All objects in unreachable are trash, but objects reachable from
     * legacy finalizers (e.g. tp_del) can't safely be deleted.
     */
    gc_list_init(&finalizers);
    // NEXT_MASK_UNREACHABLE is cleared here.
    // After move_legacy_finalizers(), unreachable is normal list.
    
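    /* Annotation: find every object in unreachable that has a legacy
     * finalizer and move it to the finalizers list. In detail:
     *   - walk each object in the unreachable list
     *   - check whether its type has a legacy finalizer (tp_del)
     *   - if so, move the object from unreachable to finalizers
     *   - and clear its NEXT_MASK_UNREACHABLE flag
     */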

    move_legacy_finalizers(&unreachable, &finalizers);
    
    /* Annotation: objects reachable from those finalizer objects are moved to
     * the finalizers list as well. */
    /* finalizers contains the unreachable objects with a legacy finalizer;
     * unreachable objects reachable *from* those are also uncollectable,
     * and we move those into the finalizers list too.
     */
    move_legacy_finalizer_reachable(&finalizers);

    validate_list(&finalizers, collecting_clear_unreachable_clear);
    validate_list(&unreachable, collecting_set_unreachable_clear);

    /* Print debugging information. */
    if (gcstate->debug & DEBUG_COLLECTABLE) {
        for (gc = GC_NEXT(&unreachable); gc != &unreachable; gc = GC_NEXT(gc)) {
            debug_cycle("collectable", FROM_GC(gc));
        }
    }

    /* Annotation: clear weak references and invoke their callbacks. */
    /* Clear weakrefs and invoke callbacks as necessary. */
    m += handle_weakrefs(&unreachable, old);

    validate_list(old, collecting_clear_unreachable_clear);
    validate_list(&unreachable, collecting_set_unreachable_clear);

    /* Annotation: call each object's tp_finalize slot (this is where a
     * Python-level __del__ is run). */
    /* Call tp_finalize on objects which have one. */
    finalize_garbage(tstate, &unreachable);

    /* Handle any objects that may have resurrected after the call
     * to 'finalize_garbage' and continue the collection with the
     * objects that are still unreachable */
    PyGC_Head final_unreachable;
    handle_resurrected_objects(&unreachable, &final_unreachable, old);

    /* Call tp_clear on objects in the final_unreachable set.  This will cause
    * the reference cycles to be broken.  It may also cause some objects
    * in finalizers to be freed.
    */
    m += gc_list_size(&final_unreachable);
    delete_garbage(tstate, gcstate, &final_unreachable, old);

    /* Collect statistics on uncollectable objects found and print
     * debugging information. */
    for (gc = GC_NEXT(&finalizers); gc != &finalizers; gc = GC_NEXT(gc)) {
        n++;
        if (gcstate->debug & DEBUG_UNCOLLECTABLE)
            debug_cycle("uncollectable", FROM_GC(gc));
    }
    if (gcstate->debug & DEBUG_STATS) {
        double d = _PyTime_AsSecondsDouble(_PyTime_GetPerfCounter() - t1);
        PySys_WriteStderr(
            "gc: done, %zd unreachable, %zd uncollectable, %.4fs elapsed\n",
            n+m, n, d);
    }

    /* Append instances in the uncollectable set to a Python
     * reachable list of garbage.  The programmer has to deal with
     * this if they insist on creating this type of structure.
     */
    handle_legacy_finalizers(tstate, gcstate, &finalizers, old);
    validate_list(old, collecting_clear_unreachable_clear);

    /* Clear free list only during the collection of the highest
     * generation */
    if (generation == NUM_GENERATIONS-1) {
        clear_freelists(tstate->interp);
    }

    if (_PyErr_Occurred(tstate)) {
        if (nofail) {
            _PyErr_Clear(tstate);
        }
        else {
            _PyErr_WriteUnraisableMsg("in garbage collection", NULL);
        }
    }

    /* Update stats */
    if (n_collected) {
        *n_collected = m;
    }
    if (n_uncollectable) {
        *n_uncollectable = n;
    }

    struct gc_generation_stats *stats = &gcstate->generation_stats[generation];
    stats->collections++;
    stats->collected += m;
    stats->uncollectable += n;

    if (PyDTrace_GC_DONE_ENABLED()) {
        PyDTrace_GC_DONE(n + m);
    }

    assert(!_PyErr_Occurred(tstate));
    return n + m;
}
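
The statistics updated at the end of this function are visible from Python through the gc module. The following is a minimal sketch, assuming a standard CPython build; Node, make_cycle and collect_and_report are hypothetical names made up for this illustration.

```python
import gc


class Node:
    """Plain container object, used only to build reference cycles."""


def make_cycle():
    # Two objects that reference each other; both become unreachable on return.
    a, b = Node(), Node()
    a.other, b.other = b, a


def collect_and_report(pairs):
    """Create `pairs` cycles, collect them, and return (found, collected_delta)."""
    was_enabled = gc.isenabled()
    gc.disable()              # keep automatic collections out of the measurement
    try:
        gc.collect()          # flush garbage left over from earlier code
        before = sum(s["collected"] for s in gc.get_stats())
        for _ in range(pairs):
            make_cycle()
        found = gc.collect()  # corresponds to the n + m return value above
        after = sum(s["collected"] for s in gc.get_stats())
    finally:
        if was_enabled:
            gc.enable()
    return found, after - before
```

gc.get_stats() returns one dict per generation with the keys collections, collected and uncollectable, which mirror the fields of gc_generation_stats.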

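What move_legacy_finalizers(&unreachable, &finalizers) does

Purpose: move every unreachable object that has a legacy finalizer (tp_del) to the finalizers list. The companion call move_legacy_finalizer_reachable(&finalizers) then moves every other object reachable from those objects to the same list.
Why is this step needed?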
class Container:
    def __del__(self):
        # May access self.data
        print(f"Deleting container: {self.data}")

class Data:
    pass

# Create a reference cycle
container = Container()
data = Data()
container.data = data
data.container = container
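In this example:

Container has a __del__ method. Before PEP 442 (Python 3.4) that made it a legacy finalizer and put it in finalizers; since then __del__ fills tp_finalize and is run by finalize_garbage, so only types with a C-level tp_del end up in this list.
The Data object has no finalizer, but it is reachable from Container.
If Data were deleted first, the finalizer on Container would fail when it accessed self.data.
So Data must also be treated as uncollectable.

How it runs:

Start from each object in finalizers.
Traverse every object reachable from it (a BFS/DFS driven by tp_traverse).
Move those reachable objects to the finalizers list as well.
This guarantees that a finalizer never touches an object that has already been deleted.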
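
The traversal described above can be sketched on a toy object graph. This is an illustration only, not CPython's implementation: objects are plain strings, and the function name and the referents mapping are assumptions made up for the example.

```python
from collections import deque


def move_reachable_to_finalizers(unreachable, finalizers, referents):
    """Move every unreachable object reachable from `finalizers` into it.

    unreachable, finalizers: sets of object names (modified in place)
    referents: dict mapping an object name to the names it references
    """
    queue = deque(finalizers)          # start from each object with a finalizer
    while queue:
        obj = queue.popleft()
        for child in referents.get(obj, ()):
            if child in unreachable:   # still scheduled for deletion
                unreachable.remove(child)
                finalizers.add(child)  # now uncollectable as well
                queue.append(child)    # keep following its references
    return unreachable, finalizers


unreachable = {"data", "other"}
finalizers = {"container"}
referents = {"container": ["data"], "data": ["container"], "other": []}
move_reachable_to_finalizers(unreachable, finalizers, referents)
```

After the call, "data" has joined "container" in finalizers, while "other" stays in unreachable because nothing with a finalizer can reach it.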

6.2.2. The finalize_garbage function

static void
finalize_garbage(PyThreadState *tstate, PyGC_Head *collectable)
{
    destructor finalize;
    PyGC_Head seen;

    /* While we're going through the loop, `finalize(op)` may cause op, or
     * other objects, to be reclaimed via refcounts falling to zero.  So
     * there's little we can rely on about the structure of the input
     * `collectable` list across iterations.  For safety, we always take the
     * first object in that list and move it to a temporary `seen` list.
     * If objects vanish from the `collectable` and `seen` lists we don't
     * care.
     */
    gc_list_init(&seen);

    while (!gc_list_is_empty(collectable)) {
        PyGC_Head *gc = GC_NEXT(collectable);
        PyObject *op = FROM_GC(gc);
        gc_list_move(gc, &seen);
        if (!_PyGCHead_FINALIZED(gc) &&
                (finalize = Py_TYPE(op)->tp_finalize) != NULL) {
            _PyGCHead_SET_FINALIZED(gc);
            Py_INCREF(op);
            finalize(op);
            assert(!_PyErr_Occurred(tstate));
            Py_DECREF(op);
        }
    }
    gc_list_merge(&seen, collectable);
}
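
The FINALIZED flag and the resurrection check can be observed from Python. The sketch below assumes CPython 3.4 or later; Phoenix and demo are hypothetical names for this illustration.

```python
import gc


class Phoenix:
    finalized = 0   # how many times __del__ has run
    saved = None    # class attribute used to resurrect the instance

    def __del__(self):
        Phoenix.finalized += 1
        Phoenix.saved = self      # resurrect: create a new external reference


def demo():
    p = Phoenix()
    p.me = p                      # self-cycle, so only the cycle GC can free it
    del p
    gc.collect()                  # finalize_garbage runs __del__ here
    first = (Phoenix.finalized, Phoenix.saved is not None)

    Phoenix.saved = None          # drop the resurrecting reference
    gc.collect()                  # unreachable again, but already FINALIZED
    second = Phoenix.finalized
    return first, second
```

The first collection runs __del__ and leaves the object alive because it was resurrected. The second collection reclaims it without calling __del__ again, which is the effect of the _PyGCHead_FINALIZED check.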

(Objects/typeobject.c)

    FLSLOT(__init__, tp_init, slot_tp_init, (wrapperfunc)(void(*)(void))wrap_init,
           "__init__($self, /, *args, **kwargs)\n--\n\n"
           "Initialize self.  See help(type(self)) for accurate signature.",
           PyWrapperFlag_KEYWORDS),
    TPSLOT(__new__, tp_new, slot_tp_new, NULL,
           "__new__(type, /, *args, **kwargs)\n--\n\n"
           "Create and return new object.  See help(type) for accurate signature."),
    TPSLOT(__del__, tp_finalize, slot_tp_finalize, (wrapperfunc)wrap_del, ""),

    BUFSLOT(__buffer__, bf_getbuffer, slot_bf_getbuffer, wrap_buffer,
            "__buffer__($self, flags, /)\n--\n\n"
            "Return a buffer object that exposes the underlying memory of the object."),
    BUFSLOT(__release_buffer__, bf_releasebuffer, slot_bf_releasebuffer, wrap_releasebuffer,
            "__release_buffer__($self, buffer, /)\n--\n\n"
            "Release the buffer object that exposes the underlying memory of the object."),
                     
......

static void
slot_tp_finalize(PyObject *self)
{
    int unbound;
    PyObject *del, *res;

    /* Save the current exception, if any. */
    PyObject *exc = PyErr_GetRaisedException();

    /* Execute __del__ method, if any. */
    del = lookup_maybe_method(self, &_Py_ID(__del__), &unbound);
    if (del != NULL) {
        res = call_unbound_noarg(unbound, del, self);
        if (res == NULL)
            PyErr_WriteUnraisable(del);
        else
            Py_DECREF(res);
        Py_DECREF(del);
    }

    /* Restore the saved exception. */
    PyErr_SetRaisedException(exc);
}

// lookup_maybe_method: find the __del__ method on the type of self

static PyObject *
lookup_maybe_method(PyObject *self, PyObject *attr, int *unbound)
{
    PyObject *res = _PyType_Lookup(Py_TYPE(self), attr);
    if (res == NULL) {
        return NULL;
    }

    if (_PyType_HasFeature(Py_TYPE(res), Py_TPFLAGS_METHOD_DESCRIPTOR)) {
        /* Avoid temporary PyMethodObject */
        *unbound = 1;
        Py_INCREF(res);
    }
    else {
        *unbound = 0;
        descrgetfunc f = Py_TYPE(res)->tp_descr_get;
        if (f == NULL) {
            Py_INCREF(res);
        }
        else {
            res = f(res, self, (PyObject *)(Py_TYPE(self)));
        }
    }
    return res;
}
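
Because lookup_maybe_method goes through _PyType_Lookup(Py_TYPE(self), attr), only a __del__ defined on the type is found. A small sketch of that behaviour, assuming CPython; Resource, calls and demo are hypothetical names for this illustration.

```python
import gc

calls = []


class Resource:
    def __del__(self):
        calls.append("type-level __del__")


def demo():
    calls.clear()
    r = Resource()
    # Stored in the instance dict, which the type lookup never consults
    r.__del__ = lambda: calls.append("instance-level __del__")
    del r
    gc.collect()   # not needed on CPython (refcount hits 0), kept for safety
    return list(calls)
```

Only the method defined on the class runs; the attribute set on the instance is ignored by the finalizer slot.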

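In CPython, the slotdefs array maps Python special methods (such as __getitem__ and __add__) to the corresponding slots of the type object (PyTypeObject). Registration happens when the type is created; the steps are as follows: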

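/* Pseudocode: simplified type-creation flow.
 * The two helper calls below are placeholders, not real CPython APIs. */
PyObject* type_new(...) {
    // 1. Create the type object
    PyTypeObject *type = ...;
    
    // 2. Walk the slotdefs array
    for (pytype_slotdef *slot = slotdefs; slot->name; slot++) {
        // 3. Look up the special method in the class dict (placeholder call)
        PyObject *method = _PyDict_GetItemId(class_dict, slot->name);
        if (method) {
            // 4. Bind the method to the slot (placeholder call)
            int res = _PyType_SetSlotFromSpec(type, slot, method);
            // Handle errors...
        }
    }
    return (PyObject *)type;
}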

/* Actual code in CPython */
/* Update the slots after assignment to a class (type) attribute. */
static int
update_slot(PyTypeObject *type, PyObject *name)
{
    pytype_slotdef *ptrs[MAX_EQUIV];
    pytype_slotdef *p;
    pytype_slotdef **pp;
    int offset;

    assert(PyUnicode_CheckExact(name));
    assert(PyUnicode_CHECK_INTERNED(name));

    pp = ptrs;
    for (p = slotdefs; p->name; p++) {
        assert(PyUnicode_CheckExact(p->name_strobj));
        assert(PyUnicode_CHECK_INTERNED(p->name_strobj));
        assert(PyUnicode_CheckExact(name));
        /* bpo-40521: Using interned strings. */
        if (p->name_strobj == name) {
            *pp++ = p;
        }
    }
    *pp = NULL;
    for (pp = ptrs; *pp; pp++) {
        p = *pp;
        offset = p->offset;
        while (p > slotdefs && (p-1)->offset == offset)
            --p;
        *pp = p;
    }
    if (ptrs[0] == NULL)
        return 0; /* Not an attribute that affects any slots */
    return update_subclasses(type, name,
                             update_slots_callback, (void *)ptrs);
}

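update_slot and the callback it hands to update_subclasses are responsible for:
1. Matching the assigned attribute name against the entries in slotdefs
2. Installing the matching slot_* function (e.g. slot_tp_finalize), which looks up and calls the Python method so that C code can invoke it
3. Storing that function pointer in the type slot (e.g. type->tp_as_sequence->sq_item)

What wrapper functions (e.g. wrap_binaryfunc) do

They adapt a Python-level call to the underlying C interface:

static PyObject *wrap_binaryfunc(PyObject *self, PyObject *args, void *wrapped) {
    // Unpack the single argument and call the wrapped C slot function
}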

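Through the slotdefs array, CPython registers special methods into slots automatically when a type is created:

Walk slotdefs and match the methods found in the class dict
Bridge the Python/C interface with slot_* and wrapper functions
Fill in the function pointers in the type object's slots
Inherit slots from the base types for methods the class does not define

This mechanism lets Python-level special methods directly change how objects behave inside the interpreter, and it is central to how CPython implements its object model.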
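
The effect of the slot update is visible from Python: assigning or deleting a special method on a class changes what the built-in operations do for existing instances. A minimal sketch, assuming CPython; Box, has_len and demo are hypothetical names for this illustration.

```python
class Box:
    pass


def has_len(obj):
    """Return True if len(obj) works, False if the length slot is empty."""
    try:
        len(obj)
    except TypeError:
        return False
    return True


def demo():
    b = Box()
    before = has_len(b)              # no __len__ yet, so len() fails
    Box.__len__ = lambda self: 3     # assignment on the class updates the slot
    during = len(b)                  # the existing instance sees the new slot
    del Box.__len__                  # deletion updates the slot again
    after = has_len(b)
    return before, during, after
```

The instance b was created before __len__ existed, yet len(b) works as soon as the class attribute is assigned, because the slot lives on the type object.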

6.2.3 How reachability is decided

static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *next;
    PyGC_Head *gc = GC_NEXT(containers);

    while (gc != containers) {
        next = GC_NEXT(gc);
        /* Move any object that might have become immortal to the
         * permanent generation as the reference count is not accurately
         * reflecting the actual number of live references to this object
         */
        if (_Py_IsImmortal(FROM_GC(gc))) {
           gc_list_move(gc, &get_gc_state()->permanent_generation.head);
           gc = next;
           continue;
        }
        gc_reset_refs(gc, Py_REFCNT(FROM_GC(gc)));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        _PyObject_ASSERT(FROM_GC(gc), gc_get_refs(gc) != 0);
        gc = next;
    }
}

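update_refs initializes each object's gc_refs to its ob_refcnt (the real reference count). ob_refcnt counts references from every source, including:

Variables on the stack (function locals)
Global/static variables
Objects in other generations (not the one being collected)
Objects that the GC does not track
Together these sources act as the GC roots.
subtract_refs(base):
Walks the objects in the generation being collected (the base list) and, for every object in the same list that each one references, decrements that object's gc_refs. This subtracts the references that are internal to the list.
Result:
If an object ends with gc_refs > 0, something outside the list (a root) still references it.

If gc_refs = 0, the object is referenced only from inside the list (an island) and is a candidate for unreachable; move_unreachable then rescues any such object that can be reached from one with gc_refs > 0.

The roots are never enumerated. They are captured indirectly through the initial ob_refcnt value in update_refs and show up as gc_refs > 0 after subtract_refs. CPython's cycle collector does not scan the stack or the globals explicitly at any stage.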
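
The three steps can be simulated on a toy graph. This is a sketch of the idea only, not CPython's data structures: objects are strings, and find_unreachable, refcounts and edges are hypothetical names chosen for the example.

```python
def find_unreachable(refcounts, edges):
    """Simulate update_refs, subtract_refs and move_unreachable.

    refcounts: dict name -> total reference count (the ob_refcnt analogue)
    edges: dict name -> list of names it references inside the same list
    Returns (gc_refs after subtraction, set of unreachable names).
    """
    # update_refs: copy the real reference count into gc_refs
    gc_refs = dict(refcounts)

    # subtract_refs: remove references that come from inside the list
    for src, targets in edges.items():
        for dst in targets:
            if dst in gc_refs:
                gc_refs[dst] -= 1

    # move_unreachable: objects with gc_refs > 0 are externally referenced,
    # and everything reachable from them is alive too
    reachable = set()
    stack = [name for name, refs in gc_refs.items() if refs > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in edges.get(name, ()) if t in gc_refs)

    return gc_refs, set(gc_refs) - reachable


# a <-> b is an isolated cycle; c <-> d is a cycle with one outside reference to c
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
gc_refs, unreachable = find_unreachable(refcounts, edges)
```

Here d ends with gc_refs = 0 but is still alive because it is reachable from c, which shows why gc_refs = 0 alone only marks a candidate.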

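7. Reference counting

7.1 Reference-counting macros

// Include/object.h (simplified; CPython 3.12 and later also skip immortal objects here)

// Increment the reference count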
static inline void _Py_INCREF(PyObject *op)
{
#ifdef Py_REF_DEBUG
    _Py_RefTotal++;
#endif
    op->ob_refcnt++;
}

// Decrement the reference count
static inline void _Py_DECREF(PyObject *op)
{
#ifdef Py_REF_DEBUG
    _Py_RefTotal--;
#endif
    if (--op->ob_refcnt != 0) {
        return;
    }
    _Py_Dealloc(op);  // Destroy the object once the count reaches 0
}

#define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))
#define Py_DECREF(op) _Py_DECREF(_PyObject_CAST(op))
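
The effect of Py_INCREF and Py_DECREF can be watched from Python with sys.getrefcount. The sketch below compares counts rather than printing absolute values, since the absolute number depends on the interpreter version; refcount_deltas is a hypothetical name for this illustration.

```python
import sys


def refcount_deltas():
    """Return how the count changes when references are added and dropped."""
    obj = object()                  # a fresh, mortal object
    base = sys.getrefcount(obj)

    holders = [obj, obj]            # the list stores two more references
    with_holders = sys.getrefcount(obj)

    holders.clear()                 # both references are released again
    after_clear = sys.getrefcount(obj)

    return with_holders - base, after_clear - base
```

A fresh object() is used on purpose: small integers, None and similar singletons may be immortal on recent versions, so their counts do not move.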

7.2 Object deallocation

// Objects/object.c
void
_Py_Dealloc(PyObject *op)
{
    destructor dealloc = Py_TYPE(op)->tp_dealloc;

#ifdef Py_TRACE_REFS
    _Py_ForgetReference(op);
#endif

    // Call the type-specific destructor
    (*dealloc)(op);
}
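
That deallocation happens immediately when the last reference goes away can be checked with a weak reference, which does not keep its target alive. A minimal sketch, assuming CPython's reference counting; Node and demo are hypothetical names for this illustration.

```python
import weakref


class Node:
    pass


def demo():
    n = Node()
    probe = weakref.ref(n)          # observes the object without owning it
    alias = n                       # second strong reference

    del n                           # one reference left, object survives
    alive_with_alias = probe() is not None

    del alias                       # count reaches 0, tp_dealloc runs now
    alive_after = probe() is not None
    return alive_with_alias, alive_after
```

No call to gc.collect() is involved; the object disappears as part of the del statement that drops the last reference.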

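8. Memory layout summary

8.1 Object memory overhead

Approximate figures for a 64-bit build; exact values vary between CPython versions.

Object type	Fixed overhead	Variable part	Example total size
int (small)	28 bytes	0	28 bytes
int (large)	28 bytes	4*n bytes	28+4n bytes
str (ASCII)	49 bytes	1*n bytes	49+n bytes
str (Unicode)	80 bytes	4*n bytes	80+4n bytes
list	56 bytes	8*cap bytes	56+8*cap bytes
dict	240 bytes	24*n bytes	240+24n bytes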
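
Figures like these can be measured on the interpreter at hand with sys.getsizeof. The sketch below reports sizes without asserting specific numbers, since they depend on the version and platform; size_report and its keys are hypothetical names for this illustration.

```python
import sys


def size_report():
    """Return sys.getsizeof for a few representative objects."""
    return {
        "int_small": sys.getsizeof(1),
        "int_large": sys.getsizeof(2 ** 100),
        "str_ascii_5": sys.getsizeof("a" * 5),
        "str_ascii_10": sys.getsizeof("a" * 10),
        "list_empty": sys.getsizeof([]),
        "list_10": sys.getsizeof([None] * 10),
        "dict_empty": sys.getsizeof({}),
    }


if __name__ == "__main__":
    for name, size in size_report().items():
        print(f"{name:>14}: {size} bytes")
```

Note that sys.getsizeof counts only the object itself, not the objects it references, so a list's size covers its pointer array but not its elements.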

8.2 Memory alignment and padding

// Alignment macros
#define _PyObject_SIZE(typeobj) ( (typeobj)->tp_basicsize )
#define _PyObject_VAR_SIZE(typeobj, nitems) \
    (size_t) \
    ( ( (typeobj)->tp_basicsize + \
        (nitems)*(typeobj)->tp_itemsize + \
        (SIZEOF_VOID_P-1) \
      ) & ~(SIZEOF_VOID_P-1) \
    )
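
The rounding in _PyObject_VAR_SIZE is the usual "add alignment minus one, then mask" idiom. A small Python sketch of the same arithmetic; var_size is a hypothetical helper, and POINTER_SIZE stands in for SIZEOF_VOID_P.

```python
import struct

POINTER_SIZE = struct.calcsize("P")   # size of void* on this platform


def var_size(basicsize, itemsize, nitems, align=POINTER_SIZE):
    """Round basicsize + nitems * itemsize up to a multiple of align.

    align must be a power of two for the bit mask to work.
    """
    raw = basicsize + nitems * itemsize
    return (raw + align - 1) & ~(align - 1)
```

For example, with align=8, a 24-byte header plus five 1-byte items is 29 bytes raw and rounds up to 32, while a size that is already a multiple of 8 is left unchanged.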

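9 Execution flow

graph TD
    A[Start collection] --> B[Merge younger generations]
    B --> C[Mark reachable objects]
    C --> D[Separate unreachable objects]
    D --> E[Handle objects with finalizers]
    E --> F[Handle weak references]
    F --> G[Run __del__ methods]
    G --> H[Check for resurrected objects]
    H --> I[Reclaim the final unreachable objects]
    I --> J[Handle uncollectable objects]
    J --> K[Update statistics]
    K --> L[End]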