Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask: Paper Reading Notes
Reading Ref[1] and Ref[2] covers most of what the paper says.
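Vectorization: pays a materialization cost for intermediate results, but can use SIMD for data-parallel operations; works best on top of a column store.
Code gen: executes fewer instructions, which favors compute-intensive workloads.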
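To make the two models concrete, here is a minimal sketch of one query, `SELECT sum(a * b) FROM t WHERE a > bound`, written both ways. The query, the function names, and the batch size of 1024 are assumptions chosen for illustration; they are not taken from the paper's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kVectorSize = 1024;  // assumed batch size

// Vectorized primitive: stores the positions of rows with col[i] > bound
// in sel and returns how many rows qualified.
std::size_t select_greater(const std::int32_t* col, std::size_t n,
                           std::int32_t bound, std::uint32_t* sel) {
    std::size_t k = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sel[k] = static_cast<std::uint32_t>(i);
        k += (col[i] > bound);  // advance only when the row qualifies
    }
    return k;
}

// Vectorized primitive: materializes a[i] * b[i] for the selected rows.
void multiply_selected(const std::int32_t* a, const std::int32_t* b,
                       const std::uint32_t* sel, std::size_t k,
                       std::int64_t* out) {
    for (std::size_t j = 0; j < k; ++j) {
        out[j] = static_cast<std::int64_t>(a[sel[j]]) * b[sel[j]];
    }
}

// Vectorized primitive: sums a dense vector of k values.
std::int64_t sum_vector(const std::int64_t* v, std::size_t k) {
    std::int64_t total = 0;
    for (std::size_t j = 0; j < k; ++j) total += v[j];
    return total;
}

// Vectorized style: one simple loop per primitive, with intermediate
// results (sel, tmp) materialized for every batch.
std::int64_t query_vectorized(const std::vector<std::int32_t>& a,
                              const std::vector<std::int32_t>& b,
                              std::int32_t bound) {
    std::uint32_t sel[kVectorSize];
    std::int64_t tmp[kVectorSize];
    std::int64_t total = 0;
    for (std::size_t base = 0; base < a.size(); base += kVectorSize) {
        const std::size_t n = std::min(kVectorSize, a.size() - base);
        const std::size_t k = select_greater(a.data() + base, n, bound, sel);
        multiply_selected(a.data() + base, b.data() + base, sel, k, tmp);
        total += sum_vector(tmp, k);
    }
    return total;
}

// Compiled (code gen) style: the whole pipeline fused into one loop,
// with values kept in registers and nothing materialized.
std::int64_t query_compiled(const std::vector<std::int32_t>& a,
                            const std::vector<std::int32_t>& b,
                            std::int32_t bound) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > bound) total += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return total;
}
```

Both functions return the same result; the difference is where the intermediate values live. The vectorized version writes `sel` and `tmp` for each batch, while the fused loop never leaves registers.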
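- join (memory bound): vectorization is faster
  - memory loads stall the CPU for many cycles; the simple probe loops of vectorized execution let the CPU overlap more outstanding loads, so cache-miss latency is hidden better
- computation (CPU-intensive tasks): code gen is faster
  - less cache pressure, fewer instructions, and efficient use of registers
- selection can use SIMD
  - the more rows the filters discard, the sparser the surviving rows and the larger the offsets between them in the column, which leads to cache misses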
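The sparsity effect in the selection point can be sketched by counting how many cache lines a later column read touches for a given selection vector. This is a rough model under stated assumptions: a 64-byte cache line, a 4-byte column that starts on a line boundary, and hypothetical helper names `select_less` and `cache_lines_touched`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCacheLineBytes = 64;  // assumption: common x86-64 line size

// Builds a selection vector holding the positions of rows with col[i] < bound.
std::vector<std::uint32_t> select_less(const std::vector<std::int32_t>& col,
                                       std::int32_t bound) {
    std::vector<std::uint32_t> sel(col.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        sel[k] = static_cast<std::uint32_t>(i);
        k += (col[i] < bound);  // advance only when the row qualifies
    }
    sel.resize(k);
    return sel;
}

// Counts the distinct cache lines of a 4-byte column that are touched when
// reading it through sel. Assumes sel is in ascending order.
std::size_t cache_lines_touched(const std::vector<std::uint32_t>& sel) {
    std::size_t lines = 0;
    std::size_t last = static_cast<std::size_t>(-1);  // sentinel: no line yet
    for (std::uint32_t idx : sel) {
        const std::size_t line = idx * sizeof(std::int32_t) / kCacheLineBytes;
        if (line != last) {
            ++lines;
            last = line;
        }
    }
    return lines;
}
```

With 1024 rows where every 16th row qualifies, the read still touches all 64 cache lines of the column but uses only one value from each, whereas a filter that keeps every row gets 16 useful values per line.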
Eliminating branches: a > b ? 1 : 0
can be compiled without a branch, using a compare followed by the x86 setg instruction.
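A minimal sketch of the same idea at the source level, with hypothetical function names. Whether a compiler actually emits `setg` (or vectorizes the loop) depends on the compiler, target, and optimization flags, and an optimizer may well turn the branchy version into the same code.

```cpp
#include <cstddef>
#include <cstdint>

// Written with an explicit branch.
int greater_branchy(int a, int b) {
    if (a > b) return 1;
    return 0;
}

// Written as a plain comparison result; on x86 this typically becomes
// cmp + setg with no jump.
int greater_branchless(int a, int b) {
    return a > b;  // bool converts to 0 or 1
}

// The same trick inside a filter loop: the comparison result is added
// directly instead of guarding an increment with an if.
std::size_t count_greater(const std::int32_t* col, std::size_t n,
                          std::int32_t bound) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += (col[i] > bound);
    }
    return count;
}
```

The branch-free form trades a possible branch misprediction for a small amount of unconditional work, which tends to pay off when the predicate outcome is hard to predict.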
References
- At Infra Meetup No.98, held in Hangzhou on April 20, 2019, our TiDB R&D engineer Xu Huaiyu (徐怀宇) presented the paper "Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask".
- CMU 15-721 Advanced Database Systems (Spring 2018)
- A Deep Dive into Query Execution Engine of Spark SQL - Maryann Xue - YouTube
- Vectorization vs. compilation in query execution