上期我们简单介绍了一下PostgreSQL的执行计划关键字解读,这期根据一个实例进行sql执行计划的解读及优化。
实例
我们先看sql,根据现场反馈修改其他c_dbbh后,sql执行较快,唯独这一个c_dbbh需要一分钟才会出结果。
SELECT COUNT(1)
FROM db_test.t_zh_axx ajLEFT JOIN (SELECT n_ccsl,c_ajbh,c_zblx,c_laay FROM db_test.t_zh_zjxx A WHERE
c_bh = (SELECT c_bh FROM db_test.t_zh_zjxx b WHERE b.c_ajbh = A.c_ajbh AND b.c_zblx = '0050002' ORDER BY dt_cjsj DESC NULLS LAST LIMIT 1)
) zb ON zb.c_ajbh = aj.c_ajbh
WHEREc_dbbh = '0191H4325678D8172F58EE383720D0A9' AND zb.c_zblx = '0050002' AND aj.c_zy = '1628'
--涉及表的数据量
--db_test.t_zh_axx 1200万+
--db_test.t_zh_zjxx 1900万+
有经验的同学第一眼看到sql是可能就会发现sql编写有些问题,这里暂时先不管。先看执行计划为什么慢,在进行sql优化。
Aggregate (cost=3214.11..3214.12 rows=1 width=8) (actual time=61097.734..61097.734 rows=1 loops=1)-> Nested Loop (cost=743.92..3214.11 rows=1 width=0) (actual time=8.702..61097.110 rows=1461 loops=1)-> Bitmap Heap Scan on t_zh_axx aj (cost=743.36..763.39 rows=5 width=17) (actual time=8.585..11.327 rows=1461 loops=1)Recheck Cond:((c_dbbh = '0191H4325678D8172F58EE383720D0A9'::bpchar) AND ((c_zy)::text = '1801'::text))Heap Blocks:exact=720-> BitmapAnd (cost=743.36..743.36 rows=5 width=0)(actual time=8.479..8.479 rows=0 loops=1)-> Bitmap Index Scan on i_t_zh_axx_c_dbbh_c_cbfy (cost=0.00..70.63 rows=1343 width=0)(actual time=0.766..0.766 rows=0 loops=1)Index Cond:(c_dbbh = '0191H4325678D8172F58EE383720D0A9'::bpchar)-> Bitmap Index Scan on i_t_zh_axx_zblx_c_zy (cost=0.00..672.47 rows=36272 width=0)(actual time=7.644..7.644 rows=0 loops=1)Index Cond:((c_zy)::text = '1628'::text)-> Index Scan using i_t_zh_zjxx_c_ajbh on t_zh_zjxx a (cost=0.56..190.13 rows=1 width=16) (actual time=20.069..41.822 rows=1 loops=1461)Index Cond: ((c_ajbh)::text = (aj.c_ajbh)::text)Filter: (((c_zblx)::text = '0020002'::text) AND (c_bh = (SubPlan 1)))Rows Removed by Filter: 167SubPlan 1-> Limit (cost=44.55..44.56 rows=1 width=41) (actual time=0.247..0.248 rows=1 loops=245603)-> Sort (cost=44.55..44.56 rows=1 width=41) (actual time=0.247..0.247 rows=1 loops=245603)Sort Key: b.dt_cjsj DESC NULLS LASTSort Method: top-N heapsort Memory: 25kB-> Index Scan using i_t_zh_zjxx_c_ajbh on t_zh_zjxx b (cost=0.56..54.82 rows=1 width=41) (actual time=0.017..0.071 rows=38 loops=245603)Index Cond: ((c_ajbh)::text = (a.c_ajbh)::text)Filter: ((c_zblx)::text = '0020002'::text)Rows Removed by Filter: 114
Planning Time: 1.267 ms
Execution Time: 61097.876 ms
仔细观察执行计划后发现,占用时间最多的在第2行Nested Loop(actual time=8.702…61097.110),嵌套循环占用了一分钟,然后在16 -> 20行看到(loops=245603),循环了24.5万次。
上篇文章可以了解到嵌套循环:其中一个表扫描一次,另一个表则循环多次。这里基本可以确定问题了,找开发确认发现是确实是数据问题,原因是重复上报导致。
改写1
处理掉垃圾数据后,再来看下原sql是否有优化空间,进行sql改写(以下测试数据采用公司测试环境)。
--在测试环境原sql执行需要2s,勉强符合公司标准
db_zxzh_jy=# SELECT count(1) from db_test.t_zh_axx aj left join (select n_ccsl,c_ajbh,c_zblx,c_laay from db_test.t_zh_zjxx a where c_bh = (SELECT c_bh
from db_test.t_zh_zjxx b where b.c_ajbh=a.c_ajbh and b.c_zblx = '0020002' order by dt_cjsj desc NULLS LAST LIMIT 1) ) zb on zb.c_ajbh = aj.c_ajbh where c_dbbh = '8B7D8C93864E0D0C3E3259C49ED65471' and zb.c_zblx = '0020002' and aj.c_zy = '1801';count
-------1356
(1 行记录)
Time: 2227.680 ms (00:02.228)
原sql中left join是一个自关联,取相同c_ajbh中dt_cjsj时间最大的一条与t_zh_axx进行关联。
--在第一次修改过程中陷入原sql的固定模式,想把自连接改掉,直接取相同c_ajbh中时间最大的一条
explain analyze
select count(*)
from db_test.t_zh_axx aj
join (select c_ajbh,c_zblx,max(dt_cjsj) from db_test.t_zh_zjxx group by c_ajbh,c_zblx) zbajxx on zbajxx.c_ajbh = aj.c_ajbh
where aj.c_dbbh = '8B7D8C93864E0D0C3E3259C49ED65471'
and aj.c_zy = '1801'
and zbajxx.c_zblx = '0020002'--改完发现更慢,根据执行计划进行解读
--执行计划选择的是Merge Join,这种join方式是需要两个表关联字段事先排序
--可以看到两个表的Sort Key是c_ajbh,t_zh_zjxx表没用到索引,先进性全表扫描,然后进行排序,然后是Merge Join
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=1046565.51..1046565.52 rows=1 width=8) (actual time=24074.583..24074.583 rows=1 loops=1)-> Merge Join (cost=1012316.07..1046565.50 rows=6 width=0) (actual time=24057.923..24074.497 rows=1356 loops=1)Merge Cond: ((t_zh_zjxx.c_ajbh)::text = (aj.c_ajbh)::text)-> GroupAggregate (cost=1007105.01..1028294.00 rows=1044828 width=30) (actual time=23062.977..24014.438 rows=176529 loops=1)Group Key: t_zh_zjxx.c_ajbh, t_zh_zjxx.c_zblx-> Sort (cost=1007105.01..1010685.25 rows=1432095 width=22) (actual time=23062.958..23895.947 rows=347291 loops=1)Sort Key: t_zh_zjxx.c_ajbhSort Method: external merge Disk: 45504kB-> Seq Scan on t_zh_zjxx (cost=0.00..831303.47 rows=1432095 width=22) (actual time=0.031..9310.987 rows=1366950 loops=1)Filter: ((c_zblx)::text = '0020002'::text)Rows Removed by Filter: 18389119-> Sort (cost=5211.06..5211.07 rows=6 width=17) (actual time=4.402..4.524 rows=1461 loops=1)Sort Key: aj.c_ajbhSort Method: quicksort Memory: 117kB-> Index Scan using t_zh_axx_c_dbbh_idx on t_zh_axx aj (cost=0.56..5210.98 rows=6 width=17) (actual time=0.099..1.896 rows=1461 loops=1)Index Cond: (c_dbbh = '8B7D8C93864E0D0C3E3259C49ED65471'::bpchar)Filter: ((c_zy)::text = '1801'::text)Planning Time: 1.547 msExecution Time: 24082.631 ms
改写2
我们改变下思路,再看下原sql想要得到什么结果。SQL中只有c_zblx = '0020002’这一个条件,随机选一条即可,并不必须是dt_cjsj创建时间最大的一条。
--用exists改写
explain analyze
select count(*) from db_test.t_zh_axx aj
where exists (select 1 from db_test.t_zh_zjxx zbajxx where zbajxx.c_ajbh=aj.c_ajbh and c_zblx = '0020002')
and aj.c_dbbh = '8B7D8C93864E0D0C3E3259C49ED65471'
AND aj.c_zy = '1801' --执行计划都走索引
--值得关注的是join的方式是Nested Loop Semi Join,多了个Semi。Semi Join支持支持hash, merge, nestloop几种JOIN方法
--semi Join的操作在EXISTS中有一个返回TRUE的操作即可,所以在有索引的情况下很大概率下并不需要全表扫描
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5560.83..5560.84 rows=1 width=8) (actual time=38.158..38.158 rows=1 loops=1)-> Nested Loop Semi Join, (cost=1.12..5560.82 rows=6 width=0) (actual time=0.072..37.912 rows=1356 loops=1)-> Index Scan using t_zh_axx_c_dbbh_idx on t_zh_axx aj (cost=0.56..5210.98 rows=6 width=17) (actual time=0.042..1.315 rows=1461 loops=1)Index Cond: (c_dbbh = '8B7D8C93864E0D0C3E3259C49ED65471'::bpchar)Filter: ((c_zy)::text = '1801'::text)-> Index Scan using i_t_zh_zjxx_c_ajbh on t_zh_zjxx zbajxx (cost=0.56..58.50 rows=1 width=16) (actual time=0.024..0.024 rows=1 loops=1461)Index Cond: ((c_ajbh)::text = (aj.c_ajbh)::text)Filter: ((c_zblx)::text = '0020002'::text)
Planning Time: 0.852 ms
Execution Time: 38.255 ms
优化后速度大大提升,验证多个编号结果也无问题。
小结
通过看执行计划我们能得到很多有用的信息,根据这些信息找到相应的应对策略。