当前位置: 代码迷 >> SQL >> 走主键索引的查询sql变慢的有关问题
  详细解决方案

走主键索引的查询sql变慢的有关问题

热度:25   发布时间:2016-05-05 14:16:41.0
走主键索引的查询sql变慢的问题

问题描述
数据库hang,大量latch: cache buffers chains异常等待事件 业务慢,主机CPU达到100%

EventWaitsTime(s)Avg wait (ms)% DB timeWait Class
latch: cache buffers chains644,380321,72049947.19Concurrency
log file sync1,324,320173,15213125.40Commit
DB CPU 109,560 16.07 
latch free20,2814,2062070.62Other
db file sequential read254,6203,612140.53User I/O

执行慢的语句几乎都是一条undate语句,该语句执行走主键索引。该语句执行计划一直都没有变化,且为主键索引,性能一直较好。

5y22vy8jp06yz UPDATE service_deal_info SET server_info = :1, finishserv_time = to_char(systimestamp, 'yyyy-MM-dd HH24:MI:SS.ff3'), response_info = :2, restransaction_id = :3 WHERE orchestration_id = :4 AND business_id = :5

[10:25:48]----------------------------------------------------------[10:25:48]| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)|[10:25:48]----------------------------------------------------------[10:25:48]|   0 | UPDATE STATEMENT             |                          |       |       |     1 (100)|[10:25:48]|   1 |  UPDATE                      |   SERVICE_DEAL_INFO    |       |       |            |[10:25:48]|   2 |   TABLE ACCESS BY INDEX ROWID| SERVICE_DEAL_INFO    |     1 |  2367 |     0   (0)|[10:25:48]|   3 |    INDEX UNIQUE SCAN         | PK_SERVICE_DEAL_INFO |     1 |       |     0   (0)|[10:25:48]----------------------------------------------------------

同时,发现此时还有一个delete操作,正在对改表做delete操作。初步判断改delete对其它undate有影响

3xccqv0u3f49x DELETE FROM SERVICE_DEAL_INFO WHERE 1=1 AND CALLSERV_TIME BETWEEN '2012-07-17 00:00:00.000' AND '2012-07-17 23:59:59.999'。

经过分析,比较了有无此delete时的数据库性能变化

无此删除操作时,平均逻辑读为15.47
Buffer Gets     Executions    Gets per Exec     %Total    Elapsed Time (s)    %CPU    %IO    SQL Id    SQL Module    SQL Text
1,155,588    74,721    15.47    2.81    52.61    84.2    13.3    5y22vy8jp06yz     [email protected] (TNS V1-V3)     UPDATE service_deal_info S...
有delete操作时,平均逻辑读是31,829.71,增加了2000倍,整个过程中直接计划并没有变化
Buffer Gets     Executions    Gets per Exec     %Total    Elapsed Time (s)    %CPU    %IO    SQL Id    SQL Module    SQL Text
2,093,885,334    65,784    31,829.71    9.86    251,438.86    21.2    0    5y22vy8jp06yz     [email protected] (TNS V1-V3)     UPDATE service_deal_info S...


delete操作对update操作为什么有这么大的影响呢?下面在测试机上做了模拟实验,版本是11gr2

create table as select * from dba_objectscreate index idx_t  on t(object_id);SQL> select * from t where object_id=100;OWNER                          OBJECT_NAME                                                                                  SUBOBJECT_NAME------------------------------ -------------------------------------------------------- ------------------------------ OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE         CREATED             LAST_DDL_TIME       TIMESTAMP           STATUS  T G S  NAMESPACE EDITION_NAME---------- -------------- ------------------- ------------------- ------------------- ------------------- ------- - - - ---------- ------------------------------SYS                            ORA$BASE       100                EDITION             2012-07-10 15:27:10 2012-07-10 15:28:10 2012-07-10:15:27:10 VALID   N N N     64----------------------Plan hash value: 1594971208-------------------------------------------------| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |-------------------------------------------------|   0 | SELECT STATEMENT            |       |   128 | 11392 |   141   (0)| 00:00:02 ||   1 |  TABLE ACCESS BY INDEX ROWID| T     |   128 | 11392 |   141   (0)| 00:00:02 ||*  2 |   INDEX RANGE SCAN          | IDX_T |   137 |       |     3   (0)| 00:00:01 |-------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("OBJECT_ID"=100)í3??D??¢----------------------          0  recursive calls          0  db block gets          4  consistent gets          0  physical reads          0  redo size       1612  bytes sent via SQL*Net to client        520  bytes received via SQL*Net from client          2  SQL*Net roundtrips to/from client          0  sorts (memory)          0  sorts (disk)          1  rows processedSQL> delete t where object_id >100;ò?é?3y13643DD?£

再次查询表t,逻辑读变为83,并且产生了172的redo size,且多次查询逻辑读均为83,且有了少量redo

SQL> select * from t where object_id=100;OWNER                          OBJECT_NAME                                                                                  SUBOBJECT_NAME------------------------------ -------------------------------------------------------- ------------------------------ OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE         CREATED             LAST_DDL_TIME       TIMESTAMP           STATUS  T G S  NAMESPACE EDITION_NAME---------- -------------- ------------------- ------------------- ------------------- ------------------- ------- - - - ---------- ------------------------------SYS                            ORA$BASE       100                EDITION             2012-07-10 15:27:10 2012-07-10 15:28:10 2012-07-10:15:27:10 VALID   N N N     64?′DD????----------------------Plan hash value: 1594971208-------------------------------------------------| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |-------------------------------------------------|   0 | SELECT STATEMENT            |       |   128 | 11392 |   141   (0)| 00:00:02 ||   1 |  TABLE ACCESS BY INDEX ROWID| T     |   128 | 11392 |   141   (0)| 00:00:02 ||*  2 |   INDEX RANGE SCAN          | IDX_T |   137 |       |     3   (0)| 00:00:01 |-------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("OBJECT_ID"=100)í3??D??¢----------------------          0  recursive calls          0  db block gets         83  consistent gets          0  physical reads        172  redo size       1612  bytes sent via SQL*Net to client        520  bytes received via SQL*Net from client          2  SQL*Net roundtrips to/from client          0  sorts (memory)          0  sorts (disk)          1  rows processed09:57:50 SQL> /-------------------------------------------------| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |-------------------------------------------------|   0 | SELECT STATEMENT            |       |   128 | 11392 |   141   (0)| 00:00:02 ||   1 |  TABLE ACCESS BY INDEX ROWID| T     |   128 | 11392 |   141   (0)| 00:00:02 ||*  2 |   INDEX RANGE SCAN          | IDX_T |   137 |       |     3   (0)| 00:00:01 |-------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("OBJECT_ID"=100)í3??D??¢----------------------          0  recursive calls          0  db block gets         83  consistent gets          0  physical reads        128  redo size       1612  bytes sent via SQL*Net to client        520  bytes received via SQL*Net from client          2  SQL*Net roundtrips to/from client          0  sorts (memory)          0  sorts (disk)          1  rows processed

实验看,应该是批量delete操作,导致了select操作的逻辑读变化很大,增加了20倍。为什么同样的操作,逻辑读差别这么大呢,很明显和delete操作有关。delete操作会产生大量的undo段,这undo被缓存在内存中,此时有448个undo块

SQL> @showundoSESS         User Name  STA Machine         OS  User   Process   Start Time  STA      UBLK      UREC UNDO------------ ---------- --- --------------- ---------- --------- ----------- --- --------- --------- ------------493,2841     SYS        INA dtydb3          oracle     5401      07/19  11:1 ACT       448     14464 _SYSSMU14_39

为了弄清原因,使用10046 trace进行跟踪分析

-- 开启级别为12的Trace,level后面的数字设置了Trace的级别ALTER SESSION SET EVENTS '10046 trace name context forever, level 12'--为了方面跟踪,刷新内存中的数据alter system flush buffer_cache;--再次运行select查询select * from t where object_id=100;
查看trace文件发现如下内容,该select语句要对一些数据库做物理读

*** 2012-07-19 13:53:21.117WAIT #47325232720248: nam='SQL*Net message from client' ela= 77532123 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1342677201117155CLOSE #47325232720248:c=0,e=26,dep=0,type=3,tim=1342677201117394PARSE #47325232720248:c=999,e=100,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1594971208,tim=1342677201117583EXEC #47325232720248:c=0,e=115,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1594971208,tim=1342677201117748WAIT #47325232720248: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1342677201117838WAIT #47325232720248: nam='gc cr grant 2-way' ela= 494 p1=1 p2=56761 p3=1 obj#=14679 tim=1342677201118711WAIT #47325232720248: nam='db file sequential read' ela= 703 file#=1 block#=56761 blocks=1 obj#=14679 tim=1342677201119677WAIT #47325232720248: nam='gc cr grant 2-way' ela= 355 p1=1 p2=56762 p3=1 obj#=14679 tim=1342677201120387WAIT #47325232720248: nam='db file sequential read' ela= 746 file#=1 block#=56762 blocks=1 obj#=14679 tim=1342677201121279WAIT #47325232720248: nam='gc current grant 2-way' ela= 277 p1=1 p2=56762 p3=33619969 obj#=14679 tim=1342677201121938WAIT #47325232720248: nam='db file sequential read' ela= 648 file#=4 block#=2834 blocks=1 obj#=0 tim=1342677201123117WAIT #47325232720248: nam='gc cr grant 2-way' ela= 440 p1=1 p2=32250 p3=1 obj#=13803 tim=1342677201124671WAIT #47325232720248: nam='db file sequential read' ela= 576 file#=1 block#=32250 blocks=1 obj#=13803 tim=1342677201125414WAIT #47325232720248: nam='gc current grant 2-way' ela= 1205 p1=1 p2=32250 p3=33619969 obj#=13803 tim=1342677201126887WAIT #47325232720248: nam='db file sequential read' ela= 8551 file#=4 block#=155526 blocks=1 obj#=0 tim=1342677201136316WAIT #47325232720248: nam='db file sequential read' ela= 553 file#=4 block#=155525 blocks=1 obj#=0 tim=1342677201137873WAIT #47325232720248: nam='db file sequential read' ela= 556 file#=4 block#=155524 blocks=1 obj#=0 tim=1342677201140705FETCH #47325232720248:c=10999,e=24598,p=7,cr=82,cu=0,mis=0,r=1,dep=0,og=1,plh=1594971208,tim=1342677201142513WAIT #47325232720248: nam='SQL*Net message from client' ela= 744 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1342677201143371FETCH #47325232720248:c=0,e=34,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=1594971208,tim=1342677201143493WAIT #47325232720248: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1342677201143622
继续查看,物理读的对象除了表T和相关索引,还有undo数据块

SQL> select FILE_NAME,file_id from dba_data_files where file_id in(1,4);FILE_NAME                                                FILE_ID---------------------------------------------- +DATA/hrdb/datafile/system.374.788282825                 1+DATA/hrdb/datafile/undotbs2.378.788282973               4SELECT SEGMENT_TYPE,OWNER||'.'||SEGMENT_NAME FROM DBA_EXTENTS WHERE  4 = FILE_ID AND 155524 BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS -1;SQL> SELECT SEGMENT_TYPE,OWNER||'.'||SEGMENT_NAME FROM DBA_EXTENTS WHERE  4 = FILE_ID AND 155526 BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS -1;SEGMENT_TYPE       OWNER||'.'||SEGMENT_NAME------------------ ----------------------------------------TYPE2 UNDO         SYS._SYSSMU20_2938845397$SQL> SELECT SEGMENT_TYPE,OWNER||'.'||SEGMENT_NAME FROM DBA_EXTENTS WHERE  4 = FILE_ID AND 155525 BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS -1;SEGMENT_TYPE       OWNER||'.'||SEGMENT_NAME------------------ ----------------------------------------TYPE2 UNDO         SYS._SYSSMU20_2938845397$SQL> SELECT SEGMENT_TYPE,OWNER||'.'||SEGMENT_NAME FROM DBA_EXTENTS WHERE  4 = FILE_ID AND 155524 BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS -1;SEGMENT_TYPE       OWNER||'.'||SEGMENT_NAME------------------ ----------------------------------------TYPE2 UNDO         SYS._SYSSMU20_2938845397$

总结:原因就很明显了,当批量删除数据时,如果另一个会话需要读写该表时,由于没有提交,为了保持数据的一致性,需要读取undo数据库进行数据库的还原操作。而这些读的操作往往是之前逻辑读的几十倍甚至上千倍,如果修改数据的事务长时间没有提交,会严重影响其它对改表的语句的性能。

对于大事务,特别是更新或DELETE数千万记录的大事务,在生产系统上尽量避免单条SQL一次性做。这造成的影响特别大,比如:

  • 事务可能意外中断,回滚时间很长,事务恢复时过高的并行度可能引起负载增加。
  • 表中大量的行长时间被锁住。
  • 如果事务意外中断,长时间的回滚(恢复)过程中,可能严重影响SQL性能(因为查询时需要回滚块)。
  • 事务还未提交时,影响SQL性能,比如本文中提到的情况。
  • 消耗过多UNDO空间。
  • 对于DELETE大事务,有些版本的oracle在空闲空间查找上会有问题,导致在INSERT数据时,查找空间导致过长的时间。
  • 对于RAC数据库,由于一致性读的代价更大,所以大事务的危害更大。
参考文档
http://www.laoxiong.net/full-table-scan-but-lots-of-db-file-sequential-read%E4%B8%80%E4%BE%8B.html

  相关解决方案