当前位置: 代码迷 >> 综合 >> 索引的Clustering Factor
  详细解决方案

索引的Clustering Factor

热度:26   发布时间:2023-12-14 21:35:26.0

Clustering Factor:集群因子(聚簇因子)用来描述一个表中的列是否是规则排序的

Clustering Factor的算法如下:我们知道可以通过dbms_rowid.rowid_block_number(rowid)找到记录对应的block号。索引中记录了rowid,因此oracle就可以根据索引中的rowid来判断记录是否是在同一个block中。举个例子,比如说索引中有a,b,c,d,e五个记录,首先比较a,b是否在同一个block,如果不在同一个block那么Clustering Factor +1,然后继续比较b,c同理,如果b,c不在同一个block,那么Clustering Factor+1,这样一直进行下去,直到比较了所有的记录。

    根据算法我们就可以知道clustering factor的值介于block数和表行数之间。如果clustering factor接近block数,说明表的存储和索引存储排序接近,也就是说表中的记录很有序,这样在做index range scan 的时候能,读取少量的data block就能得到我们想要的数据,代价比较小。如果clustering factor接近表记录数,说明表的存储和索引排序差异很大,在做index range scan的时候,会额外读取多个block,因为表记录分散,代价较高。

 

下面来看一个例子:

SQL> desc test;
 名称                                                                                                      是否为空? 类型
 ----------------------------------------------------------------------------------------------------------------- -------- ----------------

 OWNER                                                                                                              VARCHAR2(30)
 OBJECT_NAME                                                                                                        VARCHAR2(128)
 SUBOBJECT_NAME                                                                                                     VARCHAR2(30)
 OBJECT_ID                                                                                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                                                                                     NUMBER
 OBJECT_TYPE                                                                                                        VARCHAR2(19)
 CREATED                                                                                                            DATE
 LAST_DDL_TIME                                                                                                      DATE
 TIMESTAMP                                                                                                          VARCHAR2(19)
 STATUS                                                                                                             VARCHAR2(7)
 TEMPORARY                                                                                                          VARCHAR2(1)
 GENERATED                                                                                                          VARCHAR2(1)
 SECONDARY                                                                                                          VARCHAR2(1)

SQL> select index_name,clustering_factor from user_indexes where table_name='TEST';

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
NAMETYPE                                   25007
OBJID                                        730

SQL> select index_name,column_name,column_position from user_ind_columns;

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
OBJID                          OBJECT_ID                          1
NAMETYPE                       OBJECT_TYPE                        2
NAMETYPE                       OBJECT_NAME                        1

SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) from test;

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
                                                684

SQL> select count(*) from test;

  COUNT(*)
----------
     49947

可以看到test表一共有49949行,存储在684个block,索引OBJID建立在object_id列上,它的值与块数接近,说明表中的object_id很有可能是排过序的,我们查一下前10行验证下

SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) from test where object_id<11;

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
                                                  1

我们看到从object_id=1到object_id=10都在同一个block,说明object_id这列经过了适当排序,那么我们在对object_id这一列进行index range scan的时候就可以少读很多block,就能把结果检索出来。

 

索引NAMETYPE建立在object_name,object_type上,并且以object_name列为主导列,它的clustering factor接近行数,说明object_name没有经过适当排序,是分散的,在进行index range scan的时候 会读取较多的block.

现在我将OBJID索引反序

SQL> alter index objid rebuild reverse;

索引已更改。

SQL> select index_name,clustering_factor from user_indexes;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
SYS_IL0000052108C00036$$
OBJID                                      49945
NAMETYPE                                   25007

由于反序了索引,此时clustering factor必然很高,因为表中objid是有序的,而索引是反序的

这里也得到了一点提示,如果索引发生了热点块(cache buffers chains)竞争,可以建立一个reverse索引,不过带来的却是index range scan的花销,需要权衡利弊,看看是否有其他解决方案。

 

  相关解决方案