7.3?“New” GROUP BY Functionality “新”GROUP BY 功能?? ?????????? (page 197)
??? At times, it’s necessary to write SQL that appears as unruly as the convoluted example in Listing 7-5 so that the desired output can be obtained.? The need for writing such unwieldy SQL has become much less frequent due to the advanced functionality Oracle has included in SQL the past few years.? Much of what will be covered in this chapter is not actually new; it has been available for quite some time.
??? 很多时候,写出的像列表7-5中复杂例子那么难整的SQL是必要的,只有这样才能得到想要的输出。由于Oracle在过去几年引入的高级功能,写出这样难控制的?SQL已经变得很不常见了?。在本章中所阐述的很多内容实际上也不是什么新鲜玩意;已经有效了很长时间了。
You can start exploring some of the advanced grouping functionality in the Oracle database by
experimenting with the CUBE and ROLLUP extensions to GROUP BY, and the GROUPING function. It takes a little effort to get started, as the benefits of newer functionality are not always clear until you spend some time learning to use them.?
??? 你探索ORACLE数据库的高级分组功能,可从GROUP BY的CUBE和ROLLUP扩展,以及GROUPING函数开始。入门容易,但是新功能的优势只有在你发了很多时间学习使用它们才能体会。
7.4CUBE Extension to GROUP BY?? ? ? GROUP BY的CUBE扩展
??? The CUBE extension is not exactly a newcomer to Oracle. It was first introduced in Oracle 8i in 1999. When used with a GROUP BY clause, it will cause all possible combinations of the elements included in the arguments to CUBE to be considered for each row. This operation will generate more rows than actually exist in the table. (If there are no rows in the table, GROUP BY CUBE() will return 0 rows. ?)?
? ? CUBE扩展(函数)绝不是全新的新概念才引入Oralce的。1999年时Oracle 8i首先引入。当它用于GROUP BY子句时,将使包含在CUBE中的参数作为元素互相组合,(每种组合)视为一行。这种运算将生成比表中实际存在的更多的行。(如果表中没有一行数据,GROUP BY CUBE()将返回0行。)
??? Let’s look at an example that generates all possible combinations of FIRST_NAME and LAST_NAME for each row in the HR.EMPLOYEES table.? The CUBE function was intended for use in generating cross-tab reports with lots of numbers and dollar signs.? When trying to understand new functionality, I find it helps to dumb down the SQL a bit so I can see what’s going on without getting distracted with subtotals. Examine Listing 7-8 to see the results of using CUBE as described with the HR.EMPLOYEES table.? You will see that there are three rows returned for most employees.? In other words, there are 301 rows returned, even though there are only 107 rows in the table. ?
??? 让我们看一个例子,对于HR.EMPLOYEES表的每一行,生成所有可能的FIRST_NAME和LAST_NAME组合。CUBE函数的意图是用来生成带有很多数字和美元符号的交叉表格报告。当试图理解一项新功能,我发现把SQL写的“傻瓜化”一些是有帮助的,这样我能看清发生了什么,而不是渐渐对小计(求和)搞得烦躁。考察列表7-8检查所述对HR.EMPLOYEES表使用CUBE的结果。你将发现对于大多数雇员会返回三行。换句话说,返回了 301行,而不是表中仅有的107行。
Listing 7-8. CUBE Operation on HR.EMPLOYEES??????? HR.EMPLOYEES上的CUBE运算
SQL> set autotrace on statistics
?
? 1? with emps as (
? 2???? select /*+ gather_plan_statistics */
? 3???????????? last_name
? 4???????????? , first_name
? 5???? from hr.employees
? 6???? group by cube(first_name,last_name)
? 7? )
? 8? select rownum
? 9???? , last_name
?10???? , first_name
?11? from emps;
??
??? ROWNUM LAST_NAME???????????????? FIRST_NAME
---------- ------------------------- --------------------
???????? 1
???????? 2?????????????????????????? Ki
???????? 3?????????????????????????? TJ ?
???????? 4?????????????????????????? Den
???????? 5?????????????????????????? Guy
???????? 6?????????????????????????? Lex
???????? 7?????????????????????????? Pat?
…
?????? 231 Vargas
?????? 232 Vargas??????????????????? Peter
?????? 233 Whalen
?????? 234 Whalen??????????????????? Jennifer
?????? 235 De Haan
?????? 236 De Haan?????????????????? Lex
?????? 237 Everett
?????? 238 Everett?????????????????? Britney
…
301 rows selected. ?
Statistics
---------------------------------------------------
??????? 759? recursive calls
????????? 0? db block gets
??????? 188? consistent gets
????????? 9? physical reads
????????? 0? redo size
?????? 5990? bytes sent via SQL*Net to client
??????? 557? bytes received via SQL*Net from client
????????? 5? SQL*Net roundtrips to/from client
????????? 7? sorts (memory)
????????? 0? sorts (disk)
??????? 301? rows processed
?
PLAN_TABLE_OUTPUT
---------------------------------------
| Id? | Operation????????????????????????????????????? | Name??????????????? | Starts | E-Rows | A-Rows |
---------------------------------------
|?? 0 | SELECT STATEMENT???????????????? |??????????? ? ? ????????? |????? 1 |?????? ? ? ? ?? |??? 301 |
|?? 1 |? COUNT??????????????????????????????????????? |????????????????? ? ? ? ? |????? 1 |?????? ? ? ? ? ? |??? 301 |
|?? 2 |?? VIEW?????????????????????????????????????????? |?????????????? ? ? ? ? ?? |????? 1 |??? 107 ? ?? |??? 301 |
|?? 3 |??? SORT GROUP BY??????? ? ? ? ? ? ?? |???????????????????????? |????? 1 |??? 107 ? ?? |??? 301 |
|?? 4 |???? GENERATE CUBE??????????????????? |???????????????????????? |????? 1 |??? 107????? |??? 428 |
|?? 5 |????? SORT GROUP BY NOSORT? |????????????????????????? |????? 1 |??? 107????? |??? 107 |
|?? 6 |?????? INDEX FULL SCAN??????????????? | EMP_NAME_IX |????? 1 |??? 107???? |??? 107 |
---------------------------------------
??? Table 7-1 shows why there are three rows returned for each name pair.? For each LAST_NAME,
FIRST_NAME pair, CUBE will substitute NULL for each element in turn. The rows generated by CUBE are
referred to in the Oracle documentation as superaggregate rows, which are recognizable by the NULL
values placed in the columns being operated in.? The results described in Table 7-1 appear in the output
in Listing 7-8 due to the GROUP BY CUBE(FIRST_NAME,LAST_NAME) operation.?? ?
??? 表7-1展示了为什么每一名字对将返回3行。对每一LAST_NAME,FIRST_NAME对,CUBE将轮流用NULL替换每一元素。这些被CUBE所产生的行在Oracle文档中被称之为超聚合行 ,通过(查看)在运算中哪些列被设置成NULL就可以识别出。由于GROUP BY CUBE(FIRST_NAME,LAST_NAME) 的运算,?表7-1所述的结果出现在了列表7-8的输出中。
Table 7-1. CUBE Operation
First Name???????????? Last Name
Vance????????????????????? Jones
Vance????????????????????? NULL
NULL????????????????????? Jones
?
??? Did you notice that the first row returned in Listing 7-8 contained NULL for both LAST_NAME and
FIRST_NAME?? When considering all possible combinations of a pair of arguments to CUBE, as seen in
Listing 7-8, there is a combination of (NULL, NULL) that is returned for each row in the GENERATE CUBE
operation. These 428 rows are then processed by the SORT GROUP BY operation, which removes all but
one of the NULL pair of columns to produce the final 301 rows to satisfy the query.
??? 你是否注意到列表7-8返回的第一行,对于LAST_NAME和FIRST_NAME都是NULL。当考虑对CUBE一对参数的所有可能组合,如列表 7-8所见,GENERATE CUBE运算对每一行都返回一个(NULL,NULL)的组合。有428行被SOR GROUP BY操作处理,它只保留了一行,其它都去除,最终生成了满足查询的301行。
??? Knowing how CUBE operates, you can predict how many rows should be created when using GROUP BY CUBE.? Listing 7-9 shows that the number of rows returned can be predicted by adding together the count for three different distinct combinations of names, and adding 1 to that to account for the null pair. ?
?? 知道CUBE是如何运算的了,你就能预测使用GROUP BY CUBE将生成多少行数据。列表7-9展示了,通过汇总三种不同的distinct名字组合count值,外加1(计入null值对)。
Listing 7-9. Predicting CUBE Return Rows
? 1? with counts as (
? 2???? select
? 3???????????? count(distinct first_name) first_name_count
? 4???????????? , count(distinct last_name) last_name_count
? 5???????????? , count(distinct(first_name||last_name)) full_name_count
? 6???? from hr.employees
? 7? )
? 8? select
? 9???? first_name_count
?10???? , last_name_count
?11???? , full_name_count
?12???? , first_name_count + last_name_count + full_name_count + 1 total_count
?13? from counts;
?
FIRST_NAME_COUNT ? ? LAST_NAME_COUNT?????? FULL_NAME_COUNT????? TOTAL_COUNT
----------------??????????????????????? --------------- ? ? ? ? ? ? ? ? ? ? ? ?? --------------- ? ? ? ? ? ? ? ? ? ? ? ? -----------
????????????? 91??????????? ? ? ? ? ? ? ? 102????????????????????????????????????? 107??????? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 301
?
1 row selected.
??? You can simulate the operation of CUBE by using SQL to reproduce the steps taken by the database, both to see how the operation works and to see just how much work the database is saving you by using GROUP BY CUBE.?
??? 你能通过用SQL语句重现在数据库中的步骤来模拟CUBE运算,既能看出运算是怎么进行的也能看出数据库使用GROUP BY CUBE为你节省了多少工作。
??? By examining the execution plan shown in Listing 7-8, you can see that the SORT GROUP BY NOSORT operation (step 5) returns 107 rows to the GROUP BY CUBE operation (step 4), which in turn generates 428 rows. Why are 428 rows generated? Listing 7-10 shows that 428 is the expected number of rows if all combinations of LAST_NAME and FIRST_NAME are generated.? The GROUP BY then reduces the output to 301 rows, just as the CUBE extension did, but with an important difference: the manual method of UNION ALL and GROUP BY employed in Listing 7-10 required three full scans of the EMP_NAME_IX index and one full ?scan of the EMP_EMAIL_UK index.? Contrast this to the single full scan of the EMP_NAME_IX index in Listing 7-8 as performed by the GROUP BY extension.
??? 通过检查列表7-8展示的执行计划,你可看到SORT ROUP BY NOSORT操作(步骤5)返回107行给GROUP BY CUBE运算(步骤4),后者返回428行。为什么会生成428行?列表7-10显示428是预计之中的行数,如果生成所有的LAST_NAME和 FIRST_NAME组合。而然后GROUP BY减少输出行到301,和CUBE扩展做的一样,但是有一个重要的区别:列表7-10中所使用的 UNION ALL和 GROUP BY 的手动方法需要三次全扫描EMP_NAME_IX索引还有一次全扫描EMP_EMAIL_UK索引。相比之下在列表7-8中执行GROUP BY扩展仅需要一次扫描EMP_NAME_IX索引。
??? The CUBE extension didn’t just reduce the SQL required to generate the same data as the UNION ALL and GROUP BY combination did, it also reduced the number of full index scans from four to one.? The optimizer chose to use index EMP_EMAIL_UK rather than the EMP_NAME_IX index, resulting in 10 physical reads rather than the nine seen in Listing 7-8.? Using the small data set in the Oracle demo schemas does not cause a large difference in execution time for the example queries.? With large data sets, however, the effect of using four INDEX FULL SCAN operations rather than just one would be quite obvious. ?
???? CUBE扩展不仅减化了UNION ALL和GROUP BY组合?生成相同数据(结果集)所需的SQL(代码量),还将全索引扫描的次数由4次减少到1次。优化器选择使用索引EMP_EMAIL_UK而不是 EMP_NAME_ID,导致10次物理读而不是在列表7-8中看到的9次。对例子所举的查询使用的是Oracle 演示shema的小批量数据不能再执行时间上产生很大的区别。然而随着数据量的增大,使用四次全索引扫描操作而不是一次的效果就会变得非常明显。
Listing 7-10.? Generate CUBE Rows with UNION ALL?
? 1? with emps as (
? 2???? select last_name, first_name from hr.employees
? 3? ) ,
? 4? mycube as (
? 5??? select last_name, first_name from emps
? 6??? union all
? 7??? select last_name, null first_name from emps
? 8??? union all
? 9??? select null last_name, first_name from emps
?10??? union all
?11??? select null last_name, null first_name from emps
?12? )
?13? select /*+ gather_plan_statistics */ *
?14? from mycube
?15? group by last_name, first_name;
LAST_NAME???????????????? FIRST_NAME
------------------------- --------------------
Atkinson????????????????? Mozhe
Bissot??????????????????? Laura
Grant???????????????????? Kimberely
…
301 rows selected. ?
Statistics
----------------------
??????? 759? recursive calls
????????? 0? db block gets
??????? 191? consistent gets
???????? 10? physical reads
????????? 0? redo size
?????? 5477? bytes sent via SQL*Net to client
??????? 557? bytes received via SQL*Net from client
????????? 5? SQL*Net roundtrips to/from client
????????? 6? sorts (memory)
????????? 0? sorts (disk)
??????? 301? rows processed ?
PLAN_TABLE_OUTPUT
---------------------------------
| Id? | Operation???????????????????????????? | Name????????????????? | Starts | E-Rows | A-Rows
---------------------------------
|?? 0 | SELECT STATEMENT??????? |???????????? ? ? ?????????? |????? 1?? |????????????? |??? 301
|?? 1 |? HASH GROUP BY??? ? ? ? ? ? |??????????????????????????? |????? 1?? |??? 428??? |??? 301
|?? 2 |?? VIEW??????????? ? ? ? ? ? ? ? ? ? ? ? |??????????????????????????? |????? 1 ? |??? 428 ?? |??? 428
|?? 3 |??? UNION-ALL?????????????????????? |???????????? ? ? ? ? ? ? ?? |????? 1?? |???????????? |??? 428
|?? 4 |???? INDEX FULL SCAN???????? | EMP_NAME_IX? |????? 1?? |??? 107??? |??? 107
|?? 5 |???? INDEX FULL SCAN???????? | EMP_NAME_IX? |????? 1 ? |??? 107 ?? |??? 107
|?? 6 |???? INDEX FULL SCAN???????? | EMP_NAME_IX? |????? 1?? |??? 107??? |??? 107
|?? 7 |???? INDEX FULL SCAN???????? | EMP_EMAIL_UK |????? 1? |??? 107??? |??? 107
---------------------------------- ?
?
?
?