《Pro Oracle SQL》Chapter 三 - 3.3.2 Sort-Merge Joins_SQL

《Pro Oracle SQL》Chapter 3 -- 3.3.2 Sort-Merge Joins

Sort-Merge Joins??? 排序-合并连接????? (page 114)
?? Sort-merge joins read the two tables to be joined independently, sorts the rows from each table (but only those rows that meet the conditions for the table in the? WHERE? clause) in order by the join key, and then merges the sorted rowsets.? The sort operations are the expensive part for this join method.? For large row sources that won’t fit into memory, the sorts will end up using temporary disk space to complete.? This can be quite memory and time-consuming to complete.? But once the rowsets are sorted, the merge happens quickly.? To merge, the database alternates down the two lists, compares the top rows, discards rows that are earlier in the sort order than the top of the other list, and only returns matching rows.
??? 排序-合并连接读取独立连接的两张表，每表都把行按连接键排序(但只有那些满足WHERE子句条件的行)，再合并排序好的行集。排序操作是这种连接方法“昂贵”的部分。对于大的行源内存中就可能放不下，排序最终就必须放到临时磁盘空间去完成。这将需要很多内存且消耗时间。但是一旦行集排序好了，合并将是很快的。合并时，数据库自上而下交替比较两列的连接键值，抛弃(连接键值)在一边有序列中有而另一边没有的行，而只返回匹配行。(注：上面这句话如果直接翻译的话是“抛弃(连接键值)在一有序列中较另一列中早出现的行”，怎么都觉得原文有错)。
??? Let’s use the same query used earlier and break it down into how the sort-merge join would be processed.??
??? 让我们用之前使用的相同查询，分解它，看看排序-合并连接如何进行的。
select empno, ename, dname, loc
from emp, dept
where emp.deptno = dept.deptno
?
This query would be processed as if it we rewritten like the following pseudocode:???? 查询过程就如下重写的伪代码：
?
select empno, ename, deptno from emp order by deptno
?
select dname, loc, deptno from dept order by deptno
?
compare the rowsets and return rows where deptno in both lists match?? 比较行集且返回在两个列表中都匹配的deptno的行集。
?
for an outer join, compare the rowsets and return all rows from the first list??? 对于外连接，比较行集且返回至第一列中的所有行
?
setting column values for the other table to null ?????????????????????????????????????????? 设定另一表的列值为null
注附上上述查询结果集：

???	EMPNO	ENAME	DEPTNO		DEPTNO	DNAME	LOC
1	7782	CLARK	10		10	ACCOUNTING	NEW YORK
2	7934	MILLER	10		20	RESEARCH	DALLAS
3	7839	KING	10		30	SALES	CHICAGO
4	8001	Tomcat	11		40	OPERATIONS	BOSTON
5	8000	Jack	11
6	7566	JONES	20
7	7369	SMITH	20
8	7902	FORD	20
9	7876	ADAMS	20
10	7788	SCOTT	20
11	7521	WARD	30
12	7844	TURNER	30
13	7499	ALLEN	30
14	7900	JAMES	30
15	7654	MARTIN	30
16	7698	BLAKE	30


黄色为	左边列抛弃行			绿色为	右边列抛弃行

Listing 3-19 shows the plan for this query.
Listing 3-19. Sort-Merge Join?
SQL> select /*+ ordered */ empno, ename, dname, loc
? 2? from scott.dept, scott.emp
? 3? where emp.deptno = dept.deptno;
?
-----------------------------------------
| Id? | Operation?????????????????? ? ? ? ? ? ? ? ? ? ? ? ? | Name?? ? ? ? | Rows? | Bytes | Cost (%CPU)|
-----------------------------------------
|?? 0 | SELECT STATEMENT??????????? ? ? ? ? ? ?? |??????? ? ? ????? |??? 14??? |?? 462? |???? 6? (17)??? |
|?? 1 |?? MERGE JOIN?????????????????????????????????? |??????? ? ? ? ? ? |??? 14??? |?? 462? |???? 6? (17)???? |
|?? 2 |? ?? TABLE ACCESS BY INDEX ROWID | DEPT??????? |???? 4 ? ? |??? 80?? |???? 2?? (0)???? |
|?? 3 |???? ? INDEX FULL SCAN??????????????????????? | PK_DEPT |???? 4 ? ?? |????? ? ?? |???? 1?? (0)???? |
|*? 4 |???? SORT JOIN???????????????? ? ? ? ? ? ? ? ? ?? |????????????????? |??? 14 ?? |?? 182?? |???? 4? (25)??? |
|?? 5 |??????? TABLE ACCESS FULL????????????????? | EMP???????? |??? 14 ? ? |?? 182? |???? 3?? (0)????? |
-----------------------------------------
?
Predicate Information (identified by operation id):
---------------------------------------------------
?
?? 4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
?????? filter("EMP"."DEPTNO"="DEPT"."DEPTNO")
?
??? I used the same query as before but had to force the plan with an? ordered? hint.? Notice how the plan
operations show a? MERGE JOIN operation followed by an index access on the? dept table and a? SORT JOIN? operation of a full table scan on the? emp table.? The first thing to note is the use of the index scan on dept.? In this case, the optimizer chose to read the table data from the index since the index would
return the data in sorted order.? That means a separate sort step could be avoided.? The emp table was
full scanned and required a separate sort step since there was no index on? deptno that could be used.?
After both rowsets were ready and in sorted order, they were merged together.
??? 我使用前面相同的查询，但是需要用ordered提示强制(选择)计划。注意计划操作显示MERGE JOIN操作后面跟一个dept表的索引访问，和在emp表上全表扫描的SORT JOIN操作。首先要注意的是使用了在dept上的索引。在这种情况下，优化器选择从索引读取表数据因为索引将返回有序的数据。这意味着单独的排序步骤就可以省去。emp表是全表扫描而且需要单独的排序步骤，因为在deptno列上没有索引可用。在两个行集准备好且排序完后，他们再合并在一起。
??? A sort-merge join will access the blocks needed and then do the work to sort and merge them in
memory (or by using temp disk space if there isn’t enough memory).? So, when you do a comparison of
logical reads for a sort-merge join to a nested loops join, particularly for a query against a larger row
source, you will likely find that there are more block accesses required for the nested loops join.? Does
that mean that the sort-merge is a better choice?? It depends.? You have to take into account all the work required to complete the sort and merge steps and realize that work may end up taking much more
time than doing more block accesses might.
??? 排序-合并连接将访问所需的块再在内存中(或者若没有足够内存就用临时磁盘空间)做排序和合并。因此，当你比较排序-合并连接和嵌套循环连接的逻辑读，特别是针对大行源的查询，你可能发现嵌套循环连接需要更多的块访问。是否说排序-合并连接是更佳的选择？看情况。你必须考虑完成排序和合并步骤所需的所有工作和实现这些工作可能最终要比(直接的)块访问耗费更多的时间。
??? Sort-merge joins are typically best suited to queries that have limited data filtering and return lots of
rows.? They are also often a better choice if there are no suitable indexes that can be used to access the
data more directly.? Finally, a sort-merge is often the best choice when the join is an inequality.? For
example, a join condition of? WHERE table1.column1 between table2.column1 and table2.column2? would
be a candidate for a sort-merge.? As you’ll see in the next section, a hash join is not possible for such a
join; if the row sources are large, the sort-merge will likely be the only viable choice.
??? 排序-合并连接典型的最适合于有限的数据筛选而返回很多行。若没有合适的索引可直接的访问数据，它通常也是更好的选择。最后，排序-合并通常是不等值连接的最佳选择。例如连接条件WHERE table1.column1 between table2.column1 and table2.column2排序-合并将是候选者。在下一节你将看到，对于这种连接(情况)不可能用哈希连接(完成)。若是大行集，排序-合并很可能就是唯一可行的选择。