Oracle分析函数小结(3)- 数值分布 - cume_dist,percent_rank,ntile,percentile_disc,percentile_cont,ratio_to_report_报表

Oracle分析函数总结(3)- 数值分布 - cume_dist,percent_rank,ntile,percentile_disc,percentile_cont,ratio_to_report

分析函数的基本概念和语法 -> http://blog.csdn.net/fw0124/article/details/7842039

下面这些函数计算某个值在一组有序数据中的累计分布(cumulative distribution)

1) cume_dist()
计算结果为相对位置/总行数。返回值(0,1]。
例如在一个5行的组中，返回的累计分布值为0.2,0.4,0.6,0.8,1.0;
注意对于重复行，计算时取重复行中的最后一行的位置。

a)作为聚合函数的用法
语法：cume_dist(expr) with group (order by exp)
cume_dist()用参数中的指定的数据构造一条假定的数据并插入到现存行中，
然后计算这条假定数据在所有行中的相对位置。
例如下面的查询中,emp中总共有14行数据，
假定的数据(deptno=20,sal=4000)会插入到第9行，
因此相对位置 9/(14+1)=0.6。

[email protected]> select cume_dist(20,4000)  2      within group (order by deptno, sal) cume_dist  3  from emp; CUME_DIST----------        .6

b)作为分析函数的用法
语法：cume_dist() over([partition_clause] order_by_clause)
例如计算每个人在本部门按照薪水排列中的相对位置。

[email protected]> select ename, sal, deptno,  2      cume_dist() over(partition by deptno order by sal) cume_dist  3   from emp;ENAME                       SAL     DEPTNO  CUME_DIST-------------------- ---------- ---------- ----------MILLER                     1300         10 .333333333CLARK                      2450         10 .666666667KING                       5002         10          1SMITH                       800         20         .2ADAMS                      1100         20         .4JONES                      2975         20         .6SCOTT                      3000         20          1FORD                       3000         20          1JAMES                       950         30 .166666667MARTIN                     1250         30         .5WARD                       1250         30         .5TURNER                     1500         30 .666666667ALLEN                      1600         30 .833333333BLAKE                      2850         30          1已选择14行。

2) percent_rank()
和cume_dist的不同点在于计算分布结果的方法。
计算方法为 (相对位置-1)/(总行数-1),
因此第一行的结果为0。返回值[0,1]。
例如在一个5行的组中，返回的累计分布值为0,0.25,0.5,0.75,1.0;
注意对于重复行，计算时取重复行中的第一行的位置。

a)作为聚合函数的用法
语法：percent_rank(expr) with group (order by exp)
下面的例子中,类似cume_dist，假定数据(deptno=20,sal=4000)会插入到第9行，
计算相对位置 (9-1)/((14+1)-1)=0.57

[email protected]> select percent_rank(20,4000)  2      within group (order by deptno, sal) percent_rank  3  from emp;PERCENT_RANK------------  .571428571

b)作为分析函数的用法
语法： percent_rank() over([partition_clause] order_by_clause)
例如计算每个人在本部门按照薪水排列中的相对位置。

[email protected]> select ename, sal, deptno,  2      percent_rank() over(partition by deptno order by sal) percent_rank  3  from emp;ENAME                       SAL     DEPTNO PERCENT_RANK-------------------- ---------- ---------- ------------MILLER                     1300         10            0CLARK                      2450         10           .5KING                       5002         10            1SMITH                       800         20            0ADAMS                      1100         20          .25JONES                      2975         20           .5SCOTT                      3000         20          .75FORD                       3000         20          .75JAMES                       950         30            0MARTIN                     1250         30           .2WARD                       1250         30           .2TURNER                     1500         30           .6ALLEN                      1600         30           .8BLAKE                      2850         30            1已选择14行。

3) percentile_disc()
返回一个与输入的分布百分比值相对应的数据值，
分布百分比的计算方法同函数CUME_DIST()，
如果没有正好对应的数据值，就取大于该分布值的下一个值。

a)作为聚合函数的用法
语法： percentile_disc(expr) with group (order by expr)
例如，查询各部门中薪水分布处于40%位置的人的薪水。

[email protected]> select deptno,  2       percentile_disc(0.4) within group (order by sal) percentile_disc  3  from emp group by deptno;    DEPTNO PERCENTILE_DISC---------- ---------------        10            2450        20            1100        30            1250

2)作为分析函数的用法
语法：percentile_disc(expr) with group (order by expr) over([partition_clause])
例如，查询各部门中薪水分布处于40%位置的人的薪水。注意和cume_dist()的输出结果作对照。

[email protected]> select ename, sal, deptno,  2  cume_dist() over(partition by deptno order by sal) cume_dist,  3  percentile_disc(0.4) within group (order by sal)  4      over(partition by deptno) percentile_disc  5  from emp;ENAME                       SAL     DEPTNO  CUME_DIST PERCENTILE_DISC-------------------- ---------- ---------- ---------- ---------------MILLER                     1300         10 .333333333            2450CLARK                      2450         10 .666666667            2450KING                       5002         10          1            2450SMITH                       800         20         .2            1100ADAMS                      1100         20         .4            1100JONES                      2975         20         .6            1100SCOTT                      3000         20          1            1100FORD                       3000         20          1            1100JAMES                       950         30 .166666667            1250MARTIN                     1250         30         .5            1250WARD                       1250         30         .5            1250TURNER                     1500         30 .666666667            1250ALLEN                      1600         30 .833333333            1250BLAKE                      2850         30          1            1250已选择14行。

4) percentile_cont()
返回一个与输入的分布百分比值相对应的数据值，分布百分比的计算方法同函数PERCENT_RANK()，
如果没有正好对应的数据值，就通过下面算法来得到值：
RN = 1+ (P*(N-1)) 其中P是输入的分布百分比值，N是组内的行数
CRN = CEIL(RN) FRN = FLOOR(RN)
If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)
上面的算法比较难理解，实际上可以看作X轴上最小坐标0，最大坐标(行数-1),
先根据输入百分比找到对应的X轴坐标，然后按比例计算对应的数值。

a)作为聚合函数的用法
语法percentile_cont(expr) with group (order by expr)
下面的例子中,部门号30中薪水处于50%位置的数额计算方法：
RN=1+(0.5*(6-1))=3.5;CRN=4;FRN=3;
结果=(4-3.5)*1250+(3.5-3)*1500=1375

[email protected]> select deptno,  2  percentile_cont(0.5) within group (order by sal) percentile_cont  3  from emp group by deptno;    DEPTNO PERCENTILE_CONT---------- ---------------        10            2450        20            2975        30            1375

b)作为分析函数的用法
语法：percentile_cont(expr) with group (order by expr) over([partition_clause])
例如，计算各部门中薪水处于50%位置的薪水数额，注意和percent_rank()的输出结果作对照。

[email protected]> select ename, sal, deptno,  2  percent_rank() over(partition by deptno order by sal) percent_rank,  3  percentile_cont(0.5) within group (order by sal)  4      over(partition by deptno) percentile_cont  5  from emp;ENAME                       SAL     DEPTNO PERCENT_RANK PERCENTILE_CONT-------------------- ---------- ---------- ------------ ---------------MILLER                     1300         10            0            2450CLARK                      2450         10           .5            2450KING                       5002         10            1            2450SMITH                       800         20            0            2975ADAMS                      1100         20          .25            2975JONES                      2975         20           .5            2975SCOTT                      3000         20          .75            2975FORD                       3000         20          .75            2975JAMES                       950         30            0            1375MARTIN                     1250         30           .2            1375WARD                       1250         30           .2            1375TURNER                     1500         30           .6            1375ALLEN                      1600         30           .8            1375BLAKE                      2850         30            1            1375已选择14行。

5) ntile()
只能用作分析函数。
语法: ntile(expr) over([partition_clause] order_by_clause)
可以看作是把有序的数据集合平均分配到expr指定的数量的桶中,将桶号分配给每一行。
如果不能平均分配，则较小桶号的桶分配额外的行，并且各个桶中能放的行数最多相差1。
例如如果桶的数量为4，一共有6行，则：
1号桶:1行,2行;
2号桶:3行,4行;
3号桶:5行;
4号桶:6行;
例如，NTILE列返回每一行所应该被分配的桶号。

[email protected]> select ename, sal, deptno,  2      ntile(4) over(partition by deptno order by sal) ntile  3   from emp;ENAME                       SAL     DEPTNO      NTILE-------------------- ---------- ---------- ----------MILLER                     1300         10          1CLARK                      2450         10          2KING                       5002         10          3SMITH                       800         20          1ADAMS                      1100         20          1JONES                      2975         20          2SCOTT                      3000         20          3FORD                       3000         20          4JAMES                       950         30          1MARTIN                     1250         30          1WARD                       1250         30          2TURNER                     1500         30          2ALLEN                      1600         30          3BLAKE                      2850         30          4已选择14行。

6) ratio_to_report()
计算当前行的数值在分组中所有行的数值总和中所占的比例。
语法：ratio_to_report(expr) over([partition_clause])
例如，计算每个人的薪水在部门薪水总额中的比例。

[email protected]> select ename, sal, deptno,  2      sum(sal) over(partition by deptno) dept_sal_sum,  3      ratio_to_report(sal) over(partition by deptno) ratio_to_report  4   from emp;ENAME                       SAL     DEPTNO DEPT_SAL_SUM RATIO_TO_REPORT-------------------- ---------- ---------- ------------ ---------------CLARK                      2450         10         8752      .279936015KING                       5002         10         8752      .571526508MILLER                     1300         10         8752      .148537477JONES                      2975         20        10875      .273563218FORD                       3000         20        10875      .275862069ADAMS                      1100         20        10875      .101149425SMITH                       800         20        10875      .073563218SCOTT                      3000         20        10875      .275862069WARD                       1250         30         9400      .132978723TURNER                     1500         30         9400      .159574468ALLEN                      1600         30         9400      .170212766JAMES                       950         30         9400       .10106383BLAKE                      2850         30         9400      .303191489MARTIN                     1250         30         9400      .132978723已选择14行。