Analytic functions in oracle

Last updated on May 19th, 2019 at 06:01 pm

This post is part of SQL tutorial and we would be discussing Analytic functions in oracle with examples, detailed explanation .

We have already studied about Aggregate function like avg ,sum ,count. Lets take an example

select count(*) from EMP;

select sum (bytes) from dba_segments where tablespace_name='TOOLS';

select deptno ,count(*) from emp group by deptno;

SQL> desc emp
Name Null? Type
----------------------------------------- -------- ----------------------------
EMPNO NOT NULL NUMBER(4)
ENAME VARCHAR2(10)
JOB VARCHAR2(9)
MGR NUMBER(4)
HIREDATE DATE
SAL NUMBER(7,2)
COMM NUMBER(7,2)
DEPTNO NUMBER(2)

SQL> select deptno ,count(*) from emp group by deptno;

DEPTNO COUNT(*)
---------- ----------
30              6
20              4
10              3

Here we can see that it reduces the number of rows in each of the queries. Now questions comes what to do if we need to have all the rows returned with count(*) also

For that oracle has provided a sets of analytic functions. So to solve the last problem , we can write as

select empno ,deptno , count(*) over (partition by deptno) from emp group by deptno;

SQL> select empno ,deptno , count(*) over (partition by deptno) from emp;

EMPNO DEPTNO COUNT(*)OVER(PARTITIONBYDEPTNO)
---------- ---------- -------------------------------
7839          10                  3
7782          10                  3
7934          10                  3
7788          20                  4
7876          20                  4
7902          20                  4
7369          20                  4
7844          30                  6
7654          30                  6
7698          30                  6
7900          30                  6
7499          30                  6
7521          30                  6

Here count(*) over (partition by dept_no) is the analytical version of the count aggregate function.

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause.

Here is the general syntax

analytic_function([ arguments ]) OVER ([ query_partition_clause ] [ order_by_clause [ windowing_clause ] ])

Example

count(*) over (partition by deptno)

avg(Sal) over (partition by deptno)

Lets go over each part

query_partition_clause
It defined the group of rows. It can like below

partition by deptno : group of rows of same deptno
or
() : All rows

select empno ,deptno , count(*) over () from emp;

SQL> select empno ,deptno , count(*) over () from emp;

EMPNO DEPTNO COUNT(*)OVER()
---------- ---------- --------------
7839 10 13
7698 30 13
7788 20 13
7902 20 13
7369 20 13
7499 30 13
7782 10 13
7521 30 13
7654 30 13
7844 30 13
7876 20 13
7900 30 13
7934 10 13

13 rows selected.

[ order_by_clause [ windowing_clause ] ]

This clause is used when you want to order the rows in the partition. This is particularly useful if you want analytical function to consider the order of the rows.

Example will be row_number function

SQL> select
deptno, ename, sal, 2
3 row_number() over (partition by deptno order by sal) "row_number" from emp;

DEPTNO ENAME SAL row_number
---------- ---------- ---------- ----------
10 allen 0 1
10 CLARK 0 2
10 MILLER 0 3
20 SMITH 800 1
20 ADAMS 1100 2
20 FORD 3000 3
20 SCOTT 3000 4
30 JAMES 950 1
30 MARTIN 1250 2
30 WARD 1250 3
30 TURNER 1500 4
30 ALLEN 1600 5
30 BLAKE 2850 6

Another example would be

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 6800
20 ADAMS 23-MAY-87 1100 7900
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 7200
30 MARTIN 28-SEP-81 1250 8450
30 JAMES 03-DEC-81 950 9400

13 rows selected.

Windowing_clause

This is always used with order by clause and give more control over the set of the rows in the group

With Windowing clause, For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

When using order by clause and nothing is given for windowing_clause,below default value of the windowing_clause is taken
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW or RANGE UNBOUNDED PRECEDING
It means “The current and previous rows in the current partition are the rows that should be used in the computation”

The example below clearly states this. This goes the running average in the department

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 6800
20 ADAMS 23-MAY-87 1100 7900
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 7200
30 MARTIN 28-SEP-81 1250 8450
30 JAMES 03-DEC-81 950 9400

13 rows selected.

Now windowing_clause can be defined with number of ways
Lets first understand the terminology

ROWS specifies the window in physical units (rows).
RANGE specifies the window as a logical offset. the RANGE windowing clause can be used only with  ORDER BY clauses containing columns or expressions of numeric or date datatypes
PRECEDING – get rows before the current one.
FOLLOWING – get rows after the current one.
UNBOUNDED – when used with PRECEDING or FOLLOWING, it returns all before or after. CURRENT ROW

So it is generally defined as

ROWS UNBOUNDED PRECEDING : The current and previous rows in the current partition are the rows that should be used in the computation

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE UNBOUNDED PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 6800
20 ADAMS 23-MAY-87 1100 7900
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 7200
30 MARTIN 28-SEP-81 1250 8450
30 JAMES 03-DEC-81 950 9400

13 rows selected.

RANGE UNBOUNDED PRECEDING : The current and previous rows in the current partition are the rows that should be used in the computation. Also since range is specified ,it all takes those values which are equal to the current rows.

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE UNBOUNDED PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 6800
20 ADAMS 23-MAY-87 1100 7900
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 7200
30 MARTIN 28-SEP-81 1250 8450
30 JAMES 03-DEC-81 950 9400

13 rows selected.

You may not see the difference between range and rows as hire_date is different for all.The difference will become more clear if we use sal as order by clause

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by sal RANGE UNBOUNDED PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 allen 17-NOV-81 0 0
10 CLARK 09-JUN-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 ADAMS 23-MAY-87 1100 1900
20 FORD 03-DEC-81 3000 7900
20 SCOTT 19-APR-87 3000 7900
30 JAMES 03-DEC-81 950 950
30 MARTIN 28-SEP-81 1250 3450
30 WARD 22-FEB-81 1250 3450
30 TURNER 08-SEP-81 1500 4950
30 ALLEN 20-FEB-81 1600 6550
30 BLAKE 01-MAY-81 2850 9400

13 rows selected.

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by sal ROWS UNBOUNDED PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 allen 17-NOV-81 0 0
10 CLARK 09-JUN-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 ADAMS 23-MAY-87 1100 1900
20 FORD 03-DEC-81 3000 4900
20 SCOTT 19-APR-87 3000 7900
30 JAMES 03-DEC-81 950 950
30 MARTIN 28-SEP-81 1250 2200
30 WARD 22-FEB-81 1250 3450
30 TURNER 08-SEP-81 1500 4950
30 ALLEN 20-FEB-81 1600 6550
30 BLAKE 01-MAY-81 2850 9400

13 rows selected.

You can find the difference on line 6

RANGE value_expr PRECEDING : The window begins with the row whose ORDER BY value is numeric expression rows less than, or preceding, the current row and ends with the current row being processed.

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE 365 PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 3000
20 ADAMS 23-MAY-87 1100 4100
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 7200
30 MARTIN 28-SEP-81 1250 8450
30 JAMES 03-DEC-81 950 9400

13 rows selected.

Here it takes all the rows where hiredate value falls within 365 days preceding the hiredate value of the current row

ROWS value_expr PRECEDING : The window begins with the row given and ends with the current row being processed

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS 2 PRECEDING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 800
20 FORD 03-DEC-81 3000 3800
20 SCOTT 19-APR-87 3000 6800
20 ADAMS 23-MAY-87 1100 7100
30 ALLEN 20-FEB-81 1600 1600
30 WARD 22-FEB-81 1250 2850
30 BLAKE 01-MAY-81 2850 5700
30 TURNER 08-SEP-81 1500 5600
30 MARTIN 28-SEP-81 1250 5600
30 JAMES 03-DEC-81 950 3700

Here the window start from 2 rows preceding the current row

RANGE BETWEEN CURRENT ROW and value_expr FOLLOWING : The window begins with the current row and ends with the row whose ORDER BY value is numeric expression rows less than, or following

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS between current row and 1 FOLLOWING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 3800
20 FORD 03-DEC-81 3000 3000
20 SCOTT 19-APR-87 3000 4100
20 ADAMS 23-MAY-87 1100 1100
30 ALLEN 20-FEB-81 1600 9400
30 WARD 22-FEB-81 1250 7800
30 BLAKE 01-MAY-81 2850 6550
30 TURNER 08-SEP-81 1500 3700
30 MARTIN 28-SEP-81 1250 2200
30 JAMES 03-DEC-81 950 950

13 rows selected.

ROWS BETWEEN CURRENT ROW and value_expr FOLLOWING : The window begins with the current row and ends with the rows after the current one

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS between current row and 1 FOLLOWING) running_sum from emp;

DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 3800
20 FORD 03-DEC-81 3000 6000
20 SCOTT 19-APR-87 3000 4100
20 ADAMS 23-MAY-87 1100 1100
30 ALLEN 20-FEB-81 1600 2850
30 WARD 22-FEB-81 1250 4100
30 BLAKE 01-MAY-81 2850 4350
30 TURNER 08-SEP-81 1500 2750
30 MARTIN 28-SEP-81 1250 2200
30 JAMES 03-DEC-81 950 950

13 rows selected.

RANGE BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
) running_sum from emp;
2
DEPTNO ENAME HIREDATE SAL RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10 CLARK 09-JUN-81 0 0
10 allen 17-NOV-81 0 0
10 MILLER 23-JAN-82 0 0
20 SMITH 17-DEC-80 800 7900
20 FORD 03-DEC-81 3000 7900
20 SCOTT 19-APR-87 3000 7900
20 ADAMS 23-MAY-87 1100 7900
30 ALLEN 20-FEB-81 1600 9400
30 WARD 22-FEB-81 1250 9400
30 BLAKE 01-MAY-81 2850 9400
30 TURNER 08-SEP-81 1500 9400
30 MARTIN 28-SEP-81 1250 9400
30 JAMES 03-DEC-81 950 9400

13 rows selected.

RANGE BETWEEN value_expr PRECEDING and value_expr FOLLOWING

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE BETWEEN 365 PRECEDING and 365 FOLLOWING
) running_sum from emp; 2

DEPTNO       ENAME      HIREDATE      SAL      RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10           CLARK       09-JUN-81      0          0
10           ALLEN       17-NOV-81      0          0
10           MILLER      23-JAN-82      0          0
20           SMITH       17-DEC-80      800       3800
20           FORD        03-DEC-81      3000      3800
20           SCOTT       19-APR-87      3000      4100
20           ADAMS       23-MAY-87      1100      4100
30           ALLEN       20-FEB-81      1600      9400
30           WARD        22-FEB-81      1250      9400
30           BLAKE       01-MAY-81      2850      9400
30          TURNER       08-SEP-81      1500      9400
30          MARTIN       28-SEP-81      1250      9400
30          JAMES        03-DEC-81      950       9400

13 rows selected.

Some Important Notes
1.Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

  1. Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.

I hope you like this detailed explanation of Analytic functions in oracle

Related Articles

Oracle documentation

LEAD and LAG : Oracle Analytic Functions

RANK, DENSE_RANK and ROW_NUMBER :Oracle Analytic functions

How to use Oracle LISTAGG Function

Aggregating Data Using Group Functions

Leave a Reply