How to use Analytic functions in oracle (Over Partition by Keyword)

This post is part of Oracle SQL tutorial and we would be discussing Analytic functions in oracle(Over by partition) with examples, detailed explanation .

Analytic functions in oracle

We have already studied about Oracle Aggregate function like avg ,sum ,count. Lets take an example

First lets create the sample data

CREATE TABLE "DEPT"
( "DEPTNO" NUMBER(2,0),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(13),
CONSTRAINT "PK_DEPT" PRIMARY KEY ("DEPTNO")
)

CREATE TABLE "EMP"
( "EMPNO" NUMBER(4,0),
"ENAME" VARCHAR2(10),
"JOB" VARCHAR2(9),
"MGR" NUMBER(4,0),
"HIREDATE" DATE,
"SAL" NUMBER(7,2),
"COMM" NUMBER(7,2),
"DEPTNO" NUMBER(2,0),
CONSTRAINT "PK_EMP" PRIMARY KEY ("EMPNO"),
CONSTRAINT "FK_DEPTNO" FOREIGN KEY ("DEPTNO")
REFERENCES "DEPT" ("DEPTNO") ENABLE
);

SQL> desc emp
Name Null? Type
---- ---- -----
EMPNO NOT NULL NUMBER(4)
ENAME VARCHAR2(10)
JOB VARCHAR2(9)
MGR NUMBER(4)
HIREDATE DATE
SAL NUMBER(7,2)
COMM NUMBER(7,2)
DEPTNO NUMBER(2)

SQL> desc dept
Name Null? Type
---- ----- ----
DEPTNO NOT NULL NUMBER(2)
DNAME VARCHAR2(14)
LOC VARCHAR2(13)


insert into DEPT values(10, 'ACCOUNTING', 'NEW YORK');
insert into dept values(20, 'RESEARCH', 'DALLAS');
insert into dept values(30, 'RESEARCH', 'DELHI');
insert into dept values(40, 'RESEARCH', 'MUMBAI');
commit;

insert into emp values( 7839, 'Allen', 'MANAGER', 7839, to_date('17-11-1981','dd-mm-yyyy'), 20, null, 10 );
insert into emp values( 7782, 'CLARK', 'MANAGER', 7839, to_date('9-06-1981','dd-mm-yyyy'), 0, null, 10 );
insert into emp values( 7934, 'MILLER', 'MANAGER', 7839, to_date('23-01-1982','dd-mm-yyyy'), 0, null, 10 );
insert into emp values( 7788, 'SMITH', 'ANALYST', 7788, to_date('17-12-1980','dd-mm-yyyy'), 800, null, 20 );
insert into emp values( 7902, 'ADAM, 'ANALYST', 7832, to_date('23-05-1987','dd-mm-yyyy'), 1100, null, 20 );
insert into emp values( 7876, 'FORD', 'ANALYST', 7566, to_date('3-12-1981','dd-mm-yyyy'), 3000, null, 20 );
insert into emp values( 7369, 'SCOTT', 'ANALYST', 7566, to_date('19-04-1987','dd-mm-yyyy'), 3000, null, 20 );
insert into emp values( 7698, 'JAMES', 'ANALYST', 7788, to_date('03-12-1981','dd-mm-yyyy'), 950, null, 30 );
insert into emp values( 7499, 'MARTIN', 'ANALYST', 7698, to_date('28-09-1981','dd-mm-yyyy'), 1250, null, 30 );
insert into emp values( 7844, 'WARD', 'ANALYST', 7698, to_date('22-02-1981','dd-mm-yyyy'), 1250, null, 30 );
insert into emp values( 7654, 'TURNER', 'ANALYST', 7698, to_date('08-09-1981','dd-mm-yyyy'), 1500, null, 30 );
insert into emp values( 7521, 'ALLEN', 'ANALYST', 7698, to_date('20-02-1981','dd-mm-yyyy'), 1600, null, 30 );
insert into emp values( 7900, 'BLAKE', 'ANALYST', 77698, to_date('01-05-1981','dd-mm-yyyy'), 2850, null, 30 );
commit;

Now the example of aggregate functions will be given as below

select count(*) from EMP;
---------
13

select sum (bytes) from dba_segments where tablespace_name='TOOLS';
-----
100

SQL> select deptno ,count(*) from emp group by deptno;

DEPTNO COUNT(*)
---------- ----------
30              6
20              4
10              3

Here we can see that it reduces the number of rows in each of the queries. Now questions comes what to do if we need to have all the rows returned with count(*) also

For that oracle has provided a sets of analytic functions. So to solve the last problem , we can write as

select empno ,deptno , count(*) over (partition by deptno) from emp group by deptno;

Here count(*) over (partition by dept_no) is the analytical version of the count aggregate function. The main key work which is different by aggregate function is over partition by

General Syntax for Analytic functions

Here is the general syntax

analytic_function([ arguments ]) OVER ([ query_partition_clause ] [ order_by_clause [ windowing_clause ] ])

Example

count(*) over (partition by deptno)

avg(Sal) over (partition by deptno)

Lets go over each part

query_partition_clause
It defined the group of rows. It can like below

partition by deptno : group of rows of same deptno
or
() : All rows

SQL> select empno ,deptno , count(*) over () from emp;

[ order_by_clause [ windowing_clause ] ]

This clause is used when you want to order the rows in the partition. This is particularly useful if you want analytical function to consider the order of the rows.

Example will be row_number function

SQL> select
deptno, ename, sal, 
 row_number() over (partition by deptno order by sal) "row_number" from emp;

Another example would be

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE) running_sum from emp;

Windowing_clause

This is always used with order by clause and give more control over the set of the rows in the group

With Windowing clause, For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

When using order by clause and nothing is given for windowing_clause,below default value of the windowing_clause is taken
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW or RANGE UNBOUNDED PRECEDING
It means “The current and previous rows in the current partition are the rows that should be used in the computation”

The example below clearly states this. This goes the running average in the department

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE) running_sum from emp;

Now windowing_clause can be defined with number of ways
Lets first understand the terminology

Types of Windowing clause

ROWS UNBOUNDED PRECEDING : The current and previous rows in the current partition are the rows that should be used in the computation

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS UNBOUNDED PRECEDING) running_sum from emp;

RANGE UNBOUNDED PRECEDING : The current and previous rows in the current partition are the rows that should be used in the computation. Also since range is specified ,it all takes those values which are equal to the current rows.

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE UNBOUNDED PRECEDING) running_sum from emp;

You may not see the difference between range and rows as hire_date is different for all.The difference will become more clear if we use sal as order by clause

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by sal RANGE UNBOUNDED PRECEDING) running_sum from emp;

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by sal ROWS UNBOUNDED PRECEDING) running_sum from emp;

You can find the difference on line 6

RANGE value_expr PRECEDING : The window begins with the row whose ORDER BY value is numeric expression rows less than, or preceding, the current row and ends with the current row being processed.

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE 365 PRECEDING) running_sum from emp;

Here it takes all the rows where hiredate value falls within 365 days preceding the hiredate value of the current row

ROWS value_expr PRECEDING : The window begins with the row given and ends with the current row being processed

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS 2 PRECEDING) running_sum from emp;

Here the window start from 2 rows preceding the current row

RANGE BETWEEN CURRENT ROW and value_expr FOLLOWING : The window begins with the current row and ends with the row whose ORDER BY value is numeric expression rows less than, or following

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS between current row and 1 FOLLOWING) running_sum from emp;

ROWS BETWEEN CURRENT ROW and value_expr FOLLOWING : The window begins with the current row and ends with the rows after the current one

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE ROWS between current row and 1 FOLLOWING) running_sum from emp;

RANGE BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
) running_sum from emp;

RANGE BETWEEN value_expr PRECEDING and value_expr FOLLOWING

SQL> select deptno, ename,hiredate, sal,sum(sal) over (partition by deptno order by HIREDATE RANGE BETWEEN 365 PRECEDING and 365 FOLLOWING
) running_sum from emp; 2

DEPTNO       ENAME      HIREDATE      SAL      RUNNING_SUM
---------- ---------- --------------- ---------- -----------
10           CLARK       09-JUN-81      0          0
10           ALLEN       17-NOV-81      0          0
10           MILLER      23-JAN-82      0          0
20           SMITH       17-DEC-80      800       3800
20           FORD        03-DEC-81      3000      3800
20           SCOTT       19-APR-87      3000      4100
20           ADAMS       23-MAY-87      1100      4100
30           ALLEN       20-FEB-81      1600      9400
30           WARD        22-FEB-81      1250      9400
30           BLAKE       01-MAY-81      2850      9400
30          TURNER       08-SEP-81      1500      9400
30          MARTIN       28-SEP-81      1250      9400
30          JAMES        03-DEC-81      950       9400

13 rows selected.

Some Important Notes
(1)Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.
(2)Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.

I hope you like this detailed explanation of Analytic functions in oracle(over by Partition Clause)

Analytic functions in oracle

General Syntax for Analytic functions

Types of Windowing clause

Related Posts

Leave a Comment Cancel Reply