approx_count_distinct : Oracle Database 12c

Table of Contents

Introduction

approx_count_distinct is the new function included in 12c to estimate the distinct values in the column in a faster manner without deviating much from actual values.
It returns approximate number of rows that contain distinct values of expression.

approx_count_distinct Definition

We know that it take considerable time to project desired output using traditional COUNT(DISTINCT) appraoch.

With Oracle 12c (12.1.0.2), we have a function “APPROX_COUNT_DISTINCT” , which is claim to faster then tradition COUNT(DISTINCT <>) approach to get an idea on NDV.

It’s alternative to the COUNT (DISTINCT expr) function, which returns the exact number of rows that contain distinct values of expr.

For processing large amounts of data it’s significantly faster than COUNT, with negligible deviation of values from the exact result.

Statistically, the approx_count_distinct approximations provide a statistically insignificant difference from count distinct, so the approximation is statistically valid.

The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. Plus is performs less work on the sorting and aggregations

How it works

In a traditional count distinct, Oracle’s read consistency mechanism is invoked, causing a large time lag when counting the number of distinct values in a very large table. Also, as the number of distinct values increase, the elapsed time and memory usage of the count distinct increases drastically.

In contrast, the approx_count_distinct bypasses the read consistency mechanism and give a fast and relatively accurate approximation of the number of distinct values in a table column.

Optimizer Changes

With APPROX_COUNT_DISTINCT we got an new Aggregation operation in execution plan “SORT AGGREGATE APPROX”.
As compare to tradition approach, we would need to perform GROUP BY and then AGGREGATION operation to serve COUNT and DISTINCT result.

If for any analysis, we just need an estimate of NDV with acceptable variation from actual values.
APPROX_COUNT_DISTINCT is the function to be used

Brief History

The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values when gathering statistics using the DBMS_STATS package. Oracle database 12c (12.1.0.2) now includes the function in the documentation, so we are free to use it in our applications as a supported SQL function.

How to use it

Traditional way

SELECT
COUNT(DISTINCT co11),
COUNT(DISTINCT col2)
FROM EXP_TABLE;

Using this function

SELECT
APPROX_COUNT_DISTINCT(co11),
APPROX_COUNT_DISTINCT(col2),
FROM EXP_TABLE;

You can compare the approximation also and see the difference in performance

Related Articles

New Optimizer Feature with 11g and 12c

Online move of active datafile in 12c

5 Simple (But Important) Things To Remember About Oracle Database 12c views ,parameters and packages

Top Oracle 12c New Features for developers

Very useful 10 new things in 12c database