approx_count_distinct is a new function in Oracle 12c that estimates the number of distinct values in a column quickly, without deviating much from the actual value.
It returns the approximate number of rows that contain distinct values of the expression.
We know that it takes considerable time to produce the desired output using the traditional COUNT(DISTINCT) approach.
With Oracle 12c (12.1.0.2), we have the function APPROX_COUNT_DISTINCT, which is claimed to be faster than the traditional COUNT(DISTINCT <>) approach for getting an idea of the NDV (number of distinct values).
It’s an alternative to the COUNT (DISTINCT expr) function, which returns the exact number of rows that contain distinct values of expr.
For processing large amounts of data it’s significantly faster than COUNT (DISTINCT expr), with negligible deviation from the exact result.
The difference between the approx_count_distinct estimate and the exact count distinct is statistically insignificant, so the approximation is valid when an estimate is all you need.
The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. It also performs less work on sorting and aggregation.
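As a small illustration of the null handling, here is a sketch that runs both functions over four inline rows built from DUAL, one of which is NULL. The column alias val and the inline data are made up for the example.

```sql
-- Four rows: 1, 1, 2 and NULL (inline data, purely illustrative)
SELECT APPROX_COUNT_DISTINCT(val) AS approx_ndv,
       COUNT(DISTINCT val)        AS exact_ndv
FROM  (SELECT 1 AS val FROM dual UNION ALL
       SELECT 1        FROM dual UNION ALL
       SELECT 2        FROM dual UNION ALL
       SELECT CAST(NULL AS NUMBER) FROM dual);
-- exact_ndv is 2: the NULL row is not counted.
-- approx_ndv is an estimate; it also skips the NULL row.
```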
How it works
In a traditional count distinct, Oracle has to keep track of every distinct value it encounters, which causes a large time lag when counting the distinct values in a very large table. As the number of distinct values increases, the elapsed time and memory usage of the count distinct increase drastically.
In contrast, approx_count_distinct does not need to keep every distinct value; it works from a compact summary of the data and gives a fast and relatively accurate approximation of the number of distinct values in a table column.
With APPROX_COUNT_DISTINCT we get a new aggregation operation in the execution plan: “SORT AGGREGATE APPROX”.
By comparison, the traditional approach needs a GROUP BY operation followed by an aggregation operation to produce the COUNT(DISTINCT) result.
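One way to see the plan operations for yourself is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. The sketch below assumes a hypothetical table sales with a column cust_id; substitute your own table and column.

```sql
-- Plan for the approximate count (sales and cust_id are hypothetical names)
EXPLAIN PLAN FOR
SELECT APPROX_COUNT_DISTINCT(cust_id) FROM sales;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the exact count, for comparison
EXPLAIN PLAN FOR
SELECT COUNT(DISTINCT cust_id) FROM sales;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The first plan should show the SORT AGGREGATE APPROX operation, while the second shows the grouping step followed by the aggregation.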
If, for any analysis, we just need an estimate of the NDV with an acceptable variation from the actual value, APPROX_COUNT_DISTINCT is the function to use.
The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values when gathering statistics using the DBMS_STATS package. Oracle Database 12c (12.1.0.2) now includes the function in the documentation, so we are free to use it in our applications as a supported SQL function.
How to use it
Using this function
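A minimal sketch of the call, assuming a hypothetical table sales with columns cust_id and prod_id (neither comes from the original post). The function is used like any other aggregate, with or without GROUP BY.

```sql
-- Approximate number of distinct customers in the whole table
SELECT APPROX_COUNT_DISTINCT(cust_id) AS approx_customers
FROM   sales;

-- Approximate number of distinct customers per product
SELECT prod_id,
       APPROX_COUNT_DISTINCT(cust_id) AS approx_customers
FROM   sales
GROUP  BY prod_id
ORDER  BY prod_id;
```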
You can also compare the approximation with the exact count and see the difference in performance.
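A simple way to run that comparison is to time both queries in SQL*Plus. The sketch below again assumes the hypothetical sales table and cust_id column; SET TIMING ON is a SQL*Plus command, not SQL, so use your own tool's timing feature if you work elsewhere.

```sql
SET TIMING ON

-- Exact number of distinct values
SELECT COUNT(DISTINCT cust_id) AS exact_ndv
FROM   sales;

-- Approximate number of distinct values
SELECT APPROX_COUNT_DISTINCT(cust_id) AS approx_ndv
FROM   sales;
```

Run each query more than once so that caching does not favour whichever statement happens to run second, then compare both the elapsed times and the two counts.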