In this post, we will see the important topic of histograms in Optimizer stats

## What is a histogram in Oracle

- A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column.
- By gathering histogram data, the CBO can make improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with non-uniform data distributions.
- It is a graphical representation of the frequency distribution of the data by means of rectangles whose widths represent class intervals and heights represent frequencies. It is the same which we study in class 10 maths
- A histogram sorts values into “buckets,” as you might sort coins into buckets
- The histogram approach provides an efficient and compact way to represent data distributions.
- it is not much use if the column data is uniformly distributed

## Types of Histograms

With 12c we have four types of Histograms available

Frequency histograms

Height-Balanced histograms

top frequency histograms

Hybrid histograms

## How Oracle decides about the types of Histograms

- Based on the number of histogram buckets given and distinct values and the distribution of the data, the database chooses the type of histogram to create.
- In some cases, when creating a histogram, the database samples an internally predetermined number of rows.

## Height Balanced Histograms

- These are legacy types of Histograms
- Height-balanced histograms put approximately the same number of values into each interval so that the endpoints of the interval are determined by the number of values in that interval.
- Only the last (largest) values in each bucket appear as bucket (endpoint) values.
- A height-balanced histogram will be created if the number of histogram buckets (SIZE) indicates a value smaller than the number of distinct values in the column

## Frequency Histograms

- It is created when the number of histogram buckets (SIZE) specified is greater than or equal to the number of distinct column values.
- Here all the individual values in the column have a corresponding bucket, and the bucket number reflects the repetition count of each value.
- An analogy to a frequency histogram is sorting coins so that each individual coin initially gets its own bucket. For example, the first penny is in bucket 1, the second penny is in bucket 2, the first nickel is in bucket 3, and so on. You then consolidate all the pennies into a single penny bucket, all the nickels into a single nickel bucket, and so on with the remainder of the coins

## Top Frequency Histograms

- It is a variation on a frequency histogram that ignores nonpopular values that are statistically insignificant.
- The histogram is created using a small number of extremely popular distinct values
- A top-frequency histogram can produce a better histogram for highly popular values.

## Hybrid histograms

- Hybrid histograms combine the characteristics of both height-based histograms and frequency histograms.
- A hybrid histogram distributes values so that no value occupies more than one bucket, and then stores the endpoint repeat count value, which is the number of times the endpoint value is repeated, for each endpoint (bucket) in the histogram. By using the repeat count, the optimizer can obtain accurate estimates for almost popular values.

## How to check the type of histogram created

select COLUMN_NAME,NUM_DISTINCT,NUM_BUCKETS,HISTOGRAM from DBA_TAB_COL_STATISTICS where table_name='&table_name' and owner='&table_owner' order by column_name;

## Leave a Reply