Minimalistic Oracle: selectivity

Sunday, December 13, 2015

Cardinality, Selectivity, Projection and Selection - definitions

From Oracle documentation:

Cardinality

The ratio of distinct values to the number of table rows. A column with only two distinct values in a million-row table would have low cardinality

Selectivity

In a query, the measure of how many rows from a row set pass a predicate test, for example, WHERE last_name = 'Smith'. A selectivity of 0.0 means no rows, whereas a value of 1.0 means all rows. A predicate becomes more selective as the value approaches 0.0 and less selective (or more unselective) as the value approaches 1.0.

Projection

The ability of a SELECT statement to choose a *subset* of the columns in a given table.
In other words, this corresponds to the column listing in the SELECT statement.

Selection

The ability of a SELECT statement to choose a subset of rows in a given table.
In other words, selection corresponds to the WHERE clause of the statement.

Cheat Sheet for Cardinality vs Selectivity

A table with a total of 437 million rows has a column "category" with the following distinct values:

category |   count
---------+-----------
       0 |      1000
       1 | 106755762
       3 |      6940
       5 |     12449
       6 | 330571468
(5 rows)

* Definition: Number of distinct values in a column.

* High cardinality: Many unique values (e.g., customer_id).

* Low cardinality: Few unique values (e.g., categories with values {0,1,3,5,6}).

* Impact: High cardinality columns are often good candidates for indexing because they filter well.

Selectivity

Definition: Fraction of rows that match a condition.
Formula:

$Selectivity = \frac{Rows matching condition}{Total rows}$
High selectivity: Condition matches few rows → efficient filtering.
Low selectivity: Condition matches many rows → poor filtering.
Example:

category = 1 → 106M / 437M ≈ 0.24 (low selectivity).
category = 6 → 330M / 437M ≈ 0.75 (even worse).

Key Difference

Cardinality is about column diversity.
Selectivity is about query filtering power.

Rule of Thumb:

High cardinality + high selectivity → index is very useful.
Low cardinality + low selectivity → index may not help much unless combined with other columns.