Cardinality
The ratio of distinct values to the number of table rows. A column with only two distinct values in a million-row table would have low cardinality
Selectivity
In a query, the measure of how many rows from a row set pass a predicate test, for example, WHERE last_name = 'Smith'. A selectivity of 0.0 means no rows, whereas a value of 1.0 means all rows. A predicate becomes more selective as the value approaches 0.0 and less selective (or more unselective) as the value approaches 1.0.
Projection
The ability of a SELECT statement to choose a *subset* of the columns in a given table.
In other words, this corresponds to the column listing in the SELECT statement.
Selection
The ability of a SELECT statement to choose a subset of rows in a given table.
In other words, selection corresponds to the WHERE clause of the statement.
Cheat Sheet for Cardinality vs Selectivity
A table with a total of 437 million rows has a column "category" with the following distinct values:
category | count
---------+-----------
0 | 1000
1 | 106755762
3 | 6940
5 | 12449
6 | 330571468
(5 rows)* Definition: Number of distinct values in a column.
* High cardinality: Many unique values (e.g., customer_id).
* Low cardinality: Few unique values (e.g., categories with values {0,1,3,5,6}).
* Impact: High cardinality columns are often good candidates for indexing because they filter well.
Selectivity
- Definition: Fraction of rows that match a condition.
- Formula:
- High selectivity: Condition matches few rows → efficient filtering.
- Low selectivity: Condition matches many rows → poor filtering.
- Example:
category = 1→ 106M / 437M ≈ 0.24 (low selectivity).category = 6→ 330M / 437M ≈ 0.75 (even worse).
Key Difference
- Cardinality is about column diversity.
- Selectivity is about query filtering power.
Rule of Thumb:
- High cardinality + high selectivity → index is very useful.
- Low cardinality + low selectivity → index may not help much unless combined with other columns.