Showing posts with label selectivity. Show all posts
Showing posts with label selectivity. Show all posts

Sunday, December 13, 2015

Cardinality, Selectivity, Projection and Selection - definitions

From Oracle documentation:

Cardinality

The ratio of distinct values to the number of table rows. A column with only two distinct values in a million-row table would have low cardinality

Selectivity

In a query, the measure of how many rows from a row set pass a predicate test, for example, WHERE last_name = 'Smith'. A selectivity of 0.0 means no rows, whereas a value of 1.0 means all rows. A predicate becomes more selective as the value approaches 0.0 and less selective (or more unselective) as the value approaches 1.0.

Projection

The ability of a SELECT statement to choose a *subset* of the columns in a given table.
In other words, this corresponds to the column listing in the SELECT statement.

Selection

The ability of a SELECT statement to choose a subset of rows in a given table.
In other words, selection corresponds to the WHERE clause of the statement.

Cheat Sheet for Cardinality vs Selectivity 

A table with a total of 437 million rows has a column "category" with the following distinct values:

category |   count
---------+-----------
       0 |      1000
       1 | 106755762
       3 |      6940
       5 |     12449
       6 | 330571468
(5 rows)


* Definition: Number of distinct values in a column. 
* High cardinality: Many unique values (e.g., customer_id). 
* Low cardinality: Few unique values (e.g., categories with values {0,1,3,5,6}). 
* Impact: High cardinality columns are often good candidates for indexing because they filter well.

Selectivity

  • Definition: Fraction of rows that match a condition.
  • Formula:

    Selectivity=Rows matching conditionTotal rows

  • High selectivity: Condition matches few rows → efficient filtering.
  • Low selectivity: Condition matches many rows → poor filtering.
  • Example:
    • category = 1 → 106M / 437M ≈ 0.24 (low selectivity).
    • category = 6 → 330M / 437M ≈ 0.75 (even worse).

Key Difference

  • Cardinality is about column diversity.
  • Selectivity is about query filtering power.

Rule of Thumb:

  • High cardinality + high selectivity → index is very useful.
  • Low cardinality + low selectivity → index may not help much unless combined with other columns.