Showing posts with label optimizer. Show all posts

Wednesday, February 23, 2022

When explaining a query that is accessing a partitioned table, what does the Pstart=KEY or Pstop=KEY indicate?

A Pstart or Pstop value of KEY indicates that the exact partition cannot be determined at compile time; it will be resolved at run time.

Some earlier step in the plan, or a bind variable, supplies one or more values for the partition key, so that pruning can still take place at execution time.
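A minimal sketch of one such situation, using the table created below (the bind-variable value 'Contract' is just an illustration):

```sql
-- The partition key value is a bind variable, so it is unknown at parse time;
-- the plan shows Pstart = KEY and Pstop = KEY, and the actual partition is
-- resolved when the statement executes.
variable doc_type varchar2(100)
exec :doc_type := 'Contract'

select count(*)
from myuser.published_documents
where document_type = :doc_type;
```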

Example: I have a composite partitioned table, with a locally partitioned index:
create table published_documents(
  UNIQUE_ID                   VARCHAR2(160 BYTE) NOT NULL,
  REGYEAR                     NUMBER(18),
  DOCUMENT_TYPE               VARCHAR2(100 CHAR),
  DOCUMENT_NAME               VARCHAR2(1000 CHAR),
  TOPIC                       VARCHAR2(30 CHAR),
  VALID                       CHAR(1 BYTE),
  VERSION                     NUMBER(18),
  DATA_XML                    CLOB,
  FORMAT                      VARCHAR2(1000 CHAR),
  PERIOD                      VARCHAR2(1000 CHAR)
)
PARTITION BY LIST (DOCUMENT_TYPE)
SUBPARTITION BY LIST (PERIOD)
...
);

create index pub_docs_idx1 on published_documents
(regyear, document_type, period)
  local;
Send the following query to the database:
select  document_type, count(*)
from myuser.published_documents
partition(LEGAL)
group by document_type;

The output is as expected:
DOCUMENT_TYPE        COUNT(*)
------------------   --------
Affidavit                7845
Amending Agreement      29909
Contract                 6647

This results in the following execution plan:
-------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   1 |  PARTITION LIST SINGLE|                     |     4 |   128 |   195M  (1)| 02:07:06 |   KEY |   KEY | 
|   2 |   HASH GROUP BY       |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   3 |    PARTITION LIST ALL |                     |  2198M|    65G|   195M  (1)| 02:07:03 |     1 |   114 |
|   4 |     TABLE ACCESS FULL | PUBLISHED_DOCUMENTS |  2198M|    65G|   195M  (1)| 02:07:03 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------
When we specify a named partition, we can see how the optimizer limits its search to the partition mentioned in the predicate, but it does not yet know how many subpartitions to scan. Since the predicate says nothing about a date range matching the PERIOD column, all 114 subpartitions must be scanned.
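For comparison, a sketch of a query that also restricts the subpartition key, which should let the optimizer prune down to a single subpartition:

```sql
-- With a predicate on PERIOD (the subpartition key), the optimizer can prune
-- subpartitions as well, instead of scanning all 114 of them.
select document_type, count(*)
from myuser.published_documents partition (LEGAL)
where period = '2018-01'
group by document_type;
```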

Note that the text "TABLE ACCESS FULL" in step 4 can be somewhat confusing: we are only talking about a full scan of the partition called "LEGAL", not of the entire table.

In my experience, specifying the partition name directly is rather unusual, and mostly done by DBAs.
Let's try a predicate that is more likely to be sent to the Oracle server by a user or a batch program:
select document_type, period, count(*)
from myuser.published_documents
where period = '2018-01'
group by document_type, period;
The output is as expected:

DOCUMENT_TYPE        PERIOD    COUNT(*)
------------------   -------   --------
Affidavit            2018-01       7845
Amending Agreement   2018-01      29909
Contract             2018-01       6647
Payroll              2018-01       7824
HA_related           2018-01      36608
Banking              2018-01      14167
IT                   2018-01       4094

The rows in the output above belong to many different partitions, but they all come from the period 2018-01.

The explain plan for this query would be:
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   1 |  PARTITION LIST ALL     |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |     1 |    11 |
|   2 |   HASH GROUP BY         |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   3 |    PARTITION LIST SINGLE|                           |  8122K|   302M|  6589K  (1)| 00:04:18 |       |       |
|*  4 |     INDEX SKIP SCAN     | PUB_DOCS_IDX1             |  8122K|   302M|  6589K  (1)| 00:04:18 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PERIOD"='2018-01')
       filter("PERIOD"='2018-01')
Here, too, we see that the optimizer first selects all 11 partitions, but then uses the partitioned index PUB_DOCS_IDX1 to find the rows matching the string '2018-01'. The optimizer does not yet know how many index subpartitions to scan; this is determined at run time.
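To inspect the Pstart/Pstop columns for the most recently executed statement, dbms_xplan can be used; the PARTITION format modifier makes sure the pruning columns are included:

```sql
-- Show the plan of the last statement run in this session,
-- including the partition pruning columns (Pstart/Pstop):
select *
from table(dbms_xplan.display_cursor(format => 'BASIC +PARTITION'));
```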

Thanks to

  • Jim Brull
  • Justin Cave
    Wednesday, June 12, 2019

    How to use dbms_stats.report_col_usage

    The function dbms_stats.report_col_usage reports how the columns of a table have been used in predicates, joins and GROUP BY expressions:



    set long 10000
    select dbms_stats.report_col_usage('SALES', 'ORDERS') from dual;
    exit
    

    Output could look something like this:
    DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
    --------------------------------------------------------------------------------
    LEGEND:
    .......
    
    EQ         : Used in single table EQuality predicate
    RANGE      : Used in single table RANGE predicate
    LIKE       : Used in single table LIKE predicate
    NULL       : Used in single table is (not) NULL predicate
    EQ_JOIN    : Used in EQuality JOIN predicate
    NONEQ_JOIN : Used in NON EQuality JOIN predicate
    FILTER     : Used in single table FILTER predicate
    JOIN       : Used in JOIN predicate
    
    DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
    --------------------------------------------------------------------------------
    GROUP_BY   : Used in GROUP BY expression
    ...............................................................................
    
    ###############################################################################
    
    COLUMN USAGE REPORT FOR SALES.ORDERS
    .......................................................
    
    1. ORDER_DATE                          : EQ RANGE
    2. ORDER_ID                            : EQ
    3. ITEM_NAME                           : EQ
    ###############################################################################
    

    This makes it easy to see which columns could benefit from an index.
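    Column usage information is buffered in memory before it is persisted to the data dictionary, so it can be a good idea to flush it before running the report; a sketch:

```sql
-- Persist the in-memory column usage statistics to the data dictionary:
exec dbms_stats.flush_database_monitoring_info
```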

    Tuesday, April 17, 2018

    Syntax for index hint

    Use /*+ INDEX(table_name index_name) */

    For example:
    set autot trace exp stat
    select  /*+ INDEX(sales_log MY_IDX1) */ min(entry_num)
    from sales.sales_log
    where sold_date >= to_date('05.12.2017','dd.mm.yyyy');
    

    Which gave me the following plan:
    -----------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                            |               |     1 |    11 |     9   (0)| 00:00:01 |       |       |
    |   1 |  SORT AGGREGATE                             |               |     1 |    11 |            |          |       |       |
    |   2 |   PARTITION RANGE ITERATOR                  |               |     1 |    11 |     9   (0)| 00:00:01 |     1 |    13 |
    |*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES         |     1 |    11 |     9   (0)| 00:00:01 |     1 |    13 |
    |   4 |     INDEX FULL SCAN                         | MY_IDX1       |     3 |       |     3   (0)| 00:00:01 |     1 |    13 |
    -----------------------------------------------------------------------------------------------------------------------------
    

    Operation Id 4 of the explain plan output shows that the index MY_IDX1 is being used.
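    Note that if the table is given an alias in the query, the hint must reference the alias rather than the table name, or the hint is silently ignored; a sketch:

```sql
-- With an alias, the hint must use the alias ("s"), not the table name:
select /*+ INDEX(s MY_IDX1) */ min(entry_num)
from sales.sales_log s
where sold_date >= to_date('05.12.2017','dd.mm.yyyy');
```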

    Tuesday, April 10, 2018

    More on finding failed autostats jobs



    In a previous post, I showed how to check if an autostats job had succeeded.

    Recently I had to unlock table stats on a very large partitioned table, and then lock down older partition stats explicitly, to allow Oracle to gather statistics for the most recent partitions automatically with the auto stats job.

    I wanted to know if Oracle was able to analyze the partitions within the designated 4-hour weekday maintenance window.
    When you read the code snippet below, please keep in mind that I have defined my weekday windows to start at 23:59 every evening.
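    The window definitions themselves can be verified in the scheduler views (assuming DBA privileges):

```sql
-- List the maintenance windows with their start rule and duration:
select window_name, repeat_interval, duration
from dba_scheduler_windows
order by window_name;
```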

    I found the following query to be useful:

    select distinct 
           opid, 
           target,
           target_type, 
           to_char(start_Time,'dd.mm.yyyy hh24:mi:ss') "starttime",
           to_char(end_time,'dd.mm.yyyy hh24:mi:ss')   "endtime",
           status
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    and target like '%SALES%';
    
    The result:

    OPID   TARGET                      TARGET_TYPE      starttime            endtime              STATUS
    47815  SH.SALES                    TABLE            09.04.2018 23:59:39  10.04.2018 03:25:17  COMPLETED
    47815  SH.SALES.P_SALES_DATA_2016  TABLE PARTITION  10.04.2018 03:25:41  10.04.2018 03:59:00  TIMED OUT
    47815  SH.SALES.P_SALES_DATA_MAX   TABLE PARTITION  10.04.2018 03:25:33  10.04.2018 03:25:33  COMPLETED

    All in all, the autostats job that started on April 9th, 2018 at 23:59 had the following outcome:
    select status,count(*)
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    group by status;
    

    STATUS     COUNT(*)
    FAILED            1
    TIMED OUT        12
    COMPLETED       339

    From the output you can now proceed to find out why the analysis failed or timed out. The query below will list these objects for you:

    select distinct 
           opid, 
           target,
           target_type, 
           to_char(start_time,'dd.mm.yyyy hh24:mi:ss') "starttime",
           to_char(end_time,'dd.mm.yyyy hh24:mi:ss')   "endtime",
           status
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    and STATUS IN ('TIMED OUT','FAILED');
    

    Tuesday, February 13, 2018

    Why doesn't dbms_stats.set_schema_prefs seem to work?

    When trying to alter the default settings for statistics gathering on partitioned tables, I had trouble setting schema-level preferences using dbms_stats.set_schema_prefs:

    exec dbms_stats.set_schema_prefs('SCOTT','INCREMENTAL','TRUE');
    
    PL/SQL procedure successfully completed.
    

    But when checking whether the setting has been accepted using dbms_stats.get_prefs, it still seems to be set to FALSE:

    SELECT  DBMS_STATS.get_prefs(ownname=>'SCOTT',pname=>'INCREMENTAL') "INCREMENTAL" FROM DUAL;
    
    INCREMENTAL
    --------------------
    FALSE
    

    I found the following note by Maria Colgan:

    http://www.nocoug.org/download/2011-11/Maria_Colgan_Optimizer_Statistics.pdf

    which states that set_schema_prefs only applies to current objects in the schema and that new objects will pick up global preferences instead.

    So according to the note, I was doing the right thing; after all, I was trying to alter the default behaviour for the optimizer when working on existing objects.

    How come the new setting didn't seem to stick?

    In my case, a global change was acceptable, so I could try to set the parameter globally instead, using set_global_prefs:

    BEGIN
      dbms_stats.set_global_prefs('INCREMENTAL','TRUE');
    END;
    /
    
    PL/SQL procedure successfully completed.
    
    After that, the setting certainly seems to stick:

    SELECT  DBMS_STATS.get_prefs(ownname=>'SCOTT',pname=>'INCREMENTAL') "INCREMENTAL" FROM DUAL;
    
    INCREMENTAL
    --------------------
    TRUE
    

    Maria points out that a global change like the one above will apply to all existing objects as well as any new objects.

    My conclusion is that dbms_stats.get_prefs reports the settings that apply to future objects only, not existing ones. If that is the case, the output of the report is ambiguous, as it gives the DBA the impression that he/she has not made the change permanent.
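    One way to test this interpretation is to ask get_prefs about a specific, existing table; the table name EMP below is just an example:

```sql
-- The preference as seen by an existing table in the schema:
SELECT DBMS_STATS.GET_PREFS(pname=>'INCREMENTAL',
                            ownname=>'SCOTT',
                            tabname=>'EMP') "INCREMENTAL"
FROM DUAL;
```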

    To view the default settings for a particular schema, see this post.

    Oracle Documentation on the subject is here

    Tuesday, December 5, 2017

    How to execute and capture an SQL statement in the SMB at the same time

    An experienced co-worker showed me this neat trick to execute an SQL statement and capture its execution plan in the SQL Management Base (SMB) at the same time.

    /* First, select the bind variables used in the SQL statement you're interested in, into the variable value_string.
    Record them in a text file. You will be using them later
    */
    SELECT DECODE (value_string, 'NULL,', ''''',', value_string)
      FROM (  SELECT DECODE (datatype_string,
                             'VARCHAR2(32)', '''' || value_string || ''',',
                             value_string || ',')
                        value_string
                FROM v$sql_bind_capture
               WHERE sql_id = '4d6sdzm63st1u' AND child_number = 1
            ORDER BY position);
    
    /*
    Get the SQL text including the bind usage (but not the bind variable values)
    */
    SET SERVEROUTPUT ON
    
    DECLARE
       l_sql      VARCHAR2 (32000);
       l_sql_id   VARCHAR2 (30) := '4d6sdzm63st1u';
    BEGIN
       BEGIN
          SELECT sql_fulltext
            INTO l_sql
            FROM v$sql                                          
           WHERE sql_id = l_sql_id AND ROWNUM < 2;
       EXCEPTION
          WHEN NO_DATA_FOUND
          THEN
             SELECT sql_text
               INTO l_sql
               FROM dba_hist_sqltext
              WHERE sql_id = l_sql_id AND ROWNUM < 2;
       END;
    
       DBMS_OUTPUT.put_line (l_sql);
    
    /* Capture the baseline for this statement */ 
       EXECUTE IMMEDIATE
          'alter session set optimizer_capture_sql_plan_baselines=true';
    
    /* Send the SQL including the bind variable values to the database */
       EXECUTE IMMEDIATE l_sql
          USING 5022000,
                9834,
                9822,
                9,
                4,
                6;
    /* Turn the capture off again */
       EXECUTE IMMEDIATE
          'alter session set optimizer_capture_sql_plan_baselines=false';
    END;
    /
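    Afterwards, it may be worth checking that a baseline was actually captured; a sketch against dba_sql_plan_baselines:

```sql
-- The most recently created baselines; the captured statement should show up here:
select sql_handle, plan_name, enabled, accepted, created
from dba_sql_plan_baselines
order by created desc;
```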
    

    Wednesday, November 15, 2017

    How to find the number of full table scans



    From the Oracle Documentation:

    The DBA_HIST_SEG_STAT view displays historical information about segment-level statistics. This view captures the top segments based on a set of criteria and captures information from V$SEGSTAT. The total value is the value of the statistics since instance startup. The delta value is the value of the statistics from the BEGIN_INTERVAL_TIME to the END_INTERVAL_TIME in the DBA_HIST_SNAPSHOT view.

     SELECT O.OBJECT_NAME,
            O.OWNER,
            SUM(S.TABLE_SCANS_TOTAL) "# of FTS",
            (SELECT T.NUM_ROWS
               FROM DBA_TABLES T
              WHERE T.TABLE_NAME = O.OBJECT_NAME
                AND T.OWNER = O.OWNER) "num rows total"
     FROM DBA_HIST_SEG_STAT S JOIN DBA_OBJECTS O ON (S.OBJ# = O.DATA_OBJECT_ID)
     WHERE O.OWNER NOT IN ('SYS','DBSNMP','XDB')
     AND O.OBJECT_TYPE = 'TABLE'
     GROUP BY S.OBJ#, O.OBJECT_NAME, O.OWNER
     ORDER BY 4 DESC;
    


    OBJECT_NAME  OWNER  # of FTS  num rows total
    TABLE1       USER1      5702      1583895743
    TABLE2       USER1        11       588921781
    TABLE3       USER1      6195       241550246

    As can be seen from above, the extreme number of rows in TABLE1, coupled with 5702 full table scans, should bring it to your attention as a candidate for partitioning!
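    Note that TABLE_SCANS_TOTAL is cumulative per snapshot, so summing it across snapshots overstates the count. A delta-based variant restricted to a recent interval might look like this (a sketch, untested):

```sql
-- Full table scans per table during the last 7 days, based on per-snapshot deltas:
select o.owner, o.object_name, sum(s.table_scans_delta) "# of FTS last 7 days"
from dba_hist_seg_stat s
join dba_objects o        on s.obj# = o.data_object_id
join dba_hist_snapshot sn on s.snap_id = sn.snap_id
where sn.begin_interval_time > systimestamp - interval '7' day
  and o.object_type = 'TABLE'
group by o.owner, o.object_name
order by 3 desc;
```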