Showing posts with label optimizer. Show all posts

Wednesday, February 23, 2022

When explaining a query that is accessing a partitioned table, what does the Pstart=KEY or Pstop=KEY indicate?

A Pstart or Pstop value of KEY indicates that the exact partition cannot be determined at compile time; it will be resolved at run time.

Some earlier step in the plan, or a bind variable, supplies one or more values for the partition key, so that pruning can still take place at execution time.
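A minimal sketch of one such situation, using the table created below (the bind-variable value 'Contract' is just an illustration):

```sql
-- The partition key value is a bind variable, so it is unknown at parse time;
-- the plan shows Pstart = KEY and Pstop = KEY, and the actual partition is
-- resolved when the statement executes.
variable doc_type varchar2(100)
exec :doc_type := 'Contract'

select count(*)
from myuser.published_documents
where document_type = :doc_type;
```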

Example: I have a composite partitioned table, with a locally partitioned index:
create table published_documents(
  UNIQUE_ID                   VARCHAR2(160 BYTE) NOT NULL,
  REGYEAR                     NUMBER(18),
  DOCUMENT_TYPE               VARCHAR2(100 CHAR),
  DOCUMENT_NAME               VARCHAR2(1000 CHAR),
  TOPIC                       VARCHAR2(30 CHAR),
  VALID                       CHAR(1 BYTE),
  VERSION                     NUMBER(18),
  DATA_XML                    CLOB,
  FORMAT                      VARCHAR2(1000 CHAR),
  PERIOD                      VARCHAR2(1000 CHAR)
)
PARTITION BY LIST (DOCUMENT_TYPE)
SUBPARTITION BY LIST (PERIOD)
...
);

create index pub_docs_idx1 on published_documents
(regyear, document_type, period)
  local;
Send the following query to the database:
select  document_type, count(*)
from myuser.published_documents
partition(LEGAL)
group by document_type;

The output is as expected:
DOCUMENT_TYPE        COUNT(*)
------------------   --------
Affidavit                7845
Amending Agreement      29909
Contract                 6647

This results in the following execution plan:
-------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   1 |  PARTITION LIST SINGLE|                     |     4 |   128 |   195M  (1)| 02:07:06 |   KEY |   KEY | 
|   2 |   HASH GROUP BY       |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   3 |    PARTITION LIST ALL |                     |  2198M|    65G|   195M  (1)| 02:07:03 |     1 |   114 |
|   4 |     TABLE ACCESS FULL | PUBLISHED_DOCUMENTS |  2198M|    65G|   195M  (1)| 02:07:03 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------
When we specify a named partition, we can see how the optimizer limits its search to the partition mentioned in the predicate, but it does not yet know how many subpartitions to scan. Since the predicate says nothing about a date range matching the PERIOD column, all 114 subpartitions must be scanned.
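For comparison, a sketch of a query that also restricts the subpartition key, which should let the optimizer prune down to a single subpartition:

```sql
-- With a predicate on PERIOD (the subpartition key), the optimizer can prune
-- subpartitions as well, instead of scanning all 114 of them.
select document_type, count(*)
from myuser.published_documents partition (LEGAL)
where period = '2018-01'
group by document_type;
```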

Note that the text "TABLE ACCESS FULL" in step 4 can be somewhat confusing: we are only talking about a full scan of the partition called "LEGAL", not of the entire table.

In my experience, specifying the partition name directly is rather unusual, and mostly done by DBAs.
Let's try a predicate that is more likely to be sent to the Oracle server by a user or a batch program:
select document_type, period, count(*)
from myuser.published_documents
where period = '2018-01'
group by document_type, period;
The output is as expected:

DOCUMENT_TYPE        PERIOD    COUNT(*)
------------------   -------   --------
Affidavit            2018-01       7845
Amending Agreement   2018-01      29909
Contract             2018-01       6647
Payroll              2018-01       7824
HA_related           2018-01      36608
Banking              2018-01      14167
IT                   2018-01       4094

The rows in the output above belong to many different partitions, but they all come from the period 2018-01.

The explain plan for this query would be:
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   1 |  PARTITION LIST ALL     |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |     1 |    11 |
|   2 |   HASH GROUP BY         |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   3 |    PARTITION LIST SINGLE|                           |  8122K|   302M|  6589K  (1)| 00:04:18 |       |       |
|*  4 |     INDEX SKIP SCAN     | PUB_DOCS_IDX1             |  8122K|   302M|  6589K  (1)| 00:04:18 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PERIOD"='2018-01')
       filter("PERIOD"='2018-01')
Here, too, we see that the optimizer first selects all 11 partitions, but then uses the partitioned index PUB_DOCS_IDX1 to find the rows matching the string '2018-01'. The optimizer does not yet know how many index subpartitions to scan; this is determined at run time.
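To inspect the Pstart/Pstop columns for the most recently executed statement, dbms_xplan can be used; the PARTITION format modifier makes sure the pruning columns are included:

```sql
-- Show the plan of the last statement run in this session,
-- including the partition pruning columns (Pstart/Pstop):
select *
from table(dbms_xplan.display_cursor(format => 'BASIC +PARTITION'));
```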

Thanks to

  • Jim Brull
  • Justin Cave
    Wednesday, June 12, 2019

    How to use dbms_stats.report_col_usage

    The function dbms_stats.report_col_usage reports how the columns of a table have been used in predicates, joins and GROUP BY expressions:



    set long 10000
    select dbms_stats.report_col_usage('SALES', 'ORDERS') from dual;
    exit
    

    Output could look something like this:
    DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
    --------------------------------------------------------------------------------
    LEGEND:
    .......
    
    EQ         : Used in single table EQuality predicate
    RANGE      : Used in single table RANGE predicate
    LIKE       : Used in single table LIKE predicate
    NULL       : Used in single table is (not) NULL predicate
    EQ_JOIN    : Used in EQuality JOIN predicate
    NONEQ_JOIN : Used in NON EQuality JOIN predicate
    FILTER     : Used in single table FILTER predicate
    JOIN       : Used in JOIN predicate
    
    DBMS_STATS.REPORT_COL_USAGE('SALES','ORDERS')
    --------------------------------------------------------------------------------
    GROUP_BY   : Used in GROUP BY expression
    ...............................................................................
    
    ###############################################################################
    
    COLUMN USAGE REPORT FOR SALES.ORDERS
    .......................................................
    
    1. ORDER_DATE                          : EQ RANGE
    2. ORDER_ID                            : EQ
    3. ITEM_NAME                           : EQ
    ###############################################################################
    

    This makes it easy to see which columns could benefit from an index.
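    Column usage information is buffered in memory before it is persisted to the data dictionary, so it can be a good idea to flush it before running the report; a sketch:

```sql
-- Persist the in-memory column usage statistics to the data dictionary:
exec dbms_stats.flush_database_monitoring_info
```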

    Tuesday, April 17, 2018

    Syntax for index hint

    Use /*+ INDEX(table_name index_name) */

    For example:
    set autot trace exp stat
    select  /*+ INDEX(sales_log MY_IDX1) */ min(entry_num)
    from sales.sales_log
    where sold_date >= to_date('05.12.2017','dd.mm.yyyy');
    

    Which gave me the following plan:
    -----------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                            |               |     1 |    11 |     9   (0)| 00:00:01 |       |       |
    |   1 |  SORT AGGREGATE                             |               |     1 |    11 |            |          |       |       |
    |   2 |   PARTITION RANGE ITERATOR                  |               |     1 |    11 |     9   (0)| 00:00:01 |     1 |    13 |
    |*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES         |     1 |    11 |     9   (0)| 00:00:01 |     1 |    13 |
    |   4 |     INDEX FULL SCAN                         | MY_IDX1       |     3 |       |     3   (0)| 00:00:01 |     1 |    13 |
    -----------------------------------------------------------------------------------------------------------------------------
    

    Operation Id 4 of the explain plan output shows that the index MY_IDX1 is being used.
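    Note that if the table is given an alias in the query, the hint must reference the alias rather than the table name, or the hint is silently ignored; a sketch:

```sql
-- With an alias, the hint must use the alias ("s"), not the table name:
select /*+ INDEX(s MY_IDX1) */ min(entry_num)
from sales.sales_log s
where sold_date >= to_date('05.12.2017','dd.mm.yyyy');
```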

    Tuesday, April 10, 2018

    More on finding failed autostats jobs



    In a previous post, I showed how to check if an autostats job had succeeded.

    Recently I had to unlock table stats on a very large partitioned table, and then lock down older partition stats explicitly, to allow Oracle to gather statistics for the most recent partitions automatically with the auto stats job.

    I wanted to know if Oracle was able to analyze the partitions within the designated 4-hour weekday maintenance window.
    When you read the code snippet below, please keep in mind that I have defined my weekday windows to start at 23:59 every evening.
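    The window definitions themselves can be verified in the scheduler views (assuming DBA privileges):

```sql
-- List the maintenance windows with their start rule and duration:
select window_name, repeat_interval, duration
from dba_scheduler_windows
order by window_name;
```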

    I found the following query to be useful:

    select distinct 
           opid, 
           target,
           target_type, 
           to_char(start_Time,'dd.mm.yyyy hh24:mi:ss') "starttime",
           to_char(end_time,'dd.mm.yyyy hh24:mi:ss')   "endtime",
           status
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    and target like '%SALES%';
    
    The result:

    OPID   TARGET                      TARGET_TYPE      starttime            endtime              STATUS
    47815  SH.SALES                    TABLE            09.04.2018 23:59:39  10.04.2018 03:25:17  COMPLETED
    47815  SH.SALES.P_SALES_DATA_2016  TABLE PARTITION  10.04.2018 03:25:41  10.04.2018 03:59:00  TIMED OUT
    47815  SH.SALES.P_SALES_DATA_MAX   TABLE PARTITION  10.04.2018 03:25:33  10.04.2018 03:25:33  COMPLETED

    All in all, the autostats job that started on April 9th, 2018 at 23:59 had the following outcome:
    select status,count(*)
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    group by status;
    

    STATUS     COUNT(*)
    FAILED            1
    TIMED OUT        12
    COMPLETED       339

    From the output you can now proceed to find out why the analysis failed or timed out. The query below will list these objects for you:

    select distinct 
           opid, 
           target,
           target_type, 
           to_char(start_time,'dd.mm.yyyy hh24:mi:ss') "starttime",
           to_char(end_time,'dd.mm.yyyy hh24:mi:ss')   "endtime",
           status
    from dba_optstat_operation_tasks 
    where opid IN (
                SELECT ID
                FROM DBA_OPTSTAT_OPERATIONS
                WHERE OPERATION LIKE 'gather_database_stats (auto)'
                AND TO_CHAR(START_TIME,'dd.mm.yyyy hh24:mi') = '09.04.2018 23:59'
    )
    and STATUS IN ('TIMED OUT','FAILED');
    

    Tuesday, February 13, 2018

    Why doesn't dbms_stats.set_schema_prefs seem to work?

    When trying to alter the default settings for statistics gathering on partitioned tables, I had trouble setting schema-level preferences using dbms_stats.set_schema_prefs:

    exec dbms_stats.set_schema_prefs('SCOTT','INCREMENTAL','TRUE');
    
    PL/SQL procedure successfully completed.
    

    But when checking whether the setting has been accepted using dbms_stats.get_prefs, it still seems to be set to FALSE:

    SELECT  DBMS_STATS.get_prefs(ownname=>'SCOTT',pname=>'INCREMENTAL') "INCREMENTAL" FROM DUAL;
    
    INCREMENTAL
    --------------------
    FALSE
    

    I found the following note by Maria Colgan:

    http://www.nocoug.org/download/2011-11/Maria_Colgan_Optimizer_Statistics.pdf

    which states that set_schema_prefs only applies to current objects in the schema and that new objects will pick up global preferences instead.

    So according to the note, I was doing the right thing; after all, I was trying to alter the default behaviour for the optimizer when working on existing objects.

    How come the new setting didn't seem to stick?

    In my case, a global change was acceptable, so I could try to set the parameter globally instead, using set_global_prefs:

    BEGIN
      dbms_stats.set_global_prefs('INCREMENTAL','TRUE');
    END;
    /
    
    PL/SQL procedure successfully completed.
    
    After that, the setting certainly seems to stick:

    SELECT  DBMS_STATS.get_prefs(ownname=>'SCOTT',pname=>'INCREMENTAL') "INCREMENTAL" FROM DUAL;
    
    INCREMENTAL
    --------------------
    TRUE
    

    Maria points out that a global change like the one above will apply to all existing objects as well as any new objects.

    My conclusion is that dbms_stats.get_prefs reports the settings that apply to future objects only, not existing ones. If that is the case, the output of the report is ambiguous, as it gives the DBA the impression that he/she has not made the change permanent.
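    One way to test this interpretation is to ask get_prefs about a specific, existing table; the table name EMP below is just an example:

```sql
-- The preference as seen by an existing table in the schema:
SELECT DBMS_STATS.GET_PREFS(pname=>'INCREMENTAL',
                            ownname=>'SCOTT',
                            tabname=>'EMP') "INCREMENTAL"
FROM DUAL;
```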

    To view the default settings for a particular schema, see this post.

    Oracle Documentation on the subject is here

    Tuesday, December 5, 2017

    How to execute and capture an SQL statement in the SMB at the same time

    An experienced co-worker showed me this neat trick to execute an SQL statement and capture its execution plan in the SQL Management Base (SMB) at the same time.

    /* First, select the bind variables used in the SQL statement you're interested in, into the variable value_string.
    Record them in a text file. You will be using them later
    */
    SELECT DECODE (value_string, 'NULL,', ''''',', value_string)
      FROM (  SELECT DECODE (datatype_string,
                             'VARCHAR2(32)', '''' || value_string || ''',',
                             value_string || ',')
                        value_string
                FROM v$sql_bind_capture
               WHERE sql_id = '4d6sdzm63st1u' AND child_number = 1
            ORDER BY position);
    
    /*
    Get the SQL text including the bind usage (but not the bind variable values)
    */
    SET SERVEROUTPUT ON
    
    DECLARE
       l_sql      VARCHAR2 (32000);
       l_sql_id   VARCHAR2 (30) := '4d6sdzm63st1u';
    BEGIN
       BEGIN
          SELECT sql_fulltext
            INTO l_sql
            FROM v$sql                                          
           WHERE sql_id = l_sql_id AND ROWNUM < 2;
       EXCEPTION
          WHEN NO_DATA_FOUND
          THEN
             SELECT sql_text
               INTO l_sql
               FROM dba_hist_sqltext
              WHERE sql_id = l_sql_id AND ROWNUM < 2;
       END;
    
       DBMS_OUTPUT.put_line (l_sql);
    
    /* Capture the baseline for this statement */ 
       EXECUTE IMMEDIATE
          'alter session set optimizer_capture_sql_plan_baselines=true';
    
    /* Send the SQL including the bind variable values to the database */
       EXECUTE IMMEDIATE l_sql
          USING 5022000,
                9834,
                9822,
                9,
                4,
                6;
    /* Turn the capture off again */
       EXECUTE IMMEDIATE
          'alter session set optimizer_capture_sql_plan_baselines=false';
    END;
    /
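    Afterwards, it may be worth checking that a baseline was actually captured; a sketch against dba_sql_plan_baselines:

```sql
-- The most recently created baselines; the captured statement should show up here:
select sql_handle, plan_name, enabled, accepted, created
from dba_sql_plan_baselines
order by created desc;
```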
    

    Wednesday, November 15, 2017

    How to find the number of full table scans



    From the Oracle Documentation:

    The DBA_HIST_SEG_STAT view displays historical information about segment-level statistics. This view captures the top segments based on a set of criteria and captures information from V$SEGSTAT. The total value is the value of the statistics since instance startup. The delta value is the value of the statistics from the BEGIN_INTERVAL_TIME to the END_INTERVAL_TIME in the DBA_HIST_SNAPSHOT view.

     SELECT O.OBJECT_NAME,
            O.OWNER,
            SUM(S.TABLE_SCANS_TOTAL) "# of FTS",
            (SELECT T.NUM_ROWS
               FROM DBA_TABLES T
              WHERE T.TABLE_NAME = O.OBJECT_NAME
                AND T.OWNER = O.OWNER) "num rows total"
     FROM DBA_HIST_SEG_STAT S JOIN DBA_OBJECTS O ON (S.OBJ# = O.DATA_OBJECT_ID)
     WHERE O.OWNER NOT IN ('SYS','DBSNMP','XDB')
     AND O.OBJECT_TYPE = 'TABLE'
     GROUP BY S.OBJ#, O.OBJECT_NAME, O.OWNER
     ORDER BY 4 DESC;
    


    OBJECT_NAME  OWNER  # of FTS  num rows total
    TABLE1       USER1      5702      1583895743
    TABLE2       USER1        11       588921781
    TABLE3       USER1      6195       241550246

    As can be seen from above, the extreme number of rows in TABLE1, coupled with 5702 full table scans, should bring it to your attention as a candidate for partitioning!
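    Note that TABLE_SCANS_TOTAL is cumulative per snapshot, so summing it across snapshots overstates the count. A delta-based variant restricted to a recent interval might look like this (a sketch, untested):

```sql
-- Full table scans per table during the last 7 days, based on per-snapshot deltas:
select o.owner, o.object_name, sum(s.table_scans_delta) "# of FTS last 7 days"
from dba_hist_seg_stat s
join dba_objects o        on s.obj# = o.data_object_id
join dba_hist_snapshot sn on s.snap_id = sn.snap_id
where sn.begin_interval_time > systimestamp - interval '7' day
  and o.object_type = 'TABLE'
group by o.owner, o.object_name
order by 3 desc;
```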