Minimalistic Oracle

Wednesday, February 23, 2022

When explaining a query that is accessing a partitioned table, what does the Pstart=KEY or Pstop=KEY indicate?

Pstart=KEY or Pstop=KEY indicates that the exact partition cannot be determined at compile time, but will be determined at run time.

Some earlier step in the plan is producing one or more values for the partition key, so that pruning can take place.

Example: I have a composite partitioned table, with a locally partitioned index:

create table published_documents(
  UNIQUE_ID                   VARCHAR2(160 BYTE) NOT NULL,
  REGYEAR                     NUMBER(18),
  DOCUMENT_TYPE               VARCHAR2(100 CHAR),
  DOCUMENT_NAME               VARCHAR2(1000 CHAR),
  TOPIC                       VARCHAR2(30 CHAR),
  VALID                       CHAR(1 BYTE),
  VERSION                     NUMBER(18),
  DATA_XML                    CLOB,
  FORMAT                      VARCHAR2(1000 CHAR),
  PERIOD                      VARCHAR2(1000 CHAR)
)
PARTITION BY LIST (DOCUMENT_TYPE)
SUBPARTITION BY LIST (PERIOD)
...
);

create index pub_docs_idx1 on published_documents
(regyear, document_type, period)
  local;

Send the following query to the database:

select  document_type, count(*)
from myuser.published_documents
partition(LEGAL)
group by document_type;

The output is as expected:

DOCUMENT_TYPE	COUNT(*)
Affidavit	7845
Amending Agreement	29909
Contract	6647

It results in the following execution plan:

-------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   1 |  PARTITION LIST SINGLE|                     |     4 |   128 |   195M  (1)| 02:07:06 |   KEY |   KEY |
|   2 |   HASH GROUP BY       |                     |     4 |   128 |   195M  (1)| 02:07:06 |       |       |
|   3 |    PARTITION LIST ALL |                     |  2198M|    65G|   195M  (1)| 02:07:03 |     1 |   114 |
|   4 |     TABLE ACCESS FULL | PUBLISHED_DOCUMENTS |  2198M|    65G|   195M  (1)| 02:07:03 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------

When we specify a named partition, we can see how the optimizer limits its search to the partition named in the query, but it does not yet know how many subpartitions to scan. Since the query has no predicate on the PERIOD column, all 114 subpartitions must be scanned.

Note that the text "TABLE ACCESS FULL" in step 4 can be somewhat confusing: we are only talking about a full scan of the partition called "LEGAL", not the entire table.

In my experience, specifying the partition name directly is rather unusual, and mostly done by DBAs.
Let's try it with a predicate that is more likely to be sent to the Oracle server by a user or a batch program:

select document_type, period, count(*)
from myuser.published_documents
where period = '2018-01'
group by document_type, period;

The output is as expected:

DOCUMENT_TYPE	PERIOD	COUNT(*)
Affidavit	2018-01	7845
Amending Agreement	2018-01	29909
Contract	2018-01	6647
Payroll	2018-01	7824
HA_related	2018-01	36608
Banking	2018-01	14167
IT	2018-01	4094

The rows in the output above belong to many different partitions, but they are all from the period 2018-01.

The explain plan for this query would be:

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   1 |  PARTITION LIST ALL     |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |     1 |    11 |
|   2 |   HASH GROUP BY         |                           |    50 |  1950 |  6589K  (1)| 00:04:18 |       |       |
|   3 |    PARTITION LIST SINGLE|                           |  8122K|   302M|  6589K  (1)| 00:04:18 |       |       |
|*  4 |     INDEX SKIP SCAN     | PUB_DOCS_IDX1             |  8122K|   302M|  6589K  (1)| 00:04:18 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PERIOD"='2018-01')
       filter("PERIOD"='2018-01')

Here, too, we see that the optimizer first selects all 11 partitions, but then uses the partitioned index PUB_DOCS_IDX1 to find the rows that match the string '2018-01'. The optimizer does not yet know how many index subpartitions to scan; this is determined at run time.
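
Another common situation where KEY shows up, not covered by the plans above, is a bind variable on the partition key: the value is unknown when the plan is built, so the partition can only be resolved at execution time. A minimal sketch, assuming the PUBLISHED_DOCUMENTS table from this post; the bind name :doc_type is made up for illustration, and no output is shown because it depends on your data and version:

```sql
-- Hypothetical bind variable :doc_type on the partition key.
-- EXPLAIN PLAN does not need a value for the bind to build the plan.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM   myuser.published_documents
WHERE  document_type = :doc_type;

-- Display the plan with the partition columns (Pstart/Pstop) included.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PARTITION'));
```

Replacing the bind with a literal such as 'Contract' should let you compare the two plans side by side.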

Thanks to

Jim Brull

Justin Cave

Thursday, February 3, 2022

Observation: RMAN saves files according to end date of backup

I noticed the following: An archivelog file backed up at 31.01.2022 23:57:18 will NOT be saved in the folder for 31.01.2022. Instead, it will be saved in the folder for 01.02.2022.
Output from RMAN:

 list archivelog from time '31.01.2022' until time '01.02.2022';

Output (excerpt)

450706  1    450701  A 31.01.2022 23:56:46
        Name: /u04/fra/proddb01/archivelog/2022_01_31/o1_mf_1_450701__vwojnvs6_.arc

450707  1    450702  A 31.01.2022 23:57:16
        Name: /u04/fra/proddb01/archivelog/2022_01_31/o1_mf_1_450702__vwokkx0p_.arc

450708  1    450703  A 31.01.2022 23:57:18
        Name: /u04/fra/proddb01/archivelog/2022_02_01/o1_mf_1_450703__vx4cmycs_.arc

The file /u04/fra/proddb01/archivelog/2022_02_01/o1_mf_1_450703__vx4cmycs_.arc has the timestamp Feb 1 00:05.
So in this case, the last file generated on 31.01 actually ended up in the folder for files generated on 01.02.
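
The same observation can probably be checked from inside the database. The query below is a sketch, on the assumption that the folder name follows the time the archived file was completed rather than the time the log was started; it compares both timestamps with the file name for the logs around midnight. The time window is only an example:

```sql
-- Compare when each log started (FIRST_TIME) with when its archived
-- copy was completed (COMPLETION_TIME), next to the resulting file name.
SELECT sequence#,
       first_time,
       completion_time,
       name
FROM   v$archived_log
WHERE  first_time >= TO_DATE('31.01.2022 23:50', 'DD.MM.YYYY HH24:MI')
AND    first_time <  TO_DATE('01.02.2022 00:10', 'DD.MM.YYYY HH24:MI')
ORDER  BY sequence#;
```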

Monday, January 31, 2022

Simple SQL to list the indexes on a table

The following SQL lists the indexes defined on a table, along with the columns and their positioning:

-- List each index on the table together with its columns, in key order
SELECT I.INDEX_NAME, I.INDEX_TYPE, I.NUM_ROWS, I.DEGREE, C.COLUMN_NAME, C.COLUMN_POSITION
FROM DBA_INDEXES I JOIN DBA_IND_COLUMNS C
ON (I.OWNER = C.INDEX_OWNER AND I.INDEX_NAME = C.INDEX_NAME)
WHERE I.OWNER = 'MYSCHEMA'
AND I.TABLE_NAME = 'MYTABLE'
ORDER BY I.INDEX_NAME, C.COLUMN_POSITION;
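
If one row per index is easier to read, a variant along these lines may help. It is only a sketch: it uses LISTAGG to collapse the columns into one string, and it filters on the table owner instead of the index owner, which is an assumption about what you want to see:

```sql
-- One row per index, with its columns concatenated in key order.
SELECT c.index_owner,
       c.index_name,
       LISTAGG(c.column_name, ', ')
         WITHIN GROUP (ORDER BY c.column_position) AS column_list
FROM   dba_ind_columns c
WHERE  c.table_owner = 'MYSCHEMA'
AND    c.table_name  = 'MYTABLE'
GROUP  BY c.index_owner, c.index_name
ORDER  BY c.index_name;
```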

Wednesday, January 5, 2022

How to set a dynamic parameter in a PostgreSQL database cluster

As with Oracle, some parameters can be set dynamically in a PostgreSQL database cluster. A PostgreSQL database cluster uses a parameter file called postgresql.conf, which holds the cluster-wide parameters. If you set a parameter using the ALTER SYSTEM SET command, it is written to another file called postgresql.auto.conf, whose values always override those in postgresql.conf.

Before the change, postgresql.auto.conf looks like this:

log_line_prefix = '[%m] – %p %q- %u@%h:%d – %a'
wal_level = 'replica'
hot_standby = 'on'

I then make a change to the system configuration:

alter system set hot_standby_feedback=on;
ALTER SYSTEM

After this change, the file postgresql.auto.conf has another entry:

log_line_prefix = '[%m] – %p %q- %u@%h:%d – %a'
wal_level = 'replica'
hot_standby = 'on'
hot_standby_feedback = 'on'

I then have to reload the configuration using the function pg_reload_conf() for the new parameter to take effect:

postgres=#  select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

The current logfile for the postgreSQL database cluster records this fact:

[2022-01-03 14:45:23.127 CET] – 1683 LOG:  received SIGHUP, reloading configuration files
[2022-01-03 14:45:23.129 CET] – 1683 LOG:  parameter "hot_standby_feedback" changed to "on"
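
To confirm the change from within a session rather than from the logfile, queries like the following can be used. This is a sketch against the standard views pg_settings and pg_file_settings; output is omitted since it depends on your installation:

```sql
-- Value currently in effect in the running cluster.
SHOW hot_standby_feedback;

-- context tells you whether a reload is enough ('sighup') or a restart
-- is required ('postmaster'); pending_restart flags a value that has been
-- changed in the files but is not yet active.
SELECT name, setting, context, source, pending_restart
FROM   pg_settings
WHERE  name = 'hot_standby_feedback';

-- Which configuration file supplies the value, and whether it was applied.
SELECT sourcefile, sourceline, name, setting, applied
FROM   pg_file_settings
WHERE  name = 'hot_standby_feedback';
```

Checking the context column before the change is a quick way to tell whether a parameter is dynamic at all.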

For details, check the documentation

How to find out if a hot standby postgres database is receiving logs

 select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

https://www.postgresql.org/docs/11/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE
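
Note that pg_is_in_recovery() only tells you that the server is running as a standby. To see whether WAL is actually arriving, something like the sketch below may be more telling. The function names are the ones used from PostgreSQL 10 onwards, and the column list is limited to columns I expect to exist across recent versions:

```sql
-- One row here means a WAL receiver process is connected to the primary.
SELECT pid, status, last_msg_receipt_time, latest_end_lsn, latest_end_time
FROM   pg_stat_wal_receiver;

-- Last WAL position received, last position replayed, and the commit
-- timestamp of the last replayed transaction.
SELECT pg_last_wal_receive_lsn()       AS received_lsn,
       pg_last_wal_replay_lsn()        AS replayed_lsn,
       pg_last_xact_replay_timestamp() AS last_replayed_commit;
```

Running the second query twice, a little apart, shows whether the positions are moving.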

Wednesday, December 22, 2021

What is the difference between "force parallel" and "enable parallel" used in the "alter session" statement in Oracle?

What is the difference between these two statements?

ALTER SESSION ENABLE PARALLEL DML | DDL | QUERY;

and

ALTER SESSION FORCE PARALLEL DDL | DML | QUERY;

Answer:

The difference lies in the details: the ENABLE statement merely permits parallelization; a statement only runs in parallel if it carries a concrete parallel directive or parallel hint. If none is specified, Oracle executes the statement serially. The FORCE statement parallelizes everything it can with the default DOP (degree of parallelism), without you having to state anything about this in your DML, DDL or query statements.
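
As an illustration of the ENABLE case, here is a minimal sketch. The table, columns and DOP of 8 are made up; the point is that the hint requests the parallelism and the session setting only permits it:

```sql
-- Permit parallel DML in this session.
ALTER SESSION ENABLE PARALLEL DML;

-- Hypothetical table and columns; the hint requests a DOP of 8.
UPDATE /*+ PARALLEL(t, 8) */ myschema.mytable t
SET    t.valid = 'N'
WHERE  t.regyear < 2015;

-- A parallel DML transaction must be committed (or rolled back)
-- before the same session can query the table again.
COMMIT;
```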

If the default DOP isn't good enough for you (for example during an index rebuild), you can force your session to use a DOP higher than the default, like this:

ALTER SESSION FORCE PARALLEL DDL | DML | QUERY PARALLEL 32;

This will override any other DOP in the same session and use 32 parallel workers.
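
A sketch of the FORCE case for an index rebuild follows. The index name is hypothetical, and the v$session columns are used to check what the session is currently set to:

```sql
-- Force parallel DDL with a DOP of 32 for this session.
ALTER SESSION FORCE PARALLEL DDL PARALLEL 32;

-- Hypothetical index; no PARALLEL clause is needed on the statement itself.
ALTER INDEX myschema.myindex REBUILD;

-- Check the parallel status of the current session
-- (ENABLED, DISABLED or FORCED for each of DML, DDL and query).
SELECT pdml_status, pddl_status, pq_status
FROM   v$session
WHERE  sid = SYS_CONTEXT('USERENV', 'SID');

-- Go back to the default for DDL (enabled, not forced).
ALTER SESSION ENABLE PARALLEL DDL;
```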

Alter session in 19c is documented here
The concept of forcing/enabling parallelization is explained here

Tuesday, December 21, 2021

How to work around error [INS-08101] Unexpected error while executing the action at state: ‘supportedOSCheck’

Thanks so much to Martin Berger for his blog post showing how to work around an error that shows up when you attempt to install Oracle 19c software on a RHEL8 distribution.

I wanted to install Oracle software on a RH8.5 Linux server:

cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)

This is the error I received: [INS-08101] Unexpected error while executing the action at state: 'supportedOSCheck'

In my case, the only thing I had to do was to add another environment variable:

export CV_ASSUME_DISTID=OEL7.8

Execute the installer again and you will see a different screen:

./runInstaller