Monday, September 20, 2021

PostgreSQL: how to use the to_date, to_char and cast conversion functions

Building on the example in my previous post, this is how you can use the to_date and to_char functions in predicates against both the original, non-partitioned table and the partitioned version of the same.

To query the non-partitioned table, you could first convert the timestamp to a character string in the desired format, then convert it back to a date:
select *
from myuser.mytable
where to_date(to_char(created_date,'YYYY.MM.DD'),'YYYY.MM.DD') between '2020.01.01' and  '2020.01.31';

To query the partitioned table, you need an explicit cast to the same data type used by the partition key column:
select *
from myuser.mytable
where  cast(created_date as date) between '2020.01.01' and  '2020.01.31';
I realized this difference when the number of rows returned was not what I expected. After I changed my SQL statement to use CAST, the query returned exactly the rows I expected.
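If you are unsure whether the CAST predicate actually enables partition pruning, EXPLAIN will tell you. A sketch, using the table and partition names from the partitioning example in my previous post:

```sql
-- Sketch: check that only the relevant partition is scanned.
-- Table and partition names are the ones used in these posts.
EXPLAIN (COSTS OFF)
SELECT *
FROM myuser.mytable
WHERE cast(created_date AS date) BETWEEN '2020.01.01' AND '2020.01.31';
```

If pruning works, the plan should only need to scan mytable_p_202001 rather than every partition.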

PostgreSQL: how to create a partitioned table using a conversion function

Consider the following table:
CREATE TABLE myuser.mytable (
    id integer NOT NULL,
    created_date timestamp without time zone NOT NULL,
    event_json text NOT NULL,
    format_version text NOT NULL,
    rep_id text NOT NULL,
    appversion text NOT NULL
);
Let's say you want to partition this table by RANGE on the column "created_date", which is of data type TIMESTAMP WITHOUT TIME ZONE.
You could use a conversion function like CAST in the partition key expression, like this:
CREATE TABLE myuser.mytable (
    id integer NOT NULL,
    created_date timestamp without time zone NOT NULL,
    event_json text NOT NULL,
    format_version text NOT NULL,
    rep_id text NOT NULL,
    appversion text NOT NULL
)
partition by range(cast(created_date as date));
You then proceed with adding your partitions the way you normally would in PostgreSQL, for example:
CREATE TABLE myuser.mytable_p_202001
 PARTITION OF myuser.mytable
 FOR VALUES FROM ('2020.01.01') TO ('2020.02.01');

CREATE TABLE myuser.mytable_p_202002
 PARTITION OF myuser.mytable
 FOR VALUES FROM ('2020.02.01') TO ('2020.03.01');

etc

Note that the upper bound of a range partition is exclusive, so each partition's TO value is the first day of the following month; writing TO ('2020.01.31') would send rows from January 31 to the default partition.

CREATE TABLE myuser.mytable_p_default
 PARTITION OF myuser.mytable
 DEFAULT;
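Should you later need to verify which partitions exist and what their bounds are, the system catalogs have the answer. A sketch, assuming the table above:

```sql
-- Sketch: list the partitions of myuser.mytable and their bounds.
SELECT c.relname AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'myuser.mytable'::regclass;
```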

Beware: If you try to create a primary key constraint on the partition key column in this case, you will receive
DETAIL:  PRIMARY KEY constraints cannot be used when partition keys include expressions.

You are of course free to add a normal search index on the same expression:
CREATE INDEX myidx1 ON myuser.mytable USING btree ( cast (created_date as date) );
Read more about the CAST function in the documentation
Another good source is postgresqltutorial.com

Friday, September 17, 2021

How to solve error in pg_top stating "relation does not exist"

When you execute pg_top, you get to see the top database sessions in a way familiar to most people with experience from Unix environments.



There are many options to pg_top, and one that I wanted to use was
E       - show execution plan (UPDATE/DELETE safe)
When you press E, you are prompted for a process ID:
Re-determine execution plan: 119692

But I kept getting the message:
Query Plan:

ERROR:  relation "myschema.mytable" does not exist
LINE 4:     from myschema.mytable
                 ^
Hit any key to continue:

Solution: Execute pg_top with database name, host name, port, username and password, like this:
$ pg_top -d mydb -h myhost.mydomain.com -p 5432  -U mysuperuser -W
After I executed pg_top in this manner, the E option worked as expected and displayed the execution plan.
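If pg_top is not available, a rough equivalent of its session overview can be had directly from the pg_stat_activity view (a sketch, with the column list trimmed to the essentials):

```sql
-- Sketch: the active database sessions, similar to what pg_top displays.
SELECT pid, usename, datname, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;
```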

Thursday, September 16, 2021

How to list the contents of a custom format PostgreSQL export file

List the content of a "custom format" export file created by pg_dump by specifying the -l flag:

pg_restore /data1/exports/musicdb_Fc.dmp -l


List the content of a "directory format" export file created by pg_dump by specifying the -l flag. Note that the path should point to the directory in which the toc.dat file resides, in my case /data1/exports/mydir:
pg_restore /data1/exports/mydir/ -l


In both cases, output will be something akin to
;
; Archive created at 2021-09-16 12:51:17 CEST
;     dbname: musicdb
;     TOC Entries: 29
;     Compression: -1
;     Dump Version: 1.13-0
;     Format: CUSTOM
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped from database version: 11.12
;     Dumped by pg_dump version: 11.12
;
;
; Selected TOC Entries:
;
10; 2615 45112854 SCHEMA - music postgres
3764; 0 0 ACL - SCHEMA music postgres
202; 1259 45112865 TABLE music albums music
203; 1259 45112868 TABLE music albums_1974 music
204; 1259 45112871 TABLE music albums_1979 music
201; 1259 45112863 SEQUENCE music albums_album_id_seq music
205; 1259 45112874 TABLE music albums_default music
207; 1259 45112879 TABLE music artists music
206; 1259 45112877 SEQUENCE music artists_artist_id_seq music
209; 1259 45112884 TABLE music bands music
208; 1259 45112882 SEQUENCE music bands_band_id_seq music
200; 1259 45112858 TABLE music songs music
199; 1259 45112856 SEQUENCE music songs_song_id_seq music
3750; 0 45112868 TABLE DATA music albums_1974 music
3751; 0 45112871 TABLE DATA music albums_1979 music
3752; 0 45112874 TABLE DATA music albums_default music
3754; 0 45112879 TABLE DATA music artists music
3756; 0 45112884 TABLE DATA music bands music
3748; 0 45112858 TABLE DATA music songs music
3765; 0 0 SEQUENCE SET music albums_album_id_seq music
3766; 0 0 SEQUENCE SET music artists_artist_id_seq music
3767; 0 0 SEQUENCE SET music bands_band_id_seq music
3768; 0 0 SEQUENCE SET music songs_song_id_seq music
3620; 2606 45112862 CONSTRAINT music songs songs_pkey music
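The listing is useful for more than inspection: pg_restore accepts an edited copy of it as a "use list" via the -L flag, restoring only the entries you leave in. A sketch using the dump file from above:

```shell
# Sketch: save the table of contents, comment out unwanted entries with ';',
# then restore only what remains.
pg_restore -l /data1/exports/musicdb_Fc.dmp > musicdb.toc
# (edit musicdb.toc here)
pg_restore -L musicdb.toc -d musicdb /data1/exports/musicdb_Fc.dmp
```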

Tuesday, September 14, 2021

How to export and import a schema using the directory format

What exactly is the directory format used by pg_dump? 

 The documentation states: 

  Dumps can be output in script or archive file formats

 and 

The alternative archive file formats must be used with pg_restore to rebuild the database. 

Do they bring any advantages over the plain-text format?

Archive file formats allow pg_restore to

  • be selective about what is restored
  • reorder the items prior to restoring
  • be portable across architectures
  • examine the contents of an archive
  • use parallel workers during export and import

    The documentation states that pg_dump, used together with one of the archive file formats, and pg_restore provide a flexible archival and transfer mechanism.

    There are two types of archive file format output:

  • The “custom” format (-Fc)
  • The “directory” format (-Fd)

    Both of these allow for selection and reordering of all archived items, support parallel restoration, and are compressed by default. Note, however, that only the “directory” format supports parallelizing the export utility pg_dump itself.

    In other words, moving data with pg_dump + pg_restore should be a smooth transition for most seasoned Oracle DBAs: the usage of these utilities is quite similar to Oracle Data Pump.


    Export schema "music" in database "musicdb" using the directory format archive:
    pg_dump -Fd musicdb -n music -v -f /export/musicdbexport -j 2
    I am using two parallel workers, indicated by the -j flag.
    You need to point to a valid parent directory, in this case /export. pg_dump will create the subdirectory "musicdbexport" for you; according to the documentation, it should not exist beforehand.

    If you list the path after export, you'll see files like these:
    cd /export/musicdbexport
    total 40K
    drwxr-xr-x 3 postgres postgres 4.0K Sep  7 16:22 ..
    -rw-r--r-- 1 postgres postgres 8.0K Sep  7 16:22 toc.dat
    -rw-r--r-- 1 postgres postgres   53 Sep  7 16:22 3756.dat.gz
    -rw-r--r-- 1 postgres postgres  292 Sep  7 16:22 3754.dat.gz
    -rw-r--r-- 1 postgres postgres   67 Sep  7 16:22 3751.dat.gz
    -rw-r--r-- 1 postgres postgres  142 Sep  7 16:22 3748.dat.gz
    -rw-r--r-- 1 postgres postgres   25 Sep  7 16:22 3750.dat.gz
    drwx------ 2 postgres postgres 4.0K Sep  7 16:22 .
    -rw-r--r-- 1 postgres postgres   88 Sep  7 16:22 3752.dat.gz
    
    Transfer these files, including the toc.dat file, to the destination server.

    Import schema "music" in database "musicdb" using the directory format archive:
    pg_restore /pgdata/export -C -c -d postgres -j 4 -v
    -C means "create the target database", -c means "drop the database objects before creating", and -j is the number of jobs started in parallel.
    Remember that the file toc.dat must exist in the directory; otherwise you'll see the error message pg_restore: [archiver] directory "/pgdata/export" does not appear to be a valid archive ("toc.dat" does not exist)

    Remember, you don't have to drop the database first; it's only an option you have. You could also choose to import the schema "music" directly into an existing database:
    pg_restore /pgdata/export -d musicdb -j 4 -v

    If you examine the log afterwards, you'll typically see output like this:
    pg_restore: connecting to database for restore
    
    So pg_restore simply connects to the default database (named "postgres") in order to create the database from which the schema "music" was exported.

    Following that, we see how the database "musicdb" is created:
    pg_restore: processing item 3759 ENCODING ENCODING
    pg_restore: processing item 3760 STDSTRINGS STDSTRINGS
    pg_restore: processing item 3761 SEARCHPATH SEARCHPATH
    pg_restore: processing item 3762 DATABASE musicdb
    pg_restore: creating DATABASE "musicdb"
    


    Then pg_restore connects to the new database and creates the schema there:
    pg_restore: connecting to new database "musicdb"
    pg_restore: processing item 10 SCHEMA music
    pg_restore: creating SCHEMA "music"
    


    After that, the objects are imported one by one:
    pg_restore: processing item 202 TABLE albums
    pg_restore: creating TABLE "music.albums"
    
    As with schema imports in Oracle, any roles that own or hold grants on the objects being imported must exist beforehand for the GRANT and ALTER ... OWNER statements to execute successfully.

    In my case, these errors were thrown because I did not take care to create a user called "music" up front:

    pg_restore: [archiver (db)] Error while PROCESSING TOC:
    pg_restore: [archiver (db)] Error from TOC entry 202; 1259 45112865 TABLE albums music
    pg_restore: [archiver (db)] could not execute query: ERROR:  role "music" does not exist
        Command was: ALTER TABLE music.albums OWNER TO music;
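A minimal fix, assuming you want the schema owned by a role with the same name as in the source database, is to create the role before running pg_restore:

```sql
-- Sketch: create the owning role up front so the ALTER ... OWNER TO
-- statements in the dump succeed.
CREATE ROLE music WITH LOGIN;
```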
    


    pg_dump is documented here
    pg_restore is documented here
    Wednesday, September 8, 2021

    How to analyze a table in PostgreSQL

    Just like Oracle, PostgreSQL stores statistics about tables in an internal catalog, called pg_statistic. These statistics are used by the query planner to determine the most efficient execution plan. To gather fresh statistics on a table, use the following simple syntax:
    \connect mydatabase
    
    analyze verbose myschema.mytable;
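    Afterwards, the collected statistics can be inspected through the pg_stats view, for example:

```sql
-- Sketch: per-column statistics gathered by ANALYZE.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE schemaname = 'myschema'
  AND tablename  = 'mytable';
```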
    
    Documentation here

    Thursday, August 26, 2021

    What is vacuuming in PostgreSQL and how do I vacuum a table?

    Under the headline "Vacuuming Basics" in the PostgreSQL documentation, you can read amongst other things:
    PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
    * To recover or reuse disk space occupied by updated or deleted rows.
    * To update data statistics used by the PostgreSQL query planner.
    * To update the visibility map, which speeds up index-only scans.
    * To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
    

    Each of these reasons dictates performing VACUUM operations of varying frequency and scope.


    Further explanation can be found together with the syntax description for VACUUM:

    VACUUM reclaims storage occupied by dead tuples.
    In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.

    Vacuum a specific table in verbose mode:
    psql mydb
    You are now connected to database "mydb" as user "postgres".
    
    vacuum (verbose, analyze) myschema.mytable;
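    To see whether a table actually needs vacuuming, the statistics collector keeps a count of dead tuples per table. A sketch:

```sql
-- Sketch: dead-tuple count and last (auto)vacuum times for the table.
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'myschema'
  AND relname    = 'mytable';
```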