Tuesday, September 14, 2021

How to export and import a schema using the directory format

What exactly is the directory format used by pg_dump? 

 The documentation states: 

  Dumps can be output in script or archive file formats

 and 

The alternative archive file formats must be used with pg_restore to rebuild the database. 

Do they bring any advantages over the plain-text format?

Archive file formats allows pg_restore to

  • be selective about what is restored
  • reorder the items prior to being restored
  • be portable across architectures.
  • be used to examine the contents of an archive
  • use parallel workers during export and import

    The documentation claims that pg_dump used archive file formats and pg_restore provides a flexible archival and transfer mechanism. 

    There are two types of archive file format output:

  • The “custom” format (-Fc)
  • The “directory” format (-Fd)

    Both of these allows for selection and reordering of all archived items, support parallel restoration, and are compressed by default, but note that only the “directory” format supports parallelization of the export utility pg_dump.

    In other words, using pg_dump + pg_restore to move data should be a transition that most seasoned Oracle DBAs will be able to do. The usage of these utilities is a lot similar to Oracle Data Pump.


    Export schema "music" in database "musicdb" using the directory format archive:
    pg_dump -Fd musicdb -n music -v -f /export/musicdbexport -j 2
    I am using two workers indicated by the -j flag.
    You need to point to a valid path for your directory, which is /export. pg_dump will create the subdirectory "musicdbexport" for you, and according to the documentation it should not be created beforehand.

    If you list the path after export, you'll see files like these:
    cd /export/musicdbexport
    total 40K
    drwxr-xr-x 3 postgres postgres 4.0K Sep  7 16:22 ..
    -rw-r--r-- 1 postgres postgres 8.0K Sep  7 16:22 toc.dat
    -rw-r--r-- 1 postgres postgres   53 Sep  7 16:22 3756.dat.gz
    -rw-r--r-- 1 postgres postgres  292 Sep  7 16:22 3754.dat.gz
    -rw-r--r-- 1 postgres postgres   67 Sep  7 16:22 3751.dat.gz
    -rw-r--r-- 1 postgres postgres  142 Sep  7 16:22 3748.dat.gz
    -rw-r--r-- 1 postgres postgres   25 Sep  7 16:22 3750.dat.gz
    drwx------ 2 postgres postgres 4.0K Sep  7 16:22 .
    -rw-r--r-- 1 postgres postgres   88 Sep  7 16:22 3752.dat.gz
    
    Transfer these file, including the toc.dat file, to the destination server.

    Import schema "music" in database "musicdb" using the directory format archive:
    pg_restore /pgdata/export -C -c -d postgres -j 4 -v
    -C means "create the target database" -c means "drop the database objects before creating" -j is the number of jobs started in parallel
    Remeber that the file toc.dat must exist in the directory, otherwise you'll see the error message pg_restore: [archiver] directory "/pgdata/export" does not appear to be a valid archive ("toc.dat" does not exist)

    Remember, you don't have to drop the database first, it's only an option you have. You could also choose to import the schema "music" directly into an existing database:
    pg_restore /pgdata/export -d musicdb -j 4 -v

    If you examine the log aftwards, you'll typically see output like this:
    pg_restore: connecting to database for restore
    
    So postgres is simply connecting to the default database (named "postgres") in order to create the database which schema "music" happens to be exported from.

    Following that, we see how the database "musicdb" is created:
    pg_restore: processing item 3759 ENCODING ENCODING
    pg_restore: processing item 3760 STDSTRINGS STDSTRINGS
    pg_restore: processing item 3761 SEARCHPATH SEARCHPATH
    pg_restore: processing item 3762 DATABASE musicdb
    pg_restore: creating DATABASE "musicdb"
    


    Then the pg_restore connects to the new database and creates the schema in there:
    pg_restore: connecting to new database "musicdb"
    pg_restore: processing item 10 SCHEMA music
    pg_restore: creating SCHEMA "music"
    


    After that, the objects are imported one by one:
    pg_restore: processing item 202 TABLE albums
    pg_restore: creating TABLE "music.albums"
    
    Like schema imports in Oracle, any users that may have grants on the objects being imported must exist beforehand for any GRANT statements to execute successfully.

    In my case, these errors were thrown because I did not take care to create a user called "music" up front:

    pg_restore: [archiver (db)] Error while PROCESSING TOC:
    pg_restore: [archiver (db)] Error from TOC entry 202; 1259 45112865 TABLE albums music
    pg_restore: [archiver (db)] could not execute query: ERROR:  role "music" does not exist
        Command was: ALTER TABLE music.albums OWNER TO music;
    


    pg_dump is documented here
    pg_restore is documenteted here
  • No comments:

    Post a Comment