Monday, November 21, 2016

How to install and use the character set scanner

Although deprecated since the release of DMU (Database Migration utility for Unicode), there are still cases where you need to run csscan.

One such incident happened to me when migrating a database used for Oracles E-Business Suite.

Some errors were found in DMU, but when clicking on the number indicating the number of rows with errors, the Cleansing Editor showed up empty. To get the rowids of the offending rows, I had to resort to the character scanner.

If not installed do as follows:

1. edit the script $ORACLE_HOME/rdbms/admin/csminst.sql so that it has a valid password, and that its object will be saved in a separate tablespace
create user csmig identified by ***** account lock
/
alter user csmig default tablespace tools quota unlimited on tools
/
2. execute the script as sysdba
sqlplus / as sysdba @csminst.sql

That's it. The csscanner tool is now installed.

A simple example of usage would be when I scanned one - 1 - table.
First, I created a parameter file
userid='sys/**** as sysdba'
table=scott.emp
fromchar=WE8ISO8859P1
tochar=al32utf8
array=4096000
process=4
feedback=1000

Execute it as follows:
csscan parfile=myscan.par

This process creates three files for you:
1. scan.out - Screen log file
2. scan.err - individual exception report. This is where you would typically find your rows that contains errors
3. scan.txt - contains the Database Scan Summary Report.

Here's how my scan.err looked like after I scanned the table mentioned above:
Database Scan Individual Exception Report


[Database Scan Parameters]

Parameter                      Value
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1
Instance Name                  proddb01
Database Version               11.2.0.4.0
Scan type                      Selective tables
Scan CHAR data?                YES
Database character set         WE8ISO8859P1
FROMCHAR                       WE8ISO8859P1
TOCHAR                         al32utf8
Scan NCHAR data?               NO
Array fetch buffer size        4096000
Number of processes            4
Capture convertible data?      NO
------------------------------ ------------------------------------------------

[Data Dictionary individual exceptions]


[Application data individual exceptions]

User  : SCOTT
Table : EMP
Column: ATTRIBUTE15
Type  : VARCHAR2(150)
Number of Exceptions         : 2
Max Post Conversion Data Size: 153

ROWID              Exception Type      Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AAS/WmAHwAACI5ZAAC exceed column size   153 
AAS/WmAHwAACI5ZAAD exceed column size   153 
------------------ ------------------ ----- ------------------------------
As expected, the offending rows contained Scandinavian characters, which will use 2 bytes after conversion to a Unicode database. Therefore the text will no longer fit, and actions need to be taken by the data owner before the migration.

For more information, look up the official documentation here

No comments:

Post a Comment