I have been involved in several migration projects where we had to deal with Western European characters that were stored with an invalid representation in the database and needed to be converted.
As I showed in my post about usage of the chr function, you sometimes need to find the decimal and/or hexadecimal value of a particular character.
But how can you verify that your operating system actually agrees? How would your operating system translate a character passed from a terminal such as PuTTY?
First, make sure your PuTTY terminal has its translation settings set to the character set of the server you are logging into: right-click the title bar of the PuTTY window and pick "Change Settings" from the menu. Go to the "Translation" settings and select the correct character set from the "Remote character set" drop-down menu.
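To find out which character set the server session is actually using, a quick check along these lines may help. This is only a sketch, assuming a Linux server where the standard locale utility is installed:

```shell
# Show the character encoding of the current session's locale,
# for example UTF-8 or ISO-8859-1.
locale charmap

# Show the environment variables that determine the locale.
echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset} LC_CTYPE=${LC_CTYPE:-unset}"
```

The value reported by locale charmap is the one to look for in the "Remote character set" drop-down menu.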
On Linux platforms, a simple way to check how a character is translated is to use the hexdump utility. Thanks to Gray Watson for demonstrating this!
man hexdump
The hexdump utility is a filter which displays the specified files, or the standard input, if no files are specified, in a user specified format.
Let's try to hexdump the character ø, and see what the internal hexadecimal representation is:
echo "ø" | hexdump -v -e '"x" 1/1 "%02X" " "'
xC3 xB8 x0A
The prefix x in front of each value marks it as hexadecimal, so the important part here is C3 and B8: in a multibyte character set such as UTF-8, these two bytes represent the Scandinavian character ø. The trailing 0A is the line feed (newline) character that echo appends to its output.
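To see where the trailing 0A comes from, compare echo, which appends a line feed, with printf, which writes only the bytes it is given. A minimal sketch, assuming a UTF-8 terminal; the helper name dump is made up for this example:

```shell
# Hypothetical helper: dump stdin one byte at a time, in the same format as above,
# then end the line so the prompt starts on a fresh line.
dump() { hexdump -v -e '"x" 1/1 "%02X" " "'; echo; }

# echo appends a line feed, so three bytes are dumped: C3 B8 0A.
echo "ø" | dump

# printf adds nothing, so only the two bytes of the character are dumped: C3 B8.
printf 'ø' | dump
```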
Another way is to use the "od" utility:
man od
od - dump files in octal and other formats
-x same as -t x2, select hexadecimal 2-byte units
Using the -x flag straight off will give the hexadecimal value in 2-byte units:
echo "ø" | od -x
0000000 b8c3 000a
0000003
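This time the bytes appear swapped. od -x reads the input in 2-byte units and prints each unit as a number in the host's native byte order. On a little-endian machine such as x86, the bytes C3 B8 are therefore displayed as b8c3, and the line feed 0A, padded with a zero byte, as 000a.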
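The swap is easy to reproduce with input whose bytes are known in advance. The sketch below writes the bytes explicitly with printf, so it does not depend on the terminal encoding; the outputs mentioned in the comments assume a little-endian host:

```shell
# The letters A and B are the bytes 41 and 42.
# As one 2-byte unit on a little-endian host they are shown as 4241.
printf 'AB' | od -An -x

# The two UTF-8 bytes of ø, C3 B8, written as octal escapes.
# On a little-endian host they are shown as b8c3.
printf '\303\270' | od -An -x
```

On a big-endian host the same commands would show 4142 and c3b8, which is why the 2-byte view is not a reliable way to read off the byte sequence.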
However, if you use the -t x1 notation instead, od prints one byte at a time:
echo "ø" | od -t x1
0000000 c3 b8 0a
0000003
The values come out as a DBA expects: c3b8 corresponds to the decimal value 50104, which in turn represents the Scandinavian letter ø.
(The trailing 0a is the line feed that echo appends to its output.)
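The conversion from the dumped bytes to the decimal value can also be done in the shell. A minimal sketch, assuming a shell whose printf accepts 0x-prefixed hexadecimal arguments (bash does); the variable name hex is just for illustration:

```shell
# Dump the two UTF-8 bytes of ø (written as octal escapes) as one hex string.
# -An suppresses the offset column, tr removes the spaces and the newline.
hex=$(printf '\303\270' | od -An -t x1 | tr -d ' \n')
echo "$hex"

# Interpret the hex string as a number and print it in decimal.
printf '%d\n' "0x$hex"
```

This prints c3b8 followed by 50104, the decimal value mentioned above.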