Tuesday, April 5, 2016

MS Word characters that have no representation in WE8ISO8859P1 and WE8ISO8859P15

During a globalization effort, I found the following interesting information about the difference between the WE8MSWIN1252 and WE8ISO8859P15 character sets:

* 27 codepoints (all in the hex range 80 to 9F) are NOT defined/used in WE8ISO8859P15 but are defined and used in WE8MSWIN1252

Note that the following WE8MSWIN1252 codepoints (hex values)

* 91 = U+2018 : LEFT SINGLE QUOTATION MARK
* 92 = U+2019 : RIGHT SINGLE QUOTATION MARK
* 93 = U+201C : LEFT DOUBLE QUOTATION MARK
* 94 = U+201D : RIGHT DOUBLE QUOTATION MARK

are the default quotation marks of Microsoft Word, so if you have data that comes from Microsoft Office products you *need* a WE8MSWIN1252 database character set.
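
The codepoint counts above can be checked outside the database. The following is only a rough sketch in Python, assuming that Python's `cp1252` and `iso8859_15` codecs are close enough stand-ins for Oracle's WE8MSWIN1252 and WE8ISO8859P15 definitions; the helper names are made up for this illustration.

```python
# Sketch only: Python's "cp1252" / "iso8859_15" codecs are used as stand-ins
# for Oracle's WE8MSWIN1252 / WE8ISO8859P15 (an assumption of this example).

def cp1252_extra_characters():
    """Map each byte in 0x80-0x9F to its Windows-1252 character, if assigned."""
    chars = {}
    for byte in range(0x80, 0xA0):
        try:
            chars[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # this position is not assigned in Windows-1252
    return chars


def has_iso8859_15_form(char):
    """True if the character can be encoded in ISO-8859-15 at any position."""
    try:
        char.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


extra = cp1252_extra_characters()
# Characters from that range that ISO-8859-15 cannot represent at all.
no_iso_form = {b: c for b, c in extra.items() if not has_iso8859_15_form(c)}

print("assigned in cp1252, range 80-9F:", len(extra))
for byte in (0x91, 0x92, 0x93, 0x94):
    char = extra[byte]
    print(f"{byte:02X} = U+{ord(char):04X}, "
          f"representable in ISO-8859-15: {has_iso8859_15_form(char)}")
```

Some characters in that range, the euro sign for instance, do exist in ISO-8859-15 but at a different byte position, which is why the sketch separates "assigned in cp1252" from "no ISO-8859-15 form at all".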

Make sure your clients' NLS_LANG is also correct: WE8ISO8859P15 is *not* a correct NLS_LANG character set for Windows clients.
See Note 179133.1, "The correct NLS_LANG in a Windows Environment".


A more common problem occurs in environments with English, West European or Latin American (French, Spanish, Portuguese, Dutch, Italian, ...) Windows clients: a lot of setups use an NLS_LANG set to WE8ISO8859P15 on the client side. For Windows systems this is not correct. Because the client then declares the same character set as the database, no conversion takes place, and in most cases WE8MSWIN1252 codes end up stored in the WE8ISO8859P15 database. The most commonly seen characters are the € symbol and these quotes: ‘’“” - these are the 1252 "smart quotes" used in Microsoft Office. In most fonts they look similar to the "normal" US7ASCII quote ", but they are different characters, which often causes confusion. The Courier New font, for example, shows the difference quite clearly.
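
To make the effect concrete, here is a small hedged sketch in Python, again assuming the `cp1252` and `iso8859_15` codecs approximate the two Oracle character sets. It imitates a Windows client whose bytes are stored unconverted, then scans the stored bytes for values in the hex range 80 to 9F, which is where the 1252-only codes live. The function name `find_cp1252_bytes` and the sample text are invented for the illustration; this is not how Oracle itself detects the problem.

```python
# Sketch only: imitates 1252 bytes landing unconverted in an ISO-8859-15 store.

def find_cp1252_bytes(raw):
    """Return (offset, byte, cp1252 character) for each byte in 0x80-0x9F."""
    hits = []
    for offset, byte in enumerate(raw):
        if 0x80 <= byte <= 0x9F:
            # errors="replace" yields U+FFFD for positions cp1252 leaves unassigned
            char = bytes([byte]).decode("cp1252", errors="replace")
            hits.append((offset, byte, char))
    return hits


# Text as typed on a Windows client: smart double quotes and a euro sign.
typed = "\u201cprice\u201d 5\u20ac"
stored = typed.encode("cp1252")        # the bytes the client actually sends
misread = stored.decode("iso8859_15")  # how an ISO-8859-15 reader sees them

for offset, byte, char in find_cp1252_bytes(stored):
    print(f"offset {offset}: byte {byte:02X} is U+{ord(char):04X} in cp1252, "
          f"but U+{ord(misread[offset]):04X} when read as ISO-8859-15")

# The euro sign is a different byte in each character set.
print("cp1252:", "\u20ac".encode("cp1252").hex(),
      "iso8859_15:", "\u20ac".encode("iso8859_15").hex())
```

Data that really is ISO-8859-15 text should normally contain no bytes in that range, so any hit from such a scan is a hint that 1252 codes were stored without conversion.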


So watch out for cut-n-paste errors based on MS Word documents! They often result in characters that have no representation under the most commonly used non-Unicode character set.
