Introduction: ============= This file documents an approximate correlation between the data files provided in the !Unicode distribution and the encoding headers in GNU libiconv 1.9.1. Those with '?' in the iconv column either are not represented in iconv or I've missed the relevant header file ;) A number of encodings are present in the iconv distribution but not in !Unicode. These are documented at the end of this file. Changelog: ========== v 0.01 (09-Sep-2004) ~~~~~~~~~~~~~~~~~~~~ Initial Incarnation v 0.02 (11-Sep-2004) ~~~~~~~~~~~~~~~~~~~~ Documented additional encodings supported by the Iconv module. Corrected list of !Unicode deficiencies. !Unicode->iconv: ================ Unicode: iconv: notes: Acorn.Latin1 riscos1.h Apple.CentEuro mac_centraleurope.h Apple.Cyrillic mac_cyrillic.h Apple.Roman mac_roman.h Apple.Ukrainian mac_ukraine.h BigFive big5.h ISO2022.C0.40[ISO646] ? ISO2022.C1.43[IS6429] ? ISO2022.G94.40[646old] iso646_cn.h ISO2022.G94.41[646-GB] ? ISO2022.G94.42[646IRV] ? ISO2022.G94.43[FinSwe] ? ISO2022.G94.47[646-SE] ? ISO2022.G94.48[646-SE] ? ISO2022.G94.49[JS201K] jisx0201.h top of JIS range ISO2022.G94.4A[JS201R] jisx0201.h iso646_jp.h bottom of JIS range ISO2022.G94.4B[646-DE] ? ISO2022.G94.4C[646-PT] ? ISO2022.G94.54[GB1988] ? ISO2022.G94.56[Teltxt] ? ISO2022.G94.59[646-IT] ? ISO2022.G94.5A[646-ES] ? ISO2022.G94.60[646-NO] ? ISO2022.G94.66[646-FR] ? ISO2022.G94.69[646-HU] ? ISO2022.G94.6B[Arabic] ? ISO2022.G94.6C[IS6397] ? ISO2022.G94.7A[SerbCr] ? ISO2022.G94x94.40[JS6226] ? ISO2022.G94x94.41[GB2312] gb2312.h ISO2022.G94x94.42[JIS208] jis0x208.h ISO2022.G94x94.43[KS1001] ksc5601.h ISO2022.G94x94.44[JIS212] jis0x212.h ISO2022.G94x94.47[CNS1] cns11643_1.h the tables differ ISO2022.G94x94.48[CNS2] cns11643_2.h ISO2022.G94x94.49[CNS3] cns11643_3.h ISO2022.G94x94.4A[CNS4] cns11643_4.h ISO2022.G94x94.4B[CNS5] cns11643_5.h ISO2022.G94x94.4C[CNS6] cns11643_6.h ISO2022.G94x94.4D[CNS7] cns11643_7.h ISO2022.G96.41[Lat1] iso8859_1.h ISO2022.G96.42[Lat2] iso8859_2.h ISO2022.G96.43[Lat3] iso8859_3.h ISO2022.G96.44[Lat4] iso8859_4.h ISO2022.G96.46[Greek] ? ISO2022.G96.47[Arabic] iso8859_6.h ISO-8859-6 ignored ISO2022.G96.48[Hebrew] ? ISO2022.G96.4C[Cyrill] ? ISO2022.G96.4D[Lat5] iso8859_5.h ISO2022.G96.50[LatSup] ? ISO2022.G96.52[IS6397] ? ISO2022.G96.54[Thai] tis620.h ISO2022.G96.56[Lat6] iso8859_6.h ISO2022.G96.58[L6Sami] ? ISO2022.G96.59[Lat7] iso8859_7.h ISO2022.G96.5C[Welsh] ? ISO2022.G96.5D[Sami] ? ISO2022.G96.5E[Hebrew] ? ISO2022.G96.5F[Lat8] iso8859_8.h ISO2022.G96.62[Lat9] iso8859_9.h KOI8-R koi8_r.h Microsoft.CP1250 cp1250.h Microsoft.CP1251 cp1251.h Microsoft.CP1252 cp1252.h Microsoft.CP1254 cp1254.h Microsoft.CP866 cp866.h Microsoft.CP932 cp932.h cp932ext.h iconv->!Unicode: ================ Iconv has the following encodings, which are not present in !Unicode. Providing a suitable data file for !Unicode is trivial. Whether UnicodeLib will then act upon the addition of these is unknown. This list is ordered as per libiconv's NOTES file. European & Semitic languages: ISO-8859-16 (iso8859_16.h) KOI8-{U,RU,T} (koi8_xx.h) CP125{3,5,6,7} (cp125n.h) CP850 (cp850.h) CP862 (cp862.h) Mac{Croatian,Romania,Greek,Turkish,Hebrew,Arabic} (mac_foo.h) Japanese: None afaikt. Simplified Chinese: GB18030 (gb18030.h, gb18030ext.h) HZ-GB-2312 (hz.h) Traditional Chinese: CP950 (cp950.h) BIG5-HKSCS (big5hkscs.h) Korean: CP949 (cp949.h) Armenian: ARMSCII-8 (armscii_8.h) Georgian: Georgian-Academy, Georgian-PS (georgian_academy.h, georgian_ps.h) Thai: CP874 (cp874.h) MacThai (mac_thai.h) Laotian: MuleLao-1, CP1133 (mulelao.h, cp1133.h) Vietnamese: VISCII, TCVN (viscii.h, tcvn.h) CP1258 (cp1258.h) Unicode: BE/LE variants of normal encodings. I assume UnicodeLib handles these, but can't be sure. C99 / JAVA - well, yes. Iconv Module: ============= The iconv module is effectively a thin veneer around UnicodeLib. However, 8bit encodings are implemented within the module rather than using the support in UnicodeLib. The rationale for this is simply that, although UnicodeLib will understand (and act upon - reportedly...) additions to the ISO2022 Unicode resource, other encodings are ignored. As the vast majority of outstanding encodings fall into this category, and the code is fairly simple, it made sense to implement it within the module. With use of the iconv module, the list of outstanding encodings is reduced to: CP1255 (requires state-based transcoding) GB18030 (not 8bit - reportedly a requirement of PRC) HZ-GB-2312 (not 8bit - supported by IE4) CP950 (not 8bit - a (MS) variant of Big5) BIG5-HKSCS (not 8bit - again, a Big5 variant) CP949 (not 8bit) ARMSCII-8 (easily implemented, if required) VISCII (easily implemented, if required) CP1258, TCVN (requires state-based transcoding) Additionally, the rest of the CodePage encodings implemented in iconv but not listed above (due to omissions from the iconv documentation) are implemented by the iconv module.