From f8d8287cdbd7da9cd9392bcddf04860a10fa598e Mon Sep 17 00:00:00 2001
From: John Mark Bell
Date: Mon, 10 Nov 2008 18:43:09 +0000
Subject: Import Iconv sources

svn path=/trunk/iconv/; revision=5677
---
 doc/API        | 132 +++++++++++++++++++++++++++++++++++++
 doc/ChangeLog  |  71 ++++++++++++++++++++
 doc/Uni->iconv | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 408 insertions(+)
 create mode 100644 doc/API
 create mode 100644 doc/ChangeLog
 create mode 100644 doc/Uni->iconv

(limited to 'doc')

diff --git a/doc/API b/doc/API
new file mode 100644
index 0000000..13fa22f
--- /dev/null
+++ b/doc/API
@@ -0,0 +1,132 @@
+Iconv Module API
+================
+
+If using C, then you really should be using the libiconv stubs provided
+(or UnixLib, if appropriate). See the iconv.h header file for further
+documentation of these calls.
+
+Iconv_Open (&57540)
+-------------------
+
+   Create a conversion descriptor
+
+   On Entry: r0 -> string containing name of destination encoding (eg "UTF-8")
+             r1 -> string containing name of source encoding (eg "CP1252")
+
+   On Exit:  r0 = conversion descriptor
+             All others preserved
+
+   Either encoding name may have a number of parameters appended to it.
+   Parameters are separated by a pair of forward-slashes ("//").
+   Currently defined parameters are:
+
+     Parameter:     Destination:                     Source:
+
+     TRANSLIT       Transliterate unrepresentable    None
+                    output.
+
+   The conversion descriptor is an opaque value. The user should not,
+   therefore, assume anything about its meaning, nor modify it in any way.
+   Doing so is guaranteed to result in undefined behaviour.
+
+
+Iconv_Iconv (&57541)
+--------------------
+
+   This SWI is deprecated and Iconv_Convert should be used instead.
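The "//" parameter syntax described under Iconv_Open can be illustrated with a
small C sketch. This is not the module's own parser: `struct enc_spec` and
`parse_encoding` are hypothetical names, the 64-byte name limit is arbitrary,
and the case-sensitive match on "TRANSLIT" is an assumption the source does not
confirm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, not part of the Iconv module. */
struct enc_spec {
    char name[64];   /* base encoding name, NUL-terminated */
    bool translit;   /* true if a //TRANSLIT parameter was present */
};

/* Splits a string such as "UTF-8//TRANSLIT" into the base encoding name and
 * its parameters. Returns false if the base name is empty or too long. */
bool parse_encoding(const char *s, struct enc_spec *spec)
{
    const char *sep = strstr(s, "//");
    size_t len = sep ? (size_t)(sep - s) : strlen(s);

    if (len == 0 || len >= sizeof spec->name)
        return false;

    memcpy(spec->name, s, len);
    spec->name[len] = '\0';
    spec->translit = false;

    /* Walk each "//"-separated parameter; unknown ones are ignored here. */
    while (sep != NULL) {
        const char *param = sep + 2;
        const char *next = strstr(param, "//");
        size_t plen = next ? (size_t)(next - param) : strlen(param);

        /* Assumption: the parameter name is matched case-sensitively. */
        if (plen == 8 && strncmp(param, "TRANSLIT", 8) == 0)
            spec->translit = true;
        sep = next;
    }
    return true;
}
```

Calling `parse_encoding("UTF-8//TRANSLIT", &spec)` leaves "UTF-8" in
`spec.name` and sets `spec.translit`. Per the table above, TRANSLIT only has an
effect on the destination encoding.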
+
+
+Iconv_Close (&57542)
+--------------------
+
+   Destroy a conversion descriptor
+
+   On Entry: r0 = conversion descriptor to destroy
+
+   On Exit:  r0 = 0
+             All others preserved
+
+
+Iconv_Convert (&57543)
+----------------------
+
+   Convert a byte sequence to another encoding
+
+   On Entry: r0 = conversion descriptor returned by Iconv_Open
+             r1 -> input buffer (or NULL to reset encoding context)
+             r2 = length of buffer pointed to by r1
+             r3 -> output buffer
+             r4 = length of buffer pointed to by r3
+
+   On Exit:  r0 = number of non-reversible conversions performed (always 0)
+             r1 -> updated input buffer pointer (after last input read)
+             r2 = number of bytes remaining in input buffer
+             r3 -> updated output buffer pointer (i.e. end of output)
+             r4 = number of free bytes in the output buffer
+             All others preserved
+
+   Note that all strings should be NUL-terminated so, if calling from BASIC,
+   some terminating character munging may be needed.
+
+
+Errors:
+
+Should an error occur, the SWI will return with V set and r0 -> error buffer.
+Note that only the error number will be filled in and may be one of:
+
+  ICONV_NOMEM (&81b900)
+  ICONV_INVAL (&81b901)
+  ICONV_2BIG  (&81b902)
+  ICONV_ILSEQ (&81b903)
+
+These map directly to the corresponding C errno values.
+
+
+Iconv_CreateMenu (&57544)
+-------------------------
+
+   Create a menu data structure containing all available encodings.
+
+   On Entry: r0 = flags. All bits reserved, must be 0
+             r1 -> buffer, or 0 to read required length
+             r2 = length of buffer in r1
+             r3 -> currently selected encoding name, or 0 if none selected
+             r4 -> buffer for indirected data, or 0 to read length
+             r5 = length of buffer in r4
+
+   On Exit:  r2 = required size of buffer in r1 if r1 = 0 on entry,
+                  or length of data placed in buffer
+             r5 = required size of buffer in r4 if r4 = 0 on entry,
+                  or length of data placed in buffer
+
+   Menu titles are direct form text buffers. Menu entries are indirect text.
+   Entry text is stored in the buffer pointed to by R4 on entry to this call.
+
+
+Iconv_DecodeMenu (&57545)
+-------------------------
+
+   Decode a selection in a menu generated by Iconv_CreateMenu.
+   Places the corresponding encoding name in the result buffer.
+
+   On Entry: r0 = flags. All bits reserved, must be 0
+             r1 -> menu definition
+             r2 -> menu selections, as per Wimp_Poll
+             r3 -> buffer for result or 0 to read required length
+             r4 = buffer length
+
+   On Exit:  r4 = required size of buffer if r3 = 0 on entry,
+                  or length of data placed in buffer (0 if no selected
+                  encoding)
+
+   The menu selections block pointed to by r2 on entry should be based at
+   the root of the encodings menu structure (i.e. index 0 in the block
+   should correspond to the selection in the main encoding menu).
+
+   This call will update the selection status of the menu(s) appropriately.
+
+
+Example Code:
+=============
+
+Example code may be found in the IconvEg BASIC file.
diff --git a/doc/ChangeLog b/doc/ChangeLog
new file mode 100644
index 0000000..96f5924
--- /dev/null
+++ b/doc/ChangeLog
@@ -0,0 +1,71 @@
+Iconv Changelog
+===============
+
+0.01 10-Sep-2004
+----------------
+
+  - Initial version - unreleased.
+
+0.02 27-Sep-2004
+----------------
+
+  - Use allocated SWI & error chunks.
+  - Fix issues in 8bit encoding handling.
+  - First public release.
+
+0.03 22-Jan-2005
+----------------
+
+  - Add Iconv_Convert SWI with improved interface.
+  - Deprecate Iconv_Iconv SWI.
+  - Add encoding name alias handling.
+  - Bundle !Unicode resource.
+
+0.04 08-Apr-2005
+----------------
+
+  - Improve parameter checking.
+  - Fix potential memory leaks.
+  - Add encoding menu creation and selection handling.
+
+0.05 27-Jun-2005
+----------------
+
+  - Improve encoding alias support, using external data file.
+  - Add StubsG build for A9home users.
+
+0.06 05-Nov-2005
+----------------
+
+  - Modified menu creation API to store indirected text in a
+    user-provided buffer.
+    This change is backwards incompatible.
+
+0.07 11-Feb-2006
+----------------
+
+  - Corrected output values for E2BIG errors.
+  - Fixed input pointer update after successful conversion.
+
+0.08 11-Mar-2007
+----------------
+
+  - Tightened up parameter checking in various places.
+  - Improve aliases hash function.
+  - Make 8bit write function's return values match encoding_write
+    with encoding_WRITE_STRICT set.
+  - Fix bug in 8bit writing which resulted in the remaining buffer
+    size being reduced even if nothing was written.
+  - Improve support for endian-specific Unicode variants.
+  - Work around issue in UnicodeLib where remaining buffer size is
+    reduced if an attempt is made to write an unrepresentable character.
+  - Add rudimentary //TRANSLIT support - simply replaces with '?' for now.
+  - Make UnicodeLib handle raw ISO-8859-{1,2,9,10,15} and not attempt
+    ISO-6937-2-25 shift sequences.
+  - Remove StubsG build as A9home now has a C99 capable C library.
+  - Overhaul documentation.
+
+0.09 XX-XX-2008
+---------------
+
+  - Restructured source tree into cross-platform and RO-specific parts
+  -
diff --git a/doc/Uni->iconv b/doc/Uni->iconv
new file mode 100644
index 0000000..f10b6c7
--- /dev/null
+++ b/doc/Uni->iconv
@@ -0,0 +1,205 @@
+Introduction:
+=============
+
+This file documents an approximate correlation between the data files
+provided in the !Unicode distribution and the encoding headers in GNU
+libiconv 1.9.1.
+
+Those with '?' in the iconv column either are not represented in iconv
+or I've missed the relevant header file ;)
+
+A number of encodings are present in the iconv distribution but not
+in !Unicode. These are documented at the end of this file.
+
+Changelog:
+==========
+
+v 0.01 (09-Sep-2004)
+~~~~~~~~~~~~~~~~~~~~
+Initial Incarnation
+
+v 0.02 (11-Sep-2004)
+~~~~~~~~~~~~~~~~~~~~
+Documented additional encodings supported by the Iconv module.
+Corrected list of !Unicode deficiencies.
+
+
+!Unicode->iconv:
+================
+
+Unicode:                   iconv:                  notes:
+
+Acorn.Latin1               riscos1.h
+
+Apple.CentEuro             mac_centraleurope.h
+Apple.Cyrillic             mac_cyrillic.h
+Apple.Roman                mac_roman.h
+Apple.Ukrainian            mac_ukraine.h
+
+BigFive                    big5.h
+
+ISO2022.C0.40[ISO646]      ?
+
+ISO2022.C1.43[IS6429]      ?
+
+ISO2022.G94.40[646old]     iso646_cn.h
+ISO2022.G94.41[646-GB]     ?
+ISO2022.G94.42[646IRV]     ?
+ISO2022.G94.43[FinSwe]     ?
+ISO2022.G94.47[646-SE]     ?
+ISO2022.G94.48[646-SE]     ?
+ISO2022.G94.49[JS201K]     jisx0201.h              top of JIS range
+ISO2022.G94.4A[JS201R]     jisx0201.h iso646_jp.h  bottom of JIS range
+ISO2022.G94.4B[646-DE]     ?
+ISO2022.G94.4C[646-PT]     ?
+ISO2022.G94.54[GB1988]     ?
+ISO2022.G94.56[Teltxt]     ?
+ISO2022.G94.59[646-IT]     ?
+ISO2022.G94.5A[646-ES]     ?
+ISO2022.G94.60[646-NO]     ?
+ISO2022.G94.66[646-FR]     ?
+ISO2022.G94.69[646-HU]     ?
+ISO2022.G94.6B[Arabic]     ?
+ISO2022.G94.6C[IS6397]     ?
+ISO2022.G94.7A[SerbCr]     ?
+
+ISO2022.G94x94.40[JS6226]  ?
+ISO2022.G94x94.41[GB2312]  gb2312.h
+ISO2022.G94x94.42[JIS208]  jisx0208.h
+ISO2022.G94x94.43[KS1001]  ksc5601.h
+ISO2022.G94x94.44[JIS212]  jisx0212.h
+ISO2022.G94x94.47[CNS1]    cns11643_1.h            the tables differ
+ISO2022.G94x94.48[CNS2]    cns11643_2.h
+ISO2022.G94x94.49[CNS3]    cns11643_3.h
+ISO2022.G94x94.4A[CNS4]    cns11643_4.h
+ISO2022.G94x94.4B[CNS5]    cns11643_5.h
+ISO2022.G94x94.4C[CNS6]    cns11643_6.h
+ISO2022.G94x94.4D[CNS7]    cns11643_7.h
+
+ISO2022.G96.41[Lat1]       iso8859_1.h
+ISO2022.G96.42[Lat2]       iso8859_2.h
+ISO2022.G96.43[Lat3]       iso8859_3.h
+ISO2022.G96.44[Lat4]       iso8859_4.h
+ISO2022.G96.46[Greek]      ?
+ISO2022.G96.47[Arabic]     iso8859_6.h             ISO-8859-6 ignored
+ISO2022.G96.48[Hebrew]     ?
+ISO2022.G96.4C[Cyrill]     ?
+ISO2022.G96.4D[Lat5]       iso8859_9.h
+ISO2022.G96.50[LatSup]     ?
+ISO2022.G96.52[IS6397]     ?
+ISO2022.G96.54[Thai]       tis620.h
+ISO2022.G96.56[Lat6]       iso8859_10.h
+ISO2022.G96.58[L6Sami]     ?
+ISO2022.G96.59[Lat7]       iso8859_13.h
+ISO2022.G96.5C[Welsh]      ?
+ISO2022.G96.5D[Sami]       ?
+ISO2022.G96.5E[Hebrew]     ?
+ISO2022.G96.5F[Lat8]       iso8859_14.h
+ISO2022.G96.62[Lat9]       iso8859_15.h
+
+KOI8-R                     koi8_r.h
+
+Microsoft.CP1250           cp1250.h
+Microsoft.CP1251           cp1251.h
+Microsoft.CP1252           cp1252.h
+Microsoft.CP1254           cp1254.h
+Microsoft.CP866            cp866.h
+Microsoft.CP932            cp932.h cp932ext.h
+
+iconv->!Unicode:
+================
+
+Iconv has the following encodings, which are not present in !Unicode.
+Providing a suitable data file for !Unicode is trivial. Whether UnicodeLib
+will then act upon the addition of these is unknown.
+This list is ordered as per libiconv's NOTES file.
+
+European & Semitic languages:
+
+  ISO-8859-16 (iso8859_16.h)
+  KOI8-{U,RU,T} (koi8_xx.h)
+  CP125{3,5,6,7} (cp125n.h)
+  CP850 (cp850.h)
+  CP862 (cp862.h)
+  Mac{Croatian,Romania,Greek,Turkish,Hebrew,Arabic} (mac_foo.h)
+
+Japanese:
+
+  None afaict.
+
+Simplified Chinese:
+
+  GB18030 (gb18030.h, gb18030ext.h)
+  HZ-GB-2312 (hz.h)
+
+Traditional Chinese:
+
+  CP950 (cp950.h)
+  BIG5-HKSCS (big5hkscs.h)
+
+Korean:
+
+  CP949 (cp949.h)
+
+Armenian:
+
+  ARMSCII-8 (armscii_8.h)
+
+Georgian:
+
+  Georgian-Academy, Georgian-PS (georgian_academy.h, georgian_ps.h)
+
+Thai:
+
+  CP874 (cp874.h)
+  MacThai (mac_thai.h)
+
+Laotian:
+
+  MuleLao-1, CP1133 (mulelao.h, cp1133.h)
+
+Vietnamese:
+
+  VISCII, TCVN (viscii.h, tcvn.h)
+  CP1258 (cp1258.h)
+
+Unicode:
+
+  BE/LE variants of normal encodings. I assume UnicodeLib handles
+  these, but can't be sure.
+  C99 / JAVA - well, yes.
+
+
+Iconv Module:
+=============
+
+The iconv module is effectively a thin veneer around UnicodeLib. However,
+8bit encodings are implemented within the module rather than using the
+support in UnicodeLib. The rationale for this is simply that, although
+UnicodeLib will understand (and act upon - reportedly...) additions to
+the ISO2022 Unicode resource, other encodings are ignored. As the vast
+majority of outstanding encodings fall into this category, and the code
+is fairly simple, it made sense to implement it within the module.
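To give a feel for why table-driven 8bit conversion is "fairly simple", here is
a minimal C sketch of a CP1252 to UTF-8 converter. It is an illustration only,
not the module's code: the function name, the `CONV_*` return codes and the
choice to emit UTF-8 directly (rather than going through UnicodeLib) are all
assumptions. The five bytes CP1252 leaves unassigned are rejected here.

```c
#include <stddef.h>
#include <stdint.h>

/* CP1252 bytes 0x80-0x9F; 0 marks an unassigned byte. Every other CP1252
 * byte maps to the Unicode code point with the same value. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical return codes, loosely mirroring E2BIG and EILSEQ. */
enum { CONV_OK = 0, CONV_2BIG = -1, CONV_ILSEQ = -2 };

/* Converts CP1252 to UTF-8. The four pointer arguments are updated to
 * describe the unconsumed input and unused output, in the style of the
 * r1-r4 exit values documented for Iconv_Convert. */
int cp1252_to_utf8(const unsigned char **in, size_t *inlen,
                   unsigned char **out, size_t *outlen)
{
    while (*inlen > 0) {
        unsigned char b = **in;
        uint16_t ucs = (b >= 0x80 && b <= 0x9F) ? cp1252_high[b - 0x80] : b;
        size_t need = ucs < 0x80 ? 1 : ucs < 0x800 ? 2 : 3;

        if (ucs == 0 && b != 0)
            return CONV_ILSEQ;      /* unassigned byte in the source */
        if (*outlen < need)
            return CONV_2BIG;       /* output buffer exhausted */

        if (need == 1) {
            (*out)[0] = (unsigned char)ucs;
        } else if (need == 2) {
            (*out)[0] = (unsigned char)(0xC0 | (ucs >> 6));
            (*out)[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        } else {
            (*out)[0] = (unsigned char)(0xE0 | (ucs >> 12));
            (*out)[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
            (*out)[2] = (unsigned char)(0x80 | (ucs & 0x3F));
        }
        *out += need;
        *outlen -= need;
        *in += 1;
        *inlen -= 1;
    }
    return CONV_OK;
}
```

Because nothing is consumed or written for a character that fails, the caller
can enlarge the output buffer and resume from the updated pointers.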
+
+With use of the iconv module, the list of outstanding encodings is
+reduced to:
+
+  ISO-8859-16  (easily implemented, if required)
+  CP1255       (requires state-based transcoding)
+
+  GB18030      (not 8bit - reportedly a requirement of PRC)
+  HZ-GB-2312   (not 8bit - supported by IE4)
+
+  CP950        (not 8bit - a (MS) variant of Big5)
+  BIG5-HKSCS   (not 8bit - again, a Big5 variant)
+
+  CP949        (not 8bit)
+
+  ARMSCII-8    (easily implemented, if required)
+
+  VISCII       (easily implemented, if required)
+  CP1258, TCVN (requires state-based transcoding)
+
+Additionally, the rest of the CodePage encodings implemented in iconv
+but not listed above (due to omissions from the iconv documentation)
+are implemented by the iconv module.
\ No newline at end of file
--
cgit v1.2.3
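doc/API in this patch states that the module's four error numbers map directly
to C errno values. A caller wrapping the SWIs might translate them along these
lines. This is a hedged sketch: `iconv_error_to_errno` is a hypothetical helper,
and pairing each number with the like-named errno constant is an inference from
the names, not something the source spells out.

```c
#include <errno.h>

/* Error numbers as listed in doc/API. */
#define ICONV_NOMEM 0x81b900
#define ICONV_INVAL 0x81b901
#define ICONV_2BIG  0x81b902
#define ICONV_ILSEQ 0x81b903

/* Hypothetical helper: returns the errno value for a module error number,
 * or 0 if the number is not one of the four documented ones. */
int iconv_error_to_errno(unsigned int errnum)
{
    switch (errnum) {
    case ICONV_NOMEM: return ENOMEM;   /* out of memory */
    case ICONV_INVAL: return EINVAL;   /* invalid argument */
    case ICONV_2BIG:  return E2BIG;    /* output buffer too small */
    case ICONV_ILSEQ: return EILSEQ;   /* illegal input sequence */
    default:          return 0;
    }
}
```

Returning 0 for unknown numbers lets the caller fall back to reporting the raw
RISC OS error block instead of guessing.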