From 0d7224a6d8a77e5eebf5e18bded742490f3b20fd Mon Sep 17 00:00:00 2001 From: "Steven G. Johnson" Date: Tue, 15 Jul 2014 16:04:36 -0400 Subject: markdown and other cosmetic updates --- README.md | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 README.md (limited to 'README.md') diff --git a/README.md b/README.md new file mode 100644 index 0000000..e0efefc --- /dev/null +++ b/README.md @@ -0,0 +1,68 @@ +== libutf8proc == + +The [libutf8proc package](https://github.com/JuliaLang/libutf8proc) is +a lightly updated fork of the [utf8proc +library](http://www.public-software-group.org/utf8proc) from Jan +Behrens and the rest of the [Public Software +Group](http://www.public-software-group.org/), who deserve *nearly all +of the credit* for this package: a small, clean C library that +provides Unicode normalization, case-folding, and other operations for +data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8). + +The reason for this fork is that utf8proc is used for basic Unicode +support in the [Julia language](http://julialang.org/) and the Julia +developers wanted Unicode 7 support and other features, but the +Public Software Group currently does not seem to have the resources +necessary to update utf8proc. We hope that the fork can be merged +back into the mainline utf8proc package before too long. + +(The original utf8proc package also includes Ruby and PostgreSQL plug-ins. +We removed those from libutf8proc in order to focus exclusively on the C +library for the time being. We will strive to keep API changes to a minimum, +so libutf8proc should still be usable with the old plug-in code.) + +Like utf8proc, the libutf8proc package is licensed under the +free/open-source [MIT "expat" +license](http://opensource.org/licenses/MIT) (plus certain Unicode +data governed by the similarly permissive [Unicode data +license](http://www.unicode.org/copyright.html#Exhibit1)); please see +the included `LICENSE.md` file for more detailed information. + +=== Quick Start === + +For compilation of the C library run `make`. + +=== General Information === + +The C library is found in this directory after successful compilation +and is named `libutf8proc.a` (for the static library) and +`libutf8proc.so` (for the dynamic library). + +The Unicode version being supported is 5.0.0. +*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as +version 5.0.0 had not been available at the time of implementation. + +For Unicode normalizations, the following options are used: + +* Normalization Form C: `STABLE`, COMPOSE` +* Normalization Form D: `STABLE`, `DECOMPOSE` +* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT` +* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT` + +=== C Library === + +The documentation for the C library is found in the `utf8proc.h` header file. +`utf8proc_map` is function you will most likely be using for mapping UTF-8 +strings, unless you want to allocate memory yourself. + +=== To Do === + +* detect stable code points and process segments independently in order to save memory +* do a quick check before normalizing strings to optimize speed +* support stream processing + +=== Contact === + +Bug reports, feature requests, and other queries can be filed at +the [libutf8proc page on Github](https://github.com/JuliaLang/libutf8proc). + -- cgit v1.2.3