authorSteven G. Johnson <>2014-07-15 16:04:36 -0400
committerSteven G. Johnson <>2014-07-15 16:04:36 -0400
commit0d7224a6d8a77e5eebf5e18bded742490f3b20fd (patch)
tree18fea4ba0497978163668ff85be94d3a717a9fa8 /
parentc0f2b512a055c667cb751ef4526ea744f2428826 (diff)
markdown and other cosmetic updates
Diffstat (limited to '')
1 files changed, 68 insertions, 0 deletions
diff --git a/ b/
new file mode 100644
index 0000000..e0efefc
--- /dev/null
+++ b/
@@ -0,0 +1,68 @@
+== libutf8proc ==
+The [libutf8proc package]( is
+a lightly updated fork of the [utf8proc
+library]( from Jan
+Behrens and the rest of the [Public Software
+Group](, who deserve *nearly all
+of the credit* for this package: a small, clean C library that
+provides Unicode normalization, case-folding, and other operations for
+data in the [UTF-8 encoding](
+The reason for this fork is that utf8proc is used for basic Unicode
+support in the [Julia language]( and the Julia
+developers wanted Unicode 7 support and other features, but the
+Public Software Group currently does not seem to have the resources
+necessary to update utf8proc. We hope that the fork can be merged
+back into the mainline utf8proc package before too long.
+(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
+We removed those from libutf8proc in order to focus exclusively on the C
+library for the time being. We will strive to keep API changes to a minimum,
+so libutf8proc should still be usable with the old plug-in code.)
+Like utf8proc, the libutf8proc package is licensed under the
+free/open-source [MIT "expat"
+license]( (plus certain Unicode
+data governed by the similarly permissive [Unicode data
+license](; please see
+the included `` file for more detailed information.
+=== Quick Start ===
+To compile the C library, run `make`.
+=== General Information ===
+The C library is found in this directory after successful compilation
+and is named `libutf8proc.a` (for the static library) and
+`` (for the dynamic library).
+The supported Unicode version is 5.0.0.
+*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
+version 5.0.0 was not yet available at the time of implementation.
+For Unicode normalizations, the following options are used:
+* Normalization Form C: `STABLE`, `COMPOSE`
+* Normalization Form D: `STABLE`, `DECOMPOSE`
+* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
+* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`
+=== C Library ===
+The documentation for the C library is found in the `utf8proc.h` header file.
+`utf8proc_map` is the function you will most likely use for mapping UTF-8
+strings, unless you want to allocate memory yourself.
+=== To Do ===
+* detect stable code points and process segments independently in order to save memory
+* do a quick check before normalizing strings to optimize speed
+* support stream processing
+=== Contact ===
+Bug reports, feature requests, and other queries can be filed at
+the [libutf8proc page on GitHub](
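
The option combinations that the README lists for the four normalization forms can be sketched in C as a small lookup. This is only an illustration, not part of the commit: the `OPT_*` names, their bit values, and the `options_for_form` helper are hypothetical stand-ins. The library's own option constants are declared in `utf8proc.h` and should be used when calling `utf8proc_map`.

```c
#include <string.h>

/* Hypothetical flag values for illustration only; the real option
 * constants are declared in utf8proc.h and may use different bits. */
enum {
    OPT_STABLE    = 1 << 0,
    OPT_COMPAT    = 1 << 1,
    OPT_COMPOSE   = 1 << 2,
    OPT_DECOMPOSE = 1 << 3
};

/* Return the option set for a normalization form name ("NFC", "NFD",
 * "NFKC", "NFKD"), following the README's list, or -1 if the name is
 * not recognized. */
static int options_for_form(const char *form)
{
    if (strcmp(form, "NFC") == 0)
        return OPT_STABLE | OPT_COMPOSE;
    if (strcmp(form, "NFD") == 0)
        return OPT_STABLE | OPT_DECOMPOSE;
    if (strcmp(form, "NFKC") == 0)
        return OPT_STABLE | OPT_COMPOSE | OPT_COMPAT;
    if (strcmp(form, "NFKD") == 0)
        return OPT_STABLE | OPT_DECOMPOSE | OPT_COMPAT;
    return -1;
}
```

Every form includes the stable flag, the K forms add the compatibility flag, and composition and decomposition are never combined. A caller would pass the equivalent combination of the library's own constants as the options argument when mapping a string.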