author: Steven G. Johnson <stevenj@mit.edu> 2014-07-15 16:04:36 -0400
committer: Steven G. Johnson <stevenj@mit.edu> 2014-07-15 16:04:36 -0400
commit: 0d7224a6d8a77e5eebf5e18bded742490f3b20fd
tree: 18fea4ba0497978163668ff85be94d3a717a9fa8 /README.md
parent: c0f2b512a055c667cb751ef4526ea744f2428826
markdown and other cosmetic updates
Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 68
1 files changed, 68 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..e0efefc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,68 @@
+== libutf8proc ==
+
+The [libutf8proc package](https://github.com/JuliaLang/libutf8proc) is
+a lightly updated fork of the [utf8proc
+library](http://www.public-software-group.org/utf8proc) from Jan
+Behrens and the rest of the [Public Software
+Group](http://www.public-software-group.org/), who deserve *nearly all
+of the credit* for this package: a small, clean C library that
+provides Unicode normalization, case-folding, and other operations for
+data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).
+
+The reason for this fork is that utf8proc is used for basic Unicode
+support in the [Julia language](http://julialang.org/) and the Julia
+developers wanted Unicode 7 support and other features, but the
+Public Software Group currently does not seem to have the resources
+necessary to update utf8proc. We hope that the fork can be merged
+back into the mainline utf8proc package before too long.
+
+(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
+We removed those from libutf8proc in order to focus exclusively on the C
+library for the time being. We will strive to keep API changes to a minimum,
+so libutf8proc should still be usable with the old plug-in code.)
+
+Like utf8proc, the libutf8proc package is licensed under the
+free/open-source [MIT "expat"
+license](http://opensource.org/licenses/MIT) (plus certain Unicode
+data governed by the similarly permissive [Unicode data
+license](http://www.unicode.org/copyright.html#Exhibit1)); please see
+the included `LICENSE.md` file for more detailed information.
+
+=== Quick Start ===
+
+To compile the C library, run `make`.
+
+=== General Information ===
+
+The C library is found in this directory after successful compilation
+and is named `libutf8proc.a` (for the static library) and
+`libutf8proc.so` (for the dynamic library).
+
+The Unicode version being supported is 5.0.0.
+*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
+version 5.0.0 had not been available at the time of implementation.
+
+For Unicode normalizations, the following options are used:
+
+* Normalization Form C: `STABLE`, `COMPOSE`
+* Normalization Form D: `STABLE`, `DECOMPOSE`
+* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
+* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`
+
+=== C Library ===
+
+The documentation for the C library is found in the `utf8proc.h` header file.
+`utf8proc_map` is the function you will most likely be using for mapping UTF-8
+strings, unless you want to allocate memory yourself.
+
+=== To Do ===
+
+* detect stable code points and process segments independently in order to save memory
+* do a quick check before normalizing strings to optimize speed
+* support stream processing
+
+=== Contact ===
+
+Bug reports, feature requests, and other queries can be filed at
+the [libutf8proc page on GitHub](https://github.com/JuliaLang/libutf8proc).
+
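Note on the normalization forms listed in the README above: the four forms (C, D, KC, KD) differ in whether characters are composed or decomposed and whether compatibility mappings are applied. The sketch below illustrates those differences using Python's standard `unicodedata` module rather than libutf8proc itself, so it shows the behavior defined by the Unicode standard, not this library's API. The sample string is an assumption chosen for illustration.

```python
import unicodedata

# Sample input (chosen for illustration): "e" followed by a combining acute
# accent (a decomposed "é"), then the "fi" ligature U+FB01, which only the
# compatibility forms (KC/KD) expand into the two letters "f" and "i".
text = "e\u0301\ufb01"

# Apply each of the four standard normalization forms to the same input.
forms = {
    name: unicodedata.normalize(name, text)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

for name, value in forms.items():
    # Print code points so composed vs. decomposed output is visible.
    print(name, [f"U+{ord(ch):04X}" for ch in value])
```

Forms C and KC compose the accent into a single code point, forms D and KD leave it decomposed, and only KC and KD replace the ligature, which corresponds to the `COMPOSE`/`DECOMPOSE` and `COMPAT` options in the README's list.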