# libmojibake

[libmojibake](https://github.com/JuliaLang/libmojibake) is a lightly updated fork of the [utf8proc library](http://www.public-software-group.org/utf8proc) from Jan Behrens and the rest of the [Public Software Group](http://www.public-software-group.org/), who deserve *nearly all of the credit* for this package: a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).

The reason for this fork is that `utf8proc` is used for basic Unicode support in the [Julia language](http://julialang.org/) and the Julia developers wanted Unicode 7 support and other features, but the Public Software Group currently does not seem to have the resources necessary to update `utf8proc`. We hope that the fork can be merged back into the mainline `utf8proc` package before too long.

(The original `utf8proc` package also includes Ruby and PostgreSQL plug-ins. We removed those from `libmojibake` in order to focus exclusively on the C library for the time being. We will strive to keep API changes to a minimum, so `libmojibake` should still be usable with the old plug-in code.)

Like `utf8proc`, the `libmojibake` package is licensed under the free/open-source [MIT "expat" license](http://opensource.org/licenses/MIT) (plus certain Unicode data governed by the similarly permissive [Unicode data license](http://www.unicode.org/copyright.html#Exhibit1)); please see the included `LICENSE.md` file for more detailed information.

## Quick Start ##

To compile the C library, run `make`.

## General Information ##

After successful compilation, the C library is found in this directory and is named `libmojibake.a` (for the static library) and `libmojibake.so` (for the dynamic library).

The supported Unicode version is 5.0.0.
*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0 was not available at the time of implementation.

For Unicode normalizations, the following options are used:

* Normalization Form C: `STABLE`, `COMPOSE`
* Normalization Form D: `STABLE`, `DECOMPOSE`
* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`

## C Library ##

The documentation for the C library is found in the `utf8proc.h` header file.
`utf8proc_map` is the function you will most likely be using for mapping UTF-8 strings, unless you want to allocate memory yourself.

## To Do ##

* detect stable code points and process segments independently in order to save memory
* do a quick check before normalizing strings to optimize speed
* support stream processing

## Contact ##

Bug reports, feature requests, and other queries can be filed at the [libmojibake page on GitHub](https://github.com/JuliaLang/libmojibake/issues).
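
## Example: A Conservative Quick Check ##

The To Do item about doing a quick check before normalizing can be illustrated with a very conservative test that a caller could already apply today. A string made up entirely of 7-bit ASCII bytes is unchanged by all four normalization forms (C, D, KC, KD), so normalization could be skipped for it. The sketch below is an assumption about what such a pre-check might look like; `is_ascii_only` is a hypothetical helper, not part of `libmojibake`, and it is not the full quick-check algorithm from the Unicode standard.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libmojibake).
 * Returns 1 if every byte in str[0..len) is 7-bit ASCII, 0 otherwise.
 * Pure-ASCII input is already in Normalization Forms C, D, KC and KD. */
int is_ascii_only(const unsigned char *str, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (str[i] >= 0x80) {
            return 0; /* non-ASCII byte found: normalization may change the string */
        }
    }
    return 1; /* nothing to normalize */
}
```

A caller might run this check first and only pass the string on to `utf8proc_map` when it returns 0. The check errs on the side of caution: many non-ASCII strings are already normalized, and it does not detect those.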