Please read the LICENSE file, which ships with this software.


*** QUICK START ***

To compile the C library, run "make c-library". To compile the ruby
library, run "make ruby-library". To compile the PostgreSQL extension, run
"make pgsql-library".

For ruby you can also create a gem-file by calling "make ruby-gem".

"make all" can be used to build everything, but both ruby and PostgreSQL
installations are required in this case.


*** GENERAL INFORMATION ***

The C library is found in this directory after successful compilation and
is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
subdirectory "ruby/". If you chose to create a gem-file it is placed in the
"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
and resides in the "pgsql/" directory.

Both the ruby library and the PostgreSQL extension are built as stand-alone
libraries and therefore do not depend on the dynamic version of the
C library, but this behaviour might change in future releases.

The supported Unicode version is 5.0.0.
Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
      version 5.0.0 was not available at the time of implementation.

For Unicode normalizations, the following options have to be used:
Normalization Form C:  STABLE, COMPOSE
Normalization Form D:  STABLE, DECOMPOSE
Normalization Form KC: STABLE, COMPOSE, COMPAT
Normalization Form KD: STABLE, DECOMPOSE, COMPAT


*** C LIBRARY ***

The documentation for the C library is found in the utf8proc.h header file.
"utf8proc_map" is most likely the function you will be using for mapping
UTF-8 strings, unless you want to allocate memory yourself.


*** TODO ***

- detect stable code points and process segments independently in order to
  save memory
- do a quick check before normalizing strings to optimize speed
- support stream processing


*** CONTACT ***

If you find any bugs or experience difficulties in compiling this software,
please contact us:

Project page: http://www.public-software-group.org/utf8proc