diff options
author | Steven G. Johnson <stevenj@mit.edu> | 2014-07-15 15:32:53 -0400 |
---|---|---|
committer | Steven G. Johnson <stevenj@mit.edu> | 2014-07-15 15:33:52 -0400 |
commit | c0f2b512a055c667cb751ef4526ea744f2428826 (patch) | |
tree | c3f1e8ec8c857026c5fbffa9305596a62388be11 /README | |
parent | ab9520d18845248ef79ee98e8d671f8eecfec288 (diff) | |
download | libutf8proc-c0f2b512a055c667cb751ef4526ea744f2428826.tar.gz libutf8proc-c0f2b512a055c667cb751ef4526ea744f2428826.tar.bz2 |
rm ruby and pgsql plugins: keep libutf8proc repo focused exclusively on the C library
Diffstat (limited to 'README')
-rw-r--r-- | README | 53 |
1 files changed, 0 insertions, 53 deletions
@@ -45,59 +45,6 @@ The documentation for the C library is found in the utf8proc.h header file. strings, unless you want to allocate memory yourself. -*** RUBY API *** - -The ruby library adds the methods "utf8map" and "utf8map!" to the String -class, and the method "utf8" to the Integer class. - -The String#utf8map method does the same as the "utf8proc_map" C function. -Options for the mapping procedure are passed as symbols, i.e: -"Hello".utf8map(:casefold) => "hello" - -The descriptions of all options are found in the C header file -"utf8proc.h". Please notice that the according symbols in ruby are all -lowercase. - -String#utf8map! is the destructive function in the meaning that the string -is replaced by the result. - -There are shortcuts for the 4 normalization forms specified by Unicode: -String#utf8nfd, String#utf8nfd!, -String#utf8nfc, String#utf8nfc!, -String#utf8nfkd, String#utf8nfkd!, -String#utf8nfkc, String#utf8nfkc! - -The method Integer#utf8 returns a UTF-8 string, which is containing the -unicode char given by the code point. -0x000A.utf8 => "\n" -0x2028.utf8 => "\342\200\250" - - -*** POSTGRESQL API *** - -For PostgreSQL there are two SQL functions supplied named "unifold" and -"unistrip". These functions function can be used to prepare index fields in -order to be folded in a way where string-comparisons make more sense, e.g. -where "bathtub" == "bath<soft hyphen>tub" -or "Hello World" == "hello world". - -CREATE TABLE people ( - id serial8 primary key, - name text, - CHECK (unifold(name) NOTNULL) -); -CREATE INDEX name_idx ON people (unifold(name)); -SELECT * FROM people WHERE unifold(name) = unifold('John Doe'); - -The function "unistrip" removes character marks like accents or diaeresis, -while "unifold" keeps then. - -NOTICE: The outputs of the function can change between releases, as - utf8proc does not follow a versioning stability policy. You have to - rebuild your database indicies, if you upgrade to a newer version - of utf8proc. - - *** TODO *** - detect stable code points and process segments independently in order to |