path: root/utf8proc.h
Commit message | Author | Age | Files | Lines (-/+)
* add utf8proc_unicode_version (#151) | Steven G. Johnson | 2019-03-30 | 1 | -0/+5
|
* doc clarification (closing #110) | Steven G. Johnson | 2019-03-30 | 1 | -1/+2
|
* doc fixes, don't export stdint and limits.h values UINT16_MAX and SSIZE_MAX | Steven G. Johnson | 2018-07-24 | 1 | -9/+3
|
* Merge branch 'master' of https://github.com/JuliaLang/utf8proc | Steven G. Johnson | 2018-07-24 | 1 | -0/+8
|\
| * update data and algorithms for Unicode 11 (#140) | Steven G. Johnson | 2018-07-24 | 1 | -0/+8
| |
* | copyright year updates | Steven G. Johnson | 2018-07-24 | 1 | -1/+1
|/
* NEWS for upcoming 2.2 release, version bump | Steven G. Johnson | 2018-05-02 | 1 | -2/+2
|
* Case folding fixes (#133) | Steven G. Johnson | 2018-05-02 | 1 | -2/+12
|     * Fixes allowing for “Full” folding and NFKC_CaseFold compliance.
|     * Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple), since F and S are specified to be mutually exclusive.
|     * Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065), which are specified as being discarded by NFKC_CF.
|     * Document the changes to UTF8PROC_IGNORE in header.
|     * Add NFKC_CF helper function with documentation.
|     * restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test
|     * success message
|     * test that IGNORE does not strip NA
|     * data update
|     * NFKC_Casefold shouldn't strip NA
* Static library support improvements (#123) | past-due | 2018-04-29 | 1 | -8/+12
|     * `#define UTF8PROC_STATIC` to disable DLLEXPORT
|     * [CMake] Automatically define UTF8PROC_STATIC if BUILD_SHARED_LIBS is off
|     * [Makefile] Support additional UTF8PROC_DEFINES, which can be used to specify flags like `-DUTF8PROC_STATIC`
* version bump to 2.1.1 (#131) [v2.1.1] | Steven G. Johnson | 2018-04-27 | 1 | -1/+1
|
* Update documentation to reflect Unicode 9.0.0. (#107) | Christopher Baker | 2017-06-08 | 1 | -1/+1
|     This makes the inline documentation match the README.
* removed inclusion of non-portable header file (#94) | Árpád Goretity | 2017-01-14 | 1 | -1/+1
|
* whoops | Steven G. Johnson | 2016-12-11 | 1 | -0/+1
|
* use ptrdiff_t rather than ssize_t, as ssize_t is non-standard (it is POSIX, not C) | Steven G. Johnson | 2016-12-11 | 1 | -1/+1
|
* use stdbool.h and inttypes.h in MSVC 2013 and later, and use more C99-compatible definitions of false and true earlier (fix #90) | Steven G. Johnson | 2016-12-11 | 1 | -2/+8
|
* new utf8proc_map_custom for hooking in user-defined custom mappings (#89) | Steven G. Johnson | 2016-11-30 | 1 | -3/+35
|     * new utf8proc_map_custom for hooking in user-defined custom mappings
|     * whoops, add test program
|     * NEWS, version bump for 2.1
|     * change test functions to static so that gcc doesn't complain about missing prototypes
* typo in docstrings | Steven G. Johnson | 2016-11-29 | 1 | -4/+3
|
* Tlsa/ucs4 normalize (#88) | Michael Drake | 2016-11-21 | 1 | -2/+30
|     * Split codepoint sequence normalisation out into separate function.
|
|       This creates utf8proc_normalize_utf32(), which takes and returns a UTF-32 string, applying the following options:
|         - UTF8PROC_NLF2LS
|         - UTF8PROC_NLF2PS
|         - UTF8PROC_NLF2LF
|         - UTF8PROC_STRIPCC
|         - UTF8PROC_COMPOSE
|         - UTF8PROC_STABLE
|
|       The utf8proc_reencode() function has been updated to call the new utf8proc_normalize_utf32().
|     * Update code documentation: utf8proc_reencode handles UTF8PROC_CHARBOUND.
* Change definition of UINT16_MAX macro (#84) | Jakub Vít | 2016-09-04 | 1 | -1/+1
|     Change UINT16_MAX from `~(utf8proc_uint16_t)0` to the fixed value `65535U` to prevent weird behaviour in complex expressions.
* NEWS and version numbers for 2.0.2 (#81) | Tony Kelman | 2016-07-27 | 1 | -4/+5
|     * Add NEWS.md items for #79 and #80
|     * Prepare version numbers for 2.0.2
|     * Also update API version to 2.0.2
* NEWS and version bump for 2.0.1 release, to come out shortly | Steven G. Johnson | 2016-07-13 | 1 | -1/+1
|
* Walk back ABI breaking changes (#76) | Keno Fischer | 2016-07-13 | 1 | -3/+10
|
* Smaller tables (#68) | Benito van der Zander | 2016-07-12 | 1 | -7/+13
|     * convert sequences to UTF-16 (saves 25kb)
|     * store sequence length in properties instead of using -1 termination (saves 10kb)
|     * cache index for slightly faster data creation
|     * store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time
|     * change combination array data type to uint16 (saves 40kb)
|     * merge 1st and 2nd comb index (saves 50kb)
|     * kill empty prefix/suffix in combination array (saves 50kb)
|     * there was no need to have a separate combination start array; it can be merged into a single array
|     * some fixes
|     * mark the table as const again
|     * and regen
* Unicode 9 updates (#70) | Keno Fischer | 2016-06-28 | 1 | -7/+25
|     * Updates for Unicode 9.0.0 TR29 changes
|       - New rules GB10/(12/13) are used to combine emoji-ZWJ sequences (and to force grapheme breaks every two RI codepoints). Unfortunately this breaks the statelessness of grapheme-boundary determination. Deal with this by ignoring the problem in utf8proc_grapheme_break, and by hacking in a special case in decompose.
|       - ZWJ moved to its own boundclass; update what is now GB9 accordingly.
|       - Add comments to indicate which rule a given case implements.
|       - The number of bound classes now exceeds 4 bits; expand to 8 and reorganize fields.
|     * Import Unicode 9 data
|     * Update grapheme break API to expose state override
|     * Bump MAJOR version
* Reduce the size of the binary. | Michaël Meyer | 2015-12-09 | 1 | -2/+6
|     Use integers instead of pointers in Unicode tables. Saves 226 kb / 716 kb in the compiled library.
* update Unicode version in header-file comment | Steven G. Johnson | 2015-11-01 | 1 | -1/+1
|
* update copyright statements to list recent contributors and year | Steven G. Johnson | 2015-11-01 | 1 | -0/+1
|
* bump API/ABI version to 1.3, add NEWS | Steven G. Johnson | 2015-05-29 | 1 | -1/+1
|
* add toupper/tolower functions (for JuliaLang/julia#11471) | Steven G. Johnson | 2015-05-29 | 1 | -0/+15
|
* Fix #34: handle 66 Unicode non-characters, also improve performance and surrogate handling | Scott Paul Jones | 2015-05-29 | 1 | -0/+5
|
* Prefix other C99 typedefs with utf8proc_ | Tony Kelman | 2015-04-06 | 1 | -32/+40
|
* Use a new typedef utf8proc_ssize_t to avoid define collisions | Tony Kelman | 2015-04-05 | 1 | -13/+14
|     with MSVC
* rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent conflicts with other header files that define DLLEXPORT | Steven G. Johnson | 2015-03-30 | 1 | -23/+23
|
* more documentation English and formatting cleanups | Steven G. Johnson | 2015-03-27 | 1 | -66/+64
|
* some documentation improvements | Steven G. Johnson | 2015-03-27 | 1 | -25/+27
|
* indentation consistency | Steven G. Johnson | 2015-03-27 | 1 | -1/+1
|
* put the API version as #defines in the header file (as discussed in #30) | Steven G. Johnson | 2015-03-27 | 1 | -1/+24
|
* mainpage dox tweaks | Steven G. Johnson | 2015-03-23 | 1 | -2/+4
|
* Fix #26: use doxygen for generating API docs | Jonas Fonseca | 2015-03-21 | 1 | -310/+415
|
* update NEWS for 1.2-dev | Steven G. Johnson | 2015-03-12 | 1 | -1/+2
|
* remove requirement that get_property and decompose_char argument be in range 0x0 to 0x10ffff | Steven G. Johnson | 2015-03-12 | 1 | -4/+0
|
* fix #2: add charwidth function | Steven G. Johnson | 2015-03-12 | 1 | -0/+16
|
* Minimal cmake build script | Tony Kelman | 2015-03-08 | 1 | -1/+1
|     move flags for MSVC
|     rename lump.txt to lump.md, add data/*.txt to .gitignore
* rename back to utf8proc now that we are taking over maintenance | Steven G. Johnson | 2015-03-06 | 1 | -0/+424
|
* utf8proc.h -> mojibake.h (closes #10) | Steven G. Johnson | 2014-07-18 | 1 | -387/+0
|
* C++/MSVC compatibility, indenting, for #4 | Steven G. Johnson | 2014-07-18 | 1 | -7/+9
|
* import of utf8proc-v1.1.6 [v1.1.6] | Steven G. Johnson | 2014-07-15 | 1 | -0/+385