| Commit message (Collapse) | Author | Age | Files | Lines |
|\ |
|
| | |
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* use width=1 for soft hyphen and for unassigned/PUA codepoints
* don't count unassigned codepoints when comparing with system wcwidth
* more tests
* indentation fixes
* NEWS for 135
* remove special-casing for arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not
* regenerate
|
|
|
|
|
|
|
|
| |
* uppercase(0x00df) = 0x1e9e
* tests for titlecase and u+00df uppercase
* NEWS, another test
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.
* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.
* Document the changes to UTF8PROC_IGNORE in header.
* Add NFKC_CF helper function with documentation.
* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test
* success message
* test that IGNORE does not strip NA
* data update
* NFKC_Casefold shouldn't strip NA
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* convert sequences to utf-16 (saves 25kb)
* store sequence length in properties instead using -1 termination (saves 10kb)
* cache index for slightly faster data creation
* store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time
* change combination array data type to uint16 (saves 40kb)
* merge 1st and 2nd comb index (saves 50kb)
* kill empty prefix/suffix in combination array (saves 50kb)
* there was no need to have a separate combination start array, it can be merged in a single array
* some fixes
* mark the table as const again
* and regen
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Updates for Unicode 9.0.0 TR29 Changes
- New rules GB10/(12/13) are used to combine emoji-zwj sequences/
(force grapheme breaks every two RI codepoints). Unfortunately this
breaks statelessness of grapheme-boundary determination. Deal with
this by ignoring the problem in utf8proc_grapheme_break, and by
hacking in a special case in decompose
- ZWJ moved to its own boundclass, update what is now GB9 accordingly.
- Add comments to indicate which rule a given case implements
- The Number of bound classes Now exceeds 4 bits, expand to 8 and
reorganize fields
* Import Unicode 9 data
* Update Grapheme break API to expose state override
* Bump MAJOR version
|
|
|
|
|
| |
Use integers instead of pointers in Unicode tables. Saves 226 kb / 716 kb in the
compiled library.
|
| |
|
| |
|
| |
|
|
|