libutf8proc.git - UTF8 Processing library (import)

	Commit message	Author	Date	Files	Lines (-removed/+added)
*	charwidth=1 for soft hyphen and unassigned codepoints (#135)	Steven G. Johnson	2018-07-24	1	-1893/+1893
	* use width=1 for soft hyphen and for unassigned/PUA codepoints
	* don't count unassigned codepoints when comparing with system wcwidth
	* more tests
	* indentation fixes
	* NEWS for 135
	* remove special-casing for Arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not
	* regenerate
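A minimal C sketch of the width policy in the commit above. It assumes that a width value from some lookup table and an `assigned` flag (general category other than Cn) are already known for each codepoint. `adjusted_width` and `is_private_use` are hypothetical helpers, not part of utf8proc's API:

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the three Private Use Area ranges defined by Unicode. */
static bool is_private_use(int32_t c) {
    return (c >= 0xE000 && c <= 0xF8FF) ||
           (c >= 0xF0000 && c <= 0xFFFFD) ||
           (c >= 0x100000 && c <= 0x10FFFD);
}

/* Hypothetical post-processing step. table_width is whatever a width
   table reports for c, and assigned is false for general category Cn.
   Soft hyphen (U+00AD), PUA and unassigned codepoints are forced to
   width 1; everything else keeps the table value. */
static int adjusted_width(int32_t c, int table_width, bool assigned) {
    if (c == 0x00AD || is_private_use(c) || !assigned)
        return 1;
    return table_width;
}
```

For example, `adjusted_width(0x00AD, 0, true)` yields 1 for the soft hyphen, while a combining mark that the table reports as width 0 keeps that width.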
*	uppercase mapping ß (U+00df) to ẞ (U+1E9E) (#134)	Steven G. Johnson	2018-05-02	1	-1221/+1221
	* uppercase(0x00df) = 0x1e9e
	* tests for titlecase and U+00DF uppercase
	* NEWS, another test
*	Case folding fixes (#133)	Steven G. Johnson	2018-05-02	1	-4177/+4177
	* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.
	* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
	* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.
	* Document the changes to UTF8PROC_IGNORE in header.
	* Add NFKC_CF helper function with documentation.
	* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test
	* success message
	* test that IGNORE does not strip NA
	* data update
	* NFKC_Casefold shouldn't strip NA
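The second item keeps only the C and F statuses from CaseFolding.txt, whose data lines have the form `code; status; mapping; # name`. A minimal C sketch of that filter follows. The project's tables are generated by the Ruby script data_generator.rb, so this is an illustration rather than its code, and `keep_folding_line` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if a CaseFolding.txt line should be kept, i.e. it is a
   data line whose status field is C (common) or F (full). S (simple)
   and T (Turkic) entries, comments and blank lines are rejected. */
static bool keep_folding_line(const char *line) {
    unsigned int code;
    char status;
    /* Data lines look like: "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S" */
    if (sscanf(line, "%x; %c;", &code, &status) != 2)
        return false;  /* comment, blank or malformed line */
    return status == 'C' || status == 'F';
}
```

Because F and S entries for the same codepoint are alternatives, dropping S leaves at most one folding per codepoint in the generated table.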
*	update to unicode 10 (#132)	Steven G. Johnson	2018-04-27	1	-1728/+1826
*	Ensure generated const data tables are hidden via "static" (#100)	Paul Smith	2017-02-19	1	-5/+5
*	update to unifont 9.0.04	Steven G. Johnson	2016-12-11	1	-7548/+7549
*	silence MSVC warning about conversion to uint8 (fix #86)	Steven G. Johnson	2016-11-30	1	-7545/+7544
*	update to Unifont 9 (for Unicode 9 charwidths) (#75)	Steven G. Johnson	2016-07-12	1	-505/+506
*	Smaller tables (#68)	Benito van der Zander	2016-07-12	1	-11669/+8966
	* convert sequences to UTF-16 (saves 25kb)
	* store sequence length in properties instead of using -1 termination (saves 10kb)
	* cache index for slightly faster data creation
	* store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time
	* change combination array data type to uint16 (saves 40kb)
	* merge 1st and 2nd comb index (saves 50kb)
	* kill empty prefix/suffix in combination array (saves 50kb)
	* there was no need to have a separate combination start array; it can be merged into a single array
	* some fixes
	* mark the table as const again
	* and regen
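The first two items store sequences as UTF-16 in a shared 16-bit array and record each sequence's length instead of ending it with -1. A minimal sketch of that layout, assuming lengths count codepoints; `utf16_put`, `utf16_get`, `demo_props` and `decode_seq` are hypothetical names, not utf8proc's actual table format:

```c
#include <stddef.h>
#include <stdint.h>

/* Append codepoint c to buf as UTF-16; returns the number of units written. */
static size_t utf16_put(uint16_t *buf, int32_t c) {
    if (c < 0x10000) {
        buf[0] = (uint16_t)c;
        return 1;
    }
    c -= 0x10000;
    buf[0] = (uint16_t)(0xD800 + (c >> 10));    /* high surrogate */
    buf[1] = (uint16_t)(0xDC00 + (c & 0x3FF));  /* low surrogate */
    return 2;
}

/* Decode the codepoint at seq[*pos] and advance *pos past it.
   Assumes well-formed data (a high surrogate is always followed by a low one). */
static int32_t utf16_get(const uint16_t *seq, size_t *pos) {
    uint16_t u = seq[(*pos)++];
    if (u >= 0xD800 && u <= 0xDBFF) {
        uint16_t lo = seq[(*pos)++];
        return 0x10000 + ((int32_t)(u - 0xD800) << 10) + (lo - 0xDC00);
    }
    return u;
}

/* Hypothetical property record: where a sequence starts in the shared
   array and how many codepoints it holds, so no terminator is stored. */
typedef struct {
    uint16_t seq_start;
    uint8_t  seq_len;
} demo_props;

/* Expand the sequence described by p into out; returns its length. */
static size_t decode_seq(const uint16_t *pool, demo_props p, int32_t *out) {
    size_t pos = p.seq_start;
    for (size_t i = 0; i < p.seq_len; i++)
        out[i] = utf16_get(pool, &pos);
    return p.seq_len;
}
```

Since most codepoints in decomposition and case-mapping sequences lie in the BMP, each usually costs 2 bytes instead of 4; only supplementary codepoints need a surrogate pair.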
*	Unicode 9 updates (#70)	Keno Fischer	2016-06-28	1	-11102/+11438
	* Updates for Unicode 9.0.0 TR29 changes
	  - New rules GB10 and GB12/13 are used, respectively, to combine emoji-ZWJ sequences and to force grapheme breaks every two RI codepoints. Unfortunately this breaks statelessness of grapheme-boundary determination. Deal with this by ignoring the problem in utf8proc_grapheme_break, and by hacking in a special case in decompose
	  - ZWJ moved to its own boundclass; update what is now GB9 accordingly.
	  - Add comments to indicate which rule a given case implements
	  - The number of bound classes now exceeds 4 bits; expand to 8 and reorganize fields
	* Import Unicode 9 data
	* Update Grapheme break API to expose state override
	* Bump MAJOR version
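The GB12/13 item is why a stateless pairwise function is not enough: whether two adjacent regional indicators (RI) join depends on how many RIs precede them. A minimal sketch that tracks that count, covering only the RI rules and treating every other pair as a break; the function names are hypothetical and this is not utf8proc's grapheme-break API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Regional indicator symbols U+1F1E6..U+1F1FF. */
static bool is_regional_indicator(int32_t c) {
    return c >= 0x1F1E6 && c <= 0x1F1FF;
}

/* Hypothetical stateful check covering only the RI pairing rules;
   every other case is treated as a break. *ri_run holds the number of
   consecutive RIs seen immediately before c. */
static bool break_before(int32_t c, int *ri_run) {
    bool brk = true;
    if (is_regional_indicator(c)) {
        /* An odd run means c completes a flag pair: no break. */
        if (*ri_run % 2 == 1)
            brk = false;
        (*ri_run)++;
    } else {
        *ri_run = 0;
    }
    return brk;
}

/* Count clusters in s[0..n) under the simplified rules above. */
static size_t count_clusters(const int32_t *s, size_t n) {
    int ri_run = 0;
    size_t clusters = 0;
    for (size_t i = 0; i < n; i++)
        if (break_before(s[i], &ri_run))
            clusters++;
    return clusters;
}
```

Five consecutive RIs thus form two flag pairs plus a lone indicator, which `count_clusters` reports as 3 clusters.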
*	Reduce the size of the binary.	Michaël Meyer	2015-12-09	1	-6668/+6668
	Use integers instead of pointers in Unicode tables. Saves 226 kb / 716 kb in the compiled library.
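A minimal sketch of the pointer-to-index change, with hypothetical struct names and a toy pool. Each table entry holds a 16-bit offset into a shared array, with a sentinel standing in for a null pointer, so the field shrinks from pointer size (typically 8 bytes on 64-bit targets) to 2 bytes. This is not utf8proc's real table layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Shared pool of sequence data (contents are illustrative only). */
static const int32_t pool[] = {0x0041, 0x0301, 0x0073, 0x0073};

/* Pointer-based entry: one pointer-sized field. */
typedef struct {
    const int32_t *decomp;   /* NULL when there is no mapping */
} entry_ptr;

/* Index-based entry: a 16-bit offset into pool. */
typedef struct {
    uint16_t decomp_index;   /* NO_INDEX when there is no mapping */
} entry_idx;

#define NO_INDEX UINT16_MAX

/* Turn an index-based entry back into a pointer (or NULL). */
static const int32_t *resolve(entry_idx e) {
    return e.decomp_index == NO_INDEX ? NULL : &pool[e.decomp_index];
}
```

Multiplied over one entry per codepoint with several such fields each, the per-field saving accounts for the size reduction described above.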
*	Update Unicode data	Peter Colberg	2015-10-29	1	-2635/+2629
	Fixes Travis builds on Ubuntu 12.04 LTS with Ruby 1.9.3-p551.
*	Update Unicode data	Jiahao Chen	2015-06-29	1	-2728/+2766
*	Updated Unicode 8 data - now sorted internally by data generator	Jiahao Chen (陈家豪)	2015-06-26	1	-2770/+2761
*	Update Unicode data	Jiahao Chen	2015-06-26	1	-893/+896
*	fix #46 (make sure symbol-like codepoints have nonzero width even if they aren't in Unifont)	Steven G. Johnson	2015-06-24	1	-1803/+1715
*	Updated data file to Unicode 8.0.0	Jiahao Chen	2015-06-23	1	-7430/+7972
*	Prefix other C99 typedefs with utf8proc_	Tony Kelman	2015-04-06	1	-4/+4
*	fix #2: add charwidth function	Steven G. Johnson	2015-03-12	1	-9939/+10242
*	update graphemes for Unicode 7, add utf8proc_grapheme_break function	Steven G. Johnson	2014-12-12	1	-10315/+10549
*	Update utf8proc_data.c (generated by data_generator.rb)	Jiahao Chen	2014-07-18	1	-8830/+11182
*	import of utf8proc-v1.1.6 (tag: v1.1.6)	Steven G. Johnson	2014-07-15	1	-0/+13383