Case folding fixes (#133)

* Fixes allowing for “Full” folding and NFKC_CaseFold compliance.
* Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive.
* Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF.
* Document the changes to UTF8PROC_IGNORE in header.
* Add NFKC_CF helper function with documentation.
* restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test
* success message
* test that IGNORE does not strip NA
* data update
* NFKC_Casefold shouldn't strip NA
author: Steven G. Johnson <stevenj@mit.edu> 2018-05-02 08:15:02 -0400
committer: GitHub <noreply@github.com> 2018-05-02 08:15:02 -0400
commit: bdc8b9e4b2063e4b4563938d5077ee3b826cf342 (patch)
tree: b82ecf4a68d8b8841f4cb5aa4f903841f729bb47 /data/data_generator.rb
parent: 48949bd3ebd66bb94a40f4c3fcfb26dd4bf2be2b (diff)
1 file changed, 1 insertion, 1 deletion
diff --git a/data/data_generator.rb b/data/data_generator.rb
index 37a3780..fa09617 100644
--- a/data/data_generator.rb
+++ b/data/data_generator.rb
@@ -104,7 +104,7 @@ $excl_version = $excl_version.chomp.split("\n").collect { |e| e.hex }
 $case_folding_string = File.open("CaseFolding.txt", :encoding => 'utf-8').read
 $case_folding = {}
 $case_folding_string.chomp.split("\n").each do |line|
-  next unless line =~ /([0-9A-F]+); [CFS]; ([0-9A-F ]+);/i
+  next unless line =~ /([0-9A-F]+); [CF]; ([0-9A-F ]+);/i
   $case_folding[$1.hex] = $2.split(" ").collect { |e| e.hex }
 end
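
The effect of narrowing the status class from `[CFS]` to `[CF]` can be sketched on a few sample lines. This is an illustration only, not part of the commit: `build_folding`, `SAMPLE`, `OLD_PATTERN`, and `NEW_PATTERN` are hypothetical names, and the sample lines are hand-picked entries in CaseFolding.txt format. It assumes, as in the sample, that the S line for a codepoint follows its F line, so with the old pattern the later S entry overwrites the F entry in the hash.

```ruby
# Hand-picked sample lines in CaseFolding.txt format (not the full file).
SAMPLE = <<~EOS
  # comment line, skipped by the pattern
  0041; C; 0061; # LATIN CAPITAL LETTER A
  0049; T; 0131; # LATIN CAPITAL LETTER I
  1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
  1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
EOS

# Hypothetical helper mirroring the loop in data_generator.rb:
# maps each source codepoint to the array of codepoints it folds to.
def build_folding(text, pattern)
  table = {}
  text.chomp.split("\n").each do |line|
    next unless line =~ pattern
    table[$1.hex] = $2.split(" ").collect { |e| e.hex }
  end
  table
end

OLD_PATTERN = /([0-9A-F]+); [CFS]; ([0-9A-F ]+);/i  # before this commit
NEW_PATTERN = /([0-9A-F]+); [CF]; ([0-9A-F ]+);/i   # after this commit

old_table = build_folding(SAMPLE, OLD_PATTERN)
new_table = build_folding(SAMPLE, NEW_PATTERN)

# Print the folding recorded for U+1E9E under each pattern.
[["old", old_table], ["new", new_table]].each do |label, table|
  folded = table[0x1E9E].map { |cp| format("U+%04X", cp) }.join(" ")
  puts "#{label}: U+1E9E -> #{folded}"
end
```

In this sample, the old pattern leaves U+1E9E mapped to the single codepoint from its S line, while the new pattern keeps the two-codepoint F mapping. The T (Turkic) line is skipped by both patterns.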