librufl.git - RISC OS Unicode Font Library

	Commit message	Author	Age	Files	Lines
*	Modernize logging	John-Mark Bell	2022-06-02	2	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Ensure that only one copy of the storage for the log state is needed (previously, it injected these into every compilation unit, which now results in compiler warnings). Replace __PRETTY_FUNCTION__ with the standardised __func__. The output is the same in either case in our usage here (and testing all the way back to GCC 3.4.6 yields no difference in output). This also fixes compilation with GCC 10 (which warns about the use of __PRETTY_FUNCTION__ in -pedantic mode).
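The pattern this commit describes can be sketched as follows. The names `demo_log_enabled` and `DEMO_LOG` are hypothetical and are not RUfl's identifiers; the sketch only assumes that the log state is declared `extern` in a header, defined in a single translation unit, and that the macro uses the standard `__func__`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Declaration: in a real project this would live in a shared header. */
extern bool demo_log_enabled;

/* Definition: exactly one copy, in exactly one .c file. */
bool demo_log_enabled = true;

/* Prefix each message with the calling function's name via __func__
 * (standard since C99), writing into a caller-supplied buffer. */
#define DEMO_LOG(out, size, fmt, ...) \
    (demo_log_enabled \
        ? snprintf((out), (size), "%s: " fmt, __func__, __VA_ARGS__) \
        : 0)

/* Hypothetical caller, used to show what __func__ expands to. */
static int demo_scan(char *out, size_t size)
{
    return DEMO_LOG(out, size, "scanned %d fonts", 3);
}
```

For a plain C function, `__func__` yields the bare function name, so `demo_scan` produces a message beginning with `demo_scan: `.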
*	Substitution table/CHD: fix build with GCC 10.	John-Mark Bell	2022-06-02	1	-6/+6
\| \| \| \| \|	Modern GCC correctly warned about a narrowing cast. This was unnecessary, so rework the code to stop using it.
*	Substitution table/direct: handle >255 fonts	John-Mark Bell	2022-05-30	1	-1/+1
\| \| \| \| \| \|	The direct substitution table constructor failed to allocate sufficient space to store the table in the case where there are more than 255 fonts installed on the system.
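A rough illustration of why the font count matters when sizing a direct table. The helper names are hypothetical, and the assumption that one entry value is reserved as a "no font" marker (so 8-bit entries cover at most 255 fonts) is mine, not something the commit states.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per table entry. ASSUMPTION: one value is reserved as a
 * "no font" marker, so 8-bit entries suffice for up to 255 fonts. */
static size_t demo_entry_bytes(size_t font_count)
{
    return font_count <= UINT8_MAX ? sizeof(uint8_t) : sizeof(uint16_t);
}

/* Storage for the 256-entry blocks alone (index overhead excluded). */
static size_t demo_block_storage(size_t blocks, size_t font_count)
{
    return blocks * 256 * demo_entry_bytes(font_count);
}
```

Allocating with the 8-bit size when more than 255 fonts are installed would halve the space actually required, which is the kind of shortfall the commit describes.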
*	Fix error conditions in broken FEC case	John-Mark Bell	2022-05-27	1	-4/+7
\|
*	Partially revert public API type changes	John-Mark Bell	2022-05-27	3	-20/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	a4c41198 made a variety of consistency changes to the public API, including changing the type of the "string" parameter passed to many entry points from const char * to const uint8_t *, as that better reflects the data. However, this then forces the user of the API to cast explicitly when passing string constants or other strings (which would be passed to standard library APIs as const char *, even if UTF-8 encoded). Revert this part of the change so the type of "string" is once more const char * and cast to the type we actually want internally.
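A small sketch of the approach: accept `const char *` at the API boundary and cast once internally. The function `demo_count_lead_bytes` is hypothetical and is not part of RUfl's API; it only illustrates the casting pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry point: callers pass ordinary C strings without a cast. */
static size_t demo_count_lead_bytes(const char *string, size_t length)
{
    /* Cast once, internally, to the unsigned type we want to inspect. */
    const uint8_t *s = (const uint8_t *) string;
    size_t n = 0;

    for (size_t i = 0; i < length; i++) {
        /* UTF-8 continuation bytes have the form 10xxxxxx; skip them. */
        if ((s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Working on `uint8_t` internally avoids surprises on platforms where plain `char` is signed, while the caller can still pass a string constant directly.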
*	Dump substitution table glyph count, per-plane size	John-Mark Bell	2022-05-23	1	-9/+62
\| \| \| \| \|	This allows us to see the total extent of glyph coverage and which planes are the largest.
*	Add a test for a broken encoding file	John-Mark Bell	2022-05-22	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This file is broken in a number of ways:
	* It contains garbage content that does not form valid glyph name specifiers
	* It contains garbage directives
	* It tries to define more than 256 glyphs (which is not supported by non-UCS FontManagers)
	The last point uncovered a bug in the umap sanity checking, which failed to properly count the number of glyph indices being defined by the Encoding file.
*	Add test for fonts with no encodings at all	John-Mark Bell	2022-05-22	1	-0/+3
\| \| \| \| \| \| \| \|	This exposed a failure to clean up any FontManager error occurring when attempting to load this kind of font. Additionally, it also exposed a failure to initialise the umap count in an internal structure. This was probably harmless in reality, but caused the test to fail.
*	Initialise pointers with NULL	John-Mark Bell	2022-05-22	1	-7/+7
\|
*	Start brute-force scan at codepoint 1	John-Mark Bell	2022-05-22	1	-1/+1
\| \| \| \| \|	There's no point starting at 0, as it is not a valid codepoint and will never be valid.
*	Conditionally support UCS Encoding formats	John-Mark Bell	2022-05-22	1	-2/+17
\| \| \| \| \| \| \| \| \| \|	While the Encoding file parser is able to parse UCS glyph "names" (of the form /uniXXXX or /uXXXX[XXXX]) and the sparse Encoding file format supported by the UCS FontManager, we currently only parse Encoding files at all on systems running a non-UCS FontManager and thus these code paths are unreachable. Guard them with appropriate preprocessor definitions so that we can easily resurrect them if they are ever needed in future.
*	Add checks for reinitialising library.	John-Mark Bell	2022-05-22	2	-5/+9
\| \| \| \| \| \|	This will cause the second initialisation attempt to load the cache file. In doing so, we discover that cache loading on non-32bit platforms didn't work -- fix that, too.
*	Squash leaks in non-UCS FM case	John-Mark Bell	2022-05-22	2	-0/+8
\|
*	Size CHD bitmap correctly.	John-Mark Bell	2021-09-14	1	-1/+2
\| \| \| \| \| \|	Running tests under valgrind reveals that we were failing to size the CHD bitmap correctly, resulting in the opportunity for buffer overruns. Stop that happening by correcting the maths.
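The commit does not say what the incorrect calculation was. As a general illustration only, the usual way to size a bitmap is to round the bit count up to whole storage words; the helper name and the choice of 32-bit words here are assumptions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WORD_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Number of 32-bit words needed to hold `bits` bits, rounding up so that
 * a partial final word is still allocated. */
static size_t demo_bitmap_words(size_t bits)
{
    return (bits + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS;
}
```

Truncating division (`bits / DEMO_WORD_BITS`) would drop the final partial word, which is a typical source of the kind of overrun valgrind reports.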
*	Ensure there is at least one menu entry	John-Mark Bell	2021-09-14	1	-4/+14
\| \| \| \| \|	In the case where there are no fonts at all on the system, ensure the menu building code copes.
*	Don't assume pointers are 32bits wide	John-Mark Bell	2021-08-15	1	-2/+2
\| \| \| \| \|	Use uintptr_t to cast between pointers and integers, instead of assuming that uint32_t will suffice.
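A minimal sketch of the conversion this commit refers to, using hypothetical helper names. `uintptr_t` is wide enough to hold a pointer to `void` and convert back to an equal pointer on any platform that provides it, whereas `uint32_t` would truncate on 64-bit targets.

```c
#include <stdint.h>

/* Convert a pointer to an integer without assuming a 32-bit address space. */
static uintptr_t demo_ptr_to_int(const void *p)
{
    return (uintptr_t) p;
}

/* Convert the integer back to a pointer. */
static void *demo_int_to_ptr(uintptr_t v)
{
    return (void *) v;
}
```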
*	Restrict total font faces to 16 bit range	John-Mark Bell	2021-08-15	1	-1/+3
\| \| \| \| \| \|	The substitution tables expect there to be no more than 65535 font faces available. Enforce this at load, so there aren't any unwanted surprises later.
*	Clean up types in internal structures	John-Mark Bell	2021-08-15	2	-11/+11
\|
*	Clean up types in public API	John-Mark Bell	2021-08-15	4	-31/+39
\|
*	Make dump of unicode maps optional	John-Mark Bell	2021-08-15	1	-2/+2
\| \| \| \| \| \| \| \| \|	Add a verbose flag to rufl_dump_state() and use it to control whether to dump the individual unicode maps generated when using a non-UCS Font Manager. Change rufl_test to not dump this state (ordinarily, anyway) as it is generally uninteresting and highly verbose.
*	Ignore UCS fonts if using a non-UCS Font Manager	John-Mark Bell	2021-08-15	1	-2/+27
\| \| \| \| \| \| \| \|	Attempting to use fonts constructed for the UCS Font Manager on older systems generally results in bad outcomes up to, and including, complete system freezes. As fixing the Font Manager on these systems is impractical, simply ignore these fonts completely when scanning for glyph coverage.
*	Clean up logging in the non-UCS Font Manager path	John-Mark Bell	2021-08-15	1	-43/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To obtain the full extent of a "language" font's glyph coverage we need to open and scan it in each of the available target encodings. All of the Latin1-6 + Welsh target encodings declare that they are based on the Base0 encoding and thus will cause the Font Manager to demand the existence of corresponding IntMetric0/Outlines0 font data files. A "language" font using a different base encoding (and corresponding target encodings based on it) would thus generate an error from the Font Manager. Additionally, without reinventing the Font Manager's own logic (and poking around the filesystem looking for IntMetrics and Encoding files), we don't know if a font is a "language" or a "symbol" font until we try to use it. Thus, we expect attempts to open "symbol" fonts with an explicit target encoding to generate an error from the Font Manager as well. As these are expected errors, there is no point logging them as it just produces a load of distracting noise.
*	Accept non-UCS Font Manager rejecting UCS fonts.	John-Mark Bell	2021-08-14	1	-2/+8
\| \| \| \| \| \| \|	If you attempt to use fonts supported by the UCS Font Manager with a non-UCS Font Manager, this will either work (in a limited way) or fail because the font data is incomprehensible to the non-UCS Font Manager. Cope with one particular instance of this.
*	Fix font scanning on non-UCS Font Managers	John-Mark Bell	2021-08-14	1	-1/+1
\| \| \| \| \|	We want to update the umap itself not whatever happens to be on the stack in the vicinity of its address.
*	Fix initialisation on UCS Font Manager 3.41-3.63	John-Mark Bell	2021-08-14	1	-13/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We cannot use Font_ReadEncodingFile to find the path to a font's source encoding because that is not what the API returns (it returns the path to the encoding file corresponding to the target encoding used to open the font handle) and there is no public API for obtaining the path of the source encoding. Additionally, there is no reliable way to replicate the UCS Font Manager's mapping of undefined and duplicate glyph names into the private use space at U+E000-U+EFFF. Therefore, take a different approach to supporting these versions of the Font Manager: abuse Font_EnumerateCharacters by probing every codepoint in the range [0, first_returned) to force the Font Manager to reveal the information we want. Once we have reached the first_returned codepoint, we can happily fall through to the normal flow (which will make use of the sparse nature of the Unicode space).
*	Fix shrinkwrap moving blocks	John-Mark Bell	2021-08-14	1	-23/+18
\| \| \| \| \|	All blocks subsequent to a full one get moved up and all their indices need rewriting.
*	Ensure dumping doesn't run off the end of a plane	John-Mark Bell	2021-08-14	1	-1/+1
\|
*	Fix bug in sparse encoding parser	John-Mark Bell	2021-08-14	1	-2/+3
\| \| \| \| \|	Spaces are valid characters in the sparse encoding so ensure we consume them correctly.
*	Add MedBold and Thin weights	John-Mark Bell	2021-08-12	1	-0/+2
\|
*	Clean up logging	John-Mark Bell	2021-08-11	1	-17/+0
\|
*	Use version-specific cache location.	John-Mark Bell	2021-08-11	2	-9/+53
\| \| \| \| \| \| \| \| \|	Move the cache location to a subdirectory within Scrap and encode the cache version in the filename. This allows software using different versions of RUfl to coexist on the same system without trying to share the same cache (and thus rescanning fonts every time).
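One way to build such a path is sketched below. The directory and file name layout (`<Wimp$ScrapDir>.RUfl.Cache<N>`) is an assumption chosen for illustration and may not match the names RUfl actually uses; `demo_cache_path` is likewise hypothetical.

```c
#include <stdio.h>

/* Build a cache path that embeds the cache format version, so that
 * libraries using different versions never read each other's files.
 * ASSUMPTION: the subdirectory and file naming are illustrative only. */
static int demo_cache_path(char *out, size_t size, unsigned version)
{
    return snprintf(out, size, "<Wimp$ScrapDir>.RUfl.Cache%u", version);
}
```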
*	Optimise substitution table storage	John-Mark Bell	2021-08-11	1	-189/+570
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider each Unicode plane independently (as they have very different properties). This means building a table for each plane and allows us to reduce the size of each entry in the CHD-addressed table from 64 to 32bits (which provides a significant immediate saving).

	Also introduce a direct linear mapping backend. This stores the table in a series of 256-entry blocks which are addressed from a fixed-size index. Block entries are either 8 or 16 bits wide (depending upon the number of fonts found on the system). This restores some of the storage efficiency of the old "giant array" approach, which is generally more efficient than a CHD (or other) hash-based implementation where the load factor is reasonably high (or the glyph:block ratio is sufficiently high).

	Select the direct or CHD storage mechanism based upon an estimate of the storage size for the data collected for a plane.

	In the testing I have performed (with the same fonts available as before) the combined effect of the above is to reduce the storage used significantly. Without the 8bit direct mapping entry size (which is a somewhat unfair comparison because even the "giant array" didn't have that feature) we see:

	Plane  Codepoints  Blocks  Backend  Storage  Alternative
	1           51483     224  Direct    115480  311328 (CHD)
	2            1981      13  Direct      7448  9760 (CHD)
	3            2293     201  CHD        17952  103704 (Direct)
	Total       55757                    140880  (~= 2.5 bytes/glyph)

	The other 14 planes have no glyph coverage at all, so require no storage.

	With the 8bit direct mapping, we see:

	Plane  Codepoints  Blocks  Backend  Storage  Alternative
	1           51483     224  Direct     57880  311328 (CHD)
	2            1981      13  Direct      3864  9760 (CHD)
	3            2293     201  CHD        17952  103704 (Direct)
	Total       55757                     79696  (~= 1.4 bytes/glyph)

	In summary:
	* separating the planes has shaved ~50% off the storage required by the CHD backend
	* introducing the direct mapping backend has shaved a further ~60% off that
	* using 8bit direct mapping has shaved another ~50% off that

	Cumulatively, then, storage requirements are now ~86% smaller than with CHD only (and about 40% less than the BMP-only "giant table", but now with astral character support).
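The per-plane selection can be pictured as a comparison of two size estimates. This sketch uses hypothetical names and takes the estimates as inputs; it does not reproduce how RUfl computes them, and the tie-break in favour of the direct backend is an assumption.

```c
#include <stddef.h>

enum demo_backend { DEMO_DIRECT, DEMO_CHD };

/* Pick whichever backend is estimated to need less storage for a plane.
 * ASSUMPTION: ties go to the direct mapping. */
static enum demo_backend demo_pick_backend(size_t direct_bytes,
                                           size_t chd_bytes)
{
    return direct_bytes <= chd_bytes ? DEMO_DIRECT : DEMO_CHD;
}
```

Fed with the figures quoted in the commit message, the first plane listed (57880 bytes direct against 311328 bytes CHD) selects the direct backend, while the third (103704 bytes direct against 17952 bytes CHD) selects CHD, matching the reported choices.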
*	Perform font substitution for astral characters, too.	John-Mark Bell	2021-08-09	8	-105/+726
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This significantly reworks the construction of the substitution table (and hides its implementation from the rest of the library). It is no longer practical to use a directly-indexed array so, instead, we front it with a perfect hash function. The storage required for the (unoptimised) hash data is currently about 6 bits per entry. Implementing compression would reduce this to the order of ~2 bits per entry. As the resulting data structure is sparse, we must store the original Unicode codepoint value along with the identity of the font providing a suitable glyph. This has necessitated expanding the size of substitution table entries from 16 to 64 bits (of which 27 bits are currently unused). With the 55757 codepoint coverage I have been testing with, this results in an increase in the substitution table storage requirements from the original 128kB directly-indexed array (covering the Basic Multilingual Plane only) to a rather fatter 512kB (for the codepoint+font id array) + ~41kB of hash metadata. This is still ~25% the size of a linear array, however, so is not completely outrageous.
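To see where "27 bits currently unused" can come from: a Unicode codepoint needs 21 bits and a font identifier of 16 bits brings the total to 37, leaving 27 of 64. The layout below (codepoint in the low bits, font id above it) and all names are assumptions for illustration, not RUfl's actual entry format.

```c
#include <stdint.h>

#define DEMO_CP_BITS 21
#define DEMO_CP_MASK ((UINT64_C(1) << DEMO_CP_BITS) - 1)

/* Pack a codepoint and a font id into one 64-bit entry.
 * ASSUMED layout: bits 0-20 codepoint, bits 21-36 font id, rest unused. */
static uint64_t demo_pack(uint32_t codepoint, uint16_t font_id)
{
    return ((uint64_t) font_id << DEMO_CP_BITS) | (codepoint & DEMO_CP_MASK);
}

static uint32_t demo_codepoint(uint64_t entry)
{
    return (uint32_t) (entry & DEMO_CP_MASK);
}

static uint16_t demo_font_id(uint64_t entry)
{
    return (uint16_t) (entry >> DEMO_CP_BITS);
}

/* A perfect hash maps every key, present or not, to some slot, so a
 * lookup must confirm that the slot really holds the wanted codepoint. */
static int demo_slot_matches(uint64_t entry, uint32_t wanted)
{
    return demo_codepoint(entry) == wanted;
}
```

The final check is the reason the codepoint has to be stored at all: without it, a lookup for an uncovered codepoint would return some other codepoint's font.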
*	Include extension plane data in RUfl_cache	John-Mark Bell	2021-08-09	2	-32/+79
\| \| \| \| \|	This requires us to bump the cache version, as it is a breaking change.
*	Merge UCS font scan implementations	John-Mark Bell	2021-08-09	1	-132/+27
\| \| \| \| \| \|	The only meaningful difference is how we enumerate the codepoints represented by a font. Factor this out so that we can share almost all of the implementation.
*	Include astral characters in font scan	John-Mark Bell	2021-08-09	1	-130/+285
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We now construct extension plane data if astral characters are present. Systems with a non-UCS Font Manager are still restricted to using the Basic Multilingual Plane (as there is no mechanism for encoding astral characters in the font or encoding data). Rewrite the UCS Font Manager 3.41-3.63 support to scan the font encoding itself (as Font_EnumerateCharacters is broken on these Font Manager versions). This also fixes the post-scan shrink-wrapping for Font Manager 3.64 or later -- previously it would not coalesce block bitmaps when determining that a block was full.
*	Parse UCS-aware Encoding files	John-Mark Bell	2021-08-08	1	-24/+130
\| \| \| \| \| \| \| \| \| \|	1. Comprehend the /uniXXXX and /uXXXX - /uXXXXXXXX glyph names
	2. Comprehend the sparse Encoding file format that explicitly specifies the glyph index rather than inferring it
	Support for both of these is conditional on the Font Manager being UCS-aware (thus ensuring that we continue to parse Encoding files in the same way as before on systems with no UCS Font Manager).
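A sketch of how the two glyph name forms might be recognised. The function names are hypothetical, and two details are assumptions rather than facts from the commit: that only upper-case hex digits are accepted, and that the name must end immediately after the digits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of an upper-case hex digit, or -1 if c is not one. */
static int demo_hex(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "/uniXXXX" (exactly 4 digits) or "/uXXXX".."/uXXXXXXXX"
 * (4 to 8 digits) into a UCS-4 value. Returns false for anything else. */
static bool demo_parse_ucs_name(const char *name, uint32_t *ucs4)
{
    size_t min, max, n = 0;
    uint32_t value = 0;

    if (strncmp(name, "/uni", 4) == 0) {
        name += 4; min = 4; max = 4;
    } else if (strncmp(name, "/u", 2) == 0) {
        name += 2; min = 4; max = 8;
    } else {
        return false;
    }

    for (; name[n] != '\0'; n++) {
        int d = demo_hex(name[n]);
        if (d < 0 || n >= max)
            return false;
        value = (value << 4) | (uint32_t) d;
    }
    if (n < min)
        return false;

    *ucs4 = value;
    return true;
}
```

Ordinary glyph names that happen to start with `/u`, such as `/udieresis`, fail the hex digit check and would fall through to normal name lookup.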
*	Refactor Encoding file parsing	John-Mark Bell	2021-08-08	1	-21/+54
\| \| \| \| \| \| \|	Change this into a callback-driven approach so that the logic for dealing with each individual (glyph index, ucs4) pair is hoisted out of the parsing code itself. This will allow us to use the same parser implementation in different scenarios.
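The shape of a callback-driven walk over (glyph index, ucs4) pairs might look like this. All names are hypothetical, and the driver iterates over a ready-made array as a stand-in for the real parser, which would derive the pairs from the Encoding file text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per (glyph index, ucs4) pair; return false to stop early. */
typedef bool (*demo_pair_cb)(void *pw, uint32_t glyph_index, uint32_t ucs4);

/* Stand-in for the parser: hands each pair to the callback and knows
 * nothing about what the callback does with it. */
static bool demo_walk_pairs(const uint32_t *ucs4, size_t count,
                            demo_pair_cb cb, void *pw)
{
    for (size_t i = 0; i < count; i++) {
        if (!cb(pw, (uint32_t) i, ucs4[i]))
            return false;
    }
    return true;
}

/* One consumer: count the pairs seen. */
static bool demo_count_cb(void *pw, uint32_t glyph_index, uint32_t ucs4)
{
    (void) glyph_index;
    (void) ucs4;
    (*(size_t *) pw)++;
    return true;
}

/* Another consumer: reject anything outside the Basic Multilingual Plane. */
static bool demo_bmp_only_cb(void *pw, uint32_t glyph_index, uint32_t ucs4)
{
    (void) pw;
    (void) glyph_index;
    return ucs4 <= 0xFFFF;
}
```

The same walker serves both consumers unchanged, which is the benefit the commit message is after.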
*	Use UCS-4 for rendering and display 6-digit replacement characters.	John-Mark Bell	2021-08-08	2	-41/+51
\| \| \| \| \| \| \| \| \| \| \|	As we introduce support for discovering and rendering astral characters, ensure that we pass UCS-4 to the relevant Font Manager APIs and extend our replacement hex code generation to emit 6 digits for codepoints outside the Basic Multilingual Plane. This has necessitated a change to the API of the callback function provided to rufl_paint_callback(). Where, previously, a 16 bit UCS-2 string was exposed, we now expose UCS-4.
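The digit count for the replacement hex code can be chosen from the codepoint's range, as in this sketch. `demo_replacement_hex` is a hypothetical helper, and the use of upper-case digits is an assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format the hex digits shown in place of a missing glyph: 4 digits for
 * the Basic Multilingual Plane, 6 for codepoints above U+FFFF. */
static int demo_replacement_hex(char *out, size_t size, uint32_t ucs4)
{
    return snprintf(out, size,
                    ucs4 > 0xFFFF ? "%06" PRIX32 : "%04" PRIX32, ucs4);
}
```

Six digits are enough for every codepoint, since the largest is U+10FFFF.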
*	Fix use after free	John-Mark Bell	2021-08-08	1	-1/+1
\|
*	Pave the way for astral character support.	John-Mark Bell	2021-08-08	4	-39/+126
\| \| \| \| \| \| \| \| \| \| \| \| \|	No functional change, but redefine the meaning of the old "size" member of the rufl_character_set structure to allow for the addition of extension structures in future. This change is backwards compatible as it is reusing previously unused bits in the size field (which will be set to zero in all existing RUfl_caches). Rename the "size" field to "metadata" which better reflects its new usage. Update rufl_character_set_test and rufl_dump_state to follow this change (and fix up their parameter types while we're here).
*	Use types with explicit sizes	John-Mark Bell	2021-08-07	1	-6/+8
\|
*	Detect overlong and invalid UTF-8 sequences	John-Mark Bell	2021-08-07	1	-0/+6
\|
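The commit carries no description of its checks, so the following is only a generic sketch of what detecting overlong and invalid sequences usually involves; `demo_utf8_decode` is hypothetical. Rejecting surrogates and values above U+10FFFF is included for completeness and may go beyond what the commit does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s[0..len). On success, store the
 * codepoint in *out and the number of bytes consumed in *used. */
static bool demo_utf8_decode(const uint8_t *s, size_t len,
                             uint32_t *out, size_t *used)
{
    uint32_t c, min;
    size_t n;

    if (len == 0)
        return false;

    /* The lead byte determines the length and the smallest codepoint
     * that legitimately needs that length. */
    if (s[0] < 0x80)                { c = s[0];        n = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; min = 0x10000; }
    else
        return false;               /* stray continuation or invalid lead */

    if (len < n)
        return false;               /* truncated sequence */

    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return false;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min)
        return false;               /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return false;               /* out of range or surrogate */

    *out = c;
    *used = n;
    return true;
}
```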
*	OSLib headers are system headers.	Michael Drake	2021-04-25	6	-6/+6
\|
*	Squash another warning	John-Mark Bell	2018-01-22	1	-1/+1
\|
*	Squash warning	John-Mark Bell	2018-01-22	1	-1/+1
\|
*	"Old" FontManager: improve Encoding file parser.	John-Mark Bell	2018-01-22	1	-12/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a non-Unicode world, a (non-Base) encoding may define glyphs for up to 256 character codes. Ensure that at most 256 Encoding file entries are used (as, otherwise, the character code will overflow). In particular, if symbol fonts created for the Unicode Font Manager (which does not have a 256 character limit for an encoding) are installed on a non-Unicode-capable system, only the first 256 glyphs in the font are accessible although the Encoding file may have more than 256 entries. Note, however, that the first 32 character codes will never be used as they are considered control codes. Thus, at most 224 usable characters may be defined. A further wrinkle is that glyph names may map to multiple Unicode codepoints, thus consuming multiple slots in the unicode map (which itself has a fixed size of 256 entries). Thus, it is technically possible for the unicode map to further limit the number of usable characters in a font to fewer than 224. However, unless the font is particularly baroque, this isn't a problem in the real world, because there are only 12 glyph names which map to more than one Unicode codepoint (they map to 2, each, for a total of 24 unicode map entries, if they're all present). Thus, to run out of space in the unicode map, you'd need a font which defines at least 4 of those glyphs twice (and defines the others once, and also defines known glyphs for every other character code). Fixes #2577.
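The arithmetic behind the 224-character ceiling can be written out as a small helper. The name `demo_usable_characters` is hypothetical, and the function models only the two limits stated above (the 256-entry cap and the 32 control codes), not the additional pressure from glyph names that map to two codepoints.

```c
#include <stddef.h>

/* Usable character codes for an Encoding file with `entries` entries on
 * a non-Unicode Font Manager: entries beyond 256 are ignored, and the
 * first 32 character codes are treated as control codes. */
static size_t demo_usable_characters(size_t entries)
{
    size_t used = entries < 256 ? entries : 256;

    return used > 32 ? used - 32 : 0;
}
```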
*	Fix typo	John-Mark Bell	2018-01-22	1	-1/+1
\|
*	"Old" FontManager: log character being scanned too.	John-Mark Bell	2018-01-22	1	-2/+2
\|
*	Fix typo in log message	John-Mark Bell	2018-01-21	1	-1/+1
\|