diff options
author | Vincent Sanders <vince@kyllikki.org> | 2017-06-09 17:28:55 +0100 |
---|---|---|
committer | Vincent Sanders <vince@kyllikki.org> | 2017-06-09 17:30:00 +0100 |
commit | 703427a48612bf98fba599dfcd6e91485efd5b77 (patch) | |
tree | bc9df49dd3de746b738aac3ba88c204d9ab0051b /Docs/source-object-backing-store | |
parent | a8348f3bc930151bd9aa184c8372c6af0c782730 (diff) | |
download | netsurf-703427a48612bf98fba599dfcd6e91485efd5b77.tar.gz netsurf-703427a48612bf98fba599dfcd6e91485efd5b77.tar.bz2 |
Update documentation removing junk and moving to markdown for most text files
Diffstat (limited to 'Docs/source-object-backing-store')
-rw-r--r-- | Docs/source-object-backing-store | 204 |
1 files changed, 0 insertions, 204 deletions
diff --git a/Docs/source-object-backing-store b/Docs/source-object-backing-store deleted file mode 100644 index 0fb3614d4..000000000 --- a/Docs/source-object-backing-store +++ /dev/null @@ -1,204 +0,0 @@ -Source Object (low level) cache backing store -============================================= - -Introduction ------------- - -The source object cache provides a system to extend the life of source -objects (HTML files, images etc.) after they are no longer immediately -being used. - -Only fetch types where we have well defined rules on caching are -considered, in practice this limits us to HTTP(S). The section in -RFC2616 [1] on caching specifies these rules. - -To further extend the objects lifetime they can be pushed into a -backing store where the objects are available for reuse less quickly -than from RAM but faster than retrieving from the network again. - -The backing store implementation provides a key:value infrastructure -with a simple store, retrieve and invalidate interface. - -Generic filesystem backing store --------------------------------- - -Although the backing store interface is fully pluggable a generic -implementation based on storing objects on the filesystem in a -hierarchy of directories. - -The option to alter the backing store format exists and is controlled -by a version field. It is implementation defined what happens if a -version mis-match occurs. - -As the backing store only holds cache data one should not expect a -great deal of effort to be expended converting formats (i.e. the cache -may simply be discarded). - -Layout version 1.1 ------------------- - -An object has an identifier value generated from the URL (NetSurf -backing stores uses the URL as the unique key). The value used is -obtained using nsurl_hash() which is currently a 32 bit FNV so is -directly usable. - -This identifier is adequate to ensure the collision rate for the -hashed URL values (a collision for every 2^16 URLs added) is -sufficiently low the overhead of returning the wrong object (which -backing stores are permitted to do) is not significant. - -An entry list is maintained which contains all the metadata about a -given identifier. This list is limited in length to constrain the -resources necessary to maintain it. It is made persistent to avoid the -overhead of reconstructing it at initialisation and to keep the data -used to improve the eviction decisions. - -Each object is stored and retrieved directly into the filesystem using -a filename generated from a RFC4648 base32 encoding of an address -value. The objects address is derived from the identifier by cropping -it to a shorter length. - -A mapping between the object address and its entry is maintained which -uses storage directly proportional to the size of the address length. - -The cropping length is stored in the control file with the default -values set at compile time. This allows existing backing stores to -continue operating with existing data independently of new default -setting. This setting gives some ability to tune the default cache -index size to values suitable for a specific host operating system. - -E.g. Linux based systems can easily cope with several megabytes of -mmaped index but RISC OS might want to limit this to a few megabytes -of heap at most. - -The files are stored on disc using their base32 address value. -By creating a directory for each character of the encoded filename -(except the last which is of course the leafname) we create a -directory structure where no directory has more than 32 entries. - -E.g. A 19bit address of 0x1 would be base32 encoded into AAAB -resulting in the data being stored in a file path of -"/store/prefix/d/B/A/A/BAAAAA". - -An address of 0x00040001 encodes to BAAB and a file path of -"/store/prefix/m/B/A/A/BAABAAA" - -Version 1.0 ------------ - -The version 1 layout was identical to the 1.1 except base64url -encoding was used, this proved problematic as some systems filesystems -were case insensitive so upper and lower case letters collided. - -There is no upgrade provision from the previous version simply delete -the cache directory. - -Control files -~~~~~~~~~~~~~ - -control -+++++++ -A control file is used to hold a list of values describing how the -other files in the backing store should be used. - -entries -+++++++ - -this file contains a table of entries describing the files held on the -filesystem. - -Each control file table entry is 28 bytes and consists of - - - signed 64 bit value for last use time - - - 32bit full url hash allowing for index reconstruction and - additional collision detection. Also the possibility of increasing - the ADDRESS_LENGTH although this would require renaming all the - existing files in the cache and is not currently implemented. - - - unsigned 32bit length for data - - - unsigned 32bit length for metadata - - - unsigned 16bit value for number of times used. - - - unsigned 16bit value for flags - - - unsigned 16bit value for data block index (unused) - - - unsigned 16bit value for metatdata block index (unused) - -Address to entry index -~~~~~~~~~~~~~~~~~~~~~~ - -An entry index is held in RAM that allows looking up the address to -map to an entry in the control file. - -The index is the only data structure whose size is directly dependant -on the length of the hash specifically: - -(2 ^ (ADDRESS_BITS - 3)) * ENTRY_BITS) in bytes - -where ADDRESS_BITS is how long the address is in bits and ENTRY_BITS -is how many entries the control file (and hence the while -cache) may hold. - -RISCOS values -+++++++++++++ - -By limiting the ENTRY_BITS size to 14 (16,384 entries) the entries -list is limited to 448kilobytes. - -The typical values for RISC OS would set ADDRESS_BITS to 18. This -spreads the entries over 262144 hash values which uses 512 kilobytes -for the index. Limiting the hash space like this reduces the -effectiveness of the cache. - -A small ADDRESS_LENGTH causes a collision (two URLs with the same -address) to happen roughly for every 2 ^ (ADDRESS_BITS / 2) = 2 ^ 9 = -512 objects stored. This roughly translates to a cache miss due to -collision every ten pages navigated to. - -Larger systems -++++++++++++++ - -In general ENTRY_BITS set to 16 as this limits the store to 65536 -objects which given the average size of an object at 8 kilobytes -yields half a gigabyte of disc used which is judged to be sufficient. - -For larger systems e.g. those using GTK frontend we would most likely -select ADDRESS_BITS as 22 resulting in a collision every 2048 objects -but the index using some 8 Megabytes - -Typical values --------------- - -Example 1 -~~~~~~~~~ - -For a store with 1034 objects generated from a random navigation of -pages linked from the about:welcome page. - -Metadata total size is 593608 bytes an average of 574 bytes. The -majority of the storage is used to hold the URLs and headers. - -Data total size is 9180475 bytes a mean of 8879 bytes 1648726 in the -largest 10 entries which if excluded gives 7355 bytes average size - -Example 2 -~~~~~~~~~ - -355 pages navigated in 80 minutes from about:welcome page and a -handful of additional sites (google image search and reddit) - -2018 objects in cache at quit. 400 objects from news.bbc.co.uk alone - -Metadata total 987,439 bytes mean of 489 bytes - -data total 33,127,831 bytes mean of 16,416 bytes - -with one single 5,000,811 byte gif - -data totals without gif is 28,127,020 mean 13,945 - -[1] http://tools.ietf.org/html/rfc2616#section-13
\ No newline at end of file |