summaryrefslogtreecommitdiff
path: root/Docs/source-object-backing-store
diff options
context:
space:
mode:
authorVincent Sanders <vince@kyllikki.org>2017-06-09 17:28:55 +0100
committerVincent Sanders <vince@kyllikki.org>2017-06-09 17:30:00 +0100
commit703427a48612bf98fba599dfcd6e91485efd5b77 (patch)
treebc9df49dd3de746b738aac3ba88c204d9ab0051b /Docs/source-object-backing-store
parenta8348f3bc930151bd9aa184c8372c6af0c782730 (diff)
downloadnetsurf-703427a48612bf98fba599dfcd6e91485efd5b77.tar.gz
netsurf-703427a48612bf98fba599dfcd6e91485efd5b77.tar.bz2
Update documentation removing junk and moving to markdown for most text files
Diffstat (limited to 'Docs/source-object-backing-store')
-rw-r--r--Docs/source-object-backing-store204
1 files changed, 0 insertions, 204 deletions
diff --git a/Docs/source-object-backing-store b/Docs/source-object-backing-store
deleted file mode 100644
index 0fb3614d4..000000000
--- a/Docs/source-object-backing-store
+++ /dev/null
@@ -1,204 +0,0 @@
-Source Object (low level) cache backing store
-=============================================
-
-Introduction
-------------
-
-The source object cache provides a system to extend the life of source
-objects (HTML files, images etc.) after they are no longer immediately
-being used.
-
-Only fetch types where we have well defined rules on caching are
-considered, in practice this limits us to HTTP(S). The section in
-RFC2616 [1] on caching specifies these rules.
-
-To further extend the objects lifetime they can be pushed into a
-backing store where the objects are available for reuse less quickly
-than from RAM but faster than retrieving from the network again.
-
-The backing store implementation provides a key:value infrastructure
-with a simple store, retrieve and invalidate interface.
-
-Generic filesystem backing store
---------------------------------
-
-Although the backing store interface is fully pluggable a generic
-implementation based on storing objects on the filesystem in a
-hierarchy of directories.
-
-The option to alter the backing store format exists and is controlled
-by a version field. It is implementation defined what happens if a
-version mis-match occurs.
-
-As the backing store only holds cache data one should not expect a
-great deal of effort to be expended converting formats (i.e. the cache
-may simply be discarded).
-
-Layout version 1.1
-------------------
-
-An object has an identifier value generated from the URL (NetSurf
-backing stores uses the URL as the unique key). The value used is
-obtained using nsurl_hash() which is currently a 32 bit FNV so is
-directly usable.
-
-This identifier is adequate to ensure the collision rate for the
-hashed URL values (a collision for every 2^16 URLs added) is
-sufficiently low the overhead of returning the wrong object (which
-backing stores are permitted to do) is not significant.
-
-An entry list is maintained which contains all the metadata about a
-given identifier. This list is limited in length to constrain the
-resources necessary to maintain it. It is made persistent to avoid the
-overhead of reconstructing it at initialisation and to keep the data
-used to improve the eviction decisions.
-
-Each object is stored and retrieved directly into the filesystem using
-a filename generated from a RFC4648 base32 encoding of an address
-value. The objects address is derived from the identifier by cropping
-it to a shorter length.
-
-A mapping between the object address and its entry is maintained which
-uses storage directly proportional to the size of the address length.
-
-The cropping length is stored in the control file with the default
-values set at compile time. This allows existing backing stores to
-continue operating with existing data independently of new default
-setting. This setting gives some ability to tune the default cache
-index size to values suitable for a specific host operating system.
-
-E.g. Linux based systems can easily cope with several megabytes of
-mmaped index but RISC OS might want to limit this to a few megabytes
-of heap at most.
-
-The files are stored on disc using their base32 address value.
-By creating a directory for each character of the encoded filename
-(except the last which is of course the leafname) we create a
-directory structure where no directory has more than 32 entries.
-
-E.g. A 19bit address of 0x1 would be base32 encoded into AAAB
-resulting in the data being stored in a file path of
-"/store/prefix/d/B/A/A/BAAAAA".
-
-An address of 0x00040001 encodes to BAAB and a file path of
-"/store/prefix/m/B/A/A/BAABAAA"
-
-Version 1.0
------------
-
-The version 1 layout was identical to the 1.1 except base64url
-encoding was used, this proved problematic as some systems filesystems
-were case insensitive so upper and lower case letters collided.
-
-There is no upgrade provision from the previous version simply delete
-the cache directory.
-
-Control files
-~~~~~~~~~~~~~
-
-control
-+++++++
-A control file is used to hold a list of values describing how the
-other files in the backing store should be used.
-
-entries
-+++++++
-
-this file contains a table of entries describing the files held on the
-filesystem.
-
-Each control file table entry is 28 bytes and consists of
-
- - signed 64 bit value for last use time
-
- - 32bit full url hash allowing for index reconstruction and
- additional collision detection. Also the possibility of increasing
- the ADDRESS_LENGTH although this would require renaming all the
- existing files in the cache and is not currently implemented.
-
- - unsigned 32bit length for data
-
- - unsigned 32bit length for metadata
-
- - unsigned 16bit value for number of times used.
-
- - unsigned 16bit value for flags
-
- - unsigned 16bit value for data block index (unused)
-
- - unsigned 16bit value for metatdata block index (unused)
-
-Address to entry index
-~~~~~~~~~~~~~~~~~~~~~~
-
-An entry index is held in RAM that allows looking up the address to
-map to an entry in the control file.
-
-The index is the only data structure whose size is directly dependant
-on the length of the hash specifically:
-
-(2 ^ (ADDRESS_BITS - 3)) * ENTRY_BITS) in bytes
-
-where ADDRESS_BITS is how long the address is in bits and ENTRY_BITS
-is how many entries the control file (and hence the while
-cache) may hold.
-
-RISCOS values
-+++++++++++++
-
-By limiting the ENTRY_BITS size to 14 (16,384 entries) the entries
-list is limited to 448kilobytes.
-
-The typical values for RISC OS would set ADDRESS_BITS to 18. This
-spreads the entries over 262144 hash values which uses 512 kilobytes
-for the index. Limiting the hash space like this reduces the
-effectiveness of the cache.
-
-A small ADDRESS_LENGTH causes a collision (two URLs with the same
-address) to happen roughly for every 2 ^ (ADDRESS_BITS / 2) = 2 ^ 9 =
-512 objects stored. This roughly translates to a cache miss due to
-collision every ten pages navigated to.
-
-Larger systems
-++++++++++++++
-
-In general ENTRY_BITS set to 16 as this limits the store to 65536
-objects which given the average size of an object at 8 kilobytes
-yields half a gigabyte of disc used which is judged to be sufficient.
-
-For larger systems e.g. those using GTK frontend we would most likely
-select ADDRESS_BITS as 22 resulting in a collision every 2048 objects
-but the index using some 8 Megabytes
-
-Typical values
---------------
-
-Example 1
-~~~~~~~~~
-
-For a store with 1034 objects generated from a random navigation of
-pages linked from the about:welcome page.
-
-Metadata total size is 593608 bytes an average of 574 bytes. The
-majority of the storage is used to hold the URLs and headers.
-
-Data total size is 9180475 bytes a mean of 8879 bytes 1648726 in the
-largest 10 entries which if excluded gives 7355 bytes average size
-
-Example 2
-~~~~~~~~~
-
-355 pages navigated in 80 minutes from about:welcome page and a
-handful of additional sites (google image search and reddit)
-
-2018 objects in cache at quit. 400 objects from news.bbc.co.uk alone
-
-Metadata total 987,439 bytes mean of 489 bytes
-
-data total 33,127,831 bytes mean of 16,416 bytes
-
-with one single 5,000,811 byte gif
-
-data totals without gif is 28,127,020 mean 13,945
-
-[1] http://tools.ietf.org/html/rfc2616#section-13 \ No newline at end of file