Hubbub -- an HTML parser
========================

Overview
--------

  Hubbub is a flexible HTML parser. It aims to comply with the HTML5
  specification.

Requirements
------------

  Hubbub requires the following tools:

    + A C99 capable C compiler
    + GNU make or compatible
    + Perl (for the testcases)
    + Pkg-config (for the testcases)
    + xsltproc (for the entity fetcher)
    + wget (for the entity fetcher)
    + doxygen (for the API documentation)

  Hubbub also requires the following libraries to be installed:

    + An iconv implementation (e.g. libiconv)
    + LibParserUtils -- see below for further information
    + JSON-C (for the testcases) -- see below for further information

  Hubbub can make use of the following, for debugging and testing purposes:

    + gcov and lcov, for test coverage data

Compilation
-----------

  In order to compile Hubbub, you will need LibParserUtils. This can be
  obtained from SVN:

      $ svn co svn://svn.netsurf-browser.org/trunk/libparserutils/

  In order to run tests, you will need JSON-C. You can obtain the version
  that Hubbub needs from SVN:

      $ svn co svn://svn.netsurf-browser.org/trunk/json-c/json-c/

  Compile and install both of these before trying to make Hubbub.

  Note: By default, libparserutils only supports a few character sets. It
  may, however, be configured to use iconv() to provide charset conversion.
  To do this, do the following:

      $ cd /path/to/libparserutils
      $ echo "CFLAGS += -DWITH_ICONV_FILTER" \
          >build/Makefile.config.override

  Then build libparserutils as normal.

  If necessary, modify the toolchain settings in the Makefile.
  Invoke make:

      $ make

Verification
------------

  To verify that the parser is working, it is necessary to specify a
  different makefile target than that used for normal compilation, thus:

      $ make test

  If you wish to see test coverage statistics, run:

      $ make coverage

  Then open the build/coverage/index.html file in a web browser.

API documentation
-----------------

  Currently, there is none.
  However, the code is well commented and the public API may be found in
  the "include" directory. The "examples" directory contains commented
  examples of how to use hubbub.

  Additionally, you can use doxygen to auto-generate API documentation,
  thus:

      $ make docs

  Then open the build/docs/html/index.html file in a web browser.

A note on character set aliases
-------------------------------

  Hubbub uses an external mapping file to encode relationships between
  character set names. This is the "Aliases" file. A copy may be found at
  test/data/Aliases. The path to this file is required when calling
  hubbub_initialise().
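
  Because hubbub_initialise() needs the path to the Aliases file, it can
  help to confirm that the file is readable before starting a program that
  uses Hubbub. The following is a minimal sketch, not part of Hubbub: the
  function name hubbub_aliases_path is hypothetical, and it assumes the
  argument is the root of a Hubbub source checkout that contains the
  bundled copy at test/data/Aliases.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Hubbub): print the path to the
# Aliases file inside a Hubbub source checkout, or fail if it is not
# readable.
#   $1  root of the checkout (assumed layout: $1/test/data/Aliases)
hubbub_aliases_path() {
    aliases="$1/test/data/Aliases"
    if [ -r "$aliases" ]; then
        # Success: emit the path so the caller can hand it to a program
        # that passes it on to hubbub_initialise().
        printf '%s\n' "$aliases"
    else
        # Failure: report on stderr and return a non-zero status.
        echo "Aliases file not readable: $aliases" >&2
        return 1
    fi
}
```

  For example, "hubbub_aliases_path /path/to/hubbub" prints
  /path/to/hubbub/test/data/Aliases if that file exists and is readable,
  and returns a non-zero status otherwise.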