Hubbub -- an HTML parser
========================

Overview
--------

Hubbub is a flexible HTML parser. It aims to comply with the HTML5
specification.

Requirements
------------

Hubbub requires the following tools:

+ A C99-capable C compiler
+ GNU make or compatible
+ Perl (for the test cases)
+ pkg-config (for the test cases)
+ xsltproc (for the entity fetcher)
+ wget (for the entity fetcher)
+ doxygen (for the API documentation)

Hubbub also requires the following libraries to be installed:

+ An iconv implementation (e.g. libiconv)
+ LibParserUtils -- see below for further information
+ JSON-C (for the test cases) -- see below for further information

Hubbub can optionally use the following for debugging and testing:

+ gcov and lcov, for test coverage data

Compilation
-----------

In order to compile Hubbub, you will need LibParserUtils. This can be
obtained from SVN:

  $ svn co svn://svn.netsurf-browser.org/trunk/libparserutils/

In order to run the tests, you will need JSON-C. You can obtain the version
that Hubbub needs from SVN:

  $ svn co svn://svn.netsurf-browser.org/trunk/json-c/json-c/

Compile and install both of these before building Hubbub.

Note: By default, libparserutils supports only a few character sets. It can,
however, be configured to use iconv() for charset conversion. To enable
this, run:

  $ cd /path/to/libparserutils
  $ echo "CFLAGS += -DWITH_ICONV_FILTER" \
      >build/Makefile.config.override

Then build libparserutils as normal.

If necessary, modify the toolchain settings in the Makefile, then invoke
make:

  $ make

Verification
------------

To verify that the parser is working, build the "test" target instead of
the default target used for normal compilation:

  $ make test

To see test coverage statistics, run:

  $ make coverage

Then open build/coverage/index.html in a web browser.

API documentation
-----------------

Currently, there is no hand-written API documentation. However, the code is
well commented and the public API may be found in the "include" directory.
The "examples" directory contains commented examples of how to use Hubbub.

Additionally, you can use doxygen to generate API documentation from the
source:

  $ make docs

Then open build/docs/html/index.html in a web browser.

A note on character set aliases
-------------------------------

Hubbub uses an external mapping file to encode relationships between
character set names. This is the "Aliases" file; a copy may be found at
test/data/Aliases. The path to this file is required when calling
hubbub_initialise().
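
An alias mapping lets several names for one character set resolve to a single canonical name. The C sketch below illustrates only that idea. It hard-codes a few IANA-registered aliases in an in-memory table rather than reading test/data/Aliases, and it does not reproduce the Aliases file format. `struct charset_entry`, `names_equal()` and `canonical_charset()` are hypothetical names, not part of Hubbub's or LibParserUtils' API.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_ALIASES 8

/* Hypothetical: one canonical charset name plus a NULL-terminated alias
 * list. This is NOT the layout of the Aliases file or of any table in
 * Hubbub or LibParserUtils; it only illustrates name resolution. */
struct charset_entry {
    const char *canonical;
    const char *aliases[MAX_ALIASES];
};

/* A few IANA-registered aliases, hard-coded for the example. */
static const struct charset_entry charsets[] = {
    { "US-ASCII",   { "iso-ir-6", "ANSI_X3.4-1968", "us", "IBM367",
                      "cp367", "csASCII", NULL } },
    { "ISO-8859-1", { "iso-ir-100", "ISO_8859-1", "latin1", "l1",
                      "IBM819", "CP819", "csISOLatin1", NULL } },
    { "UTF-8",      { "csUTF8", NULL } },
};

/* ASCII case-insensitive equality: charset names do not distinguish
 * upper and lower case. Neither argument may be NULL. */
static int names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the canonical name for `name` (which may itself be canonical
 * or any listed alias), or NULL if the name is unknown. */
static const char *canonical_charset(const char *name)
{
    size_t i, j;

    for (i = 0; i < sizeof(charsets) / sizeof(charsets[0]); i++) {
        if (names_equal(name, charsets[i].canonical))
            return charsets[i].canonical;
        for (j = 0; j < MAX_ALIASES && charsets[i].aliases[j] != NULL; j++) {
            if (names_equal(name, charsets[i].aliases[j]))
                return charsets[i].canonical;
        }
    }
    return NULL;
}
```

For example, `canonical_charset("latin1")` and `canonical_charset("LATIN1")` both yield "ISO-8859-1", while an unrecognised name yields NULL. A real implementation would load these relationships from a file such as test/data/Aliases instead of compiling them in. That is the point of keeping them in an external mapping file.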