TODO list
=========

  + Update tokeniser to comply with latest spec draft (currently complies
    with 2007-06-12 draft)
  + Implement one or more tree builders
  + More charset converters (or make the iconv codec significantly faster)
  + Parse error reporting from the tokeniser
  + Implement extraneous chunk insertion/tokenisation
  + Statistical charset autodetection
  + Shared library, for those platforms that support such things
  + Optimise it