2 datasets found

Keywords: web page cleaning

  • KrdWrd CANOLA Corpus 1.0

    The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate on unseen Web pages. It was harvested, annotated and...
  • KrdWrd CANOLA Corpus 1.1

    The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate on unseen Web pages. It was harvested, annotated and...
You can also access this registry using the API (see API Docs).