Texts in the Nanotechnology domain come from iNano (Interdisciplinary Nanoscience Center, AU), Nano (DTU), Niels Bohr Institutet, Forskningscenter Risø, Ministeriet for Sundhed og Forebyggelse (via DTU), Miljøstyrelsen, and Aktuel Naturvidenskab. They were collected in the DK-CLARIN project, WP2.2, 2008-2011.
The corpus consists of 358,144 words in 157 files.
Communicative setting (number of files): expert->advanced (13), expert->basic (144).
All texts are in XML TEI P5 format (TEIP5DKCLARIN-format), with tokenisation, sentence and paragraph segmentation, POS tagging, lemmatisation, and termhood annotation placed in separate text-external span groups.
"DK-CLARIN LSP Corpus - Nanotechnology domain" is part of the Danish DK-CLARIN LSP corpus, which consists of seven sub-corpora from the following subject domains: Agriculture, Construction, Economics, Environment, Health, IT and Nanotechnology.
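
As a rough illustration of the text-external annotation layout described above, the following Python sketch reads annotation layers stored separately from the tokens they describe. The sample document is hypothetical and much simplified: the `w` tokens, the `type` values "lemma" and "pos", the tag values, and the `from` pointers are assumptions made for illustration, not taken from the actual TEIP5DKCLARIN files, whose layout may differ. Only the TEI element names `spanGrp` and `span` and the Python standard library calls are standard.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document: the structure is assumed for illustration
# and is not copied from a real corpus file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w xml:id="t1">Nanopartikler</w>
    <w xml:id="t2">kan</w>
    <w xml:id="t3">binde</w>
  </p></body></text>
  <spanGrp type="lemma">
    <span from="#t1">nanopartikel</span>
    <span from="#t2">kunne</span>
    <span from="#t3">binde</span>
  </spanGrp>
  <spanGrp type="pos">
    <span from="#t1">N</span>
    <span from="#t2">V</span>
    <span from="#t3">V</span>
  </spanGrp>
</TEI>"""


def read_layers(xml_text):
    """Return (tokens, layers): token id -> word, and layer type -> {token id -> value}."""
    root = ET.fromstring(xml_text)
    # Collect the tokens by their xml:id.
    tokens = {w.get(XML_ID): w.text for w in root.iter(f"{{{TEI_NS}}}w")}
    layers = {}
    # Each spanGrp holds one annotation layer pointing back at token ids.
    for grp in root.iter(f"{{{TEI_NS}}}spanGrp"):
        layer = {}
        for span in grp.iter(f"{{{TEI_NS}}}span"):
            target = span.get("from", "").lstrip("#")
            layer[target] = span.text
        layers[grp.get("type")] = layer
    return tokens, layers


tokens, layers = read_layers(SAMPLE)
for token_id, word in tokens.items():
    print(word, layers["lemma"][token_id], layers["pos"][token_id])
```

Keeping each layer in its own span group means that a layer can be read, replaced, or ignored without touching the token stream, which is the usual motivation for this kind of stand-off annotation.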