OpenLegalData (2022 - Corpus)

PID

OpenLegalData is a free and open platform that makes legal documents and information available to the public. The aim of this platform is to improve the transparency of jurisprudence with the help of open data and to help people without legal training to understand the justice system. The project is committed to the Open Data principles and the Free Access to Justice Movement.

OpenLegalData's DUMP as of 2022-10-18 was used to create this corpus. The data was cleaned, automatically annotated (TreeTagger: POS & Lemma) and grouped based on the metadata (jurisdiction - BundeslandID - sub-size if applicable - ex: Verwaltungsgerichtsbarkeit_11_05.cec6.gz - jurisdiction: administrative jurisdiction, BundeslandID = 11 - sub-corpus = 05). Sub-corpora are randomly split into 50 MB each.

Corpus data is available in CEC6 format. This can be converted into many different corpus formats - use the software www.CorpusExplorer.de if necessary.

Identifier
PID http://hdl.handle.net/11372/LRT-5196
Related Identifier https://notes.jan-oliver-ruediger.de/korpora/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-5196
Provenance
Creator Rüdiger, Jan Oliver
Publisher Rüdiger, Jan Oliver
Publication Year 2023
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language German
Resource Type corpus
Format text/plain; charset=utf-8; application/x-gzip; downloadable_files_count: 194
Discipline Linguistics