Trusted Research Environments: Analysis of Characteristics and Data Availability


Trusted Research Environments (TREs) enable the analysis of sensitive data under strict security assertions that protect the data, through technical, organizational and legal measures, from being leaked (even accidentally) outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks and the slight technical variations between them. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.

Methodology

We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature, focusing on retrieving the following source material:

Peer-reviewed articles where available, TRE websites, TRE metadata catalogs.

The goal of this literature study is to discover existing TREs and to analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.

Technical details

This dataset consists of five comma-separated values (.csv) files describing our inventory:

countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-1 alpha-3 encoding, optional)
tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
inclusion.csv: Table of TREs considered for inclusion in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
major_fields.csv: Table categorizing the data into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional)

Additionally, a schema definition (.sql) file for MariaDB (10.5 or higher) is needed to model the database schema:

schema.sql: Schema definition file to create the tables and views used in the analysis.

The analysis was done in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
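
As a minimal sketch of how the tables relate, the snippet below joins rows shaped like tres.csv to rows shaped like countries.csv via the countryid column, using only the Python standard library. It is not the authors' analysis code: the sample rows, the helper names (parse_optional_bool, read_rows, join_tres_with_countries) and the accepted boolean spellings (true/false, 1/0) are assumptions made for illustration, and the published files may encode values differently.

```python
import csv
import io

# Hypothetical sample rows that only mimic the documented column names.
# The real countries.csv and tres.csv ship with the dataset.
COUNTRIES_CSV = """id,name,code
1,Austria,AUT
2,Exampleland,
"""

TRES_CSV = """id,name,countryid,outputcontrol,statistics_office
1,Example TRE A,1,true,false
2,Example TRE B,2,,true
"""


def parse_optional_bool(value):
    """Map a CSV cell to True, False, or None for an empty (optional) cell.

    The accepted spellings are an assumption, not taken from the dataset.
    """
    text = (value or "").strip().lower()
    if text == "":
        return None
    if text in ("1", "true"):
        return True
    if text in ("0", "false"):
        return False
    raise ValueError(f"unrecognised boolean cell: {value!r}")


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def join_tres_with_countries(tres_rows, country_rows):
    """Attach country name and code to each TRE via tres.countryid = countries.id."""
    countries = {row["id"]: row for row in country_rows}
    joined = []
    for tre in tres_rows:
        country = countries[tre["countryid"]]
        joined.append(
            {
                "tre": tre["name"],
                "country": country["name"],
                # code is optional in countries.csv, so an empty cell becomes None
                "country_code": country["code"] or None,
                "outputcontrol": parse_optional_bool(tre["outputcontrol"]),
                "statistics_office": parse_optional_bool(tre["statistics_office"]),
            }
        )
    return joined


if __name__ == "__main__":
    rows = join_tres_with_countries(
        read_rows(io.StringIO(TRES_CSV)),
        read_rows(io.StringIO(COUNTRIES_CSV)),
    )
    for row in rows:
        print(row)
```

To run this against the real files, the io.StringIO objects could be replaced with open("tres.csv", newline="") and open("countries.csv", newline=""), after checking how the files actually spell boolean and empty values.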

Identifier
DOI https://doi.org/10.48436/cv20m-sg117
Related Identifier IsSupplementTo https://dbrepo1.ec.tuwien.ac.at/pid/33
Related Identifier References https://gitlab.tuwien.ac.at/martin.weise/tres
Related Identifier References https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.tuwien.ac.at%2Fmartin.weise%2Ftres/HEAD?labpath=analysis.ipynb
Related Identifier IsVersionOf https://doi.org/10.48436/b0ph9-kqe04
Metadata Access https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:cv20m-sg117
Provenance
Creator Weise, Martin (ORCID: 0000-0003-4216-302X); Rauber, Andreas
Publisher TU Wien
Publication Year 2024
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess true
Contact tudata(at)tuwien.ac.at
Representation
Language English
Resource Type Dataset
Version 1.0.0
Discipline Other