Dataset for publication: Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach


This repository contains the dataset and the code used to generate synthetic datasets, as explained in the paper "Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach".

Dataset: The dataset consists of two components: individual diatom images extracted from publicly available diatom atlases [1,2,3] and individual debris images.
- Individual diatom images: the repository currently contains 166 diatom species, totalling 9230 images. These images were automatically extracted from the atlases using PDF scraping, then cleaned and verified by diatom taxonomists. The subfolders within each diatom species folder indicate the origin of the images: RA [1], IDF [2], BRG [3]. Additional diatom species and images will be added to the repository regularly.
- Individual debris images: the debris images were extracted from real microscopy images. The repository contains 600 debris objects.
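The per-species, per-source folder organisation described above can be checked with a short script. This is a minimal sketch, assuming the diatom images are stored as `<root>/<species>/<source>/<image file>` with `<source>` one of RA, IDF or BRG; the `inventory` function, the root path and the accepted file extensions are hypothetical and are not part of the repository's code.

```python
from collections import Counter
from pathlib import Path

# Source subfolders named in the dataset description: RA [1], IDF [2], BRG [3].
SOURCES = ("RA", "IDF", "BRG")
# Assumption: image files use one of these extensions (the record lists image/jpeg).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def inventory(diatom_root):
    """Count images per species and per atlas source (hypothetical helper).

    Assumes a layout of <diatom_root>/<species>/<source>/<image files>,
    where <source> is one of RA, IDF or BRG. Other files and folders are ignored.
    """
    per_species = Counter()
    per_source = Counter()
    for species_dir in sorted(Path(diatom_root).iterdir()):
        if not species_dir.is_dir():
            continue  # skip stray files at the root (e.g. a README)
        for source_dir in species_dir.iterdir():
            if not source_dir.is_dir() or source_dir.name not in SOURCES:
                continue  # only the RA / IDF / BRG subfolders are counted
            n = sum(
                1
                for f in source_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
            per_species[species_dir.name] += n
            per_source[source_dir.name] += n
    return per_species, per_source
```

For example, `species, sources = inventory("path/to/diatoms")` followed by `len(species)` and `sum(species.values())` gives totals that can be compared with the species and image counts stated above, while `sources` shows how many images come from each atlas.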

Code: contains the code used to generate synthetic microscopy images. For details on how to use the code, please refer to the README file in synthetic_data_generator/.

Creator Laviale, Martin; Venkataramanan, Aishwarya
Publisher Université de Lorraine
Contributor Laviale, Martin
Publication Year 2023
Rights Etalab (CC-BY); info:eu-repo/semantics/openAccess;
OpenAccess true
Contact Laviale, Martin (LIEC ; Université de Lorraine, CNRS ; France)
Resource Type Image; Dataset
Format application/zip; text/x-python; text/tab-separated-values; text/markdown; image/jpeg; application/octet-stream
Size (bytes) 50188610; 8545; 4882; 12356; 1957; 1716; 2269; 7239; 1530; 3391; 652; 456
Version 1.0
Discipline Earth and Environmental Science; Environmental Research; Geosciences; Natural Sciences