Rsyst::diatom_rbcl_align_312bp database: a database adapted to DNA metabarcoding (version v7: 23-02-2018)

Method followed to obtain the Rsyst::diatom_rbcl_align_312bp database:
1/ Extraction of the 312 bp rbcL barcode from the full Rsyst::diatom database rbcL alignment (using the Diat_rbcL_108F and R3 primers).
2/ Sequences with ambiguities (N), homopolymers > 8 or length < 312 bp are removed from the database.
3/ The resulting sequences are dereplicated into Individual Sequence Units (ISUs) in order to identify taxa sharing identical DNA sequences on the 312 bp rbcL barcode region. If necessary, taxonomy is harmonized between all taxa found in a single ISU. Finally, only ISUs are kept in the database, each represented by one taxon ID and one DNA sequence.
4/ The resulting ISU database is assigned to itself using the Mothur assignment algorithm (classify.seqs command). Expected and newly obtained taxonomies are compared to evaluate potential sources of bias (erroneous taxonomic names, taxa impossible to differentiate, ...). If necessary, ambiguous sequences are removed from the database or the taxonomy is adjusted.
5/ Finally, potentially conflicting names are harmonized (e.g. "aff." and "cf." removed, "Nanofrustulum_sp._SZCZCH285" transformed into "Nanofrustulum_sp.").
6/ The ".fasta" file contains the 312 bp rbcL DNA sequences and the ".txt" file contains the corresponding taxonomy. The sequence ID, shared by both files, is composed of an accession number (also present in the R-Syst::diatom library) and the original taxonomic name given by the author of the sequence (e.g. TCC7a-Rbcl-1|Fragilaria_vaucheriae).
7/ The text file gathers the curated taxonomic information from empire to species level. For instance: TCC7a-Rbcl-1|Fragilaria_vaucheriae Eukaryota;Chromista;Chromobiota;Bacillariophyta;Fragilariophyceae;Fragilariales;Fragilariaceae;Fragilaria;Fragilaria_nanoides. In this case, Fragilaria_vaucheriae is the original species name given by the author and Fragilaria_nanoides is the curated species name adapted to metabarcoding.

This database has been curated for a specific use, with a filtering procedure based on our own experience, and is provided on an indicative basis. The original database, R-Syst::diatom, remains the reference, and you can curate it differently to meet your own requirements and end uses.
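The quality filter (step 2), the dereplication into ISUs (step 3) and the layout of a taxonomy line (steps 6 and 7) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, sequences are assumed to be plain ungapped strings, and the sequence ID and lineage in the ".txt" file are assumed to be separated by whitespace, with an optional trailing semicolon.

```python
import re
from collections import defaultdict

BARCODE_LENGTH = 312   # barcode length stated in the description
MAX_HOMOPOLYMER = 8    # runs longer than this are rejected (step 2)

# One base followed by at least MAX_HOMOPOLYMER copies of itself,
# i.e. a homopolymer of MAX_HOMOPOLYMER + 1 bases or more.
_HOMOPOLYMER = re.compile(r"(.)\1{%d,}" % MAX_HOMOPOLYMER)


def passes_filters(seq):
    """Step 2: reject ambiguous, short or homopolymer-rich sequences."""
    seq = seq.upper()
    if "N" in seq or len(seq) < BARCODE_LENGTH:
        return False
    return _HOMOPOLYMER.search(seq) is None


def dereplicate(records):
    """Step 3: group the IDs of filtered sequences by identical barcode.

    `records` maps sequence ID -> sequence; each key of the result
    stands for one ISU (hypothetical in-memory representation).
    """
    isus = defaultdict(list)
    for seq_id, seq in records.items():
        if passes_filters(seq):
            isus[seq.upper()].append(seq_id)
    return dict(isus)


def parse_taxonomy_line(line):
    """Steps 6-7: split one taxonomy line into ID parts and lineage."""
    seq_id, lineage = line.split(maxsplit=1)   # assumed whitespace separator
    accession, original_name = seq_id.split("|", 1)
    ranks = [rank for rank in lineage.strip().split(";") if rank]
    return {
        "accession": accession,
        "original_name": original_name,
        "lineage": ranks,
        "curated_species": ranks[-1],
    }
```

Applied to the example line from step 7, `parse_taxonomy_line` separates the accession `TCC7a-Rbcl-1`, the original name `Fragilaria_vaucheriae` and the curated species `Fragilaria_nanoides`. Taxa grouped under the same key by `dereplicate` are the ones whose taxonomy would need harmonizing.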

Identifier
DOI https://doi.org/10.15454/HYRVUH
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/HYRVUH
Provenance
Creator Vasselon, Valentin; Rimet, Frederic; Bouchez, Agnès
Publisher Recherche Data Gouv
Contributor Vasselon, Valentin
Publication Year 2018
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Vasselon, Valentin (INRA - Institut National de la Recherche Agronomique, SCIMABIO Interface)
Representation
Resource Type Dataset
Format application/octet-stream; text/plain
Size 488086; 234794
Version 1.0
Discipline Computer Science; Geosciences; Life Sciences; Ecology; Earth and Environmental Science; Basic Biological and Medical Research; Biology; Biospheric Sciences; Computer Science, Electrical and System Engineering; Engineering Sciences; Environmental Research; Medicine; Natural Sciences; Omics; Microbial Ecology and Applied Microbiology; Soil Sciences; Hydrology and Hydrogeology