Updated Metagenomic Species Pan-genomes (MSPs) of the human gastrointestinal microbiota

Dataset overview

This dataset provides:

- the updated Integrated Gene Catalog of the human gut microbiota (also known as IGC2)
- 1,989 Metagenomic Species Pan-genomes (MSPs)

This dataset can be used to analyze shotgun sequencing data of the human gut microbiota.

How to use this dataset

To perform taxonomic, functional and strain-level profiling with this dataset, we suggest using Meteor.

Methods

Gene catalog construction

The methodology for creating the IGC2 catalog is described in the original papers: Li et al., 2014 and Wen et al., 2017.

MSP creation

Reads from publicly available human gut metagenomes were aligned against the IGC2 catalog with Meteor to produce a raw gene abundance table (10.4M genes quantified in >2000 samples). Co-abundant genes were then binned into 1,989 Metagenomic Species Pan-genomes (MSPs, i.e. clusters of co-abundant genes that likely belong to the same microbial species) using MSPminer.
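To make the idea of co-abundance concrete, here is a minimal sketch that groups genes whose abundance profiles rise and fall together across samples. It is not MSPminer's algorithm: the Pearson correlation, the greedy seed-based grouping, the 0.9 threshold and the toy gene names are all assumptions chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(vx * vy)

def bin_coabundant(profiles, threshold=0.9):
    """Greedily group genes whose profiles correlate with a seed gene.

    profiles: dict gene_id -> list of abundances (one value per sample).
    threshold: hypothetical cutoff, chosen only for this illustration.
    """
    unassigned = list(profiles)
    bins = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        # Iterate over a copy because genes are removed while scanning.
        for gene in unassigned[:]:
            if pearson(profiles[seed], profiles[gene]) >= threshold:
                members.append(gene)
                unassigned.remove(gene)
        bins.append(members)
    return bins

# Toy abundance table: 4 genes quantified in 4 samples (made-up values).
toy = {
    "geneA": [10, 0, 5, 20],
    "geneB": [20, 0, 10, 40],  # proportional to geneA
    "geneC": [0, 7, 1, 0],
    "geneD": [0, 14, 2, 0],    # proportional to geneC
}
bins = bin_coabundant(toy)
```

In this toy table, geneA and geneB end up in one bin and geneC and geneD in another, mirroring how genes from the same species are expected to share an abundance pattern across samples.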

MSP taxonomic annotation

MSPs were taxonomically annotated by aligning their core and accessory genes against the representative genomes of the Genome Taxonomy Database (GTDB r207) using blastn (task = megablast, word_size = 16). The 20 best hits for each gene were kept (-max_target_seqs 20). Using an in-house pipeline, a species-level assignment was given if more than 50% of the genes matched the representative genome of a given species, with a mean identity ≥ 95% and a mean gene length coverage ≥ 90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation.
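The decision rule above can be sketched as follows. The in-house pipeline is not published here, so this is only one possible reading of the thresholds: it assumes a single retained hit per gene (the actual procedure keeps up to 20), computes the means over the genes matching a species, and uses a hypothetical input structure and function name.

```python
from collections import Counter

# Ranks tried in order, from most to least specific.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]

def annotate_msp(n_genes, hits):
    """Return (rank, taxon) for one MSP, or (None, None) if nothing qualifies.

    n_genes: total number of genes in the MSP (assumed > 0).
    hits: one dict per gene with a hit, holding "identity" and "coverage"
          (both in percent) and a "lineage" dict mapping rank -> taxon name.
    """
    # Species level: a majority of genes on one species, with quality thresholds.
    by_species = {}
    for h in hits:
        by_species.setdefault(h["lineage"]["species"], []).append(h)
    for species, group in by_species.items():
        mean_id = sum(h["identity"] for h in group) / len(group)
        mean_cov = sum(h["coverage"] for h in group) / len(group)
        if len(group) / n_genes > 0.5 and mean_id >= 95 and mean_cov >= 90:
            return "species", species
    # Otherwise fall back to the lowest rank shared by more than half of the genes.
    for rank in RANKS[1:]:
        counts = Counter(h["lineage"][rank] for h in hits)
        taxon, count = counts.most_common(1)[0] if counts else (None, 0)
        if count / n_genes > 0.5:
            return rank, taxon
    return None, None

def lineage(species, genus):
    # Illustrative lineage used only for this toy example.
    return {"species": species, "genus": genus, "family": "Bacteroidaceae",
            "order": "Bacteroidales", "class": "Bacteroidia",
            "phylum": "Bacteroidota", "superkingdom": "Bacteria"}

# Two toy MSPs of 4 genes each, with 3 genes hitting the same species.
good = [{"identity": 98.0, "coverage": 95.0,
         "lineage": lineage("Bacteroides uniformis", "Bacteroides")}] * 3
weak = [{"identity": 88.0, "coverage": 95.0,
         "lineage": lineage("Bacteroides uniformis", "Bacteroides")}] * 3
species_call = annotate_msp(4, good)
genus_call = annotate_msp(4, weak)
```

The first toy MSP passes all three species-level thresholds. The second has the same hits but a mean identity of 88%, so it falls back to the genus level.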

Construction of the phylogenetic tree

39 universal phylogenetic marker genes were extracted from the MSPs with fetchMGs. The markers were then aligned separately with MUSCLE. The alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
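The text does not say how the per-marker alignments were merged. A common approach is to concatenate them into a single supermatrix, padding with gaps any MSP that lacks a marker. The sketch below illustrates that approach under this assumption; the function name and the MSP identifiers are hypothetical.

```python
def concatenate_alignments(alignments):
    """Merge per-marker alignments into one supermatrix.

    alignments: list of dicts, each mapping taxon_id -> aligned sequence
                (all sequences within one dict must have the same length).
    Taxa missing from a marker are padded with gap characters.
    """
    taxa = sorted({t for aln in alignments for t in aln})
    merged = {t: [] for t in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must share one length")
        width = lengths.pop()
        for t in taxa:
            merged[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in merged.items()}

# Toy alignments for two markers (made-up sequences and identifiers).
marker1 = {"msp_0001": "ATG-C", "msp_0002": "ATGGC"}
marker2 = {"msp_0001": "TTA"}  # msp_0002 lacks this marker
supermatrix = concatenate_alignments([marker1, marker2])
```

Every row of the resulting supermatrix has the same length, which is what alignment trimming and tree inference tools expect as input.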

Identifier
DOI https://doi.org/10.15454/FLANUP
Related Identifier IsCitedBy https://doi.org/10.21203/rs.3.rs-339282/v1
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/FLANUP
Provenance
Creator Plaza Onate, Florian; Pons, Nicolas; Gauthier, Franck; Almeida, Mathieu; Ehrlich, Stanislav Dusko; Le Chatelier, Emmanuelle
Publisher Recherche Data Gouv
Contributor Plaza Onate, Florian
Publication Year 2021
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Plaza Onate, Florian (INRAE)
Representation
Resource Type Dataset
Format text/tab-separated-values; text/plain; application/octet-stream; application/gzip; application/x-gzip
Size 508990; 100336; 69844; 60576; 35407049; 1469456191; 2402263879; 1899568
Version 6.1
Discipline Life Sciences; Biology; Omics; Pathology and Forensic Medicine; Microbial Ecology and Applied Microbiology