An updated catalog of genes and species of the pig gut microbiota

Dataset overview
We built an updated catalog of 9.3 million genes found in the pig gut microbiota. Co-abundant genes were binned into 1523 Metagenomic Species Pan-genomes (MSPs), for which we provide taxonomic labels and a phylogenetic tree. In addition, we reconstituted 7059 Metagenome-Assembled Genomes (MAGs) covering 760 Metagenomic Species and extracted 7331 viral genomes from assemblies. Finally, we used Pairwise Comparative Modelling to predict 6140 antibiotic resistance genes. This dataset can be used to analyze shotgun sequencing data of the pig gut microbiota.

Methods

Sequencing data availability
Sequencing data from Xiao et al. (PRJEB11755, n=287) and Kim et al. (PRJEB32496, n=36) were downloaded from the European Nucleotide Archive.

Sequencing data quality control
Illumina adapter removal and read trimming were performed with fastp. Reads that mapped to the host genome (GCF_000003025.6) with bowtie2 were removed with samtools.
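
The quality control steps above can be sketched as a list of command lines. This is only an illustration under stated assumptions: the file names, the thread count and the samtools flag filter (-f 12 -F 256, which keeps pairs where both mates are unmapped) are not given in this record and are hypothetical choices, not the parameters actually used to build the catalog.

```python
from typing import List


def qc_commands(sample: str, r1: str, r2: str, host_index: str,
                threads: int = 4) -> List[List[str]]:
    """Build (but do not run) the command lines for one paired-end sample.

    All output file names below are hypothetical and derived from `sample`.
    """
    trimmed_r1 = f"{sample}.trimmed_R1.fastq.gz"
    trimmed_r2 = f"{sample}.trimmed_R2.fastq.gz"
    host_sam = f"{sample}.host.sam"
    unmapped_bam = f"{sample}.unmapped.bam"
    clean_r1 = f"{sample}.clean_R1.fastq"
    clean_r2 = f"{sample}.clean_R2.fastq"

    return [
        # 1. Adapter removal and read trimming with fastp (default settings).
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # 2. Map the trimmed reads to the host genome index with bowtie2.
        ["bowtie2", "-p", str(threads), "-x", host_index,
         "-1", trimmed_r1, "-2", trimmed_r2, "-S", host_sam],
        # 3. Assumption: keep only pairs where both mates are unmapped
        #    (-f 12) and drop secondary alignments (-F 256).
        ["samtools", "view", "-b", "-f", "12", "-F", "256",
         "-o", unmapped_bam, host_sam],
        # 4. Convert the remaining (non-host) read pairs back to FASTQ.
        ["samtools", "fastq", "-1", clean_r1, "-2", clean_r2, unmapped_bam],
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine where the three tools are installed.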

Metagenomic assembly
Metagenomic assembly was performed with metaSPAdes. Contigs of less than 1500 bp were removed.
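
The contig length filter can be illustrated with a few lines of plain Python. This is a minimal sketch, not the script used for the catalog: the function names are hypothetical and only the 1500 bp threshold comes from the text.

```python
from typing import Iterable, Iterator, Tuple

MIN_CONTIG_LENGTH = 1500  # threshold stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]],
                   min_length: int = MIN_CONTIG_LENGTH
                   ) -> Iterator[Tuple[str, str]]:
    """Keep contigs of at least `min_length` bp; shorter ones are dropped."""
    for header, sequence in records:
        if len(sequence) >= min_length:
            yield header, sequence
```

For a real assembly, read_fasta can be fed an open file handle, for example `with open("contigs.fasta") as handle:` (hypothetical file name).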

MAGs creation
Reads of each sample were aligned to their respective assembly with bowtie2, and the alignments were stored in sorted, indexed BAM files with samtools. Then, contig coverage was computed in each sample with jgi_summarize_bam_contig_depths. MAGs were generated with MetaBAT 2 and MaxBin2. Finally, the results of both tools were combined with DAS Tool and MAG quality was assessed with checkM. MAGs with completeness 5% were discarded.

Extraction of viral genomes
Candidate viral sequences were identified in assemblies with VirFinder. Then, viral genome quality was assessed with checkV, and those of low or undetermined quality were discarded.

Non-redundant gene catalog
Genes were predicted on all contigs with Prodigal (parameters: -m -p meta). Genes with a missing start codon or shorter than 99 bp were discarded. Then, partial and complete genes were clustered separately with cd-hit-est (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0). The two non-redundant gene sets were merged by considering first the complete genes from the longest contigs (contact us for further details).
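
The gene filtering rule can be sketched from the headers that Prodigal writes for each predicted gene. This is an illustration only: it assumes that a missing start codon is reported as start_type=Edge and that a complete gene carries partial=00 in the header, and the function names are hypothetical. It does not reproduce the clustering or the merging step.

```python
MIN_GENE_LENGTH = 99  # threshold stated in the text


def parse_prodigal_header(header: str) -> dict:
    """Parse a header such as
    'contig_1 # 10 # 309 # 1 # ID=1_1;partial=00;start_type=ATG;...'
    """
    name, begin, end, strand, attrs = [
        field.strip() for field in header.lstrip(">").split(" # ")
    ]
    attributes = dict(
        item.split("=", 1) for item in attrs.split(";") if "=" in item
    )
    return {
        "name": name,
        "length": int(end) - int(begin) + 1,  # coordinates are inclusive
        "strand": int(strand),
        "start_type": attributes.get("start_type"),
        "partial": attributes.get("partial"),
    }


def keep_gene(header: str) -> bool:
    """Discard genes without a start codon or shorter than 99 bp."""
    gene = parse_prodigal_header(header)
    has_start = gene["start_type"] != "Edge"  # assumption: Edge = no start
    return has_start and gene["length"] >= MIN_GENE_LENGTH


def is_complete(header: str) -> bool:
    """A gene is complete when neither edge is truncated (partial=00)."""
    return parse_prodigal_header(header)["partial"] == "00"
```

Genes that pass keep_gene can then be split with is_complete into the complete and partial sets that are clustered separately.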

MSPs creation
Using the Meteor software suite, reads from each sample were mapped against the non-redundant catalog to build a raw gene abundance table (9.3 million genes quantified in 323 samples). This table was submitted to MSPminer, which reconstituted 1523 clusters of co-abundant genes named Metagenomic Species Pan-genomes (MSPs). Quality control of each MSP was performed manually by visualizing heatmaps of the normalized gene abundance profiles.

Taxonomic annotation
MAGs and MSPs were annotated with GTDB-Tk based on GTDB Release 05-RS95.

Construction of the phylogenetic tree
39 universal phylogenetic marker genes were extracted from the 1523 MSPs (or the corresponding MAGs if available) with fetchMGs. Then, the markers were aligned separately with MUSCLE. The 40 alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
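
Merging per-marker alignments into a single alignment can be sketched as a simple concatenation. This record does not say how the merge was done, so the following is an assumption: each marker alignment is a mapping from species identifier to aligned sequence, and a species lacking a marker is padded with gaps. Trimming and tree inference are not reproduced, and all names are hypothetical.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]],
                           taxa: List[str]) -> Dict[str, str]:
    """Concatenate per-marker alignments into one sequence per taxon.

    Assumption: a taxon absent from a marker alignment is represented by
    a run of gap characters of that alignment's width.
    """
    merged = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(sequence) for sequence in alignment.values()}
        if len(widths) != 1:
            raise ValueError(
                "all sequences of one marker alignment must have equal length"
            )
        width = widths.pop()
        for taxon in taxa:
            merged[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in merged.items()}
```

For example, with one marker aligned for two species and a second marker found in only one of them, the species missing the second marker receives gaps over that marker's columns.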

Prediction of antibiotic resistance genes
Antibiotic resistance genes were predicted with the Pairwise Comparative Modelling approach (latest version available here).

Identifier
DOI https://doi.org/10.15454/OPAULL
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/OPAULL
Provenance
Creator Plaza Onate, Florian; Ghozlane, Amine; Almeida, Mathieu
Publisher Recherche Data Gouv
Contributor Plaza Onate, Florian
Publication Year 2021
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Plaza Onate, Florian (INRAE)
Representation
Resource Type Dataset
Format application/x-gzip; application/gzip; text/tab-separated-values
Size 63797558; 30817; 35765; 19937965; 2432129876; 4338183470; 528639; 18224
Version 4.0
Discipline Geosciences; Life Sciences; Ecology; Veterinary Medicine; Biology; Omics; Microbial Ecology and Applied Microbiology