Dataset overview
We built an updated catalog of 9.3M genes found in the pig gut microbiota.
Co-abundant genes were binned into 1523 Metagenomic Species Pan-genomes (MSPs), for which we provide taxonomic labels and a phylogenetic tree. In addition, we reconstructed 7059 Metagenome-Assembled Genomes (MAGs) covering 760 Metagenomic Species, and we extracted 7331 viral genomes from the assemblies.
Finally, we used Pairwise Comparative Modelling to predict 6140 antibiotic resistance genes.
This dataset can be used to analyze shotgun sequencing data of the pig gut microbiota.
Methods
Sequencing data availability
Sequencing data from Xiao et al. (PRJEB11755, n=287) and Kim et al. (PRJEB32496, n=36) was downloaded from the European Nucleotide Archive.
Sequencing data quality control
Illumina adapter removal and read trimming were performed with fastp. Reads that mapped to the host genome (GCF_000003025.6) with bowtie2 were removed with samtools.
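As an illustration of the host-filtering step, the sketch below decides from SAM flag bits whether an alignment record belongs to a non-host read pair. It assumes paired-end reads and a strict rule (a pair is kept only when neither mate aligned to the host, which corresponds to `samtools view -f 12`). The exact samtools options used for this dataset are not stated, so the rule and the function names are assumptions.

```python
# SAM flag bits as defined by the SAM specification.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def keep_as_non_host(flag: int) -> bool:
    """Return True when neither the read nor its mate aligned to the host.

    Assumed rule (not confirmed for this dataset): both bits 0x4 and 0x8
    must be set for the record to be kept.
    """
    required = READ_UNMAPPED | MATE_UNMAPPED
    return (flag & required) == required


def filter_sam_lines(lines):
    """Yield SAM header lines and the records that pass keep_as_non_host."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_as_non_host(flag):
            yield line
```

With this rule, a pair in which only one mate hits the host genome is also discarded, which is the conservative choice for decontamination.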
Metagenomic assembly
Metagenomic assembly was performed with metaSPAdes. Contigs shorter than 1500 bp were removed.
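The length filter can be sketched as follows. This is a minimal example that reads FASTA records from any iterable of lines and keeps contigs of at least 1500 bp; the function names are illustrative and not part of the original pipeline.

```python
MIN_CONTIG_LENGTH = 1500  # threshold stated in the text (bp)


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=MIN_CONTIG_LENGTH):
    """Keep contigs whose sequence is at least min_length bp long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]
```

An open file handle can be passed directly to `read_fasta`, since it iterates line by line.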
MAGs creation
Reads of each sample were aligned to their respective assembly with bowtie2, and the results were stored in sorted, indexed BAM files with samtools. Then, contig coverage was computed in each sample with jgi_summarize_bam_contig_depths. MAGs were generated with MetaBAT 2 and MaxBin2. Finally, the results of both tools were combined with DAS Tool and MAG quality was assessed with CheckM. MAGs with completeness 5% were discarded.
Extraction of viral genomes
Candidate viral sequences were identified in the assemblies with VirFinder. Then, the quality of the viral genomes was assessed with CheckV, and those of low or undetermined quality were discarded.
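The quality filter can be sketched as below. The example assumes the tab-separated layout of CheckV's quality summary, with a `contig_id` column and a `checkv_quality` column whose discarded tiers are labeled `Low-quality` and `Not-determined`. These column names and labels should be checked against the CheckV version actually used.

```python
import csv

# Tiers discarded according to the text (low or undetermined quality).
# The exact labels are an assumption about CheckV's quality summary.
DISCARDED_TIERS = {"Low-quality", "Not-determined"}


def retained_viral_contigs(tsv_lines):
    """Return IDs of viral contigs whose quality tier is not discarded."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return [
        row["contig_id"]
        for row in reader
        if row["checkv_quality"] not in DISCARDED_TIERS
    ]
```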
Non-redundant gene catalog
Genes were predicted on all contigs with Prodigal (parameters: -m -p meta). Genes with a missing start codon or shorter than 99 bp were discarded. Then, partial and complete genes were clustered separately with cd-hit-est (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0). The two non-redundant gene sets were merged by considering first the complete genes from the longest contigs (contact us for further details).
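The gene filtering rules can be sketched from Prodigal's FASTA headers. This example assumes the usual header layout (`name # start # end # strand # key=value;...`), in which `start_type=Edge` marks a gene without a start codon and `partial=00` marks a gene complete at both ends. The function names are hypothetical and the authors' actual implementation may differ.

```python
MIN_GENE_LENGTH = 99  # bp, threshold stated in the text


def parse_prodigal_header(header):
    """Extract name, length, and the partial/start_type attributes."""
    name, start, end, _strand, attributes = header.lstrip(">").split(" # ")
    attrs = dict(
        item.split("=", 1) for item in attributes.strip().split(";") if item
    )
    return {
        "name": name,
        "length": int(end) - int(start) + 1,
        "partial": attrs["partial"],
        "start_type": attrs["start_type"],
    }


def classify_gene(header):
    """Return 'discarded', 'partial', or 'complete' for one predicted gene."""
    gene = parse_prodigal_header(header)
    if gene["start_type"] == "Edge" or gene["length"] < MIN_GENE_LENGTH:
        return "discarded"
    return "complete" if gene["partial"] == "00" else "partial"
```

The two surviving classes correspond to the two gene sets that are then clustered separately.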
MSPs creation
Using the Meteor software suite, reads from each sample were mapped against the non-redundant catalog to build a raw gene abundance table (9.3 million genes quantified in 323 samples). This table was submitted to MSPminer, which reconstructed 1523 clusters of co-abundant genes named Metagenomic Species Pan-genomes (MSPs).
Quality control of each MSP was performed manually by inspecting heatmaps of the normalized gene abundance profiles.
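To illustrate what "co-abundant" means here, the sketch below groups genes whose abundance profiles across samples correlate strongly with a seed gene. It uses a plain Pearson correlation as a stand-in: MSPminer relies on its own measure and clustering procedure, so this is a simplified illustration of the idea, not the tool's algorithm. The threshold of 0.9 and the function names are arbitrary assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def co_abundant_genes(table, seed, threshold=0.9):
    """Return genes whose profile correlates with the seed gene's profile.

    table maps gene name -> list of abundances, one value per sample.
    """
    return sorted(
        gene
        for gene, profile in table.items()
        if gene != seed and pearson(table[seed], profile) >= threshold
    )
```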
Taxonomic annotation
MAGs and MSPs were annotated with GTDB-Tk based on GTDB Release 05-RS95.
Construction of the phylogenetic tree
39 universal phylogenetic marker genes were extracted from the 1523 MSPs (or the corresponding MAGs if available) with fetchMGs. Then, the markers were aligned separately with MUSCLE. The resulting alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
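The merging of per-marker alignments can be sketched as a simple concatenation. This minimal example assumes each alignment is a mapping from taxon name to aligned sequence and pads taxa that lack a marker with gap characters. How missing markers were actually handled for this dataset is not stated, so the padding is an assumption.

```python
def concatenate_alignments(alignments):
    """Merge per-marker alignments into one concatenated alignment.

    alignments: list of dicts mapping taxon name -> aligned sequence,
    one dict per marker. Taxa missing a marker are padded with gaps.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    merged = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            merged[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in merged.items()}
```

Every taxon ends up with a sequence of the same total length, which is what trimming and tree-building tools expect as input.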
Prediction of antibiotic resistance genes
Antibiotic resistance genes were predicted with the Pairwise Comparative Modelling approach (latest version available here).