The deposited genetic data were obtained from environmental DNA (eDNA) samples collected around Réunion Island in the southwest Indian Ocean between May 2018 and June 2019. Surface water samples were collected from a vessel without access to cold storage or onboard laboratory facilities. To limit cross-contamination, all personnel onboard wore protective gowns and nitrile gloves during sample collection. Sampling was conducted at the front of the vessel to avoid contact with the hull, using sterile 5 L bottles at the sea surface interface.
To develop an eDNA protocol, 11 surface water samples were collected during marine mammal sightings and processed with various filtration capsules and preservation buffers. Based on assessments of feasibility, filtration time, DNA concentration, and cost, a protocol was established for sampling 20 sites around Réunion Island. At each site, a 10 L seawater sample (2 × 5 L) was collected from the sea surface interface and filtered through a Sterlitech filter using a peristaltic pump and sterile tubing, then preserved in RNAlater solution. Sites with marine mammal sightings were favored so that eDNA detections could be compared with recorded sightings. Of the 20 samples, 14 were collected within 10-20 meters of marine mammals.
Following primer testing, the ~230 bp hypervariable region of the 12S rRNA gene (MiMammal) was amplified. DNA amplifications were conducted with 12 PCR replicates in a final volume of 10 μL. The amplification mixture contained 1X Phusion Green Hot Start II High-Fidelity PCR Master Mix (Thermo Scientific), 0.4 μM of each of the tailed primers, 2 μM of the human blocking primer that we developed, 0.8 μg/μL bovine serum albumin (BSA; Thermo Scientific), 3% DMSO (Thermo Scientific), and 1.5 mM MgCl2 (Invitrogen), topped up with PCR-grade water (Thermo Scientific). The human blocking primer was thus added at five times the concentration of each mammal primer. PCR conditions comprised an initial denaturation at 98 °C for 3 minutes, followed by 45 cycles of 20 seconds at 98 °C, 15 seconds at 69 °C, and 15 seconds at 72 °C, and a final elongation step at 72 °C for 5 minutes. To monitor potential contaminants, a total of 3 negative extraction controls, 3 negative PCR controls (ultrapure water, 12 replicates), and 3 positive control samples (a mock community with a known composition) were amplified and sequenced in parallel with the samples. Amplification success was determined by gel electrophoresis. DNA was purified to remove PCR inhibitors using a DNeasy PowerClean Pro Cleanup Kit (Qiagen). Purified DNA extracts were quantified using a Qubit dsDNA HS Assay Kit on a Qubit 3.0 fluorometer (Thermo Scientific). PCR replicates were pooled and sequencing adapters were added. The final library was sequenced using an Illumina MiSeq V2 kit at 15 pM with a 10% PhiX spike.
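As a quick consistency check on the amplification settings described above, the stated thermal profile and primer concentrations can be written down as data and summed. This is only a minimal sketch: the variable and function names are hypothetical, and the total covers programmed hold times only, since ramp times between temperatures depend on the thermocycler and are not reported.

```python
# Thermal profile as stated in the text:
# (step label, temperature in °C, hold time in seconds, number of repeats).
PROFILE = [
    ("initial denaturation", 98, 180, 1),
    ("denaturation", 98, 20, 45),
    ("annealing", 69, 15, 45),
    ("extension", 72, 15, 45),
    ("final elongation", 72, 300, 1),
]

# Final primer concentrations from the text, in µM.
MAMMAL_PRIMER_UM = 0.4
BLOCKING_PRIMER_UM = 2.0


def total_hold_seconds(profile):
    """Sum the programmed hold times; ramping between steps is ignored."""
    return sum(seconds * repeats for _, _, seconds, repeats in profile)


def blocking_ratio(blocking_um, primer_um):
    """Ratio of blocking primer to each amplification primer."""
    return blocking_um / primer_um


hold = total_hold_seconds(PROFILE)
ratio = blocking_ratio(BLOCKING_PRIMER_UM, MAMMAL_PRIMER_UM)
print(f"programmed hold time: {hold} s ({hold / 60:.1f} min)")
print(f"blocking primer excess: {ratio:.1f}x")
```

The ratio of 2 μM to 0.4 μM reproduces the fivefold excess of blocking primer reported in the text.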
Sequence data were processed using a custom NatureMetrics bioinformatics pipeline for quality filtering, dereplication, and taxonomic assignment. Samples were demultiplexed based on the combination of the i5 and i7 index tags. Paired-end reads for each sample were merged with USEARCH, requiring a minimum overlap of 20% of the total read length. Forward and reverse primers were trimmed from the merged sequences with CUTADAPT, and trimmed sequences were retained if their length was between 140 bp and 200 bp. These sequences were quality filtered with USEARCH to retain only those with an expected error rate per base of 0.05 or below and were dereplicated by sample, retaining singletons. Unique reads from all samples were denoised in a single analysis with UNOISE, requiring retained sequences to have a minimum abundance of 8 in at least one sample. A taxon-by-sample table was generated by mapping all dereplicated reads for each sample to the denoised sequences with USEARCH at an identity threshold of 97%. Denoised sequences were identified via BLAST against the GenBank nucleotide (nt) database. Identifications to species level were based on the highest available percentage identity of at least 99%, with an E-value threshold of 1e-20 and a hit length of at least 80% of the query sequence. Where multiple reference sequences matched the query sequence equally well, a more conservative, higher-level taxonomic classification was assigned. Only sequences with species- or genus-level identifications were included in the final results. Where a species was represented by multiple Operational Taxonomic Units (OTUs), the sequence with the highest percentage match to that species was taken as the representative. Typically, the other sequences have the same occurrence pattern, and their lower sequence similarity can be attributed to PCR or sequencing errors.
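The filtering and assignment thresholds described above can be illustrated with a short sketch. It is not the NatureMetrics pipeline, which is custom and not reproduced here; all function names, dictionary keys, and the example hits are hypothetical. The sketch also assumes that the per-base expected error rate is the mean error probability implied by Phred scores, and, because the text does not state an identity threshold for genus-level identifications, it falls back to genus only when equally good species-level hits disagree.

```python
def expected_error_rate(phred_scores):
    """Mean per-base error probability implied by Phred quality scores."""
    probs = [10 ** (-q / 10) for q in phred_scores]
    return sum(probs) / len(probs)


def keep_read(seq, phred_scores, min_len=140, max_len=200, max_ee_rate=0.05):
    """Length and quality filters for a merged, primer-trimmed read."""
    return (min_len <= len(seq) <= max_len
            and expected_error_rate(phred_scores) <= max_ee_rate)


def passes_abundance(counts_by_sample, min_abundance=8):
    """Abundance rule: at least `min_abundance` copies in one sample."""
    return max(counts_by_sample.values(), default=0) >= min_abundance


def assign_taxon(hits, min_identity=99.0, max_evalue=1e-20, min_coverage=0.80):
    """Return (rank, name) for a denoised sequence, or None if unassigned.

    Each hit is a dict with hypothetical keys: 'species', 'genus',
    'identity' (percent), 'evalue', and 'coverage' (hit length divided
    by query length).
    """
    good = [h for h in hits
            if h["identity"] >= min_identity
            and h["evalue"] <= max_evalue
            and h["coverage"] >= min_coverage]
    if not good:
        return None
    best = max(h["identity"] for h in good)
    top = [h for h in good if h["identity"] == best]
    species = {h["species"] for h in top}
    if len(species) == 1:
        return ("species", species.pop())
    # Equally good hits to several species: step up to genus if they agree.
    genera = {h["genus"] for h in top}
    if len(genera) == 1:
        return ("genus", genera.pop())
    return None  # ambiguous above genus level, so excluded from results


# Made-up hits for illustration only; these are not results from the dataset.
example_hits = [
    {"species": "Stenella longirostris", "genus": "Stenella",
     "identity": 100.0, "evalue": 1e-80, "coverage": 1.0},
    {"species": "Stenella attenuata", "genus": "Stenella",
     "identity": 100.0, "evalue": 1e-80, "coverage": 1.0},
]
print(assign_taxon(example_hits))
```

In this made-up case the two best hits tie at species level, so the sketch reports the shared genus instead, mirroring the conservative rule described in the text.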
These genetic data provide a valuable resource for studying marine biodiversity around Réunion Island and will contribute to a better understanding of the distribution and diversity of marine mammals in this region of the Indian Ocean.