Genome sequence of Beluga and Narwhal

Dataset

Reference genomes provide a foundational framework for evolutionary investigations, ecological analysis, and conservation science, yet end-users are typically not given the context needed to understand uncertainty and errors in reference genome construction. The reference genome for beluga (Delphinapterus leucas) was put forward in 2017 based on linked reads and iterative scaffolding, and has since been improved upon with Hi-C data. Here, we present an improved reference genome for beluga built using a combination of PacBio CLR, Illumina short reads, and Hi-C data. We identified several large structural errors in the scaffolding of the original 2017 beluga assembly and unsupported scaffolding orientations in the Hi-C scaffolded version. We also found discrepancies in the order and orientation of contigs that remained in our PacBio assemblies, with inversions being notably abundant. Altogether, we present a more accurate, if slightly less contiguous, representation of the beluga whale genome, and provide users with intermediate files, code, tables listing regions of uncertainty/discrepancy across assemblies, and gene annotations to critically evaluate, leverage, and potentially improve on our work.

Identifier
Source	https://data.blue-cloud.org/search-details?step=~01232A36E8C8C511E32BC0B1B82837E2CDCCBB2BA63
Metadata Access	https://data.blue-cloud.org/api/collections/32A36E8C8C511E32BC0B1B82837E2CDCCBB2BA63

Provenance
Instrument	Sequel II; PACBIO_SMRT
Publisher	Blue-Cloud Data Discovery & Access service; ELIXIR-ENA
Publication Year	2024
OpenAccess	true
Contact	blue-cloud-support(at)maris.nl

Representation
Discipline	Marine Science
Temporal Point	2006-01-01T00:00:00Z