Genome sequence of Beluga and Narwhal

Reference genomes provide a foundational framework for evolutionary investigations, ecological analysis, and conservation science, yet end-users are typically not given the context needed to understand uncertainty and errors in reference genome construction. The reference genome for beluga (Delphinapterus leucas) was put forward in 2017 based on linked reads and iterative scaffolding, and has since been improved with Hi-C data. Here, we present an improved reference genome for beluga built using a combination of PacBio CLR, Illumina short reads, and Hi-C data. We identified several large structural errors in the scaffolding of the original 2017 beluga assembly and unsupported scaffolding orientations in the Hi-C scaffolded version. We also found discrepancies in the order and orientation of contigs that remained in our PacBio assemblies, with inversions being notably abundant. Altogether, we present a more accurate, if slightly less contiguous, representation of the beluga whale genome, and provide users with intermediate files, code, tables listing regions of uncertainty/discrepancy across assemblies, and gene annotations so that they can critically evaluate, leverage, and potentially improve on our work.

Identifier
Source https://data.blue-cloud.org/search-details?step=~01232A36E8C8C511E32BC0B1B82837E2CDCCBB2BA63
Metadata Access https://data.blue-cloud.org/api/collections/32A36E8C8C511E32BC0B1B82837E2CDCCBB2BA63
Provenance
Instrument Sequel II; PACBIO_SMRT
Publisher Blue-Cloud Data Discovery & Access service; ELIXIR-ENA
Publication Year 2024
OpenAccess true
Contact blue-cloud-support(at)maris.nl
Representation
Discipline Marine Science
Temporal Point 2006-01-01T00:00:00Z