Background: Repetitive microsatellite DNA is a universal component of eukaryote genomes. These repeats pose well-recognized bioinformatic challenges, for instance when assembling genomes or aligning sequence data. Moreover, the specific biochemical properties of repetitive DNA can influence the outcome of laboratory protocols. These properties, however, are rarely considered when analyzing whole genome sequence (WGS) data.
Results: Here we report that the Atlantic cod (Gadus morhua) genome contains an order of magnitude more dinucleotide repeats than the majority of vertebrate genomes, with over eight percent of its genome classified as either AC or AG dinucleotide repeats. Furthermore, the abundance of these repeats can be inflated in ancient DNA (aDNA) WGS data generated from this species, in particular in poorly preserved samples. This repeat inflation is suppressed by reducing the number of amplification cycles and by including manufactured dinucleotide repeat oligonucleotides during amplification.
Conclusion: Our data support a hypothesis whereby endogenous repetitive aDNA fragments self-prime, initiating a biased amplification reaction that leads to artificially high levels of AC and AG repeats. This process appears to be particularly efficient in Atlantic cod, likely due to its high genomic content of repeats with relatively low sequence complexity, providing an opportunity to detect a phenomenon not previously considered in an aDNA context. While the extent of self-priming in other studies is unclear, we nonetheless urge caution when quantifying repeat content in aDNA WGS data, given that self-priming may be difficult to detect if it affects repeat structures more complex than dinucleotide repeats.
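As an illustration of what quantifying AC and AG repeat content can look like, the following minimal sketch estimates the fraction of a sequence covered by dinucleotide repeat runs. It is not the method used in this study: the function name `repeat_fraction`, the `MOTIF_FAMILIES` table, and the `min_units` threshold of six consecutive repeat units are assumptions chosen for the example. Each repeat class is taken to include its phase-shifted and reverse-complement forms, so that repeats are counted regardless of strand.

```python
import re

# Assumed grouping: each dinucleotide repeat class includes its phase shift
# and the reverse complement of both (e.g. AC also appears as CA, GT and TG).
MOTIF_FAMILIES = {
    "AC": ("AC", "CA", "GT", "TG"),
    "AG": ("AG", "GA", "CT", "TC"),
}


def repeat_fraction(seq, family, min_units=6):
    """Return the fraction of bases in seq lying in runs of at least
    min_units consecutive copies of any motif in the given family.

    min_units is an illustrative threshold, not a value from the study.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for motif in MOTIF_FAMILIES[family]:
        # Match min_units or more back-to-back copies of the motif.
        pattern = re.compile("(?:%s){%d,}" % (motif, min_units))
        for match in pattern.finditer(seq):
            for i in range(match.start(), match.end()):
                covered[i] = True
    # Overlapping matches from different phases are counted once per base.
    return sum(covered) / len(seq)


if __name__ == "__main__":
    read = "ACACACACACACTTTTGGGGCCCC"
    print("AC fraction:", repeat_fraction(read, "AC"))
    print("AG fraction:", repeat_fraction(read, "AG"))
```

Applying such a measure to reads from libraries amplified with different cycle numbers would be one way to compare repeat content across samples. Under the self-priming hypothesis, the estimate from poorly preserved samples may reflect amplification bias rather than true genomic content.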