The CpG Landscape of Protein Coding DNA in Vertebrates
Abstract
DNA methylation has fundamental implications for vertebrate genome evolution because it shapes the mutational landscape, particularly at CpG dinucleotides. Methylation-induced mutations drive a genome-wide depletion of CpG sites, creating a dinucleotide composition bias across the genome. Examination of the standard genetic code reveals CpG to be the only facultative dinucleotide; it is, however, unclear what specific implications CpG bias has for protein-coding DNA. Here, we combine theoretical considerations of the genetic code with empirical genome-wide analyses in six vertebrate species (human, mouse, chicken, great tit, frog, and stickleback) to investigate how CpG content is shaped and maintained in protein-coding genes. We show that protein-coding sequences consistently exhibit significantly higher CpG content than noncoding regions, and we demonstrate that CpG sites are enriched in genes involved in regulatory functions and stress responses, suggesting selective maintenance of CpG content at specific loci. These findings have important implications for evolutionary applications in both natural and managed populations: CpG content could serve as a genetic marker for assessing adaptive potential, while the identification of CpG-free codons provides a framework for genome optimization in breeding and synthetic biology. Our results underscore the intricate interplay between mutational biases, selection, and epigenetic regulation, offering new insights into how vertebrate genomes evolve under varying ecological and selective pressures.
https://doi.org/10.1111/eva.70101
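The abstract's point about CpG-free codons can be illustrated with a minimal sketch. It is not the authors' method, and it does not test whether CpG is the only such dinucleotide. It assumes one reading of the claim: that under the standard genetic code every amino acid has at least one synonymous codon that avoids CpG. The function names (`cpg_containing_codons`, `cpg_safe_codons`, `recode_without_cpg`, `translate`) and the "safe codon" rule are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_containing_codons():
    """Return the sorted codons that contain a CG dinucleotide internally."""
    return sorted(c for c in CODON_TABLE if "CG" in c)


def cpg_safe_codons():
    """Map each amino acid ('*' = stop) to codons usable without creating CpG.

    Assumption for illustration: a codon counts as 'safe' if it has no
    internal CG and does not end in C, so it cannot form a CG with the
    first base of whatever codon follows. This rule is conservative.
    """
    safe = {}
    for codon, aa in sorted(CODON_TABLE.items()):
        if "CG" not in codon and not codon.endswith("C"):
            safe.setdefault(aa, []).append(codon)
    return safe


def recode_without_cpg(protein):
    """Back-translate a protein string into a coding sequence with no CG."""
    safe = cpg_safe_codons()
    return "".join(safe[aa][0] for aa in protein)


def translate(dna):
    """Translate an in-frame DNA string using the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Usage: recode a short, made-up peptide and confirm it is CpG-free.
protein = "MARSPTW"
dna = recode_without_cpg(protein)
assert "CG" not in dna and translate(dna) == protein
```

Because every amino acid and the stop signal keep at least one safe codon under this rule, any protein can in principle be encoded with no CpG either inside codons or across codon boundaries. The sketch says nothing about codon usage bias, expression, or the other dinucleotides, all of which a real recoding or comparison would need to consider.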
