Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses

Lucaci, Alexander G.; Zehr, Jordan D.; Enard, David; Thornton, Joseph W.; Kosakovsky Pond, Sergei L.

2023

Abstract

Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled too crudely, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases dN/dS-based inference towards false-positive inferences of diversifying episodic selection, as does failing to model variation in the rate of synonymous substitution (SRV) among sites. Here, we develop an integrated analytical framework and software tools to simultaneously incorporate these sources of evolutionary complexity into selection analyses. We found that both MH and SRV are ubiquitous in empirical alignments, and incorporating them has a strong effect on whether positive selection is detected (a 1.4-fold reduction) and on the distributions of inferred evolutionary rates. With simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches in the tree explain a significant fraction of discrepant results in selection detection. Our results add to the growing body of literature that examines decades-old modeling assumptions (including MH) and finds them to be problematic for comparative genomic data analysis.
Because multinucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that selection analyses of this type consider their inclusion as a matter of routine. To facilitate this procedure, we developed, implemented, and benchmarked a simple and well-performing model-testing framework for selection detection, able to screen an alignment for positive selection in the presence of two biologically important confounding processes: site-to-site synonymous rate variation and instantaneous multinucleotide substitutions.
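
To make the notion of a multinucleotide (multihit) change concrete, the short Python sketch below counts how many nucleotide positions differ between two codons and labels the change accordingly. It is an illustration only and is not taken from the article or from its software; the function names (`hit_count`, `classify_change`, `tally_ordered_pairs`) are hypothetical, and the tally runs over all 64 nucleotide triplets, stop codons included, purely for simplicity.

```python
from itertools import product

# All 64 nucleotide triplets (stop codons are included here for simplicity;
# this is an assumption of the sketch, not a statement about any codon model).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def hit_count(codon_a: str, codon_b: str) -> int:
    """Return the number of positions (0 to 3) at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(a != b for a, b in zip(codon_a.upper(), codon_b.upper()))


def classify_change(codon_a: str, codon_b: str) -> str:
    """Label a codon change by the number of nucleotides that differ."""
    labels = {0: "identical", 1: "single-hit", 2: "double-hit", 3: "triple-hit"}
    return labels[hit_count(codon_a, codon_b)]


def tally_ordered_pairs() -> dict:
    """Count ordered pairs of distinct triplets by number of differing positions."""
    counts = {1: 0, 2: 0, 3: 0}
    for a in CODONS:
        for b in CODONS:
            if a != b:
                counts[hit_count(a, b)] += 1
    return counts


if __name__ == "__main__":
    print(classify_change("ACG", "ACT"))  # differs at the third position only
    print(classify_change("AGC", "TCC"))  # two serine codons, two positions apart
    print(tally_ordered_pairs())
```

Of the 63 possible changes away from any given triplet, only 9 alter a single nucleotide; the remaining 54 alter two or three. Codon models that allow only one nucleotide to change at a time therefore treat most codon-to-codon changes as impossible in a single step, which is the assumption that MH-aware models relax.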

Details

Title

Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses

Author

Lucaci, Alexander G. : Temple University : (https://orcid.org/0000-0002-4896-6088)
Zehr, Jordan D. : Temple University : (https://orcid.org/0000-0003-2099-4172)
Enard, David : University of Arizona
Thornton, Joseph W. : University of Chicago : (https://orcid.org/0000-0001-9589-6994)
Kosakovsky Pond, Sergei L. : Temple University : (https://orcid.org/0000-0003-4817-4029)

Content Type

Article

Published in

Molecular Biology and Evolution

Keywords

codon-substitution models; evolutionary shortcuts; molecular evolution; multinucleotide substitutions

Identifier(s)

DOI: https://doi.org/10.1093/molbev/msad150

Data availability statement

All data is available at https://data.hyphy.org/web/busteds-mh/. We also cite the original sources of empirical datasets.

Funding Information

NIH/NIGMS, GM144468
NIH/NIAID, AI140970
NIH/NIAID, AI134384
NIH/NIGMS, R35GM142677

Publication Date

2023-07-03

Language

English

Copyright Statement

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Licensing

CC BY

Record Appears in

Biological Sciences Division > Ecology and Evolution
Biological Sciences Division > Human Genetics
All

Record Created

2023-08-27
