arXiv:2101.05870 (q-bio). 33 pages, 7 figures, 1 table
Timothy G. Raben, Louis Lello, Erik Widen, Stephen D.H. Hsu
Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.
From the introduction:
I, on the other hand, knew nothing, except ... physics and mathematics and an ability to turn my hand to new things. — Francis Crick
The challenge of decoding the genome has loomed large over biology since the time of Watson and Crick. Initially, decoding referred to the relationship between DNA and specific proteins or molecular mechanisms, but the ultimate goal is to deduce the relationship between DNA and phenotype — the character of the organism itself. How does Nature encode the traits of the organism in DNA? In this review we describe recent advances toward this goal, which have resulted from the application of machine learning (ML) to large genomic data sets. Genomic prediction is the real decoding of the genome: the creation of mathematical models which map genotypes to complex traits.
It is a peculiarity of ML and artificial intelligence (AI) applied to complex systems that these methods can often “solve” a problem without explicating, in a manner that humans can absorb, the intricate mechanisms that lie intermediate between input and output. For example, AlphaGo achieved superhuman mastery of an ancient game that had been under serious study for thousands of years. Yet nowhere in the resulting neural network with millions of connection strengths is there a human-comprehensible guide to Go strategy or game dynamics. Similarly, genomic prediction has produced mathematical functions which predict quantitative human traits (e.g., height, bone density, and cholesterol or lipoprotein A levels in blood; see Table 1) with surprising accuracy, typically using thousands of genetic variants as input (see next section for details), but without explicitly revealing the role of these variants in actual biochemical mechanisms. Characterizing these mechanisms, which are involved in phenomena such as bone growth, lipid metabolism, hormonal regulation, and protein interactions, will be a project which takes much longer to complete.
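To make the idea of a "mathematical function" from genotype to trait concrete, here is a minimal sketch, not the authors' method. It assumes an additive (linear) polygenic score in which only a sparse subset of variants carries nonzero weight. The excerpt above does not specify the model form, so this is an illustrative assumption. All names (`polygenic_score`, `percentile_rank`), the weights, and the synthetic cohort are hypothetical.

```python
import random


def polygenic_score(genotype, weights):
    """Additive score: effect size times allele count, summed over weighted variants.

    genotype: list of allele counts (0, 1, or 2), one entry per variant.
    weights:  dict mapping variant index -> effect size (hypothetical values).
    """
    return sum(beta * genotype[i] for i, beta in weights.items())


def percentile_rank(score, cohort_scores):
    """Fraction of the cohort whose score is at or below the given score."""
    return sum(s <= score for s in cohort_scores) / len(cohort_scores)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_variants = 2000        # variants genotyped per individual (illustrative)
n_active = 100           # variants with nonzero weight (illustrative)

# Hypothetical sparse weights: most variants contribute nothing to the score.
active = rng.sample(range(n_variants), n_active)
weights = {i: rng.gauss(0.0, 1.0) for i in active}

# Synthetic cohort of random genotypes, for illustration only (not real data).
cohort = [[rng.choice((0, 1, 2)) for _ in range(n_variants)] for _ in range(100)]
scores = [polygenic_score(g, weights) for g in cohort]

# Individuals whose score falls in the top few percent of the cohort.
outliers = [k for k, s in enumerate(scores) if percentile_rank(s, scores) > 0.97]
```

The function only ranks individuals by a weighted sum of allele counts. Nothing in it says what any variant does biochemically, which is the point the paragraph above makes about prediction without mechanism.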
If recent trends persist, in particular the continued growth of large genotype-phenotype data sets, we will likely have good genomic predictors for a host of human traits within the next decade. ...