Wednesday, August 03, 2016

Machine Learning for Personalized Medicine: Heritability-based models for prediction of complex traits (David Balding)

Highly recommended talk by David Balding on modern approaches to heritability, relatedness, etc. in statistical genetics. (I listened at 1.5x normal speed, which worked for me.)

MLPM (Machine Learning for Personalized Medicine) Summer School 2015
Monday 21st of September

Heritability-based models for prediction of complex traits
by David Balding

Complex trait genetics has been revolutionised over the past 5 years by developments related to the concept of heritability. Heritability is the fraction of phenotypic variation that can be attributed to genetic mechanisms (mostly we focus on narrow-sense heritability, which considers only additive genetic effects). Since we cannot identify and measure the causal genetic mechanisms, a traditional approach has been to use pedigree relatedness as a proxy for the sharing of causal alleles between individuals. Pedigree relatedness even came to be seen as central to the concept of heritability, which perhaps explains why it was not until 2010 that it became widely appreciated that genome-wide genetic markers (SNPs) offered at least a "noisy" way to directly measure causal alleles, and hence a new approach to assessing heritability. This approach is "noisy" because SNPs generally only tag causal variants imperfectly, depending on SNP density and linkage disequilibrium, and many SNPs may tag little or no causal variation. So genome-wide SNP-based heritability estimates are difficult to interpret, but they can provide a lower bound which was enough to show that SNPs usually tag much more causal variation than can be attributed to genome-wide significant SNPs. Another big step forward has been that heritability can be attributed to different genes, genomic regions or functional classes, and for many phenotypes it is found to be widely dispersed across the genome, with relatively little concentration in coding regions. Further, heritability has become a unit of common currency for gene-based tests and meta-analysis. I will review the ideas and the underlying mathematical models, and present some recent results.
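
To make the SNP-based idea in the abstract concrete, here is a minimal sketch of the standard genomic relationship matrix (GRM) that underlies GCTA-style heritability estimates. Each SNP is centred and scaled by its allele frequency, and relatedness between two individuals is the average product of their standardized genotypes. The function name `grm` and the simulated data are my own illustrative assumptions, not code from the talk, and this is the usual standardized-genotype definition rather than the modified weighting Balding discusses.

```python
import numpy as np

def grm(genotypes):
    """Standard SNP-based relatedness matrix (a sketch, not GCTA's code).

    genotypes: (n_individuals, n_snps) array of 0/1/2 minor-allele counts.
    Returns an (n_individuals, n_individuals) matrix A = Z Z^T / m, where Z
    holds genotypes centred by 2p and scaled by sqrt(2p(1-p)).
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0            # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)            # drop monomorphic SNPs (zero variance)
    G, p = G[:, keep], p[keep]
    Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))
    return Z @ Z.T / Z.shape[1]

# Simulated unrelated individuals (assumption: independent SNPs, no LD).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=2000)
geno = rng.binomial(2, freqs, size=(50, 2000))
A = grm(geno)
print(A.shape, np.diag(A).mean())
```

For unrelated individuals the diagonal averages close to 1 and the off-diagonal entries scatter around 0. In the mixed-model approach the phenotypic covariance is then modelled as a genetic variance times this matrix plus independent noise, and SNP heritability is the genetic share of the total variance.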

Some comments:

1. He notes that after a few hundred years, it's highly likely that a given descendant carries no actual DNA from a specific ancestor (e.g., most descendants of Shakespeare alive today have none of his DNA).

2. @18min or so: a request to Chris Chang to add a modified definition of SNP relatedness to PLINK (i.e., new flag), with a different weighting for the heterozygous (1,1) case  ;-)

3. @29min or so: finally, a discussion of systematic errors in GCTA due to the LD characteristics of causal variants. As I wrote in an earlier post:
"I've always felt that the real weakness of GCTA is the assumption of random effects. A consequence of this assumption is that if the true causal variants are atypical (e.g., in terms of linkage disequilibrium) among common SNPs, the results could be biased. It is impossible to evaluate this uncertainty at the moment because we do not yet know the (full) genetic architectures of any complex traits."
See also the earlier posts Heritability Estimates from Summary Statistics, No Genomic Dark Matter, and HaploSNPs and missing heritability.

4. @35min: again T1D (type 1 diabetes) stands out in terms of genetic architecture

5. @47min: predictive correlations of almost 0.6 for T1D

Slides for this talk. Slides for another Balding lecture: Introduction to Genomic Prediction.
