Thursday, August 28, 2014

Determination of Nonlinear Genetic Architecture using Compressed Sensing

It is a common belief in genomics that nonlinear interactions (epistasis) in complex traits make the task of reconstructing genetic models extremely difficult, if not impossible. In fact, it is often suggested that overcoming nonlinearity will require much larger data sets and significantly more computing power. Our results show that in broad classes of plausibly realistic models, this is not the case.
Determination of Nonlinear Genetic Architecture using Compressed Sensing (arXiv:1408.6583)
Chiu Man Ho, Stephen D.H. Hsu
Subjects: Genomics (q-bio.GN); Applications (stat.AP)

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes.
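To make the idea in the abstract concrete, here is a minimal sketch on simulated data. It is not the paper's implementation. It assumes a two-step procedure: a linear L1-penalized regression to pick candidate loci, then a second L1-penalized regression over those loci plus their pairwise products. The solver is plain iterative soft thresholding (ISTA), swapped in for simplicity. The function names, penalties (`lam1`, `lam2`), genotype simulation, and effect sizes are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize (1/2n)||y - A x||^2 + lam * ||x||_1 with ISTA (assumed solver)."""
    n, p = A.shape
    lipschitz = np.linalg.norm(A, 2) ** 2 / n  # largest eigenvalue of A^T A / n
    x = np.zeros(p)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) / n
        x = soft_threshold(x - grad / lipschitz, lam / lipschitz)
    return x

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def interaction_features(G, loci):
    """Columns for the chosen loci, followed by all pairwise products among them."""
    cols = [G[:, i] for i in loci]
    names = [(i,) for i in loci]
    for a, i in enumerate(loci):
        for j in loci[a + 1:]:
            cols.append(G[:, i] * G[:, j])
            names.append((i, j))
    return np.column_stack(cols), names

def two_step_cs(G, y, lam1, lam2, tol=1e-3):
    """Hypothetical two-step sketch: linear L1 step, then L1 on nonlinear features."""
    Gs = standardize(G)
    yc = y - y.mean()
    beta = lasso_ista(Gs, yc, lam1)                        # step 1: linear model
    loci = [int(i) for i in np.flatnonzero(np.abs(beta) > tol)]
    if not loci:
        return {}
    F, names = interaction_features(Gs, loci)              # nonlinear functions of G
    coef = lasso_ista(standardize(F), yc, lam2)            # step 2: sparse nonlinear fit
    return {name: float(c) for name, c in zip(names, coef) if abs(c) > tol}

# Simulated genotypes (0/1/2 minor-allele counts, allele frequency 0.5 assumed).
rng = np.random.default_rng(0)
n, p = 400, 60
G = rng.binomial(2, 0.5, size=(n, p)).astype(float)

# Assumed sparse nonlinear phenotype: three main effects and one interaction.
y = (1.0 * G[:, 3] + 1.0 * G[:, 17] + 0.8 * G[:, 3] * G[:, 17]
     + 0.7 * G[:, 40] + 0.05 * rng.standard_normal(n))

model = two_step_cs(G, y, lam1=0.1, lam2=0.05)
for name, c in sorted(model.items()):
    print(name, round(c, 3))
```

Keys of `model` are tuples of locus indices: a one-element tuple is a main effect and a two-element tuple is an interaction. Coefficients are on the standardized scale and are shrunk by the L1 penalty, so they are not direct estimates of the simulated effect sizes.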

1 comment:

Richard Seiter said...

Very interesting. Do you have any significant nonlinear results from height and/or IQ data? (or do I need to be patient and wait for the papers ;-)

One question about this method. Is it possible to augment the effects detected by the linear CS step 1 by adding SNPs you expect to have an effect (especially a nonlinear effect) for other reasons (e.g., the Klotho KL-VS gene, which appears to have a nonlinear effect on IQ) and have the nonlinear CS step 2 detect the effect? It seems you could test for this in the simulations by assuming you have an oracle for the "fraction of model zeros" and adding them to your step 2 list. Any idea if/how this would work?
