
Monday, October 18, 2021

Embryo Screening and Risk Calculus

Over the weekend The Guardian and The Times (UK) both ran articles on embryo selection. 



I recommend the first article. Philip Ball is an accomplished science writer and former scientist. He touches on many of the most important aspects of the topic, not easy given the length restriction he was working with. 

However I'd like to cover an aspect of embryo selection which is often missed, for example by the bioethicists quoted in Ball's article.

Several independent labs have published results on risk reduction from embryo selection, and all find that the technique is effective. But some people who are not following the field closely (or are not quantitative) still characterize the benefits -- incorrectly, in my view -- as modest. I honestly think they lack understanding of the actual numbers.

Some examples:
Carmi et al. find a ~50% risk reduction for schizophrenia from selecting the lowest risk embryo from a set of 5. For a selection among 2 embryos the risk reduction is ~30%. (We obtain a very similar result using empirical data: real adult siblings with known phenotype.) 
Visscher et al. find the following results, see Table 1 and Figure 2 in their paper. To their credit they compute results for a range of ancestries (European, E. Asian, African). We have performed similar calculations using siblings but have not yet published the results for all ancestries.  
Relative Risk Reduction (RRR)
Hypertension: 9-18% (ranges depend on specific ancestry) 
Type 2 Diabetes: 7-16% 
Coronary Artery Disease: 8-17% 
Absolute Risk Reduction (ARR)
Hypertension: 4-8.5% (ranges depend on specific ancestry) 
Type 2 Diabetes: 2.6-5.5% 
Coronary Artery Disease: 0.55-1.1%
I don't view these risk reductions as modest. Given that an IVF family is already going to make a selection they clearly benefit from the additional information that comes with genotyping each embryo. The cost is a small fraction of the overall cost of an IVF cycle.

But here is the important mathematical point which many people miss: We buy risk insurance even when the expected return is negative, in order to ameliorate the worst possible outcomes. 

Consider the example of home insurance. A typical family will spend tens of thousands of dollars over the years on home insurance, which protects against risks like fire or earthquake. However, very few homeowners (e.g., ~1 percent) ever suffer a really large loss! At the end of their lives, looking back, most families might conclude that the insurance was "a waste of money"!

So why buy the insurance? To avoid ruin in the event you are unlucky and your house does burn down. It is tail risk insurance.
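To make the insurance analogy concrete, here is a minimal back-of-the-envelope sketch. The premium, loss probability, and loss size are hypothetical round numbers (not actuarial data): the expected return on the policy is negative, yet it removes a ruinous tail outcome.

```python
# Hypothetical round numbers for illustration only -- not actuarial data.
annual_premium = 1_000           # dollars per year
years = 30
p_large_loss = 0.01              # ~1% of homeowners ever suffer a really large loss
loss_size = 400_000              # cost of rebuilding after a fire

total_premiums = annual_premium * years
expected_loss_avoided = p_large_loss * loss_size

print(f"Premiums paid over {years} years:  ${total_premiums:,}")
print(f"Expected (average) loss avoided: ${expected_loss_avoided:,.0f}")
print(f"Average 'return' on the policy:  ${expected_loss_avoided - total_premiums:,.0f}")
# Negative on average -- yet the policy caps the worst case:
print(f"Uninsured worst-case loss:       ${loss_size:,}")
```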

Now consider an "unlucky" IVF family. At, say, the 1 percent level of "bad luck" they might have some embryos which are true outliers (e.g., at 10 times normal risk, which could mean over 50% absolute risk) for a serious condition like schizophrenia or breast cancer. This is especially likely if they have a family history. 

What is the benefit to this specific subgroup of families? It is enormous -- using the embryo risk scores they can avoid having a child with a very high likelihood of a serious health condition. This benefit is many, many times (> 100x!) larger than the cost of the genetic screening, and it is not captured by the average risk reductions given above.

The situation is very similar to that of aneuploidy testing (screening against Down syndrome), which is widespread, not just in IVF. The prevalence of trisomy 21 (extra copy of chromosome 21) is only ~1 percent, so almost all families doing aneuploidy screening are "wasting their money" if one uses faulty logic! Nevertheless, the families in the affected category are typically very happy to have paid for the test, and even families with no trisomy warning understand that it was worthwhile.

The point is that no one knows ahead of time whether their house will burn down, or that one or more of their embryos has an important genetic risk. The calculus of average return is misleading -- i.e., it says that home insurance is a "rip off" when in fact it serves an important social purpose of pooling risk and helping the unfortunate. 

The same can be said for embryo screening in IVF -- one should focus on the benefit to "unlucky" families to determine the value. We can't identify the "unlucky" in advance, unless we do genetic screening!

Friday, July 02, 2021

Polygenic Embryo Screening: comments on Carmi et al. and Visscher et al.

In this post I discuss some recent papers on disease risk reduction from polygenic screening of embryos in IVF (PGT-P). I will focus on the science but at the end will include some remarks about practical and ethical issues. 

The two papers are 

Carmi et al. 

Visscher et al. 

Both papers study risk reduction in the following scenario: you have N embryos to choose from, and a polygenic risk score (PRS) for each, computed from its SNP genotype. Both papers use simulated data -- they build synthetic child (embryo) genotypes in order to calculate the expected risk reduction. 

I am very happy to see serious researchers like Carmi et al. and Visscher et al. working on this important topic. 

Here are some example results from the papers: 

Carmi et al. find a ~50% risk reduction for schizophrenia from selecting the lowest risk embryo from a set of 5. For a selection among 2 embryos the risk reduction is ~30%. (We obtain a very similar result using empirical data: real adult siblings with known phenotype.)

Visscher et al. find the following results, see Table 1 and Figure 2 in their paper. To their credit they compute results for a range of ancestries (European, E. Asian, African). We have performed similar calculations using siblings but have not yet published the results for all ancestries.

Relative Risk Reduction (RRR): 
Hypertension: 9-18% (ranges depend on specific ancestry) 
Type 2 Diabetes: 7-16% 
Coronary Artery Disease: 8-17% 

Absolute Risk Reduction (ARR): 
Hypertension: 4-8.5% (ranges depend on specific ancestry) 
Type 2 Diabetes: 2.6-5.5% 
Coronary Artery Disease: 0.55-1.1%

Note, families with a history of the disease would benefit much more than this. For example, parents with a family history of breast cancer or heart disease or schizophrenia will often produce some embryos with very high PRS and others in the normal range. Their absolute risk reduction from selection is many times larger than the population average results shown above. 
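A quick way to see why family history matters so much is the relation ARR ≈ baseline risk × RRR: the same relative reduction is worth far more in absolute terms when the starting risk is high. A minimal sketch (the baseline risks and the RRR below are round illustrative values, not the figures from Visscher et al.):

```python
# ARR = baseline risk x RRR: the same relative reduction is worth much more
# in absolute terms when the starting (baseline) risk is high.
# Baseline risks and the RRR below are round illustrative values, not the
# figures from Visscher et al.
def absolute_risk_reduction(baseline_risk: float, rrr: float) -> float:
    return baseline_risk * rrr

rrr_t2d = 0.12  # ~12% relative risk reduction for T2D (illustrative, mid-range)
for label, baseline in [("population-average risk", 0.30),
                        ("strong family history", 0.60)]:
    arr = absolute_risk_reduction(baseline, rrr_t2d)
    print(f"{label}: baseline {baseline:.0%} -> ARR {arr:.1%}")
```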

My research group has already published work in this area using data from actual siblings: tens of thousands of individuals who are late in life (e.g., 50-70 years old), for whom we have health records and genotypes. 


We have shown that polygenic risk predictors can identify, using genotype alone, which sibling in a pair has a specific disease condition: the sib with high PRS is much more likely to have the condition than the sib with normal-range PRS. In those papers we also computed the Relative Risk Reduction (RRR), which is directly relevant to embryo selection. Needless to say, I think real sib data provide better validation of PRS than simulated genotypes. The adult sibs have typically experienced a shared family environment and exhibit negligible population stratification relative to each other, so using real sib data significantly reduces some important confounds in PRS validation. 
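For intuition, here is a minimal sketch of that kind of within-family check on purely synthetic data (the variance shares, noise scale, and threshold below are arbitrary toy values, not parameters from our papers): among sib pairs discordant for a condition, how often does the affected sib carry the higher PRS?

```python
import numpy as np

rng = np.random.default_rng(0)

# Purely synthetic illustration: disease liability = PRS + noise, disease if
# liability exceeds a threshold.  (Variance shares, noise scale, and threshold
# are arbitrary toy values.)
n_pairs = 10_000
prs = rng.normal(size=(n_pairs, 2))                      # standardized PRS per sib
liability = prs + rng.normal(scale=1.5, size=(n_pairs, 2))
affected = liability > 1.5

# Keep pairs discordant for the condition (exactly one affected sib).
discordant = affected.sum(axis=1) == 1
affected_sib = affected[discordant].argmax(axis=1)       # index of the affected sib
higher_prs_sib = prs[discordant].argmax(axis=1)          # index of the higher-PRS sib

frac = (affected_sib == higher_prs_sib).mean()
print(f"Affected sib has the higher PRS in {frac:.0%} of discordant pairs")
```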

See also these papers: Treff et al. [1] [2] [3]

Here are example results from our work on absolute and relative risk reduction. (Selection from 2 embryos.)


Regarding pleiotropy (discussed in the NEJM article), the Treff et al. results linked above show that selection using a Genomic Index, which is an aggregate of several polygenic risk scores, simultaneously reduces risks across all of the ~12 disease conditions in the polygenic disease panel. That is, risk reduction is not zero-sum, as far as we can tell: you are not forced to trade off one disease risk against another, at least for the 12 diseases on the panel. Further work on this is in progress. 

In related work we showed that DNA regions used to predict different risks are largely disjoint, which also supports this conclusion. See 



To summarize, several groups have now validated the risk reduction from polygenic screening (PGT-P). The methodologies are different (i.e., simulated genotypes vs studies using large numbers of adult siblings) but come to similar conclusions. 

Whether one should regard, for example, relative and absolute risk reduction in type 2 diabetes (T2D) of ~40% and ~3% (from figure above) as important or valuable is a matter of judgement. 

Studies suggest that type 2 diabetes results in an average loss of over 10 quality-adjusted life years -- i.e., more than a decade. So reducing an individual's risk of T2D by even a few percent seems significant to me. 

Now multiply that by a large factor, because selection using a genomic index (see figure) produces simultaneous risk reductions across a dozen important diseases.

Finally, polygenic predictors are improving rapidly as more genomic and health record data become available for machine learning. All of the power of modern AI technology will be applied to this data, and risk reductions from selection (PGT-P) will increase significantly over time. See this 2021 review article for more.


Practical Issues 

Aurea, the first polygenically screened baby (PGT-P), was born in May 2020.
See this panel discussion, which includes 

Dr. Simon Fishel (member of the team that produced the first IVF baby) 
Elizabeth Carr (first US IVF baby) 
Prof. Julian Savulescu (Uehiro Chair in Practical Ethics at the University of Oxford) 
Dr. Nathan Treff (Chief Scientist, Genomic Prediction)
Dr. Rafal Smigrodzki (MD PhD, father of Aurea) 

Astral Codex Ten recently posted on this topic: Welcome Polygenically Screened Babies :-) Many of the comments there are of high quality and worth reading. 


Ethical Issues 

Once the basic scientific results are established, one can meaningfully examine the many ethical issues surrounding embryo selection. 

My view has always been that new genomic technologies are so powerful that they should be widely understood and discussed -- by all of society, not just by scientists. 

However, to me it is clear that the potential benefits of embryo PRS screening (PGT-P) are very positive and that this technology will eventually be universally adopted. 

Today millions of babies are produced through IVF. In most developed countries roughly 3-5 percent of all births are through IVF, and in Denmark the fraction is about 10 percent! But when the technology was first introduced with the birth of Louise Brown in 1978, the pioneering scientists had to overcome significant resistance. There may be an alternate universe in which IVF was not allowed to develop, and those millions of children were never born.
Wikipedia: ...During these controversial early years of IVF, Fishel and his colleagues received extensive opposition from critics both outside of and within the medical and scientific communities, including a civil writ for murder.[16] Fishel has since stated that "the whole establishment was outraged" by their early work and that people thought that he was "potentially a mad scientist".[17]
I predict that within 5 years the use of polygenic risk scores will become common in some health systems (i.e., for adults) and in IVF. Reasonable people will wonder why the technology was ever controversial at all, just as in the case of IVF.

This is a very complex topic. For an in-depth discussion I refer you to this recent paper by Munday and Savulescu. Savulescu, Uehiro Chair in Practical Ethics at the University of Oxford, is perhaps the leading philosopher / bioethicist working in this area. 
Three models for the regulation of polygenic scores in reproduction 
Journal of Medical Ethics
The past few years have brought significant breakthroughs in understanding human genetics. This knowledge has been used to develop ‘polygenic scores’ (or ‘polygenic risk scores’) which provide probabilistic information about the development of polygenic conditions such as diabetes or schizophrenia. They are already being used in reproduction to select for embryos at lower risk of developing disease. Currently, the use of polygenic scores for embryo selection is subject to existing regulations concerning embryo testing and selection. Existing regulatory approaches include ‘disease-based' models which limit embryo selection to avoiding disease characteristics (employed in various formats in Australia, the UK, Italy, Switzerland and France, among others), and 'laissez-faire' or 'libertarian' models, under which embryo testing and selection remain unregulated (as in the USA). We introduce a novel 'Welfarist Model' which limits embryo selection according to the impact of the predicted trait on well-being. We compare the strengths and weaknesses of each model as a way of regulating polygenic scores. Polygenic scores create the potential for existing embryo selection technologies to be used to select for a wider range of predicted genetically influenced characteristics including continuous traits. Indeed, polygenic scores exist to predict future intelligence, and there have been suggestions that they will be used to make predictions within the normal range in the USA in embryo selection. We examine how these three models would apply to the prediction of non-disease traits such as intelligence. The genetics of intelligence remains controversial both scientifically and ethically. This paper does not attempt to resolve these issues. However, as with many biomedical advances, an effective regulatory regime must be in place as soon as the technology is available. If there is no regulation in place, then the market effectively decides ethical issues.
Dalton Conley (Princeton) and collaborators find that 68% of surveyed Americans had positive attitudes concerning polygenic screening of embryos.

Wednesday, May 12, 2021

Neural Tangent Kernels and Theoretical Foundations of Deep Learning

A colleague recommended this paper to me recently. See also earlier post Gradient Descent Models Are Kernel Machines.
Neural Tangent Kernel: Convergence and Generalization in Neural Networks 
Arthur Jacot, Franck Gabriel, Clément Hongler 
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function fθ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function fθ follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.
The results are remarkably well summarized in the wikipedia entry on Neural Tangent Kernels:

For most common neural network architectures, in the limit of large layer width the NTK becomes constant. This enables simple closed form statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces. For example, it guarantees that wide enough ANNs converge to a global minimum when trained to minimize an empirical loss. ...

An Artificial Neural Network (ANN) with scalar output consists of a family of functions $f(\cdot\,;\theta): \mathbb{R}^{n_{\mathrm{in}}} \to \mathbb{R}$ parametrized by a vector of parameters $\theta \in \mathbb{R}^{P}$.

The Neural Tangent Kernel (NTK) is a kernel $\Theta: \mathbb{R}^{n_{\mathrm{in}}} \times \mathbb{R}^{n_{\mathrm{in}}} \to \mathbb{R}$ defined by

$$\Theta(x, y; \theta) \;=\; \sum_{p=1}^{P} \partial_{\theta_p} f(x; \theta)\, \partial_{\theta_p} f(y; \theta).$$

In the language of kernel methods, the NTK $\Theta$ is the kernel associated with the feature map $x \mapsto \big(\partial_{\theta_p} f(x; \theta)\big)_{p=1,\ldots,P}$.

For a dataset $(x_i)_{i=1}^{n}$ with scalar labels $(y_i)_{i=1}^{n}$ and a loss function $\ell: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the associated empirical loss, defined on functions $f$, is given by

$$\mathcal{C}(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big).$$

When the ANN is trained to fit the dataset (i.e., to minimize $\mathcal{C}$) via continuous-time gradient descent, the parameters $\theta(t)$ evolve through the ordinary differential equation

$$\partial_t \theta(t) \;=\; -\nabla_\theta\, \mathcal{C}\big(f(\cdot\,; \theta(t))\big).$$

During training the ANN output function follows an evolution differential equation given in terms of the NTK:

$$\partial_t f\big(x; \theta(t)\big) \;=\; -\frac{1}{n} \sum_{i=1}^{n} \Theta\big(x, x_i; \theta(t)\big)\, \partial_{1}\ell\big(f(x_i; \theta(t)), y_i\big),$$

where $\partial_1 \ell$ denotes the derivative of the loss with respect to its first argument.

This equation shows how the NTK drives the dynamics of $f(\cdot\,; \theta(t))$ in the space of functions during training.
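To make the definition concrete, here is a minimal numerical sketch of the empirical NTK at initialization, $\Theta(x, y; \theta) = \sum_p \partial_{\theta_p} f(x;\theta)\,\partial_{\theta_p} f(y;\theta)$. It uses plain NumPy and a toy one-hidden-layer ReLU network of my own choosing (not code from the paper), with gradients written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU network with scalar output: f(x) = w2 . relu(W1 x).
d_in, width = 3, 512
W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
w2 = rng.normal(size=width) / np.sqrt(width)

def grad_params(x):
    """Gradient of the scalar output f(x) with respect to all parameters, flattened."""
    h = W1 @ x
    a = np.maximum(h, 0.0)
    dW1 = np.outer(w2 * (h > 0), x)      # df/dW1, shape (width, d_in)
    dw2 = a                              # df/dw2, shape (width,)
    return np.concatenate([dW1.ravel(), dw2])

def ntk(x, y):
    """Empirical NTK at initialization: inner product of parameter gradients."""
    return grad_params(x) @ grad_params(y)

x, y = rng.normal(size=d_in), rng.normal(size=d_in)
print("Theta(x, y) =", ntk(x, y))
print("Theta(x, x) =", ntk(x, x))
```

In the infinite-width limit this empirical kernel concentrates around its expectation and stays (approximately) constant during training, which is the content of the theorem.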

This is a very brief (3 minute) summary by the first author:



This 15 minute IAS talk gives a nice overview of the results, and their relation to fundamental questions (both empirical and theoretical) in deep learning. Longer (30m) version: On the Connection between Neural Networks and Kernels: a Modern Perspective.  




I hope to find time to explore this in more depth. Large width seems to provide a limiting case (analogous to the large-N limit in gauge theory) in which rigorous results about deep learning can be proved.

Some naive questions:

What is the expansion parameter of the finite width expansion?

What role does concentration of measure play in the results? (See 30m video linked above.)

Simplification seems to be a consequence of overparametrization. But the proof method seems to apply to a regularized (but still convex, e.g., using L1 penalization) loss function that imposes sparsity. It would be interesting to examine this specific case in more detail.

Notes to self:

The overparametrized (width ~ w^2) network starts in a random state and by concentration of measure this initial kernel K is just the expectation, which is the NTK. Because of the large number of parameters, the effect of training (i.e., gradient descent) on any individual parameter is 1/w, and the change in the eigenvalue spectrum of K is also 1/w. It can be shown that the eigenvalue spectrum is positive and bounded away from zero, and this property does not change under training. Also, the evolution of f is linear in K up to corrections which are suppressed by 1/w. Hence the evolution follows a convex trajectory and can achieve global minimum loss in finite (polynomial) time. 

The parametric 1/w expansion may depend on quantities such as the smallest NTK eigenvalue k: the proof might require  k >> 1/w  or  wk large.

In the large w limit the function space has such high dimensionality that any typical initial f is close (within a ball of radius 1/w?) to an optimal f. 

These properties depend on specific choice of loss function.

Sunday, February 14, 2021

Physics and AI: some recent papers


Three AI paper recommendations from a theoretical physicist (former collaborator) who now runs an AI lab in SV. Less than 5 years after leaving physics research, he and his team have shipped AI products that are used by millions of people. (Figure above is from the third paper below.)

This paper elucidates the relationship between symmetry principles (familiar from physics) and specific mathematical structures like convolutions used in DL.
Covariance in Physics and CNN  
https://arxiv.org/abs/1906.02481 
Cheng, et al.  (Amsterdam)
In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution.

The following two papers explore connections between AI/ML and statistical physics, including renormalization group (RG) flow. 

Theoretical Connections between Statistical Physics and RL 
https://arxiv.org/abs/1906.10228 
Rahme and Adams  (Princeton)
Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function Z, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and Q-functions can be derived from this partition function and interpreted via average energies, the Z-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for Z is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these Z-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.

 

RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior
https://arxiv.org/abs/2010.00029 
Hu et al.   (UCSD and Berkeley AI Lab) 
Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those in convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has O(logL) complexity for image inpainting compared to previous generative models with O(L^2) complexity.
See related remarks: ICML notes (2018).
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.

Monday, April 27, 2020

COVID-19: CDC US deaths by age group

Reader LondonYoung points to this CDC data set. Table 2 is reproduced below.

If we assume that CV-19 has infected a few percent of the total US population, we should multiply the numbers in the CV-19 deaths column by ~30x to extrapolate to a full population sweep. With that adjustment factor the impact on people younger than 25 is still very modest. It is only among people ~50y or older that the effect of a full CV-19 sweep is comparable to all-cause deaths.

As a rough estimate I'd guess a full population sweep (under good medical conditions) costs about 10M QALYs. How much is that worth? A few trillion dollars?
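For what it's worth, here is the back-of-the-envelope arithmetic written out as a sketch. The deaths-to-date figure and the dollar value per QALY are placeholders and stated assumptions, not numbers from the CDC table.

```python
# Back-of-the-envelope sketch of the arithmetic above.  Every number here is a
# placeholder or stated assumption, not a figure from the CDC table.
infected_fraction = 0.03                 # assume ~3% of the US infected so far
scale = 1.0 / infected_fraction          # ~30x to extrapolate to a full sweep
deaths_to_date = 55_000                  # placeholder for the CV-19 deaths column
print(f"Full-sweep deaths: ~{deaths_to_date * scale:,.0f}")

qalys_lost = 10e6                        # the rough ~10M QALY guess from the text
value_per_qaly = (100e3, 300e3)          # commonly cited range, dollars per QALY
low, high = (qalys_lost * v for v in value_per_qaly)
print(f"Implied cost: ${low / 1e12:.1f}T to ${high / 1e12:.1f}T")
```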


Of course, we should keep in mind that there might be very negative long-term health consequences from serious cases of CV-19 infection that do not result in death.

Added:

1. Germany’s leading coronavirus expert Christian Drosten on Merkel’s leadership, the UK response, and the ‘prevention paradox’ (Guardian).

2. US National Academy of Sciences COVID-19 Update.

Saturday, April 25, 2020

COVID-19: False Positive Rates for Serological Tests

It looks like very few of the tests have false positive rates well below the percent range. Since most populations (with the exception of NYC and some other highly impacted places) do not have infection rates higher than a few percent, there is a danger of overestimating total infection rates and underestimating the IFR using these tests. (See, e.g., the recent Stanford-USC papers.)
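The problem is easy to quantify: when the true prevalence is only a few percent, a percent-level false positive rate can contribute a large share of the apparent positives. A minimal sketch with illustrative numbers (not taken from the preprint):

```python
# Apparent positive rate = prevalence * sensitivity + (1 - prevalence) * FPR.
# All numbers are illustrative, not taken from the preprint.
def apparent_positive_rate(prevalence, sensitivity, fpr):
    return prevalence * sensitivity + (1 - prevalence) * fpr

prevalence, sensitivity = 0.02, 0.90
for fpr in (0.005, 0.02, 0.05):
    apparent = apparent_positive_rate(prevalence, sensitivity, fpr)
    print(f"FPR {fpr:.1%}: apparent positive rate {apparent:.1%} "
          f"({apparent / prevalence:.1f}x the true 2% prevalence)")
```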

Sure Biotech seems to be an HK company, while Wondfo is in Guangzhou.
NYTimes: ... Each test was evaluated with the same set of blood samples: from 80 people known to be infected with the coronavirus, at different points after infection; 108 samples donated before the pandemic; and 52 samples from people who were positive for other viral infections but had tested negative for SARS-CoV-2.

Tests made by Sure Biotech and Wondfo Biotech, along with an in-house Elisa test, produced the fewest false positives.

A test made by Bioperfectus detected antibodies in 100 percent of the infected samples, but only after three weeks of infection. None of the tests did better than 80 percent until that time period, which was longer than expected, Dr. Hsu said.

The lesson is that the tests are less likely to produce false negatives the longer ago the initial infection occurred, he said.

The tests were particularly variable when looking for a transient antibody that comes up soon after infection, called IgM, and more consistent in identifying a subsequent antibody, called IgG, that may signal longer-term immunity.

“You can see that antibody levels rise at different points for every patient,” Dr. Hsu said. The tests performed best when the researchers assessed both types of antibodies together. None of the tests could say whether the presence of these antibodies means a person is protected from reinfection, however.

The results overall are promising, Dr. Marson added. “There are multiple tests that have specificities greater than 95 percent.”
Preprint: Test performance evaluation of SARS-CoV-2 serological assays

From Table 2 in the paper:


Dr. Patrick Hsu -- quoted in the Times article above, and a co-author of the paper -- is no relation, although we know each other. He has appeared in this blog before for his CRISPR work.

Tuesday, February 04, 2020

Report of the University of California Academic Council Standardized Testing Task Force

The figures below are from the recently completed Report of the University of California Academic Council Standardized Testing Task Force. Note the large sample sizes.

Some remarks:

1. SAT and High School GPA (HSGPA) are both useful (and somewhat independent) predictors of college success. In terms of variance accounted for, we have the inequality:

SAT + HSGPA  >  SAT  >  HSGPA

There are some small deviations from this pattern, but it seems to hold overall. I believe that GPA has a relatively larger loading on conscientiousness (work ethic) than cognitive ability, with SAT the other way around. By combining the two we get more information than from either alone.

2. SAT and HSGPA are stronger predictors than family income or race. Within each of the family income or ethnicity categories there is substantial variation in SAT and HSGPA, with corresponding differences in student success. See bottom figure and combined model R^2 in second figure below; R^2 varies very little across family income and ethnic categories.







There is not much new here. In graduate admissions the undergraduate GPA and the GRE general + subject tests play a role similar to HSGPA and SAT. See GRE and SAT Validity.

See Correlation and Variance to understand better what the R^2 numbers above mean. R^2 ~ 0.26 means the correlation between predictor and outcome variable (e.g., freshman GPA) is R ~ 0.5 or so.
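A quick arithmetic check of that R^2 to R conversion:

```python
import math
r_squared = 0.26
print(f"R = sqrt({r_squared}) = {math.sqrt(r_squared):.2f}")  # ~0.51, i.e., R ~ 0.5
```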

Test Preparation and SAT scores: "...combined effect of coaching on the SAT I is between 21 and 34 points. Similarly, extensive meta-analyses conducted by Betsy Jane Becker in 1990 and by Nan Laird in 1983 found that the typical effect of commercial preparatory courses on the SAT was in the range of 9-25 points on the verbal section, and 15-25 points on the math section."

Saturday, August 17, 2019

Polygenic Architecture and Risk Prediction for 14 Cancers and Schizophrenia

Two recent papers on polygenic risk prediction. As I've emphasized before, these predictors already have real clinical utility but they will get significantly better with more training data.
Assessment of Polygenic Architecture and Risk Prediction based on Common Variants Across Fourteen Cancers

Yan Zhang et al.

We analyzed summary-level data from genome-wide association studies (GWAS) of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) contributing to risk, as well as the distribution of their associated effect sizes. All cancers evaluated showed polygenicity, involving at a minimum thousands of independent susceptibility variants. For some malignancies, particularly chronic lymphoid leukemia (CLL) and testicular cancer, there are a larger proportion of variants with larger effect sizes than those for other cancers. In contrast, most variants for lung and breast cancers have very small associated effect sizes. For different cancer sites, we estimate a wide range of GWAS sample sizes, required to explain 80% of GWAS heritability, varying from 60,000 cases for CLL to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores, compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that polygenic risk scores have substantial potential for risk stratification for relatively common cancers such as breast, prostate and colon, but limited potential for other cancer sites because of modest heritability and lower disease incidence.



Some people are surprised that a mental disorder might be strongly controlled by genetics -- why? However, it has been known for some time that schizophrenia is highly heritable. I anticipate that good predictors for Autism and Alzheimer's disease will be available soon.
Penetrance and Pleiotropy of Polygenic Risk Scores for Schizophrenia in 106,160 Patients Across Four Health Care Systems

Amanda B. Zheutlin et al.

Objective:
Individuals at high risk for schizophrenia may benefit from early intervention, but few validated risk predictors are available. Genetic profiling is one approach to risk stratification that has been extensively validated in research cohorts. The authors sought to test the utility of this approach in clinical settings and to evaluate the broader health consequences of high genetic risk for schizophrenia.

Methods:
The authors used electronic health records for 106,160 patients from four health care systems to evaluate the penetrance and pleiotropy of genetic risk for schizophrenia. Polygenic risk scores (PRSs) for schizophrenia were calculated from summary statistics and tested for association with 1,359 disease categories, including schizophrenia and psychosis, in phenome-wide association studies. Effects were combined through meta-analysis across sites.

Results:
PRSs were robustly associated with schizophrenia (odds ratio per standard deviation increase in PRS, 1.55; 95% CI=1.4, 1.7), and patients in the highest risk decile of the PRS distribution had up to 4.6-fold higher odds of schizophrenia compared with those in the bottom decile (95% CI=2.9, 7.3). PRSs were also positively associated with other phenotypes, including anxiety, mood, substance use, neurological, and personality disorders, as well as suicidal behavior, memory loss, and urinary syndromes; they were inversely related to obesity.

Conclusions:
The study demonstrates that an available measure of genetic risk for schizophrenia is robustly associated with schizophrenia in health care settings and has pleiotropic effects on related psychiatric disorders as well as other medical syndromes. The results provide an initial indication of the opportunities and limitations that may arise with the future application of PRS testing in health care systems.

Tuesday, November 06, 2018

1 In 4 Biostatisticians Surveyed Say They Were Asked To Commit Scientific Fraud


In the survey reported below, about 1 in 4 biostatisticians were asked to commit scientific fraud. I don't know whether this bad behavior was more prevalent in industry as opposed to academia, but I am not surprised by the results.

I do not accept the claim that researchers in data-driven areas can afford to be ignorant of statistics. It is common practice to outsource statistical analysis to people like the "consulting biostatisticians" surveyed below. But scientists who do not understand statistics will not be effective in planning future research, nor in understanding the implications of results in their own field. See the candidate gene and missing heritability nonsense the field of genetics has been subject to for the last decade.

I cannot count the number of times, in talking to a scientist with limited quantitative background, that I have performed -- to their amazement -- a quick back of the envelope analysis of a statistical design or new results. This kind of quick estimate is essential to understand whether the results in question should be trusted, or whether a prospective experiment is worth doing. The fact that they cannot understand my simple calculation means that they literally do not understand how inference in their own field should be performed.
Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians

(Annals of Internal Medicine 554-558. Published: 16-Oct-2018. DOI: 10.7326/M18-1230)

Results:
Of 522 consulting biostatisticians contacted, 390 provided sufficient responses: a completion rate of 74.7%. The 4 most frequently reported inappropriate requests rated as “most severe” by at least 20% of the respondents were, in order of frequency, removing or altering some data records to better support the research hypothesis; interpreting the statistical findings on the basis of expectation, not actual results; not reporting the presence of key missing data that might bias the results; and ignoring violations of assumptions that would change results from positive to negative. These requests were reported most often by younger biostatisticians.
This kind of behavior is consistent with the generally low rate of replication for results in biomedical science, even those published in top journals:
What is medicine’s 5 sigma? (Editorial in the Lancet)... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...
More background on the ongoing replication crisis in certain fields of science. See also Bounded Cognition.

Friday, October 26, 2018

Harvard Admissions on Trial: Enter the Statisticians


Let's see if any other media outlets cover this essential part of the trial -- the cross-examination of each side's statistical experts. As far as I understand, the plaintiff's claim that "unhooked" Asian American applicants are discriminated against by Harvard relative to applicants of other ethnicities (including white applicants) is NOT DISPUTED by Harvard, nor by their statistical expert David Card (economist at Berkeley).
Chronicle: ...A main difference between the two economists’ analyses is which types of applicants they included. Arcidiacono excluded recruited athletes, the children of alumni, the children of Harvard faculty and staff members, and students on a “Dean’s List” made up partly of children of donors. Those applicants — about 7,000 out of the roughly 150,000 students in the six-year data set — are admitted at a much higher rate than the rest of the pool, which Arcidiacono said made them difficult to compare with the other applicants.

The judge, Allison D. Burroughs of the Federal District Court, had some questions about the decision to omit that group. She wondered how many Asian-American applicants in those excluded categories are admitted. As it turned out, they are admitted at higher rates than the white applicants.

“It looks to me like what you’re arguing is you have an admissions office that’s discriminating against Asians, but they only do it in certain places,” she said. Arcidiacono agreed.
Unhooked applicants make up 95% of all applicants, but only 2/3 of admits. Recruited athletes, legacies, rich donor kids, etc. are all admitted at much higher rates than ordinary kids -- while making up only 5% of the applicant pool, they account for 1/3 of the entering class!

There has never been any claim that Asian Americans who are, e.g., nationally ranked athletes or children of billionaires are discriminated against. Eoin Hu, a Chinese American, was the star running back at Harvard when I was there! Jeremy Lin may have been denied D1 scholarships by Stanford and Berkeley despite being first-team All-State and Northern California Division II Player of the Year, but Harvard Basketball was very happy to have him.

Special status is a much stronger effect than Asian ethnicity, so including hooked applicants only dilutes the statistical effect found by Arcidiacono. Card insisted on lumping together hooked and unhooked applicants in his analysis and has not (to my knowledge) rebutted Arcidiacono's analysis. Reportedly, 86 percent of recruited athletes were admitted, 33.6 percent of legacy students were admitted, 42.2 percent of applicants on the Dean or Director’s List (major donor kids) were admitted, and 46.7 percent of children of faculty or staff were admitted. Compare this to an admit rate of ~5 percent for unhooked applicants. It is clear that these are different categories of applicants that should not be conflated.

If your kid is an unhooked applicant, you can infer much more about his or her prospects from Arcidiacono's analysis than from Card's. The former covers 95% of the pool and is not subject to large idiosyncratic and distortionary effects from the special 5% that are advantaged for reasons having nothing to do with academic merit or even personality and leadership factors.

From the SFFA brief (still uncontested?):
Professor Arcidiacono thus correctly excluded special-category applicants to isolate and highlight Harvard’s discrimination against Asian Americans. Professor Card, by contrast, includes “special recruiting categories in his models” to “obscure the extent to which race is affecting admissions decisions for those not fortunate enough to belong to one of these groups.” At bottom, SFFA’s claim is that Harvard penalizes Asian-American applicants who are not legacies or recruited athletes. Professor Card has shown that he is unwilling and unable to contest that claim.
The question of how unhooked applicants are treated has been discussed in college admissions circles for some time. From 2006:
Inside Higher Ed covers a panel called “Too Asian?” at the annual meeting of the National Association for College Admission Counseling. Particularly telling are the comments of a former Stanford admissions officer about an internal study which found evidence of higher admission rates for white applicants over Asians of similar academic and leadership qualifications (all applicants in the study were "unhooked" - meaning not in any favored categories such as legacies or athletes). 

Sunday, October 14, 2018

Nature News on Polygenic Genomic Prediction


See also Population-wide Genomic Prediction of Health Risks.
The approach to predictive medicine that is taking genomics research by storm (Nature News)

Polygenic risk scores represent a giant leap for gene-based diagnostic tests. Here’s why they’re still so controversial.

... Supporters say that polygenic scores could be the next great stride in genomic medicine, but the approach has generated considerable debate. Some research presents ethical quandaries as to how the scores might be used: for example, in predicting academic performance. Critics also worry about how people will interpret the complex and sometimes equivocal information that emerges from the tests. And because leading biobanks lack ethnic and geographic diversity, the current crop of genetic screening tools might have predictive power only for the populations represented in the databases.

“Most people are keen to have a decent debate about this, because it raises all sorts of logistical and social and ethical issues,” says Mark McCarthy, a geneticist at the University of Oxford, UK. Even so, polygenic scores are racing to the clinic and are already being offered to consumers by at least one US company.

Peter Visscher, a geneticist at the University of Queensland, Australia, who pioneered the methods that underlie the trend, is broadly optimistic about the approach, but is still surprised by the speed of progress. “I’m absolutely convinced this is going to come sooner than we think,” he says. ...
Below are some remarks from earlier posts. Population-wide Genomic Prediction of Health Risks:
I estimate that within a year or so there will be more than 10 good genomic predictors covering very significant disease risks, ranging from heart disease to diabetes to hypothyroidism to various cancers. These predictors will be able to identify the, e.g., few percent of the population that are outliers in risk -- for example, have 5x or 10x the normal likelihood of getting the disease at some point in their lives. Risk predictions can be made at birth (or before! or in adulthood), and preventative care allocated appropriately. All of these risk scores can be computed using a genotype read from an inexpensive (< $50 per person) array that probes ~1M or so common SNPs.

Genomic Prediction of disease risk using polygenic scores:
It seems to me we are just at the tipping point -- soon it will be widely understood that with large enough data sets we can predict complex traits and complex disease risk from genotype, capturing most of the estimated heritable variance. People will forget that many "experts" doubted this was possible -- the term missing heritability will gradually disappear.

In just a few years genotyping will start to become "standard of care" in many health systems. In 5 years there will be ~100M genotypes in storage (vs ~20M now), a large fraction available for scientific analysis.

Tuesday, August 28, 2018

Scientists of Stature


The link below is to the published version of the paper we posted on biorxiv in late 2017 (see blog discussion). Our results have since been replicated by several groups in academia and in Silicon Valley.

Biorxiv article metrics: abstract views 31k, paper downloads 6k. Not bad! Perhaps that means the community understands now that genomic prediction of complex traits is a reality, given enough data.

Had we taken a poll on the eve of releasing our biorxiv article, I suspect 90+ percent of genomics researchers would have said that ~few cm accuracy in predicted human height from genotype alone was impossible.

Since our article appeared, interesting results for complex phenotypes such as educational attainment, heart disease, diabetes, and other disease risks have been obtained.
Accurate Genomic Prediction Of Human Height

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos and Stephen D. H. Hsu

GENETICS Early online August 27, 2018; https://doi.org/10.1534/genetics.118.301267

We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9 percent of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from Genome-Wide Complex Trait Analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier Genome-Wide Association Studies (GWAS) for out-of-sample validation of our results.
The published version of the paper contains several new analyses in response to reviewer comments.

We added detailed comparisons between the top SNPs activated in our predictor and earlier GIANT GWAS hits. We analyze the correlation structure of L1-activated SNPs -- the algorithm (as expected) automatically selects variants which are mostly decorrelated (statistically independent) from each other.

We compare our L1 method to simpler algorithms, such as windowing: choose a genomic window size (e.g., 200k bp) and use only the SNP in each window which accounts for the most variance. This does not work as well as L1 optimization, but can produce a respectable predictor.
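For readers who want a feel for the L1 approach, here is a minimal sketch on purely synthetic genotype data (toy dimensions and an arbitrary penalty; nothing here reproduces the actual pipeline or the UK Biobank analysis). The L1 penalty drives most SNP coefficients to exactly zero, "activating" only a sparse subset.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy synthetic genotypes (0/1/2 minor-allele counts), not UK Biobank data.
n, p, n_causal = 2_000, 5_000, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
causal = rng.choice(p, n_causal, replace=False)
beta[causal] = rng.normal(scale=0.5, size=n_causal)
y = X @ beta + rng.normal(size=n)

# The L1 penalty drives most coefficients to exactly zero ("activated" SNPs remain).
model = Lasso(alpha=0.05, max_iter=5_000).fit(X, y)
activated = np.flatnonzero(model.coef_)
print(f"SNPs activated: {len(activated)} of {p}")
print(f"True causal SNPs among them: {len(set(activated) & set(causal))} of {n_causal}")
```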

We investigate the correlation structure of height-associated SNPs: to what extent can the best linear combination of GIANT GWAS-significant SNPs predict the state of one of the predictor SNPs? This raises the interesting question: how much total information (entropy) is in the human genome?

Sunday, August 19, 2018

Genomic Prediction: A Hypothetical (Embryo Selection), Part 2

The figures below are from the recent paper Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations (Nature Genetics), discussed previously here.

As you can see, genomic prediction of risk makes it possible to identify outliers for conditions like heart disease and diabetes. Individuals who are in the top 1% of polygenic risk score are many times (approaching an order of magnitude) more likely to exhibit the condition than the typical person.

In an earlier post, Genomic Prediction: A Hypothetical (Embryo Selection), I pointed out a similar situation with regard to the SSGAC predictor for Educational Attainment. Negative outliers on that polygenic score (e.g., bottom 1%) are much more likely to have difficulty in school. I then posed this hypothetical:
You are an IVF physician advising parents who have exactly 2 viable embryos, ready for implantation.

The parents want to implant only one embryo.

All genetic and morphological information about the embryos suggest that they are both viable, healthy, and free of elevated disease risk.

However, embryo A has polygenic score (as in figure above) in the lowest quintile (elevated risk of struggling in school) while embryo B has polygenic score in the highest quintile (less than average risk of struggling in school). We could sharpen the question by assuming, e.g., that embryo A has score in the bottom 1% while embryo B is in the top 1%.

You have no other statistical or medical information to differentiate between the two embryos.

What do you tell the parents? Do you inform them about the polygenic score difference between the embryos?
We can pose the analogous hypothetical for the risk scores displayed below. Should the parents be informed if, for instance, one of the embryos is in the top 1% risk for heart disease or Type 2 Diabetes? Is there a difference between the case of the EA predictor and disease risk predictors?

In the case of monogenic (Mendelian) genetic risk, e.g., Tay-Sachs, Cystic Fibrosis, BRCA, etc., deliberate genetic screening is increasingly common, even if penetrance is imperfect (i.e., the probability of the condition given the presence of the risk variant is less than 100%).

Note, the risk ratio between top 1% and bottom 1% individuals is potentially very large (see below), although more careful analysis is probably required to understand this better.
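One way to get a rough feel for that ratio is the standard liability-threshold toy model. This is a sketch under assumed, illustrative values of prevalence and of the variance in liability captured by the PRS, not the analysis from the Nature Genetics paper:

```python
import numpy as np
from scipy.stats import norm

# Liability-threshold toy model: liability = PRS + residual (total variance 1),
# disease if liability exceeds the threshold set by prevalence.
prevalence = 0.05   # illustrative disease prevalence (assumption)
r2 = 0.10           # illustrative variance in liability explained by the PRS
threshold = norm.ppf(1 - prevalence)

def risk_at_prs_percentile(q):
    """P(disease) for an individual at PRS percentile q."""
    prs = norm.ppf(q) * np.sqrt(r2)          # PRS value on the liability scale
    return 1 - norm.cdf((threshold - prs) / np.sqrt(1 - r2))

top, bottom = risk_at_prs_percentile(0.99), risk_at_prs_percentile(0.01)
print(f"Risk at top 1% of PRS:    {top:.1%}")
print(f"Risk at bottom 1% of PRS: {bottom:.1%}")
print(f"Risk ratio: {top / bottom:.0f}x")
```

Even with modest variance explained, the implied top-1% vs bottom-1% ratio is large, consistent with the qualitative point above.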

These hypotheticals will not be hypothetical for very much longer: the future is here.



(CAD = coronary artery disease.)


Wednesday, May 30, 2018

Deep Learning as a branch of Statistical Physics

Via Jess Riedel, an excellent talk by Naftali Tishby given recently at the Perimeter Institute.

The first 15 minutes is a very nice summary of the history of neural nets, with an emphasis on the connection to statistical physics. In the large network (i.e., thermodynamic) limit, one observes phase transition behavior -- sharp transitions in performance, and also a kind of typicality (concentration of measure) that allows for general statements that are independent of some detailed features.

Unfortunately I don't know how to embed video from Perimeter so you'll have to click here to see the talk.

An earlier post on this work: Information Theory of Deep Neural Nets: "Information Bottleneck"

Title and Abstract:
The Information Theory of Deep Neural Networks: The statistical physics aspects

The surprising success of learning with deep neural networks poses two fundamental challenges: understanding why these networks work so well and what this success tells us about the nature of intelligence and our biological brain. Our recent Information Theory of Deep Learning shows that large deep networks achieve the optimal tradeoff between training size and accuracy, and that this optimality is achieved through the noise in the learning process.

In this talk, I will focus on the statistical physics aspects of our theory and the interaction between the stochastic dynamics of the training algorithm (Stochastic Gradient Descent) and the phase structure of the Information Bottleneck problem. Specifically, I will describe the connections between the phase transition and the final location and representation of the hidden layers, and the role of these phase transitions in determining the weights of the network.

About Tishby:
Naftali (Tali) Tishby נפתלי תשבי

Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman professor of Brain Research
Benin school of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel

I work at the interfaces between computer science, physics, and biology which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.

Saturday, March 10, 2018

Risk, Uncertainty, and Heuristics



Risk = space of outcomes and probabilities are known. Uncertainty = probabilities not known, and even space of possibilities may not be known. Heuristic rules are contrasted with algorithms like maximization of expected utility.

See also Bounded Cognition and Risk, Ambiguity, and Decision (Ellsberg).

Here's a well-known 2007 paper by Gigerenzer et al.
Helping Doctors and Patients Make Sense of Health Statistics

Gigerenzer G, Gaissmaier W, Kurz-Milcke E, Schwartz LM, Woloshin S.

Many doctors, patients, journalists, and politicians alike do not understand what health statistics mean or draw wrong conclusions without noticing. Collective statistical illiteracy refers to the widespread inability to understand the meaning of numbers. For instance, many citizens are unaware that higher survival rates with cancer screening do not imply longer life, or that the statement that mammography screening reduces the risk of dying from breast cancer by 25% in fact means that 1 less woman out of 1,000 will die of the disease. We provide evidence that statistical illiteracy (a) is common to patients, journalists, and physicians; (b) is created by nontransparent framing of information that is sometimes an unintentional result of lack of understanding but can also be a result of intentional efforts to manipulate or persuade people; and (c) can have serious consequences for health. The causes of statistical illiteracy should not be attributed to cognitive biases alone, but to the emotional nature of the doctor-patient relationship and conflicts of interest in the healthcare system. The classic doctor-patient relation is based on (the physician's) paternalism and (the patient's) trust in authority, which make statistical literacy seem unnecessary; so does the traditional combination of determinism (physicians who seek causes, not chances) and the illusion of certainty (patients who seek certainty when there is none). We show that information pamphlets, Web sites, leaflets distributed to doctors by the pharmaceutical industry, and even medical journals often report evidence in nontransparent forms that suggest big benefits of featured interventions and small harms. Without understanding the numbers involved, the public is susceptible to political and commercial manipulation of their anxieties and hopes, which undermines the goals of informed consent and shared decision making. What can be done? We discuss the importance of teaching statistical thinking and transparent representations in primary and secondary education as well as in medical school. Yet this requires familiarizing children early on with the concept of probability and teaching statistical literacy as the art of solving real-world problems rather than applying formulas to toy problems about coins and dice. A major precondition for statistical literacy is transparent risk communication. We recommend using frequency statements instead of single-event probabilities, absolute risks instead of relative risks, mortality rates instead of survival rates, and natural frequencies instead of conditional probabilities. Psychological research on transparent visual and numerical forms of risk communication, as well as training of physicians in their use, is called for. Statistical literacy is a necessary precondition for an educated citizenship in a technological democracy. Understanding risks and asking critical questions can also shape the emotional climate in a society so that hopes and anxieties are no longer as easily manipulated from outside and citizens can develop a better-informed and more relaxed attitude toward their health.
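The mammography example in the abstract is easy to reproduce. Here is a minimal sketch using the commonly cited illustrative baseline of 4 deaths per 1,000 unscreened women (an assumption for illustration, not a figure from the paper):

```python
# "25% lower risk of dying" vs "1 less woman out of 1,000": relative vs absolute.
# The 4-in-1,000 baseline is the commonly used illustration, assumed here.
deaths_per_1000_without = 4
deaths_per_1000_with = 3

relative_reduction = (
    (deaths_per_1000_without - deaths_per_1000_with) / deaths_per_1000_without
)
absolute_reduction = (deaths_per_1000_without - deaths_per_1000_with) / 1000

print(f"Relative risk reduction: {relative_reduction:.0%}")            # 25%
print(f"Absolute risk reduction: {absolute_reduction:.2%} (1 per 1,000 women)")
```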
