Showing posts with label biology. Show all posts

Thursday, February 08, 2024

Lecture: Fermi Paradox, AI, Simulation Question — Manifold #53

 

This lecture covers DNA and the origin of life on Earth, the Fermi Paradox (is there alien life?), AI and its implications for the Simulation Question: Could our universe be a simulation? Are we machines, but don't know it? 


Further discussion of the Simulation Question in light of AGI, and a refinement from quantum mechanics: The Quantum Simulation Question
 
 
CORRECTION: 31:25 The size of our galaxy is not 100 million light years. I should have said ~100 THOUSAND = 100k light years instead!!!

Sunday, October 29, 2023

Wednesday, June 28, 2023

Embryo Selection: Healthy Babies vs Bad Arguments

Great article by Diana Fleischman, Ives Parr, Jonathan Anomaly, and Laurent Tellier.
Polygenic screening and its discontents 
... But monogenic and chromosomal screening can only address a part of disease risk because most health conditions that afflict people are polygenic, meaning they are not simply caused by one gene or by a chromosomal abnormality. Instead, they are caused by a huge number of small additive effects dispersed throughout the genome. For example, cancer, schizophrenia, and diabetes can be best predicted by models using tens of thousands of genes. 
A polygenic risk score (PRS) looks at a person’s DNA to see how many variants they have associated with a particular disease. Like BRCA1, polygenic risk scores are typically not determinative: “Polygenic screening is not a diagnosis: It is a prediction of relative future risk compared to other people.” In other words, someone with BRCA1 has a higher risk than someone without, and someone with a high breast cancer PRS has a higher risk than someone with a lower breast cancer PRS. But in principle, BRCA1 is just one gene out of thousands contributing to a PRS, with each bit contributing a small part of a total risk estimate. ... 

 

... Recently, a group of European scientists argued that polygenic screening should not be available to couples because it will lead to stigmatization, exacerbate inequalities, or lead to confusion by parents about how to weigh up information about risks before they decide which embryo to implant. These are indeed challenges, but they are not unique to embryo selection using polygenic scores, and they are not plausible arguments for restricting the autonomy of parents who wish to screen their embryos for polygenic traits. Furthermore, from an ethical perspective, it is unconscionable to deny polygenic screening to families with a history of any disease whose risk can be reduced by this lifesaving technology. 
Many new technologies are initially only available to people with more money, but these first adopters then end up subsidizing research that drives costs down and quality up. Many other medical choices involve complexity or might result in some people being stigmatized, but this is a reason to encourage genetic counseling and to encourage social tolerance. It is not a reason to marginalize, stigmatize, or criminalize IVF mothers and fathers who wish to use the best available science to increase the chances that their children will be healthy and happy.
This is a comment on the article:
1) They don't want to admit that some people are better than others, inherently. Boo hoo. 
2) You put a scorecard of embryos in front of everyone, and everyone has a pretty good ballpark estimate of which are better and which are worse. Nobody is going to pretend equality is true when they are choosing their kids genes. 
3) So bad feels. 
4) Must therefore retard all human progress and cause immense suffering because don't want to deal with bad feels. 
That's the anti-polygenic argument in a nutshell. I don't expect it to be very effective. At best it will cause it to take a bit longer before poor people have access.

Sunday, November 13, 2022

Smart Leftists vs Dumb Leftists

Tuesday, October 25, 2022

American Society of Human Genetics (ASHG) 2022 Posters

 

Thursday, October 06, 2022

Jeffrey Sachs: Lessons from the COVID Commission, Lab Leak Questions, and Nord Stream — Manifold Episode 21

 

Jeffrey D. Sachs is a world-renowned economics professor, bestselling author, innovative educator, and global leader in sustainable development. Professor Sachs serves as the Director of the Center for Sustainable Development at Columbia University and is a University Professor, Columbia's highest academic rank. 
 
Steve and Jeffrey discuss: 

0:00 Jeffrey Sachs’ experience on the Lancet Commission for COVID-19 
13:41 Potential for bioweapons research 
19:06 Why a lab leak is plausible 
32:38 Possible defenses for COVID coverup 
43:56 Government secrecy and other areas of concern 
48:08 Reflections on Nord Stream sabotage 

Resources: 

The Lancet Commission on lessons for the future from the COVID-19 pandemic, Sachs et al., Sept. 14 2022 

Why the Chair of the Lancet’s COVID-19 Commission Thinks The US Government Is Preventing a Real Investigation Into the Pandemic, Current Affairs, Aug 3 2022




My brief summary:

Sachs led a two-year study of COVID-19 organized for the Lancet. One of the task forces focused on COVID-19 origins. Sachs feels that members of this task force engaged in a deliberate cover-up that pushed the natural origin hypothesis from the beginning. His conclusion is that a lab origin hypothesis is still viable, and indeed more likely than the natural origin hypothesis. 

The US is treaty-bound to do only "defensive" bioweapons research and development, but this includes the creation and study of dangerous viral strains -- e.g., so that vaccine efficacy and related technologies can be studied. As far as I can tell, the US spends ~$10 billion per annum on biodefense research, much of it funneled through NIAID (the NIH institute for infectious diseases). Many of the researchers involved in "gain of function" genetic engineering are funded via NIAID and have been for decades. Sachs claims that the genetic engineering research to add a human-specific cleavage site to a coronavirus was actually performed, although the specific 2017 DEFUSE research plan (uncovered in a 2021 investigation) was not funded. 

Tuesday, September 20, 2022

Sibling Variation in Phenotype and Genotype: Polygenic Trait Distributions and DNA Recombination Mapping with UK Biobank and IVF Family Data (medRxiv)

This is a new paper which uses Genomic Prediction IVF family data, including genotyped embryo samples.
Sibling Variation in Phenotype and Genotype: Polygenic Trait Distributions and DNA Recombination Mapping with UK Biobank and IVF Family Data
L. Lello, M. Hsu, E. Widen, and T. Raben  
We use UK Biobank and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-siblings standard deviation in polygenic scores is \sqrt{2} times smaller than in the general population, but variation is still significant. As previously demonstrated, this allows for substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detection of recombination breaks via statistical discontinuities. The new method is used to construct a dataset of 1.44 million recombination events which may be useful in further study of meiosis.
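The \sqrt{2} factor quoted in the abstract can be checked with a toy simulation. What follows is only a minimal sketch under simplifying assumptions that are mine, not the paper's: unlinked loci, random mating, purely additive effects, and made-up allele frequencies and effect sizes. The function name and parameters are hypothetical.

```python
import numpy as np

def simulate_sibling_sd_ratio(n_families=4000, n_sibs=4, n_loci=500, seed=0):
    """Return (population SD) / (within-family SD) of a toy additive polygenic score."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=n_loci)    # assumed allele frequencies
    effects = rng.normal(0.0, 1.0, size=n_loci)   # assumed per-allele effect sizes

    # Each parent carries two haplotypes; array shape is (family, haplotype, locus).
    mothers = rng.random((n_families, 2, n_loci)) < freqs
    fathers = rng.random((n_families, 2, n_loci)) < freqs

    scores = np.empty((n_families, n_sibs))
    for s in range(n_sibs):
        # Unlinked loci: at every locus each parent passes on one of two alleles at random.
        pick_m = rng.integers(0, 2, size=(n_families, n_loci))
        pick_f = rng.integers(0, 2, size=(n_families, n_loci))
        from_m = np.where(pick_m == 0, mothers[:, 0, :], mothers[:, 1, :])
        from_f = np.where(pick_f == 0, fathers[:, 0, :], fathers[:, 1, :])
        genotype = from_m.astype(float) + from_f.astype(float)  # 0, 1 or 2 copies
        scores[:, s] = genotype @ effects

    # Population spread: one child per family, so the samples are unrelated.
    population_sd = scores[:, 0].std()
    # Sibling spread: unbiased within-family variance, averaged over families.
    within_family_sd = np.sqrt(scores.var(axis=1, ddof=1).mean())
    return population_sd / within_family_sd

print(simulate_sibling_sd_ratio())  # should land close to sqrt(2) under these assumptions
```

In this toy model the within-family variance at each locus is p(1-p) while the population variance is 2p(1-p), which is where the factor of 2 in variance (\sqrt{2} in standard deviation) comes from. Real genomes have linkage and assortative mating, so the empirical ratio need not match exactly.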

Here are some figures illustrating the variation of polygenic scores among siblings from the same family.



The excerpt below describes the IVF family highlighted in blue above:

Among the families displayed in these figures, at position number 15 from the left, we encounter an interesting case of sibling polygenic distribution relative to the parents. In the family all siblings have significantly higher Health Index score than the parents. This arises in an interesting manner: the mother is a high-risk outlier for condition X and the father is a high-risk outlier for condition Y. (We do not specify X and Y, out of an abundance of caution for privacy, although the patients have consented that such information could be shared.) Their lower overall Health Index scores result from high risk of conditions X (mother) and Y (father). However, the embryos, each resulting from unique recombination of parental genotypes, are normal risk for both X and Y and each embryo has much higher Health Index score than the parents.
This case illustrates well the potential benefits from PGS embryo screening.

 
The second part of the paper introduces a new technique that directly probes DNA recombination -- the molecular mechanism responsible for sibling genetic differences. See figure above for some results. The new method detects recombination breaks via statistical discontinuities in pairwise comparisons of DNA regions.
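To give a feel for what "statistical discontinuities in pairwise comparisons" can mean, here is a toy sketch. It is not the algorithm from the paper. It assumes a simulated sibling pair that shares both haplotypes to the left of a single break and only one haplotype to the right, then looks for the position where the genotype mismatch rate jumps. All names, the window size, and the simulated data are hypothetical.

```python
import numpy as np

def simulate_sib_pair(n_snps=4000, break_at=2500, seed=1):
    """Toy pair: both haplotypes shared left of break_at, only one shared to the right."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.2, 0.8, size=n_snps)      # assumed allele frequencies
    shared = rng.random(n_snps) < freqs             # haplotype both siblings carry
    other_1 = rng.random(n_snps) < freqs            # sibling 1's second haplotype
    other_2 = other_1.copy()                        # sibling 2 starts out identical
    redraw = rng.random(n_snps) < freqs
    other_2[break_at:] = redraw[break_at:]          # after the break: an unrelated haplotype
    g1 = shared.astype(int) + other_1.astype(int)   # genotypes coded 0, 1, 2
    g2 = shared.astype(int) + other_2.astype(int)
    return g1, g2

def find_break(g1, g2, window=200):
    """Return the SNP index where the mismatch rate changes most between adjacent windows."""
    mismatch = (np.asarray(g1) != np.asarray(g2)).astype(float)
    csum = np.concatenate([[0.0], np.cumsum(mismatch)])
    positions = np.arange(window, len(mismatch) - window + 1)
    left_mean = (csum[positions] - csum[positions - window]) / window
    right_mean = (csum[positions + window] - csum[positions]) / window
    jump = np.abs(right_mean - left_mean)
    return int(positions[np.argmax(jump)])

g1, g2 = simulate_sib_pair()
print(find_break(g1, g2))  # expected to fall near the simulated break at SNP 2500
```

A real method has to deal with genotyping error, several breaks per chromosome, and all three sharing states, none of which this sketch attempts.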

From the discussion:
...This new sibling-pair method can be applied to large datasets with many thousands of sibling pairs. In this project we created a map of roughly 1.44 million recombination events using UKB genomes. Similar maps can now be created using other biobank data, including in non-European ancestry groups that have not yet received sufficient attention. The landmark deCODE results were obtained under special circumstances: the researchers had access to data resulting from a nationwide project utilizing genealogical records (unusually prevalent in Iceland) and widespread sequencing. Using the sibling-pair method results of comparable accuracy can be obtained from existing datasets around the world -- e.g., national biobanks in countries such as the USA, Estonia, China, Taiwan, Japan, etc.
The creator of this new sibling-pair method for recombination mapping is my son. He developed and tested the algorithm, and wrote all the code in Python. It's his high school science project :-)

Monday, September 05, 2022

Lunar Society (Dwarkesh Patel) Interview

 

Dwarkesh did a fantastic job with this interview. He read the scientific papers on genomic prediction and his questions are very insightful. Consequently we covered the important material that people are most confused about. 

Don't let the sensationalistic image above deter you -- I highly recommend this podcast!

0:00:00 Intro 
0:00:49 Feynman’s advice on picking up women 
0:12:21 Embryo selection 
0:24:54 Why hasn't natural selection already optimized humans? 
0:34:48 Aging 
0:43:53 First Mover Advantage 
0:54:24 Genomics in dating 
1:01:06 Ancestral populations 
1:08:33 Is this eugenics? 
1:16:34 Tradeoffs to intelligence 
1:25:36 Consumer preferences 
1:30:49 Gwern 
1:35:10 Will parents matter? 
1:46:00 Wordcels and shape rotators 
1:58:04 Bezos and brilliant physicists 
2:10:58 Elite education 

If you prefer audio-only click here.

Thursday, June 23, 2022

Polygenic Health Index, General Health, and Disease Risk

See published version: https://www.nature.com/articles/s41598-022-22637-8

Informal summary: We built a polygenic health index using risk predictors weighted by lifespan impact of the specific disease condition. This index seems to characterize general health. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among 10 individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild.
 
Polygenic Health Index, General Health, and Disease Risk 
We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including, e.g., coronary artery disease, type 1 and 2 diabetes, schizophrenia, etc. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among 10 individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and genetic architecture of human disease conditions. 
https://www.medrxiv.org/content/10.1101/2022.06.15.22276102v1
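The construction described in the abstract (a weighted sum of risk scores, then selection of the top scorer among 10) can be sketched in a few lines. This is an illustration only: the weights are random placeholders rather than the life-expectancy weights used in the paper, the risk scores are simulated as independent standard normals, and the sign convention (higher index means healthier) is my assumption. The gain is reported in units of the index standard deviation, not in DALYs.

```python
import numpy as np

def health_index(risk_scores, weights):
    """Weighted sum of standardized risk scores, sign-flipped so higher means healthier."""
    return -(np.asarray(risk_scores) @ np.asarray(weights))

def mean_selection_gain(n_candidates=10, n_trials=20000, n_diseases=20, seed=0):
    """Average (best index - mean index) within groups, in units of the index SD."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(0.5, 2.0, size=n_diseases)   # hypothetical weights
    # Simulated risk scores: independent and standardized (an assumption).
    prs = rng.normal(size=(n_trials, n_candidates, n_diseases))
    index = health_index(prs, weights)                  # shape (n_trials, n_candidates)
    index_sd = np.sqrt((weights ** 2).sum())            # exact SD for independent scores
    gain = index.max(axis=1) - index.mean(axis=1)
    return gain.mean() / index_sd

print(mean_selection_gain())  # roughly 1.5 index SDs for the best of 10 unrelated individuals
```

For siblings rather than unrelated individuals, the spread of the index is smaller, so the same selection yields a smaller gain. Translating index SDs into DALYs requires the empirical curve from the paper, which this sketch does not reproduce.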

Some figures:









Extrapolating the DALY gain vs Health Index score curve (top figure) to the entire human population (e.g., 10 billion people) yields a gain of 30 to 40 DALYs over average, or something like 120 total years of life. In other words, the individual with the highest Health Index score in the world is predicted to live about 120 years.


I wanted to use this in the paper but my collaborators vetoed me 8-)
The days of our years are threescore years and ten; and if by reason of strength they be fourscore years, yet is their strength labour and sorrow; for it is soon cut off, and we fly away 
Psalm 90:10

Sunday, May 29, 2022

Genomic Prediction in Bloomberg


A nice article in Bloomberg describing polygenic embryo selection in IVF: DNA Testing for Embryos Promises to Predict Genetic Diseases, by Carey Goldberg.
Bloomberg: Simone Collins knew she was pregnant the moment she answered the phone. ... Embryo 3, the fertilized egg that Collins and her husband, Malcolm, had picked, could soon be their daughter—a little girl with, according to their tests, an unusually good chance of avoiding heart disease, cancer, diabetes, and schizophrenia. 
This isn’t a story about Gattaca-style designer babies. No genes were edited in the creation of Collins’s embryo. The promise, from dozens of fertility clinics around the world, is just that the new DNA tests they’re using can assess, in unprecedented detail, whether one embryo is more likely than the next to develop a range of illnesses long thought to be beyond DNA-based predictions. It’s a new twist on the industry-standard testing known as preimplantation genetic testing, which for decades has checked embryos for rare diseases, such as cystic fibrosis, that are caused by a single gene. 
One challenge with leading killers like cancer and heart disease is that they’re usually polygenic: linked to many different genes with complex interactions. Patients such as Collins can now take tests that assess thousands of DNA data points to decode these complexities and compute the disease risks. Genomic Prediction, the five-year-old New Jersey company that handled the tests for her fertility clinic, generates polygenic risk scores, predicting in percentage terms each embryo’s chances of contracting each disease in the panel, plus a composite score for overall health. Parents with multiple embryos can then weigh the scores when deciding which one to implant. 
... 
This new form of genetic embryo testing appears to move humanity one step closer to control of its evolution. The $14 billion IVF industry brings more than 500,000 babies into the world each year, and with infertility rates rising, the market is expected to more than double this decade. Companies including Genomic Prediction bet many going into that process have seen enough loved ones suffer from a polygenic disease to want risk scoring. 
[ Note I think the number of IVF babies born worldwide each year is more like 1 million, but there is some uncertainty in estimates. ] 
... 
In December, Genomic Prediction doubled its venture funding to about $25 million and says it will use the cash to expand and add to its testing panel. Boston IVF, one of the biggest fertility networks in the US, recently started offering Genomic Prediction’s polygenic testing to its patients, says CEO David Stern. “Like anything else, you have early adopters,” he says. “We have had patients who worked in the biotech field or the Harvard milieu who came in and asked for it.” Stern predicts that, like egg freezing, polygenic embryo testing will grow slowly at first, but steadily, and eventually demand will reflect the powerful appeal of lowering a child’s odds for disease. 
...
Believers such as Collins and her husband support government subsidies for fertility and parenthood but aren’t interested in any conversation about slowing down. “This is about the people who care about giving their children every opportunity,” she says. “I do not believe that law or social norms are going to stop parents from giving their kids advantages.”

This article is well-written and informative. It covers polygenic screening from multiple perspectives: the parents who want a healthy child, the IVF doctors and genetic counselors who help the parents toward that goal, the scientists who study polygenic prediction and its ability to differentiate risk among siblings (i.e., embryos), and the bioethicists who worry about a slippery slope to GATTACA.

An important point that is not discussed in the article (understandable, given the complexity of the topics listed above) is that precise genotyping of embryos leads to higher success rates in IVF.

... improved success rates resulting from higher accuracy in aneuploidy screening of embryos will affect millions of families around the world, and over 60% of all IVF families in the US.  
The SNP array platform allows very accurate genotyping of each embryo at ~1 million locations in the genome, and the subsequent bioinformatic analysis produces a much more accurate prediction of chromosomal normality than the older methods. 
Millions of embryos are screened each year using PGT-A, about 60% of all IVF embryos in the US. 
Klaus Wiemer is the laboratory director for Poma Fertility near Seattle. He conducted this study independently, without informing Genomic Prediction. 
There are ~3000 embryos in the dataset, all biopsied at Poma, with samples allocated to three testing labs (A, B, C) using the two different methods. The family demographics (e.g., maternal age) were similar in all three groups. Lab B is Genomic Prediction; A and C are two of the largest IVF testing labs in the world, using NGS. 
The results imply lower false-positive rates, lower false-negative rates, and higher accuracy overall from our methods. These lead to a significantly higher pregnancy success rate. 
The new technology has the potential to help millions of families all over the world.


This increase in pregnancy success rates was not something we directly aimed for -- rather, we were simply trying to get the most accurate characterization of chromosomal abnormality (aneuploidy) using the high precision genotype from our platform. After Dr. Wiemer surprised us with these results, it became plausible that significant increases in success rates per IVF cycle could still exist as low-hanging fruit. The ~3k embryos used in his study are considered a big sample size in fertility research, whereas in genomics today a big sample is hundreds of thousands or a million individuals. 

Prioritizing research in IVF using large sample sizes could plausibly raise success rates per cycle to, e.g., ~80%. The qualitative experience of parents using IVF will improve with average success rates, perhaps relieving much of the angst and uncertainty.

Thursday, May 05, 2022

Raghuveer Parthasarathy: Four Physical Principles and Biophysics -- Manifold podcast #11

 

Raghu Parthasarathy is the Alec and Kay Keith Professor of Physics at the University of Oregon. His research focuses on biophysics, exploring systems in which the complex interactions between individual components, such as biomolecules or cells, can give rise to simple and robust physical patterns. 

Raghu is the author of a recent popular science book, So Simple a Beginning: How Four Physical Principles Shape Our Living World. 


Steve and Raghu discuss: 

0:00 Introduction 

1:34 Early life, transition from Physics to Biophysics 

20:15 So Simple a Beginning: discussion of the Four Physical Principles in the title, which govern biological systems 

26:06 DNA prediction 

37:46 Machine learning / causality in science 

46:23 Scaling (the fourth physical principle) 

54:12 Who the book is for and what high schoolers are learning in their bio and physics classes 

1:05:41 Science funding, grants, running a research lab 

1:09:12 Scientific careers and radical sub-optimality of the existing system 



Resources: 


Raghuveer Parthasarathy's lab at the University of Oregon - https://pages.uoregon.edu/raghu/ 
 
Raghuveer Parthasarathy's blog the Eighteenth Elephant - https://eighteenthelephant.com/


Added from comments:
key holez • 2 days ago 
It was a fascinating episode, and I immediately went out and ordered the book! One question that came to mind: given how much of the human genome is dedicated to complex regulatory mechanisms and not proteins as such, it seems unintuitive to me that so much of heritability seems to be additive. I would have thought that in a system with lots of complicated, messy on/off switches, small genetic differences would often lead to large phenotype differences -- but if what I've heard about polygenic prediction is right, then, empirically, assuming everything is linear seems to work just fine (outside of rare variants, maybe). Is there a clear explanation for how complex feedback patterns give rise to linearity in the end? Is it just another manifestation of the central limit theorem...?
steve hsu 
This is an active area of research. It is somewhat surprising even to me how well linearity / additivity holds in human genetics. Searches for non-linear effects on complex traits have been largely unsuccessful -- i.e., in the sense that most of the variance seems to be controlled by additive effects. By now this has been investigated for large numbers of traits including major diseases, quantitative traits such as blood biomarkers, height, cognitive ability, etc. 
One possible explanation is that because humans are so similar to each other, and have passed through tight evolutionary bottlenecks, *individual differences* between humans are mainly due to small additive effects, located both in regulatory and coding regions. 
To genetically edit a human into a frog presumably requires many changes in loci with big nonlinear effects. However, it may be the case that almost all such genetic variants are *fixed* in the human population: what makes two individuals different from each other is mainly small additive effects. 
Zooming out slightly, the implications for human genetic engineering are very positive. Vast pools of additive variance mean that multiplex gene editing will not be impossibly hard...
This topic is discussed further in the review article: https://arxiv.org/abs/2101.05870

Friday, March 11, 2022

Genomic Prediction’s Stephen Hsu: Making superhumans will be possible (Sunday Times podcast)

 
Danny Fortson (Sunday Times) is based in Silicon Valley and has a regular podcast on technology. I really enjoyed this conversation.
Genomic Prediction’s Stephen Hsu: Making superhumans will be possible 
The Sunday Times’ tech correspondent Danny Fortson brings on Stephen Hsu, co-founder of Genomic Prediction, to talk about the plummeting price of genomic sequencing (5:00), predicting height and cancer (9:10), mining biobanks (14:25), scoring embryos (19:00), why investors are staying anonymous (28:00), the need for a society-wide discussion (32:30), when he was accused of being a eugenicist (37:25), how powerful genetic prediction can be (43:15), genetic engineering (49:45), and why Denmark is the future (59:30).

Thursday, February 24, 2022

ManifoldOne Podcast Episode #5: Shai Carmi (Hebrew University): Polygenic risk scores & embryo screening

 

Shai Carmi is Professor of Statistical and Medical Genetics at Hebrew University (Jerusalem). 




Topics and links: 

1. Shai's educational background. From statistical physics and network theory to genomics. 

2. Shai's paper on embryo selection: Schizophrenia risk. Modeling synthetic sibling genomes. Variance among sibs vs general population. RRR vs ARR, family history and elevated polygenic risk. 

3. Response to the ESHG opinion piece on embryo selection. https://twitter.com/ShaiCarmi/status/1487694576458481664 

4. Pleiotropy, Health Index scores. 

5. Genetic genealogy and DNA forensics. Solving cold cases, Othram, etc.  https://www.science.org/doi/10.1126/science.aau4832

6. Healthcare in Israel. Application of PRS in adult patients.


ManifoldOne podcast (transcript).

Thursday, February 03, 2022

ManifoldOne podcast Episode #2: Steve Hsu Q&A

 

Steve answers questions about recent progress in AI/ML prediction of complex traits from DNA, and applications in embryo selection. 

Highlights: 

1. Overview of recent advances in trait prediction 
2. Would cost savings from breast cancer early detection pay for genotyping of all women? 
3. How does IVF work? Economics of embryo selection 
4. Whole embryo genotyping increases IVF success rates (pregnancy per transfer) significantly 
5. Future predictions 


Some relevant scientific papers: 

Preimplantation Genetic Testing for Aneuploidy: New Methods and Higher Pregnancy Rates 

2021 review article on complex trait prediction 

Accurate Genomic Prediction of Human Height 

Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer 

Genetic architecture of complex traits and disease risk predictors 

Sibling validation of polygenic risk scores and complex trait prediction 

Thursday, January 06, 2022

BOLA2 Copy Number Variation: Phenotype Effects From A Human Accelerated Region

Human Accelerated Regions (HARs) are regions of DNA that were conserved throughout prior (e.g., vertebrate) evolution but are significantly different in the human genome.
Allen Institute: ... of the known 3,171 human accelerated regions, 99 percent of these human-specific mutations fall into "non-coding" regions of DNA, or regions of DNA that don't contain instructions for making a protein. Many of them are in stretches of our genome known as enhancers, regions which regulate nearby genes, and about half of those are nestled in enhancers that are active in the developing human brain.
Our analysis of DNA regions used in predictors for common diseases and complex human traits found that large portions of phenotype variance reside in non-coding regions. This has important consequences for pleiotropy and for our understanding of genetic architecture. 

Regarding HARs, in a 2013 post Neanderthals Dumb? I wrote:

This figure is from the Supplement (p.62) of a recent Nature paper describing a high quality genome sequence obtained from the toe of a female Neanderthal who lived in the Altai mountains in Siberia. Interestingly, copy number variation at 16p11.2 is one of the structural variants identified in a recent deCODE study as related to IQ depression; see earlier post Structural genomic variants (CNVs) affect cognition.

From the Supplement (p.62):
Of particular interest is the modern human-specific duplication on 16p11.2 which encompasses the BOLA2 gene. This locus is the breakpoint of the 16p11.2 micro-deletion, which results in developmental delay, intellectual disability, and autism [5,6]. We genotyped the BOLA2 gene in 675 diverse human individuals sequenced to low coverage as part of the 1000 Genome Project Phase I [7] to assess the population distribution of copy numbers in homo-sapiens (Figure S8.3). While both the Altai Neandertal and Denisova individual exhibit the ancestral diploid copy number as seen in all the non-human great apes, only a single human individual exhibits this diploid copy number state.

Modern humans typically have many (e.g., 3-10) copies of BOLA2. In Neanderthals and apes, 2 copies. 
Variation in copy number presumably affects gene expression, even if the actual protein (coding base pairs) structure is not changed. There may be other mechanisms at work, of course.

Mutations in this 16p11.2 region are associated with schizophrenia, autism, brain size, reduced IQ, anemia, and other things. 

Since 2013 a number of papers have investigated the phenotype effects of BOLA2 copy number variation (CNV) and/or the 16p11.2 duplication/deletion. The latter is more complex as it affects multiple genes in addition to BOLA2. In the future, using whole exome or whole genome data in UKB, it should be possible to focus more specifically on effects of BOLA2 CNV.

For reference I note some of the results below.
Phenome-wide Burden of Copy-Number Variation in the UK Biobank (2019) 
16p11.2 C deletion: "We observe significant increases, on the order of one standard deviation, in weight, BMI, hip and waist circumference, reticulocyte count, and Cystatin C measures for these individuals. The larger 593 kb CNV associates with similar measures of body size and fat, as well as hypertension, diabetes/HbA1c, and abdominal hernia. These results are also indicative of effects due to developmental delay; namely, decreased measures of memory, higher Townsend deprivation (an index of material deprivation which considers employment, home/auto ownership, and household overcrowding in a person's neighborhood) ..."   
Note the effect sizes, e.g., on Townsend deprivation index, are extremely large, roughly 1 SD. The effect size for Prospective Memory score (related to ability to read, remember, and execute directions) is 2 SD!

 

 

Medical consequences of pathogenic CNVs in adults: analysis of the UK Biobank (2019)
Population percentage in parentheses: 

See also:

The Human-Specific BOLA2 Duplication Modifies Iron Homeostasis and Anemia Predisposition in Chromosome 16p11.2 Autism Individuals (2019)
Quantifying the Effects of 16p11.2 Copy Number Variants on Brain Structure: A Multisite Genetic-First Study (2018)

Monday, October 18, 2021

Embryo Screening and Risk Calculus

Over the weekend The Guardian and The Times (UK) both ran articles on embryo selection. 



I recommend the first article. Philip Ball is an accomplished science writer and former scientist. He touches on many of the most important aspects of the topic, not easy given the length restriction he was working with. 

However, I'd like to cover an aspect of embryo selection which is often missed, for example by the bioethicists quoted in Ball's article.

Several independent labs have published results on risk reduction from embryo selection, and all find that the technique is effective. But some people who are not following the field closely (or are not quantitative) still characterize the benefits -- incorrectly, in my view -- as modest. I honestly think they lack understanding of the actual numbers.

Some examples:
Carmi et al. find a ~50% risk reduction for schizophrenia from selecting the lowest risk embryo from a set of 5. For a selection among 2 embryos the risk reduction is ~30%. (We obtain a very similar result using empirical data: real adult siblings with known phenotype.) 
Visscher et al. find the following results, see Table 1 and Figure 2 in their paper. To their credit they compute results for a range of ancestries (European, E. Asian, African). We have performed similar calculations using siblings but have not yet published the results for all ancestries.  
Relative Risk Reduction (RRR)
Hypertension: 9-18% (ranges depend on specific ancestry) 
Type 2 Diabetes: 7-16% 
Coronary Artery Disease: 8-17% 
Absolute Risk Reduction (ARR)
Hypertension: 4-8.5% (ranges depend on specific ancestry) 
Type 2 Diabetes: 2.6-5.5% 
Coronary Artery Disease: 0.55-1.1%
I don't view these risk reductions as modest. Given that an IVF family is already going to make a selection, they clearly benefit from the additional information that comes with genotyping each embryo. The cost is a small fraction of the overall cost of an IVF cycle.
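To see how the relative and absolute figures fit together, recall that ARR = baseline risk x RRR, so the quoted ranges imply a baseline risk for each condition. The sketch below does that arithmetic in Python. Pairing the low end of each RRR range with the low end of the matching ARR range (and high with high) is my assumption for illustration; the ranges span several ancestries and need not pair up this way.

```python
# Ranges quoted above, in percent. Pairing low with low and high with high
# is an assumption made only for illustration.
quoted = {
    "Hypertension": {"rrr": (9.0, 18.0), "arr": (4.0, 8.5)},
    "Type 2 Diabetes": {"rrr": (7.0, 16.0), "arr": (2.6, 5.5)},
    "Coronary Artery Disease": {"rrr": (8.0, 17.0), "arr": (0.55, 1.1)},
}


def implied_baseline(arr_percent, rrr_percent):
    """Baseline risk (as a fraction) implied by ARR = baseline * RRR."""
    return arr_percent / rrr_percent


def baseline_range(condition):
    """Implied baseline risk for the low-end pair and the high-end pair."""
    rrr_low, rrr_high = quoted[condition]["rrr"]
    arr_low, arr_high = quoted[condition]["arr"]
    return implied_baseline(arr_low, rrr_low), implied_baseline(arr_high, rrr_high)


for condition in quoted:
    low_pair, high_pair = baseline_range(condition)
    print(f"{condition}: implied baseline risk ~{low_pair:.0%} to ~{high_pair:.0%}")
```

Under that pairing, the same relative reduction translates into a much larger absolute reduction for a common condition like hypertension than for a rarer one like coronary artery disease.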

But here is the important mathematical point which many people miss: We buy risk insurance even when the expected return is negative, in order to ameliorate the worst possible outcomes. 

Consider the example of home insurance. A typical family will spend tens of thousands of dollars over the years on home insurance, which protects against risks like fire or earthquake. However, very few homeowners (e.g., ~1 percent) ever suffer a really large loss! At the end of their lives, looking back, most families might conclude that the insurance was "a waste of money"!

So why buy the insurance? To avoid ruin in the event you are unlucky and your house does burn down. It is tail risk insurance.
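A toy calculation makes the point concrete. The sketch below compares expected wealth and expected log-utility with and without insurance. All of the dollar figures are hypothetical, the whole lifetime is collapsed into a single period, and the use of log utility as a stand-in for "avoiding ruin" is my assumption, not something derived here.

```python
import math


def expected_wealth_and_log_utility(wealth, loss, p_loss, premium, insured):
    """Return (expected wealth, expected log-utility) for a one-period toy model."""
    if insured:
        # The premium is paid for certain and the loss is fully covered.
        outcomes = [(1.0, wealth - premium)]
    else:
        # No premium, but the full loss is borne with probability p_loss.
        outcomes = [(1.0 - p_loss, wealth), (p_loss, wealth - loss)]
    mean_wealth = sum(p * w for p, w in outcomes)
    mean_log_utility = sum(p * math.log(w) for p, w in outcomes)
    return mean_wealth, mean_log_utility


# Hypothetical numbers, chosen only for illustration.
WEALTH = 500_000      # total family wealth, house included
HOUSE_LOSS = 400_000  # loss if the house burns down
P_LOSS = 0.01         # ~1 percent chance of a really large loss
PREMIUM = 6_000       # total premiums, above the 4,000 expected loss

uninsured = expected_wealth_and_log_utility(WEALTH, HOUSE_LOSS, P_LOSS, PREMIUM, insured=False)
insured = expected_wealth_and_log_utility(WEALTH, HOUSE_LOSS, P_LOSS, PREMIUM, insured=True)

print("expected wealth      uninsured vs insured:", uninsured[0], insured[0])
print("expected log-utility uninsured vs insured:", uninsured[1], insured[1])
```

With these numbers the insured family has lower expected wealth but higher expected log-utility: the average return is negative, yet the purchase is still rational for anyone who weights the ruinous outcome heavily.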

Now consider an "unlucky" IVF family. At, say, the 1 percent level of "bad luck" they might have some embryos which are true outliers (e.g., at 10 times normal risk, which could mean over 50% absolute risk) for a serious condition like schizophrenia or breast cancer. This is especially likely if they have a family history. 

What is the benefit to this specific subgroup of families? It is enormous -- using the embryo risk score they can avoid having a child with a very high likelihood of a serious health condition. This benefit is many, many times (> 100x!) larger than the cost of the genetic screening, and it is not captured by the average risk reductions given above.

The situation is very similar to that of aneuploidy testing (screening against Down syndrome), which is widespread, not just in IVF. The prevalence of trisomy 21 (extra copy of chromosome 21) is only ~1 percent, so almost all families doing aneuploidy screening are "wasting their money" if one uses faulty logic! Nevertheless, the families in the affected category are typically very happy to have paid for the test, and even families with no trisomy warning understand that it was worthwhile.

The point is that no one knows ahead of time whether their house will burn down, or whether one or more of their embryos carries an important genetic risk. The calculus of average return is misleading -- i.e., it says that home insurance is a "rip off" when in fact it serves an important social purpose of pooling risk and helping the unfortunate. 

The same can be said for embryo screening in IVF -- one should focus on the benefit to "unlucky" families to determine the value. We can't identify the "unlucky" in advance, unless we do genetic screening!

Saturday, October 09, 2021

Leo Szilard, the Intellectual Bumblebee (lecture by William Lanouette)

 

This is a nice lecture on Leo Szilard by his biographer William Lanouette. See also ‘An Intellectual Bumblebee’ by Max Perutz.
Wikipedia: Leo Szilard was a Hungarian-American physicist and inventor. He conceived the nuclear chain reaction in 1933, patented the idea of a nuclear fission reactor in 1934, and in late 1939 wrote the letter for Albert Einstein's signature that resulted in the Manhattan Project that built the atomic bomb.
How Alexander Sachs, acting on behalf of Szilard and Einstein, narrowly convinced FDR to initiate the atomic bomb project: Contingency, History, and the Atomic Bomb

Szilard wrote children's stories and science fiction. His short story My Trial as a War Criminal begins after the USSR has defeated the US using biological weapons.
I was just about to lock the door of my hotel room and go to bed when there was a knock on the door and there stood a Russian officer and a young Russian civilian. I had expected something of this sort ever since the President signed the terms of unconditional surrender and the Russians landed a token occupation force in New York. The officer handed me something that looked like a warrant and said that I was under arrest as a war criminal on the basis of my activities during the Second World War in connection with the atomic bomb. There was a car waiting outside and they told me that they were going to take me to the Brookhaven National Laboratory on Long Island. Apparently, they were rounding up all the scientists who had ever worked in the field of atomic energy ...
This story was translated into Russian and it had a large impact on Andrei Sakharov, who showed it to his colleague Victor Adamsky:
A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can’t get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.

See also The Many Worlds of Leo Szilard (APS symposium) and the slides for Richard Garwin's excellent summary of Szilard's work, which covers nuclear physics, refrigeration, and Maxwell's Demon. One of Garwin's anecdotes:
Ted Puck was a distinguished biologist, originally trained in physics. ‘With the greatest possible reluctance I have come to the conclusion that it is not possible for me personally to work with you scientifically,’ he wrote Szilard. ‘Your mind is so much more powerful than mine that I find it impossible when I am with you to resist the tremendous polarizing forces of your ideas and outlook.’ Puck feared his ‘own flow of ideas would slow up & productivity suffer if we were to become continuously associated working in the same place and the same general kind of field.’ Puck said, ‘There is no living scientist whose intellect I respect more. But your tremendous intellectual force is a strain on a limited person like myself.’
Puck was a pioneer in single cell cloning, aided in part by Szilard:
When Szilard saw in 1954 that biologists Philip Marcus and Theodore Puck were having trouble growing individual cells into colonies, he concluded that “since cells grow with high efficiency when they have many neighbors, you should not let a single cell know it’s alone”. This was no flippant excursion into psychobiology. Rather, Szilard’s idea to use a layered feeder dish worked, while the open dish had not (Lanouette, 1992: 396–397).
After the war Szilard worked in molecular biology. This photo of Jacques Monod and Szilard is in the seminar room at Cold Spring Harbor Lab. Monod credits Szilard for the negative-feedback idea behind his 1965 Nobel prize.
“I have … recorded” in my Nobel lecture, said Monod, “how it was Szilard who decisively reconciled me with the idea (repulsive to me, until then) that enzyme induction reflected an anti-repressive effect, rather than the reverse, as I tried, unduly, to stick to.”

 

Thursday, July 22, 2021

Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations (Genes 2021 Special Issue)



It is a great honor to co-author a paper with Simon Fishel, the last surviving member of the team that produced the first IVF baby (Louise Brown) in 1978. His mentors and collaborators were Robert Edwards (Nobel Prize 2010) and Patrick Steptoe (passed before 2010). In the photo above, of the very first scientific conference on In Vitro Fertilization (1981), Fishel (far right), Steptoe, and Edwards are in the first row. More on Simon and his experiences as a medical pioneer below. 

This article appears in a Special Issue: Application of Genomic Technology in Disease Outcome Prediction.
Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations 
L. Tellier, J. Eccles, L. Lello, N. Treff, S. Fishel, S. Hsu 
Genes 2021, 12(8), 1105 
https://doi.org/10.3390/genes12081105 
Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations.
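The RRR defined in the abstract (lowest-PRS embryo versus one chosen at random) can be illustrated with a toy Monte Carlo calculation. To be clear, this is not the sibling-based method used in the paper. It is a minimal sketch under a standard liability-threshold model, in which the PRS explains a fraction r2 of the liability variance and half of the PRS variance is shared among sibling embryos. The prevalence, r2, and family count are hypothetical parameters I chose for illustration.

```python
import random
from statistics import NormalDist


def simulate_rrr(n_embryos, prevalence=0.01, r2=0.1, n_families=20000, seed=0):
    """Toy estimate of the RRR from picking the lowest-PRS embryo of n_embryos.

    Assumes liability = PRS + residual with total variance 1, and disease when
    liability exceeds the threshold set by the population prevalence.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    shared_sd = (r2 / 2.0) ** 0.5   # PRS component shared by all siblings
    within_sd = (r2 / 2.0) ** 0.5   # PRS component that varies between siblings
    resid_sd = (1.0 - r2) ** 0.5    # everything the PRS does not capture

    def risk(score):
        # Probability that liability exceeds the threshold given the PRS.
        return 1.0 - std_normal.cdf((threshold - score) / resid_sd)

    risk_random = 0.0
    risk_selected = 0.0
    for _ in range(n_families):
        family_mean = rng.gauss(0.0, shared_sd)
        scores = [family_mean + rng.gauss(0.0, within_sd) for _ in range(n_embryos)]
        risk_random += risk(scores[0])      # first embryo stands in for a random pick
        risk_selected += risk(min(scores))  # embryo with the lowest PRS
    return 1.0 - risk_selected / risk_random


for n in (2, 5):
    print(f"{n} embryos: simulated RRR = {simulate_rrr(n):.2f}")
```

The parameter values are placeholders, so the output should not be read as a reproduction of any published result; the sketch only shows why the RRR grows with the number of embryos and with the predictive power of the score.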
I excerpt from the paper below. 

Introduction:
Over a million babies are born each year via IVF [1,2]. It is not uncommon for IVF parents to have more than one viable embryo from which to choose, as typical IVF cycles can produce four or five. The embryo that is transferred may become their child, while the others might not be used at all. We refer to this selection problem as the “embryo choice problem”. In the past, selections were made based on criteria such as morphology (i.e., rate of development, symmetry, general appearance) and chromosomal normality as determined by aneuploidy testing. 
Recently, large datasets of human genomes together with health and disease histories have become available to researchers in computational genomics [3]. Statistical methods from machine learning have allowed researchers to build risk predictors (e.g., for specific disease conditions or related quantitative traits, such as height or longevity) that use the genotype alone as input information. Combined with the precision genotyping of embryos, these advances provide significantly more information that can be used for embryo selection to IVF parents. 
In this brief article, we provide an overview of the advances in genotyping and computational genomics that have been applied to embryo selection. We also discuss related ethical issues, although a full discussion of these would require a much longer paper. ...

Ethical considerations:

For further clarification, we explore a specific scenario involving breast cancer. It is well known that monogenic BRCA1 and BRCA2 variants predispose women to breast cancer, but this population is small—perhaps a few per thousand in the general population. The subset of women who do not carry a BRCA1 or BRCA2 risk variant but are at high polygenic risk is about ten times as large as the BRCA1/2 group. Thus, the majority of breast cancer can be traced to polygenic causes in comparison with commonly tested monogenic variants. 
For BRCA carrier families, preimplantation screening against BRCA is a standard (and largely uncontroversial) recommendation [39]. The new technologies discussed here allow a similar course of action for the much larger set of families with breast cancer history who are not carriers of BRCA1 or BRCA2. They can screen their embryos in favor of a daughter whose breast cancer PRS is in the normal range, avoiding a potentially much higher absolute risk of the condition. 
The main difference between monogenic BRCA screening and the new PRS screening against breast cancer is that the latter technology can help an order of magnitude more families. From an ethical perspective, it would be unconscionable to deny PRS screening to BRCA1/2-negative families with a history of breast cancer. ...

 

On Simon Fishel's experiences as an IVF pioneer:

Today millions of babies are produced through IVF. In most developed countries roughly 3-5 percent of all births are through IVF, and in Denmark the fraction is about 10 percent! But when the technology was first introduced with the birth of Louise Brown in 1978, the pioneering scientists had to overcome significant resistance. There may be an alternate universe in which IVF was not allowed to develop, and those millions of children were never born. 

Wikipedia: ...During these controversial early years of IVF, Fishel and his colleagues received extensive opposition from critics both outside of and within the medical and scientific communities, including a civil writ for murder.[16] Fishel has since stated that "the whole establishment was outraged" by their early work and that people thought that he was "potentially a mad scientist".[17] 

I predict that within 5 years the use of polygenic risk scores will become common in some health systems (i.e., for adults) and in IVF. Reasonable people will wonder why the technology was ever controversial at all, just as in the case of IVF.

Figure below from our paper. EHS = Embryo Health Score. 

Tuesday, June 29, 2021

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank (published version)


This is the published version of our MedRxiv preprint discussed back in April 2021. It is in the special issue Application of Genomic Technology in Disease Outcome Prediction of the journal Genes. 

There is a lot in this paper: genomic prediction of important biomarkers (e.g., lipoprotein A, mean platelet (thrombocyte) volume, bilirubin, platelet count), and prediction of important disease risks from biomarkers (novel ML in a ~65 dimensional space) with potential clinical applications. As is typical, genomic predictors trained in a European ancestry population perform less well in distant populations (e.g., S. Asians, E. Asians, Africans). This is probably due to different SNP LD (correlation) structure across populations. However, predictors of disease risk using directly measured biomarkers do not show this behavior -- they can be applied even to distant ancestry groups.

The referees did not like our conditional probability notation:
( biomarkers | SNPs )   and   ( disease risk | biomarkers )
So we ended up with lots of acronyms to refer to the various predictors.
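The notation reads naturally as function composition: the abstract describes applying the biomarker-based risk predictor to the output of the genomic biomarker predictor. Here is a minimal sketch of that chaining in Python. The linear score and the logistic risk model are stand-ins I chose for illustration, not the models trained in the paper, and every name and weight below is hypothetical.

```python
import math


def predict_biomarker(snps, weights, intercept=0.0):
    """( biomarker | SNPs ): a linear score over allele counts (0, 1 or 2)."""
    return intercept + sum(w * g for w, g in zip(weights, snps))


def predict_risk(biomarkers, coefs, bias):
    """( disease risk | biomarkers ): a logistic model over biomarker levels."""
    z = bias + sum(coefs[name] * value for name, value in biomarkers.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_from_snps(snps, biomarker_models, coefs, bias):
    """Chain the two predictors: SNPs -> predicted biomarkers -> disease risk."""
    predicted = {
        name: predict_biomarker(snps, weights, intercept)
        for name, (weights, intercept) in biomarker_models.items()
    }
    return predict_risk(predicted, coefs, bias)


# Hypothetical genotype, weights and coefficients, for illustration only.
example_snps = [0, 1, 2]
example_models = {"marker_a": ([0.1, 0.2, 0.3], 1.0)}
example_coefs = {"marker_a": 0.5}
example_bias = -2.0

print(risk_from_snps(example_snps, example_models, example_coefs, example_bias))
```

The chained predictor can then be compared with a risk score trained directly on SNPs, which is the comparison described in the abstract.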

Some of the biomarkers identified by ML as important for predicting specific disease risk are not familiar to practitioners and have not been previously discussed (as far as we could tell from the literature) as relevant to that specific disease. One medical school professor and practitioner, upon seeing our results, said he would in future add several new biomarkers to routine blood tests ordered for his patients.
 
Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank 
Erik Widen 1,*, Timothy G. Raben 1, Louis Lello 1,2,* and Stephen D. H. Hsu 1,2 
1 Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI 48824, USA 
2 Genomic Prediction, Inc., 675 US Highway One, North Brunswick, NJ 08902, USA 
*Authors to whom correspondence should be addressed. 
Academic Editor: Sulev Koks 
Genes 2021, 12(7), 991; https://doi.org/10.3390/genes12070991 
Received: 30 March 2021 / Revised: 22 June 2021 / Accepted: 23 June 2021 / Published: 29 June 2021 
(This article belongs to the Special Issue Application of Genomic Technology in Disease Outcome Prediction) 
Abstract 
We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores, BMRS. Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors which result from the concatenation of learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.


Figure 11. The ASCVD BMRS and the ASCVD Risk Estimator both make accurate risk predictions but with partially complementary information. (Upper left): Predicted risk by BMRS, the ASCVD Risk Estimator and a PRS predictor were binned and compared to the actual disease prevalence within each bin. The gray 1:1 line indicates perfect prediction. ... The ASCVD Risk Estimator was applied to 340k UKB samples while the others were applied to an evaluation set of 28k samples, all of European ancestry. (Upper right) shows a scatter plot and distributions of the risk predicted by BMRS versus the risk predicted by the ASCVD Risk Estimator for the 28k Europeans in the evaluation set. The BMRS distribution has a longer tail of high predicted risk, providing the tighter confidence interval in this region. The left plot y-axis is the actual prevalence within the horizontal and vertical cross-sections, as illustrated with the shaded bands corresponding to the hollow squares to the left. Notably, both predictors perform well despite the differences in assigned stratification. The hexagons are an overlay of the (lower center) heat map of actual risk within each bin (numbers are bin sizes). Both high risk edges have varying actual prevalence but with a very strong enrichment when the two predictors agree.

Monday, April 05, 2021

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

These new results arose from initial investigations of blood biomarker predictions from DNA. The lipoprotein A predictor we built correlates almost 0.8 with the measured result, and this agreement would probably be even stronger if day to day fluctuations were averaged out. It is the most accurate genomic predictor for a complex trait that we are aware of.

We then became interested in the degree to which biomarkers alone could be used to predict disease risk. Some of the biomarker-based disease risk predictors we built (e.g., for kidney or liver problems) do not, as far as we know, have widely used clinical counterparts. Further research may show that predictors of this kind have broad utility. 

Statistical learning in a space of ~50 biomarkers is considered a "high dimensional" problem from the perspective of medical diagnosis; however, compared to genomic prediction using a million SNP features, it is rather straightforward. 
 
Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank  
Erik Widen, Timothy G. Raben, Louis Lello, Stephen D.H. Hsu 
doi: https://doi.org/10.1101/2021.04.01.21254711 
We use UK Biobank data to train predictors for 48 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, ... from SNP genotype. For example, our predictor correlates  ~ 0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information). Individuals who are at high risk (e.g., odds ratio of > 5x population average) can be identified for conditions such as coronary artery disease (AUC ~ 0.75), diabetes (AUC ~ 0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ~10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: (risk score | SNPs)) for common diseases to the risk predictors which result from the concatenation of learned functions (risk score | biomarkers) and (biomarker | SNPs).
