Saturday, July 20, 2019

The diffusion of knowledge

Szilard and Wigner told Einstein about their recent calculations... how the fission process might create chain reactions and nuclear bombs. "Daran habe ich gar nicht gedacht," said Einstein -- I did not think about that at all!
In the past two weeks I gave talks at ISIR2019 (Minneapolis), the Institute of Biomedical Sciences (Academia Sinica, Taipei -- home of the Taiwan biobank), Innovative Genomics Institute (IGI = CRISPR central, UC Berkeley and UCSF) and at OpenAI (AGI in San Francisco).
Title: Genomic Prediction of Complex Traits and Disease Risks via AI/ML and Large Genomic Datasets

Abstract: The talk is divided into two parts. The first gives an overview of the rapidly advancing area of genomic prediction of disease risks using polygenic scores. We can now identify risk outliers (e.g., with 5 or 10 times normal risk) for about 20 common disease conditions, ranging from diabetes to heart diseases to breast cancer, using inexpensive SNP genotypes (i.e., as offered by 23andMe). We can also predict some complex quantitative traits (e.g., adult height with accuracy of a few cm, using ~20k SNPs). I discuss application of these results in precision medicine as well as embryo selection in IVF, and give some details about genetic architectures. The second part covers the AI/ML used to build these predictors, with an emphasis on "sparse learning" and phase transitions in high dimensional statistics.
Slides for the first part of the talk.

I also appeared on Dilbert creator Scott Adams' show.

Wednesday, July 17, 2019

Beijing 2019 Notes -- addendum



I just came across this beautiful video with 4k drone footage of Guangzhou, part of the Guangdong-Hong Kong-Macau Greater Bay Area in the Pearl River delta region.

In my earlier post on Beijing I emphasized the issue of scale in China -- massive scale that is evident in the video above.

I traveled in SE Asia before the 1997 currency / economic crisis. At that time there was plenty of evidence of a bubble in those countries -- unused infrastructure and real estate built on spec, few signs of real technological or productive capability, etc. China had aspects of that 10 years ago, but now it's apparent that earlier infrastructure investment is being put to good use.

As I walked around Beijing I strained to find things around me -- buildings, solar panels, batteries, cars, high speed trains, electronics, software infrastructure, even airplanes -- that couldn't be sourced in China. Other than a few specific tech stacks that will get serious attention in coming years (e.g., CPUs) I was not able to think of many areas in which China has not caught up technologically. See Can the US derail China 2025?

PS I'm back in the US now. Will be giving a talk today at IGI in Berkeley and at OpenAI on Thursday.

Thursday, July 11, 2019

Manifold Episode #14: Stuart Firestein on Why Ignorance and Failure Lead to Scientific Progress



Steve and Corey speak with Stuart Firestein (Professor of Neuroscience at Columbia University, specializing in the olfactory system) about his two books Ignorance: How It Drives Science, and Failure: Why Science Is So Successful. Stuart explains why he thinks that it is a mistake to believe that scientists make discoveries by following the “scientific method” and what he sees as the real relationship between science and art. We discuss Stuart’s recent research showing that current models of olfactory processing are wrong, while Steve delves into the puzzling infinities in calculations that led to the development of quantum electrodynamics. Stuart also makes the case that the theory of intelligent design is more intelligent than most scientists give it credit for and that it would be wise to teach it in science classes.

Stuart Firestein

Failure: Why Science Is so Successful

Ignorance: How it drives science

Transcript


man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Wednesday, July 03, 2019

Beijing 2019 Notes

I'm at Beijing University in Zhongguancun. Some brief notes and photos below.

I had meetings with Beida professors, prominent tech entrepreneurs and VCs, policy analysts, IVF doctors and genetic scientists. I also had conversations with ordinary people -- drivers, maids, hotel and service staff.

I've been traveling to Beijing for about 15 years now and have observed significant improvements in infrastructure, general economic level, civil society, and general behavior. This would of course be obvious to people living in China, which presumably explains the confidence people here have in their government and in continued advances in development. The hypothesis that this society is "brittle" or vulnerable to shocks seems unsupported.

The main thing to comprehend about China is scale. There are easily ~350M (i.e., population of US) people here living roughly first world lives: with access to education, good jobs, a climate-controlled apartment in a major city, good public transportation, fast internet access, etc. Probably the number is twice as large depending on how one defines the category. For one thing, this means that the supply of engineers, technologists, lab scientists, project managers, entrepreneurs, etc. is very large. There are certainly poor people who lack opportunity, but the size of the population for which the education and economic system are working reasonably well is very large. Possibly a billion people out of ~1.4B.

Beijing is a microcosm of this phenomenon of scale. It's a huge city (over 20M people) with the kind of modern metro system only to be found in places like Tokyo or perhaps Seoul or Paris or London. One can ride the longer lines for 90 minutes without exiting, covering the entire extent of the city from one side to the other. Despite the public transport system, the roads are clogged with recent model cars, producing traffic conditions reminiscent of Los Angeles. I don't find the city as a whole all that livable -- it's too enormous for me -- but locals know all the many charming locations (see photos below). Beijing is reaching a level of development that reminds me of Tokyo.

Trump, the trade war, and US-China relations came up frequently in discussion. Chinese opinion tends to focus on the long term. Our driver for a day trip to the Great Wall was an older man from the countryside, who has lived only 3 years in Beijing. I was surprised to hear him expressing a very balanced opinion about the situation. He understood Trump's position remarkably well -- China has done very well trading with the US, and owes much of its technological and scientific development to the West. A recalibration is in order, and it is natural for Trump to negotiate in the interest of US workers.

China's economy is less and less export-dependent, and domestic drivers of growth seem easy to identify. For example, there is still a lot of low-hanging fruit in the form of "catch up growth" -- but now this means not just catching up with the outside developed world, but Tier 2 and Tier 3 cities catching up with Tier 1 cities like Beijing, Shanghai, Shenzhen, etc.

China watchers have noted the rapidly increasing government and private sector debt necessary to drive growth here. Perhaps this portends a future crisis. However, I didn't get any sense of impending doom for the Chinese economy. To be fair there was very little inkling of what would happen to the US economy in 2007-8. Some of the people I met with are highly placed with special knowledge -- they are among the most likely to be aware of problems. Overall I had the impression of normalcy and quiet confidence, but perhaps this would have been different in an export/manufacturing hub like Shenzhen. [ Update: Today after posting this I did hear something about economic concerns... So the situation is unclear. ]

Innovation is everywhere here. Perhaps the most obvious is the high level of convenience from the use of e-payment and delivery services. You can pay for everything using your mobile (increasingly, using just your face!), and you can have food and other items (think Amazon on steroids) delivered quickly to your apartment. Even museum admissions can be handled via QR code.

A highly placed technologist told me that in fields like AI or computer science, Chinese researchers and engineers have access to local in-depth discussions of important arXiv papers -- think StackOverflow in Mandarin. Since most researchers here can read English, they have access both to Western advances, and a Chinese language reservoir of knowledge and analysis. He anticipates that eventually the pace and depth of engineering implementation here will be unequaled.

IVF and genetic testing are huge businesses in China. Perhaps I'll comment more on this in the future. New technologies, in genomics as in other areas, tend to be received more positively here than in the US and Europe.


National Museum



Bookstore and Cafe on the grounds of the National Art Museum.







Tiananmen Square (see below for historical note)


An email sent to Julian Assange's attorney, whom I met at CogX in London:
Hi Jen,

I really enjoyed your Q&A today. Keep fighting the good fight.

Wikileaks diplomatic cables reveal no mass shootings in Tiananmen Square:

https://wikileaks.org/plusd/cables/89BEIJING18828_a.html

https://www.telegraph.co.uk/news/worldnews/wikileaks/8555142/Wikileaks-no-bloodshed-inside-Tiananmen-Square-cables-claim.html

Our media has been misrepresenting this historical event for 30 years
now. There was certainly violence, but not in the square itself.

Best wishes,
Steve
See comments for further discussion...

Note Added: In the comments AG points to a Quora post by a user called Janus Dongye Qimeng, an AI researcher in Cambridge UK, who seems to be a real China expert. I found these posts to be very interesting.

Infrastructure development in poor regions of China

Size of Chinese internet social network platforms

Can the US derail China 2025? (Core technology stacks in and outside China)

Huawei smartphone technology stack and impact of US entity list interdiction (software and hardware!)

Agriculture at Massive Scale

US-China AI competition


More recommendations: Bruno Maçães is one of my favorite modern geopolitical thinkers. A Straussian of sorts (PhD under Harvey Mansfield at Harvard), he was Secretary of State for European Affairs in Portugal, and has thought deeply about the future of Eurasia and of US-China relations. He spent the last year in Beijing and I was eager to meet with him while here. His recent essay Equilibrium Americanum appeared in the Berlin Policy Journal. Podcast interview -- we hope to have him on Manifold soon :-)

Thursday, June 27, 2019

Manifold Podcast #13: Joe Cesario on Political Bias and Problematic Research Methods in Social Psychology



Corey and Steve continue their discussion with Joe Cesario and examine methodological biases in the design and conduct of experiments in social psychology and ideological bias in the interpretation of the findings. Joe argues that experiments in his field are designed to be simple, but that in making experimental setups simple, researchers remove critical factors that actually matter for a police officer making a decision in the real world. In consequence, he argues that the results cannot be taken to show anything about actual police behavior. Joe maintains that social psychology as a whole is biased toward the left politically and that this affects how courses are taught and research conducted. Steve points out that university faculty on the whole tend to be shifted left relative to the general population. Joe, Corey, and Steve discuss the current ideological situation on campus and how it can be alienating for students from conservative backgrounds.

Joseph Cesario's Lab
https://www.cesariolab.com/

Transcript
https://manifoldlearning.com/2019/06/27/episode-013-transcript/



Monday, June 24, 2019

Ulam on von Neumann, Godel, and Einstein


Ulam expresses so much in a few sentences! From his memoir, Adventures of a Mathematician. Above: Einstein and Godel. Bottom: von Neumann, Feynman, Ulam.
When it came to other scientists, the person for whom he [vN] had a deep admiration was Kurt Gödel. This was mingled with a feeling of disappointment at not having himself thought of "undecidability." For years Gödel was not a professor at Princeton, merely a visiting fellow, I think it was called. Apparently there was someone on the faculty who was against him and managed to prevent his promotion to a professorship. Johnny would say to me, "How can any of us be called professor when Gödel is not?" ...

As for Gödel, he valued Johnny very highly and was much interested in his views. I believe knowing the importance of his own discovery did not prevent Gödel from a gnawing uncertainty that maybe all he had discovered was another paradox à la Burali-Forti or Russell. But it is much, much more. It is a revolutionary discovery which changed both the philosophical and the technical aspects of mathematics.

When we talked about Einstein, Johnny would express the usual admiration for his epochal discoveries which had come to him so effortlessly, for the improbable luck of his formulations, and for his four papers on relativity, on the Brownian motion, and on the photo-electric quantum effect. How implausible it is that the velocity of light should be the same emanating from a moving object, whether it is coming toward you or whether it is receding. But his admiration seemed mixed with some reservations, as if he thought, "Well, here he is, so very great," yet knowing his limitations. He was surprised at Einstein's attitude in his debates with Niels Bohr—at his qualms about quantum theory in general. My own feeling has always been that the last word has not been said and that a new "super quantum theory" might reconcile the different premises.

Saturday, June 22, 2019

Silicon Oligarchs: Winner Take All?


Joel Kotkin is a Presidential Fellow in Urban Futures at Chapman University and Executive Director for the Center for Opportunity Urbanism.
What Do the Oligarchs Have in Mind for Us?

...This tiny sliver of humanity, with their relatively small cadre of financiers, engineers, data scientists, and marketers, now control the exploitation of our personal data, what Alibaba founder Jack Ma calls the “electricity of the 21st century.” Their “super platforms,” as one analyst noted, now operate as “digital gatekeepers” lording over “e-monopsonies” that control enormous parts of the economy. Their growing power, notes a recent World Bank Study, is built on “natural monopolies” that adhere to web-based business, and have served to further widen class divides not only in the United States but around the world.

The rulers of the Valley and its Puget Sound doppelganger now account for eight of the 20 wealthiest people on the planet. Seventy percent of the 56 billionaires under 40 live in the state of California, with 12 in San Francisco alone. In 2017, the tech industry, mostly in California, produced 11 new billionaires. The Bay Area has more billionaires on the Forbes 400 list than any metro region other than New York and more millionaires per capita than any other large metropolis.

For an industry once known for competition, the level of concentration is remarkable. Google controls nearly 90 percent of search advertising, Facebook almost 80 percent of mobile social traffic, and Amazon about 75 percent of US e-book sales, and, perhaps most importantly, nearly 40 percent of the world’s “cloud business.” Together, Google and Apple control more than 95 percent of operating software for mobile devices, while Microsoft still accounts for more than 80 percent of the software that runs personal computers around the world.

The wealth generated by these near-monopolies funds the tech oligarchy’s drive to monopolize existing industries such as entertainment, education, and retail, as well as those of the future, such as autonomous cars, drones, space exploration, and most critically, artificial intelligence. Unless checked, they will have accumulated the power to bring about what could best be seen as a “post-human” future, in which society is dominated by artificial intelligence and those who control it.

What Do the Oligarchs Want?

The oligarchs are creating “a scientific caste system,” not dissimilar to that outlined in Aldous Huxley’s dystopian 1932 novel, Brave New World. Unlike the former masters of the industrial age, they have little use for the labor of middle- and working-class people—they need only their data. Virtually all their human resource emphasis relies on cultivating and retaining a relative handful of tech-savvy operators. “Software,” Bill Gates told Forbes in 2005, “is an IQ business. Microsoft must win the IQ war, or we won’t have a future.”

Perhaps the best insight into the mentality of the tech oligarchy comes from an admirer, researcher Greg Ferenstein, who interviewed 147 digital company founders. The emerging tech world has little place for upward mobility, he found, except for those in the charmed circle at the top of the tech infrastructure; the middle and working classes become, as in feudal times, increasingly marginal.

This reflects their perception of how society will evolve. Ferenstein notes that most oligarchs believe “an increasingly greater share of economic wealth will be generated by a smaller slice of very talented or original people. Everyone else will increasingly subsist on some combination of part-time entrepreneurial ‘gig work’ and government aid.” Such part-time work has been growing rapidly, accounting for roughly 20 percent of the workforce in the US and Europe, and is expected to grow substantially, adds McKinsey. ...

Thursday, June 20, 2019

CRISPR babies: when will the world be ready? (Nature)

This Nature News article gives a nice overview of the current status of CRISPR technology and its potential application in human reproduction. As we discussed in this bioethics conversation (Manifold Podcast #9 with philosopher Sam Kerstein of the University of Maryland), it is somewhat challenging to come up with examples where gene editing is favored over embryo selection (a well-established technology) for avoidance of a disease-linked mutation.
Nature: ... He found out about a process called preimplantation genetic diagnosis or PGD. By conceiving through in vitro fertilization (IVF) and screening the embryos, Carroll and his wife could all but eliminate the chance of passing on the mutation. They decided to give it a shot, and had twins free of the Huntington’s mutation in 2006.

Now Carroll is a researcher at Western Washington University in Bellingham, where he uses another technique that might help couples in his position: CRISPR gene editing. He has been using the powerful tool to tweak expression of the gene responsible for Huntington’s disease in mouse cells. Because it is caused by a single gene and is so devastating, Huntington’s is sometimes held up as an example of a condition in which gene editing a human embryo — controversial because it would cause changes that would be inherited by future generations — could be really powerful. But the prospect of using CRISPR to alter the gene in human embryos still worries Carroll. “That’s a big red line,” he says. “I get that people want to go over it — I do, too. But we have to be super humble about this stuff.” There could be many unintended consequences, both for the health of individuals and for society. It would take decades of research, he says, before the technology could be used safely.



Thursday, June 13, 2019

Manifold Episode #12: James Cham on Venture Capital, Risk Taking, and the Future Impacts of AI



Manifold Show Page    YouTube Channel

James Cham is a partner at Bloomberg Beta, a venture capital firm focused on the future of work. James invests in companies applying machine intelligence to businesses and society. Prior to Bloomberg Beta, James was a Principal at Trinity Ventures and a VP at Bessemer Venture Partners. He was educated in computer science at Harvard and at the MIT Sloan School of Business.

James Cham
https://www.linkedin.com/in/jcham/

Bloomberg Beta
https://www.bloombergbeta.com/



Validation of Polygenic Risk Scores for Coronary Artery Disease in French Canadians


This study reports a validation of Polygenic Risk Scores for Coronary Artery Disease in a French Canadian population. Outliers in PRS are much more likely to have CAD than typical individuals.

In our replication tests of a variety of traits (both disease risks and quantitative traits) using European ancestry validation datasets, there is strong consistency in performance of the predictors. (See AUC consistency below.) This suggests that the genomic predictors are robust to differences in environmental conditions and also moderate differences in ethnicity (i.e., within the European population). The results are not brittle, and I believe that widespread clinical applications are coming very soon.
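
For readers unfamiliar with the two quantities discussed here, the following is a minimal sketch of how a polygenic score and an AUC are computed. It assumes a purely additive score (effect weight times allele count, summed over SNPs) and uses the pairwise-ranking definition of AUC. The weights, genotypes, and function names are hypothetical toy values chosen for illustration, not taken from the study or from any real predictor.

```python
def polygenic_score(genotype, weights):
    """Additive score: sum over SNPs of effect weight times allele count (0, 1, or 2)."""
    return sum(w * g for w, g in zip(weights, genotype))


def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: 4 SNPs with made-up effect weights.
weights = [0.5, -0.25, 1.0, 0.25]
cases = [[2, 0, 1, 1], [1, 0, 2, 0], [1, 2, 1, 2]]
controls = [[0, 2, 0, 1], [1, 1, 0, 0], [2, 0, 1, 0]]

case_scores = [polygenic_score(g, weights) for g in cases]
control_scores = [polygenic_score(g, weights) for g in controls]
print(case_scores, control_scores, auc(case_scores, control_scores))
```

In this toy data 8 of the 9 case/control pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.5 would mean the score carries no information about case status.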

Validation of Genome-wide Polygenic Risk Scores for Coronary Artery Disease in French Canadians

Florian Wünnemann , Ken Sin Lo , Alexandra Langford-Alevar , David Busseuil , Marie-Pierre Dubé , Jean-Claude Tardif , and Guillaume Lettre

Genomic and Precision Medicine

Abstract
Background: Coronary artery disease (CAD) represents one of the leading causes of morbidity and mortality worldwide. Given the healthcare risks and societal impacts associated with CAD, their clinical management would benefit from improved prevention and prediction tools. Polygenic risk scores (PRS) based on an individual's genome sequence are emerging as potentially powerful biomarkers to predict the risk to develop CAD. Two recently derived genome-wide PRS have shown high specificity and sensitivity to identify CAD cases in European-ancestry participants from the UK Biobank. However, validation of the PRS predictive power and transferability in other populations is now required to support their clinical utility.

Methods: We calculated both PRS (GPSCAD and metaGRSCAD) in French-Canadian individuals from three cohorts totaling 3639 prevalent CAD cases and 7382 controls, and tested their power to predict prevalent, incident and recurrent CAD. We also estimated the impact of the founder French-Canadian familial hypercholesterolemia deletion (LDLR delta > 15kb deletion) on CAD risk in one of these cohorts and used this estimate to calibrate the impact of the PRS.

Results: Our results confirm the ability of both PRS to predict prevalent CAD comparable to the original reports (area under the curve (AUC)=0.72-0.89). Furthermore, the PRS identified about 6-7% of individuals at CAD risk similar to carriers of the LDLR delta > 15kb mutation, consistent with previous estimates. However, the PRS did not perform as well in predicting incident or recurrent CAD (AUC=0.56-0.60), maybe due to confounding because 76% of the participants were on statin treatment. This result suggests that additional work is warranted to better understand how ascertainment biases and study design impact PRS for CAD.

Conclusions: Collectively, our results confirm that novel, genome-wide PRS are able to predict CAD in French-Canadians; with further improvements, this is likely to pave the way towards more targeted strategies to predict and prevent CAD-related adverse events.
American Heart Association hails potential of PRS:
"PRSs, built using very large data sets of people with and without heart disease, look for genetic changes in the DNA that influence disease risk, whereas individual genes might have only a small effect on disease predisposition," said Guillaume Lettre, Ph.D., lead author of the study and an associate professor at the Montreal Heart Institute and Université de Montréal in Montreal, Quebec, Canada. "The PRS is like having a snapshot of the whole genetic variation found in one's DNA and can more powerfully predict one's disease risk. Using the score, we can better understand whether someone is at higher or lower risk to develop a heart problem."

Early prediction would benefit prevention, optimal management and treatment strategies for heart disease. Because PRSs are simple and relatively inexpensive, their implementation in the clinical setting holds great promises. For heart disease, early detection could lead to simple yet effective therapeutic interventions such as the use of statins, aspirin or other medications.

... The American Heart Association named the use of polygenic risk scores as one of the biggest advances in heart disease and stroke research in 2018.

Sadly, reaction to these breakthroughs in human genomics will follow the usual pattern:
It's Wrong! Genomes are too complex to decipher, GWAS is a failure, precision medicine is all hype, biology is so ineffably beautiful and incomprehensible, Hey, whaddaya, you're a physicist! ...

It's Trivial! I knew it all along. Of course, everything is heritable to some degree. Well, if you just get enough data...

I did it First! (Please cite my paper...)

Sunday, June 09, 2019

L1 vs Deep Learning in Genomic Prediction

The paper below by some of my MSU colleagues examines the performance of a number of ML algorithms, both linear and nonlinear, including deep neural nets, in genomic prediction across several different species.

When I give talks about prediction of disease risks and complex traits in humans, I am often asked why we are not using fancy (trendy?) methods such as Deep Learning (DL). Instead, we focus on L1 penalization methods ("sparse learning") because 1. the theoretical framework (including theorems providing performance guarantees) is well-developed, and (relatedly) 2. the L1 methods perform as well as or better than other methods in our own testing.

The term theoretical framework may seem unusual in ML, which is at the moment largely an empirical subject. Experience in theoretical physics shows that when powerful mathematical results are available, they can be very useful to guide investigation. In the case of sparse learning we can make specific estimates for how much data is required to "solve" a trait -- i.e., capture most of the estimated heritability in the predictor. Five years ago we predicted a threshold of a few hundred thousand genomes for height, and this turned out to be correct. Currently, this kind of performance characterization is not possible for DL or other methods.
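
To make the idea of a data threshold concrete, here is a small simulation sketch. It assumes a synthetic sparse additive trait with Gaussian stand-ins for standardized genotypes, and fits an L1-penalized regression by plain proximal gradient descent (ISTA). The solver, the parameter values (p, s, noise, lam), and the function names are all illustrative choices, not the pipeline used in our work.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L = (largest singular value of X)^2 / n
    # is the Lipschitz constant of the gradient of the smooth term.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def support_recovered(n, p=200, s=5, noise=0.1, lam=0.1, seed=0):
    """Simulate a trait with s causal variants out of p, fit with n samples,
    and report whether the s largest fitted coefficients are the causal ones."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))  # stand-in for standardized genotypes
    beta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    beta[support] = 1.0
    y = X @ beta + noise * rng.standard_normal(n)
    b = lasso_ista(X, y, lam)
    top = np.argsort(-np.abs(b))[:s]
    return set(top.tolist()) == set(support.tolist())


# Sweep the sample size to see where recovery of the causal set kicks in.
for n in (10, 40, 150):
    print(n, support_recovered(n))
```

Sweeping n more finely (and averaging over seeds) is one way to see the transition from failed to successful recovery as the sample size grows relative to the sparsity s.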

What is especially powerful about deep neural nets is that they yield a quasi-convex (or at least reasonably efficient) optimization procedure which can learn high dimensional functions. The class of models is both tractable from a learning/optimization perspective and highly expressive. As I wrote here in my ICML notes (see also Elad's work which relates DL to Sparse Learning):
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?).
However, currently in genomic prediction one typically finds that nonlinear interactions are small, which means features more complicated than single SNPs are unnecessary. (In a recent post I discussed a new T1D predictor that makes use of nonlinear haplotype interaction effects, but even there the effects are not large.) Eventually I expect this situation to change -- when we have enough whole genomes to work with, a DL approach which can (automatically) identify important features (motifs?) may allow us to go beyond SNPs and simple linear models.
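
One way to read the parenthetical "ReLU = thresholded penalization?" in the ICML note quoted above: the soft-thresholding operator that appears in L1 solvers can be written exactly as a difference of two ReLUs. The sketch below checks that identity numerically. It is offered as one possible interpretation of the remark, and the function names are illustrative.

```python
def relu(x):
    """Rectified linear unit."""
    return max(x, 0.0)


def soft_threshold(x, t):
    """Shrink x toward zero by t, setting it to zero when |x| <= t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def soft_threshold_via_relu(x, t):
    """The same operator expressed with two ReLU units:
    S_t(x) = relu(x - t) - relu(-x - t)."""
    return relu(x - t) - relu(-x - t)


# Compare the two forms on a few inputs with threshold t = 0.5.
for x in (-2.0, -1.0, -0.5, 0.0, 0.25, 0.5, 1.5):
    print(x, soft_threshold(x, 0.5), soft_threshold_via_relu(x, 0.5))
```

For nonnegative inputs the second ReLU vanishes and soft-thresholding is just a ReLU with a bias of -t, which is the sense in which a ReLU layer resembles one step of a thresholded (L1-penalized) update.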

Note, though, that from an information theoretic perspective (see, e.g., any performance theorems in compressed sensing) it is obvious that we will need much more data than we currently have to advance this program. Also, note that Visscher et al.'s recent GCTA work suggests that additive SNP models using rare variants (i.e., extracted from whole genome data), can account for nearly all the expected heritability for height. This implies that the power of nonlinear methods like DL may not yield qualitatively better results than simpler L1 approaches, even in the limit of very large whole genome datasets.
Benchmarking algorithms for genomic prediction of complex traits

Christina B. Azodi, Andrew McCarren, Mark Roantree, Gustavo de los Campos, Shin-Han Shiu

The usefulness of Genomic Prediction (GP) in crop and livestock breeding programs has led to efforts to develop new and improved GP approaches including non-linear algorithm, such as artificial neural networks (ANN) (i.e. deep learning) and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of GP datasets and models. Using data of 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and five non-linear algorithms, including ANNs. First, we found that hyperparameter selection was critical for all non-linear algorithms and that feature selection prior to model training was necessary for ANNs when the markers greatly outnumbered the number of training lines. Across all species and trait combinations, no one algorithm performed best, however predictions based on a combination of results from multiple GP algorithms (i.e. ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms vary more between traits than that of linear algorithms. Although ANNs did not perform best for any trait, we identified strategies (i.e. feature selection, seeded starting weights) that boosted their performance near the level of other algorithms. These results, together with the fact that even small improvements in GP performance could accumulate into large genetic gains over the course of a breeding program, highlights the importance of algorithm selection for the prediction of trait values.


Saturday, June 08, 2019

London: CogX, Founders Forum, Healthtech


I'm in London again to give the talk below and attend some meetings, including Founders Forum and their Healthtech event the day before.
CogX: The Festival of AI and Emerging Technology
King's Cross, London, N1C 4BH

When Machine Learning Met Genetic Engineering

3:30 pm Tuesday June 11 Cutting Edge stage

Speakers

Stephen Hsu
Senior Vice-President for Research and Innovation
Michigan State University

Helen O’Neill
Lecturer in Reproductive and Molecular Genetics
UCL

Martin Varsavsky
Executive Chairman
Prelude Fertility

Azeem Azhar (moderator)
Founder
Exponential View

Regent's Canal, Camden Town near King's Cross.





CogX speakers reception, Sunday evening:



HealthTech


Commanding heights of global capital:



Sunset, Camden locks:


Sunday, June 02, 2019

Genomic Prediction: Polygenic Risk Score for Type 1 Diabetes

In an earlier post I collected links related to recent progress in Polygenic Risk Scores (PRS) and health care applications. The paper below describes a new (published in 2019) predictor for Type 1 Diabetes (T1D) that achieves impressive accuracy (AUC > 0.9) using 67 SNPs. It incorporates model features such as nonlinear interactions between haplotypes.


T1D is highly heritable and tends to manifest at an early age. One application of this predictor is to differentiate between T1D and the more common (in later life) T2D. Another application is to embryo screening. Genomic Prediction has independently validated this predictor on sibling data and may implement it in their embryo biopsy pipeline, which includes tests for aneuploidy, single gene mutations, and polygenic risk.
Development and Standardization of an Improved Type 1 Diabetes Genetic Risk Score for Use in Newborn Screening and Incident Diagnosis

Sharp, et al.
Diabetes Care 2019;42:200–207 | https://doi.org/10.2337/dc18-1785

OBJECTIVE
Previously generated genetic risk scores (GRSs) for type 1 diabetes (T1D) have not captured all known information at non-HLA loci or, particularly, at HLA risk loci. We aimed to more completely incorporate HLA alleles, their interactions, and recently discovered non-HLA loci into an improved T1D GRS (termed the “T1D GRS2”) to better discriminate diabetes subtypes and to predict T1D in newborn screening studies.

RESEARCH DESIGN AND METHODS
In 6,481 case and 9,247 control subjects from the Type 1 Diabetes Genetics Consortium, we analyzed variants associated with T1D both in the HLA region and across the genome. We modeled interactions between variants marking strongly associated HLA haplotypes and generated odds ratios to create the improved GRS, the T1D GRS2. We validated our findings in UK Biobank. We assessed the impact of the T1D GRS2 in newborn screening and diabetes classification and sought to provide a framework for comparison with previous scores.

RESULTS
The T1D GRS2 used 67 single nucleotide polymorphisms (SNPs) and accounted for interactions between 18 HLA DR-DQ haplotype combinations. The T1D GRS2 was highly discriminative for all T1D (area under the curve [AUC] 0.92; P < 0.0001 vs. older scores) and even more discriminative for early-onset T1D (AUC 0.96). In simulated newborn screening, the T1D GRS2 was nearly twice as efficient as HLA genotyping alone and 50% better than current genetic scores in general population T1D prediction.

CONCLUSIONS
An improved T1D GRS, the T1D GRS2, is highly useful for classifying adult incident diabetes type and improving newborn screening. Given the cost-effectiveness of SNP genotyping, this approach has great clinical and research potential in T1D.
The figure below gives some idea as to the ability of the new predictor GRS2 (panels B and D) to differentiate cases vs controls, and T1D vs T2D.
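
For readers unfamiliar with AUC: it is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The scores are invented for illustration and have nothing to do with the T1D GRS2 data.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical risk scores, invented for illustration only.
    cases = [2.1, 1.7, 0.9, 2.8, 1.2]
    controls = [0.3, 1.0, -0.4, 0.8, 1.5, 0.1]
    print(round(auc(cases, controls), 3))
```

An AUC of 0.5 corresponds to a score with no discriminative power, and 1.0 to perfect separation of cases from controls.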

Thursday, May 30, 2019

Manifold Episode #11: Joe Cesario on Police Decision Making and Racial Bias in Deadly Force Decisions



Manifold Show Page    YouTube Channel

Corey and Steve talk with Joe Cesario about his recent work which argues that, contrary to activist claims and media reports, there is no widespread racial bias in police shootings. Joe discusses his analysis of national criminal justice data and his experimental studies with police officers in a specially designed realistic simulator. He maintains that racial bias does exist in other uses of force such as tasering but that the decision to shoot is fundamentally different: it is driven by specific events and context, rather than race.

Cesario is associate professor of Psychology at Michigan State University. He studies social cognition and decision-making. His recent topics of study include police use of deadly force and computational modeling of fast decisions. Cesario is dedicated to reform in the practice, reporting, and publication of psychological science.

Is There Evidence of Racial Disparity in Police Use of Deadly Force? Analyses of Officer-Involved Fatal Shootings in 2015–2016
https://journals.sagepub.com/doi/abs/...

Example of officer completing shooting simulator
https://youtu.be/Le8zoqk-UVo

Overview of Current Research on Officer-Involved Shootings
https://www.cesariolab.com/police

Joseph Cesario Lab
https://www.cesariolab.com/


man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Tuesday, May 28, 2019

NYTimes Op-Ed from the future (Ted Chiang): Genetics and Cognitive Enhancement

In this scenario Ted Chiang forecasts that recipients of government-funded genetic enhancement will not catch up to children of elites who receive similar enhancements. The latter are born to rich, highly educated parents and have access to elite social networks, better schools, etc. The system is still not entirely fair (i.e., invariant to accidents of birth), because many non-genetic advantages still exist. But can we ever achieve equality of outcome? At what cost?

Nevertheless, perhaps the beneficiaries of the Gene Equality Project are at least better off than their siblings who were not in the program?

It is interesting that the Times is already flirting with the idea of redistribution of genetic endowments. See also The Neanderthal Problem.
NYTIMES OP-ED FROM THE FUTURE

It’s 2059, and the Rich Kids Are Still Winning
DNA tweaks won’t fix our problems.

Ted Chiang is an award-winning science fiction writer.

Editors’ note: This is the first installment in a new series, “Op-Eds From the Future,” in which science fiction authors, futurists, philosophers and scientists write op-eds that they imagine we might read 10, 20 or even 100 years in the future.

Last week, The Times published an article about the long-term results of the Gene Equality Project, the philanthropic effort to bring genetic cognitive enhancements to low-income communities. The results were largely disappointing: While most of the children born of the project have now graduated from a four-year college, few attended elite universities and even fewer have found jobs with good salaries or opportunities for advancement. With the results in hand, it is time for us to re-examine the efficacy and desirability of genetic engineering.

The intentions behind the Gene Equality Project were good. Therapeutic genetic interventions, such as correcting the genes that cause cystic fibrosis and Huntington’s disease, have been covered by Medicare ever since their approval by the Food and Drug Administration, making them available to the children of low-income parents. However, augmentations like cognitive enhancements have never been covered — not even by private insurance — and were available only to affluent parents. Amid fears that we were witnessing the creation of a caste system based on genetic differences, the Gene Equality Project was begun 25 years ago, enabling 500 pairs of low-income parents to increase the intelligence of their children.

The project offered a common cognitive-enhancement protocol involving modifications to 80 genes associated with intelligence. Each individual modification had only a small effect on intelligence, but in combination they typically gave a child an I.Q. of 130, putting the child in the top 5 percent of the population. This protocol has become one of the most popular enhancements purchased by affluent parents, and it is often referenced in media profiles of the “New Elite,” the genetically engineered young people who are increasingly prevalent in management positions of corporate America today. Yet the 500 subjects of the Gene Equality Project are not enjoying career success that is remotely comparable to the success of the New Elite, despite having received the same protocol.

A range of explanations has been offered for the project’s results. White supremacist groups have claimed that its failure shows that certain races are incapable of being improved, given that many — although by no means all — of the beneficiaries of the project were people of color. Conspiracy theorists have accused the participating geneticists of malfeasance, claiming that they pursued a secret agenda to withhold genetic enhancements from the lower classes. But these explanations are unnecessary when one realizes the fundamental mistake underlying the Gene Equality Project: Cognitive enhancements are useful only when you live in a society that rewards ability, and the United States isn’t one.

It has long been known that a person’s ZIP code is an excellent predictor of lifetime income, educational success and health. Yet we continue to ignore this because it runs counter to one of the founding myths of this nation: that anyone who is smart and hardworking can get ahead. Our lack of hereditary titles has made it easy for people to dismiss the importance of family wealth and claim that everyone who is successful has earned it. The fact that affluent parents believe that genetic enhancements will improve their children’s prospects is a sign of this: They believe that ability will lead to success because they assume that their own success was a result of their ability.

For those who assume that the New Elite are ascending the corporate ladder purely on the basis of merit, consider that many of them are in leadership positions, but I.Q. has historically had only a weak correlation with effectiveness as a leader. Also consider that genetic height enhancement is frequently purchased by affluent parents, and the tendency to view taller individuals as more capable leaders is well documented. In a society increasingly obsessed with credentials, being genetically engineered is like having an Ivy-League M.B.A.: It is a marker of status that makes a candidate a safe bet for hiring, rather than an indicator of actual competence.

This is not to say that the genes associated with intelligence play no role in creating successful individuals — they absolutely do. They are an essential part of a positive feedback loop: When children demonstrate an aptitude at any activity, we reward them with more resources — equipment, private tutors, encouragement — to develop that aptitude; their genes enable them to translate those resources into improved performance, which we reward with even better resources, and the cycle continues until as adults they achieve exceptional career success. But low-income families living in neighborhoods with underfunded public schools often cannot sustain this feedback loop; the Gene Equality Project didn’t offer any resources besides better genes, and without these additional resources, the full potential of those genes was never realized.

We are indeed witnessing the creation of a caste system, not one based on biological differences in ability, but one that uses biology as a justification to solidify existing class distinctions. It is imperative that we put an end to this, but doing so will take more than free genetic enhancements supplied by a philanthropic foundation. It will require us to address structural inequalities in every aspect of our society, from housing to education to jobs. We won’t solve this by trying to improve people; we’ll only solve it by trying to improve the way we treat people.

This doesn’t necessarily mean that the Gene Equality Project is something that never needs to be repeated. Instead of thinking of it as a cure to an illness, we could think of it as a diagnostic test — something we would conduct at regular intervals to gauge how close we are to reaching our goal. When the beneficiaries of free genetic cognitive enhancements become as successful as the ones whose parents bought the enhancements for them, only then will we have reason to believe that we live in an equitable society.

Finally, let’s recall one of the arguments made during the original debate about legalizing genetic cognitive enhancements. Some proponents claimed that we had an ethical obligation to pursue cognitive enhancements because of the benefits to humanity that would accrue as a result. But there have surely been many geniuses whose world-changing contributions were lost because their potential was crushed by their impoverished surroundings.

Our goal should be to ensure that every individual has the opportunity to reach his or her full potential, no matter the circumstances of birth. That course of action would be just as beneficial to humanity as pursuing genetic cognitive enhancements, and it would do a much better job of fulfilling our ethical obligations.
This is one of the Reader Picks comments:
Mark
Philadelphia May 27

I have mixed feelings about the concept of this article. Surely, private schools confer numerous advantages to their students, who are from wealthy backgrounds and connections to higher education and corporate America.

But, look at Stuyvesant. The super intelligent and successful students are very often from middle class, lower-middle class, and even poor backgrounds. They are often first generation immigrants. They are just smart and hard working and their families care desperately about education.

Some kids are just smart, while others, are just average, or below average. You really think if you went into a school in the South Bronx and donated $1 billion the students would start cranking out perfect SATs?

Ask Zuckerberg how his $50 million donation to Newark public schools went. Darwinism is cruel, but some people aren't just cut out to be good students or white collar professionals.

Much of this has little to do with class and everything to do with drive and innate ability.

Saturday, May 25, 2019

Polygenic Risk Scores

I've collected some recent links related to polygenic risk scores below.

Many experts anticipate large scale clinical use of these scores within the next few years. Research progress has been very rapid -- it will be interesting to see how long it takes for these breakthroughs to be applied in health care. Graph below shows number of papers per year.

See also Harvard Business Review: AI and the Genetic Revolution (podcast).


Guardian article on UK Health Minister's proposal for widespread genetic testing:
"The latest predictive tests for a range of common diseases take a different approach: they aggregate the tiny contributions to risk made by hundreds or even thousands of genes to give a personalised score. Because the risk is spread out over many genes, people can end up at the very high-risk end of the spectrum by chance, without having a family history of a particular illness."

"Prof John Bell, a professor of medicine at Oxford university who led a recent government-commissioned review of the life sciences industry, said the approach could have a “quite profound” effect on the ability to manage disease. ... David Spiegelhalter, professor for the public understanding of risk at the University of Cambridge, agrees that genetic tests could allow the NHS to rapidly identify those who may need closer monitoring."
Editorial in New England Journal of Medicine anticipates broad clinical use of polygenic scores:
Has the Genome Granted Our Wish Yet?

"It is likely that tailoring decisions about prescribing preventive medicines or screening practices will be the main future use of genetic risk scores. If a PRS adds to existing clinical predictors of risk such as the Framingham Risk Score or the Q index for heart disease, it could be incorporated into preventive care as readily as any other biomarker."

"There seems little doubt that interpretation of these scores will become an accepted part of clinical practice in the future..."
Article in The Conversation by two Australian professors of Public Health (discussion refers to both monogenic and polygenic risks, AFAICT):
Population DNA testing for disease risk is coming. Here are five things to know

"As DNA testing becomes cheaper, it becomes more feasible to screen large numbers of healthy people for their risk of disease."

"We modelled the health and economic benefits of offering population DNA screening in Australia, focusing on young adults aged 18-25 years (about 2.6 million Australians). ... At A$200 per test (which could be realistic in the near future), savings in treatment costs could outweigh screening costs, saving the health-care system money and saving lives."
MyHeritage DNA (T2D, Heart Disease and Breast Cancer) joins 23andMe (T2D) in offering polygenic risk scores using common SNPs in their health reports. FDA regulatory stance allows DTC (Direct To Consumer) reports of this type as long as they are provided as information to be discussed with a physician, and not to diagnose a condition or prescribe care.

Someone tweeted me this photo from a recent conference presentation: increase of AUC (predictive power) with sample size. ~100k cases enough to capture most of common SNP heritability for diseases such as Testicular or Breast Cancer?



Figure below from our paper Genomic Prediction of Complex Disease Risk (bioRxiv).

Wednesday, May 22, 2019

Tomaso Poggio on AI, Neuroscience, and Physics



Highly recommended interview with MIT professor Tomaso Poggio, which I listened to recently on a plane. IIRC, I largely agreed with his positions except that I'm a bit more optimistic about AGI. I think his estimate for AGI was 100 or 200 years from now, whereas I think AGI by the end of my lifetime is a distinct possibility.

Poggio (trained in theoretical physics) starts by describing the effect that Special Relativity had on him as a kid. It is a striking realization that pure thought experiments of the kind originally formulated by Einstein can have such far-reaching implications. See Physics as a Strange Attractor:
I suspect that Special Relativity, because it is easy to introduce (no mathematics beyond algebra is required), yet deep and beautiful and counterintuitive, stimulates many people of high ability to become interested in physics.
I notice (perhaps unsurprisingly) a lot of similarities between Poggio's views and those of his former student Demis Hassabis of DeepMind.
Tomaso Poggio is a professor at MIT and is the director of the Center for Brains, Minds, and Machines. Cited over 100,000 times, his work has had a profound impact on our understanding of the nature of intelligence, in both biological neural networks and artificial ones. He has been an advisor to many highly-impactful researchers and entrepreneurs in AI, including Demis Hassabis of DeepMind, Amnon Shashua of MobileEye, and Christof Koch of the Allen Institute for Brain Science. This conversation is part of the Artificial Intelligence podcast and the MIT course 6.S099: Artificial General Intelligence. The conversation and lectures are free and open to everyone. Audio podcast version is available on https://lexfridman.com/ai/

Thursday, May 16, 2019

Manifold Episode 10: Ron Unz on the Subprime Mortgage Crisis, The Unz Review, and the Harvard Admissions Scandal



Ron Unz is the publisher of the Unz Review, a controversial but widely read alternative media site hosting opinion outside of the mainstream, including from both the far right and the far left. Unz studied theoretical physics at Harvard, Cambridge and Stanford. He founded the software company Wall Street Analytics, acquired by Moody’s in 2006, and was behind the 1998 ballot initiative that ended bilingual education in California.

Podcast transcript

The Unz Review

The Myth of American Meritocracy - How corrupt are Ivy League admissions?

The Myth of American Meritocracy and Other Essays



Wednesday, May 08, 2019

Harvard Business Review: AI and the Genetic Revolution (podcast)


Harvard Business Review podcast with Azeem Azhar (Exponential View).
AI and the Genetic Revolution

Michigan State University senior vice president Stephen Hsu, a theoretical physicist and the founder of Genomic Prediction, demonstrates how the machine learning revolution, combined with the dramatic fall in the cost of human genome sequencing, is driving a transformation in our relationship with our genes. Stephen and Azeem Azhar explore how the technology works, what predictions can and cannot yet be made (and why), and the ethical challenges created by this technology.

In this podcast, Azeem and Stephen also discuss:

FDA approval of the first genetic treatment for monogenic conditions and the work towards developing treatments for polygenic conditions like diabetes and cancer.

How this technology might exacerbate existing social inequalities or create new ones; is it just an issue of access, or does it go further?

Developing best practice protocols for governance and regulation of genomic technologies.
In the interview I mention that the number of genomics papers on polygenic risk scores has exploded just in the last year or so:

Tuesday, May 07, 2019

Embryo Screening: Polygenic Traits and Disease Risk

Several people asked me to comment on this paper, which appeared recently on bioRxiv. It seems to be an update of earlier (simulation) analyses by Gwern [16] and Shulman and Bostrom [15] (cited in the paper) on potential gains from embryo selection using quantitative trait predictors (e.g., height, cognitive ability). In the paper the authors analyze real families using actual genetic and phenotype data.

The main limitations given current technology are the number of embryos available from which to select, and the accuracy of the polygenic predictors. The latter will almost certainly improve significantly for some traits in the near future, and for all traits eventually. The number of embryos available for selection may also increase if new methods allow oocytes (eggs) to be produced using stem cell technology (already demonstrated in mice; video).
Screening human embryos for polygenic traits has limited utility
E. Karavani et al.

Genome-wide association studies have led to the development of polygenic score (PS) predictors that explain increasing proportions of the variance in human complex traits. In parallel, progress in preimplantation genetic testing now allows genome-wide genotyping of embryos generated via in vitro fertilization (IVF). Jointly, these developments suggest the possibility of screening embryos for polygenic traits such as height or cognitive function. There are clear ethical, legal, and societal concerns regarding such a procedure, but these cannot be properly discussed in the absence of data on the expected outcomes of screening. Here, we use theory, simulations, and real data to evaluate the potential gain of PS-based embryo selection, defined as the expected difference in trait value between the top-scoring embryo and an average, unselected embryo. We observe that the gain increases very slowly with the number of embryos, but more rapidly with increased variance explained by the PS. Given currently available polygenic predictors and typical IVF yields, the average gain due to selection would be ≈2.5cm if selecting for height, and ≈2.5 IQ (intelligence quotient) points if selecting for cognitive function. These mean values are accompanied by wide confidence intervals; in real data drawn from nuclear families with up to 20 offspring each, we observe that the offspring with the highest PS for height was the tallest only in 25% of the families. We discuss prospects and limitations of PS-based embryo selection for the foreseeable future.
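
The qualitative behavior described in the abstract (slow growth with the number of embryos, faster growth with variance explained) can be reproduced in a toy model. This is a minimal sketch under stated assumptions, not necessarily the model used in the paper: polygenic scores of sibling embryos are taken to be independent normal draws whose variance is half the population variance of the score, and the gain is the expected score of the best of n embryos relative to the family average. The trait standard deviation and r2 values below are hypothetical inputs.

```python
import random


def expected_max_std_normal(n, trials=20000, seed=0):
    """Monte Carlo estimate of E[max of n independent standard normal draws]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / trials


def expected_gain(n_embryos, r2, trait_sd):
    """Expected trait difference between the top-scoring and an average embryo.

    Assumes the score varies among siblings with half its population variance.
    """
    score_sd_within_family = trait_sd * (r2 / 2.0) ** 0.5
    return score_sd_within_family * expected_max_std_normal(n_embryos)


if __name__ == "__main__":
    # Hypothetical inputs: trait SD of 6 (cm), predictors explaining 10-40% of variance.
    for r2 in (0.1, 0.2, 0.4):
        gains = [round(expected_gain(n, r2, 6.0), 2) for n in (2, 5, 10, 20)]
        print(r2, gains)
```

In this toy model the gain scales with the square root of the variance explained, while the dependence on the number of embryos enters only through the expected maximum of n normal draws, which grows slowly with n.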
The authors of the paper seem to define "utility" in terms of expected gain in trait value. However, there is also utility in eliminating very negative outcomes, even if they have small probability. This does not shift the average very much but may still be highly desirable. For example, the odds of my house being destroyed by fire or earthquake in the next decade are small, but the outcome is negative enough that I will act to insure against it. If there is a 1% chance of a $100k house being destroyed, the expected loss is only $1k over the period. But without insurance the outcome might be devastating to a family.
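
The arithmetic in the insurance example can be sketched as follows. The probability and house value come from the paragraph above; the premium is a hypothetical number added only to show the contrast between the mean outcome and the worst case.

```python
def expected_loss(probability, loss):
    """Mean loss from an event with the given probability and cost."""
    return probability * loss


def worst_case_cost(loss, premium=None):
    """Largest possible outlay: the full loss if uninsured, else the premium."""
    return loss if premium is None else premium


if __name__ == "__main__":
    p_destroyed = 0.01        # 1% chance over the period (from the text)
    house_value = 100_000     # $100k house (from the text)
    premium = 1_500           # hypothetical premium, not from the text

    print("expected loss:", expected_loss(p_destroyed, house_value))
    print("worst case, uninsured:", worst_case_cost(house_value))
    print("worst case, insured:", worst_case_cost(house_value, premium))
```

The expected loss is small either way; what insurance changes is the tail. This simplified picture ignores deductibles and partial losses.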

One can compare this to screening for Down Syndrome, which has an incidence of roughly 1% (depending on parental age, etc.) but very serious consequences (see podcast discussion below).

At Genomic Prediction we have focused on screening against disease risk rather than on selection for quantitative traits, for both ethical and practical reasons. Even noisy (imperfect) predictors allow the identification of individuals who are high risk outliers -- e.g., are 5 times more likely to get the disease than a typical person.



When considering disease risk the key metric is not the polygenic score itself, because odds ratios are nonlinear functions of the score (or score percentile). For example (note, this is entirely hypothetical), consider 3 embryos with disease risk percentile scores (e.g., Breast Cancer, Type 1 Diabetes, Atrial Fibrillation, Coronary Artery Disease) given by column:

    #1    33   57   64   51

    #2    62   39   36   49

    #3    26   22   52   99.5

Even though the linear averages of the four risk percentiles for all three embryos are similar (contrived to be near 50), embryo #3 has unusually high risk for one condition (e.g., Coronary Artery Disease), so embryos #1 and #2 might be preferred.
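
The comparison can be sketched in a few lines. The percentiles are the hypothetical values from the table above, with columns read in the order the conditions are listed; the 95th-percentile cutoff for flagging an outlier is an assumption chosen for illustration, not a clinical threshold.

```python
CONDITIONS = [
    "Breast Cancer",
    "Type 1 Diabetes",
    "Atrial Fibrillation",
    "Coronary Artery Disease",
]

# Hypothetical risk percentiles from the table above, one list per embryo.
EMBRYOS = {
    1: [33, 57, 64, 51],
    2: [62, 39, 36, 49],
    3: [26, 22, 52, 99.5],
}


def summarize(percentiles, cutoff=95.0):
    """Return the mean percentile and the conditions at or above the cutoff."""
    mean = sum(percentiles) / len(percentiles)
    flagged = [CONDITIONS[i] for i, p in enumerate(percentiles) if p >= cutoff]
    return mean, flagged


if __name__ == "__main__":
    for label, percentiles in EMBRYOS.items():
        mean, flagged = summarize(percentiles)
        print(f"#{label}: mean percentile {mean:.1f}, flagged: {flagged or 'none'}")
```

All three means land close to 50, but only embryo #3 is flagged, which is the point of the example: averaging percentiles hides the outlier.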

Quantifying the utility to the family from this kind of screening is much more complex than for quantitative traits such as height or cognitive ability.

For more on ethical questions related to genetic engineering and embryo selection, see this podcast discussion with Sam Kerstein, chair of the philosophy department at the University of Maryland.

Friday, May 03, 2019

Janelia (HHMI) talk: Genomic Prediction of Complex Traits and Disease Risks via AI and Large Genomic Datasets



Janelia is the research campus of the Howard Hughes Medical Institute (HHMI), located near Washington DC. It is reputed to be heaven on earth for scientists :-)

I'll be visiting there next week (see title and abstract below). If you're at Janelia and want to meet with me there is still a little space on my schedule. Or just come to the talk and try to grab me afterward.

My talk is Tuesday May 7 12:30 – 1:30.
Genomic Prediction of Complex Traits and Disease Risks via AI and Large Genomic Datasets

Abstract: The talk is divided into two parts. The first gives an overview of the rapidly advancing area of genomic prediction of disease risks using polygenic scores. We can now identify risk outliers (e.g., with 5 or 10 times normal risk) for about 20 common disease conditions, ranging from diabetes to heart diseases to breast cancer, using inexpensive SNP genotypes (i.e., as offered by 23andMe). We can also predict some complex quantitative traits (e.g., adult height with accuracy of few cm, using ~20k SNPs). I discuss application of these results in precision medicine as well as embryo selection in IVF, and give some details concerning genetic architecture. The second part covers the AI/ML used to build these predictors, with an emphasis on "sparse learning" and phase transitions in high dimensional statistics.




Thursday, May 02, 2019

Manifold Episode #9: Philosopher S. Kerstein on the Morality of Genome Engineering



Corey and Steve speak with Samuel Kerstein, Professor of Philosophy and expert in Medical Ethics at the University of Maryland. They discuss the ethics of genome engineering and preimplantation embryo selection, and the inequality and narrowing of human diversity that might result from widespread adoption of these technologies. Among the topics covered: Why genome engineering at this time is immoral. Should we always pick the healthiest embryo? In the future will parents have a moral obligation to engineer their children? Will there be an arms race between countries to engineer their populations? Is Star Trek’s Khan a more advanced person (Steve) or just another smart psychopath (Sam) or both?

Samuel J. Kerstein

How to Treat Persons by Samuel J. Kerstein

CRISPR Babies – Episode #1

Transcript


man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is the founder of a medical diagnostics startup.

Tuesday, April 30, 2019

Dialogs


In a high corner office, overlooking Cambridge and the Harvard campus.
How big a role is deep learning playing right now in building genomic predictors?

So far, not a big one. Other ML methods perform roughly on par with DL. The additive component of variance is largest, and we have compressed sensing theorems showing near-optimal performance for capturing it. There are nonlinear effects, and eventually DL will likely be useful for learning multi-loci features. But at the moment everything is limited by statistical power, and nonlinear features are even harder to detect than additive ones. ...

The bottom line is that with enough statistical power predictors will capture the expected heritability for most traits. Are people in your field ready for this?

Some are, but for others it will be very difficult.
Conference on AI and Genomics / Precision Medicine (Boston).
I enjoyed your talk. I work for [leading AgBio company], but my PhD is in Applied Math. We've been computing Net Merit for bulls using SNPs for a long time. The human genetics people have been lagging...

Caught up now, though. And first derivative (sample size growth rate) is much larger...

Yes. It's funny because sperm is priced by Net Merit and when we or USDA revise models some farmers or breeders get very angry because the value of their bull can change a lot!
A Harvard Square restaurant.
I last saw Roman at the Fellows spring dinner, many years ago. I was back from Yale to see friends. He was drinking, with serious intent. He told me about working with Wilson at Cornell. He also told me an old story about Jeffrey and the Higgs mechanism. Jeffrey almost had it, soon after his work on the Goldstone boson. But Sidney talked him out of it -- something to the effect of "if you can only make sense of it in unitary gauge, it must be an artifact" ... Afterwards, at MIT they would say When push comes to shove, Sidney is wrong. ...

Genomics is in the details now. Lots of work to be done, but conceptually it's clear what to do. I wouldn't say that about AGI. There are still important conceptual breakthroughs that need to be made.
The Dunster House courtyard, overlooking the Charles.
We used to live here, can you let us in to look around?

I remember it all -- the long meals, the tutors, the students, the concerts in the library. Yo Yo Ma and Owen playing together.

A special time, at least for us. But long vanished except in memory.

Wheeler used to say that the past only exists as memory records.

Not very covariant! Why not a single four-manifold that exists all at once?
The Ritz-Carlton.
Flying private is like crack. Once you do it, you can't go back...
It's not like that. They never give you a number. They just tell you that the field house is undergoing a renovation and there's a naming opportunity. Then your kid is on the right list. They've been doing this for a hundred years...

Card had to do the analysis that way. Harvard was paying him...

I went to the session on VC for newbies. Now I realize "valuation" is just BS... Now you see how it really works...

Then Bobby says "What's an LP? I wanna be an LP because you gotta keep them happy."

Let me guess, you want a dataset with a million genomes and FICO scores?

I've helped US companies come to China for 20+ years. At first it was rough. Now if I'm back in the states for a while and return, Shenzhen seems like the Future. The dynamism is here.

To most of Eurasia it just looks like two competing hegemons. Both systems have their pluses and minuses, but it's not an existential problem...

Sure, Huawei is a big threat because they won't put in backdoors for the NSA. Who was tapping Merkel's cellphone? It was us...

Humans are just smart enough to create an AGI, but perhaps not smart enough to create a safe one.

Maybe we should make humans smarter first, so there is a better chance that our successors will look fondly on us. Genetically engineered super-geniuses might have a better chance at implementing Asimov's Laws of Robotics.  
