Saturday, August 17, 2019

Polygenic Architecture and Risk Prediction for 14 Cancers and Schizophrenia

Two recent papers on polygenic risk prediction. As I've emphasized before, these predictors already have real clinical utility but they will get significantly better with more training data.
Assessment of Polygenic Architecture and Risk Prediction based on Common Variants Across Fourteen Cancers

Yan Zhang et al.

We analyzed summary-level data from genome-wide association studies (GWAS) of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) contributing to risk, as well as the distribution of their associated effect sizes. All cancers evaluated showed polygenicity, involving at a minimum thousands of independent susceptibility variants. For some malignancies, particularly chronic lymphoid leukemia (CLL) and testicular cancer, there are a larger proportion of variants with larger effect sizes than those for other cancers. In contrast, most variants for lung and breast cancers have very small associated effect sizes. For different cancer sites, we estimate a wide range of GWAS sample sizes, required to explain 80% of GWAS heritability, varying from 60,000 cases for CLL to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores, compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that polygenic risk scores have substantial potential for risk stratification for relatively common cancers such as breast, prostate and colon, but limited potential for other cancer sites because of modest heritability and lower disease incidence.
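
As a rough illustration of the "relative risk at the 99th percentile compared to average risk" quantity in the abstract, here is a minimal sketch under a simple log-normal model. It assumes the log relative risk conferred by the polygenic score is normally distributed with standard deviation `sigma` and that the disease is rare. The function name and the `sigma` values are hypothetical and are not taken from the paper, whose own model may differ.

```python
from math import exp
from statistics import NormalDist


def relative_risk_at_percentile(sigma: float, percentile: float) -> float:
    """Risk at a given score percentile divided by the population-average risk.

    Assumption (not from the paper): log relative risk ~ Normal(0, sigma^2),
    and the disease is rare enough that risk scales like exp(score).
    """
    z = NormalDist().inv_cdf(percentile)
    # The population average of exp(sigma * Z) is exp(sigma^2 / 2),
    # so subtracting sigma^2 / 2 normalizes to average risk.
    return exp(sigma * z - sigma ** 2 / 2)


# Illustrative sigma values only; these are not estimates for any cancer.
for sigma in (0.25, 0.5, 1.0):
    print(sigma, round(relative_risk_at_percentile(sigma, 0.99), 2))
```

In this toy model a larger spread in log relative risk translates directly into stronger stratification at the tail, which is one way to read why the achievable relative risk differs so much between cancer sites.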

Some people are surprised that a mental disorder might be strongly controlled by genetics -- why? However, it has been known for some time that schizophrenia is highly heritable. I anticipate that good predictors for autism and Alzheimer's disease will be available soon.
Penetrance and Pleiotropy of Polygenic Risk Scores for Schizophrenia in 106,160 Patients Across Four Health Care Systems

Amanda B. Zheutlin et al.

Individuals at high risk for schizophrenia may benefit from early intervention, but few validated risk predictors are available. Genetic profiling is one approach to risk stratification that has been extensively validated in research cohorts. The authors sought to test the utility of this approach in clinical settings and to evaluate the broader health consequences of high genetic risk for schizophrenia.

The authors used electronic health records for 106,160 patients from four health care systems to evaluate the penetrance and pleiotropy of genetic risk for schizophrenia. Polygenic risk scores (PRSs) for schizophrenia were calculated from summary statistics and tested for association with 1,359 disease categories, including schizophrenia and psychosis, in phenome-wide association studies. Effects were combined through meta-analysis across sites.

PRSs were robustly associated with schizophrenia (odds ratio per standard deviation increase in PRS, 1.55; 95% CI=1.4, 1.7), and patients in the highest risk decile of the PRS distribution had up to 4.6-fold higher odds of schizophrenia compared with those in the bottom decile (95% CI=2.9, 7.3). PRSs were also positively associated with other phenotypes, including anxiety, mood, substance use, neurological, and personality disorders, as well as suicidal behavior, memory loss, and urinary syndromes; they were inversely related to obesity.

The study demonstrates that an available measure of genetic risk for schizophrenia is robustly associated with schizophrenia in health care settings and has pleiotropic effects on related psychiatric disorders as well as other medical syndromes. The results provide an initial indication of the opportunities and limitations that may arise with the future application of PRS testing in health care systems.
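
The two headline numbers in the abstract (odds ratio of 1.55 per standard deviation, and up to 4.6-fold higher odds in the top decile versus the bottom decile) can be related by a back-of-the-envelope calculation. The sketch below assumes the PRS is normally distributed and that log-odds are linear in the PRS, and it compares the mean score of the two deciles rather than averaging odds within each decile. These are simplifying assumptions of mine, not the authors' method, and the function name is hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def top_vs_bottom_odds_ratio(or_per_sd: float, fraction: float = 0.10) -> float:
    """Approximate odds ratio between the top and bottom `fraction` of a
    normally distributed score, given the odds ratio per standard deviation."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - fraction)
    # Mean of a standard normal above `cutoff` is pdf(cutoff) / tail fraction.
    mean_top = std_normal.pdf(cutoff) / fraction
    # By symmetry the bottom group has mean -mean_top, so the gap is 2 * mean_top.
    return exp(log(or_per_sd) * 2.0 * mean_top)


print(round(top_vs_bottom_odds_ratio(1.55), 2))
```

Under these assumptions the result lands in the neighborhood of the reported 4.6-fold figure, which suggests the decile comparison is roughly what one would expect from the per-SD effect alone.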

Thursday, August 15, 2019

Bruno Maçães on The Power Game in a Connected World

Bruno Maçães in Singapore at IRAHSS Geopolitics Reimagined, 22 July 2019.

Maçães is author of Belt and Road - A Chinese World Order and former Europe Minister of Portugal. He discusses the trade war, his recent visit to a Huawei factory, and the idea of hybrid warfare or weaponized interdependence.

I met with Bruno in Beijing last month. He is among the most insightful geopolitical thinkers today.
I was shown the assembly line for the P30 smartphone [~$1k flagship using Huawei chipset] and told that this assembly line, just two or three years ago, was operated by 140 people. It is now down to 17; by the end of this year we'll be down to 15. It's a very long assembly line, perhaps 200, 250 meters, and takes about 30 minutes. More important than the time it takes to assemble a P30 is the time between each unit, and that's now down to 29 seconds, so every 29 seconds a fully produced P30 comes out at the end. 17 people now operate this assembly line, but the remarkable thing is that I actually looked very carefully at what the 17 were doing, and it's very obvious they're not doing anything of significance. They're left there more in order to keep a certain control over the process...

This is not a new Cold War, and I see no indications that we're moving in that direction. China and the United States continue to be turned towards each other, continue to be very interested in learning from each other, and I think this is an important point: their way of life, their ideology, the way they look at the world is not predicated on a negation of the other side. The Soviet Union was from the very start a revolutionary movement whose whole identity was the negation of the capitalist Western way of life and organizing society. Now, China and the United States in a way are much less connected; they are not part of the same history, and their dispute is not a dispute about who is fundamentally right about questions that involved both...

They're not necessarily involved in a death and life struggle between them. The world we live in is, I'll sum it up this way, a world where (and this is I think the puzzling element of it) we are neither at war nor at peace. We are somewhere in the middle. Conflict takes place below the threshold of kinetic war and other forms of direct confrontation, but it is no less intense because of that...

The tactics might include the purchase of infrastructure in other states, the corruption or blackmail of foreign officials (important elements of this new world that is not often talked about) [CALLING EPSTEIN AND GHISLAINE MAXWELL], manipulation of energy flows or energy prices. All of these elements are magnified in an integrated global economy. The networks that bring us together are used as tools or instruments of conflict...
More Bruno, on the Belt and Road initiative.

Wednesday, August 14, 2019

Epstein and the Big Lie

The biggest Epstein conspiracy mystery is not how he died. The more important mystery is how he managed to operate out in the open for 15-20 years. Rumors concerning Epstein and leading figures like Bill Clinton have been around for at least that long. I have been following his activities, at least casually, for well over a decade.

In the 1990s I was a Bill Clinton supporter. I voted for him twice and supported his efforts to move the Democratic Party in a centrist, pro-business direction. But my brother is a Republican. He fed me a steady stream of anti-Clinton information that I (at the time) dismissed as crazy right-wing conspiracy theories. However, with the advent of the internet in the late 90s it became easier to obtain information that was not filtered by corrupt mainstream media outlets. I gradually realized that at least some of my brother's claims were correct. For example, Clinton's first presidential bid was almost derailed by charges of adultery by women like Gennifer Flowers. Supporters like myself dismissed these charges as a right-wing smear. However, years later, Clinton admitted under oath that he had indeed had sex with Flowers.

My first exposure to Hillary Clinton was her appearance on 60 Minutes after the Super Bowl in 1992. This was widely regarded to be the emotive performance ("stand by your man") that saved Bill Clinton's presidential candidacy. Hillary affects a fake southern accent and (I believe) lies boldly and convincingly about Flowers to an estimated 50 million Americans. Quite a display of talent.

A side-effect of my history as a Clinton supporter (and gradual enlightenment thanks to my brother!) is that I became quite interested in the tendency of the media to hide obvious truths from the general public. We Americans accept that foreign governments (e.g., the Soviets and "ChiComs") successfully brainwash their people to believe all sorts of crazy and false things. But we can't accept that the same might be true here. (The big difference is that people in the PRC and former Soviet states -- especially intellectuals -- know propaganda when they see it, whereas most Americans do not...)

It was natural for me to become aware of Epstein once he was linked to Bill Clinton at the very birth of the Clinton Foundation. It was easy to uncover very disturbing aspects of the Epstein story -- including details of his private island, traffic in young women, connections to the rich, the powerful, and even to leading scientists and academics (many of whom I know), and Harvard University. Almost anyone with access to the internet (let alone an actual journalist) could have discovered these things at any point in the last decade.

But just 6 months ago I could mention Epstein to highly educated "politically aware" acquaintances with absolutely no recognition on their part.

Some obvious, and still unanswered, questions:

Former Federal prosecutor and Labor Secretary R. Alexander Acosta said he was told to lay off Epstein, as he "belongs to intelligence" -- why no media followup on this? (Still don't believe in a Deep State?)

Clinton said he only flew on Epstein's plane 4 times (but 26 is also commonly reported) and never visited the island (despite many eyewitness claims to the contrary). No investigative reporting on this by mainstream media?

Epstein's partner Ghislaine Maxwell is the daughter of Robert Maxwell, a billionaire with possible Mossad connections. What were Epstein's links to Israeli intelligence and national interests? (Robert Maxwell's death is at least as mysterious as Epstein's ...)

Why did it take the FBI so long to get to Epstein's island? What have they found in Epstein's house and on his island? How much blackmail material is there and who is implicated?

Were it not for the possibility that the Epstein scandal might be damaging to Trump, would there be anything close to this level of mainstream media interest?

Why was there almost zero interest in Epstein in the previous 15-20 years?

Someone was protecting Epstein (someone with influence on the DOJ, FBI, perhaps US intelligence) long before Donald Trump had political power of any kind. Why?

What other obvious scandals are hidden in plain sight? Iraq WMD? Spygate? Compromised politicians and national leaders? Blackmail by national intelligence services? Ideology-driven Social Media and Search filtering of information? Ivy League discrimination against Asian Americans? ...

Thursday, August 08, 2019

Manifold Episode #16: John Schulman of OpenAI

John Schulman is a research scientist at OpenAI. He co-leads the Reinforcement Learning group and works on agent learning in virtual game worlds (e.g., Dota) as well as in robotics. John, Corey, and Steve talk about AI, AGI (Artificial General Intelligence), the Singularity (self-reinforcing advances in AI which lead to runaway behavior that is incomprehensible to humans), and the creation and goals of OpenAI. They discuss recent advances in language models (GPT-2) and whether these results raise doubts about the usefulness of linguistic research over the past 60 years. Does GPT-2 imply that neural networks trained using large amounts of human-generated text can encode "common sense" knowledge about the world? They also discuss what humans are better at than current AI systems, and near term examples of what is already feasible: for example, using AI drones to kill people.

John Schulman


Better Language Models and Their Implications (GPT-2)

Transcript of show

man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Sporting my OpenAI t-shirt. Wish I had worn this at Number 10 Downing Street earlier this week ;-)

Friday, August 02, 2019

Different Class Altogether

BBC Radio 4 profile of Dominic Cummings. (Sorry, didn't see any embed code.)

Some interesting comments from Dom's Oxford tutor, Robin James Lane Fox: (@4m50s)
He was extremely sharp, very sure of his own abilities, but had every reason to be... not narrow minded in any way...

BBC: Who is cleverer, Boris Johnson or Dominic Cummings?

Oh Dominic, by a long way.

BBC: A long way?

Different class altogether.

Robin James Lane Fox (Wikipedia), FRSL (born 5 October 1946) is an English classicist, ancient historian and gardening writer known for his works on Alexander the Great. Lane Fox is an Emeritus Fellow of New College, Oxford and Reader in Ancient History, University of Oxford. Fellow and Tutor in Ancient History at New College from 1977 to 2014...
See The Differences are Enormous, Creators and Rulers, and The Gulf is Deep.

How Brexit was won, and the unreasonable effectiveness of physicists.

Wednesday, July 31, 2019

Jack Kirby Centennial Lecture

The kind of deep and heartfelt tribute only a lifetime fan (fanatic) can deliver. Very insightful history of the greatest American comic book artist.

See also I Love Jack Kirby.

Saturday, July 27, 2019

Brainpower Matters: The French H-Bomb

Michel Carayol, father of the French H-Bomb.

The article below illuminates several mysteries concerning the French development of thermonuclear weapons. Why did it take so long? Did the French really need help from the British? Who had the crucial idea of radiation compression?

The original inventors were Ulam and Teller. In the USSR it was Sakharov. The PRC inventor was Yu Min (see Note Added at bottom).

Without men such as these, how long would it have taken to develop breakthrough technologies that defined the modern age?

See also Les Grandes Ecoles, One hundred thousand brains, and Quantum GDP.


Nonproliferation Review 15:2 353, DOI 10.1080/10736700802117361

Based on the first-person account of coauthor Pierre Billaud, a prominent French participant, this article describes for the first time in such detail the history of the development of the French hydrogen bomb in the 1960s and the organization of military nuclear research in France. ...
On November 1, 1952, the United States conducted its first thermonuclear test, “Ivy Mike,” seven years and three and a half months after its Trinity test. It took the Soviet Union four years (August 29, 1949 -- August 12, 1953) and the United Kingdom four years and seven months (October 3, 1952 -- May 15, 1957) to achieve thermonuclear capacity. And in the following decade, China did it, with its sixth test, in fewer than three years (October 16, 1964 -- June 17, 1967). Yet after Gerboise Bleue it took France eight and a half years to reach the same landmark, detonating its first thermonuclear device on August 24, 1968. Why such a long delay, especially since the French were pioneers in nuclear research?

1965: What We Knew About the Technical Aspects

From 1955 to 1960, as we prepared for the first French atomic test, we were also pondering thermonuclear weapons. But the prospect of hydrogen weapons seemed so far into the future that we did not work seriously on it. ... Li6D was commonly considered the best fuel for thermonuclear weapons, but we did not have any idea about how to burn it. All the problems with the thermonuclear bomb can be summarized by this question: how to discover the process that will allow the Li6D to undergo a fusion reaction?

... Compared to our American colleagues in 1948, French scientists had many advantages: we knew that hydrogen bombs existed and worked and that they used Li6D, and we understood the reactions at work. We also had powerful computers, of U.S. origin, which were not available in the late 1940s. And we knew, more or less, the dimensions and weights of the nuclear weapons deployed at NATO bases in Europe and their yields. ...

De Gaulle: It’s taking forever! ... I want the first experiment to take place before I leave! Do you hear me? It’s of capital importance. Of the five nuclear powers, are we going to be the only one which hasn’t made it to the thermonuclear level? Are we going to let the Chinese get ahead of us? If we do not succeed while I am still here, we shall never make it! My successors, from whatever side, will not dare to go against the protests of the Anglo-Saxons, the communists, the old spinsters and the Church. And we shall not open the gate. But if a first explosion happens, my successors will not dare to stop halfway into the development of these weapons.

... In January 1967, I published a voluminous report wherein I presented and developed my idea from late 1965, left idle since, explaining why the current studies were going in the wrong direction and producing a ridiculously low thermonuclear efficiency. I proposed a scheme with two consecutive steps: a cold Li6D compression increasing the density, from the normal value of 0.8 g/cm3, by a factor of at least 20, followed by a sufficient temperature increase (the ignition). In this report, I also gave orders of magnitude of the energies involved in each step... [[ One can make the (flawed) analogy of Billaud to Ulam (multi-stage insight, but no mechanism for compression), and Carayol to Teller (proposed the right mechanism for compression, although in Teller's case he may have learned of it from von Neumann and Fuchs!!!). ]] 
In early April 1967, Carayol had the idea that the x-rays emitted from the fission explosion could transport the fission energy to the thermonuclear fuel chamber to induce the necessary compression. He published a brief paper wherein he presented, and justified mathematically, his architectural idea. This was the key to the solution for an efficient thermonuclear explosive device, consistent with the current data about U.S. hydrogen weapons. Carayol had rediscovered the radiative coupling concept first introduced by Americans Stanislaw Ulam and Edward Teller in January 1951.

Michel Carayol, the Genuine Father of the French H-Bomb

Michel Carayol was born in 1934 and died in 2003. His father was an industrialist and his mother a teacher. He entered Ecole Polytechnique in 1954, graduated in 1956, and joined the Armament. In 1962, he was part of the DEFA assigned to CEA-DAM at Limeil. In 1967, Carayol was part of the advanced studies branch.

... Soon after, in April 1967, Carayol wrote a brief report describing his proposal for a cylindrico-spherical case in dense metal, containing a fission device on one side and a thermonuclear sphere on the other. The report showed that the photons radiated by the primary *still very hot* in the X-ray frequency range, swept into the chamber rapidly enough to surround completely the thermonuclear sphere before the metal case would be vaporized. Carayol had discovered independently a scheme equivalent to the concept developed by Ulam and Teller in the 50s.
But Carayol's insight was ignored! It was British assistance that alerted project leadership to the value of Carayol's ideas. It is not enough for some isolated genius to make a breakthrough -- the people in charge have to understand its value.
... During the first months of 1967, Viard had told me, “A British physicist is showing some interest in what we do.” At several embassy parties, a first-rate British atomic scientist, Sir William Cook, former director during the 1950s of thermonuclear research at Aldermaston, the British center for atomic military applications, had approached the military attaché at the French Embassy in London, André Thoulouze, an Air Force colonel, and had hinted to our nuclear research program. Thoulouze had previously been in charge of an air force base and knew René David, who would later work at the DAM. For this reason, instead of contacting the French main intelligence services, Thoulouze directly contacted our information bureau at CEA, the BRIS, where David was working at the time. In analyzing the fallout from the French tests, the Americans, the British, and the Soviets knew that we had not made any real progress on the thermonuclear path. In 1966 and 1967 we had tested some combination of fission with light elements. Cook told Thoulouze that we had to look for something simpler.

Two weeks after the Valduc seminar, on September 19, and while the work resulting from the Valduc decisions had not yet concretely gotten under way, Thoulouze came from London bearing information from this qualified source. Jacques Robert immediately convened a meeting, in the DAM’s headquarters in Paris, to debrief this information. Only three other people attended the meeting: Viard, Bonnet (DAM’s deputy), and Henri Coleau (head of the BRIS). The information, very brief and of a purely technical nature, did not consist of outlines or precise calculations. Nevertheless, it allowed Bonnet to declare immediately that the Carayol design, proposed unsuccessfully as early as April 1967, could be labeled as correct. Had this outline not already been in existence, we would have had a difficult time understanding the information and might have suspected an attempt to mislead us. In fact, this was a reciprocal validation: Carayol’s sketch authenticated the seriousness of the source, while the latter confirmed the value of Carayol’s ideas. Without realizing it, as very few were aware of Carayol’s discovery (and surely not Cook), he had given us a big tip and unexpected assistance, as this information also freed us from the ministerial harassment to which we had been constantly subjected. From that moment, things moved briskly.
Encyclopedia Britannica:
Physicist Michel Carayol laid out what would be the fundamental idea of radiation implosion in an April 1967 paper, but neither he nor his colleagues were immediately convinced that it was the solution, and the search continued.

In late September 1967, Carayol’s ideas were validated by an unlikely source, William Cook, who had overseen the British thermonuclear program in the mid-1950s. Cook, no doubt at his government’s behest, verbally passed on the crucial information to the French embassy’s military attaché in London. Presumably, the British provided this information for political reasons. British Prime Minister Harold Wilson was lobbying for the entry of the United Kingdom into the Common Market (European Economic Community), which was being blocked by de Gaulle.

Sakharov sketch:

Note Added: Perhaps someone can translate part of this paper, which gives some details about the Chinese thermonuclear step, credit to Yu Min. Did they invent a mechanism different from Ulam-Teller? I can't tell from this paper, but I suspect the initial Chinese design used U-T. There are claims that Yu Min later developed, in the pursuit of miniaturization and improved safety, a qualitatively different design.

Yu Min was a student of Peng Huanwu (also a key figure in the bomb effort), who was a student of Max Born. Yu Min only recently passed, in early 2019!

Friday, July 26, 2019

RadioLab on embryo selection in IVF

I'm in this RadioLab podcast covering genetic selection of embryos in IVF. Apologies to SSGAC, Robert Plomin, Ian Deary, James Lee, Tom Bouchard, and countless other dedicated scientists for the impression given that progress in genomics of cognitive ability is largely my work. See last paragraph below.

This is the email I sent to RadioLab this morning:
Hi Pat and Michelle,

Congratulations on a high quality podcast. I thought you were admirably fair and balanced. I also thought the production (esp. the music) was excellent.

My main comment is that the juxtaposition between my remarks and Benjamin's is misleading: when he says 60-40 or 55% chance of rank ordering properly, that is a very different question than identifying an outlier who is, say, among the 1% at highest risk. We are not trying to rank order embryos, but to warn against unusual risk of a medical condition.

To use the SAT analogy, given two kids with scores 1250 and 1200, only some of the time does the 1250 kid end up with a higher GPA. (You can't predict rank order very well.) But if the engineering dean admits an SAT 770 kid (i.e., a negative outlier compared to the average score of, say, 1300 among engineers) in his freshman class, he knows the likelihood is high that the kid will struggle. Benjamin is talking about the first scenario, and I am talking about the second.

Finally, I realize that to hook listeners you had to make me the focus of the episode. But I want to make clear that many scientists contribute to this work, which I feel will ultimately be beneficial to our species and civilization. I am just a small part of a worldwide research endeavor.

Best wishes,
For more on recent progress in genomic prediction, see The Diffusion of Knowledge.
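
The distinction drawn in the email above, between rank ordering two similar individuals and flagging an outlier, can be made concrete with a toy bivariate-normal model. The sketch below assumes the predictor and the outcome are jointly normal with correlation `r`. The value `r = 0.3`, the percentile, and the outcome threshold are hypothetical choices for illustration and are not figures from the podcast or from any study.

```python
from math import asin, pi, sqrt
from statistics import NormalDist


def pairwise_concordance(r: float) -> float:
    """Probability that, of two random individuals, the one with the higher
    predictor also has the higher outcome (bivariate normal, correlation r)."""
    return 0.5 + asin(r) / pi


def outlier_enrichment(r: float, predictor_percentile: float = 0.99,
                       outcome_threshold_z: float = 2.0) -> float:
    """Chance of an extreme outcome for someone sitting at the given predictor
    percentile, divided by the chance for a random member of the population."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(predictor_percentile)
    # Conditional on the predictor, the outcome is Normal(r * z, 1 - r^2).
    conditional = NormalDist(mu=r * z, sigma=sqrt(1.0 - r * r))
    p_given_outlier = 1.0 - conditional.cdf(outcome_threshold_z)
    p_baseline = 1.0 - std_normal.cdf(outcome_threshold_z)
    return p_given_outlier / p_baseline


r = 0.3  # hypothetical correlation, chosen only for illustration
print("pairwise concordance:", round(pairwise_concordance(r), 3))
print("risk multiple for a 99th-percentile outlier:", round(outlier_enrichment(r), 2))
```

With a modest correlation, pairwise ranking is only a little better than a coin flip, while someone far out in the tail of the predictor still carries a several-fold elevated chance of an extreme outcome. That is the sense in which the two questions are different.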

Thursday, July 25, 2019

Manifold #15: Daniel Max of The New Yorker on Prion diseases and literary non-fiction

Daniel Max, staff writer at The New Yorker and author of Every Love Story Is a Ghost Story, a biography of David Foster Wallace, speaks with Corey and Steve about his first book, The Family that Couldn’t Sleep. The discussion covers the emerging genre of literary non-fiction, Daniel’s process of writing The Family that Couldn’t Sleep, and how he approached and gained the trust of the family at the heart of the story. Corey probes Daniel about how he handled the complex scientific characters, Carl Gajdusek and Stanley Prusiner, who led research into prion disease for 40 years. Daniel recounts how Shirley Glasse (now Lindenbaum) discovered how prions were transmitted through ritual cannibalism in Papua New Guinea, a critical step in solving the mystery of what causes the disease, but how credit was given to Gajdusek. The three discuss the painfully slow pace of research and the inspiring story of a young couple, Eric Minikel and Sonia Vallabh, who have changed careers to dedicate their lives to finding a cure.

Max’s New Yorker Page

Max’s initial 2001 article for the New York Times Magazine on the Italian Family with FFI

Max’s 2013 New Yorker story on Minikel and Vallabh


Wednesday, July 24, 2019

Dominic Cummings "de facto chief executive" for UK Prime Minister Boris Johnson

Dominic Cummings sporting an OpenAI shirt. Great messaging! Go Dom :-)

Why Dominic Cummings is Johnson’s most important appointment (Spectator)

The closest analogy to the government Boris Johnson is forming is Blair’s and Brown’s New Labour government of 1997, when they appointed super powerful political advisers – Campbell, Powell, Balls, Whelan – to boss conservative Whitehall.

That is what Johnson is doing – in spades – by making former Vote Leave campaign chief Dominic Cummings his de facto chief executive as senior advisor, because Cummings is NEVER a passive adviser. Cummings has an extraordinary sense of purpose and objectives – and pity those who get in his path.

Cummings’s mandate is to deliver Brexit in 99 days, and in his spare time he’ll endeavour to reform Whitehall, since one of his obsessions is that the civil service is unfit for modern government. Sir Humphrey will be anxious, but so too will ministers and many Tory MPs, including Brexiters, who still nurse bruises from their encounters with him when he ran Vote Leave and earlier when he was an adviser to Michael Gove.

As proof that Johnson is placing serious trust in Cummings is that so many of Cummings’s Vote Leave team are moving in to Downing Street: Lee Cain as director of communications, Rob Oxley as press secretary and Oliver Lewis as a Brexit policy adviser.

Saturday, July 20, 2019

The diffusion of knowledge

Szilard and Wigner told Einstein about their recent calculations... how the fission process might create chain reactions and nuclear bombs. "Daran habe ich gar nicht gedacht," said Einstein -- I did not think about that at all!
In the past two weeks I gave talks at ISIR2019 (Minneapolis), the Institute of Biomedical Sciences (Academia Sinica, Taipei -- home of the Taiwan biobank), Innovative Genomics Institute (IGI = CRISPR central, UC Berkeley and UCSF) and at OpenAI (AGI in San Francisco).
Title: Genomic Prediction of Complex Traits and Disease Risks via AI/ML and Large Genomic Datasets

Abstract: The talk is divided into two parts. The first gives an overview of the rapidly advancing area of genomic prediction of disease risks using polygenic scores. We can now identify risk outliers (e.g., with 5 or 10 times normal risk) for about 20 common disease conditions, ranging from diabetes to heart diseases to breast cancer, using inexpensive SNP genotypes (i.e., as offered by 23andMe). We can also predict some complex quantitative traits (e.g., adult height with accuracy of a few cm, using ~20k SNPs). I discuss application of these results in precision medicine as well as embryo selection in IVF, and give some details about genetic architectures. The second part covers the AI/ML used to build these predictors, with an emphasis on "sparse learning" and phase transitions in high dimensional statistics.
Slides for the first part of the talk.
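
For readers unfamiliar with polygenic scores, here is a minimal sketch of the basic idea: a score is a weighted sum of SNP allele counts, and "risk outliers" are individuals far out in the tail of the score distribution. Everything in it is an illustrative assumption rather than the method used in the talk: the function names, the toy sample sizes, the simulated genotypes, the made-up effect sizes, and the z-score cutoff.

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0, 1 or 2 per SNP): one score per person."""
    return genotypes @ weights

def risk_outliers(scores, z_cutoff=2.0):
    """Indices of individuals whose standardized score exceeds z_cutoff."""
    z = (scores - scores.mean()) / scores.std()
    return np.flatnonzero(z > z_cutoff)

# Toy data only: simulated genotypes and made-up effect sizes.
rng = np.random.default_rng(0)
n_people, n_snps = 1000, 200
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
weights = rng.normal(0.0, 0.05, size=n_snps)

scores = polygenic_score(genotypes, weights)
outliers = risk_outliers(scores)
print(f"{outliers.size} of {n_people} simulated individuals flagged as high-score outliers")
```

In a real predictor the weights would come from training on large genotyped cohorts, and translating a score percentile into relative risk requires calibration against observed case rates.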

I also appeared on Dilbert creator Scott Adams's show.

Wednesday, July 17, 2019

Beijing 2019 Notes -- addendum

I just came across this beautiful video with 4k drone footage of Guangzhou, part of the Guangdong-Hong Kong-Macau Greater Bay Area in the Pearl River delta region.

In my earlier post on Beijing I emphasized the issue of scale in China -- massive scale that is evident in the video above.

I traveled in SE Asia before the 1997 currency / economic crisis. At that time there was plenty of evidence of a bubble in those countries -- unused infrastructure and real estate built on spec, few signs of real technological or productive capability, etc. China had aspects of that 10 years ago, but now it's apparent that earlier infrastructure investment is being put to good use.

As I walked around Beijing I strained to find things around me -- buildings, solar panels, batteries, cars, high speed trains, electronics, software infrastructure, even airplanes -- that couldn't be sourced in China. Other than a few specific tech stacks that will get serious attention in coming years (e.g., CPUs) I was not able to think of many areas in which China has not caught up technologically. See Can the US derail China 2025?

PS I'm back in the US now. Will be giving a talk today at IGI in Berkeley and at OpenAI on Thursday.

Thursday, July 11, 2019

Manifold Episode #14: Stuart Firestein on Why Ignorance and Failure Lead to Scientific Progress

Steve and Corey speak with Stuart Firestein (Professor of Neuroscience at Columbia University, specializing in the olfactory system) about his two books Ignorance: How It Drives Science, and Failure: Why Science Is So Successful. Stuart explains why he thinks that it is a mistake to believe that scientists make discoveries by following the “scientific method” and what he sees as the real relationship between science and art. We discuss Stuart’s recent research showing that current models of olfactory processing are wrong, while Steve delves into the puzzling infinities in calculations that led to the development of quantum electrodynamics. Stuart also makes the case that the theory of intelligent design is more intelligent than most scientists give it credit for and that it would be wise to teach it in science classes.

Stuart Firestein

Failure: Why Science Is So Successful

Ignorance: How It Drives Science


man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Wednesday, July 03, 2019

Beijing 2019 Notes

I'm at Beijing University in Zhongguancun. Some brief notes and photos below.

I had meetings with Beida professors, prominent tech entrepreneurs and VCs, policy analysts, IVF doctors and genetic scientists. I also had conversations with ordinary people -- drivers, maids, hotel and service staff.

I've been traveling to Beijing for about 15 years now and have observed significant improvements in infrastructure, general economic level, civil society, general behavior. This would of course be obvious to people living in China, which presumably explains the confidence people here have in their government and in continued advances in development. The hypothesis that this society is "brittle" or vulnerable to shocks seems unsupported.

The main thing to comprehend about China is scale. There are easily ~350M (i.e., population of US) people here living roughly first world lives: with access to education, good jobs, climate controlled apartment in major city, good public transportation, fast internet access, etc. Probably the number is twice as large depending on how one defines the category. For one thing, this means that the supply of engineers, technologists, lab scientists, project managers, entrepreneurs, etc. is very large. There are certainly poor people who lack opportunity, but the size of the population for which the education and economic system are working reasonably well is very large. Possibly a billion people out of ~1.4B.

Beijing is a microcosm of this phenomenon of scale. It's a huge city (over 20M people) with the kind of modern metro system only to be found in places like Tokyo or perhaps Seoul or Paris or London. One can ride the longer lines for 90 minutes without exiting, covering the entire extent of the city from one side to the other. Despite the public transport system, the roads are clogged with recent model cars, producing traffic conditions reminiscent of Los Angeles. I don't find the city as a whole all that livable -- it's too enormous for me -- but locals know all the many charming locations (see photos below). Beijing is reaching a level of development that reminds me of Tokyo.

Trump, the trade war, and US-China relations came up frequently in discussion. Chinese opinion tends to focus on the long term. Our driver for a day trip to the Great Wall was an older man from the countryside, who has lived only 3 years in Beijing. I was surprised to hear him expressing a very balanced opinion about the situation. He understood Trump's position remarkably well -- China has done very well trading with the US, and owes much of its technological and scientific development to the West. A recalibration is in order, and it is natural for Trump to negotiate in the interest of US workers.

China's economy is less and less export-dependent, and domestic drivers of growth seem easy to identify. For example, there is still a lot of low-hanging fruit in the form of "catch up growth" -- but now this means not just catching up with the outside developed world, but Tier 2 and Tier 3 cities catching up with Tier 1 cities like Beijing, Shanghai, Shenzhen, etc.

China watchers have noted the rapidly increasing government and private sector debt necessary to drive growth here. Perhaps this portends a future crisis. However, I didn't get any sense of impending doom for the Chinese economy. To be fair there was very little inkling of what would happen to the US economy in 2007-8.  Some of the people I met with are highly placed with special knowledge -- they are among the most likely to be aware of problems. Overall I had the impression of normalcy and quiet confidence, but perhaps this would have been different in an export/manufacturing hub like Shenzhen. [ Update: Today after posting this I did hear something about economic concerns... So situation is unclear. ]

Innovation is everywhere here. Perhaps the most obvious is the high level of convenience from the use of e-payment and delivery services. You can pay for everything using your mobile (increasingly, using just your face!), and you can have food and other items (think Amazon on steroids) delivered quickly to your apartment. Even museum admissions can be handled via QR code.

A highly placed technologist told me that in fields like AI or computer science, Chinese researchers and engineers have access to in-depth local discussions of important arXiv papers -- think StackOverflow in Mandarin. Since most researchers here can read English, they have access both to Western advances, and a Chinese language reservoir of knowledge and analysis. He anticipates that eventually the pace and depth of engineering implementation here will be unequaled.

IVF and genetic testing are huge businesses in China. Perhaps I'll comment more on this in the future. New technologies, in genomics as in other areas, tend to be received more positively here than in the US and Europe.

National Museum

Bookstore and Cafe on the grounds of the National Art Museum.

Tiananmen Square (see below for historical note)

An email sent to Julian Assange's attorney, whom I met at CogX in London:
Hi Jen,

I really enjoyed your Q&A today. Keep fighting the good fight.

Wikileaks diplomatic cables reveal no mass shootings in Tiananmen Square:

Our media has been misrepresenting this historical event for 30 years
now. There was certainly violence, but not in the square itself.

Best wishes,
See comments for further discussion...

Note Added: In the comments AG points to a Quora post by a user called Janus Dongye Qimeng, an AI researcher in Cambridge UK, who seems to be a real China expert. I found these posts to be very interesting.

Infrastructure development in poor regions of China

Size of Chinese internet social network platforms

Can the US derail China 2025? (Core technology stacks in and outside China)

Huawei smartphone technology stack and impact of US entity list interdiction (software and hardware!)

Agriculture at Massive Scale

US-China AI competition

More recommendations: Bruno Maçães is one of my favorite modern geopolitical thinkers. A Straussian of sorts (PhD under Harvey Mansfield at Harvard), he was Secretary of State for European Affairs in Portugal, and has thought deeply about the future of Eurasia and of US-China relations. He spent the last year in Beijing and I was eager to meet with him while here. His recent essay Equilibrium Americanum appeared in the Berlin Policy Journal. Podcast interview -- we hope to have him on Manifold soon :-)

Thursday, June 27, 2019

Manifold Podcast #13: Joe Cesario on Political Bias and Problematic Research Methods in Social Psychology

Corey and Steve continue their discussion with Joe Cesario and examine methodological biases in the design and conduct of experiments in social psychology and ideological bias in the interpretation of the findings. Joe argues that experiments in his field are designed to be simple, but that in making experimental setups simple, researchers remove critical factors that actually matter for a police officer making a decision in the real world. In consequence, he argues that the results cannot be taken to show anything about actual police behavior. Joe maintains that social psychology as a whole is biased toward the left politically and that this affects how courses are taught and research conducted. Steve points out that university faculty on the whole tend to be shifted left relative to the general population. Joe, Corey, and Steve discuss the current ideological situation on campus and how it can be alienating for students from conservative backgrounds.

Joseph Cesario's Lab



Monday, June 24, 2019

Ulam on von Neumann, Godel, and Einstein

Ulam expresses so much in a few sentences! From his memoir, Adventures of a Mathematician. Above: Einstein and Godel. Bottom: von Neumann, Feynman, Ulam.
When it came to other scientists, the person for whom he [vN] had a deep admiration was Kurt Gödel. This was mingled with a feeling of disappointment at not having himself thought of "undecidability." For years Gödel was not a professor at Princeton, merely a visiting fellow, I think it was called. Apparently there was someone on the faculty who was against him and managed to prevent his promotion to a professorship. Johnny would say to me, "How can any of us be called professor when Gödel is not?" ...

As for Gödel, he valued Johnny very highly and was much interested in his views. I believe knowing the importance of his own discovery did not prevent Gödel from a gnawing uncertainty that maybe all he had discovered was another paradox à la Burali-Forti or Russell. But it is much, much more. It is a revolutionary discovery which changed both the philosophical and the technical aspects of mathematics.

When we talked about Einstein, Johnny would express the usual admiration for his epochal discoveries which had come to him so effortlessly, for the improbable luck of his formulations, and for his four papers on relativity, on the Brownian motion, and on the photo-electric quantum effect. How implausible it is that the velocity of light should be the same emanating from a moving object, whether it is coming toward you or whether it is receding. But his admiration seemed mixed with some reservations, as if he thought, "Well, here he is, so very great," yet knowing his limitations. He was surprised at Einstein's attitude in his debates with Niels Bohr—at his qualms about quantum theory in general. My own feeling has always been that the last word has not been said and that a new "super quantum theory" might reconcile the different premises.

Saturday, June 22, 2019

Silicon Oligarchs: Winner Take All?

Joel Kotkin is a Presidential Fellow in Urban Futures at Chapman University and Executive Director for the Center for Opportunity Urbanism.
What Do the Oligarchs Have in Mind for Us?

...This tiny sliver of humanity, with their relatively small cadre of financiers, engineers, data scientists, and marketers, now control the exploitation of our personal data, what Alibaba founder Jack Ma calls the “electricity of the 21st century.” Their “super platforms,” as one analyst noted, now operate as “digital gatekeepers” lording over “e-monopsonies” that control enormous parts of the economy. Their growing power, notes a recent World Bank Study, is built on “natural monopolies” that adhere to web-based business, and have served to further widen class divides not only in the United States but around the world.

The rulers of the Valley and its Puget Sound doppelganger now account for eight of the 20 wealthiest people on the planet. Seventy percent of the 56 billionaires under 40 live in the state of California, with 12 in San Francisco alone. In 2017, the tech industry, mostly in California, produced 11 new billionaires. The Bay Area has more billionaires on the Forbes 400 list than any metro region other than New York and more millionaires per capita than any other large metropolis.

For an industry once known for competition, the level of concentration is remarkable. Google controls nearly 90 percent of search advertising, Facebook almost 80 percent of mobile social traffic, and Amazon about 75 percent of US e-book sales, and, perhaps most importantly, nearly 40 percent of the world’s “cloud business.” Together, Google and Apple control more than 95 percent of operating software for mobile devices, while Microsoft still accounts for more than 80 percent of the software that runs personal computers around the world.

The wealth generated by these near-monopolies funds the tech oligarchy’s drive to monopolize existing industries such as entertainment, education, and retail, as well as those of the future, such as autonomous cars, drones, space exploration, and most critically, artificial intelligence. Unless checked, they will have accumulated the power to bring about what could best be seen as a “post-human” future, in which society is dominated by artificial intelligence and those who control it.

What Do the Oligarchs Want?

The oligarchs are creating “a scientific caste system,” not dissimilar to that outlined in Aldous Huxley’s dystopian 1932 novel, Brave New World. Unlike the former masters of the industrial age, they have little use for the labor of middle- and working-class people -- they need only their data. Virtually all their human resource emphasis relies on cultivating and retaining a relative handful of tech-savvy operators. “Software,” Bill Gates told Forbes in 2005, “is an IQ business. Microsoft must win the IQ war, or we won’t have a future.”

Perhaps the best insight into the mentality of the tech oligarchy comes from an admirer, researcher Greg Ferenstein, who interviewed 147 digital company founders. The emerging tech world has little place for upward mobility, he found, except for those in the charmed circle at the top of the tech infrastructure; the middle and working classes become, as in feudal times, increasingly marginal.

This reflects their perception of how society will evolve. Ferenstein notes that most oligarchs believe “an increasingly greater share of economic wealth will be generated by a smaller slice of very talented or original people. Everyone else will increasingly subsist on some combination of part-time entrepreneurial ‘gig work’ and government aid.” Such part-time work has been growing rapidly, accounting for roughly 20 percent of the workforce in the US and Europe, and is expected to grow substantially, adds McKinsey. ...

Thursday, June 20, 2019

CRISPR babies: when will the world be ready? (Nature)

This Nature News article gives a nice overview of the current status of CRISPR technology and its potential application in human reproduction. As we discussed in this bioethics conversation (Manifold Podcast #9 with philosopher Sam Kerstein of the University of Maryland), it is somewhat challenging to come up with examples where gene editing is favored over embryo selection (a well-established technology) for avoidance of a disease-linked mutation.
Nature: ... He found out about a process called preimplantation genetic diagnosis or PGD. By conceiving through in vitro fertilization (IVF) and screening the embryos, Carroll and his wife could all but eliminate the chance of passing on the mutation. They decided to give it a shot, and had twins free of the Huntington’s mutation in 2006.

Now Carroll is a researcher at Western Washington University in Bellingham, where he uses another technique that might help couples in his position: CRISPR gene editing. He has been using the powerful tool to tweak expression of the gene responsible for Huntington’s disease in mouse cells. Because it is caused by a single gene and is so devastating, Huntington’s is sometimes held up as an example of a condition in which gene editing a human embryo — controversial because it would cause changes that would be inherited by future generations — could be really powerful. But the prospect of using CRISPR to alter the gene in human embryos still worries Carroll. “That’s a big red line,” he says. “I get that people want to go over it — I do, too. But we have to be super humble about this stuff.” There could be many unintended consequences, both for the health of individuals and for society. It would take decades of research, he says, before the technology could be used safely.

Thursday, June 13, 2019

Manifold Episode #12: James Cham on Venture Capital, Risk Taking, and the Future Impacts of AI

Manifold Show Page    YouTube Channel

James Cham is a partner at Bloomberg Beta, a venture capital firm focused on the future of work. James invests in companies applying machine intelligence to businesses and society. Prior to Bloomberg Beta, James was a Principal at Trinity Ventures and a VP at Bessemer Venture Partners. He was educated in computer science at Harvard and at the MIT Sloan School of Business.

James Cham

Bloomberg Beta


Validation of Polygenic Risk Scores for Coronary Artery Disease in French Canadians

This study reports a validation of Polygenic Risk Scores for Coronary Artery Disease in a French Canadian population. Outliers in PRS are much more likely to have CAD than typical individuals.

In our replication tests of a variety of traits (both disease risks and quantitative traits) using European ancestry validation datasets, there is strong consistency in performance of the predictors. (See AUC consistency below.) This suggests that the genomic predictors are robust to differences in environmental conditions and also moderate differences in ethnicity (i.e., within the European population). The results are not brittle, and I believe that widespread clinical applications are coming very soon.

Validation of Genome-wide Polygenic Risk Scores for Coronary Artery Disease in French Canadians

Florian Wünnemann, Ken Sin Lo, Alexandra Langford-Alevar, David Busseuil, Marie-Pierre Dubé, Jean-Claude Tardif, and Guillaume Lettre

Genomic and Precision Medicine

Background: Coronary artery disease (CAD) represents one of the leading causes of morbidity and mortality worldwide. Given the healthcare risks and societal impacts associated with CAD, their clinical management would benefit from improved prevention and prediction tools. Polygenic risk scores (PRS) based on an individual's genome sequence are emerging as potentially powerful biomarkers to predict the risk to develop CAD. Two recently derived genome-wide PRS have shown high specificity and sensitivity to identify CAD cases in European-ancestry participants from the UK Biobank. However, validation of the PRS predictive power and transferability in other populations is now required to support their clinical utility.

Methods: We calculated both PRS (GPSCAD and metaGRSCAD) in French-Canadian individuals from three cohorts totaling 3639 prevalent CAD cases and 7382 controls, and tested their power to predict prevalent, incident and recurrent CAD. We also estimated the impact of the founder French-Canadian familial hypercholesterolemia deletion (LDLR delta > 15kb deletion) on CAD risk in one of these cohorts and used this estimate to calibrate the impact of the PRS.

Results: Our results confirm the ability of both PRS to predict prevalent CAD comparable to the original reports (area under the curve (AUC)=0.72-0.89). Furthermore, the PRS identified about 6-7% of individuals at CAD risk similar to carriers of the LDLR delta > 15kb mutation, consistent with previous estimates. However, the PRS did not perform as well in predicting incident or recurrent CAD (AUC=0.56-0.60), maybe due to confounding because 76% of the participants were on statin treatment. This result suggests that additional work is warranted to better understand how ascertainment biases and study design impact PRS for CAD.

Conclusions: Collectively, our results confirm that novel, genome-wide PRS are able to predict CAD in French-Canadians; with further improvements, this is likely to pave the way towards more targeted strategies to predict and prevent CAD-related adverse events.
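
The AUC figures quoted in the abstract above have a simple interpretation: AUC is the probability that a randomly chosen case has a higher score than a randomly chosen control, so 0.5 is chance and 1.0 is perfect separation. The sketch below computes it directly from that definition. The function name and the toy scores are illustrative assumptions and have nothing to do with the study's data or code.

```python
import numpy as np

def auc(scores, labels):
    """P(random case outscores random control); ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, controls = scores[labels], scores[~labels]
    # Compare every case against every control.
    diff = cases[:, None] - controls[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Toy example: two controls (label 0) and two cases (label 1).
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(toy_auc)
```

The all-pairs comparison is quadratic in sample size, which is fine for illustration; for biobank-scale cohorts a rank-based formula or a library routine would be the practical choice.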
American Heart Association hails potential of PRS:
"PRSs, built using very large data sets of people with and without heart disease, look for genetic changes in the DNA that influence disease risk, whereas individual genes might have only a small effect on disease predisposition," said Guillaume Lettre, Ph.D., lead author of the study and an associate professor at the Montreal Heart Institute and Université de Montréal in Montreal, Quebec, Canada. "The PRS is like having a snapshot of the whole genetic variation found in one's DNA and can more powerfully predict one's disease risk. Using the score, we can better understand whether someone is at higher or lower risk to develop a heart problem."

Early prediction would benefit prevention, optimal management and treatment strategies for heart disease. Because PRSs are simple and relatively inexpensive, their implementation in the clinical setting holds great promises. For heart disease, early detection could lead to simple yet effective therapeutic interventions such as the use of statins, aspirin or other medications.

... The American Heart Association named the use of polygenic risk scores as one of the biggest advances in heart disease and stroke research in 2018.

Sadly, reaction to these breakthroughs in human genomics will follow the usual pattern:
It's Wrong! Genomes are too complex to decipher, GWAS is a failure, precision medicine is all hype, biology is so ineffably beautiful and incomprehensible, Hey, whaddaya, you're a physicist! ...

It's Trivial! I knew it all along. Of course, everything is heritable to some degree. Well, if you just get enough data...

I did it First! (Please cite my paper...)

Sunday, June 09, 2019

L1 vs Deep Learning in Genomic Prediction

The paper below by some of my MSU colleagues examines the performance of a number of ML algorithms, both linear and nonlinear, including deep neural nets, in genomic prediction across several different species.

When I give talks about prediction of disease risks and complex traits in humans, I am often asked why we are not using fancy (trendy?) methods such as Deep Learning (DL). Instead, we focus on L1 penalization methods ("sparse learning") because 1. the theoretical framework (including theorems providing performance guarantees) is well-developed, and (relatedly) 2. the L1 methods perform as well as or better than other methods in our own testing.

The term theoretical framework may seem unusual in ML, which is at the moment largely an empirical subject. Experience in theoretical physics shows that when powerful mathematical results are available, they can be very useful to guide investigation. In the case of sparse learning we can make specific estimates for how much data is required to "solve" a trait -- i.e., capture most of the estimated heritability in the predictor. Five years ago we predicted a threshold of a few hundred thousand genomes for height, and this turned out to be correct. Currently, this kind of performance characterization is not possible for DL or other methods.
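
To make "L1 penalization" concrete, here is a minimal sketch of the lasso solved by cyclic coordinate descent on simulated data. It is a toy under stated assumptions, not our production pipeline: the function names, the Gaussian design matrix standing in for genotypes, the problem sizes, the noise level, and the penalty value lam are all made up for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within t of zero become exactly zero."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-feature curvature
    resid = y.astype(float).copy()         # equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]     # remove feature j's current contribution
            rho = X[:, j] @ resid / n      # correlation with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]     # add the updated contribution back
    return beta

# Toy sparse problem: more features than samples, only s true nonzero effects.
rng = np.random.default_rng(1)
n, p, s = 200, 500, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta_true[support] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.1)
print("true support:    ", sorted(support.tolist()))
print("selected features:", np.flatnonzero(beta_hat).tolist())
```

The soft-thresholding step is what sets most coefficients exactly to zero, which is why the fitted predictor uses only a small subset of the available features even when features outnumber samples.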

What is especially powerful about deep neural nets is that they yield a quasi-convex (or at least reasonably efficient) optimization procedure which can learn high dimensional functions. The class of models is both tractable from a learning/optimization perspective and highly expressive. As I wrote here in my ICML notes (see also Elad's work which relates DL to Sparse Learning):
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?).
However, currently in genomic prediction one typically finds that nonlinear interactions are small, which means features more complicated than single SNPs are unnecessary. (In a recent post I discussed a new T1D predictor that makes use of nonlinear haplotype interaction effects, but even there the effects are not large.) Eventually I expect this situation to change -- when we have enough whole genomes to work with, a DL approach which can (automatically) identify important features (motifs?) may allow us to go beyond SNPs and simple linear models.

Note, though, that from an information theoretic perspective (see, e.g., any performance theorems in compressed sensing) it is obvious that we will need much more data than we currently have to advance this program. Also, note that Visscher et al.'s recent GCTA work suggests that additive SNP models using rare variants (i.e., extracted from whole genome data), can account for nearly all the expected heritability for height. This implies that the power of nonlinear methods like DL may not yield qualitatively better results than simpler L1 approaches, even in the limit of very large whole genome datasets.
Benchmarking algorithms for genomic prediction of complex traits

Christina B. Azodi, Andrew McCarren, Mark Roantree, Gustavo de los Campos, Shin-Han Shiu

The usefulness of Genomic Prediction (GP) in crop and livestock breeding programs has led to efforts to develop new and improved GP approaches including non-linear algorithms, such as artificial neural networks (ANN) (i.e. deep learning) and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of GP datasets and models. Using data of 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and five non-linear algorithms, including ANNs. First, we found that hyperparameter selection was critical for all non-linear algorithms and that feature selection prior to model training was necessary for ANNs when the markers greatly outnumbered the number of training lines. Across all species and trait combinations, no one algorithm performed best; however, predictions based on a combination of results from multiple GP algorithms (i.e. ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits than that of linear algorithms. Although ANNs did not perform best for any trait, we identified strategies (i.e. feature selection, seeded starting weights) that boosted their performance near the level of other algorithms. These results, together with the fact that even small improvements in GP performance could accumulate into large genetic gains over the course of a breeding program, highlight the importance of algorithm selection for the prediction of trait values.

Saturday, June 08, 2019

London: CogX, Founders Forum, Healthtech

I'm in London again to give the talk below and attend some meetings, including Founders Forum and their Healthtech event the day before.
CogX: The Festival of AI and Emerging Technology
King's Cross, London, N1C 4BH

When Machine Learning Met Genetic Engineering

3:30 pm Tuesday June 11 Cutting Edge stage


Stephen Hsu
Senior Vice-President for Research and Innovation
Michigan State University

Helen O’Neill
Lecturer in Reproductive and Molecular Genetics

Martin Varsavsky
Executive Chairman
Prelude Fertility

Azeem Azhar (moderator)
Exponential View

Regent's Canal, Camden Town near King's Cross.

CogX speakers reception, Sunday evening:


Commanding heights of global capital:

Sunset, Camden locks:

Sunday, June 02, 2019

Genomic Prediction: Polygenic Risk Score for Type 1 Diabetes

In an earlier post I collected links related to recent progress in Polygenic Risk Scores (PRS) and health care applications. The paper below describes a new (published in 2019) predictor for Type 1 Diabetes (T1D) that achieves impressive accuracy (AUC > 0.9) using 67 SNPs. It incorporates model features such as nonlinear interactions between haplotypes.

T1D is highly heritable and tends to manifest at an early age. One application of this predictor is to differentiate between T1D and the more common (in later life) T2D. Another application is to embryo screening. Genomic Prediction has independently validated this predictor on sibling data and may implement it in their embryo biopsy pipeline, which includes tests for aneuploidy, single gene mutations, and polygenic risk.
Development and Standardization of an Improved Type 1 Diabetes Genetic Risk Score for Use in Newborn Screening and Incident Diagnosis

Sharp, et al.
Diabetes Care 2019;42:200–207

Previously generated genetic risk scores (GRSs) for type 1 diabetes (T1D) have not captured all known information at non-HLA loci or, particularly, at HLA risk loci. We aimed to more completely incorporate HLA alleles, their interactions, and recently discovered non-HLA loci into an improved T1D GRS (termed the “T1D GRS2”) to better discriminate diabetes subtypes and to predict T1D in newborn screening studies.

In 6,481 case and 9,247 control subjects from the Type 1 Diabetes Genetics Consortium, we analyzed variants associated with T1D both in the HLA region and across the genome. We modeled interactions between variants marking strongly associated HLA haplotypes and generated odds ratios to create the improved GRS, the T1D GRS2. We validated our findings in UK Biobank. We assessed the impact of the T1D GRS2 in newborn screening and diabetes classification and sought to provide a framework for comparison with previous scores.

The T1D GRS2 used 67 single nucleotide polymorphisms (SNPs) and accounted for interactions between 18 HLA DR-DQ haplotype combinations. The T1D GRS2 was highly discriminative for all T1D (area under the curve [AUC] 0.92; P < 0.0001 vs. older scores) and even more discriminative for early-onset T1D (AUC 0.96). In simulated newborn screening, the T1D GRS2 was nearly twice as efficient as HLA genotyping alone and 50% better than current genetic scores in general population T1D prediction.

An improved T1D GRS, the T1D GRS2, is highly useful for classifying adult incident diabetes type and improving newborn screening. Given the cost-effectiveness of SNP genotyping, this approach has great clinical and research potential in T1D.
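
As a rough illustration of the two quantities in the abstract, the sketch below builds a simple additive genetic risk score (risk allele count times log odds ratio, summed over SNPs) and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. This is not the T1D GRS2 itself: the real score uses 67 SNPs and models interactions between HLA DR-DQ haplotype combinations, which are omitted here, and every SNP name, odds ratio, and genotype below is invented.

```python
from math import log


def additive_risk_score(allele_counts, odds_ratios):
    """Sum over SNPs of (number of risk alleles) * log(odds ratio)."""
    return sum(allele_counts[snp] * log(odds_ratios[snp]) for snp in odds_ratios)


def auc(case_scores, control_scores):
    """Fraction of case/control pairs where the case scores higher.

    Ties count as half a win, matching the usual rank-based definition.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-allele odds ratios for three made-up SNPs.
odds_ratios = {"snp_a": 2.0, "snp_b": 1.5, "snp_c": 0.5}

# Hypothetical genotypes: risk allele counts (0, 1, or 2) per SNP.
cases = [
    {"snp_a": 2, "snp_b": 1, "snp_c": 0},
    {"snp_a": 1, "snp_b": 2, "snp_c": 0},
    {"snp_a": 1, "snp_b": 0, "snp_c": 1},
]
controls = [
    {"snp_a": 0, "snp_b": 1, "snp_c": 1},
    {"snp_a": 1, "snp_b": 0, "snp_c": 2},
    {"snp_a": 0, "snp_b": 0, "snp_c": 0},
]

case_scores = [additive_risk_score(g, odds_ratios) for g in cases]
control_scores = [additive_risk_score(g, odds_ratios) for g in controls]
print("toy AUC:", auc(case_scores, control_scores))
```

An AUC of 0.5 means the score is no better than a coin flip at ranking cases above controls, and 1.0 means perfect separation, which puts the reported 0.92 and 0.96 in context.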
The figure below gives some idea as to the ability of the new predictor GRS2 (panels B and D) to differentiate cases vs controls, and T1D vs T2D.
