Thursday, April 08, 2021

Freedom of Speech and Intellectual Diversity on Campus (MSU virtual conference)

The LeFrak Forum On Science, Reason, and Modern Democracy 
Department of Political Science 
Michigan State University 

Register here!

Thursday, April 8 -- Saturday, April 10; on ZOOM 
Conference Program: 
Keynote Address - Thursday, April 8, 
5:00-6:30pm EST 
Randall Kennedy, "The Race Question and Freedom of Expression." 
Randall Kennedy is the Michael R. Klein Professor at Harvard Law School and a preeminent authority on the First Amendment in its relation to the American struggle for civil rights.


Day One: Intellectual Diversity - Friday, April 9  
11:30am - 1:00pm EST 
Panel 1: What are the empirical facts about lack of intellectual diversity in academia and what are the causes of existing imbalances? 
Paper: Lee Jussim, Distinguished Professor and Chair, Department of Psychology, Rutgers University, author of The Politics of Social Psychology. 
Discussants: Philip Tetlock, Annenberg University Professor, University of Pennsylvania, author of “Why so few conservatives and should we care?” and Cory Clark, Visiting Scholar, Department of Psychology, University of Pennsylvania, author of “Partisan Bias and its Discontents.” 
2:00pm - 3:30pm EST 
Panel 2: In what precise ways and to what degree is this imbalance a problem? 
Paper: Joshua Dunn, Professor and Chair, Department of Political Science, University of Colorado, co-author of Passing on the Right: Conservative Professors in the Progressive University. 
Discussant: Amna Khalid, Associate Professor of History, Carleton College, author of “Not A Vast Right-Wing Conspiracy: Why Left-Leaning Faculty Should Care About Threats to Free Expression on Campus.” 
4:00pm - 5:45pm EST 
Panel 3: What is To Be Done? 
Paper: Musa Al-Gharbi, Paul F. Lazarsfeld Fellow in Sociology, Columbia University and Managing Editor, Heterodox Academy, author of “Why Care About Ideological Diversity in Social Research? The Definitive Response.” 
Paper: Conor Friedersdorf, Staff writer at The Atlantic and frequent contributor to its special series “The Speech Wars,” author of “Free Speech Will Survive This Moment.”


Day Two: Freedom of Speech - Saturday, April 10 
11:30am - 1:00pm EST 
Panel 1: An empirical accounting of the recent challenges to free speech on campus from left and right. What is the true character of the problem or problems here and do they constitute a “crisis”? 
Paper: Jonathan Marks, Professor and Chair, Department of Politics and International Relations, Ursinus College, author of Let's Be Reasonable: A Conservative Case for Liberal Education. 
Respondent: April Kelly-Woessner, Dean of the School of Public Service and Professor of Political Science at Elizabethtown College, author of The Still Divided Academy 
2:00pm - 3:45pm EST 
Panel 2: But is Free speech, as traditionally interpreted, even the right ideal? -- a Debate 
Ulrich Baer, University Professor of Comparative Literature, German, and English, NYU, author of What Snowflakes Get Right: Free Speech and Truth on Campus 
Keith Whittington, Professor of Politics, Princeton University, author of Speak Freely: Why Universities Must Defend Free Speech. 
4:30pm - 6:15pm EST  
Panel 3: What is To Be Done? 
Paper: Nancy Costello, Associate Clinical Professor of Law, MSU. Founder and Director of the First Amendment Law Clinic -- the only law clinic in the nation devoted to the defense of student press rights. Also, Director of the Free Expression Online Library and Resource Center. 
Paper: Jonathan Friedman, Project Director for campus free speech at PEN America – “a program of advocacy, analysis, and outreach in the national debate around free speech and inclusion at colleges and universities.”

Monday, April 05, 2021

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank

These new results arose from initial investigations of blood biomarker predictions from DNA. The lipoprotein A predictor we built correlates almost 0.8 with the measured result, and this agreement would probably be even stronger if day to day fluctuations were averaged out. It is the most accurate genomic predictor for a complex trait that we are aware of.
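
The remark about day-to-day fluctuations can be made concrete with a simple additive-noise model. This model is an assumption for illustration, not something taken from the paper: write a single-day measurement as the long-run level $y$ plus independent noise $\epsilon$, and let $p$ be the genomic predictor.

```latex
y_{\mathrm{obs}} = y + \epsilon, \qquad
\mathrm{corr}(p, y_{\mathrm{obs}})
  = \frac{\mathrm{cov}(p, y)}{\sigma_p \sqrt{\sigma_y^2 + \sigma_\epsilon^2}}
  = \mathrm{corr}(p, y)\,\sqrt{\frac{\sigma_y^2}{\sigma_y^2 + \sigma_\epsilon^2}}
```

Under this assumption, averaging $n$ independent measurements replaces $\sigma_\epsilon^2$ with $\sigma_\epsilon^2 / n$, so the observed correlation rises toward $\mathrm{corr}(p, y)$ as $n$ grows.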

We then became interested in the degree to which biomarkers alone could be used to predict disease risk. Some of the biomarker-based disease risk predictors we built (e.g., for kidney or liver problems) do not, as far as we know, have widely used clinical counterparts. Further research may show that predictors of this kind have broad utility. 

Statistical learning in a space of ~50 biomarkers is considered a "high dimensional" problem from the perspective of medical diagnosis; however, compared to genomic prediction using a million SNP features, it is rather straightforward. 
Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank  
Erik Widen, Timothy G. Raben, Louis Lello, Stephen D.H. Hsu 
We use UK Biobank data to train predictors for 48 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, ... from SNP genotype. For example, our predictor correlates ~ 0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information). Individuals who are at high risk (e.g., odds ratio of > 5x population average) can be identified for conditions such as coronary artery disease (AUC ~ 0.75), diabetes (AUC ~ 0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ~10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: (risk score | SNPs)) for common diseases to the risk predictors which result from the concatenation of learned functions (risk score | biomarkers) and (biomarker | SNPs).
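
The concatenation mentioned at the end of the abstract can be illustrated with a toy composition. This is a minimal sketch and not the paper's method: the linear form, the logistic link, and every name and number below (`SNP_WEIGHTS`, `RISK_WEIGHTS`, `predict_biomarkers`, `predict_risk`) are hypothetical, chosen only to show how a (biomarker | SNPs) function can feed a (risk score | biomarkers) function.

```python
import math

# Hypothetical weights: each biomarker is modeled as a linear function
# of allele counts (0, 1, or 2) at three made-up SNPs.
SNP_WEIGHTS = {
    "lipoprotein_a": [0.5, -0.2, 0.1],
    "hdl": [-0.1, 0.3, 0.0],
}

# Hypothetical weights: log-odds of disease as a linear function of biomarkers.
RISK_WEIGHTS = {"lipoprotein_a": 0.8, "hdl": -0.6}
RISK_INTERCEPT = -2.0


def predict_biomarkers(genotype):
    """(biomarker | SNPs): map allele counts to predicted biomarker levels."""
    return {
        name: sum(w * g for w, g in zip(weights, genotype))
        for name, weights in SNP_WEIGHTS.items()
    }


def predict_risk(biomarkers):
    """(risk score | biomarkers): logistic model on biomarker levels."""
    z = RISK_INTERCEPT + sum(RISK_WEIGHTS[name] * value
                             for name, value in biomarkers.items())
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk_from_snps(genotype):
    """Concatenation: feed predicted biomarkers into the risk model."""
    return predict_risk(predict_biomarkers(genotype))


baseline = predict_risk_from_snps([0, 0, 0])
carrier = predict_risk_from_snps([2, 0, 0])
```

In this toy setup the composed function plays the role of a genotype-only risk score, which could then be compared with a polygenic risk score trained directly on SNPs.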

Sunday, April 04, 2021

Inside Huawei, and Wuhan after the pandemic

The first three videos below are episodes of Japanese director Takeuchi Ryo's ongoing series on Huawei. 

Ryo lives in Nanjing and speaks fluent Mandarin. He became famous for his coverage of the lockdown and pandemic in Wuhan. The fourth video below tells the stories of 10 families: how they survived, and how their lives have changed.

The general consensus seems to be that Huawei is 2+ years ahead of other competitors in 5G technology, and has a very deep IP position (patent portfolio) as well. In AI applications my impression is that they are also strong, but not world leaders at the research frontier like Google Brain or DeepMind. Like most Chinese companies their strength is in practical deployment of systems at scale, not in publishing papers. In smartphones and laptops they compete head to head with Samsung, Apple, etc. in all areas, including chip design. Their HiSilicon subsidiary has designed Kirin CPUs that are on par with the best Qualcomm and Apple competitors used in flagship handsets. However, all three rely on TSMC to fabricate these designs.

Tuesday, March 30, 2021

Future of CRISPR (base & prime) and epigenome editing (Interview with Prof David R. Liu)


Excellent interview with David Liu of Harvard, which gives an overview of key innovations in gene editing since the discovery of CRISPR. 

Labs all around the world are busy building new tools and libraries for gene editing, with dramatic progress since CRISPR was first discovered less than 10 years ago.

Liu is optimistic about clinical applications over the next 10 years. He does not discuss germline editing (i.e., of embryos) but one can readily imagine how these advances in technology might be applied there.

Friday, March 26, 2021

John von Neumann, 1966 Documentary


This 1966 documentary on von Neumann was produced by the Mathematical Association of America. It includes interviews with Wigner, Ulam, Halmos, Goldstine, and others. 

At ~34m Bethe (leader of the Los Alamos theory division) gives primary credit to vN for the implosion method in fission bombs. While vN's previous work on shock waves and explosive lenses is often acknowledged as important for solving the implosion problem, this is the first time I have seen him given credit for the idea itself. Seth Neddermeyer's Enrico Fermi Award citation gives him credit for "invention of the implosion technique" and the original solid core design was referred to as the "Christy gadget" after Robert Christy. As usual, history is much more complicated than the simplified narrative that becomes conventional.
Teller: He could and did talk to my three-year-old son on his own terms and I sometimes wondered whether his relations to the rest of us were a little bit similar.
A recent application of vN's Quantum Ergodic Theorem: Macroscopic Superposition States in Isolated Quantum Systems.

Cloning vN (science fiction): short story, longer (AI vs genetic engineering).

Thursday, March 25, 2021

Meritocracy x 3

Three videos: 

1. Political philosopher Daniel Bell on PRC political meritocracy. 

2. Documentary on the 2020 Gao Kao: college entrance exam taken by ~11 million kids. 

3. Semiconductor Industry Association panel on PRC push to become self-sufficient in semiconductor technology. 

Sunday, March 21, 2021

The Contribution of Cognitive and Noncognitive Skills to Intergenerational Social Mobility (McGue et al. 2020)

If you have the slightest pretension to expertise concerning social mobility, meritocracy, inequality, genetics, psychology, economics, education, history, or any related subjects, I urge you to carefully study this paper.
The Contribution of Cognitive and Noncognitive Skills to Intergenerational Social Mobility  
(Psychological Science)
Matt McGue, Emily A. Willoughby, Aldo Rustichini, Wendy Johnson, William G. Iacono, James J. Lee 
We investigated intergenerational educational and occupational mobility in a sample of 2,594 adult offspring and 2,530 of their parents. Participants completed assessments of general cognitive ability and five noncognitive factors related to social achievement; 88% were also genotyped, allowing computation of educational-attainment polygenic scores. Most offspring were socially mobile. Offspring who scored at least 1 standard deviation higher than their parents on both cognitive and noncognitive measures rarely moved down and frequently moved up. Polygenic scores were also associated with social mobility. Inheritance of a favorable subset of parent alleles was associated with moving up, and inheritance of an unfavorable subset was associated with moving down. Parents’ education did not moderate the association of offspring’s skill with mobility, suggesting that low-skilled offspring from advantaged homes were not protected from downward mobility. These data suggest that cognitive and noncognitive skills as well as genetic factors contribute to the reordering of social standing that takes place across generations.
From the paper:
We believe that a reasonable explanation of our findings is that the degree to which individuals are more or less skilled than their parents contributes to their upward or downward mobility. Behavioral genetic and genomic research has established the heritability of social achievements (Conley, 2016) as well as the skills thought to underlie them (Bouchard & McGue, 2003). Nonetheless, these associations may be due to passive gene–environment correlation, whereby high-achieving parents both transmit genes and provide a rearing environment that promotes their children’s social success (Scarr & McCartney, 1983). Our within-family design controlled for passive gene–environment correlation effects. Although offspring inherit all of their genes from their parents, they inherit a random subset of parental alleles because of meiotic segregation. Consequently, some offspring inherit a favorable subset of their parents’ alleles, whereas others inherit a less favorable subset. We found, as did previous researchers (Belsky et al., 2018), that the inheritance of a favorable subset of alleles was associated with an increased likelihood of upward mobility... 
...In summary, our analysis of intergenerational social mobility in a sample of 2,594 offspring from 1,321 families found that (a) most individuals were educationally and occupationally mobile, (b) mobility was predicted by offspring–parent differences in skills and genetic endowment, and (c) the relationship of offspring skills with social mobility did not vary significantly by parent social background. In an era in which there is legitimate concern over social stagnation, our findings are noteworthy in identifying the circumstances when parents’ educational and occupational success is not reproduced across generations.

See also Game Over: Genomic Prediction of Social Mobility (PNAS July 9, 2018: 201801238). Both papers provide out of sample validation of polygenic predictors for cognitive ability, specifically of the relationship to intergenerational social mobility.

Thursday, March 18, 2021

Council on Foreign Relations: The Rise and Fall of Great Powers? America, China, and the Global Order


Insights from Ray Dalio and Paul Kennedy (The Rise and Fall of the Great Powers, 1987) on the balance of power and future global order. I was in graduate school when Kennedy's book was first published and I still have the hardcover first edition somewhere. Dalio and Kennedy have both carefully studied historical examples and present, in my opinion, a realistic view of what is happening. Kennedy mentions the PRC naval build up as a very explicit, material comparison of strength, whereas Dalio focuses on financial and economic matters. Elizabeth Economy provides some interesting comments on internal Chinese politics, but I am unsure how much insight any US analysts can have into the fine details of this.

The Naval War College Review article mentioned by Paul Kennedy is: 

Related: PRC ASBM Test in South China Sea and links therein.

Panelists discuss the rise and fall of great powers and the competing grand strategies of the United States and China. 
Ray Dalio Founder, Co-chairman, and Co-chief Investment Officer, Bridgewater Associates, LP; Author, The Changing World Order: Why Nations Succeed and Fail 
CFR Member Elizabeth C. Economy Senior Fellow for China Studies, Council on Foreign Relations; Senior Fellow, Hoover Institution, Stanford University; Author, The Third Revolution: Xi Jinping and the New Chinese State; @LizEconomy 
Paul M. Kennedy J. Richardson Dilworth Professor of History and Director of International Security Studies, Yale University; Author, The Rise and Fall of the Great Powers

Bonus! Short WSJ piece on digital RMB rollout. SWIFT beware...


Wednesday, March 10, 2021

Academic Freedom Alliance

We live in an era of preference falsification. 

Vocal, dishonest, irrational activists have cowed all but the most courageous of the few remaining serious thinkers, even at our greatest universities. 

I hope the creation of the Academic Freedom Alliance will provide a much needed corrective to the dishonest reign of terror in place today.
Chronicle: When I spoke to the Princeton University legal scholar and political philosopher Robert P. George in August, he offered a vivid zoological metaphor to describe what happens when outrage mobs attack academics. When hunted by lions, herds of zebras “fly off in a million directions, and the targeted member is easily taken down and destroyed and eaten.” A herd of elephants, by contrast, will “circle around the vulnerable elephant.” 
... What had begun as a group of 20 Princeton professors organized to defend academic freedom at one college was rapidly scaling up its ambitions and capacity: It would become a nationwide organization. George had already hired an executive director and secured millions in funding. 
... Today, that organization, the Academic Freedom Alliance, formally issued a manifesto declaring that “an attack on academic freedom anywhere is an attack on academic freedom everywhere,” and committing its nearly 200 members to providing aid and support in defense of “freedom of thought and expression in their work as researchers and writers or in their lives as citizens,” “freedom to design courses and conduct classes using reasonable pedagogical judgment,” and “freedom from ideological tests, affirmations, and oaths.” 
... All members of the alliance have an automatic right for requests for legal aid to be considered, but the organization is also open to considering the cases of faculty nonmembers, university staff, or even students on a case-by-case basis. The alliance’s legal-advisory committee includes well-known lawyers such as Floyd Abrams and the prolific U.S. Supreme Court litigator Lisa S. Blatt. 
When I spoke to him in February, as the date of AFA’s public announcement drew closer, George expressed surprise and satisfaction at the success the organization had found in signing up liberals and progressives. “If anything we’ve gone too far — we’re imbalanced over to the left side of the agenda,” he noted wryly. “That’s because our yield was a little higher than we expected it to be when we got in touch with folks.” 
The yield was higher, as George would learn, quoting one such progressive member, because progressives in academe often feel themselves to be even more closely monitored for ideological orthodoxy by students and activist colleagues than their conservative peers. “‘You conservative guys, people like you and Adrian Vermeule, you think you’re vulnerable. You’re not nearly as vulnerable as we liberals are,’” George quoted this member as saying. “They are absolutely terrified, and they know they can never keep up with the wokeness. What’s OK today is over the line tomorrow, and nobody gave you the memo.” 
George went on to note that some of the progressives he spoke with were indeed too frightened of the very censorious atmosphere that the alliance proposes to challenge to be willing to affiliate with it, at least at the outset. 
... Nadine Strossen, a New York Law School law professor and former president of the ACLU, emphasized the problem of self-censorship that she saw the alliance as counteracting. “When somebody is attacked by a university official or, for lack of a better term, a Twitter mob, there are constant reports from all individuals targeted that they receive so many private communications and emails saying ‘I support you and agree with you, but I just can’t say it publicly.’” 
She hopes that the combined reputations of the organization’s members will provide a permission structure allowing other faculty members to stand up for their private convictions in public. While a lawsuit can vindicate someone’s constitutional or contractual rights, Strossen noted, only a change in the cultural atmosphere around these issues — a preference for open debate and free exchange over stigmatization and punishment as the default way to negotiate controversy in academe — could resolve the overall problem. 
The Princeton University political historian Keith E. Whittington, who is chairman of the alliance’s academic committee, echoed Strossen’s point. The recruitment effort, he said, aimed to gather “people who would be respectable and hopefully influential to college administrators — such that if a group like that came to them and said ‘Look, you’re behaving badly here on these academic-freedom principles,’ this is a group that they might pay attention to.” 
“Administrators feel very buffeted by political pressures, often only from one side,” Whittington told me. “They hear from all the people who are demanding action, and the easiest, lowest-cost thing to do in those circumstances is to go with the flow and throw the prof under the bus. So we do hope that we can help balance that equation a little bit, make it a little more costly for administrators.” ...
Perhaps amusingly, I am one of the progressive founding members of AFA. At least, I have for most of my life been politically to the left of Robby George and many of the original Princeton 20 that started the project. 

When I left the position of Senior Vice-President for Research and Innovation at MSU last summer, I wrote
6. Many professors and non-academics who supported me were afraid to sign our petition -- they did not want to be subject to mob attack. We received many communications expressing this sentiment. 
7. The victory of the twitter mob will likely have a chilling effect on academic freedom on campus.

For another vivid example of the atmosphere on US university campuses, see Struggles at Yale.  

Obama on political correctness:
... I’ve heard some college campuses where they don’t want to have a guest speaker who is too conservative or they don’t want to read a book if it has language that is offensive to African-Americans or somehow sends a demeaning signal towards women. I gotta tell you, I don’t agree with that either. I don’t agree that you, when you become students at colleges, have to be coddled and protected from different points of view. I think you should be able to — anybody who comes to speak to you and you disagree with, you should have an argument with ‘em. But you shouldn’t silence them by saying, "You can’t come because I'm too sensitive to hear what you have to say." That’s not the way we learn ...

Monday, March 08, 2021

Psychology Is: interview with Nick Fortino


This is a recent interview. Enjoy!
In episode 14 of the Psychology Is podcast, we have the special opportunity to talk to Dr. Steve Hsu, a physicist, professor at MSU, and founder of Genomic Prediction. We discuss the newest innovations related to genetic testing and editing, including Genomic Prediction and CRISPR. We also discuss what these innovations may make possible (for better or worse), and how we can proceed carefully as we learn to harness this new power.
For more, see this recent review article.

Inside AI/ML: Mark Saroufim


Great discussion and insider views of AI/ML research. 
Academics think of themselves as trailblazers, explorers — seekers of the truth. 
Any fundamental discovery involves a significant degree of risk. If an idea is guaranteed to work then it moves from the realm of research to engineering. Unfortunately, this also means that most research careers will invariably be failures at least if failures are measured via “objective” metrics like citations. 
Today we discuss the recent article from Mark Saroufim called Machine Learning: the great stagnation. We discuss the rise of gentleman scientists, fake rigor, incentives in ML, SOTA-chasing, "graduate student descent", distribution of talent in ML and how to learn effectively.
Topics include: OpenAI, GPT-3, RL: Dota & Starcraft, conference papers, incentives and incremental research, Is there an ML stagnation? Is theory useful? Is ML entirely empirical these days? How to succeed as a researcher, Why everyone is forced to become their own media company, and much more.

If you don't want to watch the video, read these (by Mark Saroufim) instead:

Machine Learning: The Great Stagnation 

Friday, March 05, 2021

Genetic correlation of social outcomes between relatives (Fisher 1918) tested using lineage of 400k English individuals

Greg Clark (UC Davis and London School of Economics) deserves enormous credit for producing a large multi-generational dataset which is relevant to some of the most fundamental issues in social science: inequality, economic development, social policy, wealth formation, meritocracy, and recent human evolution. If you have even a casual interest in the dynamics of human society you should study these results carefully...

See previous discussion on this blog. 

Clark recently posted this preprint on his web page. A book covering similar topics is forthcoming.
For Whom the Bell Curve Tolls: A Lineage of 400,000 English Individuals 1750-2020 shows Genetics Determines most Social Outcomes 
Gregory Clark, University of California, Davis and LSE (March 1, 2021) 
Economics, Sociology, and Anthropology are dominated by the belief that social outcomes depend mainly on parental investment and community socialization. Using a lineage of 402,000 English people 1750-2020 we test whether such mechanisms better predict outcomes than a simple additive genetics model. The genetics model predicts better in all cases except for the transmission of wealth. The high persistence of status over multiple generations, however, would require in a genetic mechanism strong genetic assortative in mating. This has been until recently believed impossible. There is however, also strong evidence consistent with just such sorting, all the way from 1837 to 2020. Thus the outcomes here are actually the product of an interesting genetics-culture combination.
The correlational results in the table below were originally deduced by Fisher under the assumption of additive genetic inheritance: h2 is heritability, m is assortativity by genotype, and r is assortativity by phenotype. (Assortative mating describes the tendency of husband and wife to resemble each other more than randomly chosen M-F pairs in the general population.)
Fisher, R. A. 1918. “The Correlation between Relatives on the Supposition of Mendelian Inheritance.” Transactions of the Royal Society of Edinburgh, 52: 399-433
Thanks to Clark the predictions of Fisher's models, applied to social outcomes, can now be compared directly to data through many generations and across many branches of English family trees. (Figures below from the paper.)

The additive model fits the data well, but requires high heritabilities h2 and a high level m of assortative mating. Most analysts, including myself, thought that the required values of m were implausibly large. However, using modern genomic datasets one can estimate the level of assortative mating by simply looking at the genotypes of married couples. 
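
Estimating assortative mating from couples' genotypes amounts, in its simplest form, to correlating a genetic score across spouse pairs. The sketch below assumes each spouse already has a single polygenic score. The scores and the name `pearson_r` are hypothetical, and a real analysis would also have to account for population structure, which this ignores.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical polygenic scores for five married couples (made-up numbers).
husband_scores = [0.2, -1.1, 0.9, 1.5, -0.4]
wife_scores = [0.1, -0.7, 0.4, 1.2, -0.9]

# Sample estimate of the genotypic spousal correlation (m in the notation above).
m_hat = pearson_r(husband_scores, wife_scores)
```

The same function applied to a phenotype, such as years of education, gives the phenotypic spousal correlation r, so the two can be compared on the same couples.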

From the paper:
(p.26) a recent study from the UK Biobank, which has a collection of genotypes of individuals together with measures of their social characteristics, supports the idea that there is strong genetic assortment in mating. Robinson et al. (2017) look at the phenotype and genotype correlations for a variety of traits – height, BMI, blood pressure, years of education - using data from the biobank. For most traits they find as expected that the genotype correlation between the parties is less than the phenotype correlation. But there is one notable exception. For years of education, the phenotype correlation across spouses is 0.41 (0.011 SE). However, the correlation across the same couples for the genetic predictor of educational attainment is significantly higher at 0.654 (0.014 SE) (Robinson et al., 2017, 4). Thus couples in marriage in recent years in England were sorting on the genotype as opposed to the phenotype when it comes to educational status. 
It is not mysterious how this happens. The phenotype measure here is just the number of years of education. But when couples interact they will have a much more refined sense of what the intellectual abilities of their partner are: what is their general knowledge, ability to reason about the world, and general intellectual ability. Somehow in the process of matching modern couples in England are combining based on the weighted sum of a set of variations at several hundred locations on the genome, to the point where their correlation on this measure is 0.65.
Correction: Height, Educational Attainment (EA), and cognitive ability predictors are controlled by many thousands of genetic loci, not hundreds! 

This is a 2018 talk by Clark which covers most of what is in the paper.

For out of sample validation of the Educational Attainment (EA) polygenic score, see Game Over: Genomic Prediction of Social Mobility.


Saturday, February 27, 2021

Infinity and Solipsism, Physicists and Science Fiction

The excerpt below is from Roger Zelazny's Creatures of Light and Darkness (1969), an experimental novel which is somewhat obscure, even to fans of Zelazny. 
Positing infinity, the rest is easy. 
The Prince Who Was A Thousand is ... a teleportationist, among other things ... the only one of his kind. He can transport himself, in no time at all, to any place that he can visualize. And he has a very vivid imagination. 
Granting that any place you can think of exists somewhere in infinity, if the Prince can think of it too, he is able to visit it. Now, a few theorists claim that the Prince’s visualizing a place and willing himself into it is actually an act of creation. No one knew about the place before, and if the Prince can find it, then perhaps what he really did was make it happen. However, positing infinity, the rest is easy.
This contains already the central idea that is expressed more fully in Nine Princes in Amber and subsequent books in that series.
While traveling (shifting) between Shadows, [the prince] can alter reality or create a new reality by choosing which elements of which Shadows to keep or add, and which to subtract.
Creatures of Light and Darkness also has obvious similarities to Lord of Light, which many regard as Zelazny's best book and even one of the greatest science fiction novels ever written. Both have been among my favorites since I read them as a kid.

Infinity, probability measures, and solipsism have received serious analysis by theoretical physicists: see, e.g.,  Boltzmann brains. (Which is less improbable: the existence of the universe around you, or the existence of a single brain whose memory records encode that universe?) Perhaps this means theorists have too much time on their hands, due to lack of experimental progress in fundamental physics. 

Science fiction is popular amongst physicists, but I've always been surprised that the level of interest isn't even higher. Two examples I know well: the late Sidney Coleman and my collaborator Bob Scherrer at Vanderbilt were/are scholars and creators of the genre. See these stories by Bob, and Greg Benford's Remembering Sid:
... Sid and some others created a fannish publishing house, Advent Publishers, in 1956. He was a teenager when he helped publish Advent’s first book, Damon Knight’s In Search of Wonder. ... 
[Sid] loved SF whereas Einstein deplored it. Lest SF distort pure science and give people the false illusion of scientific understanding, Einstein recommended complete abstinence from any type of science fiction. “I never think of the future. It comes soon enough,” he said.
While I've never written science fiction, occasionally my research comes close -- it has at times addressed questions of the form: 

Do the Laws of Nature as we know them allow ... 

This research might be considered as the ultimate in hard SF ;-) 
Wikipedia: Hard science fiction is a category of science fiction characterized by concern for scientific accuracy and logic.

Note Added: Bob Scherrer writes: In my experience, about 1/3 of research physicists are SF fans, about 1/3 have absolutely no interest in SF, and the remaining 1/3 were avid readers of science fiction in middle school/early high school but then "outgrew" it.

Here is a recent story by Bob which I really enjoyed -- based on many worlds quantum mechanics :-) 

It was ranked #2 in the 2019 Analog Magazine reader poll!

Note Added 2: Kazuo Ishiguro (2017 Nobel Prize in Literature) has been evolving into an SF/fantasy writer over time. And why not? For where else can one work with genuinely new ideas? See Never Let Me Go (clones), The Buried Giant (post-Arthurian England), and his latest book Klara and the Sun.
NYTimes: ... we slowly discover (and those wishing to avoid spoilers should now skip to the start of the next paragraph), the cause of Josie’s mysterious illness is a gene-editing surgery to enhance her intellectual faculties. The procedure carries high risks as well as potential high rewards — the main one being membership in a professional superelite. Those who forgo or simply can’t afford it are essentially consigning themselves to economic serfdom.
WSJ: ... Automation has created a kind of technological apartheid state, which is reinforced by a dangerous “genetic editing” procedure that separates “lifted,” intellectually enhanced children from the abandoned masses of the “unlifted.” Josie is lifted, but the procedure is the cause of her illness, which is often terminal. Her oldest friend and love interest, Rick, is unlifted and so has few prospects despite his obvious brilliance. Her absentee father is an engineer who was outsourced by machines and has since joined a Community, one of the closed groups formed by those lacking social rank. In a conversational aside it is suggested that the Communities have self-sorted along racial lines and are heavily armed.

Sunday, February 21, 2021

Othram: Appalachian hiker found dead in tent identified via DNA forensics


Othram helps solve another mystery: the identity of a dead Appalachian hiker. 

There are ~50k unidentified deceased individuals in the US, with ~1k new cases each year.
CBS Sunday Morning: He was a mystery who intrigued thousands: Who was the hiker who walked almost the entire length of the Appalachian Trail, living completely off the grid, only to be found dead in a tent in Florida? It took years, and the persistence of amateur sleuths, to crack the case. Nicholas Thompson of The Atlantic Magazine tells the tale of the man who went by the name "Mostly Harmless," and about the efforts stirred by the mystery of his identity to give names to nameless missing persons.
See also Othram: the future of DNA forensics.

Thursday, February 18, 2021

David Reich: Prehistory of Europe and S. Asia from Ancient DNA


In case you have not followed the adventures of the Yamnaya (proto Indo-Europeans from the Steppe), I recommend this recent Harvard lecture by David Reich. It summarizes advances in our understanding of deep human history in Europe and South Asia resulting from analysis of ancient DNA. 
The new technology of ancient DNA has highlighted a remarkable parallel in the prehistory of Europe and South Asia. In both cases, the arrival of agriculture from southwest Asia after 9,000 years ago catalyzed profound population mixtures of groups related to Southwest Asian farmers and local hunter-gatherers. In both cases, the spread of ancestry ultimately deriving from Steppe pastoralists had a further major impact after 5,000 years ago and almost certainly brought Indo-European languages. Mixtures of these three source populations form the primary gradients of ancestry in both regions today. 
In this lecture, Prof. Reich will discuss his new book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. 
There seems to be a strange glitch at 16:19 and again at 27:55 -- what did he say?

See also Reich's 2018 NYTimes editorial.

Wednesday, February 17, 2021

The Post-American World: Crooke, Escobar, Blumenthal, and Marandi


Even if you disagree violently with the viewpoints expressed in this discussion, it will inform you as to how the rest of the world thinks about the decline of US empire. 

The group is very diverse: a former UK diplomat, an Iranian professor educated in the West but now at the University of Tehran, a progressive author and journalist (son of Clinton advisor Sidney Blumenthal) who spent 5 years reporting from Israel, and a Brazilian geopolitical analyst who writes for Asia Times (and who, if I recall correctly, lives in Thailand).
Thirty years ago, the United States dominated the world politically, economically, and scientifically. But today? 
Watch this in-depth discussion with distinguished guests: 
Alastair Crooke - Former British Diplomat, Founder and Director of the Conflicts Forum 
Pepe Escobar - Brazilian Political Analyst and Author 
Max Blumenthal - American Journalist and Author from Grayzone 
Chaired by Dr. Mohammad Marandi - Professor at University of Tehran
See also two Escobar articles linked here. Related: Foreign Observers of US Empire.  

Sunday, February 14, 2021

Physics and AI: some recent papers

Three AI paper recommendations from a theoretical physicist (former collaborator) who now runs an AI lab in SV. Less than 5 years after leaving physics research, he and his team have shipped AI products that are used by millions of people. (Figure above is from the third paper below.)

This paper elucidates the relationship between symmetry principles (familiar from physics) and specific mathematical structures like convolutions used in DL.
Covariance in Physics and CNN 
Cheng, et al.  (Amsterdam)
In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution.
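To make the covariance (equivariance) statement concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names `circular_conv` and `shift` and the sample values are illustrative assumptions. It checks that a local, linear, weight-sharing map (a 1D circular convolution) commutes with translations.

```python
def circular_conv(signal, kernel):
    """Local, linear, weight-sharing map: the same kernel is applied at every position."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Translate a periodic signal by `steps` positions."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


# Hypothetical sample data.
signal = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0]
kernel = [0.25, 0.5, 0.25]

# Equivariance: convolving a shifted input equals shifting the convolved output.
lhs = circular_conv(shift(signal, 2), kernel)
rhs = shift(circular_conv(signal, kernel), 2)
equivariant = all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```

If the kernel were allowed to vary with position (no weight sharing), the two sides would in general differ, which is the sense in which covariance constrains the form of the layer.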

The following two papers explore connections between AI/ML and statistical physics, including renormalization group (RG) flow. 

Theoretical Connections between Statistical Physics and RL 
Rahme and Adams  (Princeton)
Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function Z, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and Q-functions can be derived from this partition function and interpreted via average energies, the Z-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for Z is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these Z-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
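The linearity claim for deterministic dynamics can be illustrated with a toy finite-horizon example. This is a sketch of the general idea, not the paper's exact formulation: the three-state MDP, the reward table, the horizon, and the names `NEXT_STATE`, `REWARD`, `z_backward`, `z_brute_force`, and `boltzmann_policy` are all hypothetical. Here Z(s) is taken to be the sum over action sequences of exp(total reward), and the backward recursion for Z is linear in Z.

```python
import itertools
import math

# Hypothetical deterministic MDP: NEXT_STATE[s][a] and REWARD[s][a].
NEXT_STATE = [[1, 2], [0, 2], [2, 0]]
REWARD = [[0.0, 1.0], [0.5, -1.0], [0.2, 0.0]]
HORIZON = 4


def z_backward(horizon):
    """Linear recursion: Z_t(s) = sum_a exp(r(s, a)) * Z_{t+1}(next(s, a))."""
    z = [1.0] * len(NEXT_STATE)  # Z with zero steps remaining: the empty trajectory
    for _ in range(horizon):
        z = [
            sum(math.exp(REWARD[s][a]) * z[NEXT_STATE[s][a]] for a in range(2))
            for s in range(len(NEXT_STATE))
        ]
    return z


def z_brute_force(start, horizon):
    """Sum exp(total reward) over every action sequence of length `horizon`."""
    total = 0.0
    for actions in itertools.product(range(2), repeat=horizon):
        state, ret = start, 0.0
        for a in actions:
            ret += REWARD[state][a]
            state = NEXT_STATE[state][a]
        total += math.exp(ret)
    return total


def boltzmann_policy(state, z_next):
    """pi(a | s) proportional to exp(r(s, a)) * Z(successor state)."""
    weights = [
        math.exp(REWARD[state][a]) * z_next[NEXT_STATE[state][a]] for a in range(2)
    ]
    norm = sum(weights)
    return [w / norm for w in weights]
```

The policy weights each action by its immediate reward and by the Z of the successor state, so successors with many high-reward continuations are favored, in the spirit of the entropy remark in the abstract.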


RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior
Hu et al.   (UCSD and Berkeley AI Lab) 
Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those in convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has O(logL) complexity for image inpainting compared to previous generative models with O(L^2) complexity.
See related remarks: ICML notes (2018).
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.

Sunday, February 07, 2021

Gradient Descent Models Are Kernel Machines (Deep Learning)

This paper shows that models which result from gradient descent training (e.g., deep neural nets) can be expressed as a weighted sum of similarity functions (kernels) which measure the similarity of a given instance to the examples used in training. The kernels are defined by the inner product of model gradients in the parameter space, integrated over the descent (learning) path.

Roughly speaking, two data points x and x' are similar, i.e., have large kernel function K(x,x'), if they have similar effects on the model parameters in the gradient descent. With respect to the learning algorithm, x and x' have similar information content. The learned model y = f(x) matches x to similar data points x_i: the resulting value y is simply a weighted (linear) sum of kernel values K(x,x_i).

This result makes it very clear that without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples. See note added at bottom for more on this point, re: AGI, etc. Given the complexity (e.g., dimensionality) of the ground truth model, one can place bounds on the amount of data required for successful training.

This formulation locates the nonlinearity of deep learning models in the kernel function. The superposition of kernels is entirely linear as long as the loss function is additive over training data.
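The kernel picture can be checked numerically on a toy problem. The sketch below is my own illustration under stated assumptions, not code from the paper: the model f_w(x) = tanh(w . x), the three training points, the squared-error loss, and the name `path_kernel_demo` are all hypothetical. It accumulates, along the discrete gradient descent path, the loss derivative at each training point times the inner product of model gradients, and compares that sum with the actual change in the prediction at a query point.

```python
import math

# Hypothetical toy data and model: f_w(x) = tanh(w . x), squared-error loss.
TRAIN_X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
TRAIN_Y = [0.5, -0.5, 0.0]


def predict(w, x):
    return math.tanh(w[0] * x[0] + w[1] * x[1])


def grad_w(w, x):
    """Gradient of the model output with respect to the parameters."""
    slope = 1.0 - predict(w, x) ** 2
    return (slope * x[0], slope * x[1])


def path_kernel_demo(query, lr=0.01, steps=500):
    w = [0.1, -0.1]
    start = predict(w, query)
    kernel_sum = 0.0
    for _ in range(steps):
        g_query = grad_w(w, query)
        update = [0.0, 0.0]
        for x, y in zip(TRAIN_X, TRAIN_Y):
            g_x = grad_w(w, x)
            loss_deriv = predict(w, x) - y  # dL/df for L = (f - y)^2 / 2
            # Inner product of model gradients at the current point on the path.
            k = g_query[0] * g_x[0] + g_query[1] * g_x[1]
            kernel_sum += -lr * loss_deriv * k
            update[0] += loss_deriv * g_x[0]
            update[1] += loss_deriv * g_x[1]
        w[0] -= lr * update[0]
        w[1] -= lr * update[1]
    return predict(w, query) - start, kernel_sum


exact_change, kernel_change = path_kernel_demo((0.8, -0.3))
```

The two returned numbers agree up to terms of second order in the step size, which is the discrete analogue of the "approximately" in the paper's title. The nonlinearity enters only through the gradients inside the kernel; the sum over training points is linear.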
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine  
P. Domingos
Deep learning’s successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
From the paper:
... Here we show that every model learned by this method, regardless of architecture, is approximately equivalent to a kernel machine with a particular type of kernel. This kernel measures the similarity of the model at two data points in the neighborhood of the path taken by the model parameters during learning. Kernel machines store a subset of the training data points and match them to the query using the kernel. Deep network weights can thus be seen as a superposition of the training data points in the kernel’s feature space, enabling their efficient storage and matching. This contrasts with the standard view of deep learning as a method for discovering representations from data. ... 
... the weights of a deep network have a straightforward interpretation as a superposition of the training examples in gradient space, where each example is represented by the corresponding gradient of the model. Fig. 2 illustrates this. One well-studied approach to interpreting the output of deep networks involves looking for training instances that are close to the query in Euclidean or some other simple space (Ribeiro et al., 2016). Path kernels tell us what the exact space for these comparisons should be, and how it relates to the model’s predictions. ...
See also this video which discusses the paper. 

You can almost grasp the result from the figure and definitions below.

Note Added:
I was asked to elaborate further on this sentence, especially regarding AGI and human cognition: 

... without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples.

It should not be taken as a suggestion that gradient descent models can't achieve AGI, or that our minds can't be (effectively) models of this kernel type. 

1. The universe is highly compressible: it is governed by very simple effective models. These models can be learned, which allows for prediction beyond specific examples.

2. A sufficiently complex neural net can incorporate layers of abstraction. Thus a new instance and a previously seen example might be similar in an abstract (non-explicit) sense, but that similarity is still incorporated into the kernel. When Einstein invented Special Relativity he was not exactly aping another physical theory he had seen before, but at an abstract level the physical constraint (speed of light constant in all reference frames) and algebraic incorporation of this fact into a description of spacetime (Lorentz symmetry) may have been "similar" to examples he had seen already in simple geometry / algebra. (See Poincare and Einstein for more.)
Ulam: Banach once told me, "Good mathematicians see analogies between theorems or theories, the very best ones see analogies between analogies." Gamow possessed this ability to see analogies between models for physical theories to an almost uncanny degree... 

Saturday, February 06, 2021

Enter the Finns: FinnGen and FINRISK polygenic prediction of cardiometabolic diseases, common cancers, alcohol use, and cognition

In 2018 Dr. Aarno Palotie visited MSU (video of talk) to give an overview of the FinnGen research project. FinnGen aims to collect the genomic data of 500k citizens in Finland in order to study the origins of diseases and their treatment. Finland is well suited for this kind of study because it is relatively homogeneous and has a good national healthcare system.
Professor Aarno Palotie, M.D., Ph.D. is the research director of the Human Genomics program at FIMM. He is also a faculty member at the Center for Human Genome Research at the Massachusetts General Hospital in Boston and associate member of the Broad Institute of MIT and Harvard. He has a long track record in human disease genetics. He has held professorships and group leader positions at the University of Helsinki, UCLA, and the Wellcome Trust Sanger Institute. He has also been the director of the Finnish Genome Center and Laboratory of Molecular Genetics in the Helsinki University Hospital.
FinnGen is now producing very interesting results in polygenic risk prediction and clinical / public health applications of genomics. Below are a few recent papers.

1. This paper studies the use of PRS in prediction of five common diseases, with an eye towards clinical utility.
Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers 
Nature Medicine volume 26, 549–557 (2020) 
Polygenic risk scores (PRSs) have shown promise in predicting susceptibility to common diseases [1,2,3]. We estimated their added value in clinical risk prediction of five common diseases, using large-scale biobank data (FinnGen; n = 135,300) and the FINRISK study with clinical risk factors to test genome-wide PRSs for coronary heart disease, type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer. We evaluated the lifetime risk at different PRS levels, and the impact on disease onset and on prediction together with clinical risk scores. Compared to having an average PRS, having a high PRS contributed 21% to 38% higher lifetime risk, and 4 to 9 years earlier disease onset. PRSs improved model discrimination over age and sex in type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer, and over clinical risk in type 2 diabetes, breast cancer and prostate cancer. In all diseases, PRSs improved reclassification over clinical thresholds, with the largest net reclassification improvements for early-onset coronary heart disease, atrial fibrillation and prostate cancer. This study provides evidence for the additional value of PRSs in clinical disease prediction. The practical applications of polygenic risk information for stratified screening or for guiding lifestyle and medical interventions in the clinical setting remain to be defined in further studies.
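For readers unfamiliar with how a PRS is formed: conventionally it is a weighted sum of risk-allele counts, usually reported in standard deviations relative to a reference cohort. The sketch below illustrates only that bookkeeping. The effect sizes, genotypes, and names (`EFFECT_SIZES`, `GENOTYPES`, `polygenic_score`, `standardized_scores`) are made up for illustration and have nothing to do with the predictors in the paper, which use vastly more variants.

```python
import statistics

# Hypothetical per-variant effect sizes (e.g., log odds ratios).
EFFECT_SIZES = [0.12, -0.05, 0.30, 0.08]

# Hypothetical genotypes: risk-allele counts (0, 1, or 2) per variant for each person.
GENOTYPES = {
    "person_a": [0, 1, 0, 1],
    "person_b": [1, 1, 1, 0],
    "person_c": [2, 0, 2, 2],
    "person_d": [1, 2, 0, 1],
}


def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts."""
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


def standardized_scores(genotypes, effect_sizes):
    """Express each raw score in standard deviations from the cohort mean."""
    raw = {name: polygenic_score(g, effect_sizes) for name, g in genotypes.items()}
    mean = statistics.mean(raw.values())
    sd = statistics.pstdev(raw.values())
    return {name: (score - mean) / sd for name, score in raw.items()}


z_scores = standardized_scores(GENOTYPES, EFFECT_SIZES)
highest = max(z_scores, key=z_scores.get)
```

A "high PRS" in the sense of the abstract then corresponds to a person far out in the upper tail of the standardized distribution.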

2. This paper is a well-powered study of genetic influence on alcohol use and effects on mortality.

Genomic prediction of alcohol-related morbidity and mortality 
Nature Translational Psychiatry volume 10, Article number: 23 (2020) 
While polygenic risk scores (PRS) have been shown to predict many diseases and risk factors, the potential of genomic prediction in harm caused by alcohol use has not yet been extensively studied. Here, we built a novel polygenic risk score of 1.1 million variants for alcohol consumption and studied its predictive capacity in 96,499 participants from the FinnGen study and 39,695 participants from prospective cohorts with detailed baseline data and up to 25 years of follow-up time. A 1 SD increase in the PRS was associated with 11.2 g (=0.93 drinks) higher weekly alcohol consumption (CI = 9.85–12.58 g, p = 2.3 × 10^-58). The PRS was associated with alcohol-related morbidity (4785 incident events) and the risk estimate between the highest and lowest quintiles of the PRS was 1.83 (95% CI = 1.66–2.01, p = 1.6 × 10^-36). When adjusted for self-reported alcohol consumption, education, marital status, and gamma-glutamyl transferase blood levels in 28,639 participants with comprehensive baseline data from prospective cohorts, the risk estimate between the highest and lowest quintiles of the PRS was 1.58 (CI = 1.26–1.99, p = 8.2 × 10^-5). The PRS was also associated with all-cause mortality with a risk estimate of 1.33 between the highest and lowest quintiles (CI = 1.20–1.47, p = 4.5 × 10^-8) in the adjusted model. In conclusion, the PRS for alcohol consumption independently associates for both alcohol-related morbidity and all-cause mortality. Together, these findings underline the importance of heritable factors in alcohol-related health burden while highlighting how measured genetic risk for an important behavioral risk factor can be used to predict related health outcomes.

3. This paper examines rare CNVs (Copy Number Variants) and PRS (Polygenic Risk Score) prediction using a combined Finnish sample of ~30k for whom education, income, and health outcomes are known. The study finds that low polygenic scores for Educational Attainment (EA) and intelligence predict worse outcomes in education, income, and health.
Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants 

Abstract Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population. This presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk score (PRS) in two Finnish cohorts (FINRISK, n = 23,053 and NFBC1966, n = 4895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66–0.89]) and lower household income (OR = 0.77 [0.66–0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38–0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32–0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26–0.37]), lower-income (OR = 0.66 [0.57–0.77]), lower subjective health (OR = 0.72 [0.61–0.83]), and increased mortality (Cox’s HR = 1.55 [1.21–1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that the majority of working-age individuals carrying high-risk CNVs without SNPD diagnosis have a modest impact on morbidity and mortality, as well as the limited impact on income and educational attainment, compared to individuals at the extreme end of common genetic variation. 
Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context, rather than evaluated in isolation. 
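As a quick sanity check on the counts quoted in the abstract, the percentages can be recomputed from the cohort sizes. The variable names are mine; the numbers are taken directly from the abstract.

```python
# Cohort sizes and carrier counts as quoted in the abstract.
finrisk_n = 23053
nfbc1966_n = 4895
carriers = 744
carriers_with_snpd = 36

total_n = finrisk_n + nfbc1966_n
carrier_pct = 100 * carriers / total_n                # share of participants with a high-risk CNV
diagnosed_pct = 100 * carriers_with_snpd / carriers   # share of carriers with an SNPD diagnosis
unaffected = carriers - carriers_with_snpd            # carriers without a diagnosis
```

The carrier share rounds to 2.66%, which is the same fraction the authors use to define the matched lowest PRS extremes.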
From the paper:
 ... we compared the impact of CNVs to the impact of the PRSs for educational attainment [24], schizophrenia [25], and general intelligence [26] on general health, morbidity, mortality, and socioeconomic burden. We analyzed these effects in two cohorts: one sampled at random from the Finnish working-age population (FINRISK), the other a Finnish birth cohort (Northern Finland Birth Cohort 1966; NFBC1966). Both cohorts link to national health records, enabling analysis of longitudinal health data and socioeconomic status data over several decades. 
... we observed a clear polygenic effect on socioeconomic outcome with educational attainment and IQ PRS scores. Belonging to the matched lowest PRS extremes (lowest 2.66%) of educational attainment or IQ had an overall stronger impact on the socioeconomic outcome than belonging to most high-risk CNV groups, and a generally stronger impact on health and survival, with the exception of household income-associated CNVs. 
... odds for subsequent level of education were even lower at the matched lowest extreme of PRS_EA (OR = 0.31 [0.26–0.37]) and PRS_IQ (OR = 0.51 [0.44–0.60]).
... Rare deleterious variants, including CNVs, can have a major impact on health outcomes for an individual and are thus under strong negative selection. However, such variants might not always have a strong phenotypic impact (incomplete penetrance), and as observed here, can have a very modest—if any—effect on well-being. The reason for this wide spectrum of outcomes remains speculative. From a genetic perspective, one hypothesis is that additional variants, both rare and common, modify the phenotypic outcome of a CNV carrier (Supplementary Figs. 11 and 12). This type of effect is observable in analyzes of hereditary breast and ovarian cancer in the UK Biobank [40] and in FinnGen [41], where strong-impacting variants’ penetrance is modified by compensatory polygenic effects. 
... As stated above, the observed effect of polygenic scores was broader than that of structural variants. We observed strong effects in PRSs for intelligence and educational attainment on education, income and socioeconomic status. 

Wednesday, February 03, 2021

Gerald Feinberg and The Prometheus Project

Gerald Feinberg (1933-1992) was a theoretical physicist at Columbia, perhaps best known for positing the tachyon -- a particle that travels faster than light. He also predicted the existence of the mu neutrino. 

Feinberg attended Bronx Science with Glashow and Weinberg. Interesting stories abound concerning how the three young theorists were regarded by their seniors at the start of their careers. 

I became aware of Feinberg when Pierre Sikivie and I worked out the long range force resulting from two neutrino exchange. Although we came to the idea independently and derived, for the first time, the correct result, we learned later that it had been studied before by Feinberg and Sucher. Sadly, Feinberg died of cancer shortly before Pierre and I wrote our paper. 

Recently I came across Feinberg's 1969 book The Prometheus Project, which is one of the first serious examinations (outside of science fiction) of world-changing technologies such as genetic engineering and AI. See reviews in Science, Physics Today, and H+ Magazine. A scanned copy of the book can be found at Libgen.

Feinberg had the courage to engage with ideas that were much more speculative in the late 60s than they are today. He foresaw correctly, I believe, that technologies like AI and genetic engineering will alter not just human society but the nature of the human species itself. In the final chapter, he outlines a proposal for the eponymous Prometheus Project -- a global democratic process by which the human species can set long term goals in order to guide our approach to what today would be called the Singularity.


Tuesday, February 02, 2021

All Men Are Brothers -- 3 AM Edition

Afu Thomas is a German internet personality, known for his fluent Chinese, who lives in Shanghai. His videos are extremely popular in China and people often recognize him in public. 

This is a series of street interviews shot at 3 AM, in which he elicits sometimes moving and philosophical responses from ordinary people about hopes, dreams, family, money, happiness. These individuals, ranging from teen boys and girls to middle aged men, answer the questions simply but with insight and sincerity.  

The subtitled translations are very good. 

Wednesday, January 27, 2021

Yuri Milner interviews Donaldson, Kontsevich, Lurie, Tao, and Taylor (2015 Breakthrough Prize)


I came across this panel discussion recently, with Yuri Milner (former theoretical physicist, internet billionaire, and sponsor of the Breakthrough Prize) as interlocutor and panelists Simon Donaldson, Maxim Kontsevich, Jacob Lurie, Terence Tao, and Richard Taylor. 

Among the topics covered: the nature of mathematics, the simulation question, AGI and automated proof, human-machine collaboration in mathematics. Kontsevich marvels at the crystalline form of quantum mechanics: why linearity? why a vector space structure? 

Highly recommended!

See also 

The Quantum Simulation Hypothesis: Do we live in a quantum multiverse simulation? 

Sunday, January 24, 2021

Clinical Applications of Polygenic Risk Scores

Last week we posted a new paper (see bottom), prepared for the book Genomic Prediction of Complex Traits, Springer Nature series Methods in Molecular Biology. Someone asked me to comment more on clinical applications of polygenic risk scores. Here's what we say in the paper, using the specific example of breast cancer (emphasis added):
There is already significant interest in the application of PRS in a clinical setting, for example to identify high risk individuals who might receive early screening or preventative care [2, 13–24]. As a concrete example, women with high PRS scores for breast cancer can be offered early screening: already standard of care for those with BRCA risk variants [25, 26]. However, BRCA mutations affect no more than a few women per thousand in the general population [27–29]. Importantly, the number of (BRCA negative) women who are at high risk for breast cancer due to polygenic effects is an order of magnitude larger than the population of BRCA carriers [2, 10, 30–34]. From this one example it is clear that significant medical, public health, and cost benefits could result from PRS (e.g. [35]). It is well known that patients with atherosclerotic diseases, coronary artery disease (CAD), and lung diseases can benefit from early intervention [36–38]. ... Precision genetics is already used in identification of candidates for early intervention, and will become widespread in the near future (cf. Myriad’s riskScore test and other examples [33, 34]). In figure 4, we illustrate the predicted risk of breast cancer and coronary artery disease as function of age for high, medium and low risk groups, respectively.

We have verified in sibling studies that among two sisters the outlier with high risk score is much more likely to have breast cancer than the one with normal range score. The excerpt below is from the section on sibling validation: 
... We tested a variety of polygenic predictors using tens of thousands of genetic siblings for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood, and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that typically most of the predictive power persists in within-family designs. Given 1 sibling with normal-range PRS score (less than 84th percentile) and 1 sibling with high PRS score (top few percentiles), the predictors identify the affected sibling about 70-90 percent of the time across a variety of disease conditions, including breast cancer, heart attack, diabetes, etc. For height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
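The sibling test in the excerpt can be tallied in a few lines. This is only a sketch of the bookkeeping, assuming that pairs are restricted to those with exactly one affected sibling; the paper's actual procedure may differ, and the records and the name `sibling_call_rate` are hypothetical.

```python
def sibling_call_rate(pairs):
    """Fraction of discordant pairs in which the high-PRS sibling is the affected one.

    Each pair is (high_prs_sibling_affected, normal_prs_sibling_affected).
    """
    discordant = [(high, normal) for high, normal in pairs if high != normal]
    if not discordant:
        raise ValueError("no pairs with exactly one affected sibling")
    correct = sum(1 for high, _ in discordant if high)
    return correct / len(discordant)


# Hypothetical toy records, not real data.
PAIRS = [
    (True, False),
    (True, False),
    (False, True),
    (True, False),
    (True, True),    # both affected: not informative, dropped
    (False, False),  # neither affected: not informative, dropped
    (True, False),
]

rate = sibling_call_rate(PAIRS)
```

A rate of 0.5 would mean the predictor carries no information within families; the 70-90 percent range quoted above is well above that baseline.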
The evidence is strong that PRS outliers are at unusual absolute risk. In fact, the likelihood that an individual in the high PRS tail will eventually have the disease can approach 100% for some conditions -- see figure below. This is a concrete realization of precision medicine, at least for these individuals.

In addition to commercial products like Myriad's riskScore (which extends their BRCA panel to additional polygenic factors, and is already widely available), I am aware of many healthcare systems (including some national healthcare systems) that are seriously investigating the use of PRS in standard clinical care. 

Another example: a relative of mine had a prostate cancer diagnosis and took a (standard of care) genetic risk test which, like the pre-riskScore Myriad product, is simply a panel of rare monogenic risk variants. We and other groups have developed prostate cancer polygenic predictors which could be easily incorporated into standard of care and would likely be much more useful than the existing panel. I haven't looked carefully at the prostate cancer numbers but I strongly suspect that, as in the breast cancer example, many more men are at high risk due to high PRS than are carriers of the rare variants.

It's only a matter of time before these improvements in diagnostic screening become widespread.

Here is what we say about IVF applications:
In the past, parents with more viable embryos than they intended to use made a selection based on very little information — typically nothing more than the appearance or morphology of each blastocyst. With modern technology it has become common to genotype embryos before selection, in order to detect potential genetic issues such as trisomy 21 (Down Syndrome). Parents who are carriers of a single gene variant linked to a Mendelian condition can use genetic screening to avoid passing the risk variant on to their child. Millions of embryos are now genetically tested each year. With polygenic risk prediction, it is possible now to screen against outlier risk for many common disease conditions, not just rare single gene conditions. For example, the overwhelming majority of families with breast cancer history are not carriers of a BRCA risk variant, but rather have elevated polygenic risk. It is now possible for these families to select an embryo with average or even below average breast cancer risk if they so wish.

Here is the paper:

From Genotype to Phenotype: polygenic prediction of complex human traits (arXiv:2101.05870, q-bio), 33 pages, 7 figures, 1 table
Timothy G. Raben, Louis Lello, Erik Widen, Stephen D.H. Hsu 
Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.

Monday, January 18, 2021

From Genotype to Phenotype: polygenic prediction of complex human traits

New paper, prepared for the book Genomic Prediction of Complex Traits, Springer Nature series Methods in Molecular Biology.
From Genotype to Phenotype: polygenic prediction of complex human traits (arXiv:2101.05870, q-bio), 33 pages, 7 figures, 1 table
Timothy G. Raben, Louis Lello, Erik Widen, Stephen D.H. Hsu 
Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.

From the introduction:
I, on the other hand, knew nothing, except ... physics and mathematics and an ability to turn my hand to new things. — Francis Crick 
The challenge of decoding the genome has loomed large over biology since the time of Watson and Crick. Initially, decoding referred to the relationship between DNA and specific proteins or molecular mechanisms, but the ultimate goal is to deduce the relationship between DNA and phenotype — the character of the organism itself. How does Nature encode the traits of the organism in DNA? In this review we describe recent advances toward this goal, which have resulted from the application of machine learning (ML) to large genomic data sets. Genomic prediction is the real decoding of the genome: the creation of mathematical models which map genotypes to complex traits. 
It is a peculiarity of ML and artificial intelligence (AI) applied to complex systems that these methods can often “solve” a problem without explicating, in a manner that humans can absorb, the intricate mechanisms that lie intermediate between input and output. For example, AlphaGo [1] achieved superhuman mastery of an ancient game that had been under serious study for thousands of years. Yet nowhere in the resulting neural network with millions of connection strengths is there a human-comprehensible guide to Go strategy or game dynamics. Similarly, genomic prediction has produced mathematical functions which predict quantitative human traits with surprising accuracy — e.g., height, bone density, and cholesterol or lipoprotein A levels in blood (see Table 1); using typically thousands of genetic variants as input (see next section for details) — but without explicitly revealing the role of these variants in actual biochemical mechanisms. Characterizing these mechanisms — which are involved in phenomena such as bone growth, lipid metabolism, hormonal regulation, protein interactions — will be a project which takes much longer to complete. 
If recent trends persist, in particular the continued growth of large genotype | phenotype data sets, we will likely have good genomic predictors for a host of human traits within the next decade. ...
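
To make the idea of a "mathematical function which maps genotype to trait" concrete, here is a minimal sketch, assuming a purely additive linear model in which each variant contributes its weight times the number of copies of the effect allele (0, 1, or 2). The function names, the toy weights, and the random genotypes are all hypothetical illustrations, not the predictors or data from the paper.

```python
import random


def polygenic_score(genotype, weights, intercept=0.0):
    """Additive score: intercept + sum over variants of weight * allele count."""
    if len(genotype) != len(weights):
        raise ValueError("genotype and weights must have the same length")
    return intercept + sum(w * g for w, g in zip(weights, genotype))


def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Hypothetical toy data: 1000 variants with small random effect sizes.
rng = random.Random(0)
n_variants = 1000
weights = [rng.gauss(0.0, 0.05) for _ in range(n_variants)]


def random_genotype():
    """Allele counts (0, 1, or 2) drawn uniformly; real frequencies differ."""
    return [rng.choice((0, 1, 2)) for _ in range(n_variants)]


# Score a reference population, then place one individual within it.
reference = [polygenic_score(random_genotype(), weights) for _ in range(500)]
individual_score = polygenic_score(random_genotype(), weights)
individual_percentile = percentile_rank(individual_score, reference)
print(f"score = {individual_score:.3f}, percentile = {individual_percentile:.1f}")
```

An individual whose score falls in the top few percent of the reference distribution would be an "outlier" in the sense used in the abstract. Translating that percentile into an actual risk estimate requires calibration against real case and control data, which this sketch does not attempt.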
