Thursday, February 27, 2020

Adam Dynes on Noisy Retrospection: The Effect of Party Control on Policy Outcomes - Manifold #35



Steve and Corey talk to Adam Dynes of Brigham Young University about whether voting has an effect on policy outcomes. Adam’s work finds that control of state legislatures or governorships does not have an observable effect on macroscopic variables such as crime rates, the economy, etc. Possible explanations: parties push essentially the same policies, politicians don't keep promises, monied interests control everything. Are voting decisions just noisy mood affiliation? Perhaps time is better spent obsessing about sports teams, which at least generates pleasure.

1:22 - What is retrospective voting?
5:43 - Research findings on retrospective voting
14:02 - Uniparty/Monied interests?
17:23 - Martin Gilens' research
23:10 - Are people just voting based on noise or mood affiliation?
27:13 - Bryan Caplan - Myth of the Rational Voter
34:35 - Is time better spent obsessing about sports teams, which at least generates pleasure?
39:42 - After the fall of Athens, was democracy commonly referred to as irrational mob rule?
48:22 - Does this research translate to the national level?
52:19 - Super Nerdy Stuff: Statistical Analysis, Reproducibility & Null Results
56:40 - Reactions to the results

Transcript

Adam Dynes (Personal Website)

Adam Dynes (Faculty Profile)

Noisy Retrospection: The Effect of Party Control on Policy Outcomes

Related: 2016 blog post on Martin Gilens' work: American and Chinese Oligarchies.


man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide-ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Saturday, February 22, 2020

Cold Spring Harbor Laboratory: Seminar and Photos

Last week I visited Cold Spring Harbor Laboratory to give a seminar.

The new material is in slides 13-17. See also Live Long and Prosper: Genetic Architecture of Complex Traits and Disease Risk Predictors. I believe the sibling validation results are extremely important: typically most of the predictive power persists in within-family validation tests. We have not released this paper but will soon -- the slides are a preview. To be honest I fully anticipated these results: the large number of out of sample predictor validations using unrelated individuals strongly suggests that real genetic effects are at work. However, many people are irrationally biased against -- have strong priors against -- genetic causation of complex traits (even disease risks). These family designs provide important "gold standard" evidence, which, one can hope, will enlighten even the most stubborn. The sad alternative is progress one funeral at a time...

Otherwise the talk is similar to the one I gave at the Berkeley/UCSF Innovative Genomics Institute last summer. Video of IGI talk.
Title: Genomic Prediction of Complex Traits and Disease Risks via AI/ML and Large Genomic Datasets

Abstract: The talk is divided into two parts. The first gives an overview of the rapidly advancing area of genomic prediction of disease risks using polygenic scores. We can now identify risk outliers (e.g., with 5 or 10 times normal risk) for about 20 common disease conditions, ranging from diabetes to heart diseases to breast cancer, using inexpensive SNP genotypes (i.e., as offered by 23andMe). We can also predict some complex quantitative traits (e.g., adult height with accuracy of a few cm, using ~20k SNPs). I discuss application of these results in precision medicine as well as embryo selection in IVF, and give some details about genetic architectures. The second part covers the AI/ML used to build these predictors, with an emphasis on "sparse learning" and phase transitions in high dimensional statistics.
Some photos. The ones on the wall of the seminar room capture a golden era in molecular biology and the study of DNA. Leo Szilard on the right in the one below. Also, Jacques Monod, Crick and Watson, Wally Gilbert, Max Delbruck, Frank Stahl, Francois Jacob, David Baltimore. Of these individuals I have known four in person. I would give a lot to have met Crick and especially Szilard. While at CSHL I learned that James Watson is still alive and intellectually active.

See H. Judson's The Eighth Day of Creation (PDF) for a brilliant but readable history of the golden age of molecular biology.












Thursday, February 20, 2020

Yang Wang on Science and Technology in China, Hong Kong Protests, and Corona Virus - Manifold Podcast #34



Yang Wang is Dean of Science at the Hong Kong University of Science and Technology. Professor Wang received his BS degree in mathematics from University of Science and Technology of China in 1983, and his PhD degree from Harvard University in 1990 under the supervision of Fields medalist David Mumford. He served as Chair of the Mathematics department at Michigan State University before joining HKUST.

2:50 - US-China Relations: Has China advanced through the development of human capital or the theft of intellectual property?
16:23 - Academic Culture in China
33:00 - Hong Kong Protests: Economic inequality, housing prices, and outside actors.
1:04:09 - Corona Virus COVID-19: Has the Corona Virus established a new mode of online education in Hong Kong? Yang makes a forecast about the epidemic's trajectory.

Transcript

Yang Wang, Dean of Science at HKUST


Yang Wang (Faculty Profile)



Friday, February 14, 2020

Live Long and Prosper: Genetic Architecture of Complex Traits and Disease Risk Predictors

New paper! Non-coding regions contribute significantly to genetic disease risk -- think twice before you opt for exome sequencing over array genotyping. Also, pleiotropy between common disease risks seems to be weak.

Credit to Soke Yuen Yong for performing essentially all the analysis in this paper.
Genetic Architecture of Complex Traits and Disease Risk Predictors

Soke Yuen Yong, Timothy G. Raben, Louis Lello, Stephen D. H. Hsu

Genomic prediction of complex human traits (e.g., height, cognitive ability, bone density) and disease risks (e.g., breast cancer, diabetes, heart disease, atrial fibrillation) has advanced considerably in recent years. Predictors have been constructed using penalized algorithms that favor sparsity: i.e., which use as few genetic variants as possible. We analyze the specific genetic variants (SNPs) utilized in these predictors, which can vary from dozens to as many as thirty thousand. We find that the fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large amount of the variance is accounted for by SNPs outside of coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits - i.e., existing PRS cannot be computed from exome data alone. We also study the fraction of SNPs and of variance that is in common between pairs of predictors. The DNA regions used in disease risk predictors so far constructed seem to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated. It seems possible in theory for an individual to be a low-risk outlier in all conditions simultaneously.

https://www.biorxiv.org/content/10.1101/2020.02.12.946608v1
doi: https://doi.org/10.1101/2020.02.12.946608
From the conclusions:
III. The DNA regions used in disease risk predictors so far constructed seem to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated.

Observation III has interesting implications for pleiotropy [63–65]. We found that genetic risks are largely uncorrelated for different conditions. This suggests that there can exist individuals with, e.g., low risk simultaneously in each of multiple conditions, for essentially any combination of conditions. There is no trade-off required between different disease risks ... One could speculate that a lucky individual with exceptionally low risk across multiple conditions might have an unusually long life expectancy.


Note added: Some clarifying remarks from the comments.
1. We used the output of the UKB variant calling pipeline for the 50k exomes they released -- it is essentially the output data that researchers have available from these exomes. This is discussed in great detail in some of the references as there were some technical issues with the pipeline. SNPs that are not called by this process are (presumably) not determined from the exome reads. Exome sequencing only probes a small fraction of the whole genome, after all.

In any case, we independently analyzed the locations of the SNPs and plenty are outside of coding regions, etc. Referring to the exome process specifically is just to give another "operational" definition of what is in coding vs non-coding regions since the boundaries of these regions are a bit ill-defined in the literature.

2. Plenty of people believe in strong pleiotropy, and are likely surprised by this result. High dimensionality alone is enough to make low pleiotropy plausible, but it *might* have been the case that some special genomic regions play an important role across many diseases. Lots of people with "strong biomedical intuition" told me this would be the case, but apparently not...

There is no way to know until you compare detailed genetic architectures on a disease by disease basis. We are the first to do that.

We don't claim that genetic correlations are close to zero. We just characterize the correlation/overlap between known predictor SNPs for the various diseases. (There is still plenty of heritability not yet discovered for each of the diseases -- need more training data. The predictors will improve a lot in time but these are the most significant SNPs -- i.e. the easiest to discover.)

From our results one can at least put a lower bound on the amount of "risk reduction" (or longevity gain!) that is independently and simultaneously available across various diseases (e.g., if one could make edits freely). It's a lot.

Thursday, February 13, 2020

Elizabeth Kolbert on Climate Change: Impacts and Mitigation Technologies



Steve and Corey talk to Elizabeth Kolbert, author of The Sixth Extinction, about the current state of the climate debate. All three are pessimistic about the possibility that emissions will be substantively reduced in the near term, and they discuss technologies for removing carbon from the atmosphere. They explore uncertainty in the models regarding temperature rise and precipitation, and contemplate a billion people on the move in response to climate change and population increase. They ask: what is more of a threat to humanity in the coming century, runaway AI or runaway climate change?

Transcript

Elizabeth Kolbert (The New Yorker)

Field Notes from a Catastrophe: Man, Nature, and Climate Change

The Sixth Extinction: An Unnatural History

Jobs and AI

Carbon Capture


Related:

Epistemic Caution and Climate Change

Certainties and Uncertainties in our Energy and Climate Futures: Steve Koonin



Wednesday, February 12, 2020

The Double Helix: "to understand what life is, we must know how genes act"


I'll be visiting Cold Spring Harbor Laboratory next week to give a talk. I decided to reread Watson's The Double Helix in advance of my trip. I think I originally read it while an undergraduate, and hadn't really looked at it since.

There are striking (at least to me) aspects of even the first few pages of the book. When I went to college, physics students were already told that biology -- specifically, molecular biology -- was the future and that physics was already such a mature subject that there was little left to discover. In retrospect this advice was not without merit, but my personal opinion, having spent an entire career as a physicist, and having also done some work in biology (albeit in a very computational area), is that there is nothing more valuable than the training one receives in theoretical physics. To deny that this training has turned out to be useful in a variety of other fields, including biology, computer science, engineering, finance, data science, ... is simply to deny reality.

For those interested in the history of science, the deep workings of academia, and intellectual history, I recommend investigating the origins of molecular biology. One has to admit that molecular methods revolutionized biology, and continue to do so. Yet, many molecular biology departments were established (originally with names like BioPhysics!) against the wishes of "real" biologists of the era. It is telling that even today, at many universities, the department of molecular biology (or equivalent) is a separate department from the one with more classical roots (to use Bialek's terminology below). This, despite the fact that molecular techniques are now universally applied and central to most advances in biology.

A new field was created by invaders from outside. 
It grew and advanced enormously. 
It eventually ate its predecessor...

Watson on Crick (The Double Helix):
Already he is much talked about, usually with reverence, and someday he may be considered in the category of Rutherford or Bohr. But this was not true when, in the fall of 1951, I came to the Cavendish Laboratory of Cambridge University to join a small group of physicists and chemists working on the three-dimensional structures of proteins. At that time he was thirty-five, yet almost totally unknown. Although some of his closest colleagues realized the value of his quick, penetrating mind and frequently sought his advice, he was often not appreciated...

For almost forty years Bragg, a Nobel Prize winner and one of the founders of crystallography, had been watching X-ray diffraction methods solve structures of ever-increasing difficulty. The more complex the molecule, the happier Bragg became when a new method allowed its elucidation.

... Somewhere between Bragg the theorist and Perutz the experimentalist was Francis, who occasionally did experiments but more often was immersed in the theories for solving protein structures.

... Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity. This was not because he thought it uninteresting. Quite the contrary. A major factor in his leaving physics and developing an interest in biology had been the reading in 1946 of What Is Life? by the noted theoretical physicist Erwin Schrodinger. This book very elegantly propounded the belief that genes were the key components of living cells and that, to understand what life is, we must know how genes act.
From the 2013 post In search of principles: when biology met physics:
This is an excerpt from the introduction of Bill Bialek's book on biophysics. Bialek was a professor at Berkeley when I was a graduate student, but has since moved to Princeton. See also For the historians and the ladies, As flies to wanton boys are we to the gods, and Prometheus in the basement.
... In one view of history, there is a direct path from Bohr, Delbruck and Schrodinger to the emergence of molecular biology. Certainly Delbruck did play a central role, not least because of his insistence that the community should focus (as the physics tradition teaches us) on the simplest examples of crucial biological phenomena, reproduction and the transmission of genetic information. The goal of molecular biology to reduce these phenomena to interactions among a countable set of molecules surely echoed the physicists’ search for the fundamental constituents of matter, and perhaps the greatest success of molecular biology is the discovery that many of these basic molecules of life are universal, shared across organisms separated by hundreds of millions of years of evolutionary history. Where classical biology emphasized the complexity and diversity of life, the first generation of molecular biologists emphasized the simplicity and universality of life’s basic mechanisms, and it is not hard to see this as an influence of the physicists who came into the field at its start.



See also On Crick and Watson:
Crick, 35, had already had a career in physics interrupted by the war and despaired of making his great contribution to science. Watson was a callow 23, fresh from Indiana.
[Chargaff]: It was clear to me that I was faced with a novelty: enormous ambition and aggressiveness, ... Thinking of the many sweaty years of making preparations of nucleic acids and of the innumerable hours spent on analyzing them, I could not help being baffled. I am sure that, had I had more contact with, for instance, theoretical physicists, my astonishment would have been less great. In any event, there they were, speculating, pondering, angling for information. ...
Watson: ... to understand what life is, we must know how genes act.

This program is still being carried out!
Title: Genomic Prediction of Complex Traits and Disease Risks via AI/ML and Large Genomic Datasets

Abstract: The talk is divided into two parts. The first gives an overview of the rapidly advancing area of genomic prediction of disease risks using polygenic scores. We can now identify risk outliers (e.g., with 5 or 10 times normal risk) for about 20 common disease conditions, ranging from diabetes to heart diseases to breast cancer, using inexpensive SNP genotypes (i.e., as offered by 23andMe). We can also predict some complex quantitative traits (e.g., adult height with accuracy of a few cm, using ~20k SNPs). I discuss application of these results in precision medicine as well as embryo selection in IVF, and give some details about genetic architectures. The second part covers the AI/ML used to build these predictors, with an emphasis on "sparse learning" and phase transitions in high dimensional statistics.
I will also present some new material (not yet published) on pleiotropy and also on within-family (sibling) validation of genomic predictors.

Thursday, February 06, 2020

Meghan Daum on the New Culture Wars - Manifold Podcast #32



Corey and Steve talk to Meghan Daum about her new book The Problem With Everything: My Journey Through The New Culture Wars. Meghan describes how she became aware of the "Red Pill" through what she calls "free speech YouTube" videos. The three ask whether their feeling of alienation from Gen-Z wokeness is just a sign of getting old or reflects principles of free speech and open debate. Meghan argues that Gen-Z's focus on fairness leads to difficult compromises. They discuss social interactions in the pre-internet, early-internet, and woke-internet eras.

Transcript

Meghan Daum (Author Website)

Meghan Daum on Medium

The Problem with Everything: My Journey Through the New Culture Wars



Tuesday, February 04, 2020

Report of the University of California Academic Council Standardized Testing Task Force

The figures below are from the recently completed Report of the University of California Academic Council Standardized Testing Task Force. Note the large sample sizes.

Some remarks:

1. SAT and High School GPA (HSGPA) are both useful (and somewhat independent) predictors of college success. In terms of variance accounted for, we have the inequality:

SAT + HSGPA  >  SAT  >  HSGPA

There are some small deviations from this pattern, but it seems to hold overall. I believe that GPA has a relatively larger loading on conscientiousness (work ethic) than cognitive ability, with SAT the other way around. By combining the two we get more information than from either alone.

2. SAT and HSGPA are stronger predictors than family income or race. Within each of the family income or ethnicity categories there is substantial variation in SAT and HSGPA, with corresponding differences in student success. See bottom figure and combined model R^2 in second figure below; R^2 varies very little across family income and ethnic categories.
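To illustrate remark 1, here is a minimal sketch of why two partially independent predictors explain more variance together than either does alone. It uses the standard formula for R^2 in a two-predictor linear regression on standardized variables. The three correlation values are hypothetical numbers chosen for illustration; they are not taken from the Task Force report.

```python
def combined_r2(r_sat: float, r_gpa: float, r_between: float) -> float:
    """R^2 of a linear model using both predictors (standardized variables).

    r_sat, r_gpa: correlation of each predictor with the outcome.
    r_between:    correlation between the two predictors.
    """
    numerator = r_sat**2 + r_gpa**2 - 2 * r_sat * r_gpa * r_between
    return numerator / (1 - r_between**2)


# Hypothetical correlations, for illustration only (not from the report).
r_sat = 0.45      # SAT vs. freshman GPA
r_gpa = 0.40      # HSGPA vs. freshman GPA
r_between = 0.50  # SAT vs. HSGPA

r2_sat = r_sat**2                               # single-predictor R^2
r2_gpa = r_gpa**2                               # single-predictor R^2
r2_both = combined_r2(r_sat, r_gpa, r_between)  # two-predictor R^2

print(f"SAT only:    R^2 = {r2_sat:.3f}")
print(f"HSGPA only:  R^2 = {r2_gpa:.3f}")
print(f"SAT + HSGPA: R^2 = {r2_both:.3f}")
```

With these assumed inputs the ordering SAT + HSGPA > SAT > HSGPA comes out as in the inequality above. The less correlated the two predictors are with each other, the larger the gain from combining them.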







There is not much new here. In graduate admissions the undergraduate GPA and the GRE general + subject tests play a role similar to HSGPA and SAT. See GRE and SAT Validity.

See Correlation and Variance to understand better what the R^2 numbers above mean. R^2 ~ 0.26 means the correlation between predictor and outcome variable (e.g., freshman GPA) is R ~ 0.5 or so.
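The arithmetic behind that statement is just a square root: for a single predictor (or a single combined score), the fraction of variance accounted for is the square of the correlation. A small sketch, where the function name is my own:

```python
import math


def correlation_from_r2(r2: float) -> float:
    """Return the magnitude of the correlation R implied by a given R^2."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return math.sqrt(r2)


r = correlation_from_r2(0.26)
print(f"R^2 = 0.26  ->  R ~ {r:.2f}")
```

So a model that accounts for only about a quarter of the variance still corresponds to a correlation of roughly 0.5 between predictor and outcome.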

Test Preparation and SAT scores: "...combined effect of coaching on the SAT I is between 21 and 34 points. Similarly, extensive meta-analyses conducted by Betsy Jane Becker in 1990 and by Nan Laird in 1983 found that the typical effect of commercial preparatory courses on the SAT was in the range of 9-25 points on the verbal section, and 15-25 points on the math section."