Physicist, Startup Founder, Blogger, Dad

Saturday, September 22, 2018

The French Way: Alain Connes interview

I came across this interview with Fields Medalist Alain Connes (excerpt below) via an essay by Dominic Cummings (see his blog here).

Dom's essay is also highly recommended. He has spent considerable effort trying to understand the history of highly effective scientific / research organizations. There is a good chance that his insights will someday be put to use in service of the UK. Dom helped create a UK variant of Kolmogorov's School for Physics and Mathematics.

On the referendum and on Expertise: the ARPA/PARC ‘Dream Machine’, science funding, high performance, and UK national strategy

Topics discussed by Connes: CNRS as a model for nurturing talent, materialism and the hedonic treadmill as enemies of intellectual development, string theory (pro and con!), US, French, and Soviet systems for science / mathematics, his entry into Ecole Normale and the '68 Paris convulsions.

France and Ecole Normale produce great mathematicians far in excess of their population size.
Connes: I believe that the most successful systems so far were these big institutes in the Soviet Union, like the Landau institute, the Steklov institute, etc. Money did not play any role there, the job was just to talk about science. It is a dream to gather many young people in an institute and make sure that their basic activity is to talk about science without getting corrupted by thinking about buying a car, getting more money, having a plan for career etc. ... Of course in the former Soviet Union there were no such things as cars to buy etc. so the problem did not arise. In fact CNRS comes quite close to that dream too, provided one avoids all interference from our society which nowadays unfortunately tends to become more and more money oriented.

Q: You were criticizing the US way of doing research and approach to science but they have been very successful too, right? You have to work hard to get tenure, and research grants. Their system is very unified in the sense they have very few institutes like Institute for Advanced Studies but otherwise the system is modeled after universities. So you become first an assistant professor and so on. You are always worried about your raise but in spite of all these hazards the system is working.

Connes: I don’t really agree. The system does not function as a closed system. The US are successful mostly because they import very bright scientists from abroad. For instance they have imported all of the Russian mathematicians at some point.

Q: But the system is big enough to accommodate all these people; this is also a good point.

Connes: If the Soviet Union had not collapsed there would still be a great school of mathematics there with no pressure for money, no grants and they would be more successful than the US. In some sense once they migrated to the US they survived and did very well but I believe they would have bloomed better if not transplanted. By doing well they give the appearance that the US system is very successful but it is not on its own by any means. The constant pressure for producing reduces the “time unit” of most young people there. Beginners have little choice but to find an adviser that is sociologically well implanted (so that at a later stage he or she will be able to write the relevant recommendation letters and get a position for the student) and then write a technical thesis showing that they have good muscles, and all this in a limited amount of time which prevents them from learning stuff that requires several years of hard work. We badly need good technicians, of course, but it is only a fraction of what generates progress in research. It reminds me of an anecdote about André Weil who at some point had some problems with elliptic operators so he invited a great expert in the field and he gave him the problem. The expert sat at the kitchen table and solved the problem after several hours. To thank him, André Weil said “when I have a problem with electricity I call an electrician, when I have a problem with ellipticity I use an elliptician”.

From my point of view the current system in the US really discourages people who are truly original thinkers, which often goes with a slow maturation at the technical level. Also the way the young people get their position on the market creates “feudalities” namely a few fields well implanted in key universities which reproduce themselves leaving no room for new fields.


Q: So you were in Paris [Ecole Normale] in the best place and in the best time.

Connes: Yes it was a good time. I think it was ideal that we were a small group of people and our only motivation was pure thought and no talking about careers. We couldn’t care less and our main occupation was just discussing mathematics and challenging each other with problems. I don’t mean “puzzles” but problems which required a lot of thought; time or speed was not a factor, we just had all the time we needed. If you could give that to gifted young people it would be perfect.
See also Defining Merit:
... As a parting shot, Wilson could not resist accusing Ford of anti-intellectualism; citing Ford's desire to change Harvard's image, Wilson asked bluntly: "What's wrong with Harvard being regarded as an egghead college? Isn't it right that a country the size of the United States should be able to afford one university in which intellectual achievement is the most important consideration?"

E. Bright Wilson was a Harvard professor of chemistry and a member of the National Academy of Sciences, later a recipient of the National Medal of Science. The last quote from Wilson could easily have come from anyone who went to Caltech! Indeed, both E. Bright Wilson and his son, Nobel Laureate Ken Wilson (theoretical physics), earned their doctorates at Caltech (the father under Linus Pauling, the son under Murray Gell-Mann).
Where Nobel winners get their start (Nature):
Top Nobel-producing undergraduate institutions

Rank  School                    Country  Nobelists per capita (UG alumni)
1     École Normale Supérieure  France   0.00135
2     Caltech                   US       0.00067
3     Harvard University        US       0.00032
4     Swarthmore College        US       0.00027
5     Cambridge University      UK       0.00025
6     École Polytechnique       France   0.00025
7     MIT                       US       0.00025
8     Columbia University       US       0.00021
9     Amherst College           US       0.00019
10    University of Chicago     US       0.00017

Thursday, September 20, 2018

Social Credit in China

I can't vouch for the accuracy of this documentary, but I suspect the opinions of the people interviewed -- white collar mom with high social credit score, and blacklisted investigative journalist -- are representative. Probably too much emphasis on cameras and face recognition, when in fact the smartphone each person is carrying generates as much as or more data about their activities. See also PanOpticon in my Pocket.

Coming soon to the US?

Black Mirror:

Sunday, September 16, 2018

"The Mouthpiece of the Party of Davos": Bannon interview with Economist Editor in Chief

Steve Bannon, former White House chief strategist, interviewed by Zanny Minton Beddoes, The Economist’s Editor-in-chief (Open Future festival in New York on September 15th 2018). In contrast, New Yorker editor David Remnick surrendered to protests and disinvited Bannon from The New Yorker Festival two weeks ago.

For almost two decades I subscribed to The Economist and The New Yorker. But these days I read them only sporadically.

Whether you like or hate Steve Bannon, this interview is worth watching. Beddoes and questioners from the audience attack Bannon vigorously, but mostly allow him time to answer in full. The opening tactic is, no surprise, to insinuate racism, which Bannon explicitly rejects for the millionth time... If your source of information about Bannon is primarily the mainstream media, you might be surprised at what comes from the horse's mouth.

Topics covered: populism, nationalism, economic war with China, immigration, class struggle, tax and tariff policy, and Duty, Honor, Country.

Bannon @15:30 (talking over interruption):
Bannon: ... you keep getting bailed out by the Deplorables -- in World War One and World War Two, in the Cold War and whatever else is in the future it is working men and women that have bailed you out ...

Editor: We have to go on to something else...
Bannon @23:15, as the racism attack morphs into sexism:
... I'm a former naval officer that served in the South China Sea in the Pacific. My daughter is a graduate of the United States Military Academy at West Point and served with the 101st airborne in Iraq. She's an army captain today after serving in Eastern Europe, probably going to be deployed back to Afghanistan. She's on the faculty at West Point. I know how to help raise an empowered woman ...
General Douglas MacArthur, 1962 speech at West Point:
Duty, Honor, Country: Those three hallowed words reverently dictate what you ought to be, what you can be, what you will be. They are your rallying points: to build courage when courage seems to fail; to regain faith when there seems to be little cause for faith; to create hope when hope becomes forlorn.

Unhappily, I possess neither that eloquence of diction, that poetry of imagination, nor that brilliance of metaphor to tell you all that they mean.

The unbelievers will say they are but words, but a slogan, but a flamboyant phrase. Every pedant, every demagogue, every cynic, every hypocrite, every troublemaker, and I am sorry to say, some others of an entirely different character, will try to downgrade them even to the extent of mockery and ridicule.

Wednesday, September 12, 2018

Jordan Peterson: Identity Politics, IQ, Harvard and Asian admissions

First ~9min: Trump, the US Left and Right, Identity Politics

@10min: IQ

@24min: Harvard and Asian admissions. "The Asians are the wildcard..."

@37min: Nazism, Communism; UK Leftist: "I don't love Obama. I'm literally a communist, you idiot."

Coincidentally (or perhaps not) I know the room they are sitting in very well. I'll be there later today ;-)

Saturday, September 08, 2018

Illumina International Summit on Population Genomics 2018


The aim of this meeting is to bring together individuals, initiatives, and projects that are shaping the future of genomics in healthcare. This meeting provides a forum for international networking, shared learning and collaboration to address the challenges in realising the potential of genomics to improve human health. The success and accumulated experience of local programmes provide potentially great synergies around the globe. We hope that you will be able to join us and offer your insights and passion for a future where genomics sits at the innovation centre of global healthcare.
I'm in the UK again for this conference and some secret meetings in London. See also London Calling.

Friday, September 07, 2018

Whisky and Weed with Joe Rogan and Elon Musk

Seems like Elon might have been high before the interview even started 8-) Early discussion focused on AI, Neuralink, Singularity risk, etc. Simulation @43min.

See also:

Don’t Worry, Smart Machines Will Take Us With Them: Why human intelligence and AI will co-evolve (Nautilus)

Living in a Simulation (2007)

Let R = the ratio of the number of artificially intelligent virtual beings to the number of "biological" beings (humans). The virtual beings are likely to occupy the increasingly complex virtual worlds created in computer games, like Grand Theft Auto or World of Warcraft (WOW will earn revenues of a billion dollars this year and has millions of players). In the figure below I have plotted the likely behavior of R with time. Currently R is zero, but it seems plausible that it will eventually soar to infinity. (See previous posts on the Singularity.)

If R goes to infinity, we are overwhelmingly likely to be living in a simulation...

... Think of the ratio of orcs, goblins, pimps, superheroes and other intelligent game characters to actual player characters in any MMORPG. In an advanced version, the game characters would themselves be sentient, for that extra dose of realism! Are you a game character, or a player character? :-)
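The step from "R goes to infinity" to "overwhelmingly likely to be living in a simulation" can be made explicit with one line of arithmetic. A minimal sketch, assuming you are equally likely to be any one of the intelligent beings that exist (a self-sampling assumption not spelled out above): with R virtual beings per biological one, the chance of being virtual is R/(1+R). The function name is illustrative.

```python
def fraction_virtual(r: float) -> float:
    """Probability of being a virtual being when there are r virtual beings
    per biological being, assuming every being is equally likely to be 'you'
    (self-sampling assumption)."""
    if r < 0:
        raise ValueError("r must be non-negative")
    return r / (1.0 + r)


# As R grows, the probability of being a game character approaches 1.
for r in (0, 1, 10, 1_000, 1_000_000):
    print(f"R = {r:>9,}: P(virtual) = {fraction_virtual(r):.6f}")
```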

Wednesday, September 05, 2018

O Brave New World

About one in ten babies born in Denmark now is an IVF baby (8% in 2015). In many countries the fraction is roughly one in twenty. A million embryos per year (worldwide) go through genetic screening. Most of the screening is for aneuploidy (an abnormal number of chromosomes, as in Down syndrome), but the same biopsy sample can be used for more sophisticated genotyping. Globally about 2 million cycles of IVF are performed each year. These quantities are experiencing ~7% annual growth rates.
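A ~7% annual growth rate compounds quickly. A quick sketch, assuming the rate simply stays constant (an illustration, not a forecast), of the doubling time and of what 2 million cycles per year would become after ten years:

```python
import math


def doubling_time(rate: float) -> float:
    """Years for a quantity growing at `rate` per year (e.g. 0.07) to double."""
    return math.log(2) / math.log(1 + rate)


def project(initial: float, rate: float, years: int) -> float:
    """Value after `years` of constant compound growth."""
    return initial * (1 + rate) ** years


print(f"Doubling time at 7%/yr: {doubling_time(0.07):.1f} years")
print(f"IVF cycles/yr after 10 years: {project(2e6, 0.07, 10) / 1e6:.2f} million")
```

At a constant 7%, the annual number of cycles roughly doubles in about a decade.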

The future is here
Genomic Prediction: A Hypothetical (Embryo Selection)
Genomic Prediction: A Hypothetical (Embryo Selection), Part 2

The Tempest, William Shakespeare
MIRANDA: O, wonder!
How many goodly creatures are there here!
How beauteous mankind is!
O brave new world,
That has such people in it!

PROSPERO: 'Tis new to thee. 

Tuesday, September 04, 2018

More Khruangbin!

See post from 2016: Khruangbin
... Khruangbin plays a spellbinding twist on surf rock and soul that would be considered trite if it weren’t so appetizing. The band’s origins are delightfully nerdy: Speer bonded with the bassist Laura Lee and the drummer Donald Johnson over lo-fi digital rips of cassette tapes featuring obscure Thai funk bands from the sixties and seventies, which they’d downloaded from the cult blog Monrakplengthai. The cassette tapes catalogue stray releases from Thai musicians who, influenced by imported rock records from bands like Santana and the Shadows, blended the misty, coiling guitars of foundational rock and roll with the melodic sensibilities of their own traditional folk tunes. Many of the songs were used in Bollywood films, reaching wide swaths of Southeast Asian audiences. “Shadow music,” as the sound came to be called, is exhilarating in part for its traceable roots.

Monday, September 03, 2018

PanOpticon in my Pocket: 0.35GB/month of surveillance, no charge!

Your location is monitored roughly every 10 minutes, if not more often, thanks to your phone. There are multiple methods: GPS, Wi-Fi connections, cell-tower pings, or even Bluetooth. This data is stored forever and is available to certain people for analysis. Technically the data is anonymous, but it is easy to connect your geolocation data to your real world identity -- the data shows where you sleep at night (home address) and work during the day. It can be cross-referenced with cookies placed on your browser by ad networks, so your online activities (purchases, web browsing, social media) can be linked to your spatial-temporal movements.

Some quantities which can be easily calculated using this data: How many people visited a specific Toyota dealership last month? How many times did someone test drive a car? Who were those people who test drove a car? How many people stopped / started a typical 9-5 job commute pattern? (BLS only dreams of knowing this number.) What was the occupancy of a specific hotel or rental property last month? How many people were on the 1:30 PM flight from LAX to LaGuardia last Friday? Who were they? ...
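The first question on that list reduces to counting distinct devices inside a geofence during a time window. A minimal sketch, assuming a hypothetical ping format (device ID, timestamp, latitude, longitude) and a circular geofence; `Ping`, `haversine_m`, and `distinct_visitors` are illustrative names, the coordinates are made up, and real feeds and radius choices will differ:

```python
import math
from datetime import datetime
from typing import Iterable, NamedTuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


class Ping(NamedTuple):
    device_id: str  # hypothetical anonymised device identifier
    time: datetime
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def distinct_visitors(pings: Iterable[Ping], lat: float, lon: float,
                      radius_m: float, start: datetime, end: datetime) -> int:
    """Count distinct devices seen within radius_m of (lat, lon) during [start, end)."""
    seen = set()
    for p in pings:
        if start <= p.time < end and haversine_m(p.lat, p.lon, lat, lon) <= radius_m:
            seen.add(p.device_id)
    return len(seen)


# Toy data around a made-up location (not a real dealership).
SITE = (42.7000, -84.5000)
pings = [
    Ping("a", datetime(2018, 8, 3, 10, 0), 42.7000, -84.5000),   # inside
    Ping("a", datetime(2018, 8, 3, 10, 10), 42.7001, -84.5000),  # same device again
    Ping("b", datetime(2018, 8, 9, 14, 0), 42.7004, -84.5000),   # ~45 m away: inside
    Ping("c", datetime(2018, 8, 9, 14, 0), 42.7450, -84.5000),   # ~5 km away: outside
    Ping("d", datetime(2018, 9, 2, 11, 0), 42.7000, -84.5000),   # outside the month
]
print(distinct_visitors(pings, *SITE, radius_m=100,
                        start=datetime(2018, 8, 1), end=datetime(2018, 9, 1)))
```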

Of course, absolute numbers may be noisy, but diffs from month to month or year to year, with reasonable normalization / averaging, can yield insights at the micro, macro, and individual firm level.
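Here is what "reasonable normalization" might look like in its simplest form. A sketch with made-up numbers, assuming the data provider also reports how many devices were active in its panel each month: dividing visits by panel size before taking the month-over-month change keeps growth in the panel itself from masquerading as growth in foot traffic. The function names are illustrative.

```python
def visit_share(visits: int, active_devices: int) -> float:
    """Visits per active device in the data provider's panel."""
    return visits / active_devices


def pct_change(prev: float, curr: float) -> float:
    """Percentage change from prev to curr."""
    return (curr - prev) / prev * 100.0


# Hypothetical monthly figures for one location.
aug_visits, aug_panel = 1_200, 2_000_000
sep_visits, sep_panel = 1_260, 2_100_000

raw = pct_change(aug_visits, sep_visits)
normalized = pct_change(visit_share(aug_visits, aug_panel),
                        visit_share(sep_visits, sep_panel))
print(f"raw change: {raw:+.1f}%   normalized change: {normalized:+.1f}%")
```

Here the apparent 5% rise in visits is entirely explained by a 5% larger panel. Averaging over several months or several comparable sites would further damp noise; that step is not shown.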

If your quant team is not looking at this data, it should be ;-)

Google Data Collection
Professor Douglas C. Schmidt, Vanderbilt University
August 15, 2018

... Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour. In fact, location information constituted 35% of all the data samples sent to Google. In contrast, a similar experiment showed that on an iOS Apple device with Safari (where neither Android nor Chrome were used), Google could not collect any appreciable data (location or otherwise) in the absence of a user interaction with the device.

e. After a user starts interacting with an Android phone (e.g. moves around, visits webpages, uses apps), passive communications to Google server domains increase significantly, even in cases where the user did not use any prominent Google applications (i.e. no Google Search, no YouTube, no Gmail, and no Google Maps). This increase is driven largely by data activity from Google’s publisher and advertiser products (e.g. Google Analytics, DoubleClick, AdWords). Such data constituted 46% of all requests to Google servers from the Android phone. Google collected location at a 1.4x higher rate compared to the stationary phone experiment with no user interaction. Magnitude wise, Google’s servers communicated 11.6 MB of data per day (or 0.35 GB/month) with the Android device. This experiment suggests that even if a user does not interact with any key Google applications, Google is still able to collect considerable information through its advertiser and publisher products.

f. While using an iOS device, if a user decides to forgo the use of any Google product (i.e. no Android, no Chrome, no Google applications), and visits only non-Google webpages, the number of times data is communicated to Google servers still remains surprisingly high. This communication is driven purely by advertiser/publisher services. The number of times such Google services are called from an iOS device is similar to an Android device. In this experiment, the total magnitude of data communicated to Google servers from an iOS device is found to be approximately half of that from the Android device.

g. Advertising identifiers (which are purportedly “user anonymous” and collect activity data on apps and 3rd-party webpage visits) can get connected with a user’s Google identity. This happens via passing of device-level identification information to Google servers by an Android device. Likewise, the DoubleClick cookie ID (which tracks a user’s activity on the 3rd-party webpages) is another purportedly “user anonymous” identifier that Google can connect to a user’s Google Account if a user accesses a Google application in the same browser in which a 3rd-party webpage was previously accessed. Overall, our findings indicate that Google has the ability to connect the anonymous data collected through passive means with the personal information of the user.
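The report's headline rates are easy to check, and they line up with the "every 10 minutes, if not more often" estimate at the top of this post. A quick sketch, assuming a 30-day month and decimal units (1 GB = 1000 MB); the constant names are illustrative:

```python
LOCATION_SENDS_PER_DAY = 340  # stationary Android phone, from the report
MB_PER_DAY = 11.6             # Android phone in active use, from the report
DAYS_PER_MONTH = 30           # assumption
MB_PER_GB = 1000              # assumption: decimal units

sends_per_hour = LOCATION_SENDS_PER_DAY / 24
minutes_between_sends = 24 * 60 / LOCATION_SENDS_PER_DAY
gb_per_month = MB_PER_DAY * DAYS_PER_MONTH / MB_PER_GB

print(f"{sends_per_hour:.1f} location sends/hour "
      f"(one every {minutes_between_sends:.1f} min)")
print(f"{gb_per_month:.3f} GB/month")
```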

Tuesday, August 28, 2018

Scientists of Stature

The link below is to the published version of the paper we posted on biorxiv in late 2017 (see blog discussion). Our results have since been replicated by several groups in academia and in Silicon Valley.

Biorxiv article metrics: abstract views 31k, paper downloads 6k. Not bad! Perhaps that means the community understands now that genomic prediction of complex traits is a reality, given enough data.

Had we taken a poll on the eve of releasing our biorxiv article, I suspect 90+ percent of genomics researchers would have said that ~1 inch accuracy in predicted human height from genotype alone was impossible.

Since our article appeared, interesting results for complex phenotypes such as educational attainment, heart disease, diabetes, and other disease risks have been obtained.
Accurate Genomic Prediction Of Human Height

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos and Stephen D. H. Hsu

GENETICS Early online August 27, 2018; https://doi.org/10.1534/genetics.118.301267

We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9 percent of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from Genome-Wide Complex Trait Analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier Genome-Wide Association Studies (GWAS) for out-of-sample validation of our results.
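The two headline numbers for height are consistent with each other: the variance explained by a linear fit of actual values on predicted values is the squared correlation, r². A quick check, which also gives the correlations implied by the variance fractions quoted for the other two traits:

```python
import math

# Fraction of variance explained in held-out data, from the abstract.
variance_explained = {"height": 0.40, "heel bone density": 0.20,
                      "educational attainment": 0.09}

for trait, r2 in variance_explained.items():
    print(f"{trait:>22}: implied correlation ~ {math.sqrt(r2):.2f}")

# Conversely, the quoted height correlation of ~0.65:
print(f"0.65 ** 2 = {0.65 ** 2:.2f}")  # ~0.42, i.e. roughly 40% of variance
```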
The published version of the paper contains several new analyses in response to reviewer comments.

We added detailed comparisons between the top SNPs activated in our predictor and earlier GIANT GWAS hits. We analyze the correlation structure of L1-activated SNPs -- the algorithm (as expected) automatically selects variants which are mostly decorrelated (statistically independent) from each other.
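For readers unfamiliar with L1 optimization: an L1 (LASSO) penalty drives most coefficients exactly to zero, so only a sparse set of SNPs is "activated." The sketch below is a generic LASSO solved by cyclic coordinate descent on synthetic genotypes, not the authors' code or pipeline; the sample sizes, number of causal variants, effect sizes, and penalty `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    return float(np.sign(x)) * max(abs(x) - lam, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             n_sweeps: int = 200, tol: float = 1e-6) -> np.ndarray:
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    X = np.asfortranarray(X)                # contiguous columns for fast X[:, j]
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()          # y - X @ beta, with beta = 0
    col_sq = (X ** 2).sum(axis=0) / n       # (1/n)||X_j||^2 for each column
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue                    # constant column carries no signal
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)   # keep residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Synthetic "genotypes": n individuals, p SNPs (p > n), k causal variants.
rng = np.random.default_rng(0)
n, p, k = 500, 1000, 10
maf = rng.uniform(0.05, 0.5, size=p)                 # minor allele frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)  # 0/1/2 allele counts
sd = G.std(axis=0)
sd[sd == 0] = 1.0                                    # avoid dividing by zero
X = (G - G.mean(axis=0)) / sd                        # standardise each SNP

causal = rng.choice(p, size=k, replace=False)
beta_true = np.zeros(p)
beta_true[causal] = rng.choice([-1.0, 1.0], size=k)
y = X @ beta_true + rng.normal(size=n)
y -= y.mean()                                        # no intercept needed

beta_hat = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(beta_hat)
print(f"{selected.size} of {p} SNPs activated; "
      f"{np.intersect1d(selected, causal).size} of {k} causal SNPs recovered")
```

Raising `lam` activates fewer SNPs; in practice the penalty would be tuned on held-out data.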

We compare our L1 method to simpler algorithms, such as windowing: choose a genomic window size (e.g., 200k bp) and use only the SNP in each window which accounts for the most variance. This does not work as well as L1 optimization, but can produce a respectable predictor.
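The windowing baseline described above is simple enough to sketch directly. A minimal version, assuming SNP positions on a single chromosome (real data would key windows by chromosome as well), no monomorphic SNPs, and "accounts for the most variance" meaning the largest squared marginal correlation with the phenotype. The function name `window_select` is hypothetical, and the toy genotypes are Gaussian stand-ins for standardized SNPs.

```python
import numpy as np


def window_select(positions: np.ndarray, X: np.ndarray, y: np.ndarray,
                  window_bp: int = 200_000) -> list[int]:
    """For each window of window_bp base pairs, return the index of the SNP
    whose squared marginal correlation with y (variance explained) is largest."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r2 = (Xc.T @ yc) ** 2 / ((Xc ** 2).sum(axis=0) * (yc @ yc))
    best: dict[int, int] = {}
    for j, w in enumerate((positions // window_bp).tolist()):
        if w not in best or r2[j] > r2[best[w]]:
            best[w] = j
    return sorted(best.values())


# Toy example: 5 SNPs falling in windows 0, 0, 1, 1, 3;
# the phenotype is driven by SNPs 1 and 2.
rng = np.random.default_rng(0)
positions = np.array([10_000, 150_000, 250_000, 390_000, 610_000])
X = rng.normal(size=(200, 5))
y = X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=200)
print(window_select(positions, X, y))
```

Unlike the L1 fit, this picks each SNP from its marginal association alone, so it cannot account for several nearby causal variants in one window.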

We investigate the correlation structure of height-associated SNPs: to what extent can the best linear combination of GIANT GWAS-significant SNPs predict the state of one of the predictor SNPs? This raises the interesting question: how much total information (entropy) is in the human genome?
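The question in the last paragraph can be posed as an ordinary least-squares fit: regress the genotype of one predictor SNP on a set of other SNPs and report the R². A sketch on synthetic genotypes, assuming a hypothetical function name and an artificial "linked" SNP built by copying another SNP and re-drawing about 10% of its values (a crude stand-in for linkage disequilibrium):

```python
import numpy as np


def linear_predictability(target: np.ndarray, others: np.ndarray) -> float:
    """R^2 of the least-squares fit of one SNP's genotypes (0/1/2) on other SNPs.
    Assumes the target SNP is not constant."""
    A = np.column_stack([np.ones(len(target)), others])   # intercept + SNPs
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    tss = ((target - target.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / tss


rng = np.random.default_rng(1)
n = 1000
others = rng.binomial(2, 0.3, size=(n, 5)).astype(float)

linked = others[:, 0].copy()                 # starts as an exact copy of SNP 0
redraw = rng.random(n) < 0.1                 # re-draw ~10% of entries
linked[redraw] = rng.binomial(2, 0.3, size=int(redraw.sum()))

unlinked = rng.binomial(2, 0.3, size=n).astype(float)

print(f"linked SNP:   R^2 = {linear_predictability(linked, others):.2f}")
print(f"unlinked SNP: R^2 = {linear_predictability(unlinked, others):.2f}")
```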

Friday, August 24, 2018

Death from the Sky: Drone Assassination

This is a ~$1000 drone, max velocity ~70kph (~45mph), flight time ~30min, controller range ~5km. It's only 1 kilo -- so payload is limited. It is optimized for photography, not for speed or range or payload. But it gives you an idea of what is possible at the same cost as, say, a couple of cheap AR15s... A real hobbyist could construct something cheaper, faster, with bigger payload. But this you can buy with one click ready to go.

It's never been easier for a bad guy to deliver an explosive charge (e.g., fraction of a kilo) to a target from a mile away. Operating a drone like this takes almost no training.

Defeating two of them coming from different directions, staggered by a few seconds, would be extremely hard even for an active security detail. Follow the target in their car and detonate the drone near the gas tank when the car stops at an intersection. Or have the drone waiting near the intersection if you know the route in advance.

If your target is commercial aviation, hit a 747 near its fuel tank as it waits to take off. A sitting duck, and no fooling around with military gear like MANPADs -- remember, you can be a mile or more away from the airport, sitting on your hotel room balcony, or in your car ready to hit the freeway.

Will this ever happen? Thank goodness terrorists tend to be incompetent... But 9/11 was a good example of what can happen when they are not.

See also Assassination by Drone.

Tuesday, August 21, 2018

MSU New Faculty Welcome 2018

This is my welcome message to new MSU faculty and staff, presented at the 2018 New Faculty and Staff Orientation lunch.
Good afternoon and Welcome!

We are so pleased that you are here at Michigan State University. You have joined a leading research university, at a very exciting time.

I usually don’t give lengthy remarks, but since I’m not standing between you and lunch, and because so many exciting things are happening on campus, I could not resist giving something of an overview today.

With increased funding, new infrastructure, and an aggressive hiring initiative, we are positioning MSU research, and the university, for continued success. This success is built on a rich research history spanning many decades and disciplines.

In an MSU lab in 1965, Barnett Rosenberg and his team discovered that cisplatin prevents the DNA in cancer cells from replicating. Cisplatin is now a widely used chemotherapy medication. His “ah-ha” moment led to further research, but not without difficulties. The team initially failed to replicate their first results. But they worked extremely hard to resolve the issue, and subsequently had the drug through trials and approved in record time. It’s this kind of Spartan tenacity and effectiveness that we should all emulate.

Two weeks ago we celebrated the 40th anniversary of the FDA approval of cisplatin, a therapy that is still considered the gold standard to which most new cancer treatments are compared. This discovery not only continues to help those afflicted with cancer, but the resulting royalties also fuel new research and discovery in the form of internal grants and other investments from the MSU Foundation.

My sincere hope is that one of you someday discovers the next cisplatin, or makes scholarly advances of equal importance.

Our team at the MSU Innovation Center is ready to assist faculty and student entrepreneurs with the next “big idea”. They steward more than 150 discoveries annually into a pipeline of patents, products and startup businesses. In 2017, this productivity resulted in 75 license and option agreements with companies around the world, as well as $2.4M in royalties being distributed to our faculty and departments. Applied research helps to build a diversified economy and brings jobs to Michigan and beyond. It is an increasingly important part of university activity.

I’d like to give you a bit of context for the size and scope of the research enterprise here at MSU.

MSU research continues on an upward growth trajectory. For 2017, total research expenditures were about $700M. This is a number reported each year to NSF for their Higher Education Research and Development (HERD) report. Only 5 years ago our number was closer to $500M, so this represents significant growth.

Based on the HERD comparison data, MSU ranks 1st in the Big Ten and 2nd in the nation in combined Department of Energy and National Science Foundation research expenditures.

We expect to continue our leadership in DOE and NSF funding, in part due to the Facility for Rare Isotope Beams, but also due to our work with the Plant Research Laboratory, the Great Lakes Bioenergy Research Lab and other interdisciplinary and multi-institutional research projects.

Our strategic plan outlines a number of new initiatives that leverage our current strengths and/or build new capacity to expand our portfolio, increase our competitiveness, and ultimately solve many of tomorrow’s pressing problems.

As I mentioned, MSU is home to the Facility for Rare Isotope Beams. FRIB will be a scientific user facility for the Office of Nuclear Physics in the Office of Science of the U.S. Department of Energy.

FRIB will be operational in 2021 and will deliver the highest intensity beams of rare isotopes available anywhere in the world. Estimates of the total investment in this project are roughly $1 billion, a huge milestone for MSU. Operated by MSU, FRIB will enable scientists to make discoveries about the properties of rare isotopes (which are unusual forms of the elements) in order to better understand the physics of nuclei, nuclear astrophysics, and the fundamental interactions of nature. It will also produce practical applications for society, including in medicine, homeland security, and industry.

Last weekend, FRIB held a public open house attracting some 3000 people. If you didn’t have a chance to visit, you will get a glimpse this Thursday at the new faculty research orientation. I hope to see many of you there.

But new infrastructure doesn’t stop with FRIB, and I’m sure you’ve noticed all the construction on campus.

Two years ago, we opened the new BioEngineering building, which houses the Institute for Quantitative Health Science and Engineering, colloquially referred to as “IQ”. This collaboration of the colleges of Engineering, Human Medicine and Natural Science will apply quantitative methods to biomedicine and life science in an interdisciplinary setting. IQ’s researchers will develop new medical tools and treatments that will advance biomedicine in creative ways. We hope it will fundamentally change the way healthcare is delivered.

We’re already far along in construction of another, larger building next to IQ that will house precision health researchers and several other new initiatives. This building, along with IQ and Radiology, will create an entire area of campus dedicated to biomedical research.

Last year, we opened a new health research facility in Grand Rapids to complement our medical school there. Researchers in Grand Rapids, and our East Lansing biomedical neighborhood, will make discoveries in health science, and attract additional funding to expedite our growth trajectory. Our performance in NIH funding lags the stellar results I mentioned concerning NSF and DOE, but the investments listed above are meant to improve this situation. In addition, I should mention that for the first time MSU will have a research hospital on our campus, through a partnership with McLaren. MSU research integration with major health systems in Michigan has never been stronger, and we anticipate announcement of major collaborative efforts in the near future.

On August 31, we will break ground on a new STEM education building. New laboratory teaching and research spaces will support MSU’s increasing student enrollment in STEM fields. We look forward to the opportunities this new facility will create for both our students and faculty.

In June, construction began on a new music pavilion. This state-of-the-art facility will incorporate highly advanced acoustical engineering to create high-quality teaching, practice, rehearsal and research spaces that meet the needs of 21st century musicians. This addition further elevates our reputation in the arts, with a particular focus on student learning.

MSU will continue to invest in infrastructure improvements to support our faculty and students, increase our competitiveness, and to attract top recruits like yourselves to the university.

Another recent development is a new department called Computational Mathematics, Science, and Engineering or CMSE. This department was planned, authorized, and operational in only three years—quite a feat in academia. I often compare “startup time” (the fast pace at which things are accomplished in Silicon Valley) to “academic time” (i.e., nothing gets done, other than committee meetings, or a no-brainer project takes a decade to complete), but with CMSE this was a case of something on campus getting done in startup time. CMSE is one of very few such departments in the country -- it is focused on data science, machine learning, advanced computation and related applications, but is not a traditional CS department. It supports many of the new efforts on campus that require the analysis of large data sets and development of new tools and algorithms. Researchers in this department utilize datasets drawn from areas such as astrophysics, business analytics, mobile data, materials science, human and plant genomics, and many other areas. The department was conceived as fundamentally interdisciplinary -- bringing together experts in computation with subject matter experts in fields of science which are becoming increasingly reliant on data.

I can’t help mentioning a couple of big data examples related to my own research interests: we’ve created a compute resource with 500k human genomes from the UK Biobank, which is open to interested investigators on campus. All of the data is stored at our High Performance Computing Center (HPCC). Using this data, our collaboration demonstrated for the first time that machine learning applied to large genomic datasets could produce accurate predictors for complex human traits. We can now predict adult human height from the genome alone, with accuracy of roughly 1 inch. The predictor uses ~20k genetic variants distributed throughout the genome. Predictors of complex disease risk, for conditions such as heart disease, diabetes, low blood platelet count, and breast cancer, have been developed and replicated in out-of-sample tests. See the NYTimes science section from just a few days ago. This is only the beginning for genomics-informed Precision Medicine.
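For readers outside genomics: a predictor of this kind is, at bottom, a weighted sum over variants. The minimal Python sketch below assumes purely additive effects; the function name, toy genotypes, effect sizes, and intercept are hypothetical illustrations, not the actual height predictor.

```python
# Minimal sketch of a linear polygenic predictor, assuming additive effects.
# Genotypes are coded as allele counts (0, 1, or 2) per variant. All numbers
# below are hypothetical placeholders, not values from any real predictor.
import numpy as np

def polygenic_score(genotypes, effects, intercept=0.0):
    """Return one predicted value per individual.

    genotypes: (n_individuals, n_variants) array of allele counts in {0, 1, 2}
    effects:   (n_variants,) array of per-allele effect sizes
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effects = np.asarray(effects, dtype=float)
    if genotypes.shape[1] != effects.shape[0]:
        raise ValueError("number of variants must match number of effect sizes")
    return intercept + genotypes @ effects

# Toy example: 3 individuals, 4 variants (a real height predictor uses ~20k).
G = np.array([[0, 1, 2, 1],
              [2, 2, 0, 0],
              [1, 0, 1, 2]])
beta = np.array([0.10, -0.05, 0.20, 0.02])  # hypothetical effects, inches per allele
print(polygenic_score(G, beta, intercept=67.0))
```

The same linear form applies whether the weights come from penalized regression on individual-level data or from summary statistics; only the way the weights are estimated differs.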

Over the summer, through a CEO friend in Silicon Valley, I obtained access for MSU researchers to mobile geolocation data covering the movements of over 30 million Americans. Yes, geolocation coordinates every 10 minutes or so for 30 million people, via their smartphones. I hope you all were aware of this when you clicked “I Accept” :-) If you can think of interesting research uses for this data, please contact Dirk Colbry in CMSE for more information.

The most important component of a university is not buildings, or even laboratory or compute or data infrastructure. The most important resource is people -- talented research faculty, postdocs, students, and support staff.

Some of you joining us today may have been hired under the Global Impact Initiative (GI2). Launched in 2014, the goal of GI2 is to hire 100 new faculty whose research has breakthrough potential to shape the future. Over the last four years, we’ve recruited new faculty with a focus on key areas of innovation, such as machine learning, precision medicine, computational genomics, autonomous vehicles, advanced materials, gene-editing, and advanced plant science. Nearly 80 positions have been filled, with candidates hired from Harvard, Stanford, Princeton, MIT, Johns Hopkins University, Lawrence Berkeley National Lab, Los Alamos National Lab, and many other top institutions. But we're not done yet. We look forward, with enthusiasm, to the next year of recruiting.

Working here, you will be surrounded by world-class faculty, including members of the National Academy; Guggenheim, Packard, and Sloan Fellows; a recipient of the Stockholm Water Prize; Pulitzer Prize winners; and many more.

In 2018 alone, faculty at MSU received a record 11 NSF CAREER awards across a number of disciplines including engineering, communication arts and sciences, physics and astronomy, plant science, and others. This speaks volumes about the caliber of our young faculty, and is one reason why I’m looking forward to seeing their progress.

As you begin your time here at MSU, we urge you to think big and act boldly. If you are a new faculty member, still near the beginning of your career, we want to support your growth in every way possible. If you are a senior faculty member, we want to push your research program to that next higher level of impact. And, we hope that you can provide valuable mentorship to younger scholars around you.

If there is a problem -- tell us about it! -- whether it has to do with grant submissions, or startup incubation, or child care, food options on campus, your functional or dysfunctional department. We’re here to fix things, and to provide the best possible environment for your teaching and research.

Only one in a thousand people in our society have the privilege to engage full time in discovery -- in curiosity driven research -- for the benefit of humankind. You are part of that lucky one in a thousand, and we are here to help you succeed.

The bar has been set very high, but with the resources and new opportunities here at MSU, your potential is limitless.

My very best wishes to you all :-)

Sunday, August 19, 2018

Genomic Prediction: A Hypothetical (Embryo Selection), Part 2

The figures below are from the recent paper Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations (Nature Genetics), discussed previously here.

As you can see, genomic prediction of risk allows us to identify outliers for conditions like heart disease and diabetes. Individuals in the top 1% of polygenic risk score are many times (approaching an order of magnitude) more likely to exhibit the condition than the typical person.

In an earlier post, Genomic Prediction: A Hypothetical (Embryo Selection), I pointed out a similar situation with regard to the SSGAC predictor for Educational Attainment. Negative outliers on that polygenic score (e.g., bottom 1%) are much more likely to have difficulty in school. I then posed this hypothetical:
You are an IVF physician advising parents who have exactly 2 viable embryos, ready for implantation.

The parents want to implant only one embryo.

All genetic and morphological information about the embryos suggest that they are both viable, healthy, and free of elevated disease risk.

However, embryo A has polygenic score (as in figure above) in the lowest quintile (elevated risk of struggling in school) while embryo B has polygenic score in the highest quintile (less than average risk of struggling in school). We could sharpen the question by assuming, e.g., that embryo A has score in the bottom 1% while embryo B is in the top 1%.

You have no other statistical or medical information to differentiate between the two embryos.

What do you tell the parents? Do you inform them about the polygenic score difference between the embryos?
We can pose the analogous hypothetical for the risk scores displayed below. Should the parents be informed if, for instance, one of the embryos is in the top 1% risk for heart disease or Type 2 Diabetes? Is there a difference between the case of the EA predictor and disease risk predictors?

In the case of monogenic (Mendelian) genetic risk, e.g., Tay-Sachs, Cystic Fibrosis, BRCA, etc., deliberate genetic screening is increasingly common, even if penetrance is imperfect (i.e., the probability of the condition given the presence of the risk variant is less than 100%).

Note, the risk ratio between top 1% and bottom 1% individuals is potentially very large (see below), although more careful analysis is probably required to understand this better.
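The more careful analysis can at least be sketched. Under a standard liability-threshold model (an assumption here, not the method of the paper), the risk in any slice of the score distribution follows from the normal CDF. The Python sketch below uses only the standard library; the inputs r = 0.4 and K = 5% are hypothetical, chosen for illustration, not estimates for any particular disease.

```python
# Liability-threshold sketch (an assumed model, not the analysis in the paper).
# liability = r*S + sqrt(1 - r^2)*E, with S (standardized polygenic score) and
# E (everything else) independent standard normals; disease iff liability > T,
# where T is set so that the population prevalence equals K. Requires 0 <= r < 1.
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal

def risk_given_score(s, r, K):
    """P(disease | S = s)."""
    T = STD.inv_cdf(1.0 - K)
    return 1.0 - STD.cdf((T - r * s) / sqrt(1.0 - r * r))

def mean_risk(lo_q, hi_q, r, K, steps=20_000):
    """Average risk among people whose score lies between quantiles lo_q and hi_q."""
    a = STD.inv_cdf(lo_q) if lo_q > 0 else -8.0  # truncate infinite tails at 8 SD
    b = STD.inv_cdf(hi_q) if hi_q < 1 else 8.0
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):  # midpoint rule for the integral of pdf * risk
        s = a + (i + 0.5) * h
        total += STD.pdf(s) * risk_given_score(s, r, K)
    return total * h / (hi_q - lo_q)

r, K = 0.4, 0.05  # hypothetical: score-liability correlation and prevalence
top = mean_risk(0.99, 1.0, r, K)
bottom = mean_risk(0.0, 0.01, r, K)
print(f"top 1% vs population: {top / K:.1f}x")
print(f"top 1% vs bottom 1%: {top / bottom:.0f}x")
```

The top-vs-bottom ratio is much larger than the top-vs-population ratio because risk in the bottom tail falls far below the population prevalence; the exact numbers depend entirely on the assumed r and K.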

These hypotheticals will not be hypothetical for very much longer: the future is here.

(CAD = coronary artery disease.)

Tuesday, August 14, 2018

Genomic Prediction of disease risk using polygenic scores (Nature Genetics)

It seems to me we are just at the tipping point -- soon it will be widely understood that with large enough data sets we can predict complex traits and complex disease risk from genotype, capturing most of the estimated heritable variance. People will forget that many "experts" doubted this was possible -- the term missing heritability will gradually disappear.

In just a few years genotyping will start to become "standard of care" in many health systems. In 5 years there will be ~100M genotypes in storage (vs ~20M now), a large fraction available for scientific analysis.
Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations (Nature Genetics)

A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation [1]. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature [2,3,4,5], it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk [6]. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
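Headline numbers like these are fractions of the population above a fixed risk multiple. Under an assumed liability-threshold model (an assumption here, not the paper's method), that fraction has a closed form. The score-liability correlations and the prevalence in the Python sketch below are hypothetical inputs, not values from the paper.

```python
# Closed-form sketch under an assumed liability-threshold model:
# liability = r*S + sqrt(1 - r^2)*E with S, E independent standard normals,
# and disease iff liability exceeds the threshold T that gives prevalence K.
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def fraction_above_fold_risk(r, K, fold=3.0):
    """Fraction of people whose score-conditional risk is >= fold * K.

    Assumes 0 <= r < 1 and fold > 1.
    """
    if fold <= 1.0 or not 0.0 < fold * K < 1.0:
        raise ValueError("need fold > 1 and 0 < fold * K < 1")
    if r <= 0.0:
        return 0.0  # an uninformative score leaves everyone at risk K
    T = STD.inv_cdf(1.0 - K)
    q = STD.inv_cdf(1.0 - fold * K)
    # risk(s) >= fold*K  <=>  (T - r*s) / sqrt(1 - r^2) <= q  <=>  s >= s_star
    s_star = (T - sqrt(1.0 - r * r) * q) / r
    return 1.0 - STD.cdf(s_star)

for r in (0.2, 0.3, 0.4, 0.5):  # hypothetical score-liability correlations
    print(r, round(fraction_above_fold_risk(r, K=0.05), 4))
```

The fraction grows quickly with r, which is one way to see why larger training sets, and hence more accurate scores, matter so much for this kind of stratification.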
See also Genomic Prediction: A Hypothetical (Embryo Selection) and Accurate Genomic Prediction Of Human Height.

From the paper:
Using much larger studies and improved algorithms, we set out to revisit the question of whether a GPS can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation. We studied five common diseases with major public health impact: CAD, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

For each of the diseases, we created several candidate GPSs based on summary statistics and imputation from recent large GWASs in participants of primarily European ancestry (Table 1). Specifically, we derived 24 predictors based on a pruning and thresholding method, and 7 additional predictors using the recently described LDPred algorithm [13] (Methods, Fig. 1 and Supplementary Tables 1–6). These scores were validated and tested within the UK Biobank, which has aggregated genotype data and extensive phenotypic information on 409,258 participants of British ancestry (average age: 57 years; 55% female) [14,15].

We used an initial validation dataset of the 120,280 participants in the UK Biobank phase 1 genotype data release to select the GPSs with the best performance, defined as the maximum area under the receiver-operator curve (AUC). We then assessed the performance in an independent testing dataset comprised of the 288,978 participants in the UK Biobank phase 2 genotype data release. For each disease, the discriminative capacity within the testing dataset was nearly identical to that observed in the validation dataset.
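The pruning-and-thresholding plus validate-then-test procedure described in the excerpt can be sketched in a few functions. This is a minimal Python illustration under stated assumptions, not the authors' code: candidate scores come from greedy LD clumping over a full pairwise r² matrix (real tools such as PLINK clump within physical windows), the threshold grids are hypothetical, and AUC uses the pairwise Mann-Whitney definition. All function names are illustrative.

```python
# Hypothetical sketch: build candidate pruning-and-thresholding (P+T) scores
# from GWAS summary statistics and keep the candidate with the highest AUC on
# a validation set.
import numpy as np

def prune_and_threshold(pvals, ld_r2, p_threshold, r2_threshold):
    """Greedy LD clumping plus a p-value cutoff.

    SNPs are visited from most to least significant; a SNP is kept if it passes
    p_threshold and has r^2 <= r2_threshold with every SNP already kept.
    """
    pvals, ld_r2 = np.asarray(pvals, float), np.asarray(ld_r2, float)
    kept = []
    for j in np.argsort(pvals):
        if pvals[j] > p_threshold:
            break  # every remaining SNP is less significant
        if all(ld_r2[j, k] <= r2_threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)

def auc(scores, labels):
    """P(random case outscores random control), counting ties as 1/2."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    if diff.size == 0:
        raise ValueError("need at least one case and one control")
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def best_pt_predictor(genotypes, labels, betas, pvals, ld_r2,
                      p_grid=(5e-8, 1e-5, 1e-3), r2_grid=(0.2, 0.8)):
    """Return (validation AUC, p cutoff, r^2 cutoff, kept SNP indices) of the best candidate."""
    genotypes, betas = np.asarray(genotypes, float), np.asarray(betas, float)
    best = None
    for p_t in p_grid:
        for r2_t in r2_grid:
            kept = prune_and_threshold(pvals, ld_r2, p_t, r2_t)
            if not kept:
                continue  # no SNP survives these cutoffs
            score = genotypes[:, kept] @ betas[kept]
            a = auc(score, labels)
            if best is None or a > best[0]:
                best = (a, p_t, r2_t, kept)
    return best
```

The selected SNP set and weights would then be applied, unchanged, to the held-out test data; matching AUCs on validation and test sets, as reported in the excerpt, indicate the selection step did not overfit.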

In the talk below @21:45 I discuss prospects for genomic prediction of disease risk.

Wednesday, August 08, 2018

Life and Fate, Before Sunset

This Hollywood oral history tells the story of Richard Linklater's "Before" Trilogy: Before Sunrise, Before Sunset, and Before Midnight. The films appeared 9 years apart, and tell the story of Jesse (Ethan Hawke) and Celine (Julie Delpy) in their 20s, 30s, and 40s. I find the second film to be the most interesting, really a masterpiece of filmmaking (I have a copy on the hard drive of the laptop I write this on :-). The events in Before Sunset take place in real time -- i.e., the story transpires over the run time of the movie, a single afternoon. Shooting it must have been extremely challenging for Delpy and Hawke, and for the crew.

The video above should start at 23:30; it explains how Linklater, Delpy, and Hawke came together to make the sequel. I think that reunion was, in some sense, the most contingent of the events that produced the trilogy. The first movie made very little money, and hence the idea of making a second, very different film -- about the complexity of life, the passage of time, lost chances -- was neither obvious nor inevitable.

The first movie is about a one night tryst between 20-something travelers, but the second movie takes place a decade later. The protagonists, while still young, have experienced more of life and the second film is richer and more complex, despite taking place over an even shorter period of time. I remember being excited to see it, not so much because of Before Sunrise (which I found entertaining, but not as special), but because of the intriguing premise of two lovers meeting again by chance after losing track of each other for so long.

Here's a scene from Before Sunset: a long take of walking and conversation in beautiful Paris, camera following Hawke and Delpy in a totally naturalistic way.

I hesitate to include this trailer because it's kind of cheesy, but if you're not familiar with the trilogy it explains the premise of the first two films.

The video below is a nice discussion of the trilogy. Just now I learned (thanks, AI!) that Before Sunrise is based on actual events in Linklater's life -- see here for the poignant story of the real life muse for these films.

Richard Linklater also directed Dazed and Confused -- one of the greatest high school movies ever made, and a beautiful evocation of adolescence in late-70s, early-80s America.

Saturday, August 04, 2018

Assassination by Drone

I have been waiting for this to happen:
Reuters: CARACAS - Drones loaded with explosives detonated close to a military event where Venezuelan President Nicolas Maduro was giving a speech on Saturday, but he and top government officials alongside him escaped unharmed from what Information Minister Jorge Rodriguez called an “attack” targeting the leftist leader. Seven National Guard soldiers were injured, Rodriguez added.
See this 2015 post on drone racing and ask yourself how you'd stop one of these drones from getting close to its target.

Countermeasures will be quite difficult, especially if drone operators use sophisticated frequency hopping control.

One doesn't even need pilot operators. The drones can be programmed to fly to a GPS coordinate using an evasive approach.

1. The exact coordinate can be marked by someone in the audience at a public appearance by the target.

2. It would be a formidable challenge even to stop some medium-sized drones, each with a payload of a few kilos, from flying through the windows of the Oval Office (known GPS coordinates; known presence of targets at specific times).

This is still Science Fiction, for now:

Twenty years ago I told a PhD student that a terrorist -- willing to die and able to fly an airplane -- could probably take out the White House. After 9/11 he reminded me that I had identified this hole in the system well in advance. It's the same thing here with small and medium size drones. They are accessible to non-state actors with limited resources, and very difficult to defeat, even for state security.

Barista Bots

Still think low-skill immigration is a good idea?

If you accept the thesis that automation is a threat to low-skill employment, then you should be willing to reconsider the long term cost-benefit analysis of low-skill immigration.

Thursday, August 02, 2018

Arnold: The Will to Power

I don't know whether Arnold ever read Nietzsche, but he certainly developed the Will to Power early in life. I quite like the video above -- I even made my kids watch it :-)

When I was in high school I came across his book Arnold: The Education of a Bodybuilder, a combination autobiography and training manual published in 1977. I found a copy in the remainders section of the bookstore and bought it for a few dollars. The most interesting part of the book is the description of his early life in Austria and his introduction to weightlifting and bodybuilding. I highly recommend it to anyone interested in golden age bodybuilding, the early development of physical training, or the psychology of human drive and high achievement. Young Arnold displays a kind of unbridled and un-ironic egoism that can no longer be expressed without shame in today's feminized society.

Chapter 2: Before long, people began looking at me as a special person. Partly this was the result of my own changing attitude about myself. I was growing, getting bigger, gaining confidence. I was given consideration I had never received before; it was as though I were the son of a millionaire. I'd walk into a room at school and my classmates would offer me food or ask if they could help me with my homework. Even my teachers treated me differently. Especially after I started winning trophies in the weight-lifting contests I entered.

This strange new attitude toward me had an incredible effect on my ego. It supplied me with something I had been craving. I'm not sure why I had this need for special attention. Perhaps it was because I had an older brother who'd received more than his share of attention from our father. Whatever the reason, I had a strong desire to be noticed, to be praised. I basked in this new flood of attention. I turned even negative responses to my own satisfaction.

I'm convinced most of the people I knew didn't really understand what I was doing at all. They looked at me as a novelty, a freak. ...

"Why did you have to pick the least-favorite sport in Austria?" they always asked. It was true. We had only twenty or thirty bodybuilders in the entire country. I couldn't come up with an answer. I didn't know. It had been instinctive. I had just fallen in love with it. I loved the feeling of the gym, of working out, of having muscles all over.

Now, looking back, I can analyze it more clearly. My total involvement had a lot to do with the discipline, the individualism, and the utter integrity of bodybuilding. But at the time it was a mystery even to me. Bodybuilding did have its rewards, but they were relatively small. I wasn't competing yet, so my gratification had to come from other areas. In the summer at the lake I could surprise everyone by showing up with a different body. They'd say, "Jesus, Arnold, you grew again. When are you going to stop?"

"Never," I'd tell them. We'd all laugh. They thought it amusing. But I meant it.


The strangest thing was how my new body struck girls. There were a certain number of girls who were knocked out by it and a certain number who found it repulsive. There was absolutely no in-between. It seemed cut and dried. I'd hear their comments in the hallway at lunchtime, on the street, or at the lake. "I don't like it. He's weird—all those muscles give me the creeps." Or, "I love the way Arnold looks—so big and powerful. It's like sculpture. That's how a man should look."

These reactions gave me added motivation to continue building my body. I wanted to get bigger so I could really impress the girls who liked it and upset the others even more. Not that girls were my main reason for training. Far from it. But they added incentive and I figured as long as I was getting this attention from them I might as well use it. I had fun. I could tell if a girl was repelled by my size. And when I'd catch her looking at me in disbelief, I would casually raise my arm, flex my bicep, and watch her cringe. It was always good for a laugh. ...
Arnold, age 17 or 18:

Friday, July 27, 2018

Insight Podcast: James Lee interview on SSGAC EA3

Spencer Wells and Razib Khan interview James Lee (Professor of Psychology, University of Minnesota, BA Berkeley, PhD Harvard) about the recent SSGAC EA3 GWAS.

Comment: James mentions that EA3 may be approaching the GCTA h2 limit (~0.15? so limiting r ~ 0.4) already. But the limit for actual cognitive ability is much higher; with enough data I think we could get to r ~ 0.6 or even r ~ 0.7 eventually for common SNPs -- similar to height.
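The arithmetic in the comment, assuming that a linear predictor's squared correlation with the trait cannot exceed the SNP heritability it draws on:

```latex
r_{\max} = \sqrt{h^2_{\mathrm{SNP}}} \approx \sqrt{0.15} \approx 0.39,
\qquad
r = 0.6 \;\Rightarrow\; r^2 = 0.36,
\qquad
r = 0.7 \;\Rightarrow\; r^2 = 0.49 .
```

So r ~ 0.6 to 0.7 would require common SNPs to account for roughly 36% to 49% of the variance in measured cognitive ability.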

United Club, HK International Airport

James, me, Chris Chang. (About $1M worth of Illumina HiSeqs in crates behind us?)

Wednesday, July 25, 2018

Genomic Prediction: A Hypothetical (Embryo Selection)

The new SSGAC EA3 paper in Nature Genetics contains the following figure.

Add Health (National Longitudinal Study of Adolescent to Adult Health) and HRS (Health and Retirement Study) are two longitudinal cohorts under study by social scientists. The horizontal axis is polygenic score (computed from DNA alone). It appears that individuals with top quintile polygenic scores are about 5 times more likely to complete college than bottom quintile individuals. (IIUC, the HRS cohort grew up in an earlier era when college attendance rates were lower; Add Health participants are younger.)
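A quintile comparison like the one in the figure is straightforward to compute from individual-level data. The Python sketch below groups people into five equal-size bins by polygenic score and reports the outcome rate in each; the synthetic scores and the logistic link used to generate outcomes are hypothetical, not the Add Health or HRS data.

```python
import numpy as np

def rate_by_quintile(scores, outcomes):
    """Fraction with outcome == 1 in each polygenic-score quintile (lowest first)."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    order = np.argsort(scores)
    groups = np.array_split(order, 5)  # five (near-)equal-size groups by score
    return [float(outcomes[g].mean()) for g in groups]

# Synthetic illustration only (hypothetical data, not a real cohort):
rng = np.random.default_rng(0)
score = rng.standard_normal(10_000)
p_college = 1 / (1 + np.exp(-(-1.0 + 0.8 * score)))  # hypothetical logistic link
college = rng.random(10_000) < p_college
rates = rate_by_quintile(score, college)
print([round(x, 2) for x in rates], "top/bottom ratio:", round(rates[-1] / rates[0], 1))
```

Comparing quintile rates, rather than reporting a single correlation, makes the practical meaning of a modest predictor visible: even a noisy score can separate groups whose outcome rates differ several-fold.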

Consider the following hypothetical:
You are an IVF physician advising parents who have exactly 2 viable embryos, ready for implantation. The parents want to implant only one embryo. 
All genetic and morphological information about the embryos suggest that they are both viable, healthy, and free of elevated disease risk.

However, embryo A has polygenic score (as in figure above) in the lowest quintile (elevated risk of struggling in school) while embryo B has polygenic score in the highest quintile (less than average risk of struggling in school). We could sharpen the question by assuming, e.g., that embryo A has score in the bottom 1% while embryo B is in the top 1%.

You have no other statistical or medical information to differentiate between the two embryos.

What do you tell the parents? Do you inform them about the polygenic score difference between the embryos?
Note, in the very near future this question will no longer be hypothetical...

See Nativity 2050 and The Future is Here: Genomic Prediction in MIT Technology Review.

Monday, July 23, 2018

SSGAC EA3: genomic prediction of educational attainment and related cognitive phenotypes

Years ago I predicted that:

1. Cognitive ability would turn out to be influenced by many thousands of genetic variants, each of small effect.

2. With large enough sample size we would detect these variants and eventually construct genomic predictors.

The Nature Genetics paper below from the SSGAC collaboration takes a significant step in that direction.

Although the study used over a million genotypes, the data had to be aggregated across many sub-cohorts using summary statistics only. This does not permit the L1-penalized optimization we used to build our height predictor.
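For context, L1-penalized regression (LASSO) fits all variant effects jointly, so it needs the individual-level genotype matrix, one row per person, rather than per-SNP summary statistics. The Python sketch below is a generic coordinate-descent LASSO on synthetic data; it is an illustration of the technique, not necessarily the solver used for the height predictor, and the data, penalty, and names are hypothetical.

```python
# Generic LASSO by cyclic coordinate descent on synthetic genotypes.
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; exactly zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 over b."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||x_j||^2 for each column
    resid = y.copy()                   # equals y - X @ b while b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the residual excluding b_j's contribution
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new_bj - b[j])  # keep resid = y - X @ b
            b[j] = new_bj
    return b

# Synthetic example: 200 people, 50 SNPs, only 3 with nonzero effects.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(200, 50)).astype(float)  # allele counts 0/1/2
X = (G - G.mean(axis=0)) / G.std(axis=0)               # standardize each SNP
true_b = np.zeros(50)
true_b[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = X @ true_b + 0.1 * rng.standard_normal(200)
b_hat = lasso_cd(X, y, lam=0.1)
print(np.flatnonzero(np.abs(b_hat) > 0.3))
```

Each coordinate update touches a full genotype column, which is exactly the information that meta-analysis across sub-cohorts discards when only summary statistics are shared.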

For out of sample validation of the results below, see this PNAS paper, which (unusually) appeared before the paper on which it is based.

The lead author James Lee is on the left below. Chris Chang, author of Plink 2.0, is on the right. The photo was taken in 2010 at BGI -- they are standing in front of crates of Illumina sequencers.

Article | Published: 23 July 2018

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

James J. Lee, Robbee Wedow, […] David Cesarini
Nature Genetics (2018)

Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
A nice figure from the paper: Add Health (National Longitudinal Study of Adolescent to Adult Health) and HRS (Health and Retirement Study) are two longitudinal cohorts that have been genotyped; the horizontal axis is polygenic score. It appears that individuals with top quintile polygenic scores are about 5 times more likely to complete college than bottom quintile individuals.

Here's a comment on the paper I provided to a journalist:
The EA3 predictor correlates about 0.35 with educational attainment, and slightly less well with measured cognitive ability. While this is far from perfect prediction, it does allow identification of individuals, using DNA alone, who are at unusual risk of being well below average in cognitive ability or struggling in school. Standardized tests, such as SAT, ACT, GRE, LSAT, etc., typically also correlate roughly 0.35 with educational outcomes like grade point average, degree completion, etc. In this sense, the genomic predictor is comparable to widely used tests and it will certainly improve as more data are analyzed. See figure.
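Squaring a correlation gives the fraction of variance explained, which ties the 0.35 figure to the range quoted in the abstract:

```latex
r = 0.35 \;\Rightarrow\; r^2 = 0.35^2 \approx 0.12 ,
```

i.e., roughly 12% of the variance in educational attainment, within the 11–13% reported by the SSGAC paper.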

Sunday, July 22, 2018


The 36th Annual International Symposium on Lattice Field Theory begins tomorrow, hosted by MSU. My opening remarks are below. No peeking if you are an attendee!
LATTICE 2018 Opening Remarks 7/23/2018

Good morning. I’d like to extend my warmest welcome to all of you on behalf of Michigan State University. We are very pleased and honored to be the hosts for The 36th Annual International Symposium on Lattice Field Theory.

It is my opinion that even within Physics, and even within Theoretical Physics, Lattice Field Theory is underappreciated. The idea that we can constructively realize quantum field theories in silico, that we can perform precision calculations in the deepest models of fundamental physics, is really incredible. It has taken many decades to get to this point: to master strongly coupled quantum fluctuations, spacetime trajectories of quantum fields like quarks and gluons, advanced algorithms and hardware designs, matching to effective field theories, and many other conceptually beautiful but ultimately concrete things.

Along with some recent AI advances like AlphaGo, the precise ab initio calculation of physical quantities in lattice QCD must be considered among the most impressive computations performed by the human species. If some Alien visitors were evaluating the accomplishments of our civilization, I would want them to take into account the work of people here today.

I first became aware of lattice gauge theory from John Preskill’s lecture notes for Physics 234, a year-long Caltech course on advanced topics in QCD. I never imagined, back in the 1980s, the successes that all of you have achieved today. The important message to young people is that one should not be dissuaded from attempting difficult projects.

At MSU we made the decision a few years ago to invest in lattice physics. We went from no lattice researchers, to one of the larger groups in the US. One of the drivers for this decision was the hope that lattice simulations would one day connect QCD to the experimental results coming from FRIB -- the MSU / DOE Facility for Rare Isotope Beams. Today we can compute, from first principles, the properties of light hadrons. In the coming decades, I believe we will compute real time scattering amplitudes and nuclear forces from QCD itself.

DOE and MSU are investing, all told, roughly a billion dollars in FRIB. While it is the Experimentalists who build and run the machine, and deserve the main credit, we as Theorists have the responsibility to ensure that the results of the experiment inform our deeper understanding of nuclear physics and QCD. Physicists are not stamp collectors -- we do not measure things just to measure them. We measure things which are important and have deep implications.

To reach the long awaited goal of connecting nuclear physics directly to QCD, we depend on the lattice community, on all of you. May the next 30 years see as much progress as the last.

Thank you very much.

Action photos!

London Calling

On my way home from ICML in Stockholm I stopped in London to see my friend Dominic Cummings, give a talk at ASI Data Science, and have some Oligarch meetings. Sorry I can't share more details.

Here are some photos from the British Museum.

Bodhisattva: a person who is able to reach nirvana but delays doing so out of compassion in order to save suffering beings.
“Tenfold be your damnation,” he said. “There shall be no rebirth.”

His hands came open then. A tall, nobly proportioned man lay upon the floor at his feet, his head resting upon his right shoulder.

His eye had finally closed.

Yama turned the corpse with the toe of his boot. “Build a pyre and burn this body,” he said to the monks, not turning toward them. “Spare none of the rites. One of the highest has died this day.”

Lord of Light, Roger Zelazny.

Tuesday, July 17, 2018

ICML notes

It's never been a better time to work on AI/ML. Vast resources are being deployed in this direction, by corporations and governments alike. In addition to the marvelous practical applications in development, a theoretical understanding of Deep Learning may emerge in the next few years.

The notes below are to keep track of some interesting things I encountered at the meeting.

Some ML learning resources:

Depth First study of AlphaGo

I heard a more polished version of this talk by Elad at the Theory of Deep Learning workshop. He is trying to connect results in sparse learning (e.g., performance guarantees for L1 or threshold algos) to Deep Learning. (Video is from UCLA IPAM.)

It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.
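The parenthetical guess "ReLU = thresholded penalization?" has a simple concrete form: a ReLU unit with bias -lam computes max(x - lam, 0), which is the soft-thresholding (L1 proximal) step restricted to nonnegative outputs. The Python sketch below checks this numerically; it illustrates the identity only and is not a model of Elad's analysis.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: the L1 'shrink toward zero' step."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def relu_with_bias(x, lam):
    """A ReLU unit with bias -lam."""
    return np.maximum(x - lam, 0.0)

x = np.linspace(-2, 2, 9)
lam = 0.5
print(soft_threshold(x, lam))
print(relu_with_bias(x, lam))
# The two agree wherever x >= 0; the ReLU additionally zeroes every negative
# input, i.e. it is soft thresholding with a nonnegativity constraint.
```

In this reading, a layer of ReLUs applied to a linear filter bank performs one thresholding step of a sparse-coding algorithm, which is one way to connect sparse recovery guarantees to layered networks.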

Off-topic: At dinner with one of my former students and his colleague (both researchers at an AI lab in Germany), the subject of Finitism came up due to a throwaway remark about the Continuum Hypothesis.

Horizons of Truth
Chaitin on Physics and Mathematics

David Deutsch:
The reason why we find it possible to construct, say, electronic calculators, and indeed why we can perform mental arithmetic, cannot be found in mathematics or logic. The reason is that the laws of physics "happen" to permit the existence of physical models for the operations of arithmetic such as addition, subtraction and multiplication.
My perspective: We experience the physical world directly, so the highest-confidence belief we have is in its reality. Mathematics is an invention of our brains, and cannot help but be inspired by the objects we find in the physical world. Our idealizations (such as "infinity") may or may not be well-founded. In fact, mathematics with infinity included may be very sick, as evidenced by Gödel's results, or paradoxes in set theory. There is no reason that infinity is needed (as far as we know) to do physics. It is entirely possible that there are only a (large but) finite number of degrees of freedom in the physical universe.

Paul Cohen:
I will ascribe to Skolem a view, not explicitly stated by him, that there is a reality to mathematics, but axioms cannot describe it. Indeed one goes further and says that there is no reason to think that any axiom system can adequately describe it.
This "it" (mathematics) that Cohen describes may be the set of idealizations constructed by our brains extrapolating from physical reality. But there is no guarantee that these idealizations have a strong kind of internal consistency and indeed they cannot be adequately described by any axiom system.

Monday, July 09, 2018

Game Over: Genomic Prediction of Social Mobility

[ NOTE: The PNAS paper discussed below uses the SSGAC EA3 genomic predictor, trained on over a million genomes. The EA3 paper has now appeared in Nature Genetics. ]

The figure below shows SNP-based polygenic score and life outcome (socioeconomic index, on vertical axis) in four longitudinal cohorts, one from New Zealand (Dunedin) and three from the US. Each cohort (varying somewhat in size) has thousands of individuals, ~20k in total (all of European ancestry). The points displayed are averages over bins of 10 or 50 individuals. For each cohort, the individuals have been grouped by childhood (family) socioeconomic status. Social mobility can be predicted from polygenic score. Note that higher SES families tend to have higher polygenic scores on average -- which is what one might expect from a society that is at least somewhat meritocratic. The cohorts have not been used in training -- this is true out-of-sample validation. Furthermore, the four cohorts represent different geographic regions (even, different continents) and individuals born in different decades.

Everyone should stop for a moment and think carefully about the implications of the paragraph above and the figure below.

Caption from the PNAS paper.
Fig. 4. Education polygenic score associations with social attainment for Add Health Study, WLS, Dunedin Study, and HRS participants with low-, middle-, and high-socioeconomic status (SES) social origins. The figure plots polygenic score associations with socioeconomic attainment for Add Health Study (A), Dunedin Study (B), WLS (C), and HRS (D) participants who grew up in low-, middle-, and high-SES households. For the figure, low-, middle-, and high-SES households were defined as the bottom quartile, middle 50%, and top quartile of the social origins score distributions for the Add Health Study, WLS, and HRS. For the Dunedin Study, low SES was defined as a childhood NZSEI of two or lower (20% of the sample), middle SES was defined as childhood NZSEI of three to four (63% of the sample), and high SES was defined as childhood NZSEI of five or six (17% of the sample). Attainment is graphed in terms of socioeconomic index scores for the Add Health Study, Dunedin Study, and WLS and in terms of household wealth in the HRS. Add Health Study and WLS socioeconomic index scores were calculated from Hauser and Warren (34) occupational income and occupational education scores. Dunedin Study socioeconomic index scores were calculated similarly, according to the Statistics New Zealand NZSEI (38). HRS household wealth was measured from structured interviews about assets. All measures were z-transformed to have mean = 0, SD = 1 for analysis. The individual graphs show binned scatterplots in which each plotted point reflects average x and y coordinates for a bin of 50 participants for the Add Health Study, WLS, and HRS and for a bin of 10 participants for the Dunedin Study. The red regression lines are plotted from the raw data. The box-and-whisker plots at the bottom of the graphs show the distribution of the education polygenic score for each childhood SES category.
The blue diamond in the middle of the box shows the median; the box shows the interquartile range; and the whiskers show lower and upper bounds defined by the 25th percentile minus 1.5× the interquartile range and the 75th percentile plus 1.5× the interquartile range, respectively. The vertical line intersecting the x axis shows the cohort average polygenic score. The figure illustrates three findings observed consistently across cohorts: (i) participants who grew up in higher-SES households tended to have higher socioeconomic attainment independent of their genetics compared with peers who grew up in lower-SES households; (ii) participants’ polygenic scores were correlated with their social origins such that those who grew up in higher-SES households tended to have higher polygenic scores compared with peers who grew up in lower-SES households; (iii) participants with higher polygenic scores tended to achieve higher levels of attainment across strata of social origins, including those born into low-SES families.
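For readers who want to see what the caption's processing amounts to, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical; bins are assumed to be formed by sorting participants on polygenic score and cutting consecutive groups of a fixed size (the caption gives bin sizes, not how bins are formed); the quartile grouping is the Add Health / WLS / HRS rule (Dunedin's NZSEI cutoffs are not implemented); and quartiles use the Python `statistics` default method, which the paper does not specify.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def z_transform(values: Sequence[float]) -> List[float]:
    """Rescale a measure to mean 0, SD 1 (sample SD), as the caption does for all measures."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def ses_group(origins: Sequence[float]) -> List[str]:
    """Bottom quartile of social origins = low, middle 50% = middle, top quartile = high.
    Uses the default 'exclusive' quantile method (an assumption)."""
    q1, _, q3 = statistics.quantiles(origins, n=4)
    return ["low" if o <= q1 else "high" if o > q3 else "middle" for o in origins]


def binned_points(x: Sequence[float], y: Sequence[float],
                  bin_size: int) -> List[Tuple[float, float]]:
    """Sort participants by x (polygenic score), cut into consecutive bins of bin_size,
    and return the mean (x, y) of each bin. The last bin may hold fewer participants."""
    pairs = sorted(zip(x, y))
    points = []
    for start in range(0, len(pairs), bin_size):
        chunk = pairs[start:start + bin_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        points.append((statistics.fmean(xs), statistics.fmean(ys)))
    return points


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median, interquartile box, and whisker bounds at Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {"median": median, "q1": q1, "q3": q3,
            "lower": q1 - 1.5 * iqr, "upper": q3 + 1.5 * iqr}


if __name__ == "__main__":
    scores = list(range(1, 9))
    print(ses_group(scores))
    print(binned_points([4, 1, 3, 2], [40, 10, 30, 20], bin_size=2))
    print(box_summary(scores))
```

In the real figure each bin would hold 50 participants (10 for Dunedin), and the red regression line is fitted to the raw, unbinned data rather than to these bin means.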

The paper:
Genetic analysis of social-class mobility in five longitudinal studies, Belsky et al.

PNAS July 9, 2018. 201801238; published ahead of print July 9, 2018. https://doi.org/10.1073/pnas.1801238115

A summary genetic measure, called a “polygenic score,” derived from a genome-wide association study (GWAS) of education can modestly predict a person’s educational and economic success. This prediction could signal a biological mechanism: Education-linked genetics could encode characteristics that help people get ahead in life. Alternatively, prediction could reflect social history: People from well-off families might stay well-off for social reasons, and these families might also look alike genetically. A key test to distinguish biological mechanism from social history is if people with higher education polygenic scores tend to climb the social ladder beyond their parents’ position. Upward mobility would indicate education-linked genetics encodes characteristics that foster success. We tested if education-linked polygenic scores predicted social mobility in >20,000 individuals in five longitudinal studies in the United States, Britain, and New Zealand. Participants with higher polygenic scores achieved more education and career success and accumulated more wealth. However, they also tended to come from better-off families. In the key test, participants with higher polygenic scores tended to be upwardly mobile compared with their parents. Moreover, in sibling-difference analysis, the sibling with the higher polygenic score was more upwardly mobile. Thus, education GWAS discoveries are not mere correlates of privilege; they influence social mobility within a life. Additional analyses revealed that a mother’s polygenic score predicted her child’s attainment over and above the child’s own polygenic score, suggesting parents’ genetics can also affect their children’s attainment through environmental pathways. Education GWAS discoveries affect socioeconomic attainment through influence on individuals’ family-of-origin environments and their social mobility.
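The logic of the "key test" and of the sibling comparison can be illustrated on simulated data. This is a minimal sketch under a toy linear model, not the paper's analysis: the effect sizes (0.4 for social origins, 0.3 for the polygenic score) and noise levels are made up for illustration, and the helper names are hypothetical. If the score merely tracked family privilege, its coefficient would shrink toward zero once social origins are controlled, and the sibling-difference slope would be near zero because siblings share their family of origin. Here the score has a direct effect by construction, so both estimates should recover it.

```python
import random
from typing import List, Sequence, Tuple

Sib = Tuple[float, float]  # (polygenic score, attainment) for one sibling


def ols_two_predictors(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slopes (b1, b2) for y ~ a + b1*x1 + b2*x2, from centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def sibling_difference_slope(pairs: Sequence[Tuple[Sib, Sib]]) -> float:
    """Regress the within-pair attainment difference on the within-pair score difference
    (no intercept). In this toy model, family origins cancel out of both differences."""
    num = den = 0.0
    for (g_a, y_a), (g_b, y_b) in pairs:
        dx, dy = g_a - g_b, y_a - y_b
        num += dx * dy
        den += dx * dx
    return num / den


def simulate(n_families: int, seed: int = 0):
    """Toy data with HYPOTHETICAL parameters: scores correlate with origins, and
    attainment = 0.4*origin + 0.3*score + noise."""
    rng = random.Random(seed)
    origin: List[float] = []
    pgs: List[float] = []
    attain: List[float] = []
    pairs: List[Tuple[Sib, Sib]] = []
    for _ in range(n_families):
        o = rng.gauss(0.0, 1.0)
        family_mean = 0.3 * o + rng.gauss(0.0, 0.5)  # scores partly track family status
        sibs = []
        for _ in range(2):
            g = family_mean + rng.gauss(0.0, 0.7)  # within-family genetic differences
            sibs.append((g, 0.4 * o + 0.3 * g + rng.gauss(0.0, 1.0)))
        origin.append(o)
        pgs.append(sibs[0][0])  # one sibling per family for the population regression
        attain.append(sibs[0][1])
        pairs.append((sibs[0], sibs[1]))
    return origin, pgs, attain, pairs


if __name__ == "__main__":
    origin, pgs, attain, pairs = simulate(5000)
    b_pgs, b_origin = ols_two_predictors(pgs, origin, attain)
    print(f"score slope, origins held fixed: {b_pgs:.2f}")
    print(f"origins slope: {b_origin:.2f}")
    print(f"sibling-difference slope: {sibling_difference_slope(pairs):.2f}")
```

Both score slopes should land near the simulated 0.3. Setting the 0.3 in the attainment line to zero, while keeping scores correlated with origins, would drive both toward zero: that is the pattern a pure social-history explanation predicts.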

Note added from comments: Plots would look much noisier if not for averaging many individuals into a single point. Keep in mind that socioeconomic success depends on a lot more than just cognitive ability, or even cognitive ability + conscientiousness.

But the underlying predictor correlates ~0.35 with actual educational attainment, IIRC. That is, the polygenic score predicts EA about as well as standardized tests predict success in schooling.
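A back-of-envelope calculation shows why the binned plots look so much cleaner than the individual-level relationship. It assumes standardized variables, a simple linear relation, independent residuals, and that r ≈ 0.35 (the recalled value above, which is for educational attainment) roughly carries over to the plotted outcome. Averaging a bin of n people shrinks the residual scatter by a factor of √n, shown here for the caption's bin sizes of 50 and 10.

```latex
\begin{aligned}
r^2 &= 0.35^2 \approx 0.12 && \text{(share of variance explained)} \\
\sigma_{\text{resid}} &= \sqrt{1 - r^2} \approx 0.94 && \text{(individual scatter around the line, in SD units)} \\
\sigma_{\text{bin}} &\approx \frac{\sigma_{\text{resid}}}{\sqrt{n}} \approx 0.13 \ (n = 50), \quad 0.30 \ (n = 10)
\end{aligned}
```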

This means you can at least use it to identify outliers. A very high or low test score (SAT, ACT, GRE) does not *guarantee* success or failure in school, yet the signal is still useful for selection, i.e., admissions. The same holds for the polygenic score.
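The outlier argument can be made concrete with a minimal sketch, assuming the polygenic score and educational attainment are standard bivariate normal with correlation r = 0.35 (the recalled figure above, not something computed here); the function names are illustrative. Given a score z, attainment is then normal with mean r·z and SD √(1 - r²), so a high score shifts the odds substantially without guaranteeing anything.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_above(threshold: float, score_z: float, r: float) -> float:
    """P(outcome > threshold | score = score_z) for a standard bivariate normal
    with correlation r (an assumed model, not an empirical fit)."""
    cond_mean = r * score_z
    cond_sd = math.sqrt(1.0 - r * r)
    return 1.0 - normal_cdf((threshold - cond_mean) / cond_sd)


if __name__ == "__main__":
    r = 0.35
    for z in (-2.0, 0.0, 2.0):
        print(f"score {z:+.0f} SD: P(above average) = {prob_above(0.0, z, r):.2f}, "
              f"P(above +1 SD) = {prob_above(1.0, z, r):.2f}")
```

Under these assumptions someone at +2 SD on the score has roughly a 77% chance of above-average attainment (versus 50% overall) and roughly a 37% chance of landing above +1 SD (versus about 16% overall): a useful signal for selection, far from a guarantee.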
