Information Processing: 04/2019

Tuesday, April 30, 2019

Dialogs

In a high corner office, overlooking Cambridge and the Harvard campus.

How big a role is deep learning playing right now in building genomic predictors?

So far, not a big one. Other ML methods perform roughly on par with DL. The additive component of variance is largest, and we have compressed sensing theorems showing near-optimal performance for capturing it. There are nonlinear effects, and eventually DL will likely be useful for learning multi-loci features. But at the moment everything is limited by statistical power, and nonlinear features are even harder to detect than additive ones. ...

The bottom line is that with enough statistical power predictors will capture the expected heritability for most traits. Are people in your field ready for this?

Some are, but for others it will be very difficult.

Conference on AI and Genomics / Precision Medicine (Boston).

I enjoyed your talk. I work for [leading AgBio company], but my PhD is in Applied Math. We've been computing Net Merit for bulls using SNPs for a long time. The human genetics people have been lagging...

Caught up now, though. And first derivative (sample size growth rate) is much larger...

Yes. It's funny because sperm is priced by Net Merit and when we or USDA revise models some farmers or breeders get very angry because the value of their bull can change a lot!

A Harvard Square restaurant.

I last saw Roman at the Fellows spring dinner, many years ago. I was back from Yale to see friends. He was drinking, with serious intent. He told me about working with Wilson at Cornell. He also told me an old story about Jeffrey and the Higgs mechanism. Jeffrey almost had it, soon after his work on the Goldstone boson. But Sidney talked him out of it -- something to the effect of "if you can only make sense of it in unitary gauge, it must be an artifact" ... Afterwards, at MIT they would say "When push comes to shove, Sidney is wrong." ...

Genomics is in the details now. Lots of work to be done, but conceptually it's clear what to do. I wouldn't say that about AGI. There are still important conceptual breakthroughs that need to be made.

The Dunster House courtyard, overlooking the Charles.

We used to live here, can you let us in to look around?

I remember it all -- the long meals, the tutors, the students, the concerts in the library. Yo-Yo Ma and Owen playing together.

A special time, at least for us. But long vanished except in memory.

Wheeler used to say that the past only exists as memory records.

Not very covariant! Why not a single four-manifold that exists all at once?

The Ritz-Carlton.

Flying private is like crack. Once you do it, you can't go back...

It's not like that. They never give you a number. They just tell you that the field house is undergoing a renovation and there's a naming opportunity. Then your kid is on the right list. They've been doing this for a hundred years...

Card had to do the analysis that way. Harvard was paying him...

I went to the session on VC for newbies. Now I realize "valuation" is just BS... Now you see how it really works...

Then Bobby says "What's an LP? I wanna be an LP because you gotta keep them happy."

Let me guess, you want a dataset with a million genomes and FICO scores?

I've helped US companies come to China for 20+ years. At first it was rough. Now if I'm back in the States for a while and return, Shenzhen seems like the Future. The dynamism is here.

To most of Eurasia it just looks like two competing hegemons. Both systems have their pluses and minuses, but it's not an existential problem...

Sure, Huawei is a big threat because they won't put in backdoors for the NSA. Who was tapping Merkel's cellphone? It was us...

Humans are just smart enough to create an AGI, but perhaps not smart enough to create a safe one.

Maybe we should make humans smarter first, so there is a better chance that our successors will look fondly on us. Genetically engineered super-geniuses might have a better chance at implementing Asimov's Laws of Robotics.

Wednesday, April 24, 2019

The Economist Babbage podcast: The future of genetic engineering

Babbage Podcast by The Economist. (21 minutes. Sorry, can't embed the player.)

Economist Senior Editor Kenneth Cukier takes a look at what it means to be human. He speaks to leading scientists, doctors and philosophers to ask if ethics and regulations are able to keep up with the technology of genetic engineering.

Great interviews with so many people I admire: @EricTopol and @JamieMetzl and @Ella_Road and @GulzaarBarn and @hsu_steve! Produced by the brilliant @simonjarvis1981 https://t.co/mh8XTGJ4UO
— Kenneth Cukier (@kncukier) April 24, 2019

Tuesday, April 23, 2019

Backpropagation in the Brain? Part 2

If I understand correctly the issue is how to realize something like backprop when most of the information flow is feed-forward (as in real neurons). How do you transport weights "non-locally"? The L2 optimization studied here doesn't actually transport weights. Rather, the optimized solution realizes the same set of weights in two places...

See earlier post Backpropagation in the Brain? Thanks to STS for the reference.

Center for Brains, Minds and Machines (CBMM)
Published on Apr 3, 2019
Speaker: Dr. Jon Bloom, Broad Institute

Abstract: When trained to minimize reconstruction error, a linear autoencoder (LAE) learns the subspace spanned by the top principal directions but cannot learn the principal directions themselves. In this talk, I'll explain how this observation became the focus of a project on representation learning of neurons using single-cell RNA data. I'll then share how this focus led us to a satisfying conversation between numerical analysis, algebraic topology, random matrix theory, deep learning, and computational neuroscience. We'll see that an L2-regularized LAE learns the principal directions as the left singular vectors of the decoder, providing a simple and scalable PCA algorithm related to Oja's rule. We'll use the lens of Morse theory to smoothly parameterize all LAE critical manifolds and the gradient trajectories between them; and see how algebra and probability theory provide principled foundations for ensemble learning in deep networks, while suggesting new algorithms. Finally, we'll come full circle to neuroscience via the "weight transport problem" (Grossberg 1987), proving that L2-regularized LAEs are symmetric at all critical points. This theorem provides local learning rules by which maximizing information flow and minimizing energy expenditure give rise to less-biologically-implausible analogues of backpropagation, which we are excited to explore in vivo and in silico. Joint learning with Daniel Kunin, Aleksandrina Goeva, and Cotton Seed.
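A quick numerical check of two claims in the abstract (symmetry at critical points, and principal directions appearing as the left singular vectors of the decoder). This is only a sketch under my own assumptions: the toy data, the sizes, the value of `lam`, and the use of plain gradient descent are all hypothetical choices, not taken from the talk. The loss used here is (1/n)||X - W2 W1 X||^2 + lam (||W1||^2 + ||W2||^2), for which the expected decoder singular values are sqrt(1 - lam / s_i), with s_i the top eigenvalues of X X^T / n.

```python
import numpy as np

def train_lae(X, k, lam, lr=0.01, steps=10000, seed=0):
    """Gradient descent on an L2-regularized linear autoencoder.

    Loss: (1/n)||X - W2 W1 X||_F^2 + lam * (||W1||_F^2 + ||W2||_F^2).
    The loss depends on X only through C = X X^T / n, so C is precomputed.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    C = X @ X.T / n
    W1 = 0.01 * rng.standard_normal((k, m))  # encoder
    W2 = 0.01 * rng.standard_normal((m, k))  # decoder
    for _ in range(steps):
        EC = (np.eye(m) - W2 @ W1) @ C
        g2 = -2 * EC @ W1.T + 2 * lam * W2   # gradient w.r.t. decoder
        g1 = -2 * W2.T @ EC + 2 * lam * W1   # gradient w.r.t. encoder
        W1 -= lr * g1
        W2 -= lr * g2
    return W1, W2

# Hypothetical toy data: 5 features, 2000 samples, well-separated variances.
rng = np.random.default_rng(1)
scales = np.array([2.0, 1.5, 1.0, 0.7, 0.5])
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
X = Q @ (scales[:, None] * rng.standard_normal((5, 2000)))

k, lam = 2, 1.5  # lam must stay below the k-th eigenvalue of C
W1, W2 = train_lae(X, k, lam)

# PCA reference: top-k eigenvectors of C, largest eigenvalue first.
evals, evecs = np.linalg.eigh(X @ X.T / X.shape[1])
top_vals, top_vecs = evals[::-1][:k], evecs[:, ::-1][:, :k]

# Left singular vectors and singular values of the trained decoder.
U_dec, s_dec, _ = np.linalg.svd(W2, full_matrices=False)

overlap = np.abs(U_dec.T @ top_vecs)      # identity if directions match up to sign
print(np.round(overlap, 4))
print(np.round(s_dec, 4), np.round(np.sqrt(1 - lam / top_vals), 4))
```

The overlap matrix compares each decoder singular vector with each principal direction; without the regularizer, only the spanned subspace would be pinned down, so this matrix would generally not be diagonal.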

Sunday, April 21, 2019

Chess with the human body

Keenan Cornelius is a world-class black belt in jiujitsu. His skill level is sufficiently high that he can verbalize his tactics in real time as he rolls with lower belts.

"He's trying to bump me off mount, so I'm shifting my weight to my left knee to keep my weight off of his hips. But once he gives me an opening I'm going to slide under his lapel to finalize the choke..." Sometimes he is several moves ahead of his opponent!

Roy Dean does something similar, but with narration added in post-production, here. Roy's video is more precise (for one thing, he's not out of breath) but what Keenan is doing is super impressive :-)

Thursday, April 18, 2019

Manifold Episode #8 -- Sabine Hossenfelder on the Crisis in Particle Physics and Against the Next Big Collider

Manifold Show Page YouTube Channel

Hossenfelder is a Research Associate at the Frankfurt Institute for Advanced Studies. Her research areas include particle physics and quantum gravity. She discusses the current state of theoretical physics, and her recent book Lost in Math: How Beauty Leads Physics Astray.

The Uncertain Future of Particle Physics (NYT editorial)

Lost in Math: How Beauty Leads Physics Astray

Transcript

man·i·fold /ˈmanəˌfōld/ many and various.

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point.

Steve Hsu and Corey Washington have been friends for almost 30 years, and between them hold PhDs in Neuroscience, Philosophy, and Theoretical Physics. Join them for wide ranging and unfiltered conversations with leading writers, scientists, technologists, academics, entrepreneurs, investors, and more.

Steve Hsu is VP for Research and Professor of Theoretical Physics at Michigan State University. He is also a researcher in computational genomics and founder of several Silicon Valley startups, ranging from information security to biotech. Educated at Caltech and Berkeley, he was a Harvard Junior Fellow and held faculty positions at Yale and the University of Oregon before joining MSU.

Corey Washington is Director of Analytics in the Office of Research and Innovation at Michigan State University. He was educated at Amherst College and MIT before receiving a PhD in Philosophy from Stanford and a PhD in Neuroscience from Columbia. He held faculty positions at the University of Washington and the University of Maryland. Prior to MSU, Corey worked as a biotech consultant and is founder of a medical diagnostics startup.

Tuesday, April 09, 2019

Genomic prediction of student flow through high school math curriculum

Compute polygenic EA scores for 3000 US high school students of European ancestry. Track individual progress from 9th to 12th grade, focusing on mathematics courses. The students are out-of-sample: not used in training of predictor. In fact, a big portion (over half?) of individuals used in predictor training are not even from the US -- they are from the UK/EU.
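For readers who have not seen one computed: a polygenic score is just a weighted sum of allele counts, with weights fit in a separate training cohort. A minimal sketch, where the genotypes and weights are random stand-ins (the sizes and names are hypothetical, not from the paper):

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """Additive score: weighted sum of effect-allele counts, z-scored across the sample."""
    raw = genotypes @ weights
    return (raw - raw.mean()) / raw.std()

rng = np.random.default_rng(0)
n_students, n_snps = 3000, 200                                # hypothetical sizes
genotypes = rng.binomial(2, 0.3, size=(n_students, n_snps))   # 0/1/2 allele counts
weights = rng.normal(0.0, 0.05, size=n_snps)                  # stand-in for trained weights

scores = polygenic_score(genotypes, weights)
print(scores[:5])
```

Out-of-sample here means that none of the scored individuals contributed to fitting `weights`.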

Results: predictor captures about as much variance as family background (SES = Socioeconomic Status). Students with lower polygenic scores are less likely to take advanced math (e.g., Geometry and beyond).

Typical education paths of individuals with, e.g., bottom few percentile polygenic score are radically different from those in the top percentiles, even after controlling for SES. For example, consider only rich kids or kids at superior schools and compare educational trajectory vs polygenic score. Looks like (bottom figure) odds ratio for taking Geometry in 9th grade is about 4-6x higher for top polygenic score kids.

Genetic Associations with Mathematics Tracking and Persistence in Secondary School

K. Paige Harden and Benjamin W. Domingue, et al.

...we address this question using student polygenic scores, which are DNA-based indicators of propensity to succeed in education [8]. We integrated genetic and official school transcript data from over 3,000 European-ancestry students from U.S. high schools. We used polygenic scores as a molecular tracer to understand how the flow of students through the high school math pipeline differs in socioeconomically advantaged versus disadvantaged schools. Students with higher education polygenic scores were tracked to more advanced math already at the beginning of high school and persisted in math for more years...

...including family-SES and school-SES as covariates attenuated the association between the education-PGS and mathematics tracking in the 9th-grade only by about 20% (attenuated from b = 0.583, SE = .034, to b = 0.461, SE = .036, p < 2 × 10^-16, Supplementary Table S3). Note that the association with genetics was roughly comparable in magnitude to the association with family-SES...
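The "about 20%" figure follows directly from the two quoted coefficients. A small check, assuming a normal approximation for the test statistic (my assumption; the excerpt does not say which test was used):

```python
import math

b_unadjusted, b_adjusted, se_adjusted = 0.583, 0.461, 0.036  # values quoted above

# Fractional drop in the coefficient once the SES covariates are included.
attenuation = (b_unadjusted - b_adjusted) / b_unadjusted

# z-statistic of the adjusted coefficient and its two-sided normal p-value.
z = b_adjusted / se_adjusted
p_value = math.erfc(z / math.sqrt(2))

print(f"attenuation: {attenuation:.1%}")
print(f"z = {z:.1f}")
```

The adjusted coefficient is still more than twelve standard errors from zero, which is why the quoted p-value sits at the reporting floor.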

See also Game Over: Genomic Prediction of Social Mobility (some overlap in authors with the new paper).

A talk by the first author:

Thursday, April 04, 2019

Manifold Episode #7 -- David Skrbina on Ted Kaczynski, Technological Slavery, and the Future of Our Species

Manifold Show Page YouTube Channel

David Skrbina is a philosopher at the University of Michigan. He and Ted Kaczynski published the book Technological Slavery, which elaborates on the Unabomber manifesto and contains about 100 pages of correspondence between the two which took place over almost a decade. Skrbina discusses his and Kaczynski's views on deep problems of technological society, and whether violent opposition to it is justified.

David Skrbina's Featured Publications
https://www.davidskrbina.com/

Photos of Ted Kaczynski
http://murderpedia.org/male.K/k/kaczynski-photos-3.htm

David Skrbina, Pen Pal of the Unabomber, on Ted Kaczynski's Philosophy
https://www.youtube.com/watch?v=4dQd7d3XxkA

Tribe by Sebastian Junger
http://www.sebastianjunger.com/tribe-by-sebastian-junger

Joe Rogan Experience #975 - Sebastian Junger
https://www.youtube.com/watch?v=W4KiOECVGLg

Monday, April 01, 2019

Big Chickens (Economist video)

Big chickens! Modern breeds are about four times larger than those raised in the 1950s. I wonder how many population SDs of change that represents? About 40?
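The back-of-envelope behind that guess, as a sketch: if the population SD is a fixed fraction (coefficient of variation, CV) of the 1950s mean, then a size ratio r corresponds to a shift of (r - 1) / CV standard deviations. The CV is the unknown here; nothing in the video pins it down, so the code only inverts the relation to show what CV a 40 SD shift would imply.

```python
def sd_shift(size_ratio, cv):
    """Shift in units of the baseline SD, assuming SD = cv * baseline mean."""
    return (size_ratio - 1.0) / cv

def implied_cv(size_ratio, n_sd):
    """Coefficient of variation that would make the shift equal n_sd SDs."""
    return (size_ratio - 1.0) / n_sd

# A fourfold increase is a shift of three baseline means.
print(implied_cv(4.0, 40.0))  # CV needed for the "about 40" guess
print(sd_shift(4.0, 0.10))    # shift if the CV were 10% (hypothetical value)
```

So "about 40" corresponds to a CV of 7.5% in the baseline population; a CV of 10% would give about 30 SDs instead.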

Interview with Genetic Engineering & Biotechnology News

Polygenic Risk Scores and Genomic Prediction: Q&A with Stephen Hsu

In this exclusive interview, Stephen Hsu (Michigan State University and co-founder of Genomic Prediction) discusses the application of polygenic risk scores (PRS) for complex traits in pre-implantation genetic screening. Interview conducted by Julianna LeMieux (GEN).

GEN: What motivated you to start Genomic Prediction?

STEVE HSU: It has a very long history. Laurent Tellier is the CEO and we’ve known each other since 2010. We’d been working on the background science of how to use machine learning to look at lots of genomes and then learn to predict phenotypes from that information.

We were betting on the continuing decline in cost for genotyping, and it paid off because now there are millions of genotypes available for analysis. We’d always thought that one of the best and earliest applications of this would be embryo selection because we can help families have a healthy child.

GEN: How did you first get interested in genomics in general, given your educational background in physics?

HSU: I was interested in genetics and evolution, molecular biology, since I was a kid. I grew up in the ’70s and ’80s and already at that time there was a lot of attention focused on the molecular biology revolution, recombinant DNA. We were always told physics is a very mature subject and biology is the subject of the future and it will just explode eventually with these new molecular techniques.

When I got to college and I took some classes in molecular biology, I realized that a lot of the deep questions—like how do you actually decipher a genome and figure out which pieces of the genetic code have direct consequences in phenotypes or complex traits?—would not be answerable with the technology of that time. So I put it aside and did theoretical physics, but got re-interested around the time I met Laurent. I became aware of the super exponential cost curve for genotyping, sequencing in particular. I realized, if this continues for another ten years or so, we’re going to be able to answer all these interesting questions I’ve been thinking about since I was a kid.

...

GEN: How do you generate a polygenic risk score for different diseases? Of the eight diseases listed on the Genomic Prediction website, are those diseases that your lab has basically generated that data for?

HSU: Many of them were produced by my research group, but the current best-performing breast cancer predictor actually comes from a large international consortium that works on breast cancer…

We use the same data that people would use for GWAS [genome-wide association studies]. For example, we might have 200,000 controls and 20,000 or 30,000 cases of people in their 50s and 60s who are old enough that they would have been diagnosed with diabetes (or something) if they had it. The algorithm knows which ones are the cases and which are the controls, and it also has about 1 million SNPs from each person, typically what you get from an Affymetrix or an Illumina array.

It is a learning algorithm that tries to tune its internal model so that it best predicts whether someone is actually a case or a control. There’s a bunch of fancy math involved in this—a high-dimensional optimization. You are basically finding the model that best predicts the data.

It is different from GWAS because GWAS is very simple—you look at a particular gene or SNP and you say is there statistical evidence that this particular SNP is associated with whether you have diabetes? You get a yes/no answer. If the P value is significant enough then you say we found a hit.

That problem mathematically is very different from the problem we solved. We are actually doing an optimization in a million-dimensional space to find simultaneously all the SNPs that should be activated in our predictor. This is all in the technical weeds but it is just different mathematics…
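To make the contrast concrete, here is a toy sketch of the two approaches. Assumptions, all mine: the interview does not name the algorithm, so L1-penalized regression (LASSO, solved by coordinate descent) stands in for the joint optimization, in the spirit of the compressed sensing remark in the Dialogs post; the trait is quantitative rather than case/control; and every size and parameter is hypothetical.

```python
import numpy as np

def simulate(n=400, p=1000, k=10, noise=0.5, seed=0):
    """Toy additive trait: k causal SNPs out of p, standardized genotypes."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=p)
    G = rng.binomial(2, freqs, size=(n, p)).astype(float)  # 0/1/2 allele counts
    X = (G - G.mean(axis=0)) / G.std(axis=0)               # standardize each SNP
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
    y = X @ beta + noise * rng.standard_normal(n)
    return X, y - y.mean(), beta

def marginal_scan(X, y):
    """GWAS-style: one univariate regression per SNP; returns z-scores."""
    n = X.shape[0]
    b = X.T @ y / n                       # slope per SNP (unit-variance columns)
    se = np.sqrt((y.var() - b**2) / n)    # residual variance of each single-SNP fit
    return b / se

def lasso_cd(X, y, lam, n_iter=50):
    """Joint fit: minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # current residual y - Xb
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                b[j] = new
    return b

X, y, beta = simulate()
causal = set(np.flatnonzero(beta))
z = marginal_scan(X, y)
b_hat = lasso_cd(X, y, lam=0.3)
selected = set(np.flatnonzero(b_hat))
top_joint = set(np.argsort(-np.abs(b_hat))[:len(causal)])

print("causal SNPs:", len(causal))
print("marginal hits at |z| > 5:", int((np.abs(z) > 5).sum()))
print("SNPs with nonzero joint weight:", len(selected))
print("causal SNPs among the largest joint weights:", len(causal & top_joint))
```

The marginal scan asks a separate yes/no question of each SNP, while the joint fit chooses all weights at once, so each SNP's weight is set against a residual that already accounts for the other selected SNPs.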

We think we can actually predict risk by doing this high-dimensional optimization. Initially, people just thought we were crazy. We wrote theoretical papers predicting how much data you would need to be able to accurately predict height or something like that. ... [ AND THOSE PREDICTIONS WERE CORRECT ... ]
