Saturday, February 27, 2021

Infinity and Solipsism, Physicists and Science Fiction

The excerpt below is from Roger Zelazny's Creatures of Light and Darkness (1969), an experimental novel which is somewhat obscure, even to fans of Zelazny. 
Positing infinity, the rest is easy. 
The Prince Who Was A Thousand is ... a teleportationist, among other things ... the only one of his kind. He can transport himself, in no time at all, to any place that he can visualize. And he has a very vivid imagination. 
Granting that any place you can think of exists somewhere in infinity, if the Prince can think of it too, he is able to visit it. Now, a few theorists claim that the Prince’s visualizing a place and willing himself into it is actually an act of creation. No one knew about the place before, and if the Prince can find it, then perhaps what he really did was make it happen. However, positing infinity, the rest is easy.
This already contains the central idea that is expressed more fully in Nine Princes in Amber and subsequent books in that series.
While traveling (shifting) between Shadows, [the prince] can alter reality or create a new reality by choosing which elements of which Shadows to keep or add, and which to subtract.
Creatures of Light and Darkness also has obvious similarities to Lord of Light, which many regard as Zelazny's best book and even one of the greatest science fiction novels ever written. Both have been among my favorites since I read them as a kid.

Infinity, probability measures, and solipsism have received serious analysis by theoretical physicists: see, e.g., Boltzmann brains. (Which is less improbable: the existence of the universe around you, or the existence of a single brain whose memory records encode that universe?) Perhaps this means theorists have too much time on their hands, due to lack of experimental progress in fundamental physics.

Science fiction is popular amongst physicists, but I've always been surprised that the level of interest isn't even higher. Two examples I know well: the late Sidney Coleman and my collaborator Bob Scherrer at Vanderbilt were/are scholars and creators of the genre. See these stories by Bob, and Greg Benford's Remembering Sid:
... Sid and some others created a fannish publishing house, Advent Publishers, in 1956. He was a teenager when he helped publish Advent’s first book, Damon Knight’s In Search of Wonder. ... 
[Sid] loved SF whereas Einstein deplored it. Lest SF distort pure science and give people the false illusion of scientific understanding, Einstein recommended complete abstinence from any type of science fiction. “I never think of the future. It comes soon enough,” he said.
While I've never written science fiction, occasionally my research comes close -- it has at times addressed questions of the form: 

Do the Laws of Nature as we know them allow ... 

This research might be considered the ultimate in hard SF ;-) 
Wikipedia: Hard science fiction is a category of science fiction characterized by concern for scientific accuracy and logic.

Note Added: Bob Scherrer writes: In my experience, about 1/3 of research physicists are SF fans, about 1/3 have absolutely no interest in SF, and the remaining 1/3 were avid readers of science fiction in middle school/early high school but then "outgrew" it.

Here is a recent story by Bob which I really enjoyed -- based on many worlds quantum mechanics :-) 

It was ranked #2 in the 2019 Analog Magazine reader poll!

Note Added 2: Kazuo Ishiguro (2017 Nobel Prize in Literature) has been evolving into an SF/fantasy writer over time. And why not? For where else can one work with genuinely new ideas? See Never Let Me Go (clones), The Buried Giant (post-Arthurian England), and his latest book Klara and the Sun.
NYTimes: ... we slowly discover (and those wishing to avoid spoilers should now skip to the start of the next paragraph), the cause of Josie’s mysterious illness is a gene-editing surgery to enhance her intellectual faculties. The procedure carries high risks as well as potential high rewards — the main one being membership in a professional superelite. Those who forgo or simply can’t afford it are essentially consigning themselves to economic serfdom.
WSJ: ... Automation has created a kind of technological apartheid state, which is reinforced by a dangerous “genetic editing” procedure that separates “lifted,” intellectually enhanced children from the abandoned masses of the “unlifted.” Josie is lifted, but the procedure is the cause of her illness, which is often terminal. Her oldest friend and love interest, Rick, is unlifted and so has few prospects despite his obvious brilliance. Her absentee father is an engineer who was outsourced by machines and has since joined a Community, one of the closed groups formed by those lacking social rank. In a conversational aside it is suggested that the Communities have self-sorted along racial lines and are heavily armed.

Sunday, February 21, 2021

Othram: Appalachian hiker found dead in tent identified via DNA forensics


Othram helps solve another mystery: the identity of a dead Appalachian hiker. 

There are ~50k unidentified deceased individuals in the US, with ~1k new cases each year.
CBS Sunday Morning: He was a mystery who intrigued thousands: Who was the hiker who walked almost the entire length of the Appalachian Trail, living completely off the grid, only to be found dead in a tent in Florida? It took years, and the persistence of amateur sleuths, to crack the case. Nicholas Thompson of The Atlantic Magazine tells the tale of the man who went by the name "Mostly Harmless," and about the efforts stirred by the mystery of his identity to give names to nameless missing persons.
See also Othram: the future of DNA forensics.

Thursday, February 18, 2021

David Reich: Prehistory of Europe and S. Asia from Ancient DNA


In case you have not followed the adventures of the Yamnaya (proto-Indo-Europeans from the Steppe), I recommend this recent Harvard lecture by David Reich. It summarizes advances in our understanding of deep human history in Europe and South Asia resulting from analysis of ancient DNA. 
The new technology of ancient DNA has highlighted a remarkable parallel in the prehistory of Europe and South Asia. In both cases, the arrival of agriculture from southwest Asia after 9,000 years ago catalyzed profound population mixtures of groups related to Southwest Asian farmers and local hunter-gatherers. In both cases, the spread of ancestry ultimately deriving from Steppe pastoralists had a further major impact after 5,000 years ago and almost certainly brought Indo-European languages. Mixtures of these three source populations form the primary gradients of ancestry in both regions today. 
In this lecture, Prof. Reich will discuss his new book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. 
There seems to be a strange glitch at 16:19 and again at 27:55 -- what did he say?

See also Reich's 2018 NYTimes editorial.

Wednesday, February 17, 2021

The Post-American World: Crooke, Escobar, Blumenthal, and Marandi


Even if you disagree violently with the viewpoints expressed in this discussion, it will inform you as to how the rest of the world thinks about the decline of US empire. 

The group is very diverse: a former UK diplomat; an Iranian professor, educated in the West, now at the University of Tehran; a progressive author and journalist (son of Clinton advisor Sidney Blumenthal) who spent 5 years reporting from Israel; and a Brazilian geopolitical analyst who writes for Asia Times and (if I recall correctly) lives in Thailand.
Thirty years ago, the United States dominated the world politically, economically, and scientifically. But today? 
Watch this in-depth discussion with distinguished guests: 
Alastair Crooke - Former British Diplomat, Founder and Director of the Conflicts Forum 
Pepe Escobar - Brazilian Political Analyst and Author 
Max Blumenthal - American Journalist and Author from Grayzone 
Chaired by Dr. Mohammad Marandi - Professor at University of Tehran
See also two Escobar articles linked here. Related: Foreign Observers of US Empire.  

Sunday, February 14, 2021

Physics and AI: some recent papers

Three AI paper recommendations from a theoretical physicist (former collaborator) who now runs an AI lab in SV. Less than 5 years after leaving physics research, he and his team have shipped AI products that are used by millions of people. (Figure above is from the third paper below.)

This paper elucidates the relationship between symmetry principles (familiar from physics) and specific mathematical structures like convolutions used in DL.
Covariance in Physics and CNN 
Cheng, et al.  (Amsterdam)
In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality, linearity and weight sharing, is sufficient to uniquely determine the form of the convolution.
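The paper's central claim — that covariance (equivariance) plus locality, linearity, and weight sharing pins down the convolutional form — is easy to check numerically in the simplest case. Below is a minimal sketch (not from the paper; the signal and filter values are made up) verifying that a 1-D circular convolution commutes with translations of its input:

```python
import numpy as np

def conv1d(signal, kernel):
    """Circular 1-D convolution: same shared weights applied at every position."""
    n = len(signal)
    return np.array([sum(kernel[j] * signal[(i + j) % n]
                         for j in range(len(kernel)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 0.0])   # toy input signal
w = np.array([0.5, -1.0, 0.5])                 # toy filter

lhs = conv1d(np.roll(x, 2), w)        # translate the input, then convolve
rhs = np.roll(conv1d(x, w), 2)        # convolve, then translate the output
assert np.allclose(lhs, rhs)          # convolution commutes with translation
```

An operation that used different weights at different positions would fail this check, which is the sense in which the symmetry requirement forces weight sharing.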

The following two papers explore connections between AI/ML and statistical physics, including renormalization group (RG) flow. 

Theoretical Connections between Statistical Physics and RL 
Rahme and Adams  (Princeton)
Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function Z, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and Q-functions can be derived from this partition function and interpreted via average energies, the Z-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for Z is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these Z-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
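The key technical point — that the Bellman equation for Z becomes linear when the dynamics are deterministic — can be illustrated with a toy example. The sketch below is not the paper's code: the 3-state MDP, its rewards, `beta`, and horizon `H` are all made up for illustration. Z_h(s) sums exp(beta × cumulative reward) over all length-h trajectories from s, so the backward recursion involves no max or log, only a weighted sum:

```python
import numpy as np

# Hypothetical deterministic MDP: from state s, action a leads to
# next_state[s][a] and yields reward[s][a].
next_state = {0: {0: 1, 1: 2}, 1: {0: 2, 1: 0}, 2: {0: 0, 1: 1}}
reward     = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.5, 1: 0.0}, 2: {0: 0.0, 1: 0.2}}
beta = 1.0   # inverse temperature
H = 5        # horizon

# Linear Bellman recursion: Z_h(s) = sum_a exp(beta*r(s,a)) * Z_{h-1}(s'(s,a))
Z = {s: 1.0 for s in next_state}   # Z_0 = 1 for all states
for _ in range(H):
    Z = {s: sum(np.exp(beta * reward[s][a]) * Z[next_state[s][a]]
                for a in next_state[s])
         for s in next_state}

# Boltzmann-like policy: weight each action by its contribution to Z.
def policy(s, Z_prev):
    w = np.array([np.exp(beta * reward[s][a]) * Z_prev[next_state[s][a]]
                  for a in sorted(next_state[s])])
    return w / w.sum()

p0 = policy(0, Z)   # action distribution at state 0
```

Because each update is a linear map on the vector of Z values, the whole recursion could equally be written as a matrix power — exactly the "direct solutions" unavailable for the nonlinear value-function Bellman equation.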


RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior
Hu et al.   (UCSD and Berkeley AI Lab) 
Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those in convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has O(logL) complexity for image inpainting compared to previous generative models with O(L^2) complexity.
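RG-Flow itself is an invertible flow (the fine-scale information discarded at each level is stored as latent variables rather than thrown away), but the hierarchical intuition can be sketched in a few lines. The image below is made up; this shows only the forward coarse-graining direction, where each RG step averages 2x2 blocks, giving the O(log L) hierarchy of scales the abstract refers to:

```python
import numpy as np

def coarse_grain(img):
    """One RG step: replace each 2x2 block by its average (fine detail discarded)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

L = 8
img = np.arange(L * L, dtype=float).reshape(L, L)   # toy "image"

scales = [img]
while scales[-1].shape[0] > 1:
    scales.append(coarse_grain(scales[-1]))

# 8x8 -> 4x4 -> 2x2 -> 1x1: log2(L) coarse-graining steps.
assert len(scales) == int(np.log2(L)) + 1
```

Each level of this hierarchy carries information at one length scale, which is what makes scale-by-scale disentangled representations (and logarithmic-depth inpainting) possible in the invertible version.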
See related remarks: ICML notes (2018).
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.

Sunday, February 07, 2021

Gradient Descent Models Are Kernel Machines (Deep Learning)

This paper shows that models which result from gradient descent training (e.g., deep neural nets) can be expressed, approximately, as a weighted sum of similarity functions (kernels) which measure the similarity of a given instance to the examples used in training. The kernels are defined by the inner product of model gradients in parameter space, integrated over the descent (learning) path.

Roughly speaking, two data points x and x' are similar, i.e., have large kernel function K(x,x'), if they have similar effects on the model parameters in the gradient descent. With respect to the learning algorithm, x and x' have similar information content. The learned model y = f(x) matches x to similar data points x_i: the resulting value y is simply a weighted (linear) sum of kernel values K(x,x_i).
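A minimal numerical sketch of the path kernel, not Domingos's general construction: for a toy linear model f(x) = w.x the gradient with respect to w is just x at every point on the descent path, so the accumulated kernel reduces to (number of steps) times the ordinary Gram matrix x.x'. For a deep net, grad_f would change along the path and the kernel would be genuinely path dependent, but the accumulation loop is the same. All data here is randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                 # training inputs x_i
y = X @ np.array([1.0, -2.0, 0.5])          # targets from a toy ground truth
w = np.zeros(3)
lr, steps = 0.05, 200

def grad_f(w, x):
    return x                                # gradient of f(x) = w.x w.r.t. w

# Accumulate K(x_i, x_j) = sum over descent steps of grad_f(x_i).grad_f(x_j)
K_path = np.zeros((len(X), len(X)))
for _ in range(steps):
    residual = X @ w - y                    # dLoss/df for squared loss
    w -= lr * (X.T @ residual) / len(X)     # gradient descent step
    G = np.stack([grad_f(w, x) for x in X])
    K_path += G @ G.T                       # gradient inner products along path

# Constant gradients: the path kernel is just steps * the Gram matrix.
assert np.allclose(K_path, steps * X @ X.T)
```

Large entries K_path[i, j] identify pairs of training points that pushed the parameters in similar directions during descent — the precise sense of "similar information content" used above.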

This result makes it very clear that without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples. See note added at bottom for more on this point, re: AGI, etc. Given the complexity (e.g., dimensionality) of the ground truth model, one can place bounds on the amount of data required for successful training.

This formulation locates the nonlinearity of deep learning models in the kernel function. The superposition of kernels is entirely linear as long as the loss function is additive over training data.
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine  
P. Domingos
Deep learning’s successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.
From the paper:
... Here we show that every model learned by this method, regardless of architecture, is approximately equivalent to a kernel machine with a particular type of kernel. This kernel measures the similarity of the model at two data points in the neighborhood of the path taken by the model parameters during learning. Kernel machines store a subset of the training data points and match them to the query using the kernel. Deep network weights can thus be seen as a superposition of the training data points in the kernel’s feature space, enabling their efficient storage and matching. This contrasts with the standard view of deep learning as a method for discovering representations from data. ... 
... the weights of a deep network have a straightforward interpretation as a superposition of the training examples in gradient space, where each example is represented by the corresponding gradient of the model. Fig. 2 illustrates this. One well-studied approach to interpreting the output of deep networks involves looking for training instances that are close to the query in Euclidean or some other simple space (Ribeiro et al., 2016). Path kernels tell us what the exact space for these comparisons should be, and how it relates to the model’s predictions. ...
See also this video which discusses the paper. 

You can almost grasp the result from the figure and definitions below.

Note Added:
I was asked to elaborate further on this sentence, especially regarding AGI and human cognition: 

... without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples.

It should not be taken as a suggestion that gradient descent models can't achieve AGI, or that our minds can't be (effectively) models of this kernel type. 

1. The universe is highly compressible: it is governed by very simple effective models. These models can be learned, which allows for prediction beyond specific examples.

2. A sufficiently complex neural net can incorporate layers of abstraction. Thus a new instance and a previously seen example might be similar in an abstract (non-explicit) sense, but that similarity is still incorporated into the kernel. When Einstein invented Special Relativity he was not exactly aping another physical theory he had seen before, but at an abstract level the physical constraint (speed of light constant in all reference frames) and algebraic incorporation of this fact into a description of spacetime (Lorentz symmetry) may have been "similar" to examples he had seen already in simple geometry / algebra. (See Poincare and Einstein for more.)
Ulam: Banach once told me, "Good mathematicians see analogies between theorems or theories, the very best ones see analogies between analogies." Gamow possessed this ability to see analogies between models for physical theories to an almost uncanny degree... 

Saturday, February 06, 2021

Enter the Finns: FinnGen and FINRISK polygenic prediction of cardiometabolic diseases, common cancers, alcohol use, and cognition

In 2018 Dr. Aarno Palotie visited MSU (video of talk) to give an overview of the FinnGen research project. FinnGen aims to collect the genomic data of 500k citizens in Finland in order to study the origins of diseases and their treatment. Finland is well suited for this kind of study because it is relatively homogeneous and has a good national healthcare system.
Professor Aarno Palotie, M.D., Ph.D. is the research director of the Human Genomics program at FIMM. He is also a faculty member at the Center for Human Genome Research at the Massachusetts General Hospital in Boston and associate member of the Broad Institute of MIT and Harvard. He has a long track record in human disease genetics. He has held professorships and group leader positions at the University of Helsinki, UCLA, and the Wellcome Trust Sanger Institute. He has also been the director of the Finnish Genome Center and Laboratory of Molecular Genetics in the Helsinki University Hospital.
FinnGen is now producing very interesting results in polygenic risk prediction and clinical / public health applications of genomics. Below are a few recent papers.

1. This paper studies the use of PRS in prediction of five common diseases, with an eye towards clinical utility.
Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers 
Nature Medicine volume 26, 549–557 (2020) 
Polygenic risk scores (PRSs) have shown promise in predicting susceptibility to common diseases1,2,3. We estimated their added value in clinical risk prediction of five common diseases, using large-scale biobank data (FinnGen; n = 135,300) and the FINRISK study with clinical risk factors to test genome-wide PRSs for coronary heart disease, type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer. We evaluated the lifetime risk at different PRS levels, and the impact on disease onset and on prediction together with clinical risk scores. Compared to having an average PRS, having a high PRS contributed 21% to 38% higher lifetime risk, and 4 to 9 years earlier disease onset. PRSs improved model discrimination over age and sex in type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer, and over clinical risk in type 2 diabetes, breast cancer and prostate cancer. In all diseases, PRSs improved reclassification over clinical thresholds, with the largest net reclassification improvements for early-onset coronary heart disease, atrial fibrillation and prostate cancer. This study provides evidence for the additional value of PRSs in clinical disease prediction. The practical applications of polygenic risk information for stratified screening or for guiding lifestyle and medical interventions in the clinical setting remain to be defined in further studies.

2. This paper is a well-powered study of genetic influence on alcohol use and effects on mortality.

Genomic prediction of alcohol-related morbidity and mortality 
Nature Translational Psychiatry volume 10, Article number: 23 (2020) 
While polygenic risk scores (PRS) have been shown to predict many diseases and risk factors, the potential of genomic prediction in harm caused by alcohol use has not yet been extensively studied. Here, we built a novel polygenic risk score of 1.1 million variants for alcohol consumption and studied its predictive capacity in 96,499 participants from the FinnGen study and 39,695 participants from prospective cohorts with detailed baseline data and up to 25 years of follow-up time. A 1 SD increase in the PRS was associated with 11.2 g (=0.93 drinks) higher weekly alcohol consumption (CI = 9.85–12.58 g, p = 2.3 × 10–58). The PRS was associated with alcohol-related morbidity (4785 incident events) and the risk estimate between the highest and lowest quintiles of the PRS was 1.83 (95% CI = 1.66–2.01, p = 1.6 × 10–36). When adjusted for self-reported alcohol consumption, education, marital status, and gamma-glutamyl transferase blood levels in 28,639 participants with comprehensive baseline data from prospective cohorts, the risk estimate between the highest and lowest quintiles of the PRS was 1.58 (CI = 1.26–1.99, p = 8.2 × 10–5). The PRS was also associated with all-cause mortality with a risk estimate of 1.33 between the highest and lowest quintiles (CI = 1.20–1.47, p = 4.5 × 10–8) in the adjusted model. In conclusion, the PRS for alcohol consumption independently associates for both alcohol-related morbidity and all-cause mortality. Together, these findings underline the importance of heritable factors in alcohol-related health burden while highlighting how measured genetic risk for an important behavioral risk factor can be used to predict related health outcomes.

3. This paper examines rare CNVs (Copy Number Variants) and PRS (Polygenic Risk Score) prediction using a combined Finnish sample of ~30k for whom education, income, and health outcomes are known. The study finds that low polygenic scores for Educational Attainment (EA) and intelligence predict worse outcomes in education, income, and health.
Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants 

Abstract Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population. This presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk score (PRS) in two Finnish cohorts (FINRISK, n = 23,053 and NFBC1966, n = 4895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66–0.89]) and lower household income (OR = 0.77 [0.66–0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38–0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32–0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26–0.37]), lower-income (OR = 0.66 [0.57–0.77]), lower subjective health (OR = 0.72 [0.61–0.83]), and increased mortality (Cox’s HR = 1.55 [1.21–1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that the majority of working-age individuals carrying high-risk CNVs without SNPD diagnosis have a modest impact on morbidity and mortality, as well as the limited impact on income and educational attainment, compared to individuals at the extreme end of common genetic variation. 
Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context, rather than evaluated in isolation. 
From the paper:
 ... we compared the impact of CNVs to the impact of the PRSs for educational attainment [24], schizophrenia [25], and general intelligence [26] on general health, morbidity, mortality, and socioeconomic burden. We analyzed these effects in two cohorts: one sampled at random from the Finnish working-age population (FINRISK), the other a Finnish birth cohort (Northern Finland Birth Cohort 1966; NFBC1966). Both cohorts link to national health records, enabling analysis of longitudinal health data and socioeconomic status data over several decades. 
... we observed a clear polygenic effect on socioeconomic outcome with educational attainment and IQ PRS scores. Belonging to the matched lowest PRS extremes (lowest 2.66%) of educational attainment or IQ had an overall stronger impact on the socioeconomic outcome than belonging to most high-risk CNV groups, and a generally stronger impact on health and survival, with the exception of household income-associated CNVs. 
... odds for subsequent level of education were even lower at the matched lowest extreme of PRSEA (OR = 0.31 [0.26–0.37]) and PRSIQ (OR = 0.51 [0.44–0.60]).
... Rare deleterious variants, including CNVs, can have a major impact on health outcomes for an individual and are thus under strong negative selection. However, such variants might not always have a strong phenotypic impact (incomplete penetrance), and as observed here, can have a very modest—if any—effect on well-being. The reason for this wide spectrum of outcomes remains speculative. From a genetic perspective, one hypothesis is that additional variants, both rare and common, modify the phenotypic outcome of a CNV carrier (Supplementary Figs. 11 and 12). This type of effect is observable in analyzes of hereditary breast and ovarian cancer in the UK Biobank [40] and in FinnGen [41], where strong-impacting variants’ penetrance is modified by compensatory polygenic effects. 
... As stated above, the observed effect of polygenic scores was broader than that of structural variants. We observed strong effects in PRSs for intelligence and educational attainment on education, income and socioeconomic status. 

Wednesday, February 03, 2021

Gerald Feinberg and The Prometheus Project

Gerald Feinberg (1933-1992) was a theoretical physicist at Columbia, perhaps best known for positing the tachyon -- a particle that travels faster than light. He also predicted the existence of the mu neutrino. 

Feinberg attended Bronx Science with Glashow and Weinberg. Interesting stories abound concerning how the three young theorists were regarded by their seniors at the start of their careers. 

I became aware of Feinberg when Pierre Sikivie and I worked out the long-range force resulting from two-neutrino exchange. Although we came to the idea independently and derived, for the first time, the correct result, we learned later that it had been studied before by Feinberg and Sucher. Sadly, Feinberg died of cancer shortly before Pierre and I wrote our paper. 

Recently I came across Feinberg's 1969 book The Prometheus Project, which is one of the first serious examinations (outside of science fiction) of world-changing technologies such as genetic engineering and AI. See reviews in Science, Physics Today, and H+ Magazine. A scanned copy of the book can be found at Libgen.

Feinberg had the courage to engage with ideas that were much more speculative in the late 60s than they are today. He foresaw correctly, I believe, that technologies like AI and genetic engineering will alter not just human society but the nature of the human species itself. In the final chapter, he outlines a proposal for the eponymous Prometheus Project -- a global democratic process by which the human species can set long term goals in order to guide our approach to what today would be called the Singularity.


Tuesday, February 02, 2021

All Men Are Brothers -- 3 AM Edition

Afu Thomas is a German internet personality living in Shanghai, known for his fluent Chinese. His videos are extremely popular in China and people often recognize him in public. 

This is a series of street interviews shot at 3 AM, in which he elicits sometimes moving and philosophical responses from ordinary people about hopes, dreams, family, money, and happiness. The interviewees, ranging from teenage boys and girls to middle-aged men, answer the questions simply but with insight and sincerity.  

The subtitled translations are very good. 
