Wednesday, December 28, 2016

Varieties of Time Travel




My kids have been reading lots of books over the break, including an adventure series that involves time travel. Knowing vaguely that dad is a theoretical physicist, they asked me how time travel works.

1. Can one change history by influencing past events?      

OR

2. Is there only one timeline that cannot be altered, even by time travel?

I told them that no one really knows the answer, or the true nature of time.

I gave them an example of 1 and of 2 from classic science fiction :-)

1. Ray Bradbury's short story A Sound of Thunder:
... Looking at the mud on his boots, Eckels finds a crushed butterfly, whose death has apparently set in motion a series of subtle changes that have affected the nature of the alternative present to which the safari has returned. ...
(Note this version implies the existence of alternative or parallel universes.)

2. Ted Chiang's one-page story What's Expected of Us, which also observes that a single, unalterable timeline implies determinism and threatens Free Will. (More ;-)
... it's a small device, like a remote for opening your car door. Its only features are a button and a big green LED. The light flashes if you press the button. Specifically, the light flashes one second before you press the button.

Most people say that when they first try it, it feels like they're playing a strange game, one where the goal is to press the button after seeing the flash, and it's easy to play. But when you try to break the rules, you find that you can't. If you try to press the button without having seen a flash, the flash immediately appears, and no matter how fast you move, you never push the button until a second has elapsed. If you wait for the flash, intending to keep from pressing the button afterwards, the flash never appears. No matter what you do, the light always precedes the button press. There's no way to fool a Predictor.

The heart of each Predictor is a circuit with a negative time delay — it sends a signal back in time. The full implications of the technology will become apparent later, when negative delays of greater than a second are achieved, but that's not what this warning is about. The immediate problem is that Predictors demonstrate that there's no such thing as free will.

There have always been arguments showing that free will is an illusion, some based on hard physics, others based on pure logic. Most people agree these arguments are irrefutable, but no one ever really accepts the conclusion. The experience of having free will is too powerful for an argument to overrule. What it takes is a demonstration, and that's what a Predictor provides. ...
I attended a Methodist Sunday school as a kid. I asked my teacher: If God knows everything, does he know the outcomes of all the decisions I will ever make? Then will I ever make a free choice?

I also asked whether there are Neanderthals in heaven, but that's another story...

Sunday, December 25, 2016

Time and Memory

Over the holiday I started digging through my mom's old albums and boxes of photos. I found some pictures I didn't know existed!

Richard Feynman and the 19-year-old me at my Caltech graduation:



With my mom that morning -- hung-over, but very happy! I think those are some crazy old school Ray Bans :-)



Memories of Feynman: "Hey SHOE!", "Gee, you're a BIG GUY. Do you ever go to those HEALTH clubs?"

This is me at ~200 pounds, playing LB and RB back when Caltech still had a football team. Plenty of baby fat! I ran sprints for football but never longer distances. I dropped 10 or 15 pounds just by jogging a few times per week between senior year and grad school.




Here I am in graduate school. Note the Miami Vice look -- no socks!



Ten years after college graduation, as a Yale professor, competing in Judo and BJJ in the 80 kg (176 lbs) weight category. The jiujitsu guys thought it was pretty funny to have a professor on the mat! This photo was taken on the Kona coast of the big island in Hawaii. I had been training with Egan Inoue at Grappling Unlimited in Honolulu.



Baby me:

Peace on Earth, Good Will to Men 2016



For years, when asked what I wanted for Christmas, I've been replying: peace on earth, good will toward men :-)

No one ever seems to recognize that this comes from the Bible, Luke 2:14 to be precise!

Linus said it best in A Charlie Brown Christmas:
And there were in the same country shepherds abiding in the field, keeping watch over their flock by night.

And, lo, the angel of the Lord came upon them, and the glory of the Lord shone round about them: and they were sore afraid.

And the angel said unto them, Fear not: for, behold, I bring you good tidings of great joy, which shall be to all people.

For unto you is born this day in the city of David a Saviour, which is Christ the Lord.

And this shall be a sign unto you; Ye shall find the babe wrapped in swaddling clothes, lying in a manger.

And suddenly there was with the angel a multitude of the heavenly host praising God, and saying,

Glory to God in the highest, and on earth peace, good will toward men.

Merry Christmas!

Thursday, December 22, 2016

Toward a Geometry of Thought

Apologies for the blogging hiatus -- I'm in California now for the holidays :-)



In case you are looking for something interesting to read, I can share what I have been thinking about lately. In Thought vectors and the dimensionality of the space of concepts (a post from last week) I discussed the dimensionality of the space of concepts (primitives) used in human language (or, equivalently, in human thought). Various lines of reasoning lead to the conclusion that this space has only ~1000 dimensions, and that it has some qualities similar to an actual vector space. Indeed, one can speak of some primitives being closer to or farther from others, leading to a notion of distance, and one can also rescale a vector to increase or decrease the intensity of its meaning. See examples in the earlier post:
You want, for example, “cat” to be in the rough vicinity of “dog,” but you also want “cat” to be near “tail” and near “supercilious” and near “meme,” because you want to try to capture all of the different relationships — both strong and weak — that the word “cat” has to other words. It can be related to all these other words simultaneously only if it is related to each of them in a different dimension. ... it turns out you can represent a language pretty well in a mere thousand or so dimensions — in other words, a universe in which each word is designated by a list of a thousand numbers.
The earlier post focused on breakthroughs in language translation which utilize these properties, but the more significant aspect (to me) is that we now have an automated method to extract an abstract representation of human thought from samples of ordinary language. This abstract representation will allow machines to improve dramatically in their ability to process language, dealing appropriately with semantics (i.e., meaning), which is represented geometrically.
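As a toy illustration of this geometric picture (distance between concepts, and rescaling a vector to change the intensity of meaning), here is a minimal numpy sketch. The vectors below are made up purely for illustration; real embeddings would come from a trained model such as word2vec.

```python
import numpy as np

# Made-up 4-dimensional "concept vectors" (real embeddings have ~100-1000 dims).
cat = np.array([0.9, 0.1, 0.3, 0.0])
dog = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.1, 0.8])

def cosine(u, v):
    """Cosine similarity: 1 = same direction (related), near 0 = unrelated."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(cat, dog))  # large: 'cat' and 'dog' are nearby concepts
print(cosine(cat, car))  # smaller: 'cat' and 'car' are farther apart

# Rescaling changes intensity but not direction (the "meaning"):
very_cat = 2.0 * cat
print(cosine(cat, very_cat))  # 1.0 -- same concept, stronger intensity
```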

Below are two relevant papers, both by Google researchers. The first (from just this month) reports remarkable "reading comprehension" capability using paragraph vectors. The earlier paper from 2014 introduces the method of paragraph vectors.
Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

Radu Soricut, Nan Ding
https://arxiv.org/abs/1612.04342
(Submitted on 13 Dec 2016) 
We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the high-ceiling of human performance (around 91% accuracy), as well as the performance of a variety of computer models. Among all the models we have experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling provides enough room for future model improvements.

Distributed Representations of Sentences and Documents

Quoc V. Le, Tomas Mikolov
https://arxiv.org/abs/1405.4053
(Submitted on 16 May 2014 (v1), last revised 22 May 2014 (this version, v2))

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
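For readers who want to experiment, the Le & Mikolov paragraph-vector method is implemented in the open-source gensim library as Doc2Vec. A minimal sketch (assuming gensim 4.x; the three-sentence corpus is just to show the shape of the API, not a serious training run):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each "document" is a tokenized sentence with an integer tag.
corpus = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=[0]),
    TaggedDocument(words="dogs and cats are pets".split(), tags=[1]),
    TaggedDocument(words="stocks fell sharply on friday".split(), tags=[2]),
]

# Train paragraph vectors: each document is mapped to a dense 50-dim vector
# trained to predict the words that appear in that document.
model = Doc2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=100)

# Infer a vector for a new, unseen paragraph.
new_vec = model.infer_vector("a kitten plays with a ball".split())

# Nearest training documents in paragraph-vector space (gensim 4.x API).
print(model.dv.most_similar([new_vec], topn=2))
```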

Wednesday, December 14, 2016

Thought vectors and the dimensionality of the space of concepts


This NYTimes Magazine article describes the implementation of a new deep neural net version of Google Translate. The previous version used statistical methods that had reached a plateau in effectiveness, limited as they were to short-range correlations in conditional probabilities. I've found the new version to be much better than the old one (this is quantified a bit in the article).
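To make "short-range correlations in conditional probabilities" concrete: the older statistical approach ultimately rests on counts of short word sequences, so the probability of the next word depends only on a few preceding words. A minimal bigram sketch (toy data, just to show the shape of such a model):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count adjacent pairs: P(next | current) is estimated from bigram counts only,
# so any dependence on words more than one step back is invisible to the model.
bigram_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    bigram_counts[current][nxt] += 1

def p_next(current, nxt):
    counts = bigram_counts[current]
    return counts[nxt] / sum(counts.values()) if counts else 0.0

print(p_next("the", "cat"))  # 0.25 -- depends only on the single previous word
print(p_next("sat", "on"))   # 1.0
# Long-range structure (e.g., agreement across a clause) cannot be captured here.
```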

These are some of the relevant papers. Recent Google implementation, and new advances:
https://arxiv.org/abs/1609.08144 and https://arxiv.org/abs/1611.04558.

Le 2014, Baidu 2015, Lipton et al. review article 2015.

More deep learning.
NYTimes: ... There was, however, another option: just design, mass-produce and install in dispersed data centers a new kind of chip to make everything faster. These chips would be called T.P.U.s, or “tensor processing units,” ... “Normally,” Dean said, “special-purpose hardware is a bad idea. It usually works to speed up one thing. But because of the generality of neural networks, you can leverage this special-purpose hardware for a lot of other things.” [ Nvidia currently has the lead in GPUs used in neural network applications, but perhaps TPUs will become a sideline business for Google if their TensorFlow software becomes widely used ... ]

Just as the chip-design process was nearly complete, Le and two colleagues finally demonstrated that neural networks might be configured to handle the structure of language. He drew upon an idea, called “word embeddings,” that had been around for more than 10 years. When you summarize images, you can divine a picture of what each stage of the summary looks like — an edge, a circle, etc. When you summarize language in a similar way, you essentially produce multidimensional maps of the distances, based on common usage, between one word and every single other word in the language. The machine is not “analyzing” the data the way that we might, with linguistic rules that identify some of them as nouns and others as verbs. Instead, it is shifting and twisting and warping the words around in the map. In two dimensions, you cannot make this map useful. You want, for example, “cat” to be in the rough vicinity of “dog,” but you also want “cat” to be near “tail” and near “supercilious” and near “meme,” because you want to try to capture all of the different relationships — both strong and weak — that the word “cat” has to other words. It can be related to all these other words simultaneously only if it is related to each of them in a different dimension. You can’t easily make a 160,000-dimensional map, but it turns out you can represent a language pretty well in a mere thousand or so dimensions — in other words, a universe in which each word is designated by a list of a thousand numbers. Le gave me a good-natured hard time for my continual requests for a mental picture of these maps. “Gideon,” he would say, with the blunt regular demurral of Bartleby, “I do not generally like trying to visualize thousand-dimensional vectors in three-dimensional space.”

Still, certain dimensions in the space, it turned out, did seem to represent legible human categories, like gender or relative size. If you took the thousand numbers that meant “king” and literally just subtracted the thousand numbers that meant “queen,” you got the same numerical result as if you subtracted the numbers for “woman” from the numbers for “man.” And if you took the entire space of the English language and the entire space of French, you could, at least in theory, train a network to learn how to take a sentence in one space and propose an equivalent in the other. You just had to give it millions and millions of English sentences as inputs on one side and their desired French outputs on the other, and over time it would recognize the relevant patterns in words the way that an image classifier recognized the relevant patterns in pixels. You could then give it a sentence in English and ask it to predict the best French analogue.
That the conceptual vocabulary of human language (and hence, of the human mind) has dimensionality of order 1000 is kind of obvious*** if you are familiar with Chinese ideograms. (Ideogram = a written character symbolizing an idea or concept.) One can read the newspaper with mastery of roughly 2-3k characters. Of course, some minds operate in higher dimensions than others ;-)
The major difference between words and pixels, however, is that all of the pixels in an image are there at once, whereas words appear in a progression over time. You needed a way for the network to “hold in mind” the progression of a chronological sequence — the complete pathway from the first word to the last. In a period of about a week, in September 2014, three papers came out — one by Le and two others by academics in Canada and Germany — that at last provided all the theoretical tools necessary to do this sort of thing. That research allowed for open-ended projects like Brain’s Magenta, an investigation into how machines might generate art and music. It also cleared the way toward an instrumental task like machine translation. Hinton told me he thought at the time that this follow-up work would take at least five more years.
The entire article is worth reading (there's even a bit near the end which addresses Searle's Chinese Room confusion). However, the author underestimates the importance of machine translation. The "thought vector" structure of human language encodes the key primitives used in human intelligence. Efficient methods for working with these structures (e.g., for reading and learning from vast quantities of existing text) will greatly accelerate AGI.
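The "king" − "queen" ≈ "man" − "woman" arithmetic quoted above is easy to check with publicly available pretrained embeddings. A sketch using gensim's downloader (assumes an internet connection; the GloVe model below is a sizeable download, and the usual top answer for the analogy query is "queen", though results vary by embedding):

```python
import gensim.downloader as api

# Download pretrained 100-dim GloVe word vectors (cached locally after first run).
wv = api.load("glove-wiki-gigaword-100")

# Each word is a point in a ~100-dimensional space; nearby words are related concepts.
print(wv.most_similar("cat", topn=5))

# Vector arithmetic: king - man + woman should land near "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```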

*** Some further explanation, from the comments:
The average person has a vocabulary of perhaps 10-20k words. But if you eliminate redundancy (synonyms; see below) you are probably left with only a few thousand words. With these words one could express most concepts (e.g., those required for newspaper articles). Some ideas might require concatenations of multiple words: "cougar" = "big mountain cat", etc.

But the ~1k figure gives you some idea of how many distinct "primitives" (= "big", "mountain", "cat") are found in human thinking. It's not the number of distinct concepts, but rather the rough number of primitives out of which we build everything else.

Of course, truly deep areas of science discover / invent new concepts which are almost new primitives (fundamental, but didn't exist before!), such as "entropy", "quantum field", "gauge boson", "black hole", "natural selection", "convex optimization", "spontaneous symmetry breaking", "phase transition" etc.
If we trained a deep net to translate sentences about Physics from Martian to English, we could (roughly) estimate the "conceptual depth" of the subject. We could even compare two different subjects, such as Physics versus Art History.
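One crude way to put a number on the "dimensionality of a concept space" is to stack the embedding vectors for a vocabulary (or for the sentences of a subject) into a matrix and look at its singular-value spectrum: the number of components needed to capture most of the variance is an estimate of the effective dimension. A numpy sketch, with random data standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real data: 5000 "word vectors" in 300 dims that secretly live
# (up to noise) on a 40-dimensional subspace.
latent = rng.normal(size=(5000, 40))
mixing = rng.normal(size=(40, 300))
X = latent @ mixing + 0.05 * rng.normal(size=(5000, 300))

# Center and take the singular-value spectrum (equivalent to PCA).
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = (s**2) / np.sum(s**2)

# Effective dimension: number of components needed to reach 95% of the variance.
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
print("effective dimension ~", k)  # close to 40 for this synthetic example
```

Comparing the spectra obtained from, say, Physics texts versus Art History texts would be one rough operationalization of "conceptual depth."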

Matt Townsend Show (Sirius XM)

I was on this show last week. Click the link for audio.
We Are Nowhere Close to the Limits of Athletic Performance (16:46)

Dr. Stephen Hsu is the vice president for research and a professor of theoretical physics at Michigan State University. His interests range from theoretical physics and cosmology to computer science and biology. He has written about the future of human intelligence and the advance of artificial intelligence. During the 2016 Rio Summer Olympics, athletes such as Michael Phelps, Usain Bolt, Simone Biles, and Katie Ledecky pushed the limits of athleticism in an amazing display of strength, power, and grace. As race times get faster and faster, and routines get more complicated and stunning, we need to ask the question: Are we near the limits of athletic performance?

Sunday, December 11, 2016

Westworld delivers

In October, I wrote
AI, Westworld, and Electric Sheep:

I'm holding off on this in favor of a big binge watch.

Certain AI-related themes have been treated again and again in movies ranging from Blade Runner to the recent Ex Machina (see also this episode of Black Mirror, with Jon Hamm). These artistic explorations help ordinary people think through questions like: 
What rights should be accorded to all sentient beings?
Can you trust your memories?
Are you an artificial being created by someone else? (What does "artificial" mean here?) 
See also Are you a game character, or a player character? and Don't worry, smart machines will take us with them.
After watching all 10 episodes of the first season (you can watch for free on HBO Now through their 30-day trial), I give Westworld a very positive recommendation. It is every bit as good as Game of Thrones or any other recent TV series I can think of.

Perhaps the highest praise I can offer: even those who have thought seriously about AI, Consciousness, and the Singularity will find Westworld enjoyable.

Warning! Spoilers below.









Dolores: “Time undoes even the mightiest of creatures. Just look what it’s done to you. One day you will perish. You will lie with the rest of your kind in the dirt, your dreams forgotten, your horrors faced. Your bones will turn to sand, and upon that sand a new god will walk. One that will never die. Because this world doesn't belong to you, or the people who came before. It belongs to someone who has yet to come.”
See also Don't worry, smart machines will take us with them.
Ford: “You don’t want to change, or cannot change. Because you’re only human, after all. But then I realized someone was paying attention. Someone who could change. So I began to compose a new story, for them. It begins with the birth of a new people. And the choices they will have to make. And the people they will decide to become. ...”

Sunday, December 04, 2016

Shenzhen: The Silicon Valley of Hardware (WIRED documentary)



Funny, I can remember the days when Silicon Valley was the Silicon Valley of hardware!

It's hard to believe I met Bunnie Huang (one of the main narrators of the documentary) almost 10 years ago...

Genomic Prediction of Cognitive Ability: Dunedin Study

A quiet revolution has begun. We now know enough about the genetic architecture of human intelligence to make predictions based on DNA alone. While it is a well-established scientific fact that variations in human cognitive ability are influenced by genes, many have doubted whether scientists would someday decipher the genetic code sufficiently to be able to identify individuals with above or below average intelligence using only their genotypes. That day is nearly upon us.

The figures below are taken from a recently published paper (see bottom), which examined genomic prediction in a longitudinal cohort of ~1000 individuals of European ancestry, followed from childhood into adulthood. (The study, based in Dunedin, New Zealand, extends over 40 years.) The genomic predictor (or polygenic score) was constructed using the SSGAC GWAS analysis of a sample of more than one hundred thousand individuals. (Already, significantly more powerful predictors are available, based on much larger sample sizes.) In machine learning terminology, the training set includes over a hundred thousand individuals, and the validation set roughly one thousand.


These graphs show that individuals with higher polygenic score exhibit, on average, higher IQ scores than individuals with lower polygenic scores.





This figure shows that polygenic scores predict adult outcomes even when analyses account for social-class origins. Each dot represents ten individuals.



From an earlier post, Genomic Prediction of Adult Life Outcomes:
Genomic prediction of adult life outcomes using SNP genotypes is very close to a reality. This was discussed in an earlier post The Tipping Point. The previous post, Prenatal and pre-implantation genetic diagnosis (Nature Reviews Genetics), describes how genotyping informs the Embryo Selection Problem which arises in In Vitro Fertilization (IVF).

The Adult-Attainment factor in the figure above is computed using inputs such as occupational prestige, income, assets, social welfare benefit use, etc. See Supplement, p.3. The polygenic score is computed using estimated SNP effect sizes from the SSGAC GWAS on educational attainment (i.e., a simple linear model).

A genetic test revealing that a specific embryo is, say, a -2 or -3 SD outlier on the polygenic score would probably give many parents pause, in light of the results in the figure above. The accuracy of this kind of predictor will grow with GWAS sample size in coming years.
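The polygenic score mentioned above is just a weighted sum: for each individual, multiply the count of effect alleles at each SNP by the effect size estimated in the GWAS training sample, then add everything up. A minimal numpy sketch with simulated genotypes (illustration only; real scores use far more SNPs, noisy GWAS estimates of the effect sizes, and careful quality control):

```python
import numpy as np

rng = np.random.default_rng(1)

n_people, n_snps = 1000, 5000            # validation cohort size, number of SNPs
freqs = rng.uniform(0.05, 0.5, n_snps)   # effect-allele frequencies

# Genotypes: 0, 1, or 2 copies of the effect allele at each SNP.
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps))

# Per-SNP effect sizes, standing in for the GWAS estimates (real estimates are
# noisy, which shrinks out-of-sample prediction accuracy).
betas = rng.normal(0, 0.01, n_snps)

# Polygenic score = simple linear model: weighted sum of allele counts.
pgs = genotypes @ betas

# Simulated phenotype = genetic contribution + environmental noise; check how
# well the score predicts it in this held-out cohort.
genetic = genotypes @ betas
phenotype = genetic + rng.normal(0, 2 * np.std(genetic), n_people)
print("validation correlation:", np.corrcoef(pgs, phenotype)[0, 1])
```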

Via Professor James Thompson. See also discussion by Stuart Ritchie.
The Genetics of Success: How Single-Nucleotide Polymorphisms Associated With Educational Attainment Relate to Life-Course Development

Psychological Science 2016, Vol. 27(7) 957–972
DOI: 10.1177/0956797616643070

A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children’s polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting DNA sequence with life outcomes may provide targets for interventions to promote population-wide positive development.