Machines and bilingualism: I had a terrifying thought the other day. I would guess, at a 90 percent confidence level, that machine translation and voice recognition will be good enough in 20 years that people will be able to communicate pretty well across most language barriers using cheap and unobtrusive devices. If so, is it worth all this effort to make sure my kids are bilingual?
This video shows the current state of the art from Microsoft Research. See the real-time speech-to-speech (English to Chinese) demo starting just before 6 minutes in. The demo could have been faked a bit -- Rashid might be sticking to a prepared script -- but let's hope it was legitimate.
I say it's terrifying because of the significant effort we're expending on our Bilingual Kids Project -- including relocating to Taiwan for this sabbatical. Another point of clarification: I'm not saying in 20 years we'll have AI (far from it). But something that translates basic phrases and simple content (surely we'll have that: Moore's law, massive corpora of translated text, statistical machine learning, yada yada) would reduce significantly the value of all but the most sophisticated language skills.
Pessimism of the Intellect, Optimism of the Will Favorite posts | Manifold podcast | Twitter: @hsu_steve
Thursday, March 28, 2013
Machine translation
In a 2010 post, I wrote:
11 comments:
Machine translation better than a child could produce with a dictionary will always be ten years away.
Being bilingual has other benefits: http://www.nytimes.com/2012/03/18/opinion/sunday/the-benefits-of-bilingualism.html?_r=0
Steve, there are at least two ways to approach your question. One is strictly utilitarian - what is the empirical "gain" to being able to perform a task that a machine can do nearly as well (think of a washing machine which replaced your hands and a washboard); the other more aesthetic - language and its comprehension contain cultural cues that a machine cannot replace.
On the former, the answer, I suspect, is "no." It's not worth spending time learning a task that a machine can easily and cheaply replicate.
On the latter, the answer, I also suspect, is "yes." Knowing a language provides insights into, and an understanding of, a culture that are lost when a machine does the translating.
On a bit of a tangent, think about the current situation in Japan, where a not insignificant number of young Japanese are beginning to show weakness in kanji recognition, because they use mobile telephones and other devices that one can input without really knowing how to read or write the characters properly. I suspect a similar thing is happening in China as well; indeed, many younger Chinese cannot write the full form characters very well, owing to the introduction of simplifications a half century ago.
Something is gained, but something of value is lost.
I've read that the advent of computer technology has led many public schools to cease teaching cursive handwriting, which leads me to suspect that before too long the ability to sign one's name in anything but block letters will become a marker of one's social class. Perhaps the ability to speak or read a foreign language without machine assistance will become another such marker.
I've been told that on the other hand some kanji are getting *more* exposure now, because computers don't know that the characters are supposed to be obscure and thus show them instead of spelling the word syllable by syllable. If the word appears often enough where it can be guessed from context, then this may result in improved recognition of kanji that would otherwise have become obsolete.
This is good enough for tourists and emergency medical situations. I suspect it will lead to more explosive cultural encounters if used more widely because the cultural context will still be foreign. Most of my mistakes in Chinese come when I think in English and then translate it over into Chinese. The fewest mistakes come when I'm thinking in Chinese. Maybe one day the machine will think in Chinese for you, but probably not for decades. Thus in order to use it to be truly fluent, you would need to speak "Chinese-English" (not Chinglish) to get the machine to spit out the right words. I already do this with Google Translate when I know it is giving me the wrong results. But to do that, you'd already need to be fluent...
Those interested in deep learning might want to check out deeplearning.net
I had not seen the Google acquisition news until now--thanks.
That's much like the old joke about a chess computer that could beat the world champion. But this did finally happen, and now the computers are so much stronger that a match is pointless without giving the human some kind of odds.
You must know that there's an inherent difference between categories of problem, some amenable to brute force, others not. Don't think chess, think go, only much more so.
I wonder how often a similar argument was advanced to explain why computers could never win at chess? I agree (good) machine translation is a qualitatively different problem than chess, but at what point does a large enough quantitative difference in computer power/algorithms translate into a qualitative difference in results? I would also venture that machine translation is past the "child with a dictionary" stage despite its obvious imperfections. There is a great deal of territory between "child with a dictionary" and "native speaker" (and then how much more between that and "articulate and knowledgeable native speaker"?). As difficult as some problems are, I find betting against consistent exponential growth (combined with human algorithmic ingenuity) hard to justify.
Though I understand this is still far from world champion, there is progress on go: http://en.wikipedia.org/wiki/Computer_Go#Recent_results