Thursday, March 28, 2013

Machine translation

In a 2010 post, I wrote:
Machines and bilingualism: I had a terrifying thought the other day. I would guess that at a 90 percent confidence level machine translation and voice recognition will be good enough in 20 years that people will be able to communicate pretty well across most language barriers using cheap and unobtrusive devices. If so, is it worth all this effort to make sure my kids are bilingual?

I say it's terrifying because of the significant effort we're expending on our Bilingual Kids Project -- including relocating to Taiwan for this sabbatical. Another point of clarification: I'm not saying in 20 years we'll have AI (far from it). But something that translates basic phrases and simple content (surely we'll have that: Moore's law, massive corpora of translated text, statistical machine learning, yada yada) would significantly reduce the value of all but the most sophisticated language skills.
This video shows the current state of the art from Microsoft Research. See the real-time speech-to-speech (English to Chinese) demo starting just before 6 minutes in. The demo could have been faked a bit -- Rashid might be sticking to a prepared script -- but let's hope it was legitimate.


11 comments:

  1. Machine translation better than a child could produce with a dictionary will always be ten years away.

  2. Malvenuto 1:04 PM

    Being bilingual has other benefits: http://www.nytimes.com/2012/03/18/opinion/sunday/the-benefits-of-bilingualism.html?_r=0

  3. Steve, there are at least two ways to approach your question. One is strictly utilitarian - what is the empirical "gain" to being able to perform a task that a machine can do nearly as well (think of a washing machine which replaced your hands and a washboard); the other more aesthetic - language and its comprehension contain cultural cues that a machine cannot replace.

    On the former, the answer, I suspect, is "no." It's not worth spending time learning a task that a machine can easily and cheaply replicate.

    On the latter, the answer, I also suspect, is "yes." Knowing a language provides insights into a culture that are lost when a machine does the translating.

    On a bit of a tangent, think about the current situation in Japan, where a not insignificant number of young Japanese are beginning to show weakness in kanji recognition, because they use mobile telephones and other devices that let one input text without really knowing how to read or write the characters properly. I suspect a similar thing is happening in China as well; indeed, many younger Chinese cannot write the full form characters very well, owing to the introduction of simplifications a half century ago.

    Something is gained, but something of value is lost.

  4. Kudzu_Bob 10:40 PM

    I've read that the advent of computer technology has led many public schools to cease teaching cursive handwriting, which leads me to suspect that before too long the ability to sign one's name in anything but block letters will become a marker of one's social class. Perhaps the ability to speak or read a foreign language without machine assistance will become another such marker.

  5. Anonymous_IV 11:49 PM

    I've been told that on the other hand some kanji are getting *more* exposure now, because computers don't know that the characters are supposed to be obscure and thus show them instead of spelling the word syllable by syllable. If the word appears often enough where it can be guessed from context, then this may result in improved recognition of kanji that would otherwise have become obsolete.

  6. RagnarDanneskjold 3:04 AM

    This is good enough for tourists and emergency medical use. I suspect it will lead to more explosive cultural encounters if used more widely because the cultural context will still be foreign. Most of my mistakes in Chinese come when I think in English and then translate it over into Chinese. The fewest mistakes come when I'm thinking in Chinese. Maybe one day the machine will think in Chinese for you, but probably not for decades. Thus in order to use it to be truly fluent, you would need to speak "Chinese-English" (not Chinglish) to get the machine to spit out the right words. I already do this with Google Translate when I know it is giving me the wrong results. But to do that, you'd already need to be fluent...

  7. Richard Seiter 7:01 PM

    Those interested in deep learning might want to check out deeplearning.net
    I had not seen the Google acquisition news until now--thanks.

  8. Anonymous_IV 9:02 PM

    That's much like the old joke about a chess computer that could beat the world champion. But this did finally happen, and now the computers are so much stronger that a match is pointless without giving the human some kind of odds.

  9. Anonymous_IV 9:05 PM

    [Sorry if this reply appears twice]


    This used to be the conventional wisdom about when a computer chess program would beat the world champion. But that did finally happen, and by now the computers are so much stronger that a contest is pointless without giving the human some kind of odds.

  10. You must know that there's an inherent difference between categories of problem, some amenable to brute force, others not. Don't think chess, think go, only much more so.

  11. Richard Seiter 10:47 AM

    I wonder how often a similar argument was advanced to explain why computers could never win at chess? I agree (good) machine translation is a qualitatively different problem than chess, but at what point does a large enough quantitative difference in computer power/algorithms translate into a qualitative difference in results? I would also venture that machine translation is past the "child with a dictionary" stage despite its obvious imperfections. There is a great deal of territory between "child with a dictionary" and "native speaker" (and then how much more between that and "articulate and knowledgeable native speaker"?). As difficult as some problems are, I find betting against consistent exponential growth (combined with human algorithmic ingenuity) hard to justify.

    Though I understand this is still far from world champion, there is progress on go: http://en.wikipedia.org/wiki/Computer_Go#Recent_results
