Friday, December 08, 2017

Recursive Cortical Networks: data-efficient computer vision



Will knowledge from neuroscience inform the design of better AIs (neural networks)? These results from the startup Vicarious AI suggest that the answer is yes! (See also the company's blog post describing the research.)

It has often been remarked that evolved biological systems (e.g., a baby) can learn much faster, and from far less data, than existing artificial neural networks. Significant improvements in AI are almost certainly within reach...

Thanks to reader and former UO Physics colleague Raghuveer Parthasarathy for a pointer to this paper!
A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Science 08 Dec 2017: Vol. 358, Issue 6368, eaag2612
DOI: 10.1126/science.aag2612

INTRODUCTION
Compositionality, generalization, and learning from a few examples are among the hallmarks of human intelligence. CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), images used by websites to block automated interactions, are examples of problems that are easy for people but difficult for computers. CAPTCHAs add clutter and crowd letters together to create a chicken-and-egg problem for algorithmic classifiers—the classifiers work well for characters that have been segmented out, but segmenting requires an understanding of the characters, which may be rendered in a combinatorial number of ways. CAPTCHAs also demonstrate human data efficiency: A recent deep-learning approach for parsing one specific CAPTCHA style required millions of labeled examples, whereas humans solve new styles without explicit training.

By drawing inspiration from systems neuroscience, we introduce recursive cortical network (RCN), a probabilistic generative model for vision in which message-passing–based inference handles recognition, segmentation, and reasoning in a unified manner. RCN learns with very little training data and fundamentally breaks the defense of modern text-based CAPTCHAs by generatively segmenting characters. In addition, RCN outperforms deep neural networks on a variety of benchmarks while being orders of magnitude more data-efficient.
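
To make the "generatively segmenting" idea concrete, here is a minimal analysis-by-synthesis sketch in Python: render each character hypothesis from a stored template, score it against the observed image under a simple pixel-flip noise model, and keep the best-scoring explanation. The 3x3 glyphs, the flip-probability likelihood, and every name below are invented for this sketch; RCN's actual contour and surface templates and its factor-graph inference are far richer.

    # Toy "analysis by synthesis" in the spirit of a generative vision model.
    # Templates, names, and the noise model are invented for this sketch;
    # they are not RCN's actual representation.
    import numpy as np

    GLYPHS = {  # hypothetical 3x3 binary character templates
        "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
        "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
        "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
    }

    def log_likelihood(observed, template, flip_prob=0.1):
        """Log P(observed | template) assuming independent pixel flips."""
        agree = observed == template
        return np.where(agree, np.log(1 - flip_prob), np.log(flip_prob)).sum()

    # Corrupt a ground-truth glyph with ~10% pixel noise, then score every
    # hypothesis and report the best explanation of the observed image.
    rng = np.random.default_rng(1)
    truth = GLYPHS["T"]
    observed = np.where(rng.random(truth.shape) < 0.1, 1 - truth, truth)

    scores = {c: log_likelihood(observed, g) for c, g in GLYPHS.items()}
    print(max(scores, key=scores.get), scores)

Because each hypothesis is scored as a whole explanation of the pixels, recognition and segmentation are not separate stages: the winning template simultaneously says which character is present and which pixels belong to it. That is exactly the chicken-and-egg coupling that defeats a segment-then-classify pipeline.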

RATIONALE
Modern deep neural networks resemble the feed-forward hierarchy of simple and complex cells in the neocortex. Neuroscience has postulated computational roles for lateral and feedback connections, segregated contour and surface representations, and border-ownership coding observed in the visual cortex, yet these features are not commonly used by deep neural nets. We hypothesized that systematically incorporating these findings into a new model could lead to higher data efficiency and generalization. Structured probabilistic models provide a natural framework for incorporating prior knowledge, and belief propagation (BP) is an inference algorithm that can match the cortical computational speed. The representational choices in RCN were determined by investigating the computational underpinnings of neuroscience data under the constraint that accurate inference should be possible using BP.
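
As a reminder of what belief propagation computes, here is a minimal sum-product pass on a chain-structured model in Python. The potentials and sizes are arbitrary toy values; this is textbook BP on a tree, where two message sweeps yield exact marginals, not RCN's own network.

    # Textbook sum-product belief propagation on a 4-variable chain.
    # Potentials are random toy values; variable/state counts are arbitrary.
    import numpy as np

    n_vars, n_states = 4, 3
    rng = np.random.default_rng(0)

    unary = rng.random((n_vars, n_states))        # per-variable evidence
    pairwise = rng.random((n_states, n_states))   # shared coupling potential

    # Forward and backward message sweeps along the chain.
    fwd = np.ones((n_vars, n_states))
    bwd = np.ones((n_vars, n_states))
    for i in range(1, n_vars):
        fwd[i] = pairwise.T @ (fwd[i - 1] * unary[i - 1])
        fwd[i] /= fwd[i].sum()                    # normalize for stability
    for i in range(n_vars - 2, -1, -1):
        bwd[i] = pairwise @ (bwd[i + 1] * unary[i + 1])
        bwd[i] /= bwd[i].sum()

    # Belief at each variable: product of incoming messages and local evidence.
    beliefs = fwd * bwd * unary
    beliefs /= beliefs.sum(axis=1, keepdims=True)
    print(np.round(beliefs, 3))

On graphs with loops, the same local message updates are simply iterated until they (approximately) converge; that cheap, parallel character is what makes BP a plausible match for the speed of cortical inference, as the rationale above notes.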

RESULTS
RCN was effective in breaking a wide variety of CAPTCHAs with very little training data and without using CAPTCHA-specific heuristics. By comparison, a convolutional neural network required a 50,000-fold larger training set and was less robust to perturbations to the input. Similar results are shown on one- and few-shot MNIST (modified National Institute of Standards and Technology handwritten digit data set) classification, where RCN was significantly more robust to clutter introduced during testing. As a generative model, RCN outperformed neural network models when tested on noisy and cluttered examples and generated realistic samples from one-shot training of handwritten characters. RCN also proved to be effective at an occlusion reasoning task that required identifying the precise relationships between characters at multiple points of overlap. On a standard benchmark for parsing text in natural scenes, RCN outperformed state-of-the-art deep-learning methods while requiring 300-fold less training data.

CONCLUSION
Our work demonstrates that structured probabilistic models that incorporate inductive biases from neuroscience can lead to robust, generalizable machine learning models that learn with high data efficiency. In addition, our model’s effectiveness in breaking text-based CAPTCHAs with very little training data suggests that websites should seek more robust mechanisms for detecting automated interactions.
