AlphaGo uses two neural nets: one for move selection (the "policy" network) and one for position evaluation (the "value" network), combined with Monte Carlo tree search. Its strength is roughly among the top 1000 human players. In a few months it is scheduled to play one of the very best players in the world.
For training they used a database of 30 million positions from expert games on the KGS Go Server. I have no intuition as to whether this is enough data to train the policy and value networks. The quality of these networks must be quite good, since the Monte Carlo tree search AlphaGo runs is much smaller than the search Deep Blue ran over its hand-crafted evaluation function.
Some grandmasters who reviewed AlphaGo's games were impressed by the "humanlike" quality of its play. More discussion: HN, Reddit.
Mastering the game of Go with deep neural networks and tree search
Nature 529, 484–489 (28 January 2016) doi:10.1038/nature16961
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
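The abstract's key idea — Monte Carlo search guided by a policy network (as a prior over moves) and a value network (in place of deep random rollouts) — can be sketched in a few dozen lines. This is a toy illustration, not AlphaGo's implementation: the PUCT-style selection rule follows the paper's general scheme, but the game (states are just move sequences), the uniform policy prior, and the stub value function rewarding first move 2 are all invented stand-ins, and the sign alternation needed for a real two-player backup is omitted for simplicity.

```python
import math

class Node:
    """One edge/node of the search tree: visit count N, total value W, prior P."""
    def __init__(self, prior):
        self.prior = prior      # P(s, a), supplied by the policy network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):
        # mean action value Q(s, a) = W / N
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.0):
    # PUCT-style rule: argmax over children of Q + c_puct * P * sqrt(N_parent) / (1 + N_child)
    total = sum(ch.visits for ch in node.children.values())
    def score(ch):
        return ch.q() + c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits)
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def policy_net(state):
    # stand-in for the trained policy network: uniform prior over 3 moves
    return {m: 1 / 3 for m in range(3)}

def value_net(state):
    # stand-in for the trained value network: pretend opening with move 2 wins
    return 1.0 if state and state[0] == 2 else 0.0

def simulate(root_state, root, n_sims=100):
    for _ in range(n_sims):
        node, state, path = root, list(root_state), [root]
        while node.children:                    # 1. selection down the tree
            move, node = select_child(node)
            state.append(move)
            path.append(node)
        for m, p in policy_net(state).items():  # 2. expansion with policy priors
            node.children[m] = Node(p)
        v = value_net(state)                    # 3. evaluation (no deep rollout)
        for n in path:                          # 4. backup along the path
            n.visits += 1
            n.value_sum += v

root = Node(prior=1.0)
simulate([], root)
best = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
print(best)  # move 2 accumulates by far the most visits
```

The point of the sketch: the policy prior concentrates exploration on promising moves, and the value network replaces thousands of random playouts with a single evaluation — which is why AlphaGo's tree can be much smaller than a brute-force search.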
Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position s as its input, passes it through many convolutional layers with parameters σ (SL policy network) or ρ (RL policy network), and outputs a probability distribution p_σ(a|s) or p_ρ(a|s) over legal moves a, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters θ, but outputs a scalar value v_θ(s′) that predicts the expected outcome in position s′.
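The two output "heads" the caption describes can be sketched in plain NumPy: a convolution over the 19×19 board, then either a softmax probability map over all points (policy) or a single tanh scalar (value). Everything here is an illustrative stand-in — one untrained 3×3 convolution per head versus AlphaGo's many trained layers with parameters σ, ρ, θ — so only the input/output shapes match the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
BOARD = 19

def conv2d(x, w):
    # naive 3x3 'same' convolution: x is (H, W), w is (3, 3)
    H, W = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

# separate (random, untrained) parameters for the two networks
w_policy = rng.normal(size=(3, 3))
w_value = rng.normal(size=(3, 3))

def policy_net(board):
    # p(a|s): softmax over all 361 points -> a probability map over the board
    h = np.maximum(conv2d(board, w_policy), 0)  # ReLU
    logits = h.ravel()
    e = np.exp(logits - logits.max())
    return (e / e.sum()).reshape(BOARD, BOARD)

def value_net(board):
    # v(s): a single scalar in (-1, 1) predicting the expected outcome
    h = np.maximum(conv2d(board, w_value), 0)
    return float(np.tanh(h.mean()))

board = np.zeros((BOARD, BOARD))
board[3, 3] = 1.0                               # a lone stone on the board
p = policy_net(board)                           # (19, 19) map, sums to 1
v = value_net(board)                            # scalar in (-1, 1)
print(p.shape, round(p.sum(), 6), -1.0 < v < 1.0)
```

The structural contrast is the caption's point: same convolutional input pipeline, but one head keeps the full spatial map (a distribution over moves) while the other collapses it to one number (a position evaluation).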
Related News: commenter STS points me to some work showing the equivalence of Deep Learning to the Renormalization Group in physics. See also Quanta magazine. The key aspect of RG here is the identification of important degrees of freedom in the process of coarse graining. These degrees of freedom make up so-called Effective Field Theories in particle physics.
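The coarse-graining step that analogy rests on is easy to show concretely. In block-spin renormalization, each 2×2 block of Ising spins is replaced by its majority sign, discarding short-distance detail while keeping the large-scale degrees of freedom — structurally similar to pooling in a deep convolutional net. The lattice and rule below are a toy example of that step, not the construction from the paper STS linked.

```python
import numpy as np

def block_spin(spins):
    # spins: (2N, 2N) array of +1/-1; returns the (N, N) majority-coarsened lattice
    n = spins.shape[0] // 2
    blocks = spins.reshape(n, 2, n, 2).sum(axis=(1, 3))  # sum within each 2x2 block
    return np.where(blocks >= 0, 1, -1)                  # majority rule (ties -> +1)

rng = np.random.default_rng(1)
lattice = rng.choice([-1, 1], size=(8, 8))   # random spin configuration
coarse = block_spin(lattice)                 # one RG step: 8x8 -> 4x4
print(lattice.shape, coarse.shape)           # (8, 8) (4, 4)
```

Iterating this map is what flows a microscopic description toward an effective theory of the surviving long-wavelength degrees of freedom.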
These are the days of miracle and wonder!