No big movers this week, but a surprisingly large number of papers :(
A quick round-up of papers I really haven't had time to read (carefully):
Emergence of Locomotion Behaviours in Rich Environments - YouTube - The Verge
by Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver (DeepMind)
I haven't actually read this paper, but the videos are really cool. DeepMind continues on their quest to solve Deep Reinforcement Learning - this time having humanoid (and occasionally arachnoid) agents run through various obstacle-filled environments. The single reward function is forward progress (running left-to-right). The movements all look weird and uncanny-valley-like, but it is really cool nonetheless.
Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency - Code - Project
by Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik
Unsupervised Learning of Depth and Ego-Motion from Video - Code - Project
by Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe
The Confluence of Geometry and Learning
by Shubham Tulsiani and Tinghui Zhou (BAIR)
Here's a write-up of some cross-over papers between researchers at BAIR for inferring a 3D landscape from 2D images. The idea is as follows. Ultimately, we want to train a predictor $P$ that maps from 2D images to a 3D landscape. So $P: I \rightarrow S$, where $I$ is an image and $S$ is the resulting 3D shape. Meanwhile, we have a separate model, verifier $V$, that is able to take a 2D image and a 3D shape and check whether they are consistent. (This is done through separate light-ray modeling, and the first paper goes into details about this.) Now, the trick here is that we also give $V$ another image $O$ from a different view/perspective $C$, which it uses as the ground truth for verifying the $S$ from $P$.
Note the setup: $P$ does not have access to $(O, C)$, and $V$ does not have access to $I$. Instead, $P$ needs to supply a shape $S$ that would be valid for any potential view $C$ and its corresponding image $O$. It is pretty neat.
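To make the predictor/verifier split concrete, here is a toy sketch of my own. It is not the paper's differentiable ray consistency: the "shape" is a 2D occupancy grid standing in for $S$, a "view" $C$ is just one of two axis-aligned directions, and the observation $O$ is a hit/miss mask per ray. The names `project` and `verify` are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # toy stand-in for a 3D shape S: a 2D occupancy grid


def project(shape: Grid, view: str) -> List[int]:
    """Cast parallel rays through the grid; a ray 'hits' if any cell on it is occupied."""
    if view == "side":  # one ray per row
        return [int(any(row)) for row in shape]
    if view == "top":  # one ray per column
        return [int(any(col)) for col in zip(*shape)]
    raise ValueError(f"unknown view: {view}")


def verify(shape: Grid, observation: List[int], view: str) -> int:
    """Toy verifier V: count the rays whose hit/miss disagrees with the observed mask O."""
    predicted = project(shape, view)
    return sum(p != o for p, o in zip(predicted, observation))


# Hypothetical ground-truth shape (never shown to the predictor) and a predicted shape.
true_shape = [[0, 1, 0],
              [0, 1, 1],
              [0, 0, 0]]
guess = [[0, 1, 0],
         [0, 1, 0],
         [0, 0, 0]]

side_error = verify(guess, project(true_shape, "side"), "side")
top_error = verify(guess, project(true_shape, "top"), "top")
print(side_error, top_error)
```

The guess looks perfect from the side but is caught out from the top, which is the point of the setup: since $P$ never knows which view $V$ will check against, it has to produce a shape that holds up from every view.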
NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles
by Jiajun Lu, Hussein Sibai, Evan Fabry, David Forsyth
In all the ruckus about adversarial attacks on neural networks, here is one piece of good news - adversarial examples may not work as trivially in real life as they first seem. This paper specifically focuses on the example of self-driving cars, and seems to show that, given the range of angles and distances from which a self-driving car's camera is likely to view an adversarially tampered-with visual (e.g. a stop sign), that variation is likely to defeat the adversarial example. I have not read the paper in detail, nor do I have the background to judge its conclusion, but it does seem that malicious attackers could take the varied viewing conditions into account when crafting their attacks. Nevertheless, it is good for the field to continue to explore the implications of adversarial attacks, and I look forward to perhaps Goodfellow's take on this on his Monday Quora Q&A (see below).
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
by YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine (UC Berkeley/OpenAI)
Overview: Animals/humans learn to imitate by observing others - which means watching from a different viewpoint than their own. The authors explore this new form of imitation learning (which they call imitation-by-observation) by introducing an intermediary model for translating from observations to a self-context.
Meta-Learning with Temporal Convolutions
by Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel
Overview: Approaching Meta-Learning by applying a temporal-convolution-based meta-learner (TCML), i.e. convolutional operations over time. Apparently temporal convolutions out-perform RNNs for this form of meta-learning. Solid results on 1-shot/few-shot learning experiments.
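For anyone unfamiliar with what "convolution over time" means here, below is a minimal sketch of a generic causal, dilated 1-D convolution in plain Python. It is only meant to show the operation, not the TCML architecture itself; the function name `causal_conv1d`, the all-ones kernels, and the toy signal are my own assumptions.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation], treating x before t=0 as zero.

    'Causal' means the output at time t only depends on inputs at time t or earlier.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero padding on the left
                total += w * x[idx]
        out.append(total)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# Stacking layers with growing dilation widens the receptive field quickly:
# dilation 1 then dilation 2 lets each output see the last 4 time steps.
layer1 = causal_conv1d(signal, [1.0, 1.0], dilation=1)
layer2 = causal_conv1d(layer1, [1.0, 1.0], dilation=2)
print(layer1)  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(layer2)  # [1.0, 3.0, 6.0, 10.0, 14.0]
```

With all-ones kernels, the last entry of `layer2` is just the sum of the last four inputs (5 + 4 + 3 + 2), which makes the receptive field easy to see. Unlike an RNN, every time step can be computed in parallel.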
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era - Blog
by Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta
Overview: Revisiting the old classic, Google and CMU throw massive computational resources at a dataset 300x as large as ImageNet. They continue to observe performance improving linearly with the log of the dataset size, which is great! It did, however, take them 2 months on 50 K80 GPUs to train for 4 epochs.
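As a quick back-of-the-envelope on what "linear in log-scale data size" implies, here is a tiny sketch. The gain-per-decade value of 1.0 is a made-up placeholder, not a number from the paper, and `projected_gain` is a hypothetical helper.

```python
import math


def projected_gain(gain_per_decade: float, data_multiplier: float) -> float:
    """If a metric grows linearly in log10(dataset size), return the gain from scaling the data."""
    return gain_per_decade * math.log10(data_multiplier)


# Placeholder assumption: 1.0 metric point gained per 10x more data.
gain_10x = projected_gain(1.0, 10)
gain_300x = projected_gain(1.0, 300)
print(gain_10x, round(gain_300x, 3))
```

Under that assumption, 300x more data buys only about 2.5 times the gain of 10x more data, so the trend is encouraging but expensive to ride.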
Creatism: A deep-learning photographer capable of creating professional work - Blog - Gallery
by Hui Fang, Meng Zhang (Google)
Overview: The authors try to create professional-looking photos from Google Maps images by mimicking a professional photographer's workflow, starting with an image and then applying a series of post-processing operations to enhance it. GANs are used to train the filters. According to HN, apparently "professional photos" just means really intense HDR.
Google is announcing a new initiative focusing on the intersection of humans and AI. I think this is a great move. Fully autonomous end-to-end AIs are great to think about, but I think for the near to medium-term future, the human/AI interactions and synergies will be far more exciting and productive. Excited to see what tangibly comes out of this.
Google has pushed up a new tutorial for using TensorFlow for seq2seq modeling. I am not sure if there's anything really new here, but I think it will be a good jumping-in point for people wanting to use TensorFlow for sequence-tasks, especially with the API/best practices not really being settled when TensorFlow first came out.
The Google Brain Residency Program - One Year Later
by Luke Metz, Yun Liu (Google Brain)
The inaugural 1-year Google Brain Residency program has just concluded! They attracted a lot of great minds, put them in Google's offices and let them work on cool ideas for a bit, and the results have been amazing, with a flood of conference papers coming out. I did not realize that this is just the first year of the program! I would be so excited to apply for this.
Recent Evolution of QA Datasets and Going Forward
by Jiwoong Im
A history of QA (Question & Answer) datasets, culminating in a pitch for a neural network embedding a search engine (rather than the more conventional use case of a search engine using neural networks). In other words, giving a neural network the ability to explicitly look up / "google" something, much like external memory networks are able to do with their memory modules.