Facebook AI Research (FAIR) announces Caffe2! Caffe is a popular C++ deep learning library that focuses on image-related applications (see: Caffe Model Zoo). While it apparently got a lot of use in production settings, it lost steam in online circles, probably because it's a little less suited to agile research and so didn't see many popular new models adapted for it. Caffe2 is a modernized version of Caffe, now with Facebook's blessing.
This sets us up for a nice dichotomy between Google's and Facebook's AI teams. Google developed TensorFlow, a one-stop, single-library solution for seamless use in both research and production. Facebook instead went for two libraries: PyTorch, optimized for research, and Caffe2, for deployments. Google has doubled down and continues to build on its static computation graph paradigm, while PyTorch went ahead with dynamic computation, which is advantageous both for research work and for recurrent neural networks (Caffe2 is still pretty unoptimized for dynamic graphs). It's always good to have more competition!
Rohan & Lenny #3: Recurrent Neural Networks
by Rohan Kapur, Lenny Khazan
A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN
by Dhruv Parthasarathy
I wanted to highlight two good explanatory reads this week, aside from my usual barrage of papers and GitHub repos.
The first is a really in-depth walkthrough of RNNs by two pretty young writers (I think one of them is only just graduating high school!). But don't let their age dissuade you from this read - this post has everything, from in-depth math to well-researched and well-sourced interpretations of the various quirks of RNN design choices.
The second is a run through the history and lineage of Mask R-CNN, a new image segmentation network from FAIR. I was pretty lost when going through the Mask R-CNN paper, as it primarily talks about its improvements over its predecessors, so this post was great pre-reading.
Taming Recurrent Neural Networks for Better Summarization - arXiv
by Abigail See, Peter J. Liu, Christopher D. Manning
In usual RNN summarization networks (or really, any seq2seq network), the workflow is that you run your input sequence through the network, which outputs a semantic embedding of your input, and then you run that through another network to get your output sequence. Additional mechanisms like attention allow the decoding network to look at the semantic embedding from prior timesteps, but the idea is still the same - compress the input into an embedding in a smaller space, then decode your output from that embedding.
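To make the attention step a little more concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring, which is a simplification I picked for brevity and not necessarily what any particular paper uses; the names attend, decoder_state and encoder_states are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state (dot product,
    an assumption), then return the weights and the weighted-average context."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: three input timesteps, two-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
```

The decoder ends up with a context vector that leans towards whichever input timesteps look most relevant at the current decoding step, rather than relying on a single fixed embedding.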
This network takes a slightly different and fairly sensible approach, which is to allow the network to just copy from the input at times, rather than only decoding off the semantic embedding. At each step in the decoding process, the network can decide whether to copy from the input (through an attention-like mechanism), or to pull from its semantic embedding. This also neatly solves the issue of out-of-vocabulary words in both input and output. (One might expect this approach to also be useful for translating sentences with names and proper nouns.)
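Here is a rough sketch of how the copy-or-generate decision can be mixed into a single output distribution, as I understand it: a generation probability p_gen weights the usual vocabulary distribution, and the remaining 1 - p_gen is spread over the input words according to the attention weights. The function name and the toy numbers are my own, and in the real network p_gen, the vocabulary distribution and the attention weights are all learned rather than hand-picked.

```python
from collections import defaultdict

def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generate-from-vocabulary distribution with a copy distribution.

    p_gen:         probability of generating from the fixed vocabulary (0..1)
    vocab_dist:    dict of word -> probability over the fixed vocabulary
    attention:     attention weights, one per source token, summing to 1
    source_tokens: input words, aligned with `attention`
    """
    mixed = defaultdict(float)
    for word, prob in vocab_dist.items():
        mixed[word] += p_gen * prob
    # Copy path: each input word receives its attention mass, so words
    # missing from the vocabulary can still be produced.
    for weight, token in zip(attention, source_tokens):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)

# Toy numbers (hypothetical): "Zidane" is not in the vocabulary.
vocab_dist = {"the": 0.5, "won": 0.3, "<unk>": 0.2}
source_tokens = ["Zidane", "won", "the", "match"]
attention = [0.7, 0.1, 0.1, 0.1]
dist = final_distribution(0.4, vocab_dist, attention, source_tokens)
best = max(dist, key=dist.get)
```

With these numbers the out-of-vocabulary name gets the most probability purely through the copy path, which is exactly the case a plain seq2seq decoder cannot handle.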
(I remember a time when my winning strategy for Chinese tests, which involved reading a passage and answering some questions, was to simply find the most relevant sentence in the passage and copy it verbatim as the answer. This would only get me about 70% of the marks, but hey, 70% is good enough for a B.)
What's more notable is that the authors followed up with an intuitive and approachable post describing the motivation and intuition behind their work. It has pretty informative diagrams/animations and walks the reader through the problem and approach more comprehensively than the condensed treatment in their paper. It's a really nice companion read to their paper, and I look forward to more researchers following in their footsteps.
Ever since Yann LeCun's comment at NIPS 2016 that GANs are the most important idea in deep learning in the last 10-20 years, everyone's been racing to get their coolest GANs out in the wild. So it's nice that someone's been kind enough to round them up and put them into a Zoo. There isn't much more information than a link to each GAN, but at least it might prevent GAN name collisions.
A quick round-up of papers I really haven't had time to read (carefully):
A Large Self-Annotated Corpus for Sarcasm
by Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
Some guys at Princeton went and rounded up 1.3 million examples of sarcasm from Reddit posts by using the /s tag. I really hate the idea of the /s tag, but at least it's being put to good use. As with any data set sourced from Internet communities, I'm guessing it'll be skewed and noisy in all kinds of weird ways, but more data is always better.
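For a feel of what /s-based labelling could look like, here is a tiny sketch. This is my guess at the general idea, not the authors' actual pipeline; the regex and the label_comment helper are hypothetical.

```python
import re

# A trailing "/s" that stands on its own (preceded by whitespace or the
# start of the comment), so URLs ending in "/s" are not matched.
SARCASM_TAG = re.compile(r"(?:^|\s)/s\s*$")

def label_comment(text):
    """Return (comment text without the tag, is_sarcastic)."""
    match = SARCASM_TAG.search(text)
    if match:
        return text[:match.start()], True
    return text, False

tagged = label_comment("Oh great, another Monday. /s")
untagged = label_comment("Visit example.com/s")
```

Stripping the tag matters, since otherwise a classifier trained on the data could just learn to look for "/s".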
Learning to Fly by Crashing
by Dhiraj Gandhi, Lerrel Pinto, Abhinav Gupta
It's hard to train self-flying drones because researchers are generally afraid of crashing their expensive drones. Apparently transfer-learning from simulations isn't good enough to overcome the lack of negative flying experience. So some guys at CMU decided to bite the bullet and crash their drone 11,500 times, for the good of science. The novel approach in this paper is that they built a drone that's made to crash and be rebuilt inexpensively (I'm not kidding).
Generative Face Completion
by Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang
Adobe continues to amaze with its image manipulation research. This time, we've got a network that fills in blanked-out parts of face portraits, and it works amazingly well. Granted, it apparently only works well on well-positioned and well-framed faces, but the results are still quite stunning. For example, the network fills in blanked-out sunglasses with actual eyes. Very cool stuff.
Differentiable Neural Computer (DNC)
DeepMind has open-sourced an implementation of their Differentiable Neural Computer, from their paper in Nature. I don't know too much about this, but it is notable that it uses the Sonnet library that they open-sourced about 2 weeks ago. This may be standard practice for DeepMind repos from now on.
Contents of this post are intended for entertainment, and only secondarily for information purposes. All opinions, omissions, mistakes and misunderstandings are my own.