# Machine Learning Mailing List - Issue 4

Jason Phang, Tue 11 April 2017, Machine learning mailing list

### CycleGANs

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks - GitHub
by Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros

CycleGANs continue a pattern I've observed of simple tricks leading to amazing results in the Deep Learning world.

The setup is simple: we want to learn a function $G:X \rightarrow Y$ that maps from one domain $X$ (e.g. pictures of horses) to another domain $Y$ (e.g. pictures of zebras). We want to do this without paired data - usually, in learning these kinds of mappings, we have a one-to-one pairing between horse and zebra images. However, such data sets are often difficult to come by, so in this constrained case we just have a set of horse images and a set of zebra images. So $G$ is not so much learning a one-to-one mapping as it is learning a distribution.

However, learning a distribution alone isn't that interesting: if we can arbitrarily map any one horse image to an entirely different zebra image (via GAN training), that's not really a well-defined problem. The algorithm would probably just memorize a handful of zebra images, output them at the right distribution, and call it a day. We want the model to actually learn something interesting about the similarities and differences between horses and zebras! Now here's the trick: we also learn the reverse mapping $F: Y \rightarrow X$, and we add a loss term corresponding to a cycle-consistency criterion: we want $F(G(x)) \approx x$ (and, symmetrically, $G(F(y)) \approx y$). This forces $G$ and $F$ to translate between horse and zebra images while still retaining enough information to recover the original image. This new constraint is what gets us the "one-to-one" relationship we want. The final loss function is pretty neat:

$$L(G, F, D_x, D_y) = L_{GAN}(G, D_y, X, Y) + L_{GAN}(F, D_x, Y, X) + \lambda L_{cyc}(G, F)$$

Look how neatly symmetric it is! This roughly translates to:

• Make $G$ good at translating from horses to zebras
• Make $F$ good at translating from zebras to horses
• Make sure $G$ and $F$ keep enough information to reobtain the original image
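A rough sketch of this objective in plain NumPy (with toy identity generators and a constant discriminator standing in for the real convolutional networks; all names here are illustrative, and the actual paper uses a least-squares GAN loss and only shows the generator side below, with discriminator updates omitted):

```python
import numpy as np

def gan_loss(d_scores_fake):
    # Generator side of the adversarial loss: push the discriminator's
    # scores on generated samples toward "real" (1).
    eps = 1e-8
    return -np.mean(np.log(d_scores_fake + eps))

def cycle_loss(a, a_reconstructed):
    # L1 cycle-consistency: F(G(x)) should recover x (and vice versa).
    return np.mean(np.abs(a - a_reconstructed))

def cyclegan_objective(G, F, D_x, D_y, x_batch, y_batch, lam=10.0):
    """Toy CycleGAN objective. G: X -> Y and F: Y -> X are generators;
    D_x, D_y are discriminators returning a "probability real" score."""
    fake_y = G(x_batch)            # horses -> zebras
    fake_x = F(y_batch)            # zebras -> horses
    loss_gan_G = gan_loss(D_y(fake_y))
    loss_gan_F = gan_loss(D_x(fake_x))
    # Cycle consistency in both directions.
    loss_cyc = cycle_loss(x_batch, F(fake_y)) + cycle_loss(y_batch, G(fake_x))
    return loss_gan_G + loss_gan_F + lam * loss_cyc

# Degenerate "identity" generators and a constant discriminator, just to
# show the loss evaluates: with perfect cycles, the cycle term is zero.
identity = lambda t: t
half = lambda t: np.full(len(t), 0.5)
x = np.random.randn(4, 8)
y = np.random.randn(4, 8)
total = cyclegan_objective(identity, identity, half, half, x, y)
```

With identity generators the cycle term vanishes and only the two adversarial terms remain, which is exactly the symmetry the equation above expresses.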

The results speak for themselves. Just look at the horse-to-zebra video:

Definitely check out the GitHub repo and paper.

### Sentimental Neurons

Learning to Generate Reviews and Discovering Sentiment - OpenAI Post - Github
by Alec Radford, Rafal Jozefowicz, Ilya Sutskever

This is a little bit of a weird paper. It feels like the researchers were originally trying to solve one problem, and along the way discovered this one cool thing and just ran with it.

The original problem is like so: we want a better NLP model; word2vec and things like it are cute but don't generalize all that well. There's a newer idea of training a general model on sequence prediction, and then fine-tuning that model to the specific task at hand. Specifically, they trained a character-by-character prediction model on 80 million Amazon reviews. (They had such an abundance of data that they were fine using just 0.1% as their validation and test sets.) After that, they fit a logistic regression on the model's features to predict the sentiment of those reviews. This model, with 4096 hidden mLSTM units, worked pretty well.

Then they took a closer look at the learned model and realized there was one very special hidden unit: using just this one unit for prediction performs about 99% as well as the full model.
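To see how striking that is, here's a toy reproduction of the idea with synthetic features (plain NumPy; the unit index, sizes, and the planted sentiment signal are all made up for illustration, not taken from the actual released model):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sentiment_unit = 1000, 64, 17   # toy sizes; unit index is arbitrary

# Synthetic stand-in for mLSTM hidden states: one unit correlates
# strongly with the binary sentiment label, the rest are pure noise.
labels = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, d))
features[:, sentiment_unit] += 3.0 * (2 * labels - 1)

def fit_logreg(X, y, lr=0.1, steps=500):
    # Plain logistic regression by gradient descent (with a bias column).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.mean(((Xb @ w) > 0) == y)

w_full = fit_logreg(features, labels)
w_single = fit_logreg(features[:, [sentiment_unit]], labels)
acc_full = accuracy(features, labels, w_full)
acc_single = accuracy(features[:, [sentiment_unit]], labels, w_single)
# The one-unit classifier recovers nearly all of the full model's accuracy.
```

Of course, in this toy setup we planted the signal in one unit by hand; the surprising part of the paper is that unsupervised character-level training produced such a unit on its own.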

You can even see how the unit's activations change as the model generates text character-by-character:

Coolness aside, I'm not entirely sure what to make of this. Did they just luck out by finding a single unit that captures sentiment, as opposed to a complex interaction of many units? Did their $L1$ regularization encourage the units to be sparser and more independent? I don't know the answers to these questions, but they were kind enough to open-source the model on GitHub (but not the training setup), so it might be worth checking out.

### Google is getting really good at generating speech and sound

Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model - GitHub Page
by Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders - Magenta Post - GitHub
by Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi

Two more papers from Google with fancy audio generative models. The questionably-named "Tacotron" is a fully end-to-end text-to-speech model that introduces a new CBHG (Convolution Bank + Highway network + bidirectional GRU) unit as its core building block. Because it maps characters directly to speech, it can make very reasonable attempts at pronouncing entirely out-of-domain words like "otolaryngology". Unfortunately they haven't released the code/model, but it's still really cool.

Meanwhile, NSynth follows the lineage of WaveNet and produces instrument sounds of single notes. What's cool is that they then tried linearly combining the latent codes of the instruments (word2vec style) to produce entirely new instruments. The samples are definitely worth a listen.
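The word2vec-style combination boils down to simple arithmetic in latent space. A minimal sketch, assuming the latent codes are just fixed-size vectors (the instrument names, dimensionality, and `interpolate_latents` helper below are all hypothetical; in the real system each mix would be decoded back to audio by the WaveNet decoder):

```python
import numpy as np

def interpolate_latents(z_a, z_b, alphas):
    # Linear interpolation between two instrument embeddings; each mix
    # is a candidate "new instrument" code for the decoder.
    return [(1 - a) * z_a + a * z_b for a in alphas]

# Toy 16-dim latent codes standing in for, say, a flute and an organ.
rng = np.random.default_rng(42)
z_flute = rng.normal(size=16)
z_organ = rng.normal(size=16)
mixes = interpolate_latents(z_flute, z_organ, alphas=[0.0, 0.5, 1.0])
# mixes[1] is the midpoint embedding, halfway between the two instruments.
```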

• DeepMind has open-sourced its internal Tensorflow wrapper library, Sonnet. The Internet's consensus seems to be that it's cool but there're already too many TF-wrapper libraries.
• Google also open-sourced its own seq2seq tool, tf-seq2seq. This confuses me deeply. It works based on configuration files (like CNTK, Caffe) while being closely tied to Tensorflow. I'm not sure if this makes things easier.
• DeepMind's next AlphaGo match has been announced. Besides finally challenging China's top player Ke Jie, AlphaGo will also face a team of humans and be paired with humans in different games. Fun!
• There's a documentary on AlphaGo at the Tribeca Film Festival! As of posting, there're still tickets available.

Contents of this post are intended for entertainment, and only secondarily for information purposes. All opinions, mistakes and misunderstandings are my own.