# Training an Autoencoder on Fashion MNIST

```elixir
Mix.install([
  {:axon, "~> 0.3.0"},
  {:nx, "~> 0.4.0", override: true},
  {:exla, "~> 0.4.0"},
  {:scidata, "~> 0.1.9"}
])

# Compile numerical definitions with EXLA
Nx.Defn.default_options(compiler: EXLA)
```

## Introduction

An autoencoder is a deep learning model that consists of two parts: an encoder and a decoder. The encoder compresses high-dimensional data into a low-dimensional representation and feeds it to the decoder. The decoder tries to recreate the original data from the low-dimensional representation.

Autoencoders can be used for the following problems:

* Dimensionality reduction
* Noise reduction
* Generative models
* Data augmentation

Let's walk through a basic autoencoder implementation in Axon to get a better understanding of how they work in practice.

## Downloading the data

To train and test our model, we use one of the most popular data sets: [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist). It consists of small grayscale images of clothes. Loading this data set is very simple with the help of `Scidata`.

```elixir
# We only need the images; the labels are discarded
{image_data, _label_data} = Scidata.FashionMNIST.download()
{bin, type, shape} = image_data
```

We get the data in a raw format, but this is exactly the information we need to build an Nx tensor.

```elixir
train_images =
  bin
  |> Nx.from_binary(type)
  |> Nx.reshape(shape)
  |> Nx.divide(255.0)
```

We also normalize pixel values into the range $[0, 1]$.

<!-- livebook:{"break_markdown":true} -->

We can visualize one of the images by looking at the tensor heatmap:

```elixir
Nx.to_heatmap(train_images[1])
```

## Encoder and decoder

First we need to define the encoder and decoder. Both are one-layer neural networks.

In the encoder, we start by flattening the input, so we go from shape `{batch_size, 1, 28, 28}` to `{batch_size, 784}`, and we pass the result into a dense layer. Our dense layer has only `latent_dim` neurons. The `latent_dim` is the size of the latent space, the compressed representation of the data.
Remember, we want our encoder to compress the input data into a lower-dimensional representation, so we choose a `latent_dim` that is smaller than the dimensionality of the input.

```elixir
encoder = fn x, latent_dim ->
  x
  |> Axon.flatten()
  |> Axon.dense(latent_dim, activation: :relu)
end
```

Next, we pass the output of the encoder to the decoder and try to reconstruct the compressed data into its original form. Since our original input had a dimensionality of 784, we use a dense layer with 784 neurons. Because our original data was normalized to have pixel values between 0 and 1, we use a `:sigmoid` activation in our dense layer to squeeze output values between 0 and 1. Our original input shape was 28x28, so we use `Axon.reshape` to convert the flattened outputs back into an image with the correct width and height.

```elixir
decoder = fn x ->
  x
  |> Axon.dense(784, activation: :sigmoid)
  |> Axon.reshape({:batch, 1, 28, 28})
end
```

If we just bind the encoder and decoder sequentially, we'll get the desired model. This was pretty smooth, wasn't it?

```elixir
model =
  Axon.input("input", shape: {nil, 1, 28, 28})
  |> encoder.(64)
  |> decoder.()
```

## Training the model

Finally, we can train the model. We'll use the `:adam` optimizer and the `:mean_squared_error` loss with `Axon.Loop.trainer`. Our loss function measures the aggregate error between the pixels of the original images and the model's reconstructed images. We'll also track `:mean_absolute_error` using `Axon.Loop.metric`. `Axon.Loop.run` trains the model with the given training data.
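Written out for a single image, and assuming both quantities are averaged over all $N = 784$ pixels (the averaging over the batch is left out), the loss and the metric can be sketched as follows. The symbols $x$ for the original image and $\hat{x}$ for the reconstruction are notation introduced here for illustration, not taken from Axon:

$$
\text{MSE}(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{x}_i - x_i)^2,
\qquad
\text{MAE}(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^{N} |\hat{x}_i - x_i|
$$

Because the target of an autoencoder is its own input, the same image plays the role of both the model input and $x$.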
```elixir
batch_size = 32
epochs = 5

batched_images = Nx.to_batched(train_images, batch_size)
# The target of an autoencoder is its own input, so each batch is paired with itself
train_batches = Stream.zip(batched_images, batched_images)

params =
  model
  |> Axon.Loop.trainer(:mean_squared_error, :adam)
  |> Axon.Loop.metric(:mean_absolute_error, "Error")
  |> Axon.Loop.run(train_batches, %{}, epochs: epochs, compiler: EXLA)
```

## Extra: losses

To better understand what mean absolute error (MAE) and mean squared error (MSE) are, let's go through an example.

```elixir
# Error definitions for a single sample
mean_square_error = fn y_pred, y ->
  y_pred
  |> Nx.subtract(y)
  |> Nx.power(2)
  |> Nx.mean()
end

mean_absolute_error = fn y_pred, y ->
  y_pred
  |> Nx.subtract(y)
  |> Nx.abs()
  |> Nx.mean()
end
```

We will work with a sample image of a shoe, a slightly noised version of that image, and an entirely different image from the dataset.

```elixir
shoe_image = train_images[0]
# Add Gaussian noise with mean 0.0 and standard deviation 0.05
noised_shoe_image = Nx.add(shoe_image, Nx.random_normal(shoe_image, 0.0, 0.05))
other_image = train_images[1]
:ok
```

For the same image both errors should be 0, because two exact copies have no pixel difference.

```elixir
{
  mean_square_error.(shoe_image, shoe_image),
  mean_absolute_error.(shoe_image, shoe_image)
}
```

Now the noised image:

```elixir
{
  mean_square_error.(shoe_image, noised_shoe_image),
  mean_absolute_error.(shoe_image, noised_shoe_image)
}
```

And a different image:

```elixir
{
  mean_square_error.(shoe_image, other_image),
  mean_absolute_error.(shoe_image, other_image)
}
```

As we can see, the noised image has a non-zero MSE and MAE, but both are much smaller than the errors between two completely different pictures. In other words, both of these error types measure the level of similarity between images. A small error implies decent prediction values. On the other hand, a large error value suggests poor quality of predictions.

If you look at our implementation of MAE and MSE, you will notice that they are very similar.
MAE and MSE are also called the $L_1$ and $L_2$ loss respectively, after the $L_1$ and $L_2$ norms. The $L_2$ loss (MSE) is typically preferred because it is a smoother function, whereas the $L_1$ loss is not differentiable at zero and is often difficult to optimize with stochastic gradient descent (SGD).

## Inference

Now, let's see how our model is doing! We will compare a sample image before and after compression.

```elixir
# Slice with a range to keep the batch dimension: shape {1, 1, 28, 28}
sample_image = train_images[0..0//1]
compressed_image = Axon.predict(model, params, sample_image, compiler: EXLA)

sample_image
|> Nx.to_heatmap()
|> IO.inspect(label: "Original")

compressed_image
|> Nx.to_heatmap()
|> IO.inspect(label: "Compressed")

:ok
```

As we can see, the generated image is similar to the input image. The only difference between them is the absence of a sign in the middle of the second shoe. The model treated the sign as noise and blended it into the plain shoe.
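Besides comparing heatmaps by eye, the similarity can be put into numbers with the same MSE and MAE computations used in the losses section. The following is a minimal sketch: `reconstruction_error` is a hypothetical helper introduced here (not part of Axon or Nx), the two 2x2 tensors are toy stand-ins for an image and its reconstruction, and Nx is assumed to be available from the notebook's setup cell.

```elixir
# Assumes Nx is already installed, as in this notebook's setup cell.
# `reconstruction_error` is a hypothetical helper, not part of Axon or Nx.
reconstruction_error = fn original, reconstructed ->
  diff = Nx.subtract(reconstructed, original)

  %{
    mse: diff |> Nx.power(2) |> Nx.mean() |> Nx.to_number(),
    mae: diff |> Nx.abs() |> Nx.mean() |> Nx.to_number()
  }
end

# Toy 2x2 "images" standing in for a real image and its reconstruction
original = Nx.tensor([[0.0, 0.5], [1.0, 0.25]])
reconstructed = Nx.tensor([[0.0, 0.25], [0.5, 0.25]])

errors = reconstruction_error.(original, reconstructed)
IO.inspect(errors, label: "Reconstruction error")
```

For the toy tensors the MSE works out to 0.078125 and the MAE to 0.1875. In the notebook, the same helper could be called as `reconstruction_error.(sample_image, compressed_image)` to quantify how close the reconstruction is to the original.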