
Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions

Songxuan Shi
Department of Applied Physics
Beijing University of Technology
Beijing 100124, China
shisongxuan@emails.bjut.edu.cn
Abstract

This paper explores the generative capabilities of Autoencoders (AEs) and establishes connections between Variational Autoencoders (VAEs) and Vector Quantized-Variational Autoencoders (VQ-VAEs) through a reformulated training framework. We demonstrate that AEs exhibit generative potential via latent space interpolation and perturbation, albeit limited by undefined regions in the encoding space. To address this, we propose a new VAE-like training method that introduces clustering centers to enhance data compactness and ensure well-defined latent spaces without relying on traditional KL divergence or reparameterization techniques. Experimental results on MNIST, CelebA, and FashionMNIST datasets show smooth interpolative transitions, though blurriness persists. Extending this approach to multiple learnable vectors, we observe a natural progression toward a VQ-VAE-like model in continuous space. However, when the encoder outputs multiple vectors, the model degenerates into a discrete Autoencoder (VQ-AE), which combines image fragments without learning semantic representations. Our findings highlight the critical role of encoding space compactness and dispersion in generative modeling and provide insights into the intrinsic connections between VAEs and VQ-VAEs, offering a new perspective on their design and limitations.

Further qualitative analysis and theoretical derivations will be supplemented in subsequent versions.

1 Introduction

In recent years, models based on VQVAE and VAE have achieved remarkable success in image generation (van den Oord et al., 2017; Kingma and Welling, 2022). However, little research has explicitly explored the connections between them.

In a VAE, generation is framed as fitting the data distribution, from which the evidence lower bound (ELBO) is derived. Maximizing the ELBO amounts to maximizing a reconstruction term while minimizing the KL divergence between the approximate posterior and the prior. The encoder of a VAE outputs two parameters, a mean and a variance, which are constrained to follow a Gaussian distribution with zero mean and unit variance. Finally, latent variables are sampled via the reparameterization trick and fed into the decoder.
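For reference, this standard objective can be sketched in a few lines of PyTorch (a generic illustration of the classic VAE, not the method proposed later in this paper):

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); keeps sampling differentiable
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def vae_loss(x, x_recon, mu, logvar):
    # Negative ELBO: reconstruction term plus closed-form KL(q(z|x) || N(0, I))
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```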

VAE-generated results tend to be blurry, as studied in beta-VAE (Higgins et al., 2017). This blurriness stems from two factors: (1) the pixel-wise (or MSE-based) reconstruction loss, and (2) the over-regularization caused by the KL divergence. Consequently, follow-up works have proposed alternative regularization schemes based on the Wasserstein distance and the Jensen-Shannon divergence to mitigate these issues (Tolstikhin et al., 2019; Deasy et al., 2021).

For a long time, VAEs have been interpreted in the language of probability theory, but the transition to their seemingly "discrete counterpart" (VQVAE) has not felt entirely natural. In this article, we reconstruct the VAE framework from a new perspective, one that extends naturally to VQVAE. Our aim in this study is solely to provide a fresh perspective on the generative capabilities of VAEs, not to introduce a novel model.

This reformulation also leads to an insight: The generative capability of VAEs stems from the KL divergence constraint, which essentially compresses the data manifold and entangles it in a way that enables smooth interpolation. The role of reparameterization is to make the sampling space more continuous rather than leaving it undefined.

2 Autoencoders possess generative capabilities

As the predecessor of Variational Autoencoders (VAEs), the Autoencoder (AE) has been regarded in some studies as a nonlinear extension of Principal Component Analysis (PCA). Similar to classification tasks (Rifai et al., 2011), an AE projects data onto a new manifold (Lee, 2023), reshaping the structure of the original distribution. While the AE is not conventionally treated as a generative model, prior work has suggested that it can generate new samples through latent space interpolation (Berthelot et al., 2018; Sainburg et al., 2019), similar to the latent-space operations in StyleGAN (Karras et al., 2019, 2018). This section further explores and validates the generative potential of AEs from this perspective.

2.1 Perturbation and Interpolation of Latent Codes in Autoencoders

Figure 2.1: Perturbations added to the first 8 dimensions of the latent space, from +0.1 to +15. Both the encoder and decoder are 3-layer, 64-dimensional MLPs, with a 128-dimensional latent (encoding) space.
Figure 2.2: Linear interpolation between the digits 7 and 8, and between 3 and 8.

Here, we first explore perturbations in the latent space of an autoencoder. As shown in Figure 2.1, small perturbations do not significantly alter the decoder's output. However, under large perturbations, the generated images degrade and may even collapse into nearly pure black. Next, we perform linear interpolation between the latent codes of two images (Figure 2.2). The interpolation is defined as:

$$\mathbf{z}=(1-\alpha)\,\mathbf{z}_{1}+\alpha\,\mathbf{z}_{2},\quad\alpha\in[0,1]$$

where $\mathbf{z}_{1}$ and $\mathbf{z}_{2}$ are the latent codes of two input images, and $\alpha$ controls the interpolation weight. The decoder produces smoothly transitioning outputs, demonstrating continuous variation between the original images.
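A minimal sketch of this interpolation, assuming a trained encoder and decoder (names are illustrative):

```python
import torch

@torch.no_grad()
def interpolate(decoder, z1, z2, steps=10):
    # Decode evenly spaced points along the segment between two latent codes
    alphas = torch.linspace(0.0, 1.0, steps)
    return torch.stack([decoder((1 - a) * z1 + a * z2) for a in alphas])

# e.g. frames = interpolate(decoder, encoder(x1), encoder(x2))
```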

From this simple experiment, we can conclude that AEs possess a certain degree of generative capability. However, sampling needs to occur within a defined data space; randomly adding perturbations will only yield meaningless results.

This leads us to wonder: what if we introduce "constraints" in another way?

A common example is classification. Image classification makes data linearly separable, essentially learning a low-dimensional manifold of the data; otherwise, divergent encodings would be unusable. If we train an image classification network before training the AE, it might make the encoding space more compact (Rifai et al., 2011; Connor et al., 2021; Leeb et al., 2023).

As illustrated in Figure 2.3 and Figure 2.4, the classification network's middle section is a 3-layer, 64-dimensional MLP. After training the classifier, we freeze its parameters and feed the intermediate representation to a decoder for reconstruction. At this point, applying perturbations no longer results in nearly pure black outputs. Instead, we observe a tendency for the images to transform into other digits, with subtle, though not very pronounced, changes in style. The largest perturbation here is +50, much larger than the +15 in the previous experiment. Linear interpolation within this classification network yields the same results as with the AE.
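A minimal sketch of this setup for MNIST-sized inputs (layer sizes other than the 3-layer, 64-dimensional trunk are illustrative):

```python
import torch.nn as nn

# 3-layer, 64-dimensional MLP trunk of the classifier (the "middle section")
classifier_trunk = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64),
)
# ... after classification training, freeze the trunk:
for p in classifier_trunk.parameters():
    p.requires_grad = False

# Train only this decoder to minimize MSE(decoder(classifier_trunk(x)), x)
decoder = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
```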

Figure 2.3: Perturbations added to the first dimension of the latent space, from +0.1 to +50, for the digits 4, 5, 7, and 1.
Figure 2.4: Linear interpolation between the digits 5 and 3, 3 and 7, 7 and 3, 3 and 2, and 2 and 9.

3 A New VAE Training Method

In the previous section, we discussed how the compactness and definedness of data within the encoding space determine whether a model possesses generative capability (continuous transitions between two samples). In this section, we explore a new VAE-like training method and its connection to VQVAE.

3.1 Methods

Figure 3.1: Model architecture.

To introduce "oscillation" during AE training, allowing the encoding space to spread across the entire space and avoid undefined points, while to some extent ensuring that the manifold boundaries are not separated as in a classification task, we introduce the concept of "clustering centers." The model structure is shown in Figure 3.1.

First, we select N images from the dataset and pass them through the encoder to produce vectors. This batch is trained for reconstruction, maintaining the overall reconstruction capability of the network.

Simultaneously, we maintain a learnable vector. We take M additional images from the dataset and find the image whose encoding has the minimum mean squared error (MSE) with this learnable vector (a dot product or cosine similarity could also be used). We then constrain the encoding of this chosen image to minimize its MSE with the learnable vector, while also requiring that the vector, when passed to the decoder, reconstructs that specific image. (This last term is optional and was not included in subsequent experiments, as we observed it decreases spontaneously when the first two terms are constrained.)

Therefore, the total loss for the entire model is:

$$\begin{aligned}
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{reconstruction}} + \lambda_{1}\,\mathcal{L}_{\text{encoder\_pull}}\;\bigl(+\,\lambda_{2}\,\mathcal{L}_{\text{decoder\_reconstruct\_center}}\bigr)\\
\mathcal{L}_{\text{reconstruction}} &= \frac{1}{N}\sum_{i=1}^{N}\|x_{i}-D(E(x_{i}))\|^{2}\\
\mathcal{L}_{\text{encoder\_pull}} &= \|E(x_{j}^{*})-v\|^{2}\\
\mathcal{L}_{\text{decoder\_reconstruct\_center}} &= \|x_{j}^{*}-D(v)\|^{2}\quad\text{(optional)}
\end{aligned}$$

where $x_{i}$ is an input image, $E(\cdot)$ is the encoder, $D(\cdot)$ is the decoder, $x_{j}^{*}$ is the chosen closest image, and $v$ is the learnable vector.

We split the batch extraction into M and N here to prevent a skewed batch distribution from affecting convergence speed and to make experimental debugging more convenient. In practical applications, this division is optional.
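A minimal sketch of one training step under this objective (the optional decoder term is omitted, as in our experiments; function and variable names are ours):

```python
import torch

def training_step(encoder, decoder, v, x_recon_batch, x_match_batch, lam1=1.0):
    # v is the learnable clustering-center vector (an nn.Parameter);
    # lam1 plays the role of lambda_1 and its value here is illustrative.
    # (1) Plain reconstruction on the N-image batch keeps the AE working.
    z = encoder(x_recon_batch)
    loss_recon = ((decoder(z) - x_recon_batch) ** 2).mean()

    # (2) Among the M-image batch, pick the image whose encoding is closest
    #     to v under MSE, then pull that encoding toward v.
    with torch.no_grad():
        dists = ((encoder(x_match_batch) - v) ** 2).mean(dim=1)
        j = dists.argmin()
    z_star = encoder(x_match_batch[j : j + 1]).squeeze(0)
    loss_pull = ((z_star - v) ** 2).mean()

    return loss_recon + lam1 * loss_pull
```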

3.2 Experiments

Here, the learnable encoding acts as a clustering center. Initially, this vector is meaningless, and the parameters of both the encoder and decoder are randomly initialized; consequently, the vector oscillates violently. Inspired by the mean-regression behavior of noise2noise (Lehtinen et al., 2018), we expect this vector to eventually converge to the clustering center of the entire dataset amidst this violent oscillation. In this way, we achieve both constraint and a thorough definition of points across the entire data space.

Figure 3.2: Different perturbations (vertical axis) applied across different dimensions (horizontal axis), with perturbations ranging from +3 to -3.
Figure 3.3: Horizontal axis: training epoch. Vertical axis: Euclidean distance between successive states of the learnable encoding vector.

This means the model must, on one hand, maintain its AE functionality and, on the other hand, find a central point in the data and gravitate towards it. Experiments, shown in Figure 3.2, were conducted on the MNIST, CelebA, and FashionMNIST datasets and demonstrate smooth transitions. We achieved this without using any reparameterization or KL divergence constraint. The encoder and decoder here are both convolutional networks, trained for 20 epochs.

However, the blurriness issue persists. We also tracked the Euclidean distance between successive states of the learnable encoding (Figure 3.3), which reveals a decreasing distance from high to low, thus implicitly achieving annealing.

Next, we expanded the number of learnable vectors, transforming the design into a codebook, similar to VQVAE.
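One plausible sketch of the multi-center version, where each encoding is matched to its nearest learnable vector (the matching rule is our reading of the procedure above):

```python
import torch

def pull_loss_multi(encoder, codebook, x_batch):
    # codebook: (K, d) nn.Parameter holding the learnable vectors
    z = encoder(x_batch)                  # (M, d) encodings
    d2 = torch.cdist(z, codebook) ** 2    # (M, K) pairwise squared distances
    nearest = d2.argmin(dim=1)            # index of each encoding's closest vector
    return ((z - codebook[nearest]) ** 2).mean()
```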

On the MNIST and FashionMNIST datasets, we observed that after 20 epochs of training, these vectors landed in different cluster centers. When this constraint was relaxed, degeneration still occurred, similar to VQVAE: some of the vectors eventually converged to the same location, or simply collapsed into an identical solution, as shown in Figure 3.4.

Figure 3.4: The left panel shows a collapse during training on FashionMNIST; the right panel shows normal behavior without collapse on MNIST. These two datasets were chosen to best illustrate the result, because FashionMNIST is prone to collapse whereas MNIST rarely collapses. The difference in linear separability of the 10 classes in the two panels stems from the datasets themselves, not from the model.

In Figure 3.5, we directly fed these 20 trained vectors into the decoder to see what they had learned. Evidently, some learned similar features, some are scattered and largely meaningless, while others are clear digit images. Token 17 is notably blurry, very similar to the decoder's output during the early stages of VAE training.

Figure 3.5: The 20 vectors trained on MNIST.

Simply constraining the MSE to a specific vector cannot achieve generative capability. We then abandoned the previous network design and used a basic autoencoder, adding an MSE constraint that pushes the intermediate latent vector towards the all-zero vector. The results, shown in Figure 3.6, indicate severe collapse. This highlights the importance of reparameterization and of the dynamic matching mechanism in our new method.
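For concreteness, this collapsing baseline amounts to the following loss (a sketch; `lam` is an illustrative weight):

```python
def collapsing_baseline_loss(encoder, decoder, x, lam=1.0):
    # Plain AE loss plus a static pull of every latent toward the zero
    # vector, with no dynamic matching; this is the variant of Figure 3.6.
    z = encoder(x)
    return ((decoder(z) - x) ** 2).mean() + lam * z.pow(2).mean()
```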

Figure 3.6: 20 epochs, replacing only the KL divergence constraint with an MSE constraint toward the zero vector; no reparameterization.

4 Training with Multiple Vectors

4.1 Training with multiple vectors makes the data space more flexible.

Here, we focus on using a codebook rather than a single vector for training. We previously observed that FashionMNIST is highly prone to collapse, ultimately becoming indistinguishable from the single-vector case.

In VQ-VAE, one of the strategies used is the Exponential Moving Average (EMA) (van den Oord et al., 2017). EMA reduces the oscillation of individual vectors through momentum updates, helping to prevent them from collapsing into the same solution. The core formula for updating a codebook vector $e_{k}$ with EMA is:

$$\mathbf{e}_{k}\leftarrow\mathbf{e}_{k}+\alpha\,(\mathbf{z}_{q}-\mathbf{e}_{k})$$

where $\mathbf{e}_{k}$ is the $k$-th vector in the codebook, $\mathbf{z}_{q}$ is the latent vector assigned to it, and $\alpha$ is the momentum (learning) rate.
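In code, this simplified per-vector form of the update is one line (a sketch; the original VQ-VAE EMA additionally tracks cluster counts and sums, and `alpha` here is illustrative):

```python
import torch

@torch.no_grad()
def ema_update(codebook, z_q, k, alpha=0.01):
    # e_k <- e_k + alpha * (z_q - e_k): momentum update of codebook entry k
    codebook[k] += alpha * (z_q - codebook[k])
```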

The final t-SNE is shown in Figure 4.1(b). The data space also becomes more flexible, as depicted in Figure 4.1(a): here, we applied a +300 additive perturbation, which caused a T-shirt to become longer and its sleeve length to change; some distortion also occurred.

Figure 4.1: Experimental results on FashionMNIST. (a) shows the reconstructions from 50 trained vectors, while (b) displays the t-SNE visualization of 20 trained vectors and the encoder output on the test dataset.

4.2 Impact of Encoder Capacity on Learnable Vector Quantity

In Figure 4.1(b), we observed something interesting: with 50 learnable vectors, only a small number were properly utilized and attracted data.

To address this, we expanded both the encoder and decoder, making them deeper and incorporating residual connections. The results, shown in Figure 4.2, are promising: this time, with 500 vectors, visibly more of them were correctly "attracted" to data clusters. We then randomly selected 200 of these vectors and fed them into the decoder for visualization (Figure 4.3).
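A minimal sketch of the kind of residual block used to deepen the networks (channel counts and layer sizes are illustrative):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))  # residual connection
```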

Figure 4.2: The t-SNE visualization of the 500 trained vectors and the encoder output on the test dataset. Notably, a large number of these trained vectors still have not been correctly attracted to data clusters.
Figure 4.3: Experimental results on FashionMNIST with the expanded model. (a) shows reconstructions from 100 randomly selected, correctly learned vectors; (b) displays reconstructions from many incorrectly learned vectors, among which numerous meaningless white outputs appear.

Conclusion. We have now constructed a preliminary version of a VQVAE in continuous space. It is worth noting that if we require reconstruction to be performed from the codebook rather than the encoder, and if the encoder outputs multiple vectors instead of just one, then this model becomes entirely equivalent to a standard VQVAE. The capacity of the encoder directly affects whether enough learnable vectors can be allocated. This "continuous VQVAE" still possesses a certain degree of generative capability, although it requires larger perturbations.

4.3 VQ-AutoEncoder

Given that VQVAE changes the decoder's input to come from a codebook rather than directly from the encoder's output, let us try to generalize this. We allow the decoder to accept the encoder's output directly (following our previous experimental setup), and the convolutional encoder now outputs multiple vectors instead of a single one.

What we observe is that this generative model completely degenerates into an autoencoder whose output is constrained by the codebook: any image processed through the decoder necessarily becomes a combination of vectors from that codebook.
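A sketch of the quantization step in this setting: each of the encoder's spatial vectors is snapped to its nearest codebook entry before decoding (shapes follow the 7x7 grid described in Appendix A.1; names are ours):

```python
import torch

def quantize_feature_map(z_e, codebook):
    # z_e: (B, C, H, W) encoder output; codebook: (K, C) learnable vectors.
    # Returns the quantized map and the (B, H, W) grid of codebook indices.
    B, C, H, W = z_e.shape
    flat = z_e.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C)
    idx = (torch.cdist(flat, codebook) ** 2).argmin(dim=1)
    z_q = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2)
    return z_q, idx.reshape(B, H, W)
```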

Figure 4.4: From top to bottom: original image, direct encoder-decoder output, quantized output (codebook to decoder).
Figure 4.5: Reconstructions generated by randomly combining vectors sampled from the codebook and feeding them to the decoder.
Figure 4.6: Additive perturbations of +1, +5, +10, +15, +30, +50, and +100 applied to the first dimension of the first vector in the encoder-to-decoder output.
Figure 4.7: Reconstruction when codebook vectors are selected and passed to the decoder via broadcasting (a) and via zero-padding (b).
Figure 4.8: Randomly replacing one of the encoder's encodings with a vector from the codebook.

From Figure 4.6, we observe that no matter how much perturbation is added, the model does not gain transition capability; it only increases distortion. In Figure 4.5, completely random combinations yield only meaningless results. In Figure 4.8, arbitrarily replacing one of the vectors with another from the codebook also introduces distortion. In Figure 4.7, the reconstructions of the codebook vectors themselves are indistinguishable. In Figure 4.4, it is visible that although the model's codebook vectors do not learn meaningful semantics as in VQVAE, they form the original image through correct combinations, and these are indeed varied combinations, not a degeneration into an AE. (The codebook indices used for this set of images are given in Appendix A.1.)

Therefore, we conclude that under these circumstances the model tends to become a discrete encoder, which we refer to as VQ-AE. Although each vector in the codebook lacks semantic meaning, relatively clear images can be generated by combining them.

5 Conclusion

Up to this point, following our new construction method, VQVAE naturally establishes a connection with VAE, enabling image generation even without the original training approach. The key to image generation lies in the compactness of the encoding space and its sufficient dispersion to ensure that all points are well defined. When the learnable encodings, acting as cluster constraint centers, become dispersed, larger perturbations are required for smooth image transitions. However, when we extend the encoder's output from a single encoding to multiple (7x7), the model degenerates into a discrete Autoencoder. In this case, it does not learn semantics like VQVAE but rather image fragments; the encoder and decoder merely learn how to combine these fragments. The underlying reasons for this behavior remain unclear to us.

6 Related Work

Recent developments in generative modeling have explored the interplay between discrete and continuous latent spaces, latent compactness, and their implications for unsupervised learning. Discrete latent models, such as DVAE++ Vahdat et al. (2018), address the challenge of smoothing the optimization landscape in non-continuous spaces, which directly relates to our exploration of bridging VAE and VQ-VAE from a non-probabilistic perspective. Similarly, Vector Quantized Wasserstein Autoencoder (VQ-WAE) Vuong et al. (2021) integrates optimal transport objectives into quantized models, offering additional insights into the stability and structure of learned codebooks.

Several works also incorporate clustering and latent representation learning into the generative framework. For instance, Deep Generative Clustering with VAEs Adipoetra and Martin (2021) and Variational Clustering Prasad et al. (2021) show how VAEs can naturally extend to unsupervised clustering tasks by shaping the latent manifold. These methods highlight the dual role of VAEs as both generative models and latent space organizers—supporting our hypothesis that generative quality is a function of manifold compression.

Additionally, works like Joint Optimization of Autoencoders for Clustering and Embedding Fard et al. (2020) propose combined objectives for both embedding and clustering, providing architectural cues for learning compact, semantically meaningful representations. Our work builds upon these ideas by exploring how constraining the encoder space, even through deterministic mechanisms like classifiers, can result in more structured and semantically meaningful generation.

References

  • Adipoetra and Martin [2021] Michael Adipoetra and Ségolène Martin. Deep generative clustering with vaes and expectation-maximization. arXiv preprint arXiv:2103.10365, 2021.
  • Berthelot et al. [2018] David Berthelot, Colin Raffel, Aurko Roy, and Ian J. Goodfellow. Understanding and improving interpolation in autoencoders via an adversarial regularizer. CoRR, abs/1807.07543, 2018. URL http://arxiv.org/abs/1807.07543.
  • Connor et al. [2021] Marissa C. Connor, Gregory H. Canal, and Christopher J. Rozell. Variational autoencoder with learned latent structure, 2021. URL https://arxiv.org/abs/2006.10597.
  • Deasy et al. [2021] Jacob Deasy, Nikola Simidjievski, and Pietro Liò. Constraining variational inference with geometric jensen-shannon divergence, 2021. URL https://arxiv.org/abs/2006.10599.
  • Fard et al. [2020] Arash Fard, Thibaut Thonet, and Emilie Morvant. Joint optimization of an autoencoder for clustering and embedding. In Asian Conference on Machine Learning, pages 124–139, 2020.
  • Higgins et al. [2017] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=Sy2fzU9gl.
  • Karras et al. [2018] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation, 2018. URL https://arxiv.org/abs/1710.10196.
  • Karras et al. [2019] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks, 2019. URL https://arxiv.org/abs/1812.04948.
  • Kingma and Welling [2022] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2022. URL https://arxiv.org/abs/1312.6114.
  • Lee [2023] Yonghyeon Lee. A geometric perspective on autoencoders, 2023. URL https://arxiv.org/abs/2309.08247.
  • Leeb et al. [2023] Felix Leeb, Stefan Bauer, Michel Besserve, and Bernhard Schölkopf. Exploring the latent space of autoencoders with interventional assays, 2023. URL https://arxiv.org/abs/2106.16091.
  • Lehtinen et al. [2018] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. Noise2noise: Learning image restoration without clean data, 2018. URL https://arxiv.org/abs/1803.04189.
  • Prasad et al. [2021] Vignesh Prasad, Dipanjan Das, and Brojeshwar Bhowmick. Variational clustering: Leveraging variational autoencoders for image clustering. In Proceedings of the International Conference on Pattern Recognition (ICPR), 2021.
  • Rifai et al. [2011] Salah Rifai, Yann Dauphin, Pascal Vincent, Yoshua Bengio, and Xavier Muller. The manifold tangent classifier. Advances in Neural Information Processing Systems, 2011.
  • Sainburg et al. [2019] Tim Sainburg, Marvin Thielk, Brad Theilman, Benjamin Migliori, and Timothy Gentner. Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions, 2019. URL https://arxiv.org/abs/1807.06650.
  • Tolstikhin et al. [2019] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders, 2019. URL https://arxiv.org/abs/1711.01558.
  • Vahdat et al. [2018] Arash Vahdat, William G. Macready, and Zhengbing Bian. Dvae++: Discrete variational autoencoders with overlapping transformations. arXiv preprint arXiv:1802.04920, 2018.
  • van den Oord et al. [2017] Aäron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. CoRR, abs/1711.00937, 2017. URL http://arxiv.org/abs/1711.00937.
  • Vuong et al. [2021] Tung-Long Vuong, Trung Le, and He Zhao. Vector quantized wasserstein auto-encoder. arXiv preprint arXiv:2104.06872, 2021.

Appendix A Appendices

A.1 Codebook Indices Used in the Experiment

In our experiment, the encoder's output convolution map has dimensions of 7x7 with 512 channels. Below are the codebook indices matched to each convolutional-map location (with channels unfolded as vectors, consistent with VQVAE operations).
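For reproducibility, an index grid like the ones below can be turned back into a decoder input as follows (a sketch; `decoder` and `codebook` stand for the trained modules):

```python
import torch

@torch.no_grad()
def decode_from_indices(decoder, codebook, idx):
    # idx: (7, 7) LongTensor of codebook indices, as in the matrices below;
    # codebook: (K, C) trained vectors. Rebuild the latent map and decode.
    z_q = codebook[idx]                        # (7, 7, C)
    z_q = z_q.permute(2, 0, 1).unsqueeze(0)    # (1, C, 7, 7)
    return decoder(z_q)
```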

$$\begin{bmatrix}9&10&10&10&10&10&10\\47&44&44&44&72&64&37\\47&44&44&26&30&96&32\\47&26&72&25&89&11&94\\18&71&69&3&3&1&8\\19&14&20&16&16&16&87\\19&7&7&37&50&37&37\end{bmatrix}\quad\begin{bmatrix}9&17&21&31&36&79&10\\47&58&24&60&3&95&58\\47&44&54&89&23&37&58\\47&44&59&60&57&38&58\\47&44&59&60&74&38&58\\47&44&59&60&57&38&58\\19&7&54&94&57&38&7\end{bmatrix}\quad\begin{bmatrix}9&17&21&80&31&79&17\\19&25&0&65&65&0&6\\91&40&65&0&0&89&53\\91&40&65&0&0&89&93\\28&69&69&0&0&33&81\\28&1&69&65&0&42&2\\91&32&33&1&0&33&81\end{bmatrix}$$

$$\begin{bmatrix}9&17&85&36&36&79&10\\47&26&27&0&22&64&58\\47&26&27&0&0&64&58\\47&26&27&41&11&64&58\\47&26&30&66&11&6&58\\47&26&30&81&0&6&58\\19&26&30&93&80&6&7\end{bmatrix}\quad\begin{bmatrix}9&17&17&10&17&79&10\\19&29&23&82&59&32&6\\91&43&3&3&65&3&66\\73&60&68&65&65&65&74\\73&60&68&68&65&65&94\\73&60&68&60&68&65&94\\76&39&39&39&39&39&75\end{bmatrix}\quad\begin{bmatrix}9&45&77&36&36&77&17\\19&35&35&35&95&35&50\\47&70&62&35&35&35&50\\47&26&35&87&87&64&58\\47&26&35&87&87&64&58\\47&7&35&87&87&35&58\\19&72&35&87&87&87&37\end{bmatrix}$$

$$\begin{bmatrix}9&10&45&36&5&10&10\\47&44&72&80&66&58&58\\47&44&72&0&66&58&58\\47&44&70&0&49&37&58\\47&44&29&65&32&38&58\\47&26&30&65&3&6&58\\19&7&40&1&1&34&7\end{bmatrix}\quad\begin{bmatrix}9&10&17&77&77&17&10\\47&44&72&66&4&6&58\\47&26&29&11&84&49&37\\19&59&3&65&65&3&66\\91&67&3&65&89&3&81\\28&55&3&65&65&65&2\\19&25&94&94&96&96&93\end{bmatrix}\quad\begin{bmatrix}9&10&10&10&10&10&10\\47&44&44&26&7&44&26\\47&26&7&71&67&86&71\\18&71&84&0&0&0&63\\51&80&0&56&63&63&22\\19&50&12&12&14&14&14\\19&7&7&7&7&7&7\end{bmatrix}$$

$$\begin{bmatrix}9&10&21&80&49&17&10\\47&26&4&22&80&6&58\\47&44&4&32&80&38&58\\47&44&70&32&80&38&58\\47&44&72&32&22&38&58\\47&44&72&32&27&38&58\\19&7&72&49&32&38&7\end{bmatrix}\quad\begin{bmatrix}9&10&10&10&10&17&45\\47&44&26&86&71&84&75\\47&26&4&1&60&55&78\\19&4&63&75&3&22&80\\67&56&8&56&22&32&80\\73&80&80&80&62&27&22\\91&12&14&14&50&14&46\end{bmatrix}\quad\begin{bmatrix}9&45&36&22&27&31&17\\19&29&11&0&0&1&34\\19&27&11&63&22&0&81\\91&69&11&22&22&0&2\\28&0&65&69&80&0&49\\28&0&65&0&0&0&32\\76&74&39&39&24&63&32\end{bmatrix}$$

$$\begin{bmatrix}9&17&85&13&83&79&10\\47&59&3&68&68&23&38\\47&29&74&65&65&74&6\\19&30&68&65&65&60&34\\19&43&60&68&68&89&53\\91&40&69&60&60&42&93\\91&43&46&46&46&30&53\end{bmatrix}\quad\begin{bmatrix}9&17&17&17&17&17&17\\19&4&87&67&67&78&78\\91&27&22&22&0&0&0\\76&22&22&0&0&0&0\\73&27&80&0&0&0&80\\73&22&22&63&22&22&24\\91&37&50&50&37&37&37\end{bmatrix}\quad\begin{bmatrix}9&10&10&10&10&10&10\\47&44&44&7&7&26&7\\47&26&7&25&80&78&66\\19&70&80&30&0&0&16\\18&43&63&32&63&1&80\\51&46&20&20&46&16&46\\19&7&7&7&7&7&7\end{bmatrix}$$

$$\begin{bmatrix}9&45&85&13&83&5&17\\19&43&3&65&8&3&34\\91&56&3&68&65&8&53\\47&50&69&68&68&93&50\\47&58&69&68&68&93&58\\47&72&69&68&68&93&58\\19&72&1&94&94&93&7\end{bmatrix}\quad\begin{bmatrix}9&10&10&10&10&10&10\\47&44&44&26&86&37&86\\47&44&26&72&48&69&32\\19&26&86&87&22&89&0\\18&87&48&80&80&1&80\\51&14&20&20&20&16&20\\19&7&7&7&7&7&7\end{bmatrix}\quad\begin{bmatrix}9&17&17&17&17&17&17\\18&71&71&71&71&71&78\\80&0&0&0&0&0&1\\40&69&8&69&69&8&96\\69&0&0&0&0&0&1\\69&1&0&1&1&1&1\\51&14&14&14&12&12&12\end{bmatrix}$$

$$\begin{bmatrix}9&10&17&56&2&17&10\\47&44&72&89&57&37&58\\47&44&72&89&57&38&58\\47&44&70&65&1&6&58\\47&44&59&65&0&6&58\\47&44&54&65&74&38&58\\19&7&54&1&57&38&7\end{bmatrix}\quad\begin{bmatrix}9&10&45&13&36&5&10\\47&44&72&89&60&53&58\\47&44&54&89&68&34&58\\47&44&54&60&60&34&58\\47&44&59&60&68&53&58\\47&44&29&60&60&53&58\\19&7&25&16&16&12&58\end{bmatrix}$$