
Information preservation with Wasserstein autoencoders: generation consistency and adversarial robustness

  • Original Paper
  • Published in: Statistics and Computing

Abstract

Amongst the numerous variants the Variational Autoencoder (VAE) has inspired, the Wasserstein Autoencoder (WAE) stands out due to its heightened generative quality and intriguing theoretical properties. WAEs consist of an encoding and a decoding network, forming a bottleneck, with the prime objective of generating new samples resembling the ones they were trained on. In the process, they aim to achieve a target latent representation of the encoded data. Our work offers a comprehensive theoretical understanding of the machinery behind WAEs. From a statistical viewpoint, we pose the problem as concurrent density estimation tasks based on neural network-induced transformations. This allows us to establish deterministic upper bounds on the realized errors WAEs commit, supported by simulations on real and synthetic data sets. We also analyze the propagation of these stochastic errors in the presence of adversaries. As a result, both the large-sample properties of the reconstructed distribution and the resilience of WAE models are explored.
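
To fix ideas, the following is a minimal sketch of a WAE objective with an MMD latent penalty, written in PyTorch. It is illustrative only and not the authors' implementation (see the repository linked under Code availability); the inverse multi-quadratic kernel, the layer widths, the Gaussian target latent law, and the regularization weight lam are assumptions made for this sketch.

    import torch
    import torch.nn as nn

    def imq_kernel(a, b, c=1.0):
        # Inverse multi-quadratic kernel k(u, v) = c / (c + ||u - v||^2),
        # a common (assumed) choice for the MMD latent penalty.
        return c / (c + torch.cdist(a, b).pow(2))

    def mmd(z_enc, z_prior):
        # Biased (V-statistic) estimate of MMD^2 between encoded codes and
        # samples drawn from the target latent distribution.
        return (imq_kernel(z_enc, z_enc).mean()
                + imq_kernel(z_prior, z_prior).mean()
                - 2.0 * imq_kernel(z_enc, z_prior).mean())

    class WAE(nn.Module):
        # Deterministic encoder and decoder forming the bottleneck.
        def __init__(self, d_in=3, d_latent=2, width=128):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                     nn.Linear(width, d_latent))
            self.dec = nn.Sequential(nn.Linear(d_latent, width), nn.ReLU(),
                                     nn.Linear(width, d_in))

        def forward(self, x):
            z = self.enc(x)
            return z, self.dec(z)

    def wae_mmd_loss(model, x, lam=0.1):
        # Reconstruction cost plus lam times the latent discrepancy,
        # here measured against a standard Gaussian target latent law.
        z, x_hat = model(x)
        recon = (x - x_hat).pow(2).sum(dim=1).mean()
        return recon + lam * mmd(z, torch.randn_like(z))

Minimizing wae_mmd_loss over minibatches with any stochastic optimizer gives a WAE-MMD setup of the kind referred to in the figures below; swapping the MMD term for a discriminator-based latent penalty yields the WAE-GAN variant.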


Notes

  1. A bounded measurable function \(h: \mathbb {R} \rightarrow \mathbb {R}\) is said to be sigmoidal if \(\lim _{x \rightarrow -\infty }h(x) = 0\) and \(\lim _{x \rightarrow \infty }h(x) = 1\). Common examples of such activation functions include logistic, hyperbolic tangent, and \(h(x) = \text {ReLU}(x) - \text {ReLU}(x-1)\).

  2. The doubling dimension is defined as \(d^{*}(X) = \log _{2}\lambda \), where \(\lambda \ge 1\) (doubling constant) is the smallest number such that at most \(\lambda \) balls of half radius are needed to cover every ball in X.

  3. See the exact form the transported density achieves under the co-area formula in McCann and Pass (2020), Section 2. In general cases concerning transformations between variables, the surplus multiplicand is rather \(\text {vol}[J_{E}(x)]:=\) the product of singular values of the \(k \times d\) Jacobian matrix \(J_{E}\) (Ben-Israel 1999); a small numerical sketch of this quantity follows these notes.

  4. Such transformations may simply be of the Rosenblatt type. Given that \(p_{\rho } \in \mathcal {C}^{m_{z}}_{R}(\Omega _{z})\) for some \(m_{z} \in \mathbb {R}_{> 0}\), E can be shown to be smooth in the sense of Hölder-Zygmund (Asatryan et al. 2023).

  5. A Lipschitz map \(f:(\mathcal {Z},c_{z}) \rightarrow (\mathcal {X},c_{x})\) is said to be regular if there exists a constant \(C>0\) such that, given any ball B in \(\mathcal {Z}\), \(f^{-1}(B)\) can be covered by at most C balls, each of radius \(C \cdot \text {rad}(B)\) (David and Semmes 2000).

  6. We follow the fasano.franceschini.test implementation (Puritz et al. 2021) in R.

  7. We implement the cramer package (Franz 2006) in R, with the underlying kernel specified as \(\kappa (z) = \sqrt{z}/2\).
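
As referenced in note 3, the matrix volume of a rectangular Jacobian is the product of its singular values. The NumPy sketch below (not part of the paper) computes it for a hypothetical \(2 \times 3\) Jacobian of full row rank.

    import numpy as np

    def matrix_volume(J):
        # vol[J] = product of the singular values of the k x d Jacobian J
        # (assuming full row rank); for square J this reduces to |det(J)|.
        return float(np.prod(np.linalg.svd(J, compute_uv=False)))

    J = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])          # hypothetical 2 x 3 Jacobian
    print(matrix_volume(J))                  # 6.708..., i.e. sqrt(det(J @ J.T))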

References

  • Ashtiani, H., Ben-David, S., Mehrabian, A.: Sample-efficient learning of mixtures. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

  • Asatryan, H., Gottschalk, H., Lippert, M., Rottmann, M.: A convenient infinite dimensional framework for generative adversarial learning. Electron. J. Stat 17(1), 391–428 (2023)


  • Acharya, J., Jafarpour, A., Orlitsky, A., Suresh, A.T.: Sorting with adversarial comparators and application to density estimation. In: 2014 IEEE International Symposium on Information Theory, pp. 1682–1686 (2014). IEEE

  • Anil, C., Lucas, J., Grosse, R.: Sorting out lipschitz function approximation. In: International Conference on Machine Learning, pp. 291–301 (2019). PMLR

  • Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39(3), 930–945 (1993)


  • Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)


  • Baringhaus, L., Franz, C.: On a new multivariate two-sample test. J. Multivar. Anal. 88(1), 190–206 (2004)


  • Ben-Israel, A.: The change-of-variables formula using matrix volume. SIAM J. Matrix Anal. Appl. 21(1), 300–312 (1999)


  • Birrell, J., Katsoulakis, M.A., Rey-Bellet, L., Zhu, W.: Structure-preserving gans. In: International Conference on Machine Learning (2022)

  • Bourgain, J.: On lipschitz embedding of finite metric spaces in hilbert space. Israel J. Math. 52, 46–52 (1985)


  • Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)


  • Bartal, Y., Recht, B., Schulman, L.J.: Dimensionality reduction: beyond the johnson-lindenstrauss bound. In: Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 868–887 (2011). SIAM

  • Caffarelli, L.A.: Monotonicity properties of optimal transportation and the fkg and related inequalities. Commun. Math. Phys. 214, 547–563 (2000)


  • Chakraborty, S., Bartlett, P.: A statistical analysis of wasserstein autoencoders for intrinsically low-dimensional data. In: The 12th International Conference on Learning Representations (2024)

  • Chakrabarty, A., Das, S.: Statistical regeneration guarantees of the wasserstein autoencoder with latent space consistency. Advances in Neural Information Processing Systems (NeurIPS) 34, 17098–17110 (2021)


  • Colombo, M., Fathi, M.: Bounds on optimal transport maps onto log-concave measures. J. Differ. Equ 271, 1007–1022 (2021)


  • Courty, N., Flamary, R., Ducoffe, M.: Learning wasserstein embeddings. In: International Conference on Learning Representations (2018)

  • Chen, M., Gao, C., Ren, Z.: A general decision theory for Huber’s \(\epsilon \)-contamination model. Electron. J. Stat. 10, 3752–3774 (2016)


  • Chen, M., Jiang, H., Liao, W., Zhao, T.: Efficient approximation of deep relu networks for functions on low dimensional manifolds. Advances in neural information processing systems 32 (2019)

  • Chen, Z., Katsoulakis, M., Rey-Bellet, L., Zhu, W.: Sample complexity of probability divergences under group symmetry. In: International Conference on Machine Learning, pp. 4713–4734 (2023). PMLR

  • Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? Journal of the ACM (JACM) 58(3), 1–37 (2011)


  • Chiappori, P.-A., McCann, R.J., Pass, B.: Multi-to one-dimensional optimal transport. Commun. Pure Appl. Math. 70(12), 2405–2444 (2017)


  • Chernodub, A., Nowicki, D.: Norm-preserving orthogonal permutation linear unit activation functions (oplu). arXiv preprint arXiv:1604.02313 (2016)

  • Caragea, A., Petersen, P., Voigtlaender, F.: Neural network approximation and estimation of classifiers with classification boundary in a barron class. arXiv preprint arXiv:2011.09363 (2020)

  • Daubechies, I., DeVore, R., Foucart, S., Hanin, B., Petrova, G.: Nonlinear approximation and (deep) relu networks. Constr. Approx. 55(1), 127–172 (2022)


  • Devroye, L., Gyorfi, L.: No empirical probability measure can converge in the total variation sense for all distributions. The Annals of Statistics, 1496–1499 (1990)

  • Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: Complexity by accelerated gradient descent is better than by sinkhorn’s algorithm. In: Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80 (2018)

  • Devroye, L., Lugosi, G.: Combinatorial Methods in Density Estimation. Springer (2001)

  • David, G., Semmes, S.: Regular mappings between dimensions. Publicacions Matematiques, 369–417 (2000)

  • Dai, B., Wipf, D.: Diagnosing and enhancing VAE models. In: International Conference on Learning Representations (2019)

  • Dai, B., Wang, Y., Aston, J., Hua, G., Wipf, D.: Connections with robust pca and the role of emergent sparsity in variational autoencoder models. J. Mach. Learn. Res. 19(1), 1573–1614 (2018)


  • Endres, D.M., Schindelin, J.E.: A new metric for probability distributions. IEEE Trans. Inf. Theory 49(7), 1858–1860 (2003)


  • Fasano, G., Franceschini, A.: A multidimensional version of the kolmogorov-smirnov test. Mon. Not. R. Astron. Soc. 225(1), 155–170 (1987)


  • Franz, C.: cramer: multivariate nonparametric cramer-test for the two-sample-problem. R package version 0.8-1 (2006)

  • Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res 13(1), 723–773 (2012)


  • Giné, E., Guillou, A.: Rates of strong uniform consistency for multivariate kernel density estimators. In: Annales de l’Institut Henri Poincare (B) Probability and Statistics, vol. 38, pp. 907–921 (2002). Elsevier

  • Gottlieb, L.-A., Kontorovich, A., Krauthgamer, R.: Efficient regression in metric spaces via approximate lipschitz extension. IEEE Trans. Inf. Theory 63(8), 4838–4849 (2017)


  • Gribonval, R., Kutyniok, G., Nielsen, M., Voigtlaender, F.: Approximation spaces of deep neural networks. Constr. Approx. 55(1), 259–367 (2022)


  • Giné, E., Nickl, R.: Mathematical Foundations of Infinite-dimensional Statistical Models. Cambridge university press (2021)

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014)

  • Huster, T., Chiang, C.-Y.J., Chadha, R.: Limitations of the lipschitz constant as a defense against adversarial examples. In: ECML PKDD 2018 Workshops. Springer (2019)

  • Husain, H., Nock, R., Williamson, R.C.: A primal-dual link between gans and autoencoders. Advances in Neural Information Processing Systems 32 (2019)

  • Hyvärinen, A., Pajunen, P.: Nonlinear independent component analysis: Existence and uniqueness results. Neural Netw. 12(3), 429–439 (1999)


  • Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Conference in modern analysis and probability 26, 189–206 (1984)


  • Jain, A., Orlitsky, A.: A general method for robust learning from batches. Adv. Neural. Inf. Process. Syst. 33, 21775–21785 (2020)


  • Jiang, Z., Zheng, Y., Tan, H., Tang, B., Zhou, H.: Variational deep embedding: An unsupervised and generative approach to clustering. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1965–1972 (2017)

  • Klusowski, J.M., Barron, A.R.: Approximation by combinations of relu and squared relu ridge functions with \(l^{1}\) and \(l^{0}\) controls. IEEE Trans. Inf. Theory 64(12), 7649–7656 (2018)


  • Khemakhem, I., Kingma, D., Monti, R., Hyvarinen, A.: Variational autoencoders and nonlinear ica: A unifying framework. In: International Conference on Artificial Intelligence and Statistics, pp. 2207–2217 (2020). PMLR

  • Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: 2nd International Conference on Learning Representations (2014)

  • LeCun, Y., Cortes, C., Burges, C.J.C.: MNIST handwritten digit database. https://yann.lecun.com/exdb/mnist/

  • Li, C.T., Farnia, F.: Mode-seeking divergences: Theory and applications to gans. In: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 206, pp. 8321–8350. PMLR (2023)

  • Liu, H., Gao, C.: Density estimation with contamination: minimax rates and theory of adaptation. Electron. J. Stat. 13, 3613–3653 (2019)


  • Lee, H., Ge, R., Ma, T., Risteski, A., Arora, S.: On the ability of neural nets to express distributions. In: Conference on Learning Theory, pp. 1271–1296 (2017). PMLR

  • Liu, T., Kumar, P., Zhou, R., Liu, X.: Learning from few samples: Transformation-invariant svms with composition and locality at multiple scales. Adv. Neural. Inf. Process. Syst. 35, 9151–9163 (2022)


  • Liu, Z., Loh, P.-L.: Robust W-GAN-based estimation under Wasserstein contamination. Information and Inference: A Journal of the IMA 12(1), 312–362 (2022)


  • Larsen, K.G., Nelson, J.: Optimality of the johnson-lindenstrauss lemma. In: 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 633–638 (2017). IEEE

  • Moriakov, N., Adler, J., Teuwen, J.: Kernel of cyclegan as a principal homogeneous space. In: International Conference on Learning Representations (2020)

  • McCann, R.J.: Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80(2), 309–323 (1995)


  • Montanelli, H., Du, Q.: New error bounds for deep relu networks using sparse grids. SIAM Journal on Mathematics of Data Science 1(1), 78–92 (2019)


  • Modeste, T., Dombry, C.: Characterization of translation invariant MMD on \(\mathbb {R}^{d}\) and connections with Wasserstein distances (2022)

  • Mahabadi, S., Makarychev, K., Makarychev, Y., Razenshteyn, I.: Nonlinear dimension reduction via outer bi-lipschitz extensions. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1088–1101 (2018)

  • McCann, R.J., Pass, B.: Optimal transportation between unequal dimensions. Arch. Ration. Mech. Anal. 238(3), 1475–1520 (2020)


  • Müller, A.: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)


  • Nickl, R., Pötscher, B.M.: Bracketing metric entropy rates and empirical central limit theorems for function classes of besov-and sobolev-type. J. Theor. Probab. 20, 177–199 (2007)


  • Peacock, J.A.: Two-dimensional goodness-of-fit testing in astronomy. Monthly Notices of the Royal Astronomical Society 202(3) (1983)

  • Puritz, C., Ness-Cohn, E., Braun, R.: fasano.franceschini.test: An implementation of a multidimensional KS test in R. arXiv preprint arXiv:2106.10539 (2021)

  • Petersen, P., Voigtlaender, F.: Optimal approximation of piecewise smooth functions using deep relu neural networks. Neural Netw. 108, 296–330 (2018)


  • Pope, P., Zhu, C., Abdelkader, A., Goldblum, M., Goldstein, T.: The intrinsic dimension of images and its impact on learning. In: International Conference on Learning Representations (2021)

  • Rolinek, M., Zietlow, D., Martius, G.: Variational autoencoders pursue pca directions (by accident). In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12406–12415 (2019)

  • Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.: On integral probability metrics, \(\phi \)-divergences and binary classification. preprint arXiv:0901.2698 (2009)

  • Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)


  • Suzuki, T.: Adaptivity of deep reLU network for learning in besov and mixed smooth besov spaces: optimal rate and curse of dimensionality. In: International Conference on Learning Representations (2019)

  • Shen, Z., Yang, H., Zhang, S.: Deep network approximation characterized by number of neurons. arXiv preprint arXiv:1906.05497 (2019)

  • Tanielian, U., Biau, G.: Approximating lipschitz continuous functions with groupsort neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 442–450 (2021). PMLR

  • Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders. In: International Conference on Learning Representations (2018)

  • Topsoe, F.: Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 46(4), 1602–1609 (2000)

  • Vaart, A.W.: Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (2000)

  • Vaart, A., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer (1996)

  • Vladimirova, M., Girard, S., Nguyen, H., Arbel, J.: Sub-weibull distributions: Generalizing sub-gaussian and sub-exponential properties to heavier tailed distributions. Stat 9(1) (2020)

  • Van Handel, R.: Probability in high dimension. Technical report, Princeton University, NJ (2014)

  • Villani, C.: Optimal Transport: Old and New vol. 338. Springer (2009)

  • Virmaux, A., Scaman, K.: Lipschitz regularity of deep neural networks: analysis and efficient estimation. Advances in Neural Information Processing Systems 31 (2018)

  • Wojtowytsch, S.: Representation formulas and pointwise properties for barron functions. Calc. Var. Partial. Differ. Equ. 61(2), 1–37 (2022)


  • Weed, J., Bach, F.R.: Sharp asymptotic and finite-sample rates of convergence of empirical measures in wasserstein distance. Bernoulli (2017)

  • Wei, R., Garcia, C., El-Sayed, A., Peterson, V., Mahmood, A.: Variations in variational autoencoders - a comparative evaluation. IEEE Access 8 (2020)

  • Yarotsky, D.: Error bounds for approximations with deep relu networks. Neural Netw. 94, 103–114 (2017)


  • Yarotsky, D.: Optimal approximation of continuous functions by very deep relu networks. In: Conference on Learning Theory, pp. 639–649 (2018). PMLR

  • Yang, Y., Li, Z., Wang, Y.: On the capacity of deep generative networks for approximating distributions. Neural networks 145 (2022)

  • Zhu, B., Jiao, J., Steinhardt, J.: Generalized resilience and robust statistics. Ann. Stat. 50(4), 2256–2283 (2022)


Download references

Author information


Corresponding author

Correspondence to Swagatam Das.

Ethics declarations

Code availability

All codes, along with implementation details, can be found in the following repository https://github.com/Thecoder1012/Decons_Wae.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 337 KB)

Experiments and Simulations

Table 1 Two-sample tests of equality on latent and encoded distributions
Fig. 17

Concentration of bin estimates corresponding to Five-Gaussian data under ReLU encoders (yellow) against target latent Beta(0.5, 0.8) copula (blue), over epochs (200, 800, 1400, 2000) (left to right from top) in a WAE-GAN setup

Fig. 18

Concentration of bin estimates corresponding to Five-Gaussian data under ReLU encoders (yellow), given a latent bivariate Gaussian distribution (blue), over epochs (200, 800, 1400, 1800) (left to right from top) in a WAE-MMD setup with regularization \(\lambda =0.1\)

Fig. 19

Concentration of bin estimates (yellow) against latent Gaussian distribution (blue) and corresponding QQ plots of marginals (upper), for epochs 500 (top row) and 4000 (bottom row) for Swiss roll data. Evidently, the encoded distribution preserves information from the input data and matches the target marginals simultaneously

Fig. 20

Actual Swiss roll data (top left) vs reconstructed samples (\(n=10000\)) after epochs (1000, 4000, 8000) (clockwise from top right) under MMD latent loss

Fig. 21

Evolution of information preservation over epochs (0, 200, 1000, 1800) (clockwise) based on the propagation of quantile-quantile (QQ) plots of marginals corresponding to encoded (blue) vs latent distribution (red) under ReLU encoders given Five-Gaussian input data, in a WAE-MMD setup with regularization \(\lambda =0.1\)

1.1 Encoded vs. Latent Distributions

To statistically assess the efficacy of a WAE encoding, we perform two-sample non-parametric tests of equality on target latent and encoded observations. Peacock (1983) suggested a multi-dimensional generalization of the well-known Kolmogorov–Smirnov (KS) test, which, however, has high computational complexity. In our study, we identify the test suggested by Fasano and Franceschini (1987) (FF) as a suitable alternative, based on its manageable complexity without sacrificing power and consistency (see note 6). Many suggestions have since been made to improve the reliability of two-sample tests in higher dimensions; however, given its ease of implementation, we use FF's version of the KS test. To ascertain our findings, we additionally carry out a referral test of equality of distributions based on kernelized distances between pairs of observations, namely the Cramér test (Baringhaus and Franz 2004). Unlike FF, the Cramér test statistic (see note 7) is not distribution-free and requires bootstrapping to obtain the p-value. We utilize two such methods: the usual Monte-Carlo (MC) approach, and computing approximate eigenvalues (EV) to serve as weights for the limiting distribution of the statistic. Test results on the Five-Gaussian data at the \(5\%\) level of significance, given \(\lambda = 0.8\), are reported in Table 1.
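
For reference, the sketch below (NumPy/SciPy, illustrative only) computes the Cramér statistic with the kernel \(\kappa (z) = \sqrt{z}/2\) applied to squared Euclidean distances, together with a simple Monte-Carlo resampling p-value. The results reported here were obtained with the R packages cited above, so the function names and the resampling scheme in this sketch are assumptions.

    import numpy as np
    from scipy.spatial.distance import cdist

    def cramer_statistic(x, y, kappa=lambda z: np.sqrt(z) / 2.0):
        # Cramer (Baringhaus-Franz) two-sample statistic with kernel kappa
        # applied to squared Euclidean distances; kappa(z) = sqrt(z)/2
        # recovers the original Euclidean-distance form of the test.
        m, n = len(x), len(y)
        dxy = kappa(cdist(x, y, metric="sqeuclidean"))
        dxx = kappa(cdist(x, x, metric="sqeuclidean"))
        dyy = kappa(cdist(y, y, metric="sqeuclidean"))
        return m * n / (m + n) * (2.0 * dxy.mean() - dxx.mean() - dyy.mean())

    def mc_pvalue(x, y, n_resample=1000, seed=0):
        # Monte-Carlo approximation of the null distribution by reshuffling
        # group labels over the pooled sample.
        rng = np.random.default_rng(seed)
        pooled, m = np.vstack([x, y]), len(x)
        t_obs, exceed = cramer_statistic(x, y), 0
        for _ in range(n_resample):
            idx = rng.permutation(len(pooled))
            exceed += cramer_statistic(pooled[idx[:m]], pooled[idx[m:]]) >= t_obs
        return (exceed + 1) / (n_resample + 1)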

The test results corroborate our theoretical findings. Though we only establish an upper bound on the latent loss, it is apparent that an optimization error remains due to the minimum distance estimate. As such, under a metrizing measure of discrepancy, the target latent law and the encoded estimate must be distinct in distribution. The rejection of the null hypothesis reiterates this observation.

This becomes even clearer when histograms of samples from the two distributions are overlaid. The interesting observation from Figs. 17 and 18 is the visual manifestation of information preservation. Semantic information, in the form of cluster structures originally present in the data set, remains intact in the encoded distributions even as they maximize similarity with their target counterparts. The evolution of this 'maximization' is clear from the histograms obtained over epochs. Another viewpoint that attests to this finding is the quantile-quantile plot of the marginals (Fig. 21).
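
For completeness, marginal QQ plots of the kind shown in Fig. 21 only require sorting the two marginal samples; a minimal matplotlib sketch (illustrative, assuming equal sample sizes) follows.

    import numpy as np
    import matplotlib.pyplot as plt

    def qq_marginal(z_enc, z_lat, dim=0):
        # Empirical quantiles of one encoded marginal plotted against the
        # corresponding target-latent marginal; points near y = x indicate
        # agreement of the two marginals.
        q_enc, q_lat = np.sort(z_enc[:, dim]), np.sort(z_lat[:, dim])
        plt.scatter(q_lat, q_enc, s=4)
        lo, hi = q_lat.min(), q_lat.max()
        plt.plot([lo, hi], [lo, hi], color="red")   # reference line y = x
        plt.xlabel("target latent quantiles")
        plt.ylabel("encoded quantiles")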

Fig. 22

Sample corrected (\(\times n^{\frac{1}{3}}\)) Wasserstein reconstruction error corresponding to WAE-MMD for Five-Gaussian input data using (a) a decoder that follows the architecture of Theorem 5.2, and (b) one that violates the width criteria therein, having a comparable number of parameters. (c) Regenerated sample from the latter model after 4000 epochs. The second model does not exhibit accurate reconstruction and the associated errors follow a much slower convergence rate in the process

Fig. 23

Reconstruction error of Five-Gaussian data under (a) JS and (b), (c) sample corrected (\(\times n^{\frac{1}{2}}\)) MMD latent loss, using GroupSort encoders (grouping 2)

Fig. 24

Reconstructed samples (\(n=10000\)) from Five-Gaussian dataset with \(10\%\) observations contaminated at level 0.2, under MMD latent loss. The corrupting distribution remains standard tri-variate Cauchy

Fig. 25

Reconstructed samples (\(n=2000\)) from Swiss roll dataset with \(10\%\) observations contaminated at level 0.01 & 0.1 (left to right) and \(20\%\) observations contaminated at level 0.1, under MMD latent loss. The corrupting distribution is taken to be standard tri-variate Cauchy

Fig. 26

Reconstructed samples (\(n=10,000\)) from Five-Gaussian dataset with \(10\%\) observations contaminated at level 0.2, under MMD latent loss. The corrupting distribution is taken to be standard tri-variate Cauchy

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chakrabarty, A., Basu, A. & Das, S. Information preservation with Wasserstein autoencoders: generation consistency and adversarial robustness. Stat Comput 35, 114 (2025). https://doi.org/10.1007/s11222-025-10657-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Version of record:

  • DOI: https://doi.org/10.1007/s11222-025-10657-z

Keywords