Abstract
Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field. However, when applying GANs to the remote sensing (RS) image generation task, an extraordinary phenomenon is observed: the GAN model is more sensitive to the amount of training data for RS image generation than for natural image generation (Fig. 1). In other words, the generation quality of RS images changes significantly with the number of training categories or samples per category. In this paper, we first analyze this phenomenon through two kinds of toy experiments and conclude that the amount of feature information contained in the GAN model decreases as the training data shrink (Fig. 2). We then establish a structural causal model (SCM) of the data generation process and interpret the generated data as counterfactuals. Based on this SCM, we theoretically prove that the quality of generated images is positively correlated with the amount of feature information. This provides insights for enriching the feature information learned by the GAN model during training. Consequently, we propose two innovative adjustment schemes, namely uniformity regularization and entropy regularization, which increase the information learned by the GAN model at the distributional and sample levels, respectively. Extensive experiments on eight RS datasets and three natural datasets show the effectiveness and versatility of our methods. The source code is available at https://github.com/rootSue/Causal-RSGAN.
Data Availability
The benchmark datasets can be downloaded from the literature cited in Sect. 6.1.
References
Abady, L., Barni, M., Garzelli, A., & Tondi, B. (2020). GAN generation of synthetic multispectral satellite images. In Image and signal processing for remote sensing XXVI (Vol. 11533, pp. 122–133). SPIE.
Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv:1701.07875
Ashfaq, Q., Akram, U., & Zafar, R. (2021). Thermal image dataset for object classification. Mendeley Data 1.
Aybar, C., Ysuhuaylas, L., Loja, J., Gonzales, K., Herrera, F., Bautista, L., Yali, R., Flores, A., Diaz, L., Cuenca, N., et al. (2022). Cloudsen12, a global dataset for semantic understanding of cloud and cloud shadow in sentinel-2. Scientific data, 9(1), 782.
Bejiga, M. B., Hoxha, G., & Melgani, F. (2020). Improving text encoding for retro-remote sensing. IEEE Geoscience and Remote Sensing Letters, 18(4), 622–626.
Bell-Kligler, S., Shocher, A., & Irani, M. (2019). Blind super-resolution kernel estimation using an internal-gan. Advances in Neural Information Processing Systems, 32.
Bińkowski, M., Sutherland, DJ., Arbel, M., & Gretton, A. (2018). Demystifying MMD GANs. In International conference on learning representations.
Borodachov, S. V., Hardin, D. P., & Saff, E. B. (2019). Discrete energy on rectifiable sets. Springer.
Brock, A., Donahue, J., & Simonyan, K. (2018). Large scale gan training for high fidelity natural image synthesis. In International conference on learning representations
Chen, H., Li, W., & Shi, Z. (2021). Adversarial instance augmentation for building change detection in remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–16.
Chen, X., Chen, S., Xu, T., Yin, B., Peng, J., Mei, X., & Li, H. (2020). Smapgan: Generative adversarial network-based semisupervised styled map tile generation method. IEEE Transactions on Geoscience and Remote Sensing, 59(5), 4388–4406.
Choi, J., Kim, T., & Kim, C. (2019). Self-ensembling with gan-based data augmentation for domain adaptation in semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, (pp. 6830–6840).
Esser, P., Rombach, R., & Ommer, B. (2021). Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (pp. 12873–12883)
Gao, F., Liu, Q., Sun, J., Hussain, A., & Zhou, H. (2019). Integrated GANs: Semi-supervised SAR target recognition. IEEE Access, 7, 113999–114013.
Gong, C., Han, J., & Lu, X. (2017). Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 105(10), 1865–1883.
Goodfellow, IJ., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Proceedings of the 27th international conference on neural information processing systems, (Vol. 2, pp. 2672–2680). MIT Press, NIPS’14.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of wasserstein gans. Advances in neural information processing systems, 30.
Gulrajani, I., Raffel, C., & Metz, L. (2018). Towards gan benchmarks which require generalization. In International conference on learning representations
He, J., Shi, W., Chen, K., Fu, L., & Dong, C. (2022). Gcfsr: a generative and controllable face super resolution method without facial and GAN priors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (pp. 1889–1898).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hughes, M. J., & Hayes, D. J. (2014). Automated detection of cloud and cloud shadow in single-date landsat imagery using neural networks and spatial post-processing. Remote Sensing, 6(6), 4907–4926.
Jiang, K., Wang, Z., Yi, P., Wang, G., Lu, T., & Jiang, J. (2019). Edge-enhanced GAN for remote sensing image superresolution. IEEE Transactions on Geoscience and Remote Sensing, 57(8), 5799–5812.
Jiang, L., Dai, B., Wu, W., & Loy, C. C. (2021). Deceive D: Adaptive pseudo augmentation for GAN training with limited data. Advances in Neural Information Processing Systems, 34, 21655–21667.
Jiang, Y., Chang, S., & Wang, Z. (2021). Transgan: Two pure transformers can make one strong GAN, and that can scale up. Advances in Neural Information Processing Systems, 34, 14745–14758.
Jolicoeur-Martineau, A. (2018). The relativistic discriminator: A key element missing from standard gan. arXiv:1807.00734
Kang, M., & Park, J. (2020). Contragan: Contrastive learning for conditional image generation. Advances in Neural Information Processing Systems, 33, 21357–21369.
Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (pp. 4401–4410).
Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33, 12104–12114.
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020b). Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (pp. 8110–8119).
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., & Wang, Z., et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 4681–4690)
Li, L., Li, P., Yang, M., & Gao, S. (2019). Multi-branch semantic GAN for infrared image generation from optical image. In Z. Cui, J. Pan, S. Zhang, L. Xiao, & J. Yang (Eds.), Intelligence science and big data engineering. Visual data engineering (pp. 484–494). Springer.
Lin, D., Fu, K., Wang, Y., Xu, G., & Sun, X. (2017). MARTA GANs: Unsupervised representation learning for remote sensing image classification. IEEE Geoscience and Remote Sensing Letters, 14(11), 2092–2096. https://doi.org/10.1109/LGRS.2017.2752750
Liu, B., Zhu, Y., Song, K., & Elgammal, A. (2020) Towards faster and stabilized GAN training for high-fidelity few-shot image synthesis. In International conference on learning representations.
Liu, MY., & Tuzel, O. (2016). Coupled generative adversarial networks. Advances in Neural Information Processing Systems, 29.
Long, Y., Gong, Y., Xiao, Z., & Liu, Q. (2017). Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 55(5), 2486–2498.
Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
Mao, X., Li, Q., Xie, H., Lau, RY., Wang, Z., & Paul Smolley, S. (2017) Least squares generative adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2794–2802).
Mescheder, L., Geiger, A., & Nowozin, S. (2018) Which training methods for gans do actually converge? In International conference on machine learning (pp. 3481–3490). PMLR.
Miyato, T., Kataoka, T., Koyama, M., & Yoshida, Y. (2018) Spectral normalization for generative adversarial networks. arXiv:1802.05957
mnmoustafa, MA. (2017) Tiny imagenet. https://kaggle.com/competitions/tiny-imagenet
Mohajerani, S., & Saeedi, P. (2019). Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery. In IGARSS 2019—2019 IEEE international geoscience and remote sensing symposium (pp. 1029–1032). https://doi.org/10.1109/IGARSS.2019.8898776
Park, T., Liu, MY., Wang, TC., & Zhu, JY. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2337–2346).
Patashnik, O., Wu, Z., Shechtman, E., Cohen-Or, D., & Lischinski, D. (2021) Styleclip: Text-driven manipulation of stylegan imagery. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2085–2094).
Ranzato, M., Mnih, V., & Hinton, G. E. (2010). Generating more realistic images using gated MRF’s. Advances in Neural Information Processing Systems, 23.
Rui, X., Cao, Y., Yuan, X., Kang, Y., & Song, W. (2021). Disastergan: Generative adversarial networks for remote sensing disaster image generation. Remote Sensing. https://doi.org/10.3390/rs13214284
Shahbazi, M., Danelljan, M., Paudel, DP., & Gool, LV. (2022) Collapse by conditioning: Training class-conditional GANs with limited data. In International conference on learning representations
Su, X., Lin, Y., Zheng, Q., Wu, F., Zheng, C., & Zhao, J. (2022). GSGAN: Learning controllable geospatial images generation. IET Image Processing.
Suo, J., Wang, T., Zhang, X., Chen, H., Zhou, W., & Shi, W. (2023). Hit-uav: A high-altitude infrared thermal dataset for unmanned aerial vehicle-based object detection. Scientific Data, 10(1), 227.
Thomas, M., & Joy, A. T. (2006). Elements of information theory. Wiley.
Tseng, HY., Jiang, L., Liu, C., Yang, MH., & Yang, W. (2021) Regularizing generative adversarial networks under limited data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7921–7931).
Wang, SY., Bau, D., & Zhu, JY. (2021) Sketch your own GAN. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14050–14060).
Wang, Y., Wang, C., Zhang, H., Dong, Y., & Wei, S. (2019). A SAR dataset of ship detection for deep learning under complex backgrounds. Remote Sensing, 11(7), 765.
Webster, R., Rabin, J., Simon, L., & Jurie, F. (2019) Detecting overfitting of deep generative networks via latent recovery. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11273–11282)
Wei, S., Zeng, X., Qu, Q., Wang, M., Su, H., & Shi, J. (2020). HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access, 8, 120234–120254.
Wei, Y., Luo, X., Hu, L., Peng, Y., & Feng, J. (2020). An improved unsupervised representation learning generative adversarial network for remote sensing image scene classification. Remote Sensing Letters, 11(6), 598–607.
Xia, G. S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., & Lu, X. (2017). Aid: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience And Remote Sensing, 55(7), 3965–3981.
Xiong, Y., Guo, S., Chen, J., Deng, X., Sun, L., Zheng, X., & Xu, W. (2020). Improved SRGAN for remote sensing image super-resolution across locations and sensors. Remote Sensing, 12(8), 1263.
Xu, L., & Jordan, M. I. (1996). On convergence properties of the EM algorithm for gaussian mixtures. Neural Computation, 8(1), 129–151.
Xu, Q., Huang, G., Yuan, Y., Guo, C., Sun, Y., Wu, F., & Weinberger, KQ. (2018a) An empirical study on evaluation metrics of generative adversarial networks. arXiv:1806.07755
Xu, Y., Du, B., & Zhang, L. (2018b) Can we generate good samples for hyperspectral classification?-a generative adversarial network based method. In IGARSS 2018-2018 IEEE international geoscience and remote sensing symposium (pp. 5752–5755). IEEE.
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., & Huang, TS. (2018) Generative image inpainting with contextual attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5505–5514).
Yu, Y., Li, X., & Liu, F. (2019). Attention GANs: Unsupervised deep feature learning for aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 58(1), 519–531.
Zeng, Y., Lin, Z., Lu, H., & Patel, VM. (2021) Cr-fill: Generative image inpainting with auxiliary contextual reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14164–14173).
Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019) Self-attention generative adversarial networks. In International conference on machine learning (pp. 7354–7363). PMLR.
Zhao, B., Zhong, Y., Xia, G. S., & Zhang, L. (2015). Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 54(4), 2108–2123.
Zhao, B., Zhang, S., Xu, C., Sun, Y., & Deng, C. (2021). Deep fake geography? when geospatial data encounter artificial intelligence. Cartography and Geographic Information Science, 48(4), 338–352.
Zhao, S., Liu, Z., Lin, J., Zhu, J. Y., & Han, S. (2020). Differentiable augmentation for data-efficient GAN training. Advances in Neural Information Processing Systems, 33, 7559–7570.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1452–1464.
Zhou, W., Newsam, S., Li, C., & Shao, Z. (2018). Patternnet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS Journal of Photogrammetry and Remote Sensing, 145, 197–209.
Zimmermann, RS., Sharma, Y., Schneider, S., Bethge, M., & Brendel, W. (2021). Contrastive learning inverts the data generating process. In International conference on machine learning (pp. 12979–12990). PMLR.
Acknowledgements
The authors would like to acknowledge the support and collaborative efforts of the project team. Special thanks to Chenchen Li for helping to organize the data. This work was supported by the Postdoctoral Fellowship Program of CPSF (GZB20230790), the China Postdoctoral Science Foundation (2023M743639), the CAS Project for Young Scientists in Basic Research, Grant No. YSBR-040, and the Special Research Assistant Fund (E3YD590101) of the Chinese Academy of Sciences.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Communicated by Ming-Hsuan Yang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Proofs and Theoretical Analysis
1.1 A.1 Proof of Theorem 1
The global minimum of \(\mathcal {L}_{\text{ AlignMax }}\) is achieved when the first term (the distance between the real and generated distributions) is minimized (i.e., equal to zero) and the second term (the entropy) is maximized. The uniform distribution on \((0, 1)^{d_c}\) attains the maximum entropy.
Step 1. First, we show that there exists a smooth function \(\textbf{g}_{*}: \mathcal {X} \rightarrow (0, 1)^{d_c}\) which attains the global minimum of \(\mathcal {L}_{\text{ AlignMax }}\). Consider the function \(\textbf{f}^{-1}_{1:{d_c}}: \mathcal {X}\rightarrow \mathcal {C}\), i.e., the inverse of the true mixing \(\textbf{f}\), restricted to its first \(d_c\) dimensions; by definition, \(\textbf{f}^{-1}(\textbf{x})_{1:{d_c}} = \textbf{c}\).
Then we construct a function \(\textbf{k}:\mathcal {C} \rightarrow (0, 1)^{d_c}\) which maps \(\textbf{c}\) to a uniform random variable on \((0, 1)^{d_c}\) using a recursive method known as the Darmois construction.
Specifically, we define
\[ k_{i}(\textbf{c}) := F_{i}\left( c_{i} \mid \textbf{c}_{1: i-1}\right) , \quad i = 1, \ldots , d_{c}, \]
where \(F_i\) denotes the conditional cumulative distribution function (CDF) of \(c_i\) given \(\textbf{c}_{1: i-1}\). By construction, \(\textbf{k}(\textbf{c})\) is uniformly distributed on \((0, 1)^{d_c}\). Moreover, \(\textbf{k}\) is smooth by the assumption that \(p_\textbf{z}\) (and hence \(p_\textbf{c}\)) is a smooth density. Finally, we define
\[ \textbf{g}_{*} := \textbf{k} \circ \textbf{f}^{-1}_{1:d_c}: \mathcal {X} \rightarrow (0,1)^{d_{c}}, \]
which is a smooth function since it is a composition of two smooth functions.
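As a concrete illustration of the Darmois construction (this example is ours and is not part of the proof), consider a bivariate Gaussian content variable, for which the conditional CDFs are available in closed form; applying them component-wise yields samples that are uniform on \((0,1)^2\):

```python
import numpy as np
from scipy.stats import norm

# Darmois construction for c = (c1, c2) ~ N(0, Sigma) with unit variances
# and correlation rho: k1(c) = F1(c1) and k2(c) = F2(c2 | c1), where
# c2 | c1 ~ N(rho * c1, 1 - rho^2). The output k(c) is uniform on (0,1)^2.
rng = np.random.default_rng(0)
rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])
c = rng.multivariate_normal([0.0, 0.0], Sigma, size=10_000)

u1 = norm.cdf(c[:, 0])                                   # F_1(c_1)
u2 = norm.cdf(c[:, 1], loc=rho * c[:, 0],                # F_2(c_2 | c_1)
              scale=np.sqrt(1.0 - rho**2))

u = np.stack([u1, u2], axis=1)
# Empirically uniform: mean ~ 0.5, variance ~ 1/12, coordinates uncorrelated.
print(u.mean(axis=0), u.var(axis=0), np.corrcoef(u.T)[0, 1])
```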
Next, we prove that \(\textbf{g}_{*}\) attains the global minimum of \(\mathcal {L}_{\text{ AlignMax }}\). Using \(\textbf{f}^{-1}(\textbf{x})_{1:{d_c}} = \textbf{c}\) and \(\textbf{f}^{-1}(\widetilde{\textbf{x}})_{1:{d_c}} = \widetilde{\textbf{c}}\), we have
where in the last step we use the fact that \(\textbf{c} = \widetilde{\textbf{c}}\) almost surely with respect to the ground-truth generative process described in Fig. 4. As a result, the first term is zero. Moreover, since \(\textbf{k}(\textbf{c})\) is uniformly distributed on \((0, 1)^{d_c}\) and the uniform distribution on the unit hypercube has zero differential entropy, the second term is also zero.
Next, let \(\textbf{g}:\mathcal {X} \rightarrow (0, 1)^{d_c}\) be any smooth function that attains the global minimum of \(\mathcal {L}_{\text{ AlignMax }}\), i.e.,
Define \(\textbf{h} := \textbf{g} \circ \textbf{f}: \mathcal {Z} \rightarrow (0,1)^{d_{c}}\), which is smooth because both \(\textbf{g}\) and \(\textbf{f}\) are smooth. Given \(\textbf{x}=\textbf{f}(\textbf{z})\), we obtain
The first equation suggests that \(\hat{\textbf{c}}=\textbf{h}(\textbf{z})_{1: d_{c}}= \textbf{h}(\tilde{\textbf{z}})_{1: d_{c}}\) must hold (almost surely with respect to p), and the second equation suggests that \(\hat{\textbf{c}}=\textbf{h}(\textbf{z})\) must be uniformly distributed on \((0,1)^{d_c}\).
Step 2. Next, we show that \(\textbf{h}(\textbf{z}) = \textbf{h}(\textbf{c}, \mathbf {\epsilon })\) can only depend on the true content \(\textbf{c}\) and not on any of the noise variables \(\mathbf {\epsilon }\).
Suppose for a contradiction that \(\textbf{h}_{c}(\textbf{c}, \mathbf {\epsilon }):=\textbf{h}(\textbf{c}, \mathbf {\epsilon })_{1: d_{c}}=\textbf{h}(\textbf{z})_{1: d_{c}}\) depends on some component of the noise variable \(\mathbf {\epsilon }\):
\[ \exists \, l \in \{1, \ldots , d_{\epsilon }\}, \; \textbf{z}^{*}=\left( \textbf{c}^{*}, \mathbf {\epsilon }^{*}\right) \in \mathcal {Z}=\mathcal {C} \times \mathcal {E} \quad \text {s.t.} \quad \frac{\partial \textbf{h}_{c}}{\partial {\epsilon }_{l}}\left( \textbf{z}^{*}\right) \ne 0, \tag{12} \]
that is, we assume that the partial derivative of \(\textbf{h}_c\) with respect to some noise variable \({\epsilon }_l\) is non-zero at some point \(\textbf{z}^{*}=\left( \textbf{c}^{*}, \mathbf {\epsilon }^{*}\right) \in \mathcal {Z}=\mathcal {C} \times \mathcal {E}\).
Since \(\textbf{h}\) is smooth, so is \(\textbf{h}_c\). Therefore, \(\textbf{h}_c\) has continuous (first) partial derivatives. By continuity of the partial derivative, \(\frac{\partial \textbf{h}_{c}}{\partial {\epsilon }_{l}}\) must be non-zero in a neighbourhood of \((\textbf{c}^{*},\mathbf {\epsilon }^{*})\), i.e.,
\[ \exists \, \eta > 0 \quad \text {s.t.} \quad {\epsilon }_{l} \mapsto \textbf{h}_{c}\left( \textbf{c}^{*}, \mathbf {\epsilon }_{-l}^{*}, {\epsilon }_{l}\right) \text { is strictly monotonic on } \left( {\epsilon }_{l}^{*}-\eta , {\epsilon }_{l}^{*}+\eta \right) , \tag{13} \]
where \(\mathbf {\epsilon }_{-l} \in \mathcal {E}_{-l}\) denotes the vector of remaining noise variables except \({\epsilon }_l\).
Next, define the auxiliary function \(\psi : \mathcal {C} \times \mathcal {E} \times \mathcal {E} \rightarrow \mathbb {R}_{\ge 0}\) as follows:
\[ \psi \left( \textbf{c}, \mathbf {\epsilon }, \tilde{\mathbf {\epsilon }}\right) := \left| \textbf{h}_{c}(\textbf{c}, \mathbf {\epsilon })-\textbf{h}_{c}(\textbf{c}, \tilde{\mathbf {\epsilon }})\right| . \]
To obtain a contradiction to the invariance condition under the assumption in Eq. (12), it remains to show that \(\psi \) is strictly positive with probability greater than zero (w.r.t. p).
First, the strict monotonicity from Eq. (13) implies
\[ \psi \left( \textbf{c}^{*}, \left( \mathbf {\epsilon }_{-l}^{*}, {\epsilon }_{l}\right) , \left( \mathbf {\epsilon }_{-l}^{*}, \widetilde{\epsilon }_{l}\right) \right) >0 \quad \forall \left( {\epsilon }_{l}, \widetilde{\epsilon }_{l}\right) \in \left( {\epsilon }_{l}^{*}-\eta , {\epsilon }_{l}^{*}\right) \times \left( {\epsilon }_{l}^{*}, {\epsilon }_{l}^{*}+\eta \right) . \tag{14} \]
Note that in order to obtain the strict inequality in Eq. (14), it is important that \({\epsilon }_l\) and \(\widetilde{\epsilon }_l\) take values in disjoint open subsets of the interval \(({\epsilon }_{l}^{*}-\eta , {\epsilon }_{l}^{*}+\eta )\) from Eq. (13).
Since \(\psi \) is a composition of continuous functions (the absolute value of the difference of two continuous functions), it is also continuous. Pre-images of open sets under a continuous function are open; hence the pre-image of the open set \(\mathbb {R}_{>0}\) under \(\psi \),
\[ \mathcal {U} := \psi ^{-1}\left( \mathbb {R}_{>0}\right) \subseteq \mathcal {C} \times \mathcal {E} \times \mathcal {E}, \]
is an open set in the domain of \(\psi \) on which \(\psi \) is strictly positive.
Moreover, due to Eq. (14), the points \(\left( \textbf{c}^{*}, (\mathbf {\epsilon }_{-l}^{*}, {\epsilon }_{l}), (\mathbf {\epsilon }_{-l}^{*}, \widetilde{\epsilon }_{l})\right) \) with \(({\epsilon }_{l}, \widetilde{\epsilon }_{l})\) as above belong to \(\mathcal {U}\), so \(\mathcal {U}\) is non-empty.
Next, by assumption (iii), there exists at least one subset \(A \subseteq \{1, \ldots , d_{\epsilon }\}\) of changing noise variables such that \(l \in A\) and \(p_A(A) > 0\); pick one such subset and call it A.
Then, also by assumption (iii), for any \(\mathbf {\epsilon }_A \in \mathcal {E}_A\), there is an open subset \(\mathcal {O}(\mathbf {\epsilon }_A) \subseteq \mathcal {E}_A\) containing \(\mathbf {\epsilon }_A\), such that \(p_{\tilde{\mathbf {\epsilon }}_{A} \mid \mathbf {\epsilon }_{A}}\left( \cdot \mid \mathbf {\epsilon }_{A}\right) >0\) within \(\mathcal {O}\left( \mathbf {\epsilon }_{A}\right) \).
Define the following space
and, recalling that \(A^{\textrm{c}}=\left\{ 1, \ldots , d_{\epsilon }\right\} \backslash A\) denotes the complement of A, define
which is a topological subspace of \(\mathcal {C} \times \mathcal {E} \times \mathcal {E}\).
By assumptions (ii) and (iii), \(p_\textbf{z}\) is smooth and fully supported, and \(p_{\tilde{\mathbf {\epsilon }}_{A} \mid \mathbf {\epsilon }_{A}}\left( \cdot \mid \mathbf {\epsilon }_{A}\right) \) is smooth and fully supported on \(\mathcal {O}(\mathbf {\epsilon }_A)\) for any \(\mathbf {\epsilon }_A \in \mathcal {E}_A\). Therefore, the measure \(\mu _{\left( \textbf{c}, \mathbf {\epsilon }_{A^{\textrm{c}}}, \mathbf {\epsilon }_{A}, \tilde{\mathbf {\epsilon }}_{A}\right) \mid A}\) has a fully supported, strictly positive density on \(\mathcal {R}\); in other words, \(p_{\textbf{z}} \times p_{\tilde{\mathbf {\epsilon }}_{A} \mid \mathbf {\epsilon }_{A}}\) is strictly positive on \(\mathcal {R}\).
Now consider the intersection \(\mathcal {U}\cap \mathcal {R}\) of the open set \(\mathcal {U}\) with the topological subspace \(\mathcal {R}\). Since \(\mathcal {U}\) is open, by definition of topological subspaces, the intersection \(\mathcal {U}\cap \mathcal {R} \subseteq \mathcal {R}\) is open in \(\mathcal {R}\) and thus has the same dimension as \(\mathcal {R}\) if non-empty.
Moreover, since \(\mathcal {O}\left( \mathbf {\epsilon }_{A}^{*}\right) \) is an open set containing \(\mathbf {\epsilon }_{A}^{*}\), there exists \(\eta ^{\prime }>0\) such that \(\left\{ \mathbf {\epsilon }_{-l}^{*}\right\} \times \left( {\epsilon }_{l}^{*}-\eta ^{\prime }, {\epsilon }_{l}^{*}\right) \subset \mathcal {O}\left( \mathbf {\epsilon }_{A}^{*}\right) \). Thus, for \(\eta ^{\prime \prime }=\min \left( \eta , \eta ^{\prime }\right) >0\),
In particular, this implies that
Now, since \(\eta ^{\prime \prime } \le \eta \), the LHS of Eq. (16) is also in \(\mathcal {U}\), so the intersection \(\mathcal {U} \cap \mathcal {R}\) is non-empty.
In summary, the intersection \(\mathcal {U} \cap \mathcal {R} \subseteq \mathcal {R}\):

- is non-empty, since both \(\mathcal {U}\) and \(\mathcal {R}\) contain the LHS of Eq. (16);

- is an open subset of the topological subspace \(\mathcal {R}\) of \(\mathcal {C} \times \mathcal {E} \times \mathcal {E}\), since it is the intersection of an open set, \(\mathcal {U}\), with \(\mathcal {R}\);

- satisfies \(\psi >0\), since this holds for all of \(\mathcal {U}\);

- is fully supported w.r.t. the generative process, since this holds for all of \(\mathcal {R}\).
As a consequence,
\[ \mathbb {P}\left[ \psi \left( \textbf{c}, \mathbf {\epsilon }, \tilde{\mathbf {\epsilon }}\right) >0 \mid A\right] >0, \]
where \(\mathbb {P}\) denotes probability w.r.t. the true generative process p. Since \(p_A(A)>0\), this is a contradiction to the invariance in Eq. (11).
Hence, \(\textbf{h}_c(\textbf{c},\mathbf {\epsilon })\) does not depend on any noise variable \({\epsilon }_l\). It is thus only a function of \(\textbf{c}\), i.e., \(\hat{\textbf{c}} = \textbf{h}_c(\textbf{c})\).
Step 3. Finally, we show that the mapping \(\hat{\textbf{c}} = \textbf{h}(\textbf{c})\) is invertible. We use the following result from Zimmermann et al. (2021).
Proposition 1
Let \(\mathcal {M}\), \(\mathcal {N}\) be simply connected and oriented \(\mathcal {C}^{1}\) manifolds without boundaries and \(h: \mathcal {M} \rightarrow \mathcal {N}\) be a differentiable map. Further, let the random variable \(\textbf{z} \in \mathcal {M}\) be distributed according to \(\textbf{z} \sim p(\textbf{z})\) for a regular density function p, i.e., \(0<p<\infty \). If the pushforward \(p_{\# h}(\textbf{z})\) of p through h is also a regular density, i.e., \(0<p_{\# h}<\infty \), then h is a bijection.
We apply this result to the simply connected and oriented \(\mathcal {C}^{1}\) manifolds without boundaries \(\mathcal {M}=\mathcal {C}\) and \(\mathcal {N}=(0,1)^{d_{c}}\), and the smooth (hence, differentiable) map \(\textbf{h}: \mathcal {C} \rightarrow (0,1)^{d_{c}}\) which maps the random variable \(\textbf{c}\) to a uniform random variable \(\hat{\textbf{c}}\) (as established in Eq. 11).
Since both \(p_{\textbf{c}}\) (by assumption) and the uniform distribution (the pushforward of \(p_{\textbf{c}}\) through \(\textbf{h}\)) are regular densities in the sense of Proposition 1, we conclude that \(\textbf{h}\) is a bijection, i.e., invertible.
We have shown that for any smooth \(\textbf{g}: \mathcal {X} \rightarrow (0,1)^{d_{c}}\) which minimises \(\mathcal {L}_{\text{ AlignMax }}\), we have that \(\hat{\textbf{c}}=\textbf{g}(\textbf{x})=\textbf{h}(\textbf{c})\) for a smooth and invertible \(\textbf{h}: \mathcal {C} \rightarrow (0,1)^{d_{c}}\), i.e., \(\textbf{c}\) is block-identified by \(\textbf{g}\).
1.2 A.2 Proof of Theorems 2 and 3
In this section, we present the proofs of Theorems 2 and 3 from Sect. 4.1 of the main paper. These two theorems describe the deep relation between the Gaussian kernel \(K:S^d {\times } S^d\ {\rightarrow }\ \mathbb {R}\) and the uniform distribution on the unit hypersphere \(S^d\). As we show below, these properties follow directly from well-known results on strictly positive definite kernels.
Definition 2
(Strict positive definiteness). A symmetric and lower semi-continuous kernel K on \(A {\times } A\) (where A is infinite and compact) is called strictly positive definite if for every finite signed Borel measure \(\mu \) supported on A whose energy
\[ I_{K}[\mu ] := \int _{A} \int _{A} K(x, y) \,\textrm{d}\mu (x) \,\textrm{d}\mu (y) \]
is well defined, we have \(I_{K}[{\mu }]\ {\ge }\ 0\), where equality holds only if \({\mu }\ {\equiv }\ 0\) on the \(\sigma \)-algebra of Borel subsets of A.
Definition 3
Let \(\mathcal {M}(S^d)\) denote the set of Borel probability measures on \(S^d\).
Definition 4
(Uniform distribution on \(S^d\)). \({\sigma }^d\) denotes the normalized surface area measure on \(S^d\).
Following Borodachov et al. (2019), we obtain the following two results.
Lemma 1
(Strict positive definiteness of \(G_{\gamma }\)). For \({\gamma } > 0\), the Gaussian kernel \(G_{\gamma }(x, y)\ {\triangleq }\ e^{-\gamma \Vert x-y\Vert _{2}^{2}}\) is strictly positive definite on \(S^d {\times } S^d\).
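Lemma 1 can also be checked numerically: the Gram matrix of \(G_{\gamma }\) at any finite set of distinct points on \(S^d\) is positive definite. The following sketch (an illustrative sanity check we add here, not a proof) samples 50 points on \(S^2\) and confirms that all eigenvalues of the Gram matrix are strictly positive:

```python
import numpy as np

# Numerical illustration of Lemma 1 (a sanity check, not a proof).
rng = np.random.default_rng(0)
gamma = 2.0
x = rng.normal(size=(50, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)     # project onto S^2

# Gram matrix of the Gaussian kernel G_gamma(x, y) = exp(-gamma ||x-y||^2).
sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
gram = np.exp(-gamma * sq_dists)

eigvals = np.linalg.eigvalsh(gram)
print(eigvals.min())   # strictly positive (up to numerical precision)
```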
Lemma 2
(Strictly positive definite kernels on \(S^d\)). Consider a kernel \(K_{f}: \mathcal {S}^{d} \times \mathcal {S}^{d} \rightarrow (-\infty ,+\infty ]\) of the form
\[ K_{f}(x, y) := f\left( \Vert x-y\Vert _{2}^{2}\right) . \]
If \(K_f\) is strictly positive definite on \(S^d {\times } S^d\) and \(I_{K_f}[{\sigma }^d]\) is finite, then \({\sigma }^d\) is the unique measure (on Borel subsets of \(S^d\)) solving \(\min _{\mu \in \mathcal {M}\left( \mathcal {S}^{d}\right) } I_{K_{f}}[\mu ]\), and the normalized counting measures associated with any \(K_f\)-energy minimizing sequence of N-point configurations on \(S^d\) converge weak* to \({\sigma }^d\). In particular, this conclusion holds whenever f has the property that \(-f'(t)\) is strictly completely monotone on (0, 4] and \(I_{K_f}[{\sigma }^d]\) is finite.
Based on Lemmas 1 and 2, we can now state two propositions.
Proposition 2
\({\sigma }^d\) is the unique solution (on Borel subsets of \(S^d\)) of
\[ \min _{\mu \in \mathcal {M}\left( S^{d}\right) } I_{G_{\gamma }}[\mu ]. \]
Definition 5
(\(\text {Weak}^{*}\) convergence of measures). A sequence of Borel measures \(\left\{ \mu _{n}\right\} _{n=1}^{\infty }\) in \(\mathbb {R}^{p}\) converges \(\text {weak}^{*}\) to a Borel measure \(\mu \) if, for every continuous function \(f: \mathbb {R}^{p} \rightarrow \mathbb {R}\),
\[ \lim _{n \rightarrow \infty } \int f \,\textrm{d}\mu _{n}=\int f \,\textrm{d}\mu . \]
Proposition 3
For each \(N > 0\), the N-point minimizer of the average pairwise potential is
\[ \textbf{u}_{N}^{*} := \underset{u_{1}, \ldots , u_{N} \in S^{d}}{\arg \min } \sum _{i \ne j} G_{\gamma }\left( u_{i}, u_{j}\right) . \]
The normalized counting measures associated with the sequence \(\left\{ \textbf{u}_{N}^{*}\right\} _{N=1}^{\infty }\) converge \(\text {weak}^{*}\) to \(\sigma ^{d}\).
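Proposition 3 can be illustrated numerically as well. The sketch below (ours; the optimizer and its settings are arbitrary choices, not from the paper) minimizes the average pairwise Gaussian potential of \(N = 256\) points on \(S^2\) by reparameterized gradient descent; the resulting configuration spreads out uniformly, as indicated by a near-zero mean resultant vector:

```python
import torch
import torch.nn.functional as F

# Illustrative check of Proposition 3: minimize the average pairwise
# Gaussian potential of N points constrained to S^2, parameterizing the
# points as free vectors that are normalized onto the sphere.
torch.manual_seed(0)
gamma, N = 2.0, 256
u = torch.randn(N, 3, requires_grad=True)
opt = torch.optim.Adam([u], lr=0.05)

for _ in range(500):
    x = F.normalize(u, dim=1)                  # points on S^2
    energy = torch.pdist(x).pow(2).mul(-gamma).exp().mean()
    opt.zero_grad()
    energy.backward()
    opt.step()

x = F.normalize(u, dim=1).detach()
# For a uniform arrangement, the mean of the points is close to the
# origin, so the norm of the mean resultant vector is near zero.
print(x.mean(dim=0).norm().item())
```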
Theorems 2 and 3 then follow directly from Propositions 2 and 3.
1.3 A.3 Information Theory
In Sect. 1, we mentioned that sparser features contain less information and that the uniform distribution attains the maximum information entropy. These conclusions follow from Theorem 2.6.4 and Theorem 8.2.2 in Thomas and Joy (2006).
Theorem 2.6.4 \(H (X) \le \log |\mathcal {X}|\), where \(|\mathcal {X}|\) denotes the number of elements in the range of X, with equality if and only if X has a uniform distribution over \(\mathcal {X}\).
Theorem 8.2.2 The typical set \(A_{\epsilon }^{(n)}\) has the following properties:

- \({\text {Pr}}\left( A_{\epsilon }^{(n)}\right) >1-\epsilon \) for n sufficiently large.

- \({\text {Vol}}\left( A_{\epsilon }^{(n)}\right) \le 2^{n(h(X)+\epsilon )}\) for all n.

- \({\text {Vol}}\left( A_{\epsilon }^{(n)}\right) \ge (1-\epsilon ) 2^{n(h(X)-\epsilon )}\) for n sufficiently large.
Theorem 2.6.4 shows that the uniform distribution attains the maximum information entropy, and Theorem 8.2.2 shows that sparser features occupy a typical set of smaller volume and therefore contain less information.
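Both facts can be made concrete with a small numerical example (ours, for illustration): over a support of size 8, the uniform distribution attains the maximum entropy \(\log _2 8 = 3\) bits, while a sparse, highly peaked distribution carries far less information:

```python
import numpy as np

# Theorem 2.6.4 in action over a support of size |X| = 8.
def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

uniform = np.full(8, 1 / 8)
sparse = np.array([0.9, 0.05, 0.02, 0.01, 0.01, 0.01, 0.0, 0.0])

print(entropy_bits(uniform))  # 3.0 bits = log2(8), the maximum
print(entropy_bits(sparse))   # ~0.67 bits: sparser, less information
```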
B Experiment Details
1.1 B.1 Datasets
Cloud Dataset: We collect three datasets: 38cloud (Mohajerani & Saeedi, 2019), Sentinel-2 Cloud Mask (Aybar et al., 2022) and SPARCS (Hughes & Hayes, 2014). The 38cloud dataset consists of images collected by the Landsat 8 satellite; image sizes vary from 7000\(\times \)7000 to 8000\(\times \)8000 pixels, with a spatial resolution of 30 m. The Sentinel-2 Cloud Mask dataset consists of images collected by the Sentinel-2 satellite; each image is 1022\(\times \)1022 pixels, with a spatial resolution of 20 m. The SPARCS dataset consists of images collected by the Landsat 8 satellite; each image is 1000\(\times \)1000 pixels, with a spatial resolution of 30 m. These datasets are filtered, cropped and downsampled to construct a new dataset, 'Cloud', which has 1386 images in total; each image is 128\(\times \)128 pixels with roughly kilometer-level resolution.
USGS Dataset: We use the USGS image from the SIRI-WHU dataset (Zhao et al., 2015). The data were acquired by the United States Geological Survey (USGS) over Montgomery, Ohio, USA, and cover four classes: farms, forests, parking lots and residential areas. The source is a single image of 10000\(\times \)9000 pixels with a spatial resolution of 2 feet. We first crop 64\(\times \)64-pixel patches from the USGS image and then upsample them to 256\(\times \)256 pixels. In this way, we construct a new dataset of 21,840 images with sub-meter (roughly 15 cm) nominal resolution.
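For reference, a minimal sketch of the crop-and-upsample pipeline described above (the file name, output path and non-overlapping tiling stride are our placeholders, not the paper's exact pipeline):

```python
from PIL import Image

# Minimal sketch of the USGS preprocessing described above. Paths and
# tiling strategy are illustrative assumptions.
Image.MAX_IMAGE_PIXELS = None              # the source image is ~10000x9000
src = Image.open("usgs_montgomery.tif")    # hypothetical file name

PATCH, idx = 64, 0
for top in range(0, src.height - PATCH + 1, PATCH):
    for left in range(0, src.width - PATCH + 1, PATCH):
        tile = src.crop((left, top, left + PATCH, top + PATCH))
        tile = tile.resize((256, 256), Image.BICUBIC)  # 64x64 -> 256x256
        tile.save(f"usgs_patches/{idx:05d}.png")
        idx += 1
```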
HIT-UAV Dataset: The HIT-UAV dataset (Suo et al., 2023) comprises 2898 infrared thermal images extracted from 43,470 frames of hundreds of videos captured by unmanned aerial vehicles (UAVs) in various scenarios. Each image is 256\(\times \)256 pixels and has a single channel.
SAR Dataset: We collect the HRSID-JPG (Wei et al., 2020a) and SAR-Ship (Wang et al., 2019) datasets. The HRSID-JPG dataset comprises 5604 high-resolution SAR images with 16,951 ship instances, and the SAR-Ship dataset has 43,819 ship slices. We filter these datasets to remove images with too much background and then construct a new Radar dataset of 9407 images in total; each image is 256\(\times \)256 pixels and has a single channel.
Thermal Dataset: We conduct experiments on a thermal natural-image dataset for object classification (Ashfaq et al., 2021). The images were captured using Seek Thermal and FLIR cameras and cover three classes: cat, car and man. After filtering, center-cropping and resizing, the resulting dataset has 5543 images in total; each image is 256\(\times \)256 pixels.
1.2 B.2 Implementation Details
For the infrared, thermal and Radar images, the image shapes are similar to those of natural images, so we do not need to adjust the network architecture. During our experiments, we adjusted the hyperparameters of the loss function as shown in Table 9.
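For orientation, the following schematic shows how the two regularizers from the main paper could be attached to a generator update. This is a sketch under our own assumptions, not the released implementation: `G`, `D` and the feature extractor `feat` are placeholders, and the weights `lambda_u` and `lambda_e` stand in for the loss-function hyperparameters tuned in Table 9.

```python
import torch
import torch.nn.functional as F

# Schematic sketch (ours, not the authors' released code) of attaching
# uniformity and entropy regularization to a generator step.
def uniformity_reg(feats, gamma=2.0):
    """Distribution-level term: log of the average pairwise Gaussian
    potential of L2-normalized features; minimized when the features
    spread uniformly on the hypersphere (cf. Sect. A.2)."""
    u = F.normalize(feats, dim=1)
    return torch.pdist(u).pow(2).mul(-gamma).exp().mean().log()

def entropy_reg(feats):
    """Sample-level term: negative Shannon entropy of per-sample soft
    assignments; minimizing it pushes each sample toward a higher-entropy,
    more informative assignment (cf. Sect. A.3)."""
    p = feats.softmax(dim=1)
    return (p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

def generator_step(G, D, feat, z, lambda_u=0.1, lambda_e=0.1):
    fake = G(z)
    adv = -D(fake).mean()                      # hinge-style adversarial term
    feats = feat(fake)
    return adv + lambda_u * uniformity_reg(feats) + lambda_e * entropy_reg(feats)
```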
C Visual Results
The images generated on the NWPU, PN and AID datasets are visualized in Figs. 10, 11 and 12. Overall, the images generated by BigGAN have the worst quality; compared with BigGAN and StyleGAN2, the images generated by our method have greater diversity and richer content. The images generated on the Cloud, USGS, HIT-UAV, SAR and Thermal datasets are visualized in Figs. 13, 14, 15, 16 and 17; our method is effective on RS images of different scales and modalities. The images generated on the FFHQ256(10k) and LSUN-cat(30k) datasets are visualized in Figs. 18 and 19; our method still outperforms StyleGAN2 on these datasets.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Su, X., Qiang, W., Hu, J. et al. Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image Generation. Int J Comput Vis 132, 5192–5216 (2024). https://doi.org/10.1007/s11263-024-02125-4
DOI: https://doi.org/10.1007/s11263-024-02125-4