Abstract
The efficacy of feature selection in reducing dimensionality and improving the performance of learning algorithms is well documented. Traditional feature selection algorithms, however, often struggle to capture non-linear relationships between features and responses. Deep neural networks capture such non-linearities well, but their inherent "black-box" nature limits their interpretability, and the complexity of deep architectures can lead to long training times and vanishing gradients. This study aims to simplify network structures, accelerate network training, and improve model interpretability without sacrificing accuracy. We propose a sparse-weighted feature selection approach based on convolutional neural networks, termed the low-dimensional sparse-weighted feature selection network (LSWFSNet). LSWFSNet inserts a convolutional selection kernel between the input and convolutional layers, performing weighted convolutional computations on the input data while imposing sparse constraints on the selection kernel. Features with significant weights in this kernel are retained for subsequent operations in the LSWFSNet computational domain, while those with negligible weights are discarded to reduce model complexity. By pruning the network's input data, LSWFSNet simplifies the post-convolution feature maps and hence its own structure. To account for the intrinsic structure of the data, our study combines several sparse constraints into a single objective function, which enforces the sparsity of the convolutional kernel while respecting the structural dynamics of the data. Notably, the underlying convolutional network can be replaced with any deep convolutional network, provided the convolutional selection kernel is adjusted to match the input dimensions. The LSWFSNet model was tested on human emotion electroencephalography (EEG) datasets curated by Shanghai Jiao Tong University. Under the various sparse constraints, the convolutional kernel became sparse, and the regions of the selection kernel with non-zero weights were found to correlate strongly with emotional responses. The empirical results are consistent with existing neuroscience findings and surpass the baseline network in accuracy. LSWFSNet also extends to key-point recognition tasks, such as extracting salient pixels in facial detection models or isolating target attributes in object detection frameworks. The significance of this study lies in combining sparse constraint techniques with deep convolutional networks rather than traditional fully connected networks, which improves model interpretability and broadens applicability, notably in image processing.
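To make the selection-kernel idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: an element-wise selection layer is placed in front of an arbitrary convolutional backbone, and a sparsity penalty on its weights drives uninformative inputs toward zero. The class names, input shape, and the choice of a plain L1 penalty (rather than the paper's combined constraints) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelectionKernel(nn.Module):
    """One learnable weight per input entry; near-zero weights mark features to drop."""
    def __init__(self, in_shape):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(*in_shape))  # e.g. (channels, H, W)

    def forward(self, x):
        # Element-wise weighted selection; broadcasts over the batch dimension.
        return x * self.weight

    def sparsity_penalty(self):
        # L1 surrogate for sparsity; group or L2,1 penalties could be swapped in.
        return self.weight.abs().sum()

class LSWFSNet(nn.Module):
    """Selection kernel followed by an arbitrary convolutional backbone."""
    def __init__(self, backbone, in_shape):
        super().__init__()
        self.select = SelectionKernel(in_shape)
        self.backbone = backbone  # any convolutional network, e.g. a VGG-16 variant

    def forward(self, x):
        return self.backbone(self.select(x))
```

After training, inputs whose selection weights have shrunk toward zero are discarded, which is how the kernel would identify the input regions most strongly tied to the response.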
Data Availability
The data used in this study were collected by the Department of Computer Science, Shanghai Jiao Tong University. The authors contacted the department by email and signed a data-usage application; with the department's consent, they obtained permission to use the data solely for academic research purposes. The authors are not authorized to share or disclose the data without explicit permission from the relevant department of the Department of Computer Science, Shanghai Jiao Tong University.
Funding
This study was funded by NSFC Key Project of International (Regional) Cooperation and Exchanges (no. 61860206004) and in part by the National Natural Science Foundation of China (no. 61976004).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
In this paper, seven types of convolutional networks are used as backbones, namely VGG-16, AlexNet, GoogLeNet, ResNet-34, DenseNet-101, EfficientNet-B0, and MobileNet-V2.
Backbones with and without sparse constraints are trained on the same feature vectors from the training set used to construct the feature subset. For a fair comparison across sparsity constraints, the model's parameter settings, including the learning rate, are kept identical. The algorithm and methodology are described in the “Proposed New Architecture Description” section.
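As a sketch of this comparison protocol, and assuming the combined objective is cross-entropy plus a weighted sparsity penalty (the weight `lam` is a hypothetical hyperparameter, and `model.select` refers to the selection kernel from the earlier sketch), a training step shared by all backbones might look as follows; only the penalty term would change between constraint variants:

```python
import torch.nn.functional as F

def train_step(model, x, y, optimizer, lam=1e-4):
    # Identical optimizer and hyperparameters across all sparsity constraints;
    # only model.select.sparsity_penalty() differs between constraint variants.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * model.select.sparsity_penalty()
    loss.backward()
    optimizer.step()
    return loss.item()
```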
Tables 3 and 4 compare the accuracy of different networks under different sparsity constraints; the numbers in parentheses represent the proportion of features screened out by the model under that sparsity constraint.
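For reference, one plausible way to obtain such a proportion (an assumption, since the paper's exact thresholding criterion is not stated here) is the fraction of selection-kernel weights whose magnitude falls below a small threshold:

```python
def screened_out_proportion(weight, eps=1e-3):
    # Fraction of selection weights that are effectively zero.
    w = weight.detach().abs()
    return (w < eps).float().mean().item()
```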
The accuracies in the two tables show that shallower networks, such as VGG-16, achieve significantly higher accuracy than deeper networks, such as DenseNet-101; there is thus no strong correlation between a model's depth and its accuracy on this task. In addition, the brain's feedback is not fully reproducible, owing to factors such as the subjects' physical state. Data sampled from the same subject at different times therefore differ, which explains why model accuracy also differs, for example, between “JL20140404” and “JL20140419” in Table 3.
In terms of sparsity, the model can reduce the amount of input data by up to \(30\%\). This falls short of the desired degree of sparsity, perhaps because emotional feedback is a very complex process that is not confined to a particular set of brain regions. Nevertheless, the method does achieve a meaningful reduction of the input.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, WB., Chen, SB., Ding, C. et al. Non-linear Feature Selection Based on Convolution Neural Networks with Sparse Regularization. Cogn Comput 16, 654–670 (2024). https://doi.org/10.1007/s12559-023-10230-8