Abstract
Ensemble techniques are powerful approaches that combine several weak learners to build a stronger one. As a meta-learning framework, ensemble techniques can easily be applied to many machine learning methods. Inspired by ensemble techniques, in this paper we propose an ensemble of loss functions applied to a simple regressor. We then propose a half-quadratic learning algorithm to find the parameters of the regressor and the optimal weight associated with each loss function. Moreover, we show that the proposed loss function is robust in noisy environments. For a particular class of loss functions, we show that the proposed ensemble loss function is Bayes consistent and robust. Experimental evaluations on several data sets demonstrate that the proposed ensemble loss function significantly improves the performance of a simple regressor in comparison with state-of-the-art methods.
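As a rough illustration of the idea summarized above, the sketch below trains a linear regressor under a convex combination of several loss functions, alternating between updating the loss weights and the regression parameters. This is a minimal sketch under assumptions of our own (squared, absolute, and Huber member losses; a softmax weight update), not the paper's RELF algorithm, whose updates are derived from half-quadratic minimization.

```python
# Illustrative sketch only: a linear regressor trained with a weighted
# ensemble of loss functions, updated by simple alternating steps in the
# spirit of half-quadratic (alternating) minimization. The member losses
# and the softmax weight rule are assumptions for illustration.
import numpy as np

def member_losses(residual, delta=1.0):
    """Per-sample values of each assumed ensemble member loss."""
    sq = 0.5 * residual ** 2
    ab = np.abs(residual)
    hub = np.where(np.abs(residual) <= delta,
                   0.5 * residual ** 2,
                   delta * (np.abs(residual) - 0.5 * delta))
    return np.stack([sq, ab, hub])              # shape: (3, n_samples)

def member_grads(residual, delta=1.0):
    """Gradients of each member loss w.r.t. the residual."""
    g_sq = residual
    g_ab = np.sign(residual)
    g_hub = np.clip(residual, -delta, delta)
    return np.stack([g_sq, g_ab, g_hub])        # shape: (3, n_samples)

def fit_ensemble_loss_regressor(X, y, n_iter=200, lr=0.01, temp=1.0):
    n, d = X.shape
    w = np.zeros(d)
    alpha = np.full(3, 1.0 / 3.0)               # loss weights on the simplex
    for _ in range(n_iter):
        r = X @ w - y
        # Step 1: with w fixed, re-weight the losses; a smaller average loss
        # gets a larger weight, and the softmax keeps alpha on the simplex.
        avg = member_losses(r).mean(axis=1)
        alpha = np.exp(-avg / temp)
        alpha /= alpha.sum()
        # Step 2: with alpha fixed, take a gradient step on the weighted loss.
        grad_r = alpha @ member_grads(r)        # combined dL/dr per sample
        w -= lr * (X.T @ grad_r) / n
    return w, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    y[:10] += 8.0                               # inject a few outliers
    w, alpha = fit_ensemble_loss_regressor(X, y)
    print("weights:", np.round(w, 3), "loss mix:", np.round(alpha, 3))
```

In this toy setting the robust members (absolute and Huber) tend to receive larger weights when outliers are present, which mirrors the robustness motivation of the ensemble loss, though the actual RELF weighting follows from the half-quadratic formulation in the paper.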
Cite this article
Hajiabadi, H., Monsefi, R. & Yazdi, H.S. relf: robust regression extended with ensemble loss function. Appl Intell 49, 1437–1450 (2019). https://doi.org/10.1007/s10489-018-1341-9