Learning from streaming data with concept drift and imbalance: an overview

  • Review
  • Published: 13 January 2012
  • Volume 1, pages 89–101 (2012)
  • Progress in Artificial Intelligence
  • T. Ryan Hoens (1),
  • Robi Polikar (2) &
  • Nitesh V. Chawla (1)
  • 6921 Accesses

  • 250 Citations

  • 9 Altmetric

Abstract

The primary focus of machine learning has traditionally been on learning from data assumed to be sufficient and representative of the underlying fixed, yet unknown, distribution. Such restrictions on the problem domain paved the way for the development of elegant algorithms with theoretically provable performance guarantees. As is often the case, however, real-world problems rarely fit neatly into such restricted models. For instance, class distributions are often skewed, resulting in the “class imbalance” problem. Data drawn from non-stationary distributions is also common in real-world applications, resulting in the “concept drift” or “non-stationary learning” problem, which is often associated with streaming data scenarios. Recently, these problems have independently received increased research attention; however, the combined problem of addressing all of the above-mentioned issues has received relatively little. If the ultimate goal of intelligent machine learning algorithms is to be able to address a wide spectrum of real-world scenarios, then the need for a general framework for learning from, and adapting to, a non-stationary environment that may introduce imbalanced data can hardly be overstated. In this paper, we first present an overview of each of these challenging areas, followed by a comprehensive review of recent research toward developing such a general framework.

Author information

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA

    T. Ryan Hoens & Nitesh V. Chawla

  2. Electrical and Computer Engineering, Rowan University, Glassboro, NJ, 08028, USA

    Robi Polikar

Corresponding author

Correspondence to T. Ryan Hoens.

About this article

Cite this article

Hoens, T.R., Polikar, R. & Chawla, N.V. Learning from streaming data with concept drift and imbalance: an overview. Prog Artif Intell 1, 89–101 (2012). https://doi.org/10.1007/s13748-011-0008-0

  • Received: 01 October 2011

  • Accepted: 30 November 2011

  • Published: 13 January 2012

  • Issue date: April 2012

  • DOI: https://doi.org/10.1007/s13748-011-0008-0

Keywords

  • Class imbalance
  • Concept drift
  • Data streams
  • Classification
