International Journal of Intelligent Systems and Technologies 2;4 © www.waset.org Fall 2007
Application of Extreme Learning Machine
Method for Time Series Analysis
Rampal Singh and S. Balasundaram
Abstract—In this paper, we study the application of the Extreme Learning Machine (ELM) algorithm for single hidden layer feedforward neural networks to non-linear chaotic time series problems. In this algorithm the input weights and the hidden layer biases are chosen randomly. The ELM formulation then leads to solving a system of linear equations in terms of the unknown weights connecting the hidden layer to the output layer, and the solution of this general system of linear equations is obtained using the Moore-Penrose generalized pseudo inverse. To study the application of the method we consider the time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate ECG time series. For the sigmoid, sin and hardlim activation functions, we determine the values of the memory order and the number of hidden neurons that give the best prediction performance in terms of root mean square error. The results obtained are in close agreement with the exact values of the time series considered, which clearly shows that ELM is a very promising alternative method for time series prediction.

Keywords—Chaotic time series, Extreme learning machine, Generalization performance.

I. INTRODUCTION

Artificial Neural Networks (ANNs) have been extensively applied to pattern classification and regression problems. The major reason for the success of ANNs is their ability to obtain, from the given input samples, a non-linear approximation model describing the association between the dependent and independent variables. Since ANNs adaptively select the model from the features present in the input data, they have been applied to a large number of important classes of problems such as optical character recognition [7], face detection [11], gene prediction [14], credit scoring [6] and time series forecasting [12], [17]. Though ANNs have advantages such as good approximation capabilities and simple network structures, they suffer from several problems such as the presence of local minima, imprecise learning rates, the selection of the number of hidden neurons and overfitting. Moreover, gradient descent based learning algorithms such as Back Propagation (BP) generally lead to slow convergence during the training of the networks.

Time series forecasting is an important and challenging regression problem. In a regression problem, the given input samples are analyzed to obtain the best fit functional model describing the relationship between the dependent and independent variables. Many prediction models for time series exist in the literature [4], [9], [12]. The most important and widely used among them are the Auto Regressive Integrated Moving Average (ARIMA) [1], ANN [12], [17] and Support Vector Regression (SVR) [8], [9], [13], [15] methods. Among these, ARIMA assumes the existence of a linear relationship in the time series values, i.e. its prediction is a linear function of the past observations, and therefore it is not always suitable for complex real world problems [18]. It has also been proposed to combine several methods in order to obtain improved forecasting accuracy. For a study of a hybrid approach combining ARIMA and ANN for time series forecasting we refer the reader to [18]. For time series involving seasonality, combining Seasonal ARIMA (SARIMA) and ANN is discussed in [16], and for a combined SARIMA and SVR approach see [3].

Huang et al. [5] have proposed a new learning algorithm for the Single hidden Layer Feedforward Neural Network (SLFN) architecture, called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient descent based algorithms such as BP applied in ANNs. In this algorithm the input weights and the hidden layer biases are chosen randomly. The ELM formulation leads to solving a system of linear equations in terms of the unknown weights connecting the hidden layer to the output layer, and the solution of this general system of linear equations is obtained using the Moore-Penrose generalized pseudo inverse [10]. In this work we briefly discuss the ELM algorithm and study the feasibility of its application to chaotic time series prediction problems.

Throughout this paper, we assume all vectors to be column vectors. For any two vectors x, y in the m-dimensional real space ℜ^m, we denote their inner product by x′y, where x′ is the transpose of the vector x, and we denote the norm of a vector by || ⋅ ||. The paper is organized as follows. In Section 2, we define the Moore-Penrose generalized inverse and the minimum norm least squares solution of a general linear system of equations, and state the relation between them. In Section 3, we review the ELM algorithm for SLFNs. In Section 4, we apply this algorithm to the time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate (ECG) time series, and the results obtained using ELM are compared with the exact values. Finally, we conclude the paper in Section 5.

Rampal Singh is with the Department of Computer Science, Deen Dayal Upadhyaya College, University of Delhi, New Delhi-110015, India (phone: +91-11-27570620, 09350647546, e-mail: rpsrana@ddu.du.ac.in).
S. Balasundaram is with the School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi-110067, India (phone: +91-11-26704724, e-mail: balajnu@hotmail.com).
II. MOORE-PENROSE GENERALIZED INVERSE

The solution of a general linear system of equations

    Ax = y,

where A may be a singular or a rectangular matrix, can be obtained by the use of the Moore-Penrose generalized pseudo inverse.

Definition 2.1 [10]: A matrix G of size n × m is called the Moore-Penrose generalized pseudo inverse of a given matrix A of size m × n if

    AGA = A,  GAG = G,  (AG)′ = AG,  (GA)′ = GA.

In this case, we denote G by A⁺.

Definition 2.2: For the given general linear system of equations Ax = y, where A is a matrix of size m × n and y is a vector in ℜ^m, a vector x* in ℜ^n is called a least squares solution if

    || Ax* − y || = min_x || Ax − y ||.

Definition 2.3: The vector x* in ℜ^n is called a minimum norm least squares solution of the general linear system Ax = y if x* is a least squares solution and, among all least squares solutions x in ℜ^n,

    || x* || ≤ || x ||.

Theorem 2.1 [10]: Let G be a matrix of size n × m. Then x* = Gy is a minimum norm least squares solution of the general linear system Ax = y if and only if G = A⁺, the Moore-Penrose generalized inverse of A.

From the above theorem it is clear that x* = A⁺y is the unique minimum norm least squares solution.
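Numerically, A⁺ is available through the singular value decomposition. As a small illustration, the following sketch, written in Python with NumPy rather than the Matlab code used for our experiments, checks the four conditions of Definition 2.1 and compares x* = A⁺y with the minimum norm least squares solution returned by np.linalg.lstsq; the rank-deficient matrix A below is an arbitrary example of ours, not taken from the paper:

    import numpy as np

    # A rank-deficient 4 x 3 system: the second row repeats the first,
    # and the third column is the sum of the first two (rank(A) = 2).
    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.],
                  [0., 1., 1.]])
    y = np.array([1., 2., 1., 1.])

    G = np.linalg.pinv(A)  # the Moore-Penrose generalized inverse A+

    # The four conditions of Definition 2.1.
    assert np.allclose(A @ G @ A, A)
    assert np.allclose(G @ A @ G, G)
    assert np.allclose((A @ G).T, A @ G)
    assert np.allclose((G @ A).T, G @ A)

    # Theorem 2.1: x* = A+ y is the minimum norm least squares solution;
    # np.linalg.lstsq computes the same solution via the SVD.
    x_star = G @ y
    x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
    assert np.allclose(x_star, x_lstsq)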
III. EXTREME LEARNING MACHINE ALGORITHM

Let us consider an SLFN having L hidden neurons. Let G(·,·,·) be a real valued function such that G(wi, bi, x) is the output of the i-th hidden neuron with bias bi ∈ ℜ corresponding to the input vector x ∈ ℜ^m and the weight vector wi = (wi1, ..., wim)′, where wis is the weight of the connection between the i-th hidden neuron and the s-th neuron of the input layer. It is well known that for feedforward neural networks, the output function f(·) is given by

    f(x) = Σ_{i=1}^{L} βi G(wi, bi, x),

where βi = (βi1, ..., βik)′ ∈ ℜ^k is the weight vector connecting the i-th hidden neuron with the k neurons of the output layer. Note that for the case of additive hidden neurons, G(·,·,·) takes the following form:

    G(wi, bi, x) = g(wi′x + bi),

where g: ℜ → ℜ is the activation function. In this work, we assume the case of additive hidden neurons.

Suppose we are given the training data set {(xi, yi)}_{i=1,2,...,M}, where xi = (xi1, ..., xim)′ ∈ ℜ^m denotes the input vector, yi = (yi1, ..., yik)′ ∈ ℜ^k is its corresponding output vector and M is the total number of input data patterns. Further assume that the values of the weight vectors wi ∈ ℜ^m and the biases bi ∈ ℜ are randomly assigned. Then the standard SLFN with L hidden neurons approximates the input samples with zero error if and only if there exist βi ∈ ℜ^k such that

    yj = Σ_{i=1}^{L} βi G(wi, bi, xj),  j = 1, 2, ..., M.    (1)

The above set of equations can be rewritten in the following matrix form:

    Hβ = Y,    (2)

where

        | G(w1, b1, x1)  ...  G(wL, bL, x1) |
    H = |      ...       ...       ...      |    (3)
        | G(w1, b1, xM)  ...  G(wL, bL, xM) |

is the M × L hidden layer output matrix, and

        | β1′ |          | y1′ |
    β = | ... |  and Y = | ... |    (4)
        | βL′ |          | yM′ |

are matrices of sizes L × k and M × k respectively. Note that the i-th column of H is the output of the i-th hidden neuron for the inputs x1, x2, ..., xM. Further, observe that the matrix H need not be a square matrix.

Under the assumption that the activation function g(·) is infinitely differentiable, it has been shown in [5] that, for fixed input weight vectors wi and biases bi, a least squares solution of the matrix equation (2) with minimum norm of the output weights β can be obtained, and that the smallest training error is reached by this solution. Moreover, this solution of the matrix equation (2) is given by

    β̂ = H⁺Y,

where H⁺ is the Moore-Penrose generalized pseudo inverse of the matrix H. Further, it has been reported in [5] that ELM tends to produce better generalization performance than BP, with the main advantage being the decrease in computational time while training the network.

Training an SLFN is thus equivalent to obtaining a minimum norm least squares solution of the matrix equation Hβ = Y. In the course of learning, once the input weights and the hidden layer biases are randomly chosen, they are not adjusted at all. By Theorem 2.1, the smallest norm least squares solution of the above learning machine is obtained when β̂ = H⁺Y.

Since sin and sigmoid are infinitely differentiable functions, the ELM algorithm can be successfully applied by choosing either of them as the activation function.
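To make the construction in equation (3) concrete, the following sketch builds the hidden layer output matrix H for additive hidden neurons, continuing the NumPy notation of the previous sketch; the names sigmoid and hidden_matrix are ours. With H in hand, β̂ = H⁺Y is a single further line, beta = np.linalg.pinv(H) @ Y.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hidden_matrix(X, W, b, g=sigmoid):
        # Equation (3): H[j, i] = g(w_i' x_j + b_i), an M x L matrix whose
        # i-th column is the output of the i-th hidden neuron on all inputs.
        # X: M x m input patterns; W: L x m input weights; b: L biases.
        return g(X @ W.T + b)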
However, in all our experiments we also study the application of the ELM algorithm with the hardlim activation function.

The ELM algorithm for an SLFN can be stated [5] as follows:

Input: Training set {(xi, yi)}_{i=1,2,...,M}, where xi ∈ ℜ^m and yi ∈ ℜ^k; the number L of hidden neurons; and the activation function g(·).
1. For i = 1, 2, ..., L, randomly assign the input weight vector wi ∈ ℜ^m and the bias bi ∈ ℜ.
2. Determine the matrix H defined by equation (3).
3. Calculate H⁺.
4. Calculate the output weight matrix β̂ by

    β̂ = H⁺Y,

where Y is given by equation (4).
Output: The Single hidden Layer Feedforward neural Network (SLFN) with the determined output weight vectors βi ∈ ℜ^k for the randomly chosen weight vectors wi ∈ ℜ^m and biases bi ∈ ℜ, i = 1, 2, ..., L.

For any input sample x ∈ ℜ^m, the output value y can be calculated using the following formula:

    y = Σ_{i=1}^{L} β̂i g(wi′x + bi),

where wi, bi and the activation function g(·) are inputs and the weight vectors β̂i ∈ ℜ^k are the output of the ELM algorithm.
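The four steps above translate directly into code. The sketch below is our own minimal NumPy version, reusing hidden_matrix from Section 3; drawing wi and bi uniformly from [−1, 1] is our assumption, since [5] only requires them to be random.

    def elm_fit(X, Y, L, g=sigmoid, seed=0):
        # Steps 1-4 of the ELM algorithm for training data X (M x m), Y (M x k).
        rng = np.random.default_rng(seed)
        m = X.shape[1]
        W = rng.uniform(-1.0, 1.0, size=(L, m))  # step 1: random input weights
        b = rng.uniform(-1.0, 1.0, size=L)       # step 1: random biases
        H = hidden_matrix(X, W, b, g)            # step 2: equation (3)
        beta = np.linalg.pinv(H) @ Y             # steps 3 and 4: H+, then H+ Y
        return W, b, beta

    def elm_predict(X, W, b, beta, g=sigmoid):
        # Output formula: y = sum_i beta_i g(w_i' x + b_i) for each row of X.
        return hidden_matrix(X, W, b, g) @ beta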
IV. EXPERIMENTS AND RESULTS

A. Preprocessing of the Data

Time series prediction is the problem of determining a function that captures the underlying relationship between the previous values and the next value. Suppose N observations {x(iτ)}_{i=1,2,...,N} of the time series x(t) are given with time delay τ. In all our experiments, the original data is first normalized to zero mean and standard deviation equal to one. The normalized data is then transformed into auto corrected data, i.e. for a given positive integer m and i = 1, ..., (N − m) we define the auto corrected input vector

    xi = (x(iτ), x((i + 1)τ), ..., x((i + m − 1)τ))′ ∈ ℜ^m

consisting of the previous signal values. Here m is called the embedding dimension or memory order. The normalized auto corrected input vectors and their corresponding output values can be represented in the matrix forms

    X = (x1, x2, ..., xM)′ of size M × m    (5)

and

    Y = (y1, y2, ..., yM)′    (6)

respectively, where M = N − m. Note that m determines the dimension of the input vectors of the ELM algorithm. The time series prediction problem may then be stated as: for i = 1, ..., M, predict the target signal value yi = x((i + m)τ) ∈ ℜ corresponding to the auto corrected input vector xi ∈ ℜ^m. Observe that the number of neurons in the output layer is k = 1.

In order to demonstrate the effectiveness of the ELM learning algorithm, we have taken the time series generated by the Mackey-Glass delay differential equation with different delays [2], the Santa Fe A laser time series and the UCR heart beat rate chaotic time series datasets. We performed our experiments by choosing the sigmoid, sin and hardlim activation functions in the ELM learning algorithm. We use the Root Mean Square Error (RMSE) to evaluate the prediction performance of ELM, calculated using the formula

    RMSE = √( (1/n) Σ_{i=1}^{n} (yi − ỹi)² ),

where n is the number of test data and yi and ỹi are the actual and predicted values of the time series respectively. For the Mackey-Glass and Santa Fe A time series datasets, the first 70% of the data values are used for training and the remaining data values for testing. For the UCR time series datasets, 60% of the sample values are used for training and the remaining samples for testing. In all our experiments we applied the ELM source code¹ written in Matlab.

¹ http://www.ntu.edu.sg/home/egbhuang

For choosing the memory order m and the number of hidden neurons L of the ELM network, we vary m and L over a set of predefined values and determine the pair of values which gives the best performance in terms of the RMSE on the test set. This is performed for each of the transfer functions, i.e. for the sigmoid, sin and hardlim functions, and the best results obtained are reported.
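The preprocessing described above takes only a few lines of code. In the sketch below, continuing the earlier NumPy sketches, embed returns the matrices of equations (5) and (6) and rmse implements the error measure; the names embed and rmse are ours, and the series is assumed to be a one-dimensional NumPy array sampled at the delay τ, so that position i stands for time iτ.

    def embed(series, m):
        # Normalize to zero mean and unit standard deviation, then build the
        # auto corrected inputs x_i = (x_i, ..., x_{i+m-1}) and targets
        # y_i = x_{i+m}: the M x m matrix (5) and M-vector (6), M = N - m.
        s = (series - series.mean()) / series.std()
        N = len(s)
        X = np.array([s[i:i + m] for i in range(N - m)])
        Y = s[m:]
        return X, Y

    def rmse(y_true, y_pred):
        # Root mean square error over the n test samples.
        return np.sqrt(np.mean((y_true - y_pred) ** 2))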
[Fig. 1: The Mackey-Glass time series with time delay τ = 17 (x(t) versus t, samples 1 to 1500).]
[Fig. 2: The Mackey-Glass time series with time delay τ = 30 (x(t) versus t, samples 1 to 1500).]
[Fig. 3: Predicted results (output value versus time) for m = 5 when using the sin and sigmoid activation functions for the MG17 time series, corresponding to the time period from 1052 to 1201.]
[Fig. 4: Predicted results (output value versus time) for m = 7 when the sin and sigmoid activation functions are used for the MG30 time series, corresponding to the time period from 1052 to 1201.]
B. MG17, MG30 Time Series

Consider the Mackey-Glass time delay differential equation [2], [8] given by

    dx(t)/dt = −b x(t) + a x(t − τ) / (1 + x(t − τ)^10),

where a, b are parameters and τ is the time delay. We study the application of the ELM algorithm on two time series generated by this differential equation, which are widely used as benchmark data sets for analyzing the generalization ability of a prediction method. For this, consider the chaotic time series [2], [8] generated using the parameter values a = 0.2, b = 0.1 and τ = 17, 30, where τ is the time delay. We call the time series corresponding to τ = 17 and τ = 30 MG17 and MG30 respectively.
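The paper's exact integration procedure is not spelled out; as one plausible reconstruction, the sketch below generates Mackey-Glass samples by simple Euler stepping of the delay differential equation with a constant initial history. The function name, the step size dt = 1 and the initial value x0 = 1.2 are our assumptions; a higher order scheme such as Runge-Kutta would give more accurate samples.

    def mackey_glass(n_samples, tau=17, a=0.2, b=0.1, dt=1.0,
                     x0=1.2, discard=3500):
        # Euler integration of dx/dt = -b x(t) + a x(t-tau) / (1 + x(t-tau)^10),
        # keeping a buffer of past values for the delayed term x(t - tau).
        lag = int(tau / dt)
        hist = np.full(lag + 1, x0)  # constant initial history x(t) = x0, t <= 0
        out = np.empty(n_samples + discard)
        for t in range(n_samples + discard):
            x_t, x_lag = hist[-1], hist[0]
            x_next = x_t + dt * (-b * x_t + a * x_lag / (1.0 + x_lag ** 10))
            hist = np.append(hist[1:], x_next)
            out[t] = x_next
        # Discard the initialization transients, as done in [8].
        return out[discard:]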
In order to avoid the initialization transients, the initial 3500 samples are discarded [8]. We considered 1050 data points, corresponding to the sample time period from 3501 to 4550, for training and the sample time period from 4551 to 5000 for testing. In Fig. 1 and Fig. 2 we show the time series data values from 1 to 1500 (after discarding the initial 3500 samples) for the MG17 and MG30 time series respectively, where the first 1050 samples are taken for training and the remaining 450 samples for testing. Experiments were performed using all the activation functions, namely the sin, sigmoid and hardlim functions. As discussed earlier, the best prediction performance for the MG17 time series was sought by varying the memory order m = {5, 7, 9} and the number of hidden neurons L = {1, 3, ..., 41} for each of the above activation functions. It was found that the best prediction performance was obtained for the sigmoid activation function, the corresponding values being m = 5 and L = 37. In Fig. 3 we show the actual and the predicted time series of MG17 for the time period from 1052 to 1201. The RMSE curves for all the activation functions with memory order m = 5 on the test data set are plotted in Fig. 5.

[Fig. 5: Error plot (testing error versus the number of hidden neurons) for the MG17 time series when the sin, sigmoid and hardlim activation functions are used with memory order m = 5.]

Similarly, for the MG30 time series, by varying the memory order m = {5, 7, 9} and the number of hidden neurons L = {1, 3, ..., 77} for each of the activation functions, the best prediction performance was again obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 7 and L = 77. In Fig. 4 we show the actual and the predicted time series of MG30 for the time period from 1052 to 1201. The RMSE curves obtained for each of the activation functions with memory order m = 7 on the test data set are plotted in Fig. 6.

[Fig. 6: Error plot (testing error versus the number of hidden neurons) for the MG30 time series when the sin, sigmoid and hardlim activation functions are used with memory order m = 7.]
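The model selection procedure just described amounts to a direct grid search over m and L. The following sketch runs it for MG17, reusing mackey_glass, embed, rmse, elm_fit and elm_predict from the earlier sketches; the sample count of 1510 is our choice, taken so that 1050 training and 450 test vectors remain after embedding, and is not from the paper.

    series = mackey_glass(1510, tau=17)          # transients already discarded
    best_err, best_pair = np.inf, None
    for m in (5, 7, 9):
        X, Y = embed(series, m)
        X_tr, Y_tr = X[:1050], Y[:1050]          # first 1050 samples for training
        X_te, Y_te = X[1050:1500], Y[1050:1500]  # next 450 samples for testing
        for L in range(1, 42, 2):                # L = 1, 3, ..., 41
            W, b, beta = elm_fit(X_tr, Y_tr, L)
            err = rmse(Y_te, elm_predict(X_te, W, b, beta))
            if err < best_err:
                best_err, best_pair = err, (m, L)
    print(best_pair, best_err)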
C. Santa Fe-A Time Series

This is a laser time series data set, shown in Fig. 7, recorded from a Far-Infrared-Laser in a chaotic state, which is approximately described by three coupled non-linear ordinary differential equations².

[Fig. 7: The Santa Fe A laser time series (x(t) versus t).]
[Fig. 8: Predicted results (output value versus time) for m = 5 when using the sin and sigmoid activation functions on the Santa Fe-A time series, corresponding to the time period from 702 to 851.]

The data set contains 1000 data points (see Fig. 7). Among them, the first 700 data points are used for training and the remaining 300 points for testing. By varying m = {3, 5, 7} and L = {1, 3, ..., 81}, it was determined that the best prediction performance is obtained for the sin activation function, the corresponding values of the memory order and the number of hidden neurons being m = 5 and L = 49 respectively. In Fig. 8, we plot the actual and the predicted values for the sin and sigmoid activation functions corresponding to the time period from 702 to 851. Finally, for m = 5 we show the RMSE values for all the activation functions in Fig. 9.

[Fig. 9: Error plot (test error versus the number of hidden neurons) for the Santa Fe-A time series when each of the activation functions is used with memory order m = 5.]

D. UCR Time Series Datasets

We repeat our experiments on time series datasets from a diverse set of domains available from the UCR Time Series Data Mining Archive³. We consider two important time series datasets, both of human heart beat.

i. The Time Series of Human Heart Beat

First, let us consider the ECG time series of human heart beat shown in Fig. 10. ECGs are time series of the electrical potential between two points on the surface of the body caused by a beating heart. This time series data set consists of 3751 sample values. In our experiment, data points from 1 to 2251 are taken as the training set and the remaining points from 2252 to 3751 as the test set. As explained earlier, by varying the memory order m = {3, 4, 5, 6, 7, 8} and the number of hidden neurons L = {1, ..., 41}, the best prediction performance was obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 5 and L = 41 respectively. From Fig. 11 we observe that the predicted values are in close agreement with the actual values.

[Fig. 10: UCR time series of a human heart beat electrocardiogram (signal value versus time index, samples 1 to 3751).]

[Fig. 11: Predicted and actual values of the human heart beat ECG time series for the choice of the sigmoid activation function with m = 5 and L = 41, plotted against the testing index.]

² This time series is available at: http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
³ http://www.cs.ucr.edu/~eamonn/time_series_data
ii. The Time Series of Human Heart Beat (Second Dataset)

This is the second ECG time series used in our experiments and is shown in Fig. 12. It consists of 3750 data values. In this example, data values from 1 to 2251 are considered for training and the remaining data values from 2252 to 3750 as the test set. By varying the memory order m = {3, 4, 5, 6, 7, 8} and the number of hidden neurons L = {1, ..., 81}, the best prediction performance was obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 6 and L = 61 respectively.

[Fig. 12: UCR time series (second dataset) of a human heart beat electrocardiogram (signal value versus time index).]

[Fig. 13: Predicted and actual values of the human heart beat (second dataset) ECG time series for the choice of the sigmoid activation function with m = 6 and L = 61, plotted against the testing index.]

Fig. 13 illustrates the predicted and the actual values for the test set, where the predicted values obtained using ELM and the actual values are shown in thin and thick solid lines respectively. The results show that the predicted values are in close agreement with the actual values.
V. CONCLUSION

In this paper, we studied the application of the Extreme Learning Machine algorithm to chaotic time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate ECG time series. We performed our experiments using the sigmoid, sin and hardlim activation functions and demonstrated that the ELM algorithm with the sin and sigmoid activation functions can achieve high prediction accuracy. From our study we conclude that ELM is a promising method for time series prediction problems.

REFERENCES

[1] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd ed., Springer, Berlin, 2002.
[2] M. Casdagli, "Nonlinear Prediction of Chaotic Time Series", Physica D, 35 (1989), pp. 335-356.
[3] K. Y. Chen and C. H. Wang, "A Hybrid SARIMA and Support Vector Machines for Forecasting the Production Values of the Machinery Industry in Taiwan", Expert Systems with Applications (2006).
[4] Y. Chen, B. Yang and J. Dong, "Time Series Prediction Using a Local Linear Wavelet Neural Network", Neurocomputing, 69 (2006), pp. 449-465.
[5] G. B. Huang, Q. Y. Zhu and C. K. Siew, "Extreme Learning Machine: Theory and Applications", Neurocomputing, 70 (2006), pp. 489-501.
[6] R. Malhotra and D. K. Malhotra, "Evaluating Consumer Loans Using Neural Networks", Omega, 31 (2003), pp. 83-96.
[7] N. Mani and P. Voumard, "An Optical Character Recognition Using Artificial Neural Network", IEEE Int. Conf. on Systems, Man, and Cybernetics, Vol. 3 (1996), pp. 2244-2247.
[8] S. Mukherjee, E. Osuna and F. Girosi, "Nonlinear Prediction of Chaotic Time Series Using Support Vector Machines", in Neural Networks for Signal Processing VII, Proceedings of the IEEE Signal Processing Society Workshop, FL (1997), pp. 511-520.
[9] K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf and J. Kohlmorgen, "Using Support Vector Machines for Time Series Prediction", in B. Schölkopf, C. J. C. Burges and A. J. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA (1999), pp. 243-254.
[10] C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New York, 1971.
[11] H. A. Rowley, S. Baluja and T. Kanade, "Neural Network-Based Face Detection", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1 (1998), pp. 23-38.
[12] Z. Tang and P. A. Fishwick, "Feedforward Neural Nets as Models for Time Series Forecasting", ORSA J. Comput., 5 (1993), pp. 374-385.
[13] F. E. H. Tay and L. Cao, "Application of Support Vector Machines in Financial Time Series Forecasting", Omega, 29 (2001), pp. 309-317.
[14] Q. Tong, H. Zheng and X. Wang, "Gene Prediction Algorithm Based on the Statistical Combination and the Classification in Terms of Gene Characteristics", Int. Conf. on Neural Networks and Brain, Vol. 2 (2005), pp. 673-677.
[15] T. B. Trafalis and H. Ince, "Support Vector Machine for Regression and Applications to Financial Forecasting", Proceedings of the IEEE-INNS-ENNS Int. Joint Conf. on Neural Networks, Vol. 6, IEEE (2000), pp. 348-353.
[16] F. M. Tseng, H. C. Yu and G. H. Tzeng, "Combining Neural Network Model with Seasonal Time Series ARIMA Model", Technological Forecasting and Social Change, 69 (2002), pp. 71-87.
[17] G. P. Zhang, E. B. Patuwo and M. Y. Hu, "A Simulation Study of Artificial Neural Networks for Nonlinear Time Series Forecasting", Comput. Oper. Res., 28 (2001), pp. 381-396.
[18] G. P. Zhang, "Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model", Neurocomputing, 50 (2003), pp. 159-175.