International Journal of Intelligent Systems and Technologies 2;4 © www.waset.org Fall 2007
Application of Extreme Learning Machine
Method for Time Series Analysis
Rampal Singh and S. Balasundaram
Abstract—In this paper, we study the application of the Extreme Learning Machine (ELM) algorithm for single hidden layer feedforward neural networks to non-linear chaotic time series problems. In this algorithm the input weights and the hidden layer biases are chosen randomly. The ELM formulation then leads to solving a system of linear equations in terms of the unknown weights connecting the hidden layer to the output layer, and the solution of this general system of linear equations is obtained using the Moore-Penrose generalized pseudo inverse. To study the application of the method we consider the time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate ECG time series. For the sigmoid, sin and hardlim activation functions, we determine the values of the memory order and the number of hidden neurons that give the best prediction performance in terms of root mean square error. The results obtained are in close agreement with the exact values of the time series considered, which clearly shows that ELM is a very promising alternative method for time series prediction.

Keywords—Chaotic time series, Extreme learning machine, Generalization performance.

I. INTRODUCTION

Artificial Neural Networks (ANNs) have been extensively applied to pattern classification and regression problems. The major reason for the success of ANNs is their ability to obtain, from the given input samples, a non-linear approximation model describing the association between the dependent and independent variables. Since ANNs adaptively select the model from the features present in the input data, they have been applied to a large number of important classes of problems such as optical character recognition [7], face detection [11], gene prediction [14], credit scoring [6] and time series forecasting [12], [17]. Though ANNs have advantages such as good approximation capabilities and simple network structures, they suffer from several problems such as the presence of local minima, imprecise learning rates, the selection of the number of hidden neurons and overfitting. Moreover, gradient descent based learning algorithms such as Back Propagation (BP) generally lead to slow convergence during the training of the networks.

Time series forecasting is an important and challenging regression problem. In a regression problem, the given input samples are analyzed to obtain the best fit functional model describing the relationship between the dependent and independent variables. Many prediction models for time series exist in the literature [4], [9], [12]. The most important and widely used among them are the Auto Regressive Integrated Moving Average (ARIMA) [1], ANN [12], [17] and Support Vector Regression (SVR) [8], [9], [13], [15] methods. Among these, ARIMA assumes the existence of a linear relationship in the time series values, i.e. its prediction is a linear function of the past observations, and therefore it is not always suitable for complex real world problems [18]. It has also been proposed to combine several methods in order to obtain improved forecasting accuracy. For a study of a hybrid approach combining ARIMA and ANN for time series forecasting we refer the reader to [18]. For time series involving seasonality, combining Seasonal ARIMA (SARIMA) and ANN is discussed in [16], and for a combined SARIMA and SVR approach see [3].

Huang et al. [5] have proposed a new learning algorithm for the Single hidden Layer Feedforward Neural Network (SLFN) architecture, called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient descent based algorithms such as BP applied in ANNs. In this algorithm the input weights and the hidden layer biases are chosen randomly. The ELM formulation leads to solving a system of linear equations in terms of the unknown weights connecting the hidden layer to the output layer, and the solution of this general system of linear equations is obtained using the Moore-Penrose generalized pseudo inverse [10]. In this work we briefly discuss the ELM algorithm and study the feasibility of its application to chaotic time series prediction problems.

Throughout this paper, we assume all vectors to be column vectors. For any two vectors x, y in the m-dimensional real space ℜ^m, we denote their inner product by x′y, where x′ is the transpose of the vector x, and we denote the norm of a vector by || ⋅ ||. The paper is organized as follows. In Section 2, we define the Moore-Penrose generalized inverse and the minimum norm least squares solution of a general linear system of equations, and state the relation between them. In Section 3, we review the ELM algorithm for SLFNs. In Section 4, we apply this algorithm to the time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate (ECG) time series, and the results obtained using ELM are compared with the exact values. Finally, we conclude the paper in Section 5.

Rampal Singh is with the Department of Computer Science, Deen Dayal Upadhyaya College, University of Delhi, New Delhi-110015, India (phone: +91-11-27570620, 09350647546, e-mail: rpsrana@ddu.du.ac.in).
S. Balasundaram is with the School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi-110067, India (phone: +91-11-26704724, e-mail: balajnu@hotmail.com).
II. MOORE-PENROSE GENERALIZED INVERSE

The solution of a general linear system of equations

    Ax = y,

where A may be a singular or a rectangular matrix, can be obtained by the use of the Moore-Penrose generalized pseudo inverse.

Definition 2.1 [10]: A matrix G of size n × m is called the Moore-Penrose generalized pseudo inverse of a given matrix A of size m × n if

    AGA = A,  GAG = G,  (AG)′ = AG,  (GA)′ = GA.

In this case, we denote G by A⁺.

Definition 2.2: For the given general linear system of equations Ax = y, where A is a matrix of size m × n and y is a vector in ℜ^m, a vector x* in ℜ^n is called a least squares solution if

    || Ax* − y || = min_x || Ax − y ||.

Definition 2.3: The vector x* in ℜ^n is called a minimum norm least squares solution of the general linear system Ax = y if x* is a least squares solution and, among all least squares solutions x in ℜ^n,

    || x* || ≤ || x ||.

Theorem 2.1 [10]: Let G be a matrix of size n × m. Then x* = Gy is a minimum norm least squares solution of the general linear system Ax = y if and only if G = A⁺, the Moore-Penrose generalized inverse of A.

From the above theorem it is clear that x* = A⁺y is the unique minimum norm least squares solution.
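Numerically, A⁺ is available through the singular value decomposition. As a small illustration, the following sketch, written in Python with NumPy rather than the Matlab code used for our experiments, checks the four conditions of Definition 2.1 and compares x* = A⁺y with the minimum norm least squares solution returned by np.linalg.lstsq; the rank-deficient matrix A below is an arbitrary example of ours, not taken from the paper:

    import numpy as np

    # A rank-deficient 4 x 3 system: the second row repeats the first,
    # and the third column is the sum of the first two (rank(A) = 2).
    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.],
                  [0., 1., 1.]])
    y = np.array([1., 2., 1., 1.])

    G = np.linalg.pinv(A)  # the Moore-Penrose generalized inverse A+

    # The four conditions of Definition 2.1.
    assert np.allclose(A @ G @ A, A)
    assert np.allclose(G @ A @ G, G)
    assert np.allclose((A @ G).T, A @ G)
    assert np.allclose((G @ A).T, G @ A)

    # Theorem 2.1: x* = A+ y is the minimum norm least squares solution;
    # np.linalg.lstsq computes the same solution via the SVD.
    x_star = G @ y
    x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
    assert np.allclose(x_star, x_lstsq)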
III. EXTREME LEARNING MACHINE ALGORITHM

Let us consider an SLFN having L hidden neurons. Let G(·,·,·) be a real valued function such that G(wi, bi, x) is the output of the i-th hidden neuron with bias bi ∈ ℜ corresponding to the input vector x ∈ ℜ^m and the weight vector wi = (wi1, ..., wim)′, where wis is the weight of the connection between the i-th hidden neuron and the s-th neuron of the input layer. It is well known that for feedforward neural networks, the output function f(·) is given by

    f(x) = Σ_{i=1}^{L} βi G(wi, bi, x),

where βi = (βi1, ..., βik)′ ∈ ℜ^k is the weight vector connecting the i-th hidden neuron with the k neurons of the output layer. Note that for the case of additive hidden neurons, G(·,·,·) takes the following form:

    G(wi, bi, x) = g(wi′x + bi),

where g: ℜ → ℜ is the activation function. In this work, we assume the case of additive hidden neurons.

Suppose we are given the training data set {(xi, yi)}_{i=1,2,...,M}, where xi = (xi1, ..., xim)′ ∈ ℜ^m denotes the input vector, yi = (yi1, ..., yik)′ ∈ ℜ^k is its corresponding output vector and M is the total number of input data patterns. Further assume that the values of the weight vectors wi ∈ ℜ^m and the biases bi ∈ ℜ are randomly assigned. Then the standard SLFN with L hidden neurons approximates the input samples with zero error if and only if there exist βi ∈ ℜ^k such that

    yj = Σ_{i=1}^{L} βi G(wi, bi, xj),  j = 1, 2, ..., M.    (1)

The above set of equations can be rewritten in the following matrix form:

    Hβ = Y,    (2)

where

        | G(w1, b1, x1)  ...  G(wL, bL, x1) |
    H = |      ...       ...       ...      |    (3)
        | G(w1, b1, xM)  ...  G(wL, bL, xM) |

is the M × L hidden layer output matrix, and

        | β1′ |          | y1′ |
    β = | ... |  and Y = | ... |    (4)
        | βL′ |          | yM′ |

are matrices of sizes L × k and M × k respectively. Note that the i-th column of H is the output of the i-th hidden neuron for the inputs x1, x2, ..., xM. Further, observe that the matrix H need not be a square matrix.

Under the assumption that the activation function g(·) is infinitely differentiable, it has been shown in [5] that, for fixed input weight vectors wi and biases bi, a least squares solution of the matrix equation (2) with minimum norm of the output weights β can be obtained, and that the smallest training error is reached by this solution. Moreover, this solution of the matrix equation (2) is given by

    β̂ = H⁺Y,

where H⁺ is the Moore-Penrose generalized pseudo inverse of the matrix H. Further, it has been reported in [5] that ELM tends to produce better generalization performance than BP, with the main advantage being the decrease in computational time while training the network.

Training an SLFN is thus equivalent to obtaining a minimum norm least squares solution of the matrix equation Hβ = Y. In the course of learning, once the input weights and the hidden layer biases are randomly chosen, they are not adjusted at all. By Theorem 2.1, the smallest norm least squares solution of the above learning machine is obtained when β̂ = H⁺Y.

Since sin and sigmoid are infinitely differentiable functions, the ELM algorithm can be successfully applied by choosing either of them as the activation function.
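To make the construction in equation (3) concrete, the following sketch builds the hidden layer output matrix H for additive hidden neurons, continuing the NumPy notation of the previous sketch; the names sigmoid and hidden_matrix are ours. With H in hand, β̂ = H⁺Y is a single further line, beta = np.linalg.pinv(H) @ Y.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hidden_matrix(X, W, b, g=sigmoid):
        # Equation (3): H[j, i] = g(w_i' x_j + b_i), an M x L matrix whose
        # i-th column is the output of the i-th hidden neuron on all inputs.
        # X: M x m input patterns; W: L x m input weights; b: L biases.
        return g(X @ W.T + b)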
However, in all our experiments we also study the application of the ELM algorithm with the hardlim activation function.

The ELM algorithm for an SLFN can be stated [5] as follows:

Input: Training set {(xi, yi)}_{i=1,2,...,M}, where xi ∈ ℜ^m and yi ∈ ℜ^k; the number L of hidden neurons; and the activation function g(·).
1. For i = 1, 2, ..., L, randomly assign the input weight vector wi ∈ ℜ^m and the bias bi ∈ ℜ.
2. Determine the matrix H defined by equation (3).
3. Calculate H⁺.
4. Calculate the output weight matrix β̂ by

    β̂ = H⁺Y,

where Y is given by equation (4).
Output: The Single hidden Layer Feedforward neural Network (SLFN) with the determined output weight vectors βi ∈ ℜ^k for the randomly chosen weight vectors wi ∈ ℜ^m and biases bi ∈ ℜ, i = 1, 2, ..., L.

For any input sample x ∈ ℜ^m, the output value y can be calculated using the following formula:

    y = Σ_{i=1}^{L} β̂i g(wi′x + bi),

where wi, bi and the activation function g(·) are inputs and the weight vectors β̂i ∈ ℜ^k are the output of the ELM algorithm.
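The four steps above translate directly into code. The sketch below is our own minimal NumPy version, reusing hidden_matrix from Section 3; drawing wi and bi uniformly from [−1, 1] is our assumption, since [5] only requires them to be random.

    def elm_fit(X, Y, L, g=sigmoid, seed=0):
        # Steps 1-4 of the ELM algorithm for training data X (M x m), Y (M x k).
        rng = np.random.default_rng(seed)
        m = X.shape[1]
        W = rng.uniform(-1.0, 1.0, size=(L, m))  # step 1: random input weights
        b = rng.uniform(-1.0, 1.0, size=L)       # step 1: random biases
        H = hidden_matrix(X, W, b, g)            # step 2: equation (3)
        beta = np.linalg.pinv(H) @ Y             # steps 3 and 4: H+, then H+ Y
        return W, b, beta

    def elm_predict(X, W, b, beta, g=sigmoid):
        # Output formula: y = sum_i beta_i g(w_i' x + b_i) for each row of X.
        return hidden_matrix(X, W, b, g) @ beta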
IV. EXPERIMENTS AND RESULTS

A. Preprocessing of the Data

Time series prediction is the problem of determining a function that captures the underlying relationship between the previous values and the next value. Suppose N observations {x(iτ)}_{i=1,2,...,N} of the time series x(t) are given with time delay τ. In all our experiments, the original data is first normalized to zero mean and standard deviation equal to one. The normalized data is then transformed into auto corrected data, i.e. for a given positive integer m and i = 1, ..., (N − m) we define the auto corrected input vector

    xi = (x(iτ), x((i + 1)τ), ..., x((i + m − 1)τ))′ ∈ ℜ^m

consisting of the previous signal values. Here m is called the embedding dimension or memory order. The normalized auto corrected input vectors and their corresponding output values can be represented in the matrix forms

    X = (x1, x2, ..., xM)′ of size M × m    (5)

and

    Y = (y1, y2, ..., yM)′    (6)

respectively, where M = N − m. Note that m determines the dimension of the input vectors of the ELM algorithm. The time series prediction problem may then be stated as: for i = 1, ..., M, predict the target signal value yi = x((i + m)τ) ∈ ℜ corresponding to the auto corrected input vector xi ∈ ℜ^m. Observe that the number of neurons in the output layer is k = 1.

In order to demonstrate the effectiveness of the ELM learning algorithm, we have taken the time series generated by the Mackey-Glass delay differential equation with different delays [2], the Santa Fe A laser time series and the UCR heart beat rate chaotic time series datasets. We performed our experiments by choosing the sigmoid, sin and hardlim activation functions in the ELM learning algorithm. We use the Root Mean Square Error (RMSE) to evaluate the prediction performance of ELM, calculated using the formula

    RMSE = √( (1/n) Σ_{i=1}^{n} (yi − ỹi)² ),

where n is the number of test data and yi and ỹi are the actual and predicted values of the time series respectively. For the Mackey-Glass and Santa Fe A time series datasets, the first 70% of the data values are used for training and the remaining data values for testing. For the UCR time series datasets, 60% of the sample values are used for training and the remaining samples for testing. In all our experiments we applied the ELM source code¹ written in Matlab.

¹ http://www.ntu.edu.sg/home/egbhuang

For choosing the memory order m and the number of hidden neurons L of the ELM network, we vary m and L over a set of predefined values and determine the pair of values which gives the best performance in terms of the RMSE on the test set. This is performed for each of the transfer functions, i.e. for the sigmoid, sin and hardlim functions, and the best results obtained are reported.
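The preprocessing described above takes only a few lines of code. In the sketch below, continuing the earlier NumPy sketches, embed returns the matrices of equations (5) and (6) and rmse implements the error measure; the names embed and rmse are ours, and the series is assumed to be a one-dimensional NumPy array sampled at the delay τ, so that position i stands for time iτ.

    def embed(series, m):
        # Normalize to zero mean and unit standard deviation, then build the
        # auto corrected inputs x_i = (x_i, ..., x_{i+m-1}) and targets
        # y_i = x_{i+m}: the M x m matrix (5) and M-vector (6), M = N - m.
        s = (series - series.mean()) / series.std()
        N = len(s)
        X = np.array([s[i:i + m] for i in range(N - m)])
        Y = s[m:]
        return X, Y

    def rmse(y_true, y_pred):
        # Root mean square error over the n test samples.
        return np.sqrt(np.mean((y_true - y_pred) ** 2))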
[Fig. 1: The Mackey-Glass time series with time delay τ = 17 (x(t) versus t, samples 1 to 1500).]
[Fig. 2: The Mackey-Glass time series with time delay τ = 30 (x(t) versus t, samples 1 to 1500).]
[Fig. 3: Predicted results (output value versus time) for m = 5 when using the sin and sigmoid activation functions for the MG17 time series, corresponding to the time period from 1052 to 1201.]
[Fig. 4: Predicted results (output value versus time) for m = 7 when the sin and sigmoid activation functions are used for the MG30 time series, corresponding to the time period from 1052 to 1201.]
B. MG17, MG30 Time Series

Consider the Mackey-Glass time delay differential equation [2], [8] given by

    dx(t)/dt = −b x(t) + a x(t − τ) / (1 + x(t − τ)^10),

where a, b are parameters and τ is the time delay. We study the application of the ELM algorithm on two time series generated by this differential equation, which are widely used as benchmark data sets for analyzing the generalization ability of a prediction method. For this, consider the chaotic time series [2], [8] generated using the parameter values a = 0.2, b = 0.1 and τ = 17, 30, where τ is the time delay. We call the time series corresponding to τ = 17 and τ = 30 MG17 and MG30 respectively.
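The paper's exact integration procedure is not spelled out; as one plausible reconstruction, the sketch below generates Mackey-Glass samples by simple Euler stepping of the delay differential equation with a constant initial history. The function name, the step size dt = 1 and the initial value x0 = 1.2 are our assumptions; a higher order scheme such as Runge-Kutta would give more accurate samples.

    def mackey_glass(n_samples, tau=17, a=0.2, b=0.1, dt=1.0,
                     x0=1.2, discard=3500):
        # Euler integration of dx/dt = -b x(t) + a x(t-tau) / (1 + x(t-tau)^10),
        # keeping a buffer of past values for the delayed term x(t - tau).
        lag = int(tau / dt)
        hist = np.full(lag + 1, x0)  # constant initial history x(t) = x0, t <= 0
        out = np.empty(n_samples + discard)
        for t in range(n_samples + discard):
            x_t, x_lag = hist[-1], hist[0]
            x_next = x_t + dt * (-b * x_t + a * x_lag / (1.0 + x_lag ** 10))
            hist = np.append(hist[1:], x_next)
            out[t] = x_next
        # Discard the initialization transients, as done in [8].
        return out[discard:]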
In order to avoid the initialization transients, the initial 3500 samples are discarded [8]. We considered 1050 data points, corresponding to the sample time period from 3501 to 4550, for training and the sample time period from 4551 to 5000 for testing. In Fig. 1 and Fig. 2 we show the time series data values from 1 to 1500 (after discarding the initial 3500 samples) for the MG17 and MG30 time series respectively, where the first 1050 samples are taken for training and the remaining 450 samples for testing. Experiments were performed using all the activation functions, namely the sin, sigmoid and hardlim functions. As discussed earlier, the best prediction performance for the MG17 time series was sought by varying the memory order m = {5, 7, 9} and the number of hidden neurons L = {1, 3, ..., 41} for each of the above activation functions. It was found that the best prediction performance was obtained for the sigmoid activation function, the corresponding values being m = 5 and L = 37. In Fig. 3 we show the actual and the predicted time series of MG17 for the time period from 1052 to 1201. The RMSE curves for all the activation functions with memory order m = 5 on the test data set are plotted in Fig. 5.

[Fig. 5: Error plot (testing error versus the number of hidden neurons) for the MG17 time series when the sin, sigmoid and hardlim activation functions are used with memory order m = 5.]

Similarly, for the MG30 time series, by varying the memory order m = {5, 7, 9} and the number of hidden neurons L = {1, 3, ..., 77} for each of the activation functions, the best prediction performance was again obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 7 and L = 77. In Fig. 4 we show the actual and the predicted time series of MG30 for the time period from 1052 to 1201. The RMSE curves obtained for each of the activation functions with memory order m = 7 on the test data set are plotted in Fig. 6.

[Fig. 6: Error plot (testing error versus the number of hidden neurons) for the MG30 time series when the sin, sigmoid and hardlim activation functions are used with memory order m = 7.]
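The model selection procedure just described amounts to a direct grid search over m and L. The following sketch runs it for MG17, reusing mackey_glass, embed, rmse, elm_fit and elm_predict from the earlier sketches; the sample count of 1510 is our choice, taken so that 1050 training and 450 test vectors remain after embedding, and is not from the paper.

    series = mackey_glass(1510, tau=17)          # transients already discarded
    best_err, best_pair = np.inf, None
    for m in (5, 7, 9):
        X, Y = embed(series, m)
        X_tr, Y_tr = X[:1050], Y[:1050]          # first 1050 samples for training
        X_te, Y_te = X[1050:1500], Y[1050:1500]  # next 450 samples for testing
        for L in range(1, 42, 2):                # L = 1, 3, ..., 41
            W, b, beta = elm_fit(X_tr, Y_tr, L)
            err = rmse(Y_te, elm_predict(X_te, W, b, beta))
            if err < best_err:
                best_err, best_pair = err, (m, L)
    print(best_pair, best_err)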
C. Santa Fe-A Time Series

This is a laser time series data set, shown in Fig. 7, recorded from a Far-Infrared-Laser in a chaotic state, which is approximately described by three coupled non-linear ordinary differential equations².

[Fig. 7: The Santa Fe A laser time series (x(t) versus t).]
[Fig. 8: Predicted results (output value versus time) for m = 5 when using the sin and sigmoid activation functions on the Santa Fe-A time series, corresponding to the time period from 702 to 851.]

The data set contains 1000 data points (see Fig. 7). Among them, the first 700 data points are used for training and the remaining 300 points for testing. By varying m = {3, 5, 7} and L = {1, 3, ..., 81}, it was determined that the best prediction performance is obtained for the sin activation function, the corresponding values of the memory order and the number of hidden neurons being m = 5 and L = 49 respectively. In Fig. 8, we plot the actual and the predicted values for the sin and sigmoid activation functions corresponding to the time period from 702 to 851. Finally, for m = 5 we show the RMSE values for all the activation functions in Fig. 9.

[Fig. 9: Error plot (test error versus the number of hidden neurons) for the Santa Fe-A time series when each of the activation functions is used with memory order m = 5.]

D. UCR Time Series Datasets

We repeat our experiments on time series datasets from a diverse set of domains available from the UCR Time Series Data Mining Archive³. We consider two important time series datasets, both of human heart beat.

i. The Time Series of Human Heart Beat

First, let us consider the ECG time series of human heart beat shown in Fig. 10. ECGs are time series of the electrical potential between two points on the surface of the body caused by a beating heart. This time series data set consists of 3751 sample values. In our experiment, data points from 1 to 2251 are taken as the training set and the remaining points from 2252 to 3751 as the test set. As explained earlier, by varying the memory order m = {3, 4, 5, 6, 7, 8} and the number of hidden neurons L = {1, ..., 41}, the best prediction performance was obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 5 and L = 41 respectively. From Fig. 11 we observe that the predicted values are in close agreement with the actual values.

[Fig. 10: UCR time series of a human heart beat electrocardiogram (signal value versus time index, samples 1 to 3751).]

[Fig. 11: Predicted and actual values of the human heart beat ECG time series for the choice of the sigmoid activation function with m = 5 and L = 41, plotted against the testing index.]

² This time series is available at: http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
³ http://www.cs.ucr.edu/~eamonn/time_series_data
ii. The Time Series of Human Heart Beat (Second Dataset)

This is the second ECG time series used in our experiments and is shown in Fig. 12. It consists of 3750 data values. In this example, data values from 1 to 2251 are considered for training and the remaining data values from 2252 to 3750 as the test set. By varying the memory order m = {3, 4, 5, 6, 7, 8} and the number of hidden neurons L = {1, ..., 81}, the best prediction performance was obtained for the sigmoid activation function, the corresponding values of the memory order and the number of hidden neurons being m = 6 and L = 61 respectively.

[Fig. 12: UCR time series (second dataset) of a human heart beat electrocardiogram (signal value versus time index).]

[Fig. 13: Predicted and actual values of the human heart beat (second dataset) ECG time series for the choice of the sigmoid activation function with m = 6 and L = 61, plotted against the testing index.]

Fig. 13 illustrates the predicted and the actual values for the test set, where the predicted values obtained using ELM and the actual values are shown in thin and thick solid lines respectively. The results show that the predicted values are in close agreement with the actual values.
V. CONCLUSION

In this paper, we studied the application of the Extreme Learning Machine algorithm to chaotic time series generated by the Mackey-Glass delay differential equation with different time delays, the Santa Fe A laser time series and the UCR heart beat rate ECG time series. We performed our experiments using the sigmoid, sin and hardlim activation functions and demonstrated that the ELM algorithm with the sin and sigmoid activation functions can achieve high prediction accuracy. From our study we conclude that ELM is a promising method for time series prediction problems.

REFERENCES

[1] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd ed., Springer, Berlin, 2002.
[2] M. Casdagli, "Nonlinear Prediction of Chaotic Time Series", Physica D, 35 (1989), pp. 335-356.
[3] K. Y. Chen and C. H. Wang, "A Hybrid SARIMA and Support Vector Machines for Forecasting the Production Values of the Machinery Industry in Taiwan", Expert Systems with Applications (2006).
[4] Y. Chen, B. Yang and J. Dong, "Time Series Prediction Using a Local Linear Wavelet Neural Network", Neurocomputing, 69 (2006), pp. 449-465.
[5] G. B. Huang, Q. Y. Zhu and C. K. Siew, "Extreme Learning Machine: Theory and Applications", Neurocomputing, 70 (2006), pp. 489-501.
[6] R. Malhotra and D. K. Malhotra, "Evaluating Consumer Loans Using Neural Networks", Omega, 31 (2003), pp. 83-96.
[7] N. Mani and P. Voumard, "An Optical Character Recognition Using Artificial Neural Network", IEEE Int. Conf. on Systems, Man, and Cybernetics, Vol. 3 (1996), pp. 2244-2247.
[8] S. Mukherjee, E. Osuna and F. Girosi, "Nonlinear Prediction of Chaotic Time Series Using Support Vector Machines", in Neural Networks for Signal Processing VII, Proceedings of the IEEE Signal Processing Society Workshop, FL (1997), pp. 511-520.
[9] K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf and J. Kohlmorgen, "Using Support Vector Machines for Time Series Prediction", in B. Schölkopf, C. J. C. Burges and A. J. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA (1999), pp. 243-254.
[10] C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New York, 1971.
[11] H. A. Rowley, S. Baluja and T. Kanade, "Neural Network-Based Face Detection", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1 (1998), pp. 23-38.
[12] Z. Tang and P. A. Fishwick, "Feedforward Neural Nets as Models for Time Series Forecasting", ORSA J. Comput., 5 (1993), pp. 374-385.
[13] F. E. H. Tay and L. Cao, "Application of Support Vector Machines in Financial Time Series Forecasting", Omega, 29 (2001), pp. 309-317.
[14] Q. Tong, H. Zheng and X. Wang, "Gene Prediction Algorithm Based on the Statistical Combination and the Classification in Terms of Gene Characteristics", Int. Conf. on Neural Networks and Brain, Vol. 2 (2005), pp. 673-677.
[15] T. B. Trafalis and H. Ince, "Support Vector Machine for Regression and Applications to Financial Forecasting", Proceedings of the IEEE-INNS-ENNS Int. Joint Conf. on Neural Networks, Vol. 6, IEEE (2000), pp. 348-353.
[16] F. M. Tseng, H. C. Yu and G. H. Tzeng, "Combining Neural Network Model with Seasonal Time Series ARIMA Model", Technological Forecasting and Social Change, 69 (2002), pp. 71-87.
[17] G. P. Zhang, E. B. Patuwo and M. Y. Hu, "A Simulation Study of Artificial Neural Networks for Nonlinear Time Series Forecasting", Comput. Oper. Res., 28 (2001), pp. 381-396.
[18] G. P. Zhang, "Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model", Neurocomputing, 50 (2003), pp. 159-175.