US20070294247A1 - Identifying optimal multi-scale patterns in time-series streams - Google Patents
Identifying optimal multi-scale patterns in time-series streams Download PDFInfo
- Publication number
- US20070294247A1 US20070294247A1 US11/471,002 US47100206A US2007294247A1 US 20070294247 A1 US20070294247 A1 US 20070294247A1 US 47100206 A US47100206 A US 47100206A US 2007294247 A1 US2007294247 A1 US 2007294247A1
- Authority
- US
- United States
- Prior art keywords
- time series
- series data
- window
- current window
- functions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
- 230000006870 function Effects 0.000 claims abstract description 54
- 238000000034 method Methods 0.000 claims abstract description 44
- 239000013598 vector Substances 0.000 claims description 47
- 239000011159 matrix material Substances 0.000 claims description 31
- 230000010365 information processing Effects 0.000 claims description 15
- 238000000513 principal component analysis Methods 0.000 claims 3
- 238000012545 processing Methods 0.000 description 28
- 230000008569 process Effects 0.000 description 14
- 238000013459 approach Methods 0.000 description 13
- 230000000875 corresponding effect Effects 0.000 description 11
- 238000004590 computer program Methods 0.000 description 5
- 238000010276 construction Methods 0.000 description 5
- 238000013500 data storage Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 5
- 230000007246 mechanism Effects 0.000 description 5
- 230000003044 adaptive effect Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 230000000737 periodic effect Effects 0.000 description 4
- 230000006399 behavior Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 238000000354 decomposition reaction Methods 0.000 description 3
- 238000005065 mining Methods 0.000 description 3
- 238000013139 quantization Methods 0.000 description 3
- 230000003109 amnesic effect Effects 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 230000003139 buffering effect Effects 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000003534 oscillatory effect Effects 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 238000000611 regression analysis Methods 0.000 description 2
- 230000003595 spectral effect Effects 0.000 description 2
- 101000822695 Clostridium perfringens (strain 13 / Type A) Small, acid-soluble spore protein C1 Proteins 0.000 description 1
- 101000655262 Clostridium perfringens (strain 13 / Type A) Small, acid-soluble spore protein C2 Proteins 0.000 description 1
- 101000655256 Paraclostridium bifermentans Small, acid-soluble spore protein alpha Proteins 0.000 description 1
- 101000655264 Paraclostridium bifermentans Small, acid-soluble spore protein beta Proteins 0.000 description 1
- 238000007792 addition Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013144 data compression Methods 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 238000012731 temporal analysis Methods 0.000 description 1
- 238000000700 time series analysis Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
Definitions
- the present invention generally relates to the field of stream processing systems, and more particularly relates to identifying optimal local patterns in a data stream time series.
- Typical approaches for pattern discovery and summarization of time series rely on fixed transforms, with a predetermined set of bases or approximating functions, as described in (S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, unsupervised stream mining. VLDB J., 13(3), 2004; M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, 2004, Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In VLDB, 2002; T. Palpanas, M. Vlachos, E.
- Wavelets use translated and dilated sine-like waves and have been successfully applied to more bursty data, such as images and video streams
- these approaches assume a fixed-length, sliding window.
- short-window Fourier cannot reveal anything about-periods larger than the sliding window length.
- Wavelets are by nature multi-scale, but they still use a fixed set of bases, which is also often hard to choose.
- a method, system, and computer readable medium for identifying local patterns in at least one time series data stream.
- the method includes generating multiple ordered levels of hierarchal approximation functions.
- the multiple ordered levels are generated directly from at least one given time series data stream including at least one set of time series data.
- the hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions.
- the hierarchical approximation functions are also based upon selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- a system for identifying local patterns in at least one time series data stream includes at least one information processing system.
- the information processing system includes a memory and a data stream analyzer communicatively coupled to the memory.
- the data stream analyzer generates multiple ordered levels of hierarchal approximation functions directly from at least one given time series data stream including at least one set of time series data.
- the hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions and selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- a computer readable medium for identifying local patterns in at least one time series data stream.
- the computer readable medium includes instructions for generating multiple ordered levels of hierarchal approximation functions.
- the multiple ordered levels are generated directly from at least one given time series data stream including at least one set of time series data.
- the hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions.
- the hierarchical approximation functions are also based upon selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- One advantage of the present invention is that an optimal orthonormal transform is determined from data itself, as opposed to using a predetermined basis or approximating function (such as piecewise constant, short-window Fourier or wavelets).
- Another advantage of the present invention is that it provides a hierarchical, recursive summarization or approximation of the stream that examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each, as well as the key windows. Besides providing insight about the behavior of the time series by concisely describing the main trends in a time series, the discovered patterns can also be used to facilitate further data processing.
- FIG. 1 is a diagram illustrating an exemplary stream processing system, according to an embodiment of the present invention
- FIG. 2 is a diagram illustrating a more detailed view of an information processing system, according to an embodiment of the present invention
- FIG. 3 is an exemplary x-y graph illustrating singular value decomposition, according to an embodiment of the present invention.
- FIG. 4 illustrates local patterns for an exemplary fixed window
- FIG. 5 is an exemplary graph illustrating a power profile a sine wave, according to an embodiment of the present invention.
- FIG. 6 illustrates multi-scale pattern discovery, according to an embodiment of the present invention
- FIG. 7 is a simplified version of the multi-scale pattern discovery illustrated in FIG. 6 ;
- FIG. 8 an operational flow diagram illustrating an exemplary process of identifying local patterns in one or more time series data streams, according to an embodiment of the present invention.
- the present invention as would be known to one of ordinary skill in the art could be produced in hardware or software, or in a combination of hardware and software. However in one embodiment the invention is implemented in software.
- the system, or method, according to the inventive principles as disclosed in connection with the preferred embodiment may be produced in a single computer system having separate elements or means for performing the individual functions or steps described or claimed or one or more elements or means combining the performance of any of the functions or steps disclosed or claimed, or may be arranged in a distributed computer system, interconnected by any suitable means as would be known by one of ordinary skill in the art.
- the invention and the inventive principles are not limited to any particular kind of computer system but may be used with any general purpose computer, as would be known to one of ordinary skill in the art, arranged to perform the functions described and the method steps described.
- the operations of such a computer, as described above, may be according to a computer program contained on a medium for use in the operation or control of the computer, as would be known to one of ordinary skill in the art.
- the computer medium which may be used to hold or contain the computer program product, may be a fixture of the computer such as an embedded memory or may be on a transportable medium such as a disk, as would be known to one of ordinary skill in the art.
- any such computing system can include, inter alia, at least a computer readable medium allowing a computer to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium.
- the computer readable medium may include non-volatile memory, such as ROM, Flash memory, floppy disk, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits.
- the computer readable medium may include computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allows a computer to read such computer readable information.
- a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allows a computer to read such computer readable information.
- an exemplary stream processing system 100 determines optimal local patterns that describe the main trends in a time series data stream.
- the stream processing system 100 examines the time series at multiple scales (i.e. window sizes) and efficiently discovers the key patterns in each.
- the stream processing system 100 also selects a window size out of a set of window sizes that most concisely captures the key oscillatory as well as aperiodic trends.
- the stream processing system 100 is a distributed system that can operate in an SMP computing environment. It should be noted that the present invention is not limited to a distributed processing system and it is within the true scope and spirit of the invention to be implemented on other processing architectures.
- the stream processing system 100 includes data streams 140 , 142 , 144 , which in one embodiment, are time series data streams comprising one or more sets of time series data.
- the stream processing system 100 can execute on a plurality of processing nodes 102 , 104 coupled to one another node via a plurality of network adapters 106 , 108 .
- Each processing node 102 , 104 is an independent computer with its own operating system image 110 , 112 , channel controller 114 , 116 , memory 118 , 120 , and processor(s) 122 , 124 on a system memory bus 126 , 128 .
- a system input/output bus 130 , 132 couples I/O adapters 134 , 136 and network adapter 106 , 108 .
- I/O adapters 134 , 136 couples I/O adapters 134 , 136 and network adapter 106 , 108 .
- processor 122 , 124 is shown in each processing node 102 , 104 , each processing node 102 , 104 is capable of having more than one processor.
- Each network adapter is linked together via a network switch 138 .
- the various processing nodes 102 , 104 are able to be part of a processing cluster. All of these variations are considered a part of the claimed invention.
- FIG. 2 is a block diagram illustrating a more detailed view of the information processing system 102 , according to the present invention.
- the information processing system 102 is based upon a suitably configured processing system adapted to implement the exemplary embodiment of the present invention. Any suitably configured processing system is similarly able to be used as the information processing system 102 by embodiments of the present invention, for example, a personal computer, workstation, or the like.
- the information processing system 102 includes a computer 202 .
- the computer 202 has a processor 122 that is connected to the main memory 118 and the channel controller 114 via the system bus 230 .
- the computer 202 also includes a mass storage interface 204 , terminal interface 206 , 1 /O adapter 134 , and network adapter hardware 106 .
- the mass storage interface 204 is used to connect mass storage devices, such as data storage device 208 , to the information processing system 102 system.
- One specific type of data storage device is a computer readable medium such as a CD drive, which may be used to store data to and read data from a CD or DVD 210 or floppy diskette CD (not shown).
- Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations.
- main memory 118 comprises a data stream analyzer 212 for determining optimal local patterns which describe the main trends in a time series data stream 140 , 142 , 144 .
- the data stream analyzer 212 in one embodiment, includes an approximation function estimator 214 .
- the approximation function estimator 214 determines appropriate approximating functions directly from data in a data stream. This process is discussed in greater detail below.
- the data stream analyzer 212 implements a hierarchical, recursive summarization or approximation of a data stream 140 , 142 , 144 that examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each via a local pattern identifier 216 , as well as the key windows via a window size comparator 218 and a window size selector 220 .
- the local pattern identifier 216 identifies locally optimal patterns within each window and the window size comparator 218 , in one embodiment, comparator the information captured by each of these patterns across various windows sizes.
- the window size selector 220 selects the optimal window sizes that most concisely capture the key oscillatory as well as periodic trends based on the window size comparison.
- the selection of the appropriate window sizes and approximating function can be performed across levels (i.e., not only within levels).
- the processes performed by the data stream analyzer 212 and each of its components are discussed in greater detail below. Besides providing insight about the behavior of the time series by concisely describing the main trends in a time series, the discovered patterns can also be used to facilitate further data processing.
- the data stream analyzer 212 can also perform fast, incremental estimation in a streaming setting.
- the information processing system 102 utilizes conventional virtual addressing mechanisms to allow programs to behave as if they have access to a large, single storage entity, referred to herein as a computer system memory, instead of access to multiple, smaller storage entities such as the main memory 118 and data storage device 208 .
- a computer system memory is used herein to generically refer to the entire virtual memory of the information processing system 102 .
- Embodiments of the present invention further incorporate interfaces that each includes separate, fully programmed microprocessors that are used to off-load processing from the CPU 122 .
- Terminal interface 206 is used to directly connect one or more terminals 224 to computer 202 to provide a user interface to the computer 202 .
- These terminals 224 which are able to be non-intelligent or fully programmable workstations, are used to allow system administrators and users to communicate with the information processing system 102 .
- the terminal 224 is also able to consist of user interface and peripheral devices that are connected to computer 202 and controlled by terminal interface hardware included in the terminal I/F 206 that includes video adapters and interfaces for keyboards, pointing devices, and the like.
- An operating system (not shown) included in the main memory is a suitable multitasking operating system such as the Linux, UNIX, Windows XP, and Windows Server 2001 operating system.
- Embodiments of the present invention are able to use any other suitable operating system.
- Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the processing node 102 .
- the network adapter hardware 206 is used to provide an interface to a network 226 .
- Embodiments of the present invention are able to be adapted to work with any data communications connections including present day analog and/or digital techniques or via a future networking mechanism.
- SVD singular value decomposition
- v j is a unit-length vector defining one of the axes in the row space
- the columns v i of V ⁇ [v 1 . . . v r ] are the right singular vectors of A and they form an orthonormal basis its row space.
- the columns u i of U ⁇ [u 1 . . . u r ] are the left singular vectors and form a basis of the column space of A.
- ⁇ diag[ ⁇ 1 . . . ⁇ r ] is a diagonal matrix with positive values ⁇ i , called the singular values of A.
- V (w) Right singular vectors of X (w) . ⁇ (w) Singular values of X (w) .
- U (w) Left singular vectors of X (w) . k Dimension of approximating subspace. ⁇ tilde over (V) ⁇ (w) Same as V (w) , ⁇ (w) , U (w) but only with k highest ⁇ tilde over ( ⁇ ) ⁇ (w) singular values and vectors.
- the data stream analyzer 212 find the patterns that best summarize the series at this window size.
- the patterns in one embodiment, are w-dimensional vectors v i ⁇ [v i,1 , . . . v i,w ] T ⁇ R w chosen so that they capture “most” of the information in the series. The process of choosing the patterns is discussed in greater detail below.
- the right window size is not known a priori.
- x nor X (w) need to be fully materialized at any point in time.
- the last row of X (w) is stored. Note that non-overlapping windows are chosen.
- X (w) has m ⁇ w+1 rows, with row t comprising values x t , x t+1 , . . . x t+w .
- row t comprising values x t , x t+1 , . . . x t+w .
- non-overlapping windows are equally effective for pattern discovery and also lend themselves better to incremental, streaming estimation using limited resources.
- the original time series does not have to be scalar, but can also be vector-valued itself.
- the same process is performed, but only each row of X (w) is now a concatenation of rows of X(instead of a concatenation of scalar values).
- the general time-delay coordinate matrix is constructed as follows:
- DELAY (X ⁇ R m ⁇ n , w), which concatenates w consecutive rows of X (comprising of n numbers) into one row of X (w) (comprising of n ⁇ w numbers).
- the above SVD update algorithm is only used as an example and in no way limits the present invention.
- the above algorithm does not need to store the left singular vectors. Because the data stream analyzer 212 finds patterns at multiple scales without an upper bound on the window size, an algorithm that does not need to store the left singular vectors is a suitable choice.
- the SVD update algorithm is not limited to an algorithm that does not need to store the left singular vectors.
- an exponential forgetting scheme can be incorporated.
- the algorithm updates k-n numbers. Therefore, in one embodiment, the total space requirements are O(nk) and the time per update is also O(nk).
- the incremental update algorithms in one embodiment, need only the observed values and can therefore easily handle missing values by imputing them based on current estimates of the singular vectors.
- FIG. 4 illustrates local patterns for a fixed window.
- X (w) is transferred to time-delay coordinates.
- the local patterns in one embodiment, are the right singular vectors of X (w) , which are optimal in the sense that they minimize the total squared approximation error of the rows X (t) (w) .
- An exemplar algorithm for identifying local patterns is given below.
- the projections ⁇ tilde over (P) ⁇ (w) onto the local patterns ⁇ tilde over (v) ⁇ (i) are discussed in greater detail below.
- the above algorithm for identifying local patterns can be applied in general to n-dimensional vector-valued series.
- the pseudocode is the same, since the DELAY algorithm discussed above can also operate on matrices X ⁇ R m ⁇ n .
- the power can be defined, which is an estimate the error per window element.
- this is an estimate of the variance per dimension, assuming that the discarded dimensions correspond to isotropic Gaussian noise (i.e., uncorrelated with same variance in each dimension).
- the trend does not comprise of exact copies, the power is not zero, but it still exhibits a sharp drop. This fact is used when choosing the “best” window.
- the power profile ⁇ (w) versus w is determined.
- the data stream analyzer 212 determines the optimal local patterns for multiple windows (as well as the associated power profiles) in order to determine the optimal window size.
- the size of the window set W needed to be examined is dramatically reduced.
- this is still computationally expensive (for each window O(ktw)time is still needed) and all points (needed for large window sizes, close to the time series length) are required to be buffered.
- this complexity can be reduced even further.
- FIG. 7 shows a more simplified version of the multi-scale pattern discovery of FIG. 6 .
- the na ⁇ ve approach is to construct X (200) from scratch and determine the SVD.
- the patterns found from X (100) can be reused.
- x (100) can be reduced into just two points, namely their projections p 1,1 (100) and p 1,2 (100) onto v 1 (100) and v 2 (100) , respectively.
- X (200) can be represented with just four numbers, x (1) (100,1) ⁇ [p 1,1 (100) , p 1,2 (100) , p 2,1 (100) , p 2,2 (100) ] T ⁇ R 4 .
- FIG. 7 shows a more simplified version of the process discussed above with respect to FIG. 6 .
- FIG. 7 shows a data stream 700 that comprises a set of times series data.
- the time series is broken into a given number of hierarchical levels 702 , 704 , 706 .
- One or more local approximations 708 are determined directly from the data in the first data stream 700 .
- approximating functions and a window size 710 and an approximation error 712 can be determined.
- the next hierarchical level 704 in one embodiment, can be passed the local approximation results such as the approximation functions and window size 710 from the previous level 702 for determining the local approximations for the next time series subset 714 .
- the local approximations 716 for the time series subset 714 results in approximation functions and a window size(s) 718 and an approximation error 720 .
- the approximation functions and window size(s) 718 can be passed to the next level 706 for determining a local approximation(s) 724 for the next time series subset 722 . This process can be continued for each time series subset.
- the patterns extracted from X (100,1) are four-dimensional vectors v i (100,1) ⁇ R 4
- the patterns for X (200) are 200-dimensional vectors v i (200) ⁇ R (200) .
- v (100,1) and v i (100,0) ⁇ v i (100) can be combined to estimate v (200) .
- the level-l in one embodiment, is defined recursively by
- a linear combination of the columns ⁇ tilde over (V) ⁇ (100,0) ⁇ V 0 (100,0) weighted according to v 1 (100,1) [3:4] gives the second half of the 200-dimensional pattern v 0 1 (200,1) [101: 200](right slanted entries in FIG. 6 ).
- the columns of V 0 (100,1) are combined according to v 1 (100,2) [1:2] (for the first half, v 0 1 (100,2) [1:200]) and v 1 (100,2) [3:4] (for the second half, v 0 1 (100,2) [201:400]) and son on, for the higher levels.
- the maximum level L is determined based on the length m of the time series so far, L ⁇ log W (m/w 0 ).
- k is desired to be relatively small since it determines the buffering requirements of the streaming approach.
- k 6.
- a good compromise is a value in the range 10 ⁇ w 0 ⁇ 20.
- the total time for the hierarchical approach is O(k 2 t), i.e., linear with respect to the time series length. Even though this is an improvement over the O(t 3 k)time of the non-hierarchical approach, all of the points, in one embodiment, are buffered, as is discussed below.
- the data stream analyzer 212 performs a hierarchical, recursive summarization or approximation of the stream that examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each, as well as the key windows.
- time scales i.e., window sizes
- the procedure for examining the time series at multiple scales is discussed.
- only one iteration of each loop in IncrementalSVD (for LocalPattern) and in Hierarchical is recursively invoked, as soon as the necessary number of points has arrived. Subsequently, these points are discarded and proceed with the next non-overlapping window.
- consecutive points of x(or, in general, rows of X) are buffered until w of them are accumulated thereby forming one row of X (w)) .
- one iteration of the outer loop in IncrementalSVD is performed to update all k local patterns.
- the w points (or rows) are discarded and proceed with the next w.
- the first k rows of X (w) can be chosen to be initially buffered and used to bootstrap the SVD estimates, which are subsequently updated.
- FIG. 8 shows an exemplary process of identifying local patterns in one or more time series data streams.
- the operational flow diagram of FIG. 8 begins at step 802 and flows directly to step 804 .
- the information processing system 102 receives a data stream comprising a set of time series data.
- the data stream analyzer 212 breaks the set of time series data into a given number of hierarchical levels l.
- the data stream analyzer 212 creates a set of nested summaries for each of the hierarchical level l.
- the creation of the nested summaries comprises the following.
- the data stream analyzer 212 determines the approximation function(s) for a portion of the set of time series data directly from the time series data.
- the data stream analyzer 212 determines the approximation error between the current window and the set of time series data portion.
- the data stream analyzer 212 passes the approximation function(s) determined at step 810 to the next hierarchical level as a set of time series data. The control flow then exits at step 814 .
- the present invention can be realized in hardware, software, or a combination of hardware and software.
- a system according to a preferred embodiment of the present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system—or other apparatus adapted for carrying out the methods described herein—is suited.
- a typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
- routines executed to implement the embodiments of the present invention may be referred to herein as a “program.”
- the computer program typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions.
- programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices.
- various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Complex Calculations (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method, system, and computer readable medium for identifying local patterns in at least one time series data stream are disclosed. The method comprises generating multiple ordered levels of hierarchal approximation functions. The multiple ordered levels are generated directly from at least one given time series data stream including at least one set of time series data. The hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions. The hierarchical approximation functions are also based upon selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
Description
- This invention was made with Government support under Contract No.: H98230-05-3-0001 awarded by Intelligence Agencies. The Government has certain rights in this invention.
- The present invention generally relates to the field of stream processing systems, and more particularly relates to identifying optimal local patterns in a data stream time series.
- Data streams have recently received much attention in several communities (e.g., theory, databases, networks, data mining) because of several important applications (e.g., network traffic analysis, moving object tracking, financial data analysis, sensor monitoring, environmental monitoring, scientific data processing). Many recent efforts concentrate on summarization and pattern discovery in time series data streams. Some of these recent efforts are further described in (Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In VLDB, 2002; T. Palpanas, M. Vlachos, E. J. Keogh, D. Gunopulos, and W. Truppel. Online amnesic approximation of streaming time series. In ICDE, 2004; S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, unsupervised stream mining. VLDB J.,13(3), 2004; M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, 2004; K. Chakrabarti, E. Keogh, S. Mehotra, and M. Pazzani. Localy adaptive dimensionality reduction for indexing large time series databases. TODS, 27(2), 2002; P. Patel, E. Keogh, J. Lin, and S. Lonardi. Mining motifs in massive time series databases. In ICDM, 2002; B. Chiu, E. Keogh, and S. Lonardi. Probabilistic discovery of time series motifs. In KDD, 2003.).
- Typical approaches for pattern discovery and summarization of time series rely on fixed transforms, with a predetermined set of bases or approximating functions, as described in (S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, unsupervised stream mining. VLDB J., 13(3), 2004; M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, 2004, Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In VLDB, 2002; T. Palpanas, M. Vlachos, E. J. Keogh, D. Gunopulos, and W. Truppel. Online amnesic approximation of streaming time series. In ICDE, 2004, and K. Chakrabarti, E. Keogh, S. Mehotra, and M. Pazzani. Localy adaptive dimensionality reduction for indexing large time series databases. TODS, 27(2), 2002). For example, the short-window Fourier transform uses translated sine waves of fixed length, and has been successful in speech processing, as is further described in (M. R. Portnoff. Short-time Fourier analysis of sampled speech. IEEE Trans. ASSP, 29(3), 1981). Wavelets use translated and dilated sine-like waves and have been successfully applied to more bursty data, such as images and video streams However, these approaches assume a fixed-length, sliding window. For example, short-window Fourier cannot reveal anything about-periods larger than the sliding window length. Wavelets are by nature multi-scale, but they still use a fixed set of bases, which is also often hard to choose.
- In time series stream methods, the work described in “A multiresolution symbolic representation of time series” by Megalooikonomou, Wang, Li, and Faloutsos, in ICDE 2005: 668-679 produces a single representative for a set of scales, using vector quantization within each scale. Its main focus is on finding good-quality and intuitive distance measures for indexing and similarity search. However, this approach does not produce a window size. The window sizes are chosen a priori. Also, this approach it is not applicable to streams, it is severely restricted in the type of approximation (each window is approximated by a discrete value, based on the vector quantization output) and hence the method cannot be composed so the next level reuses the approximations of the previous level.
- The work described in “A data compression technique for sensor networks with dynamic bandwidth allocation” by Lin, Gunopulos, Kalogeraki, and Lonardi, in TIME 2005: 186-188 also uses vector quantization in order to reduce power consumption for wireless sensor networks. This approach only examines a single, a priori chosen window size.
- The work in “Knowledge discovery from heterogeneous dynamic systems using change-point correlations” by Idé and Inoue, in SDM 2005: 571-576) employs a similar technique for change point detection. The change point scores are then used to correlate complex time series. This approach examines only a single, a priori chosen window size, and the computation required is too costly to be feasible in a streaming environment.
- Therefore a need exists to overcome the problems with the prior art as discussed above.
- Briefly, in accordance with the present invention, disclosed are a method, system, and computer readable medium for identifying local patterns in at least one time series data stream. The method includes generating multiple ordered levels of hierarchal approximation functions. The multiple ordered levels are generated directly from at least one given time series data stream including at least one set of time series data. The hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions. The hierarchical approximation functions are also based upon selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- In another embodiment of the present invention, a system for identifying local patterns in at least one time series data stream is disclosed. The system includes at least one information processing system. The information processing system includes a memory and a data stream analyzer communicatively coupled to the memory. The data stream analyzer generates multiple ordered levels of hierarchal approximation functions directly from at least one given time series data stream including at least one set of time series data. The hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions and selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- In yet another embodiment, a computer readable medium for identifying local patterns in at least one time series data stream. The computer readable medium includes instructions for generating multiple ordered levels of hierarchal approximation functions. The multiple ordered levels are generated directly from at least one given time series data stream including at least one set of time series data. The hierarchical approximation functions for each level of the multiple levels is based upon creating a set of approximating functions. The hierarchical approximation functions are also based upon selecting a current window with a current window length from a set of varying window lengths. The current window is selected for a current level of the multiple levels.
- One advantage of the present invention is that an optimal orthonormal transform is determined from data itself, as opposed to using a predetermined basis or approximating function (such as piecewise constant, short-window Fourier or wavelets). Another advantage of the present invention is that it provides a hierarchical, recursive summarization or approximation of the stream that examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each, as well as the key windows. Besides providing insight about the behavior of the time series by concisely describing the main trends in a time series, the discovered patterns can also be used to facilitate further data processing.
- The accompanying figures where like reference numerals refer to identical or functionally similar elements throughout the separate views, and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the present invention, in which:
-
FIG. 1 is a diagram illustrating an exemplary stream processing system, according to an embodiment of the present invention; -
FIG. 2 is a diagram illustrating a more detailed view of an information processing system, according to an embodiment of the present invention; -
FIG. 3 is an exemplary x-y graph illustrating singular value decomposition, according to an embodiment of the present invention; -
FIG. 4 illustrates local patterns for an exemplary fixed window; -
FIG. 5 is an exemplary graph illustrating a power profile a sine wave, according to an embodiment of the present invention; -
FIG. 6 illustrates multi-scale pattern discovery, according to an embodiment of the present invention; -
FIG. 7 is a simplified version of the multi-scale pattern discovery illustrated inFIG. 6 ; and -
FIG. 8 an operational flow diagram illustrating an exemplary process of identifying local patterns in one or more time series data streams, according to an embodiment of the present invention. - The present invention as would be known to one of ordinary skill in the art could be produced in hardware or software, or in a combination of hardware and software. However in one embodiment the invention is implemented in software. The system, or method, according to the inventive principles as disclosed in connection with the preferred embodiment, may be produced in a single computer system having separate elements or means for performing the individual functions or steps described or claimed or one or more elements or means combining the performance of any of the functions or steps disclosed or claimed, or may be arranged in a distributed computer system, interconnected by any suitable means as would be known by one of ordinary skill in the art.
- According to the inventive principles as disclosed in connection with the preferred embodiment, the invention and the inventive principles are not limited to any particular kind of computer system but may be used with any general purpose computer, as would be known to one of ordinary skill in the art, arranged to perform the functions described and the method steps described. The operations of such a computer, as described above, may be according to a computer program contained on a medium for use in the operation or control of the computer, as would be known to one of ordinary skill in the art. The computer medium, which may be used to hold or contain the computer program product, may be a fixture of the computer such as an embedded memory or may be on a transportable medium such as a disk, as would be known to one of ordinary skill in the art.
- The invention is not limited to any particular computer program or logic or language, or instruction but may be practiced with any such suitable program, logic or language, or instructions as would be known to one of ordinary skill in the art. Without limiting the principles of the disclosed invention any such computing system can include, inter alia, at least a computer readable medium allowing a computer to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include non-volatile memory, such as ROM, Flash memory, floppy disk, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits.
- Furthermore, the computer readable medium may include computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allows a computer to read such computer readable information. The present invention, according to an embodiment, overcomes problems with the prior art by providing a more efficient mechanism for memory copy operations. The present invention allows the processor to continue executing subsequent instructions during a memory copy operation thereby avoiding unnecessary processor downtime.
- Exemplary Stream Processing System
- According to an embodiment of the present invention, as shown in
FIG. 1 , an exemplarystream processing system 100 is shown. Thestream processing system 100, in one embodiment determines optimal local patterns that describe the main trends in a time series data stream. Thestream processing system 100, in one embodiment, examines the time series at multiple scales (i.e. window sizes) and efficiently discovers the key patterns in each. Thestream processing system 100 also selects a window size out of a set of window sizes that most concisely captures the key oscillatory as well as aperiodic trends. In one embodiment, thestream processing system 100 is a distributed system that can operate in an SMP computing environment. It should be noted that the present invention is not limited to a distributed processing system and it is within the true scope and spirit of the invention to be implemented on other processing architectures. - The
stream processing system 100, in one embodiment, includes data streams 140, 142, 144, which in one embodiment, are time series data streams comprising one or more sets of time series data. Thestream processing system 100 can execute on a plurality ofprocessing nodes network adapters processing node operating system image channel controller memory system memory bus output bus O adapters network adapter processor processing node processing node various processing nodes - Exemplary Information Processing System
-
FIG. 2 is a block diagram illustrating a more detailed view of theinformation processing system 102, according to the present invention. Although the following discussion is with respect to theprocessing node 102, it is likewise applicable to theother processing nodes 104 inFIG. 1 . Theinformation processing system 102 is based upon a suitably configured processing system adapted to implement the exemplary embodiment of the present invention. Any suitably configured processing system is similarly able to be used as theinformation processing system 102 by embodiments of the present invention, for example, a personal computer, workstation, or the like. Theinformation processing system 102 includes acomputer 202. Thecomputer 202 has aprocessor 122 that is connected to themain memory 118 and thechannel controller 114 via the system bus 230. Thecomputer 202 also includes amass storage interface 204,terminal interface O adapter 134, andnetwork adapter hardware 106. Themass storage interface 204 is used to connect mass storage devices, such asdata storage device 208, to theinformation processing system 102 system. One specific type of data storage device is a computer readable medium such as a CD drive, which may be used to store data to and read data from a CD orDVD 210 or floppy diskette CD (not shown). Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations. -
main memory 118 comprises adata stream analyzer 212 for determining optimal local patterns which describe the main trends in a timeseries data stream data stream analyzer 212, in one embodiment, includes anapproximation function estimator 214. Theapproximation function estimator 214, in one embodiment, determines appropriate approximating functions directly from data in a data stream. This process is discussed in greater detail below. - In one embodiment, the
data stream analyzer 212 implements a hierarchical, recursive summarization or approximation of adata stream local pattern identifier 216, as well as the key windows via awindow size comparator 218 and awindow size selector 220. Thelocal pattern identifier 216, in one embodiment, identifies locally optimal patterns within each window and thewindow size comparator 218, in one embodiment, comparator the information captured by each of these patterns across various windows sizes. - The
window size selector 220 then selects the optimal window sizes that most concisely capture the key oscillatory as well as periodic trends based on the window size comparison. The selection of the appropriate window sizes and approximating function can be performed across levels (i.e., not only within levels). The processes performed by thedata stream analyzer 212 and each of its components are discussed in greater detail below. Besides providing insight about the behavior of the time series by concisely describing the main trends in a time series, the discovered patterns can also be used to facilitate further data processing. Thedata stream analyzer 212 can also perform fast, incremental estimation in a streaming setting. - Although illustrated as concurrently resident in the
main memory 118, it is clear that respective components of themain memory 118 are not required to be completely resident in themain memory 118 at all times or even at the same time. In one embodiment, theinformation processing system 102 utilizes conventional virtual addressing mechanisms to allow programs to behave as if they have access to a large, single storage entity, referred to herein as a computer system memory, instead of access to multiple, smaller storage entities such as themain memory 118 anddata storage device 208. Note that the term “computer system memory” is used herein to generically refer to the entire virtual memory of theinformation processing system 102. - Although only one
CPU 122 is illustrated forcomputer 202, computer systems with multiple CPUs can be used equally effectively. Embodiments of the present invention further incorporate interfaces that each includes separate, fully programmed microprocessors that are used to off-load processing from theCPU 122.Terminal interface 206 is used to directly connect one ormore terminals 224 tocomputer 202 to provide a user interface to thecomputer 202. Theseterminals 224, which are able to be non-intelligent or fully programmable workstations, are used to allow system administrators and users to communicate with theinformation processing system 102. The terminal 224 is also able to consist of user interface and peripheral devices that are connected tocomputer 202 and controlled by terminal interface hardware included in the terminal I/F 206 that includes video adapters and interfaces for keyboards, pointing devices, and the like. - An operating system (not shown) included in the main memory is a suitable multitasking operating system such as the Linux, UNIX, Windows XP, and Windows Server 2001 operating system. Embodiments of the present invention are able to use any other suitable operating system. Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the
processing node 102. Thenetwork adapter hardware 206 is used to provide an interface to anetwork 226. Embodiments of the present invention are able to be adapted to work with any data communications connections including present day analog and/or digital techniques or via a future networking mechanism. - Although the exemplary embodiments of the present invention are described in the context of a fully functional computer system, those skilled in the art will appreciate that embodiments are capable of being distributed as a program product via floppy disk,
e.g. CD 210 and its equivalents, floppy disk (not shown), or other form of recordable media, or via any type of electronic transmission mechanism. - Exemplary Notation To Be Used Throughout The Following Discussion
- Throughout the following discussion boldface lowercase letters are used for column vectors, v≡[v1v2 . . . vn ]T ε Rn, and boldface capital letters for matrices, A ε Rm×n. The notation a(i) is adopted for the columns A≡[a1 a2 . . . a3] and a(i) is adopted for the rows of A≡[a(1) a(2) . . . am]T. Note that a(i) are also column vectors, not row vectors. Throughout this discussion, vectors are represented as column vectors. For matrix/vector elements, subscripts, a(i,j), or brackets, a[i,j] are used. The rows of A are points in an (at most) n-dimensional space, a(i) ε Rn which is the row space of A. A “special” orthonormal basis for the row space can be found that defines a new coordinate system as shown in
FIG. 3 . For example,FIG. 3 illustrates singular value decomposition (“SVD”) (for dimension n=2), with respect to row space. Each point inFIG. 3 corresponds to a row of the matrix A and vj, j=1, 2 are the left singular vectors of A. The square is the one dimensional approximation of a(i) by projecting it-onto the first singular vector. - If vj is a unit-length vector defining one of the axes in the row space, then for each row a(i), it's j-th coordinate in the new axes is the dot product a(i) T vj=:pij so that, if V:=[v1 . . . vr] and we define P :=AV, then each row of P is the same point as the corresponding row of A but with respect to the new coordinate system. Therefore, lengths and distances are preserved, i.e. ∥a(i)∥=∥p(i)∥ and ∥a(i)−a(j)∥=∥p(i)−p(j)∥, for all 1≦i,j≦m.
- However, the new coordinate system is “special” in the following sense. If only the first k columns of P (e.g. a matrix named {tilde over (P)} thus effectively projecting each point into a space with lower dimension k. Also, rows of the matrix Ã:={tilde over (P)}{tilde over (V)}T(EQ 1) are the same points translated back into the original coordinate system of the row space, e.g. the square 302 in
FIG. 3 if k=1), where {tilde over (V)} comprises the first k columns of V. Then {tilde over (P)}. maximizes the sum of squares ∥{tilde over (P)}∥F 2=Σi,j=1 m,kpij −2 or, equivalently, minimizes the sum of squared residual distances (e.g. the thick dotted 304 line inFIG. 3 , if k=1), ∥X−{tilde over (X)}∥F 2=Σi=1 m∥x(i)−{tilde over (x)}(i)∥2. - Therefore, from the point of view of the row space A=P VT. The same can be done for the column space of A and get A=UQT, where U is also column-orthonormal, like V. It turns out that U and V have a special significance, which is formally stated as follows:
- (Singular Value Decomposition)
- Every matrix A ε Rm×n can be decomposed into A=U Σ VT where U ε Rm×r, V ε Rn×r, and Σ ε Rr×r, with r≦min(m, n) the rank of A. The columns vi of V≡[v1 . . . vr]are the right singular vectors of A and they form an orthonormal basis its row space. Similarly, the columns ui of U≡[u1 . . . ur] are the left singular vectors and form a basis of the column space of A. Finally, Σ≡diag[σ1 . . . σr] is a diagonal matrix with positive values σi, called the singular values of A.
- From the above, the matrix of projections P is P=U Σ. Next, the properties of a low-dimensional approximation can be formally stated using the first k singular values (and corresponding singular vectors) of A:
- (Low-Rank Approximation)
- If only the singular vectors corresponding to the k highest singular values (k<r), i.e. if Ũ:=[u1 u2 . . . uk], {tilde over (V)}:=[v1 v2 . . . vk] and {tilde over (Σ)}=diag[σ1σ2 . . . σk] are kept, then Ã=Ũ{tilde over (Σ)}{tilde over (V)}T is the best approximation of A, in the sense that it minimizes the error ∥A−Ã∥F 2:=Σi,j=1 m,n|aij−ãij|2=Σi=k+1 rσi 2 (EQ 2). In equation (1), note the special significance of the singular values for representing the approximation's squared error. Furthermore, since U and V are orthonormal, Σi=1 rσi 2=∥A∥F 2 and Σi=1 kσi 2=∥Ã∥F 2 (EQ 3). The following table, Table 1, references common notations that are used throughout this discussion.
-
TABLE 1 Frequently used notation. SYMBOL DESCRIPTION y Vector A ∈ Rn (lowercase bold), always column vectors. A Matrix A ∈ Rm×n (uppercase bold). aj j-th column of matrix A a(i) i-th row of matrix A, as a column vector. ||y|| Euclidean norm of vector y. ||A||F Frobenius norm of matrix A, ||A||F 2 = Σi,j=1 m,nai,j 2. X Time series matrix, with each row corresponding to a timestamp. X(w) The delay coordinates matrix corresponding to X, for window w. V(w) Right singular vectors of X(w). Σ(w) Singular values of X(w). U(w) Left singular vectors of X(w). k Dimension of approximating subspace. {tilde over (V)}(w) Same as V(w), Σ(w), U(w) but only with k highest {tilde over (Σ)}(w) singular values and vectors. Ũ(w) {tilde over (P)}(w) Projection of X(w) onto the first k right singular vectors, {tilde over (P)}(w) := Ũ(w) {tilde over (Σ)}(w) = X(w) {tilde over (V)}(w). {tilde over (V)}(w 0 ,l)Hierarchical singular vectors, values and {tilde over (Σ)}(w 0 ,l)projections, for the k highest singular values. Ũ(w 0 ,l){tilde over (V)} 0(w 0 ,l)Hierarchically estimated patterns (bases). - Preliminaries
- (Fixed-Window Optimal Patterns)
- As discussed above, for a given a time series xt, t=1,2, . . . and a window size w, the
data stream analyzer 212 find the patterns that best summarize the series at this window size. The patterns, in one embodiment, are w-dimensional vectors vi≡[vi,1, . . . vi,w]T ε Rw chosen so that they capture “most” of the information in the series. The process of choosing the patterns is discussed in greater detail below. However, in one embodiment, the right window size is not known a priori. Therefore, with respect to multi-scale pattern estimation thedata analyzer 212 finds (i) the optimal patterns for each of these, and (ii) the best window w* to describe the key patterns in the series given a time series x, and a set of windows W :={w1, w2, w3, . . . }. - To find these patterns the concept of time-delay coordinates is introduced. For a time series xt, t=1,2, . . . with m points seen so far, when looking for patterns of length w, the series are divided into consecutive, non-overlapping subsequences of length w. Thus, if the original series is a m×1 matrix (not necessarily materialized), it is substituted
-
- Instead of m scalar values there is a sequence of m/w vectors with dimension w. Patterns are searched for among these time-delay vectors.
- (Delay Coordinates)
- Given a sequence x≡[x1, x2, . . . , xt, . . . , xm]Tand a delay (or window)w, the delay coordinates are a┌m/w┐x w matrix with the t′-th row equal to X(t′) (w):=[x(t′−1)w+1, x(t′−1)w+2, . . . , xt′w]T. It should be noted that neither x nor X(w) need to be fully materialized at any point in time. In one embodiment, the last row of X(w) is stored. Note that non-overlapping windows are chosen. However, overlapping windows can also be chosen, X(w) has m−w+1 rows, with row t comprising values xt, xt+1, . . . xt+w. In this case, there are some subtle differences as is further described in (M. Ghil, M. Allen, M. Dettinger, K. Ide, D. Kondrashov, M. Mann, A. Robertson, A. Saunders, Y. Tian, F. Varadi, and P. Yiou. Advanced spectral methods for climatic time series. Rev. Geophys., 40(1), 2002.), which is hereby incorporated by reference in its entirety. The subtle differences are akin to the differences between “standard” wavelets and maximum-overlap or redundant wavelets, which is further described in (D. B. Percival and A. T. Walden. Wavelet Methods for Time Series Analysis. Cambridge Univ. Press, 2000.) and is hereby incorporated by reference in its entirety.
- However, in one embodiment, non-overlapping windows are equally effective for pattern discovery and also lend themselves better to incremental, streaming estimation using limited resources. More generally, the original time series does not have to be scalar, but can also be vector-valued itself. The same process is performed, but only each row of X(w) is now a concatenation of rows of X(instead of a concatenation of scalar values). More precisely, the general time-delay coordinate matrix is constructed as follows:
- The following is pseudo code for DELAY (X εRm×n, w), which concatenates w consecutive rows of X (comprising of n numbers) into one row of X(w) (comprising of n×w numbers).
-
m′ ← └m/w┘and n′ ← nw Output is X(w) ∈Rm′×n′ {not necessarily materialized} for t = 1 to m′ do Row X(t) (w) ← concatenation of rows X((t−1)w+1), X((t−1)w+2),...X(t,w) end for - (Incremental SVD)
- Batch SVD algorithms are usually very costly. For an m x n matrix A, even finding only the highest singular value and corresponding singular vector needs time O(n2m), where n<m. Aside from computational cost, the SVD updated is incrementally updated as new rows are added to A. SVD update algorithms such as those described in (M. Brand. Fast online SVD revisions for lightweight recommender systems. In SDM, 2003 and S. Guha, D. Gunopulos, and N. Koudas. Correlating synchronous and asynchronous data streams. In KDD, 2003.), which are hereby incorporated by reference in their entirety can support both row additions as well as deletions. However, besides the right singular vectors vi, both of these approaches need to store the left singular vectors ui (whose size is proportional to the time series length).
- An exemplary SVD update algorithm is shown below.
-
- The above SVD update algorithm is only used as an example and in no way limits the present invention. The above algorithm does not need to store the left singular vectors. Because the
data stream analyzer 212 finds patterns at multiple scales without an upper bound on the window size, an algorithm that does not need to store the left singular vectors is a suitable choice. However, the SVD update algorithm is not limited to an algorithm that does not need to store the left singular vectors. Furthermore, if more emphasis is needed to be placed on recent trends, an exponential forgetting scheme can be incorporated. For each new row, the algorithm updates k-n numbers. Therefore, in one embodiment, the total space requirements are O(nk) and the time per update is also O(nk). Finally, the incremental update algorithms, in one embodiment, need only the observed values and can therefore easily handle missing values by imputing them based on current estimates of the singular vectors. - Identifying Locally Optimal Patterns
-
FIG. 4 illustrates local patterns for a fixed window. InFIG. 4 , the window length is w=4. It should be noted that the length w=4 is only for illustrative purposes. Starting with the original time series x inFIG. 4 , X(w) is transferred to time-delay coordinates. The local patterns, in one embodiment, are the right singular vectors of X(w), which are optimal in the sense that they minimize the total squared approximation error of the rows X(t) (w). An exemplar algorithm for identifying local patterns is given below. - Local Pattern (x ε Rm, w, k=3)
-
LOCAL PATTERN ( x ∈ Rm, w, k = 3) Use delay coord.X(w) ← DELAY( x, w ) Compute SVD of X(w) = U(w) Σ(w) V(w) Local patterns are v1 (w),...,vk (w) Power is π(w) ← Σi=k+1 wσi 2 / w(Σt=1 mxt 2 − Σi=1 kσi 2)/ w {tilde over (P)}(w) ← Ũ(w) {tilde over (Σ)}(w){low − dim.proj.onto local patterns} return {tilde over (V)}(w), {tilde over (P)}(w), {tilde over (Σ)}(w) and π(w) - The projections {tilde over (P)}(w) onto the local patterns {tilde over (v)}(i) are discussed in greater detail below. Note that the above algorithm for identifying local patterns can be applied in general to n-dimensional vector-valued series. The pseudocode is the same, since the DELAY algorithm discussed above can also operate on matrices X ε Rm×n. In one embodiment, the first argument of LocalPattern may be a matrix, with one row x(t) ε Rn per timestamp t=1,2, . . . , m.
- In one embodiment, when computing the SVD, the highest k singular values and the corresponding singular vectors can be used because only {tilde over (V)}(w) and {tilde over (P)}(w) are returned. Therefore, the process of computing the full SVD can be avoided and more efficient algorithms, just for the quantities that are actually needed can be used. Also, note that {tilde over (Σ)}(w) can be computed from {tilde over (P)}(w), since by construction σi 2=∥pi∥2=Σj=1 mpji 2 (EQ 4). However, these are returned separately, which avoids duplicate computation. Furthermore, equation (3) does not hold exactly for the estimates returned by IncrementalSVD and, in one embodiment, it is better to use the estimates of the singular values σi 2 computed as part of IncrementalSVD.
- In one embodiment, a default value of k=3 local patterns, although in another embodiment, an energy-based criteria to choose k can be chosen. It should be noted that the present invention is not limited to k=3, as k can be greater than 3 or less than 3. In one
embodiment 3 or fewer patterns are sufficient because the first pattern captures the average trend (aperiodic if present) and the next two capture the main low-frequency and high frequency periodic trends. These trends are further described in (M. Ghil, M. Allen, M. Dettinger, K. Ide, D. Kondrashov, M. Mann, A. Robertson, A. Saunders, Y. Tian, F. Varadi, and P. Yiou. Advanced spectral methods for climatic time series. Rev. Geophys., 40(1), 2002.), which is hereby incorporated by reference in its entirety. For a single window w, batch algorithms for computing the k highest singular values of a m×n matrix (n<m) are 0(kmn2). Therefore, for window size w the time complexity is -
- Once optimal patterns for a number of different windows sizes have been determined, the
data stream analyzer 212 determines which of these windows best describes the main trends. Intuitively, if there is a trend that repeats with a period of T, then different subsequences in the time-delay coordinate space should be highly correlated when w≈T. This is illustrated inFIG. 5 , which shows apower profile 500 of sine wave xt=sin(2πt/50)+εt, with Gaussian noise εt˜N(5,0.5). It should be noted the trends can be arbitrary andFIG. 5 shows a sine wave only as an example. Theplot 502 inFIG. 5 shows the squared approximation error per window element, using k=1 pattern on a sine wave with period T=50. As expected, for window size w=T=50 the approximation error drops sharply and essentially corresponds to the Gaussian noise floor. For windows w=iT that are multiples of T, the error also drops. Also the error for all windows is proportional to 1/w, since it is per window element. Eventually, for window size equal to the length of the entire time series w=m(not shown inFIG. 5 , where m=2000), π(m)=0 since first pattern is the only singular vector, which coincides with the series itself. Therefore, the residual error is zero. - Formally, the squared approximation error of the time-delay matrix X(w) is ε(w):=Σt∥{tilde over (x)}(t) (w)−x(t) (w)∥2=∥{tilde over (X)}(w)−X(w)∥F 2, where {tilde over (X)}(w):={tilde over (P)}(w)({tilde over (V)}(w))T is the reconstruction, (see Equation 1). From
Equations -
- In other words, this is an estimate of the variance per dimension, assuming that the discarded dimensions correspond to isotropic Gaussian noise (i.e., uncorrelated with same variance in each dimension). The variance is lower when w=T, where T is the period of an arbitrary main trend.
- The following lemma follows from the above observations. Note that the conclusion is valid both ways, i.e., perfect copies imply zero power and vice versa. Also, the conclusion holds regardless of alignment (i.e., the periodic part does not have to start at the beginning of a windowed subsequence). A change in alignment only affects the phase of the discovered local patterns, but not their shape or the reconstruction accuracy.
- (Zero Power)
- If x ε R′ comprises of exact copies of a subsequence of length T then, for every number of patterns k=1, 2, . . . and at each multiple of T, π(iT)=0, i=1,2, . . . , and vice versa. In general, if the trend does not comprise of exact copies, the power is not zero, but it still exhibits a sharp drop. This fact is used when choosing the “best” window.
- Choosing The Window
- The following are exemplary steps for interpreting the power profile to choose the appropriate window that best captures the main trends, according to one embodiment. The power profile π(w) versus w is determined. The first window w0* that exhibits a sharp drop π(w0*) is identified and all other drops occurring at windows w≈iw0*, i=2,3, . . . that are approximately multiples of w0* are ignored. If there are several sharp drops at windows wi* that are not multiples of each other, then any of these is suitable. In one embodiment, the smallest one is chosen. In another embodiment, the window wi* is chosen based on prior knowledge about the domain if available. If no sharp drops exist, then no strong periodic/cyclic components are present. However, the local patterns at any window can still be examined to gain a picture of the time series behavior.
- Multiple Scale Patterns
- As discussed above, the
data stream analyzer 212 determines the optimal local patterns for multiple windows (as well as the associated power profiles) in order to determine the optimal window size. The following discussion further discusses this process in more detail. In one embodiment, a geometric progression of window sizes is chosen. Rather than estimating the patterns for windows of length w0, w0+1, w0+2, w0+3, . . . , the patterns, in one embodiment, are estimated for windows of w0, 2w0, 4w0, . . . , or more generally, for windows of length wt:=w0·Wl for l=0,1,2, . . . . Thus, the size of the window set W needed to be examined is dramatically reduced. However, this is still computationally expensive (for each window O(ktw)time is still needed) and all points (needed for large window sizes, close to the time series length) are required to be buffered. However, this complexity can be reduced even further. - For example,
FIG. 6 shows a multi-scale pattern discovery (hierarchical, where w0=4, W=2, k=2).FIG. 7 shows a more simplified version of the multi-scale pattern discovery ofFIG. 6 . As an example, let k=2 local patterns for a window size of w0=100 and the patterns desired to be determined for window w(100.1)=100·21=200. The naïve approach is to construct X(200) from scratch and determine the SVD. However, the patterns found from X(100) can be reused. Using k=2 patterns v1 (100) and v2 (100), the first w0=100 points x1, x2, . . . , x(100) can be reduced into just two points, namely their projections p1,1 (100) and p1,2 (100) onto v1 (100) and v2 (100), respectively. Similarly, the next w0=100 points x101, x102, . . . , x200 can also be reduced into two numbers p2,1 (100) and p2,2 (100), and so on. These projections, by construction, approximate the original series well. Therefore, the first row x(1) (200)≡[x1, . . . , x200]T ε R200 of X(200) can be represented with just four numbers, x(1) (100,1)≡[p1,1 (100), p1,2 (100), p2,1 (100), p2,2 (100)]T ε R4. Doing the same for the other rows of X(200), a matrix X(100,1) with just n=4 columns can be constructed. The local patterns are determined using X(100,1) instead of X(200). Repeating this process recursively allows for the local patterns for window w(100,2)=100·22=400 and so on to be determined. - As stated above,
FIG. 7 shows a more simplified version of the process discussed above with respect toFIG. 6 .FIG. 7 shows adata stream 700 that comprises a set of times series data. The time series is broken into a given number ofhierarchical levels local approximations 708 are determined directly from the data in thefirst data stream 700. From thelocal approximations 710, approximating functions and awindow size 710 and anapproximation error 712 can be determined. The nexthierarchical level 704, in one embodiment, can be passed the local approximation results such as the approximation functions andwindow size 710 from theprevious level 702 for determining the local approximations for the nexttime series subset 714. - The local approximations 716 for the
time series subset 714, in one embodiment, results in approximation functions and a window size(s) 718 and anapproximation error 720. The approximation functions and window size(s) 718, in one embodiment, can be passed to thenext level 706 for determining a local approximation(s) 724 for the nexttime series subset 722. This process can be continued for each time series subset. - (Level-(w0,l)Window)
- The level-(w0,l) window, in one embodiment, corresponds to an original window size (or scale) wl:=w0·Wl. Patterns at each level l are found recursively, using patterns from the previous
level l− 1. In the above example, w0=100 and l=0,1. Since w0 and Ware fixed for a particular sequence of scales w, only level-l windows and patterns, in one embodiment, need to be referred to. The recursive construction is based on the level-l delay matrix and corresponding patterns. - (Level-/Delay Matrix X(w
0 ,l)) - Given a starting window w0 and a scale factor W, the level-l delay matrix is simply X(w
0 ,0):=X(w0 ) for l=0 and for l=1,2, . . . is recursively defined by X(w0 l):=DELAY({tilde over (P)}(wo ,l−10,W) where P(wo ,l):=X(w0 ,l){tilde over (V)}(w0 ,l) is the projection onto the level-l patterns {tilde over (V)}(w0 ,l) which are found based on X(w0 ,l). The level-l delay matrix is an approximation of the delay matrix X(wl) for window size wl=w0Wl. In the above example, the patterns extracted from X(100,1) are four-dimensional vectors vi (100,1) ε R4, whereas the patterns for X(200) are 200-dimensional vectors vi (200) ε R(200). However, v(100,1) and vi (100,0)≡vi (100) can be combined to estimate v(200). - (Level-l Local Pattern)
V 0i (w0 ,l) - The level-
l pattern V 0i (w0 ,l), for all l=1,2, . . . , k, corresponding to a window of wl=w0Wl is simply v 0i (w0 ,0):=vi (w0) for l=0 and for l=1, 2, . . . The level-l, in one embodiment, is defined recursively by -
- (EQ 6) for j=1, 2, . . . , W. It is an approximation of the local patterns vi (w
l ) of the original delay matrix X(wl), for window size wl=w0W1. Considerv 01 (100,1) the above example. The first k=2 out of kW=4 numbers in v1 (100,1) approximate the patterns among the 2-dimensional vectors pj (100,0), which in turn capture patterns among the 100-dimensional vectors xi (100,0) of the original time-delay matrix. Thus, but forming the appropriate linear combination of the 100-dimensional patterns vi (100,0)≡v 0i (100,0) (i.e., the columns of {tilde over (V)}(100,0)≡V 0(100,0), weighted according to v1 (100,1)[1:2], the first half of the 200-dimensional pattern v 01 (100,1)[1:100]can be constructed (left-slanted entries inFIG. 6 ). - Similarly, a linear combination of the columns {tilde over (V)}(100,0)≡
V 0(100,0) weighted according to v1 (100,1)[3:4]gives the second half of the 200-dimensional pattern v 01 (200,1)[101: 200](right slanted entries inFIG. 6 ). For level l=2 the columns ofV 0(100,1) are combined according to v1 (100,2)[1:2] (for the first half, v 01 (100,2)[1:200]) and v1 (100,2)[3:4] (for the second half, v 01 (100,2)[201:400]) and son on, for the higher levels. - Lemma 1 (Orthonormality of v 0i (w
0 ,l)) - For
v 0i (wo ,l)=1 and, for i≠j, (v 0i (w0 ,l))T (v 0j (w0 ,l))=0, where i,j=1,2, . . . , k. PROOF, for level l=0 they are orthonormal since they coincide with the original patterns vi (w0 ) which are by construction orthonormal. The process proceeds by induction on the level of l≧1. Without loss of generality, assume that k=2 and, for brevity, letB≡V 0(w0 ,l−1) and bi,1≡vi (w0 ,l)[1:k] so that bi,2≡vi (w0 ,l)[k+1:k], vi (w0 ,l)=[bi,1, b1,2]Then -
- since B preserves dot products as an orthonormal matrix (by inductive hypothesis) and vi (w
0 ,l) are orthonormal by construction. In one embodiment, the maximum level L is determined based on the length m of the time series so far, L≠ logW(m/w0). - The detailed hierarchical SVD algorithm is shown below:
-
(HIERARCHICAL ( x ∈ Rm, w0, W, L, k = 6)) {Start with level l = 0, corresponding to window w0} {tilde over (V)}(w 0 ,0), {tilde over (P)}(w0 ,0), {tilde over (Σ)}(w0 ,0), π(w0 ,0) ←LOCALPATTERN ( x, w0, k) {Levels l, corresponding to wondow wl = w0 · W1 } for level l = 1 to L do {tilde over (V)}(w 0 ,l), {tilde over (P)}(w0 ,l), {tilde over (Σ)}(w0 ,l), π(w0 ,1) ←LOCALPATTERN({tilde over (P)}(w 0 ,l−1), W, k)Compute patterns v0i (w 0 ,l) for window size wl are basedon Equation (6) end for - Choosing the Initial Window
- The initial window w0 has some impact on the quality of the approximations. This also depends on the relationship of k to w0 (the larger k is, the better the approximation and if k=w0 then {tilde over (P)}(w
0 ,1)=X(w0 ) i.e., no information is discarded at the first level). However, in one embodiment k is desired to be relatively small since it determines the buffering requirements of the streaming approach. Hence, in one embodiment, k=6. However, the present invention is not limited to k=6 and, in one embodiment, an energy-based thresholding, which can be done incrementally, can be chosen. - If w0 is too small, then too much of the variance is discarded too early. If w0 is unnecessarily big, this increases buffering requirements and the benefits of the hierarchical approach diminish. In one embodiment, a good compromise is a value in the range 10≦w0≦20. Finally, out of the six patterns that are kept per level, the first two or three are of interest and reported to the user. The remaining are kept to ensure that X(w
0 ,l) is a good approximation of X(wi ). - Choosing the Scales
- As discussed above, if there is a sharp drop of π(T) at window w=T, then drops at multiples w=iT, i=2,3, . . . are also observed. Therefore, in one embodiment, a few different starting windows w0 and scale factors W that are relatively prime to each other are chosen. In one example, the following three choices are sufficient to quickly zero in on the best windows and the associated optimal local patterns: k=6 and (W0, W)ε{(9,2), (10,2), (15,3)}.
- Complexity
- For a total of L≠ logW(t/w0)=O(log t)the first k singular values and vectors of X(w
0 ,l) ε Rt/(w0 Wl )x{dot over (W)}k are determined, for l=1,2, . . . . A batch SVD algorithm requires time -
- which is
-
- since k<w0. Summing over l=1, . . . , L, O(W2k2t) is obtained Finally, for l=0,
-
- is needed. Thus, the total complexity is O(W2k2t+kw0t). Since Wand w0 are fixed, the following is true:
- Lemma 2 (Batch Hierarchical Complexity)
- The total time for the hierarchical approach is O(k2t), i.e., linear with respect to the time series length. Even though this is an improvement over the O(t3k)time of the non-hierarchical approach, all of the points, in one embodiment, are buffered, as is discussed below.
- Streaming Computation
- As stated above, the
data stream analyzer 212 performs a hierarchical, recursive summarization or approximation of the stream that examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each, as well as the key windows. In this section, the procedure for examining the time series at multiple scales is discussed. In one embodiment, only one iteration of each loop in IncrementalSVD (for LocalPattern) and in Hierarchical is recursively invoked, as soon as the necessary number of points has arrived. Subsequently, these points are discarded and proceed with the next non-overlapping window. - Modifying Local Pattern
- In one embodiment, consecutive points of x(or, in general, rows of X) are buffered until w of them are accumulated thereby forming one row of X(w)). At that point, one iteration of the outer loop in IncrementalSVD is performed to update all k local patterns. Then, the w points (or rows) are discarded and proceed with the next w. Also, since on higher levels the number of points for SVD may be small and close to k, the first k rows of X(w) can be chosen to be initially buffered and used to bootstrap the SVD estimates, which are subsequently updated.
- Modifying Hierarchical
- For level l=0 the modified LocalPattern is used on the original series, as above. However, the k projections are stored onto the level-0 patterns. W consecutive sets of these projections are buffered and as soon as kW values accumulate, the k local patterns for level l=1 are updated. Then the kW projections from level-0 are discarded, but the k level-1 projections are kept. The same process is performed for all other levels l=2.
- Complexity
- Compared to the batch computation,
-
- time is needed to compute the first k singular values and vectors of X(w
0 ,l) for l=1,2, . . . . For l=0 -
- time is needed. Summing over l=0,1, . . . , LO(kt)is obtained. With respect to space, w0 points are buffered for l=0 and Wk points for each of the remaining L=O(log t) levels, for a total of O(k log t). Therefore, the following is true:
- Lemma 3 (Streaming, Hierarchical Complexity)
- Amortized cost is O(k) per incoming point and total space is O(k log t). Since k=6, the update time is constant per incoming point and the space requirements grow logarithmically with respect to the size t of the series. Table 2 below summarizes the time and space complexity for each approach.
-
Time Space Non-hier. Hier. Non-hier. Hier Batch O(t3k) O(tk2) all all Incremental O(t2k) O(tk) O(t) O(k log t) - Exemplary Process For Identifying Local Patterns In At Least One Time Series Data Stream
-
FIG. 8 shows an exemplary process of identifying local patterns in one or more time series data streams. The operational flow diagram ofFIG. 8 begins atstep 802 and flows directly to step 804. Theinformation processing system 102, atstep 804, receives a data stream comprising a set of time series data. Thedata stream analyzer 212, atstep 806, breaks the set of time series data into a given number of hierarchical levels l. Thedata stream analyzer 212, atstep 808 creates a set of nested summaries for each of the hierarchical level l. - The creation of the nested summaries comprises the following. The
data stream analyzer 212, atstep 810, determines the approximation function(s) for a portion of the set of time series data directly from the time series data. Thedata stream analyzer 212, atstep 812, determines the approximation error between the current window and the set of time series data portion. Thedata stream analyzer 212, atstep 814, passes the approximation function(s) determined atstep 810 to the next hierarchical level as a set of time series data. The control flow then exits atstep 814. - Non-Limiting Examples
- The present invention can be realized in hardware, software, or a combination of hardware and software. A system according to a preferred embodiment of the present invention can be realized in a centralized fashion in one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system—or other apparatus adapted for carrying out the methods described herein—is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
- In general, the routines executed to implement the embodiments of the present invention, whether implemented as part of an operating system or a specific application, component, program, module, object or sequence of instructions may be referred to herein as a “program.” The computer program typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
- Although specific embodiments of the invention have been disclosed, those having ordinary skill in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiments, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.
Claims (20)
1. A method for identifying local patterns in at least one time series data stream, the method comprising:
generating multiple ordered levels of hierarchal approximation functions directly from at least one given time series data stream including at least one set of time series data, wherein the hierarchical approximation functions for each level of the multiple levels is based upon
creating a set of approximating functions; and
selecting a current window with a current window length from a set of varying window lengths, wherein the current window is selected for a current level of the multiple levels.
2. The method of claim 1 , wherein the generating multiple levels of hierarchical approximation functions includes generating multiple increasing consecutive numerically ordered levels.
3. The method of claim 1 , wherein the current window is a portion of the set of time series data divided into consecutive sub-sequences, and wherein the current window length along with the hierarchal approximating functions reduces an approximation error between the current window and the set of time series data portion.
4. The method of claim 1 , wherein the current window length for the current window is larger than a current window length for a previous window in the multiple levels of hierarchal approximation functions.
5. The method of claim 1 , wherein the time series data stream is divided into non-overlapping consecutive subsequences.
6. The method of claim 1 , wherein the hierarchical approximation functions for each of level of the multiple levels is based upon further creating a set of coefficients that summarize the set of time series data portion in the current window.
7. The method of claim 1 , wherein the hierarchical approximation functions for each level of the multiple levels is based upon reusing the set of approximating functions from the previous window.
8. The method of claim 1 , wherein the hierarchical approximation functions for each of level of the multiple levels is based reducing a total squared reconstruction error, via principal component analysis.
9. The method of claim 8 , wherein total squared reconstruction error is given by equation
Σi=k+1 wσi 2 /w=(Σt=1 m x t 2−Σi=1 kσi 2)/w
Σi=k+1 wσi 2 /w=(Σt=1 m x t 2−Σi=1 kσi 2)/w
wherein k is an number of patterns chosen {1 . . . k}, wherein the patterns are vectors equal to a window length w, where m is a multiple of the window size and wherein xt is the set of time series data portion, and wherein σ is a singular value for a given index i.
10. The method of claim 9 , further comprising using a subspace tracking algorithm to approximate incrementally at least one the following quantities
the window length w w,
singular values of a matrix σ,
local patterns for window length w,
local approximations for each window of length w, and k.
11. The method of claim 3 , further comprising:
calculating the approximation error between the current window and the set of time series data portion; and
basing the current window length thereon.
12. A system for identifying local patterns in at least one time series data stream, the system comprising:
at least one information processing system, wherein the information processing system includes:
a memory; and
a data stream analyzer communicatively coupled to the memory, wherein the data stream analyzer:
generates multiple ordered levels of hierarchal approximation functions directly from at least one given time series data stream including at least one set of time series data, wherein the hierarchical approximation functions for each level of the multiple levels is based upon
creating a set of approximating functions; and
selecting a current window with a current window length from a set of varying window lengths, wherein the current window is selected for a current level of the multiple levels.
13. The system of claim 12 , wherein the generation of multiple levels of hierarchical approximation functions includes generating multiple increasing consecutive numerically ordered levels.
14. The system of claim 12 , wherein the current window is a portion of the set of time series data divided into consecutive sub-sequences, and wherein the current window length along with the hierarchal approximating functions reduces an approximation error between the current window and the set of time series data portion.
15. The system of claim 12 , wherein the hierarchical approximation functions for each level of the multiple levels is based upon at least one of:
further creating a set of coefficients that summarize the set of time series data
portion in the current window;
reusing the set of approximating functions from the previous window; and
reducing a total squared reconstruction error, via principal component analysis.
16. The method of claim 14 , further comprising:
calculating the approximation error between the current window and the set of time series data portion; and
basing the current window length thereon.
17. A computer readable medium for identifying local patterns in at least one time series data stream, the computer readable medium comprising instructions for:
generating multiple ordered levels of hierarchal approximation functions directly from at least one given time series data stream including at least one set of time series data, wherein the hierarchical approximation functions for each level of the multiple levels is based upon
creating a set of approximating functions; and
selecting a current window with a current window length from a set of varying window lengths, wherein the current window is selected for a current level of the multiple levels.
18. The computer readable medium of claim 17 , wherein the generating multiple levels of hierarchical approximation functions includes generating multiple increasing consecutive numerically ordered levels.
19. The computer readable medium of claim 17 , wherein the current window is a portion of the set of time series data divided into consecutive sub-sequences, and wherein the current window length along with the hierarchal approximating functions reduces an approximation error between the current window and the set of time series data portion.
20. The computer readable medium of claim 17 , wherein the hierarchical approximation functions for each of level of the multiple levels is based upon at least one of:
further creating a set of coefficients that summarize the set of time series data portion in the current window;
reusing the set of approximating functions from the previous window; and
reducing a total squared reconstruction error, via principal component analysis.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/471,002 US20070294247A1 (en) | 2006-06-20 | 2006-06-20 | Identifying optimal multi-scale patterns in time-series streams |
US12/551,033 US7945570B2 (en) | 2006-06-20 | 2009-08-31 | Identifying optimal multi-scale patterns in time-series streams |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/471,002 US20070294247A1 (en) | 2006-06-20 | 2006-06-20 | Identifying optimal multi-scale patterns in time-series streams |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/551,033 Continuation US7945570B2 (en) | 2006-06-20 | 2009-08-31 | Identifying optimal multi-scale patterns in time-series streams |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070294247A1 true US20070294247A1 (en) | 2007-12-20 |
Family
ID=38862725
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/471,002 Abandoned US20070294247A1 (en) | 2006-06-20 | 2006-06-20 | Identifying optimal multi-scale patterns in time-series streams |
US12/551,033 Expired - Fee Related US7945570B2 (en) | 2006-06-20 | 2009-08-31 | Identifying optimal multi-scale patterns in time-series streams |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/551,033 Expired - Fee Related US7945570B2 (en) | 2006-06-20 | 2009-08-31 | Identifying optimal multi-scale patterns in time-series streams |
Country Status (1)
Country | Link |
---|---|
US (2) | US20070294247A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080168375A1 (en) * | 2007-01-07 | 2008-07-10 | International Business Machines Corporation | Systems and methods for simultaneous summarization of data cube streams |
US7487184B1 (en) | 2008-05-09 | 2009-02-03 | International Business Machines Corporation | Method, system, and computer program product for improved round robin for time series data |
US20100121793A1 (en) * | 2007-02-21 | 2010-05-13 | Ryohei Fujimaki | Pattern generation method, pattern generation apparatus, and program |
WO2010138536A1 (en) * | 2009-05-27 | 2010-12-02 | Yin Zhang | Method and apparatus for spatio-temporal compressive sensing |
CN103744973A (en) * | 2014-01-11 | 2014-04-23 | 西安电子科技大学 | Video copy detection method based on multi-feature Hash |
US20150169654A1 (en) * | 2013-12-13 | 2015-06-18 | International Business Machines Corporation | Managing time series databases |
US20170195090A1 (en) * | 2016-01-04 | 2017-07-06 | Siemens Aktiengesellschaft | Entropy-based validation of sensor measurements |
WO2018036641A1 (en) * | 2016-08-26 | 2018-03-01 | Huawei Technologies Co., Ltd. | Device and method arranged for executing information processing on a data stream |
CN113484882A (en) * | 2021-06-24 | 2021-10-08 | 武汉大学 | GNSS sequence prediction method and system of multi-scale sliding window LSTM |
CN114911788A (en) * | 2022-07-15 | 2022-08-16 | 中国长江三峡集团有限公司 | Data interpolation method and device and storage medium |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006017944A1 (en) * | 2004-08-16 | 2006-02-23 | Abb Research Ltd | Method and system for bi-directional data conversion between iec 61970 and iec 61850 |
US8630409B2 (en) | 2011-04-05 | 2014-01-14 | International Business Machines Corporation | Two-party private estimation of dataset similarity |
US9244887B2 (en) * | 2012-07-13 | 2016-01-26 | Sas Institute Inc. | Computer-implemented systems and methods for efficient structuring of time series data |
US9087306B2 (en) | 2012-07-13 | 2015-07-21 | Sas Institute Inc. | Computer-implemented systems and methods for time series exploration |
US9147218B2 (en) | 2013-03-06 | 2015-09-29 | Sas Institute Inc. | Devices for forecasting ratios in hierarchies |
US9934259B2 (en) | 2013-08-15 | 2018-04-03 | Sas Institute Inc. | In-memory time series database and processing in a distributed environment |
US10169720B2 (en) | 2014-04-17 | 2019-01-01 | Sas Institute Inc. | Systems and methods for machine learning using classifying, clustering, and grouping time series data |
CN105095614A (en) | 2014-04-18 | 2015-11-25 | 国际商业机器公司 | Method and device for updating prediction model |
CN105224543A (en) | 2014-05-30 | 2016-01-06 | 国际商业机器公司 | For the treatment of seasonal effect in time series method and apparatus |
US9892370B2 (en) | 2014-06-12 | 2018-02-13 | Sas Institute Inc. | Systems and methods for resolving over multiple hierarchies |
US9208209B1 (en) | 2014-10-02 | 2015-12-08 | Sas Institute Inc. | Techniques for monitoring transformation techniques using control charts |
FR3027403B1 (en) * | 2014-10-17 | 2017-04-21 | Renault Sas | METHOD FOR DIAGNOSING FAILURES IN A STATIONARY BATTERY SET |
US9418339B1 (en) | 2015-01-26 | 2016-08-16 | Sas Institute, Inc. | Systems and methods for time series analysis techniques utilizing count data sets |
US10983682B2 (en) | 2015-08-27 | 2021-04-20 | Sas Institute Inc. | Interactive graphical user-interface for analyzing and manipulating time-series projections |
US10678886B2 (en) | 2016-01-01 | 2020-06-09 | Tata Consultancy Services Limited | Systems and methods for analyzing sensor data using incremental autoregression techniques |
US10229092B2 (en) | 2017-08-14 | 2019-03-12 | City University Of Hong Kong | Systems and methods for robust low-rank matrix approximation |
US10331490B2 (en) | 2017-11-16 | 2019-06-25 | Sas Institute Inc. | Scalable cloud-based time series analysis |
US10621141B2 (en) | 2018-01-31 | 2020-04-14 | Oracle International Corporation | Multivariate memory vectorization technique to facilitate intelligent caching in time-series databases |
US10338994B1 (en) | 2018-02-22 | 2019-07-02 | Sas Institute Inc. | Predicting and adjusting computer functionality to avoid failures |
US10255085B1 (en) | 2018-03-13 | 2019-04-09 | Sas Institute Inc. | Interactive graphical user interface with override guidance |
US10560313B2 (en) | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
US10685283B2 (en) | 2018-06-26 | 2020-06-16 | Sas Institute Inc. | Demand classification based pipeline system for time-series data forecasting |
US12130891B2 (en) * | 2020-11-05 | 2024-10-29 | Samsung Electronics Co., Ltd. | Method of live video event detection based on natural language queries, and an apparatus for the same |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6115708A (en) * | 1998-03-04 | 2000-09-05 | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering |
US20030229636A1 (en) * | 2002-06-06 | 2003-12-11 | Mattausch Hans Juergen | Pattern matching and pattern recognition system, associative memory apparatus, and pattern matching and pattern recognition processing method |
US20040103095A1 (en) * | 2002-11-06 | 2004-05-27 | Canon Kabushiki Kaisha | Hierarchical processing apparatus |
US20040254930A1 (en) * | 2003-04-16 | 2004-12-16 | Chiranjit Acharya | Construction and selection of a finite mixture model for use in clustering and vector quantization |
-
2006
- 2006-06-20 US US11/471,002 patent/US20070294247A1/en not_active Abandoned
-
2009
- 2009-08-31 US US12/551,033 patent/US7945570B2/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6115708A (en) * | 1998-03-04 | 2000-09-05 | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering |
US20030229636A1 (en) * | 2002-06-06 | 2003-12-11 | Mattausch Hans Juergen | Pattern matching and pattern recognition system, associative memory apparatus, and pattern matching and pattern recognition processing method |
US7203382B2 (en) * | 2002-06-06 | 2007-04-10 | President Of Hiroshima University | Pattern matching and pattern recognition system, associative memory apparatus, and pattern matching and pattern recognition processing method |
US20040103095A1 (en) * | 2002-11-06 | 2004-05-27 | Canon Kabushiki Kaisha | Hierarchical processing apparatus |
US20040254930A1 (en) * | 2003-04-16 | 2004-12-16 | Chiranjit Acharya | Construction and selection of a finite mixture model for use in clustering and vector quantization |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080168375A1 (en) * | 2007-01-07 | 2008-07-10 | International Business Machines Corporation | Systems and methods for simultaneous summarization of data cube streams |
US7505876B2 (en) * | 2007-01-07 | 2009-03-17 | International Business Machines Corporation | Systems and methods for simultaneous summarization of data cube streams |
US20100121793A1 (en) * | 2007-02-21 | 2010-05-13 | Ryohei Fujimaki | Pattern generation method, pattern generation apparatus, and program |
US8447705B2 (en) * | 2007-02-21 | 2013-05-21 | Nec Corporation | Pattern generation method, pattern generation apparatus, and program |
US7487184B1 (en) | 2008-05-09 | 2009-02-03 | International Business Machines Corporation | Method, system, and computer program product for improved round robin for time series data |
US8051039B2 (en) | 2008-05-09 | 2011-11-01 | International Business Machines Corporation | Method, system and computer program product for improved round robin for time series data |
WO2010138536A1 (en) * | 2009-05-27 | 2010-12-02 | Yin Zhang | Method and apparatus for spatio-temporal compressive sensing |
US20150169654A1 (en) * | 2013-12-13 | 2015-06-18 | International Business Machines Corporation | Managing time series databases |
US9361329B2 (en) * | 2013-12-13 | 2016-06-07 | International Business Machines Corporation | Managing time series databases |
US20160246829A1 (en) * | 2013-12-13 | 2016-08-25 | International Business Machines Corporation | Managing time series databases |
CN103744973A (en) * | 2014-01-11 | 2014-04-23 | 西安电子科技大学 | Video copy detection method based on multi-feature Hash |
US20170195090A1 (en) * | 2016-01-04 | 2017-07-06 | Siemens Aktiengesellschaft | Entropy-based validation of sensor measurements |
US11403478B2 (en) * | 2016-01-04 | 2022-08-02 | Siemens Aktiengesellschaft | Entropy-based validation of sensor measurements |
WO2018036641A1 (en) * | 2016-08-26 | 2018-03-01 | Huawei Technologies Co., Ltd. | Device and method arranged for executing information processing on a data stream |
CN113484882A (en) * | 2021-06-24 | 2021-10-08 | 武汉大学 | GNSS sequence prediction method and system of multi-scale sliding window LSTM |
CN114911788A (en) * | 2022-07-15 | 2022-08-16 | 中国长江三峡集团有限公司 | Data interpolation method and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US7945570B2 (en) | 2011-05-17 |
US20100063974A1 (en) | 2010-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7945570B2 (en) | Identifying optimal multi-scale patterns in time-series streams | |
Papadimitriou et al. | Optimal multi-scale patterns in time series streams | |
Chen et al. | A clustering algorithm for multiple data streams based on spectral component similarity | |
US9805138B2 (en) | Efficient calculation of all-pair path-based distance measures | |
Yeh et al. | PROUD: a probabilistic approach to processing similarity queries over uncertain data streams | |
Li et al. | A fast algorithm for learning overcomplete dictionary for sparse representation based on proximal operators | |
Rambhatla et al. | Noodl: Provable online dictionary learning and sparse coding | |
WO2010080857A2 (en) | Scalable media fingerprint extraction | |
Rahmani et al. | Innovation pursuit: A new approach to the subspace clustering problem | |
Bonet et al. | Channel-wise early stopping without a validation set via NNK polytope interpolation | |
Yamaç et al. | Convolutional sparse support estimator network (csen): From energy-efficient support estimation to learning-aided compressive sensing | |
Cotter et al. | Application of tree-based searches to matching pursuit | |
Lin et al. | Weight-value convergence of the SOM algorithm for discrete input | |
Ogras et al. | Online summarization of dynamic time series data | |
CN109165586B (en) | Intelligent image processing method for AI chip | |
Rakotomamonjy et al. | Greedy methods, randomization approaches, and multiarm bandit algorithms for efficient sparsity-constrained optimization | |
Windridge et al. | A morphologically optimal strategy for classifier combination: Multiple expert fusion as a tomographic process | |
Chan et al. | Efficient sub-window nearest neighbor search on matrix | |
Karamitopoulos et al. | Current trends in time series representation | |
CN110543614A (en) | Sparse regression method, device, equipment and storage medium | |
Ravishankar et al. | Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL)-The $\ell_0 $ Method | |
Yang | Dimensionality Reduced $\ell^{0} $-Sparse Subspace Clustering | |
Bogert et al. | The Principle of Uncertain Maximum Entropy | |
Qua et al. | Fast Online ℓ0 Elastic Net Subspace Clustering via A Novel Dictionary Update Strategy | |
Qu et al. | Fast Online $ L_0 $ Elastic Net Subspace Clustering via A Novel Dictionary Update Strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PAPADIMITRIOU, SPYRIDON;YU, PHILIP S.;REEL/FRAME:018022/0403 Effective date: 20060615 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |