Abstract
Similarity search over time series is essential in many applications. However, it may cause a “dimensionality curse” due to the high dimensionality of time series. Various dimensionality reduction methods have been developed. Some of them sacrifice max deviation to get a faster dimensionality reduction, such as equal-length segment dimensionality reduction methods: Piecewise Linear Approximation (PLA), Piecewise Aggregate Approximation (PAA), Chebyshev Polynomials (CHEBY), Piecewise Aggregate Approximation Lagrangian Multipliers (PAALM) and Symbolic Aggregate Approximation (SAX). Some sacrifice dimensionality reduction time for the smallest max deviation, such as adaptive-length segment dimensionality reduction methods: Adaptive Piecewise Linear Approximation (APLA) and Adaptive Piecewise Constant Approximation (APCA). APLA uses a guaranteed upper bound for the best max deviation with slow dimensionality reductiontime.
We investigate the existing basic dimensionality reduction techniques for time series data. We point out the limitations of the existing dimensionality reduction techniques and evaluate the dimensionality reduction techniques on real-life datasets. We also conduct preliminary research and make some improvements to the existing dimensionality reduction techniques. Our experimental results on several datasets compare and verify the efficiency and effectiveness of the existing techniques, including several dimensionality reduction techniques, one index-building method, and two k-Nearest Neighbours (k-NN) search methods.
We propose an adaptive-length segment dimensionality reduction method called Self Adaptive Piecewise Linear Approximation (SAPLA). It consists of 1) initialization, 2) split & merge iteration, and 3) segment endpoint movement iteration. Increment area, reconstruction area, conditional upper bound and several equations are applied to prune redundant computations. The experiments show this method can speed up about n (time series length) times than APLA when reducing the dimensionality of the original time series with minor max deviation loss. We propose a lower bound distance measure between two time series to guarantee no false dismissals and tightness for
adaptive-length segment dimensionality reduction methods. When mapping time series into a tree index, we propose Distance Based Covering with Convex Hull Tree (DBCH-tree) and split node and pick branch by using the proposed lower bounding distance, not waste area. Our experiments show that DBCH-tree helps improve k-NN search over time series using dimensionality reduction methods.
Date of Award | Dec 2021 |
---|---|
Original language | English |
Supervisor | Hongxia Wang (Supervisor) |