kshape - Python implementation of k-Shape

Returns a list of tuples describing the clusters found by kshape. The first value of each tuple is the z-score normalized centroid; the second is the list of indices of the series assigned to that cluster. The results can be examined by plotting the z-score normalized series of each cluster together with the corresponding centroid.
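
k-Shape compares z-normalized series with a shape-based distance (SBD): one minus the maximum normalized cross-correlation over all relative shifts. The pure-Python sketch below illustrates that distance only. `zscore` and `sbd` are hypothetical helpers, not the kshape package's API, and the direct O(n^2) loop stands in for the FFT-based cross-correlation an efficient implementation would typically use.

```python
import math

# Illustrative sketch of shape-based distance; helper names are hypothetical.


def zscore(x):
    """Z-normalize a sequence: subtract the mean, divide by the population std."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # a constant series has no shape to compare
    return [(v - mean) / std for v in x]


def sbd(x, y):
    """Shape-based distance between equal-length series x and y.

    Returns (distance, shift): distance = 1 - max normalized cross-correlation
    (0 means identical shape), and shift is the offset s at which x[i] is best
    aligned with y[i - s].
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0, 0
    best, best_shift = -math.inf, 0
    for s in range(-(n - 1), n):
        # Only indices where both x[i] and y[i - s] exist contribute.
        cc = sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        if cc / norm > best:
            best, best_shift = cc / norm, s
    return 1.0 - best, best_shift


x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]  # the same bump, delayed by one step
print(sbd(x, y))  # (0.0, -1): y matches x once shifted back by one step
```

Because both inputs are z-normalized first, SBD ignores offset and scale and reacts only to shape and phase, which is why kshape reports z-score normalized centroids.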

Related Projects

Akumuli - Time-series database

Akumuli is a time-series database for modern hardware. It can be used to capture, store and process time-series data in real time. The word "akumuli" can be translated from Esperanto as "accumulate".

K-Nearest-Neighbors-with-Dynamic-Time-Warping - Python implementation of KNN and DTW classification algorithm

When it comes to building a classification algorithm, analysts have a broad range of open-source options to choose from. For time series classification, however, there are fewer out-of-the-box solutions. I began researching the domain of time series classification and was intrigued by a recommended technique called K Nearest Neighbors with Dynamic Time Warping. A meta-analysis by Mitsa (2010) suggests that for time series classification, 1 Nearest Neighbor (K=1) with Dynamic Time Warping is very difficult to beat [1].
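
The 1-NN/DTW combination described above can be sketched in a few lines of pure Python. `dtw_distance` and `knn_dtw_predict` are hypothetical names, not this project's API; the absolute-difference local cost and the optional Sakoe-Chiba `window` are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Illustrative sketch; function names are hypothetical.


def dtw_distance(a, b, window=None):
    """Classic O(n*m) DTW distance with absolute-difference local cost.

    `window` optionally restricts warping to a Sakoe-Chiba band of that width.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_dtw_predict(train_series, train_labels, query, k=1, window=None):
    """Label `query` by majority vote among its k DTW-nearest training series."""
    scored = sorted(
        ((dtw_distance(query, s, window), label) for s, label in zip(train_series, train_labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


train = [[0, 0, 1, 1], [1, 1, 0, 0]]
labels = ["up", "down"]
print(knn_dtw_predict(train, labels, [0, 0, 0, 1, 1]))  # up
```

With `k=1` this is the 1-NN classifier the meta-analysis favors; a larger `k` simply votes among more neighbors. DTW lets the five-point query match the four-point "up" series at zero cost by stretching the run of zeros.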

chronix.server - The Chronix Server implementation that is based on Apache Solr.

The Chronix Server is an implementation of the Chronix API that stores time series in Apache Solr. Chronix uses several techniques to optimize query times and storage demand. On a benchmark querying several ranges (0.5 days up to 180 days), Chronix achieves an average runtime of 23 milliseconds per range query. The dataset contains about 3.7 billion pairs and takes 108 GB serialized as CSV; Chronix needs only 8.7 GB to store it. Everything runs on a standard laptop computer, with no need for clustering, parallel processing, or other complex setup. Check it out and give it a try. The repository chronix.examples contains some examples.
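
The storage figures above imply the following back-of-the-envelope ratios, assuming GB means 10^9 bytes (the text does not say) and taking the pair count as exactly 3.7 billion:

```latex
\frac{108\ \text{GB}}{8.7\ \text{GB}} \approx 12.4,
\qquad
\frac{108 \times 10^{9}\ \text{B}}{3.7 \times 10^{9}\ \text{pairs}} \approx 29.2\ \text{B/pair (CSV)},
\qquad
\frac{8.7 \times 10^{9}\ \text{B}}{3.7 \times 10^{9}\ \text{pairs}} \approx 2.35\ \text{B/pair (Chronix)}
```

So Chronix stores the dataset in roughly one twelfth of the CSV size, a little over two bytes per timestamp-value pair.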

tsfresh - Automatic extraction of relevant features from time series

The name tsfresh stands for "Time Series Feature extraction based on scalable hypothesis tests". The package contains many feature extraction methods and a robust feature selection algorithm.
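
The idea behind hypothesis-test-based selection (test each extracted feature for relevance to the target, then keep only features that survive a false-discovery-rate correction) can be sketched with the standard library alone. This is not tsfresh's API: the five features, the binary target, the Mann-Whitney U test with a normal approximation, and the Benjamini-Hochberg correction are simplifying assumptions, and tsfresh's actual tests and correction may differ.

```python
import math
import statistics

# Illustrative sketch; not tsfresh's API. All function names are hypothetical.


def simple_features(series):
    """A handful of per-series features (tsfresh computes far more)."""
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "maximum": max(series),
        "minimum": min(series),
        "abs_energy": sum(v * v for v in series),
    }


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(xs, ys):
    """Two-sided Mann-Whitney U p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(xs), len(ys)
    ranks = average_ranks(list(xs) + list(ys))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def select_relevant(series_list, binary_target, fdr=0.05):
    """Names of features whose p-values pass Benjamini-Hochberg at level `fdr`."""
    rows = [simple_features(s) for s in series_list]
    pvals = {}
    for name in rows[0]:
        pos = [r[name] for r, t in zip(rows, binary_target) if t]
        neg = [r[name] for r, t in zip(rows, binary_target) if not t]
        pvals[name] = mann_whitney_p(pos, neg)
    ranked = sorted(pvals, key=pvals.get)  # ascending p-value
    cutoff = 0
    for k, name in enumerate(ranked, start=1):
        if pvals[name] <= k / len(ranked) * fdr:
            cutoff = k  # largest k with p_(k) <= (k / m) * fdr
    return set(ranked[:cutoff])


series = [[k, k + 1, k + 2] for k in range(10)] + [[k + 100, k + 101, k + 102] for k in range(10)]
target = [0] * 10 + [1] * 10
print(sorted(select_relevant(series, target)))  # ['abs_energy', 'maximum', 'mean', 'minimum']
```

Here "std" is dropped because every series has the same spread, so it carries no information about the target, while the level-based features separate the two classes perfectly.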

flint - A Time Series Library for Apache Spark

The ability to analyze time series data at scale is critical for the success of finance and IoT applications based on Spark. Flint is Two Sigma's implementation of highly optimized time series operations in Spark. It performs truly parallel and rich analyses on time series data by taking advantage of the natural ordering in time series data to provide locality-based optimizations. Flint is an open source library for Spark based around the TimeSeriesRDD, a time series aware data structure, and a collection of time series utility and analysis functions that use TimeSeriesRDDs. Unlike DataFrame and Dataset, Flint's TimeSeriesRDDs can leverage the existing ordering properties of datasets at rest and the fact that almost all data manipulations and analysis over these datasets respect their temporal ordering properties. It differs from other time series efforts in Spark in its ability to efficiently compute across panel data or on large scale high frequency data.
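
To illustrate why temporal ordering enables locality-based optimizations, here is an as-of left join in plain Python: every left row is matched with the most recent right row within a tolerance. Because both inputs are assumed to be sorted by time, one forward merge pass replaces a search per row. The function name and the `(time, value)` tuple layout are hypothetical; this is not Flint's API.

```python
# Illustrative sketch; not Flint's API. Names and data layout are hypothetical.


def asof_left_join(left, right, tolerance):
    """For each (time, value) in `left`, attach the latest `right` value whose
    time is <= the left time and at most `tolerance` older; otherwise None.

    Both inputs must already be sorted by time, so a single forward pass over
    `right` suffices (O(len(left) + len(right)) overall).
    """
    out = []
    j = -1  # index of the latest right row at or before the current left time
    for t, v in left:
        while j + 1 < len(right) and right[j + 1][0] <= t:
            j += 1
        if j >= 0 and t - right[j][0] <= tolerance:
            out.append((t, v, right[j][1]))
        else:
            out.append((t, v, None))
    return out


trades = [(1, "buy"), (5, "sell"), (12, "buy")]
quotes = [(0, 100.0), (4, 101.5), (6, 102.0)]
for row in asof_left_join(trades, quotes, tolerance=3):
    print(row)
# (1, 'buy', 100.0)
# (5, 'sell', 101.5)
# (12, 'buy', None)
```

On unordered data the same join would need a sort or an index lookup per row; keeping data ordered at rest is what lets the join stay a linear scan.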

Clustering Demo in Silverlight using K-Means Algorithm

This project explains clustering and the k-means algorithm through a live demo in Silverlight. The demo lets users define their own data points and watch how k-means works on them.
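
The demo's algorithm is standard k-means (Lloyd's algorithm), which alternates an assignment step and an update step until the centroids stop moving. The Python sketch below is independent of the Silverlight demo; initializing from randomly sampled input points and the `seed` parameter are assumptions.

```python
import math
import random

# Illustrative sketch of Lloyd's algorithm; not the demo's code.


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D tuples into k groups. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the next assignment step would not change anything
        centroids = new_centroids
    return centroids, assignments


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)  # first three points share one label, last three the other
```

The result depends on the initial centroids; on poorly separated data, running several seeds and keeping the lowest total within-cluster distance is a common remedy.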

Gnocchi - Time series database

Gnocchi is an open-source time series database. The problem that Gnocchi solves is the storage and indexing of time series data and resources at a large scale. This is useful in modern cloud platforms, which are not only huge but also dynamic and potentially multi-tenant. Gnocchi takes all of that into account. Gnocchi has been designed to store large amounts of aggregates while remaining performant, scalable and fault-tolerant. Throughout, the goal was to avoid building any hard dependency on a complex storage system.

CausalImpact - An R package for causal inference in time series

This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty by using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred. As with all approaches to causal inference on non-experimental data, valid conclusions require strong assumptions. The CausalImpact package, in particular, assumes that the outcome time series can be explained in terms of a set of control time series that were themselves not affected by the intervention. Furthermore, the relation between treated series and control series is assumed to be stable during the post-intervention period. Understanding and checking these assumptions for any given application is critical for obtaining valid conclusions.
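
The counterfactual logic can be illustrated without the Bayesian machinery. CausalImpact itself fits a Bayesian structural time-series model and reports posterior intervals; the sketch below deliberately swaps that for an ordinary least-squares fit on a single control series, so it shows only how the two stated assumptions are used and gives a point estimate with no uncertainty. Function names and data are hypothetical.

```python
# Simplified, non-Bayesian stand-in for CausalImpact's counterfactual logic.
# Function names are hypothetical; this is not the package's API.


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def estimate_effect(control, response, intervention_index):
    """Cumulative effect: observed minus predicted response after the intervention.

    Assumes the control series was not affected by the intervention and that
    its relation to the response stays stable after the intervention.
    """
    a, b = fit_ols(control[:intervention_index], response[:intervention_index])
    counterfactual = [a + b * c for c in control[intervention_index:]]
    observed = response[intervention_index:]
    return sum(y - y_hat for y, y_hat in zip(observed, counterfactual))


control = [0, 1, 2, 3, 4, 5, 6, 7]
response = [5, 7, 9, 11, 13, 18, 20, 22]  # 2 * control + 5, plus 3 after index 5
print(estimate_effect(control, response, intervention_index=5))  # 9.0
```

If the control series had itself been moved by the intervention, the counterfactual would absorb part of the effect and the estimate would be biased, which is exactly the assumption the package asks users to check.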

SiriDB - Highly-scalable, robust and super fast time series database

SiriDB is a highly scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without indexes and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy and super fast analysis over large numbers of time series.

Beringei - High performance, in-memory storage engine for time series data.

In the fall of 2015, Facebook published the paper “Gorilla: A Fast, Scalable, In-Memory Time Series Database” at VLDB 2015. Beringei is the open source implementation of the ideas presented in this paper. Beringei is a high performance time series storage engine. Time series are commonly used as a representation of statistics, gauges, and counters for monitoring the performance and health of a system.
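
The Gorilla paper's compression rests on two transforms: timestamps are stored as deltas of deltas, and each floating-point value is XORed with the previous one. The sketch below shows only those transforms in plain Python, not the variable-length bit packing that follows them, and it is not Beringei's C++ code; the function names are hypothetical.

```python
import struct

# Illustrative sketch of the two Gorilla-style transforms; names are hypothetical.


def delta_of_deltas(timestamps):
    """Second differences of timestamps; a regular interval becomes a run of zeros."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def xor_with_previous(values):
    """XOR each float's 64-bit IEEE 754 pattern with its predecessor's.

    Repeated values give 0; nearby values often share the sign, exponent and
    leading mantissa bits, which leaves long runs of leading zeros to exploit.
    """
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [b ^ a for a, b in zip(bits, bits[1:])]


print(delta_of_deltas([1000, 1060, 1120, 1180, 1241]))  # [0, 0, 1]
print([x.bit_length() for x in xor_with_previous([12.0, 12.0, 24.0])])  # [0, 53]
```

Monitoring data is usually sampled at a fixed interval and changes slowly, so both transforms produce mostly zeros or small numbers that pack into very few bits.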

gluon-ts - GluonTS - Probabilistic Time Series Modeling in Python

GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (incubating). GluonTS provides utilities for loading and iterating over time series datasets, state-of-the-art models ready to be trained, and building blocks to define your own models and quickly experiment with different solutions.

InfluxDB - Distributed Time Series Database

InfluxDB is an open-source, distributed time series database with no external dependencies. It's useful for recording metrics and events and for performing analytics. Everything in InfluxDB is a time series on which you can perform standard functions such as min, max, sum, count, mean, median, percentiles, and more. Collect your data at any interval and compute rollups on the fly later.
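
What a rollup computes can be sketched in plain Python: raw points are grouped into fixed time buckets and each bucket is reduced with the standard functions listed above. This is not InfluxDB's query language or API; the function names, the `(timestamp, value)` layout, and the nearest-rank percentile definition are assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict

# Illustrative sketch; not InfluxDB's API. Names are hypothetical.


def percentile(values, p):
    """Nearest-rank percentile, one of several common definitions."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def rollup(points, interval):
    """Aggregate (timestamp, value) points into fixed buckets of `interval` seconds.

    Returns {bucket_start: {aggregate_name: value}} ordered by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)  # align to the bucket start
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "sum": sum(vals),
            "count": len(vals),
            "mean": statistics.fmean(vals),
            "median": statistics.median(vals),
            "p90": percentile(vals, 90),
        }
        for start, vals in sorted(buckets.items())
    }


raw = [(0, 1.0), (30, 3.0), (59, 2.0), (60, 10.0), (90, 20.0)]
for start, aggs in rollup(raw, interval=60).items():
    print(start, aggs)
```

Because the raw points are kept, the same data can later be rolled up at a different interval without re-collecting it.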

Cubism.js - Time Series Visualization

Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism fetches time series data incrementally: after the initial display, Cubism reduces server load by polling only the most recent values. Cubism renders incrementally, too, using Canvas to shift charts one pixel to the left.

Kairosdb - Fast distributed scalable time series database written on top of Cassandra

KairosDB is a fast, distributed, scalable time series database written on top of Cassandra, the popular and performant NoSQL datastore. Data can be pushed into KairosDB via multiple protocols: Telnet, REST, and Graphite. It supports aggregators that perform operations on data points and can downsample them, using standard functions such as min, max, sum, count, and mean.

TimeSeriesAnalysiswithPython - Time Series Analysis with Python

Overview: Much of the data we see in nature comes as continuous time series. This workshop will provide an overview of how to do time series analysis and introduce time series forecasting. Audience: people interested in data analytics on time series data.
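
As a first taste of forecasting, here is simple exponential smoothing, one of the most basic methods: each new level is a weighted average of the latest observation and the previous level. It is an illustrative choice, not necessarily part of the workshop's material, and the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.


def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (0 < alpha <= 1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level  # blend new data with the old level
        levels.append(level)
    return levels


def forecast(series, alpha, horizon):
    """Simple exponential smoothing forecasts a flat line at the last level."""
    return [simple_exponential_smoothing(series, alpha)[-1]] * horizon


history = [10, 12, 14, 13]
print(simple_exponential_smoothing(history, alpha=0.5))  # [10, 11.0, 12.5, 12.75]
print(forecast(history, alpha=0.5, horizon=3))  # [12.75, 12.75, 12.75]
```

A larger `alpha` reacts faster to recent changes; a smaller one smooths more. Trend and seasonality require extensions such as Holt-Winters.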

awesome-time-series-database - A curated list of awesome time series databases, benchmarks and papers

OpenTSDB is a classic time series database built on top of HBase; it now also supports Cassandra and Bigtable. BTrDB (Berkeley Tree Database) is a high-performance time series database designed to support high-density data storage applications.

Timely - Accumulo backed time series database

Timely is a time series database application that provides secure access to time series data. Timely is written in Java and designed to work with Apache Accumulo and Grafana.

atsd-use-cases - Axibase Time Series Database: Usage Examples and Research Articles

The Use Cases documentation demonstrates solutions to real-world data problems using Axibase Time Series Database (ATSD) and contains in-depth guides for programmatic integration with commonly used enterprise software systems and services, as well as tutorials for data transformation and visualizations created with ATSD. It also includes interactive visualizations tracking interesting datasets from a variety of sources.

Lindb - Distributed Time Series Database

LinDB is an open-source time series database that provides high performance, high availability and horizontal scalability. LinDB adopts many best practices from other TSDBs and implements optimizations based on the characteristics of time series data. Unlike InfluxDB, where rollups require writing many continuous queries, LinDB automatically rolls up data at specified intervals once the database is created. Moreover, LinDB is extremely fast at parallel querying and computing over distributed time series data.

Dynamic Time Warp for Time Series Analysis

This is a C# conversion of the FastDTW algorithm by Stan Salvador and Philip Chan, originally implemented in Java.