The ability to analyze time series data at scale is critical for the success of finance and IoT applications based on Spark. Flint is Two Sigma's implementation of highly optimized time series operations in Spark. It performs rich, truly parallel analyses on time series data by exploiting its natural temporal ordering to provide locality-based optimizations. Flint is an open-source library for Spark built around the TimeSeriesRDD, a time-series-aware data structure, and a collection of time series utility and analysis functions that use TimeSeriesRDDs (a usage sketch follows this entry). Unlike DataFrame and Dataset, Flint's TimeSeriesRDD can leverage the existing ordering of datasets at rest and the fact that almost all data manipulation and analysis over these datasets respects their temporal ordering. It differs from other time series efforts in Spark in its ability to compute efficiently across panel data or over large-scale, high-frequency data.
https://github.com/twosigma/flint
Tags | spark timeseries |
Implementation | Scala |
License | Apache |
Platform | OS-Independent |
A library for analyzing large-scale time series data.
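A minimal Scala sketch of the TimeSeriesRDD workflow, assuming Flint's documented TimeSeriesRDD.fromDF entry point and its Windows/Summarizers helpers; the input path, column names, and window length are placeholders:

```scala
import java.util.concurrent.TimeUnit.MILLISECONDS

import com.twosigma.flint.timeseries.{ Summarizers, TimeSeriesRDD, Windows }
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flint-sketch").getOrCreate()

// Hypothetical input: a trades DataFrame already sorted by its "time" column.
val trades = spark.read.parquet("/path/to/trades.parquet")

// isSorted = true lets Flint reuse the existing ordering instead of re-sorting.
val tsRdd = TimeSeriesRDD.fromDF(dataFrame = trades)(isSorted = true, timeUnit = MILLISECONDS)

// Per-row rolling mean of "price" over a trailing one-day window.
val rolling = tsRdd.summarizeWindows(
  Windows.pastAbsoluteTime("1day"),
  Summarizers.mean("price")
)
rolling.toDF.show()
```

Because the ordering is declared up front, windowed summarization like this can stay largely local to each partition instead of shuffling the whole dataset.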
flint is an open-source lint program for C++, developed and used at Facebook. It is published on GitHub at https://github.com/facebook/flint; for discussions, there is a Google group at https://groups.google.com/d/forum/facebook-flint.
Flint is a responsive grid framework written in Sass, created to allow developers to rapidly produce responsive layouts built on a sophisticated foundation. Most notable is the syntax: being a designer myself, I spent a large amount of time on this aspect. Flint is unique in that it uses only a single mixin for all output: _(...).
grid-system grid-framework framework ui grid layout flint breakpoint breakpoints semantic responsive-web-design
Flint Particle System (Particle Engine) for Silverlight, based on the original Flint Particles. The XNA framework is not required.
particle particles
We made Flint because we want people to build great apps for Apple platforms that make the most of native platform capabilities. We want to remove barriers to that, which means making it as simple as possible to get things running in a modern way.
ios tvos watchos apple siri logging appframework-library
MarketStore is a database server optimized for financial timeseries data. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, with higher scalability. It is designed from the ground up to address the scalability issues of handling large amounts of financial market data used in algorithmic trading backtesting, charting, and analyzing price history, with data spanning many years, including tick-level data for all US equities or the exploding cryptocurrency space. If you are struggling with managing lots of HDF5 files, this is the perfect solution to your problem.
marketstore financial-analysis pandas-dataframe trading database timeseries timeseries-database cryptocurrency gdax
Robustly estimate trend and periodicity in a timeseries. Seasonal can recover sharp trend and period estimates from noisy timeseries data with only a few periods. It is intended for estimating season, trend, and level when initializing structural timeseries models like Holt-Winters [Hyndman], and its defaults are biased towards the kinds of training data that arise in that setting. Input samples are assumed evenly-spaced from a continuous real-valued signal with additive noise but no anomalies.
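Seasonal itself is a Python package; purely to illustrate the additive model it targets (trend + season + noise on evenly spaced samples), here is a Scala sketch using a centered moving average and per-phase means. This is an illustration of the decomposition idea, not Seasonal's estimator or API, and the period is assumed known:

```scala
// Illustrative additive decomposition on evenly spaced samples:
// trend from a centered moving average, season from per-phase means
// of the detrended series. Not the seasonal package's algorithm.
def decompose(y: Array[Double], period: Int): (Array[Double], Array[Double]) = {
  val n = y.length
  val trend = Array.tabulate(n) { i =>
    val lo = math.max(0, i - period / 2)
    val hi = math.min(n - 1, i + period / 2)
    y.slice(lo, hi + 1).sum / (hi - lo + 1)
  }
  val detrended = Array.tabulate(n)(i => y(i) - trend(i))
  val phaseMeans = Array.tabulate(period) { p =>
    val phase = detrended.indices.filter(_ % period == p).map(i => detrended(i))
    phase.sum / phase.size
  }
  val season = Array.tabulate(n)(i => phaseMeans(i % period))
  (trend, season)
}

// Usage: val (trend, season) = decompose(samples, period = 12)
// The residual samples(i) - trend(i) - season(i) is the noise term.
```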
EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, timeseries and relational data, fast range scans, and more.
database columnar-database columnar-storage timeseries streaming distributed-database distributed analytics column-store
This library contains a set of modular charting components used for building flexible interactive charts. It was built for React from the ground up, specifically to visualize timeseries data and network traffic data in particular. Low-level elements are constructed using d3, while higher-level composability is provided by React. Charts can be stacked as rows, overlaid on top of each other, or any combination, all in a highly declarative manner. The library is used throughout the public-facing ESnet Portal.
chart timeseries react pond d3 charts
mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding. It aims to fill a niche between applications that do not export their own internal state and existing monitoring systems, without patching those applications or rewriting the same framework for custom extraction glue code.
monitoring logs observability prometheus collector proxy metrics extraction mtail calculator vm compiler bytecode
GridDB is an In-Memory NoSQL Database for highly scalable IoT applications. It has a KVS (Key-Value Store) data model that is suitable for sensor data stored as a timeseries, and the database can be easily scaled out according to the number of sensors. For high reliability, it spreads replicas of key-value data among fellow nodes so that, in the event of a node failure, automatic failover can be carried out in a matter of seconds using the replicas on other nodes.
nosql key-value-store in-memory timeseries timeseries-database iot
Check your project for common sources of contributor friction. Flint checks if your project's folder contains the proper files and structure to allow potential contributors to understand: 1) the project's goals, 2) how to contribute, 3) usage guidelines, and 4) how to install the project.
.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.
spark analytics bigdata spark-streaming spark-sql machine-learning fsharp dotnet-core dotnet-standard streaming apache-spark tpcds tpch azure hdinsight databricks emr microsoft
Apache Spark is a general-purpose parallel computational engine for analytics at scale. At its core, it has a batch design center and is capable of working with disparate data sources. While this provides rich unified access to data, it can also be quite inefficient and expensive: analytic processing requires massive data sets to be repeatedly copied and reformatted to suit Spark, and in many cases it ultimately fails to deliver the promise of interactive analytic performance. For instance, each time an aggregation is run on a large Cassandra table, the entire table must be streamed into Spark to do the aggregation, and caching within Spark is immutable, so cached results grow stale. At SnappyData, we take a very different approach. SnappyData fuses a low-latency, highly available in-memory transactional database (GemFireXD) into Spark with shared memory management and optimizations. Data in the highly available in-memory store is laid out using the same columnar format as Spark (Tungsten), and all query engine operators are significantly more optimized through better vectorization and code generation. The net effect is an order-of-magnitude performance improvement over native Spark caching, and more than two orders of magnitude better Spark performance when working with external data sources.
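A brief Scala sketch of that programming model, assuming SnappyData's SnappySession entry point; the table name and schema are placeholders:

```scala
import org.apache.spark.sql.{ SnappySession, SparkSession }

val spark = SparkSession.builder().appName("snappy-sketch").getOrCreate()

// SnappySession layers the embedded in-memory store over the usual Spark SQL API.
val snappy = new SnappySession(spark.sparkContext)

// Create a column table in the in-memory store and aggregate it in place,
// rather than streaming an external table into Spark for every query.
snappy.sql("CREATE TABLE quotes (symbol STRING, ts TIMESTAMP, price DOUBLE) USING column")
snappy.sql("SELECT symbol, avg(price) AS avg_price FROM quotes GROUP BY symbol").show()
```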
snappydata spark memory-database analytics stream transaction scale
John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. The library has been uploaded to the spark-packages repository: https://spark-packages.org/package/JohnSnowLabs/spark-nlp.
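Because Spark-NLP annotators are ordinary Spark ML stages, a pipeline can be assembled in a few lines. A minimal Scala sketch, assuming the library's DocumentAssembler and Tokenizer annotators; the column names and sample text are placeholders:

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-nlp-sketch").getOrCreate()
import spark.implicits._

val docs = Seq("Spark NLP annotates text at scale.").toDF("text")

// Annotators plug into a regular Spark ML Pipeline, so they scale like any other stage.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer))
pipeline.fit(docs).transform(docs).select("token.result").show(false)
```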
nlp nlu natural-language-processing natural-language-understanding spark spark-ml pyspark machine-learning named-entity-recognition sentiment-analysis lemmatizer spell-checker tokenizer entity-extraction stemmer part-of-speech-tagger annotation-framework
酷玩 Spark: Spark source-code walkthroughs, Spark libraries, and more.
spark spark-streaming structured-streaming sparkcore apache-spark
Lightning-fast cluster computing with Apache Spark™ and Apache Cassandra®. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
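A minimal Scala sketch of the two operations just described, using the connector's documented com.datastax.spark.connector._ import; the keyspace, table, and host are placeholders:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{ SparkConf, SparkContext }

// Point the connector at a reachable Cassandra node (placeholder address).
val conf = new SparkConf()
  .setAppName("cassandra-sketch")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// Expose a Cassandra table as a Spark RDD.
val words = sc.cassandraTable("test_ks", "words")
println(words.count())

// Write a Spark RDD back to a Cassandra table.
sc.parallelize(Seq(("cat", 30), ("fox", 40)))
  .saveToCassandra("test_ks", "words", SomeColumns("word", "count"))
```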
cassandra spark cassandra-client cassandra-driver cassandra-library
This Apache Spark tutorial will guide you step by step through using the MovieLens dataset to build a movie recommender with collaborative filtering and Spark's Alternating Least Squares implementation. It is organised in two parts. The first is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system. This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX, which has also been publicly available since 2014 at Spark Summit. Starting from there, I've made minor modifications to use a larger dataset, added code for storing and reloading the model for later use, and finally built a web service using Flask.
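The tutorial itself is written in PySpark; as a rough Scala equivalent of its core step, the sketch below fits Spark ML's Alternating Least Squares on a MovieLens-style ratings DataFrame and saves the model for later reuse. Paths and column names are placeholders:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("als-sketch").getOrCreate()

// Assumed input: MovieLens ratings with userId, movieId, and rating columns.
val ratings = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/ml-latest/ratings.csv")

val als = new ALS()
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
  .setRank(8)
  .setRegParam(0.1)
  .setMaxIter(10)

val model = als.fit(ratings)

// Recommend 10 movies per user; persist the model so it can be reloaded later.
model.recommendForAllUsers(10).show()
model.write.overwrite().save("/path/to/als-model")
```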
movie-recommendation movielens-dataset spark flask big-data bigdata
Please note: spark-ec2 is no longer under active development and the project has been archived. All the existing code, PRs, and issues are still accessible but are now read-only. If you're looking for a similar tool that is under active development, we recommend you take a look at Flintrock. spark-ec2 allows you to launch, manage, and shut down Apache Spark clusters on Amazon EC2. It automatically sets up Apache Spark and HDFS on the cluster for you. This guide describes how to use spark-ec2 to launch clusters, how to run jobs on them, and how to shut them down. It assumes you've already signed up for an EC2 account on the Amazon Web Services site.
spark-jobserver provides a RESTful interface for submitting and managing Apache Spark jobs, jars, and job contexts. This repo contains the complete Spark job server project, including unit tests and deploy scripts. It was originally started at Ooyala, but this is now the main development repo. Other useful links: Troubleshooting, cluster, YARN client, YARN on EMR, Mesos, JMX tips.
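As a sketch of what a job submitted through that REST interface looks like, the following is modeled on the word-count example in the spark-jobserver README; exact trait and import names may vary between versions, so treat it as an approximation:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{ SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation }

import scala.util.Try

// Packaged into a jar, uploaded to the server, then run via its REST endpoints.
object WordCountJob extends SparkJob {
  // Reject the request early if the expected config parameter is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("missing input.string config param"))

  // The job body; its return value becomes the job result exposed by the REST API.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq).countByValue()
}
```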
spark rest-api spark-jobserver