
mining - Business Intelligence (BI) in Python, OLAP

  •    Python

If you use Mac OS X, you can install all dependencies using Homebrew. To connect to a PostgreSQL database, for example, make sure you install a driver such as psycopg2. OpenMining supports all databases supported by its underlying ORM, SQLAlchemy.

Spark - Fast Cluster Computing

  •    Scala

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

Zeppelin - Multi-purpose Notebook

  •    Java

A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.

Jupyter - Web-based notebook environment for interactive computing

  •    Python

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. It supports over 40 programming languages.




Lens - Unified Analytics interface

  •    Java

Lens provides a Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for the analytical query. It provides a simple metadata layer that offers an abstract view over tiered data stores.

Knime - Data Analytics Platform

  •    Java

KNIME, pronounced [naim], is a modern data analytics platform that allows you to perform sophisticated statistics and data mining on your data to analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization. KNIME also provides the ability to develop reports based on your information or automate the application of new insight back into production systems.

TrailDB - Efficient tool for storing and querying series of events

  •    C

TrailDB is a library, implemented in C, which allows you to query series of events at blazing speed. TrailDB is also optimized for speed of development: Use its simple API with your favorite language, in your favorite environment. TrailDB's secret sauce is data compression. It leverages predictability of time-based data to compress your data to a fraction of its original size. In contrast to traditional compression, you can query the encoded data directly, decompressing only the parts you need.
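
The idea of exploiting the predictability of time-based data can be illustrated with plain delta encoding. The sketch below is only an illustration under that assumption: it is not TrailDB's actual storage format or API, and the function names `delta_encode` and `delta_decode` are hypothetical. It stores the first timestamp and then only the gaps between consecutive events, and it can decode just the leading part of a trail instead of the whole series.

```python
from typing import List, Optional


def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first timestamp, then only the gap to each following one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int], limit: Optional[int] = None) -> List[int]:
    """Rebuild timestamps, stopping after `limit` events when a limit is given."""
    out: List[int] = []
    total = 0
    for i, delta in enumerate(deltas):
        if limit is not None and i >= limit:
            break
        total += delta
        out.append(total)
    return out


# Hypothetical event times (Unix seconds) for one user, in sorted order.
timestamps = [1500000000, 1500000003, 1500000004, 1500000010]
encoded = delta_encode(timestamps)
first_two = delta_decode(encoded, limit=2)
```

Because events in a trail are close together in time, most encoded values are small integers, which a real implementation could then pack into far fewer bits than the raw timestamps.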

Dremio - The missing link in modern data

  •    Java

Dremio is a self-service data platform that empowers users to discover, curate, accelerate, and share any data at any time, regardless of location, volume, or structure. Modern data is managed by a wide range of technologies, including relational databases, NoSQL datastores, file systems, Hadoop, and others. Many of the newer datastores are more agile and provide improved scalability, but at a cost to speed and ease of access via traditional SQL-based analysis tools. Additionally, raw data found in these stores is often too complex or inconsistent to analyze with business intelligence tools.


data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without specific meaning, but it is often used as a substitute for Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing, as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, and information sources about various Data Science applications using the Ruby programming language.

datasheets - Read data from, write data to, and modify the formatting of Google Sheets

  •    Python

datasheets is a library for interfacing with Google Sheets, including reading data from, writing data to, and modifying the formatting of Google Sheets. It is built on top of Google's google-api-python-client and oauth2client libraries using the Google Drive v3 and Google Sheets v4 REST APIs. It can be installed with pip via pip install datasheets.

AdaptDB

  •    Java

AdaptDB is an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining data partitioning as queries are run.

trck - Query engine for TrailDB

  •    C

trck is a tool to query TrailDBs for aggregate metrics based on individual user behavior. trck is a domain specific language that defines a finite state machine to find patterns in data. These programs are compiled into highly optimized parallel native code.
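
To make the finite state machine idea concrete, here is a minimal sketch in Python. It does not use trck's DSL and is not compiled to native code; the event names in `FUNNEL` and the function `count_funnel_completions` are hypothetical. Each user carries a state index that advances only when the next expected event appears, and the aggregate metric is the number of users who reach the final state.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical pattern: a user views an item, adds it to the cart, then buys.
FUNNEL = ["view", "add_to_cart", "purchase"]


def count_funnel_completions(
    events: Iterable[Tuple[str, str]],
    steps: List[str] = FUNNEL,
) -> int:
    """Count users whose events match `steps` in order.

    `events` yields (user_id, event_type) pairs in time order.
    """
    state: Dict[str, int] = defaultdict(int)  # user_id -> index of next expected step
    for user, event in events:
        current = state[user]
        # Advance only when the event is the one this user's state expects.
        if current < len(steps) and event == steps[current]:
            state[user] = current + 1
    return sum(1 for s in state.values() if s == len(steps))


events = [
    ("a", "view"),
    ("b", "view"),
    ("a", "add_to_cart"),
    ("b", "purchase"),  # ignored: user b has not added anything to the cart
    ("a", "purchase"),
]
completed = count_funnel_completions(events)
```

Only user "a" passes through all three states in order, so `completed` is 1. Unrelated events are skipped rather than resetting the machine, which is one of several possible matching policies.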

candela - Visualization components for the web

  •    Javascript

Candela is an open-source framework for creating interoperable, reusable visualization components for the web. Candela is a part of Kitware's Resonant platform. Candela focuses on making scalable, rich visualizations available with a normalized API for use in real-world data science applications. Please see our documentation at https://candela.readthedocs.io.

countly-sdk-js - Countly Product Analytics SDK for Icenium and Phonegap

  •    Java

Questions? Visit http://community.count.ly. Countly is an innovative, real-time, open source mobile analytics and push notifications platform. It collects data from mobile devices and visualizes this information to analyze mobile application usage and end-user behavior. Countly has two parts: the server that collects and analyzes data, and the mobile SDK that sends this data. Both parts are open source with different licensing terms.

countly-sdk-web - Countly Product Analytics SDK for websites and web applications

  •    Javascript

Countly is an innovative, real-time, open source mobile & web analytics, rich push notifications and crash reporting platform powering more than 2500 web sites and 14000 mobile applications as of 2017 Q3. It collects data from mobile phones, tablets, Apple Watch and other internet-connected devices, and visualizes this information to analyze application usage and end-user behavior. With the help of the JavaScript SDK, Countly is a web analytics platform with features on par with mobile SDKs.

elasticsearch-demo - Simple data mining workflow using Elasticsearch, Minio, and fluentd.

  •    Javascript

The purpose of this demo is to show how to feed data into Elasticsearch from API calls, Fluent, Aircraft Delays, Bitcoin price, and a desired Twitter hashtag for data analytics and then archive them to S3. This demo will run Fluentd, Elasticsearch, Kibana, and the Minio S3 Server in a microservices architecture. If you want to use the Twitter app to mine data from Twitter, modify the twitter section of docker-compose.yml with your developer API credentials.

website - Public repository for the R4DS community website.

  •    CSS

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us. Pull requests shall be submitted with the target branch develop.

rainbow - A data layout optimization framework for wide tables stored on HDFS

  •    Java

Rainbow is a tool that helps improve the I/O performance of wide tables stored in columnar formats on HDFS. More information is available on the project's main page.

cora-docs - CoRA Docs

  •    HTML

If you are submitting documentation for the current stable release, submit it to the corresponding branch. For example, documentation for CoRA 1.0 would be submitted to the 1.0 branch. Documentation intended for the next release of CoRA should be submitted to the master branch. If you are new to markdown, fear not...

Mads.jl - MADS: Model Analysis & Decision Support

  •    Julia

MADS has been tested to perform HPC simulations on a wide range of multi-processor clusters and parallel environments (Moab, Slurm, etc.). MADS utilizes adaptive rules and techniques that allow the analyses to be performed with minimal user input. The code provides a series of alternative algorithms to execute each type of data- and model-based analysis.