
mining - Business Intelligence (BI) in Python, OLAP

  •    Python

If you use Mac OSX you can install all dependencies using HomeBrew. For example, to connect to a PostgreSQL database make sure you install a driver like psycopg2. OpenMining supports all databases that the underlying ORM SQLAlchemy supports.

Spark - Fast Cluster Computing

  •    Scala

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

Zeppelin - Multi-purpose Notebook

  •    Java

A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.

Jupyter - Web-based notebook environment for interactive computing

  •    Python

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. It supports over 40 programming languages.




Lens - Unified Analytics interface

  •    Java

Lens provides a unified analytics interface. It aims to cut across data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for each analytical query. It provides a simple metadata layer that gives an abstract view over tiered data stores.

Knime - Data Analytics Platform

  •    Java

KNIME, pronounced [naim], is a modern data analytics platform that allows you to perform sophisticated statistics and data mining on your data to analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization. KNIME also provides the ability to develop reports based on your information or automate the application of new insight back into production systems.

TrailDB - Efficient tool for storing and querying series of events

  •    C

TrailDB is a library, implemented in C, which allows you to query series of events at blazing speed. TrailDB is also optimized for speed of development: Use its simple API with your favorite language, in your favorite environment. TrailDB's secret sauce is data compression. It leverages predictability of time-based data to compress your data to a fraction of its original size. In contrast to traditional compression, you can query the encoded data directly, decompressing only the parts you need.
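The idea of exploiting the predictability of time-based data can be illustrated with delta encoding, where each timestamp is stored as its difference from the previous one. This is only a minimal sketch of the general technique, not TrailDB's actual storage format or API; the function names and the sample timestamps are hypothetical.

```python
from itertools import accumulate


def delta_encode(timestamps):
    """Store the first timestamp followed by successive differences."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas, limit=None):
    """Rebuild timestamps; `limit` decodes only the first `limit` events."""
    prefix = deltas if limit is None else deltas[:limit]
    return list(accumulate(prefix))


# Hypothetical event times (Unix seconds) for one user.
events = [1500000000, 1500000003, 1500000004, 1500000010]
encoded = delta_encode(events)
print(encoded)                         # [1500000000, 3, 1, 6]
print(delta_decode(encoded, limit=2))  # [1500000000, 1500000003]
```

Because events tend to arrive close together, the differences are small numbers that take far fewer bits than full timestamps, and a reader can stop decoding as soon as it has the events it needs.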

Dremio - The missing link in modern data

  •    Java

Dremio is a self-service data platform that empowers users to discover, curate, accelerate, and share any data at any time, regardless of location, volume, or structure. Modern data is managed by a wide range of technologies, including relational databases, NoSQL datastores, file systems, Hadoop, and others. Many of the newer datastores are often more agile and provide improved scalability, but at a cost to speed and ease of access via traditional SQL-based analysis tools. Additionally, raw data found in these stores is often too complex or inconsistent for analysis to use with business intelligence tools.


data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without a specific meaning, but it is often used as a substitute for Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing, as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, and information sources about various Data Science applications using the Ruby programming language.

datasheets - Read data from, write data to, and modify the formatting of Google Sheets

  •    Python

datasheets is a library for interfacing with Google Sheets, including reading data from, writing data to, and modifying the formatting of Google Sheets. It is built on top of Google's google-api-python-client and oauth2client libraries using the Google Drive v3 and Google Sheets v4 REST APIs. It can be installed with pip: pip install datasheets.

AdaptDB

  •    Java

AdaptDB is an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining data partitioning as queries are run.
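Query-driven refinement of a partitioning can be sketched on a single machine as follows. This is an assumption-laden toy and not AdaptDB's actual algorithm: partitions are plain Python lists, and each range query splits any partition that straddles one of its bounds, so later queries over similar ranges touch fewer partitions. The names refine and range_query are hypothetical.

```python
def refine(partitions, boundary):
    """Split every partition that straddles `boundary` into two partitions."""
    refined = []
    for part in partitions:
        low = [v for v in part if v < boundary]
        high = [v for v in part if v >= boundary]
        if low and high:
            refined.extend([low, high])
        else:
            refined.append(part)
    return refined


def range_query(partitions, lo, hi):
    """Return values with lo <= v < hi plus the refined partitioning."""
    partitions = refine(refine(partitions, lo), hi)
    # After refinement each partition lies entirely inside or outside the
    # range, so checking its first value is enough to select it.
    result = [v for part in partitions
              if part and lo <= part[0] < hi
              for v in part]
    return result, partitions


data = [[7, 2, 9, 4, 1, 8]]           # one unrefined partition
hits, data = range_query(data, 3, 8)
print(hits)  # [7, 4]
print(data)  # [[2, 1], [7, 4], [9, 8]]
```

A repeated query over the same range now reads a single partition instead of scanning the whole dataset.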

trck - Query engine for TrailDB

  •    C

trck is a tool to query TrailDBs for aggregate metrics based on individual user behavior. trck is a domain specific language that defines a finite state machine to find patterns in data. These programs are compiled into highly optimized parallel native code.
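To make the finite state machine idea concrete, here is a minimal sketch in plain Python. It is not trck's language or its compiled output; the event names, states, and the funnel itself are hypothetical. Each user's event trail is fed through the machine, and the aggregate metric is the number of users who reach the final state.

```python
# Hypothetical funnel: view a product, add it to the cart, then purchase.
TRANSITIONS = {
    ("start", "view"): "viewed",
    ("viewed", "add_to_cart"): "carted",
    ("carted", "purchase"): "converted",
}


def run_machine(events):
    """Feed one user's events through the machine and return the end state."""
    state = "start"
    for event in events:
        # Events with no matching transition leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
    return state


def count_conversions(trails):
    """Aggregate metric: how many users completed the whole funnel."""
    return sum(1 for events in trails.values()
               if run_machine(events) == "converted")


trails = {
    "user_a": ["view", "add_to_cart", "purchase"],
    "user_b": ["view", "view", "add_to_cart"],
    "user_c": ["purchase", "view"],
}
print(count_conversions(trails))  # 1
```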

candela - Visualization components for the web

  •    Javascript

Candela is an open-source framework for creating interoperable, reusable visualization components for the web. Candela is a part of Kitware's Resonant platform. Candela focuses on making scalable, rich visualizations available with a normalized API for use in real-world data science applications. Please see our documentation at https://candela.readthedocs.io.

countly-sdk-js - Countly Product Analytics SDK for Icenium and Phonegap

  •    Java

Questions? Visit http://community.count.ly. Countly is an innovative, real-time, open source mobile analytics and push notifications platform. It collects data from mobile devices and visualizes this information to analyze mobile application usage and end-user behavior. Countly has two parts: the server, which collects and analyzes data, and the mobile SDK, which sends this data. Both parts are open source with different licensing terms.

countly-sdk-web - Countly Product Analytics SDK for websites and web applications

  •    Javascript

Countly is an innovative, real-time, open source mobile and web analytics, rich push notifications, and crash reporting platform powering more than 2500 web sites and 14000 mobile applications as of 2017 Q3. It collects data from mobile phones, tablets, Apple Watch and other internet-connected devices, and visualizes this information to analyze application usage and end-user behavior. With the JavaScript SDK, Countly is a web analytics platform with features on par with the mobile SDKs.

elasticsearch-demo - Simple data mining workflow using Elasticsearch, Minio, and fluentd.

  •    Javascript

The purpose of this demo is to show how to feed data into Elasticsearch from API calls, Fluent, Aircraft Delays, Bitcoin price, and a desired Twitter hashtag for data analytics and then archive them to S3. This demo will run Fluentd, Elasticsearch, Kibana, and the Minio S3 Server in a microservices architecture. If you want to use the Twitter app to mine data from Twitter, modify the twitter section of docker-compose.yml with your developer API credentials.

aws-dbs-refarch-datalake - Reference Architectures for Datalakes on AWS

  •    HTML

A data lake is a data repository that stores data in its raw format until it is used for analytics. It is designed to store massive amounts of data at scale. A schema is applied to a dataset in the data lake as part of the transformation performed when the data is read. Keeping track of all of the raw assets that are loaded into your data lake, and then tracking all of the new data assets and versions that are created by data transformation, data processing, and analytics, can be a major challenge. An essential component of an Amazon S3 based data lake is a data catalog. A data catalog is designed to provide a single source of truth about the contents of the data lake: rather than reasoning about storage buckets and prefixes, end users interact with the more familiar structures of databases, tables, and partitions.
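The mapping a data catalog maintains, from databases, tables, and partitions to storage prefixes, can be sketched as a small in-memory structure. This is an illustrative toy under stated assumptions, not the API of any AWS service; the class names, the bucket name example-bucket, and the key=value prefix layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    location: str          # storage prefix that holds the table's data
    partition_keys: list   # e.g. ["year", "month"]
    partitions: dict = field(default_factory=dict)  # values -> prefix


class DataCatalog:
    """Toy catalog: resolves database/table/partition to a storage prefix."""

    def __init__(self):
        self._databases = {}

    def register_table(self, database, name, location, partition_keys):
        tables = self._databases.setdefault(database, {})
        tables[name] = Table(location, list(partition_keys))

    def add_partition(self, database, name, values):
        table = self._databases[database][name]
        suffix = "/".join(f"{k}={v}"
                          for k, v in zip(table.partition_keys, values))
        table.partitions[tuple(values)] = f"{table.location}/{suffix}/"

    def resolve(self, database, name, values):
        return self._databases[database][name].partitions[tuple(values)]


catalog = DataCatalog()
catalog.register_table("sales", "orders",
                       "s3://example-bucket/raw/orders", ["year", "month"])
catalog.add_partition("sales", "orders", ["2018", "06"])
print(catalog.resolve("sales", "orders", ["2018", "06"]))
# s3://example-bucket/raw/orders/year=2018/month=06/
```

A user asks for the June 2018 partition of sales.orders and never has to know which bucket or prefix holds it.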

Data-Wrangling-with-Python - Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

  •    Jupyter

Data is the new oil, and it is ruling the modern way of life through incredibly smart tools and transformative technologies. But oil does not come out of the rig in its final form. It has to be refined through a complex processing network. Similarly, data needs to be curated, massaged and refined before it can be used in intelligent algorithms and consumer products. This is called wrangling, and (according to Forbes) good data scientists spend almost 60-80% of their time on it, each day, on every project. It involves scraping raw data from multiple sources (including the web and database tables), then imputing, formatting, and transforming it so that it is ready to be used in the modeling process. This course aims to teach you all the core ideas behind this process and to equip you with knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, the most widely used language for data science. We work through real-life examples, not toy datasets. At the end of this course, you will be confident handling a wide array of sources to extract, clean, transform, and format your data for the great machine learning app you are thinking of building. Hop on and be a part of this exciting journey.
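The imputing, formatting, and transforming steps named above can be sketched with the standard library alone. This is a minimal illustration, not material from the course; the sample rows, the column names, and the choice of mean imputation are assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw rows with messy text and a missing reading.
raw_rows = [
    {"city": " boston ", "temp_f": "68.0"},
    {"city": "AUSTIN", "temp_f": ""},
    {"city": "Denver", "temp_f": "50.0"},
]


def wrangle(rows):
    """Impute missing temperatures, tidy city names, convert to Celsius."""
    known = [float(r["temp_f"]) for r in rows if r["temp_f"].strip()]
    fill = mean(known)  # imputing: replace missing values with the mean
    cleaned = []
    for r in rows:
        temp_f = float(r["temp_f"]) if r["temp_f"].strip() else fill
        cleaned.append({
            "city": r["city"].strip().title(),           # formatting
            "temp_c": round((temp_f - 32) * 5 / 9, 1),   # transforming
        })
    return cleaned


for row in wrangle(raw_rows):
    print(row)
# {'city': 'Boston', 'temp_c': 20.0}
# {'city': 'Austin', 'temp_c': 15.0}
# {'city': 'Denver', 'temp_c': 10.0}
```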

coe-industry-day - Information on the Phase II Industry Day for the Centers of Excellence at USDA.

We are no longer taking questions about the procurements on this repository. On June 27-29, 2018, the Centers of Excellence (CoE) teams from the General Services Administration (GSA) and the U.S. Department of Agriculture (USDA) held an Industry Day (June 27) and Reverse Industry Days (June 28 and 29).




