We have a collection of more than 1 million open source products, ranging from enterprise products to
small libraries, across all platforms. We aggregate information from all open source repositories.
Search and find the best fit for your needs. Check out the projects section.
KNIME, pronounced [naim], is a modern data analytics platform that allows you to perform sophisticated statistics and data mining on your data to analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization. KNIME also provides the ability to develop reports based on your information or automate the application of new insight back into production systems.
No. 1 in Business Analytics: Data Mining, Predictive Analytics, ETL, Reporting, Dashboards in One Tool. 1000+ methods: data mining, business intelligence, ETL, data analysis + Weka + R, forecasting, visualization
RapidAnalytics is the first open source server for data mining and business analytics. It is based on the world-leading data mining solution RapidMiner and includes ETL, data mining, reporting, and dashboards in a single server solution.
InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.
Lens provides a unified analytics interface. Lens aims to cut through data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for the analytical query. It provides a simple metadata layer that gives an abstract view over tiered data stores.
acceleratoRs are a collection of R-based lightweight data science solutions that offer a quick start for data scientists to experiment, prototype, and present their data analytics for specific domains. Each accelerator shared in this repo is structured following the project template of the Microsoft Team Data Science Process, in a simplified and accelerator-friendly version. The analytics are scripted in R markdown (notebook) and can be used to conveniently yield outputs in various formats (ipynb, PDF, html, etc.).
Collector is the service for collecting stats. It has a REST API and DB storage. Analytics is the service for generating reports. It has a REST API. Migrator is the tool for migrating data from the DB to Elasticsearch. The collector and analytics services are started by uWSGI. Migrator is started by cron to migrate fresh data into Elasticsearch.
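The migrator step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the `stats` table, its columns, the `last_id` watermark, and every function name are hypothetical, and an in-memory SQLite database stands in for the real DB. The sketch only builds a request body in the newline-delimited JSON shape that the Elasticsearch bulk API accepts; it does not send anything.

```python
import json
import sqlite3


def fetch_fresh_rows(conn, last_id):
    """Return stat rows whose id is greater than the last migrated id."""
    cur = conn.execute(
        "SELECT id, name, value FROM stats WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cur.fetchall()


def to_bulk_body(rows, index="stats"):
    """Build an NDJSON body: one action line plus one document line per row."""
    lines = []
    for row_id, name, value in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row_id)}}))
        lines.append(json.dumps({"name": name, "value": value}))
    # A bulk body must end with a newline; an empty run yields an empty body.
    return ("\n".join(lines) + "\n") if lines else ""


def migrate_once(conn, last_id):
    """One cron-style run: returns (bulk_body, new_last_id)."""
    rows = fetch_fresh_rows(conn, last_id)
    new_last_id = rows[-1][0] if rows else last_id
    return to_bulk_body(rows), new_last_id


def demo_connection():
    """Hypothetical stand-in for the collector's DB, with two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (id INTEGER PRIMARY KEY, name TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO stats (name, value) VALUES (?, ?)",
        [("installs", 3.0), ("nodes", 12.0)],
    )
    return conn
```

Persisting the returned `new_last_id` between cron runs means each run picks up only rows added since the previous one, which is one simple way to read "fresh data".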
Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.
GeoMesa is an open-source, distributed, spatio-temporal database built on a number of distributed cloud data storage systems, including Accumulo, HBase, Cassandra, and Kafka. Leveraging a highly parallelized indexing strategy, GeoMesa aims to provide as much of the spatial querying and data manipulation to Accumulo as PostGIS does to Postgres.
EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, timeseries and relational data, fast range scans, and a lot more.
Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.
* This API uses a WSSE authentication header on every call; each example class has a method called getWSSEHeader for this purpose.
* The code examples cannot be run without Partner API credentials (a Username and Secret). These must be obtained through an Adobe Partner Integration Manager after appropriate agreements are in place.
* Each example passes JSON-encoded data as a String and receives JSON-encoded data as a String.
* Parsing the JSON data is left as an exercise for the developer so you