We have collection of more than 1 Million open source products ranging from Enterprise product to
small libraries in all platforms. We aggregate information from all open source repositories.
Search and find the best for your needs. Check out projects section.
No 1 in Business Analytics: Data Mining, Predictive Analytics, ETL, Reporting, Dashboards in One Tool. 1000+ methods: data mining, business intelligence, ETL, data mining, data analysis + Weka + R, forecasting, visualization, business intelligence
KNIME, pronounced [naim], is a modern data analytics platform that allows you to perform sophisticated statistics and data mining on your data to analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization. KNIME also provides the ability to develop reports based on your information or automate the application of new insight back into production systems.
In this tutorial, you will learn how to create a predictive model for customer conversion based on a combination of in-house CRM data and Google Analytics Premium logs. It consists of an initial code lab using pre-generated sample data, followed by a detailed implementation guide that shows you how to put predictive analytics into practice using your own data.The material in this repository complements an article introducing the topic on the Google Cloud Platform Solutions website.
Predictive maintenance is one of the most common machine learning use cases and with the latest advancements in information technology, the volume of stored data is growing faster in this domain than ever before which makes it necessary to leverage big data analytic capabilities to efficiently transform large amounts of data into business intelligence. Microsoft has published a series of learning materials including blogs, solution templates, modeling guides and sample tutorials in the domain of predictive maintenance. In this tutorial, we extended those materials by providing a detailed step-by-step process of using Spark Python API PySpark to demonstrate how to approach predictive maintenance for big data scenarios. The tutorial covers typical data science steps such as data ingestion, cleansing, feature engineering and model development.The input data is simulated to reflect features that are generic for most of the predictive maintenance scenarios. To enable the tutorial to be completed very quickly, the data was simulated to be around 1.3 GB but the same PySpark framework can be easily applied to a much larger data set. The data is hosted on a publicly accessible Azure Blob Storage container and can be downloaded from here. In this tutorial, we import the data directly from the blob storage.
This package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as performing different kinds of analyses such as univariate, bivariate and multivariate investigation which is the first step of any predictive modeling pipeline. This package can be used to get a good sense of any dataset before jumping on to building predictive models.
Deep learning has proven to show superior performance in certain domains such as object recognition and image classification. It has also gained popularity in domains such as finance where time-series data plays an important role. Predictive Maintenance is also a domain where data is collected over time to monitor the state of an asset with the goal of finding patterns to predict failures which can also benefit from certain deep learning algorithms. Among the deep learning methods, Long Short Term Memory LSTM networks are especially appealing to the predictive maintenance domain due to the fact that they are very good at learning from sequences. This fact lends itself to their applications using time series data by making it possible to look back for longer periods of time to detect failure patterns. In this notebook, we build an LSTM network for the data set and scenario described at Predictive Maintenance Template to predict remaining useful life of aircraft engines using the Turbofan Engine Degradation Simulation Data Set. In summary, the template uses simulated aircraft sensor values to predict when an aircraft engine will fail in the future so that maintenance can be planned in advance.We suggest that you use Data Science Virtual Machine for this tutorial which comes with CNTK pre-installed. You can then configure to enable CNTK as Keras back end.
C++ openFrameworks addon with a set of template classes for doing various types of interpolations on data with any number of dimensions. You can feed the system an arbitrary number of data, then resample at any resolution, or ask for the value at any percentage along the data. Input data can be floats (for 1D splines, Vec2f (for 2D splines), Vec3f (for 3D splines), or even matrices, or custom data types (e.g. biped pose). Demo at www.memo.tv/msainterpolator
<img align="right" src="https://raw.github.com/wiki/cloudera/oryx/OryxLogoSmall.png"/>The Oryx open source project provides simple, real-time large-scale machine learning /predictive analytics infrastructure. It implements a few classes of algorithm commonly used in business applications:*collaborative filtering / recommendation*, *classification / regression*, and *clustering*.It can continuously build models from a stream of data at large scale using[Apache Hadoop](http://hadoop.apache.org/).
GoldenOrb is a cloud-based project for massive-scale graph analysis, built upon Apache Hadoop and modeled after Google's Pregel architecture. It provides solutions to complex data problems, remove limits to innovation and contribute to the emerging ecosystem that spans all aspects of big data analysis. It enables users to run analytics on entire data sets instead of samples.
EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include Automatic partitioning, Columnar storage, Standard SQL support, Scales to petabytes, Timeseries and relational data, Fast range scans and lot more.
ITK is an open-source, cross-platform system that provides developers with an extensive suite of software tools for image analysis. Developed through extreme programming methodologies, ITK employs leading-edge algorithms for registering and segmenting multidimensional data.
RDF Data Cube Explorer, to enable access to multidimensional statistical data in ways that allow non-specialists to explore and create specific visualizations of that data. We focus on explorations of health data - in particular aimed at helping to support the formation and analysis of hypotheses about public health intervention strategies and their possible correlation to health-related behavior changes. The Data Cube Explorer was used to formulate and explore the hypothesis that youth tobacco
STINGER is a package designed to support streaming graph analytics by using in-memory parallel computation to accelerate the computation. STINGER is composed of the core data structure and the STINGER server, algorithms, and an RPC server that can be used to run queries and serve visualizations.
AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. It is a highly scalable data management system that can store, index, and manage semi-structured data, but it also supports a full-power query language with the expressiveness of SQL (and more).
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft’s Azure cloud built specifically for doing data science. It has many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. It is available on Windows Server 2016, Windows Server 2012, and on Linux. We offer Linux edition of the DSVM in either Ubuntu 16.04 LTS or on OpenLogic 7.2 CentOS-based Linux distributions. You can try the Data Science VM for free for 30 days (with $200 credits) with a free Azure Trial. The Linux (Ubuntu-based) DSVM also provides a test drive through a button on the product page. The Test Drive will provide full access to you own instance of the VM with just a free Microsoft account (No Azure subscription or CC needed).On this repo, we will feature tools, tips and extensions (see below) to the Data Science VM. We invite the DSVM user community to contribute any useful tools or scripts, extensions you may have written to enhance the user experience on the DSVM.