Data Mining SDK

Data Mining Source Code Project.

Related Projects

Apache Mahout - Scalable machine learning library

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.
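Frequent pattern mining asks which itemsets occur together in at least a minimum number of transactions (the "support" threshold). The sketch below is a brute-force support counter in plain Python, not Mahout's API. It enumerates every itemset up to a small size, which is only practical for toy data; production miners such as FP-Growth exist precisely to avoid that enumeration. The function name `frequent_itemsets` and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets seen in >= min_support transactions.

    Hypothetical brute-force helper: it counts every combination of up to
    max_size distinct items per transaction, then filters by support.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # dedupe and sort so itemsets are canonical tuples
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Illustrative shopping baskets (made-up data)
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
```

Here every single item appears in three baskets and every pair in two, so all six itemsets pass a support threshold of 2.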

Scikit Learn - Machine Learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy. It provides simple and efficient tools for data mining and data analysis, and supports classification, clustering, model selection, preprocessing and a lot more.
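To make "clustering" concrete, the following is a minimal k-means sketch in plain Python, assuming points are tuples of numbers. It alternates the two classic steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is an illustration of the algorithm, not scikit-learn's implementation; in scikit-learn itself the equivalent would be `sklearn.cluster.KMeans(n_clusters=2).fit(X)` followed by reading `labels_`. The function name `kmeans` and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical Lloyd's-algorithm k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Two well-separated groups of made-up 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The first three points end up sharing one label and the last three the other; which group gets label 0 depends on the random initialisation.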

WebSearch.Net

WebSearch.Net is an open-source research platform that provides uniform data source access, data modeling, feature calculation, data mining, etc.

MLlib - Apache Spark's scalable machine learning library

MLlib is a Spark implementation of some common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction and more.

Archivist - Windows application to archive tweets

The Archivist is a Windows application that helps you archive tweets for later data mining and analysis. It can also export and visualize the archived tweets and trends.

LucidDB - RDBMS built entirely for Data Warehousing and Business Intelligence

LucidDB is an RDBMS built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning. Every component of LucidDB was designed with the requirements of flexible, high-performance data integration and sophisticated query processing in mind.
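Bitmap indexing, one of the cornerstones listed above, keeps one bit vector per distinct column value, with bit i set when row i holds that value; a conjunctive filter then becomes a bitwise AND. The sketch below illustrates the general technique with Python integers as bit vectors. It is not LucidDB's implementation (real engines typically store compressed bitmaps on disk), and the helper names and sample columns are hypothetical.

```python
def build_bitmap_index(column):
    """Hypothetical helper: map each distinct value to an int bitmap of its rows."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)  # set bit `row`
    return index


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of row numbers whose bits are set."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Two columns of a made-up table, stored column by column
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open'  ->  AND the two bitmaps
matches = rows_from_bitmap(region_idx["east"] & status_idx["open"])
```

Rows 0, 2 and 5 are "east" and rows 0, 1, 3 and 5 are "open", so the AND leaves rows 0 and 5 without scanning either column again.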

Hadoop Common

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common provides the shared utilities that support the other Hadoop subprojects.
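Applications on Hadoop are classically written in the MapReduce style: a map function emits key/value pairs, the framework groups ("shuffles") them by key across the cluster, and a reduce function combines each group. The sketch below simulates that data flow in a single Python process with the canonical word-count example. It is not Hadoop's Java `Mapper`/`Reducer` API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every whitespace-separated word, lower-cased
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reducer: combine all counts emitted for one word
    return key, sum(values)


lines = ["the quick brown fox", "the lazy dog", "The end"]
intermediate = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())
```

On a real cluster the map calls run in parallel on the nodes holding each input split, and the shuffle moves data over the network; here both happen in memory.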

Ganglia - scalable distributed monitoring system

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
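XDR (External Data Representation, RFC 4506) keeps data compact and portable by fixing the wire layout: integers are 4 bytes big-endian, doubles are 8-byte big-endian IEEE 754, and strings are a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The sketch below encodes a few values by those rules using Python's `struct` module. The record layout (name, value, interval) is hypothetical and is not Ganglia's actual packet format.

```python
import struct


def xdr_uint(value):
    # XDR unsigned int: 4 bytes, big-endian
    return struct.pack(">I", value)


def xdr_double(value):
    # XDR double: 8-byte IEEE 754, big-endian
    return struct.pack(">d", value)


def xdr_string(text):
    # XDR string: 4-byte length, ASCII bytes, then zero padding to a 4-byte boundary
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


# Hypothetical metric record: name, current value, reporting interval in seconds
record = xdr_string("load_one") + xdr_double(0.42) + xdr_uint(15)
```

Because every field lands on a 4-byte boundary in a fixed byte order, a receiver on any architecture can decode the record without negotiation.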

Carrot2 - Search Results Clustering Engine

Carrot2 is an open-source search results clustering engine. It clusters search results from various sources into small collections of related documents. Carrot2 offers ready-to-use components for fetching search results from various sources, including the Yahoo API, Google API, Bing API, eTools Meta Search, Lucene, Solr, Google Desktop and more.

S-Space - A scalable software library for semantic spaces

The S-Space Package is a collection of algorithms for building semantic spaces, as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.
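As a minimal illustration of representing words as feature vectors, the sketch below builds raw co-occurrence counts within a sliding window and compares words by cosine similarity, so words used in similar contexts score close to 1. It is not S-Space's API; real distributional algorithms typically add weighting and dimensionality reduction on top of counts like these. The helper names, window size and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Hypothetical helper: map each word to a Counter of words within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on monday",
]
vecs = cooccurrence_vectors(corpus)
cat_dog = cosine(vecs["cat"], vecs["dog"])
cat_stocks = cosine(vecs["cat"], vecs["stocks"])
```

"cat" and "dog" share the contexts "the", "drinks" and "chases", giving a similarity of 11/12, while "cat" and "stocks" share no context words and score 0.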