
Weka-Parallel is a modification of Weka that combines Weka's data mining and machine learning algorithms with parallel processing, so that a number of these algorithms can be run more quickly.




Related Projects

Evaporative-cooling - Evaporative Cooling Feature Selection for Genetic Association Studies

Evaporative cooling (EC) feature selection is a command-line data mining tool, implemented in Java and Fortran, for filtering genetic association data. EC integrates the Random Forests and Relief-F attribute importance measures to balance independent and interaction effects while removing attributes that are irrelevant to the phenotype. EC has been tested on single-nucleotide polymorphism (SNP) data. For those with access to a cluster, the parallel version of EC will be rel
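Relief-F builds on the basic Relief idea: an attribute gains weight when it differs between an instance and its nearest neighbor of the other class (the "nearest miss"), and loses weight when it differs from its nearest neighbor of the same class (the "nearest hit"). The Java sketch below shows only that basic idea. It assumes genotypes coded as small integers (for example 0/1/2), two classes, and a single nearest hit and miss per instance. The class and method names are hypothetical, and this is not EC's implementation. Full Relief-F also averages over k nearest hits and misses and handles more than two classes.

```java
// Hypothetical sketch of basic Relief (k = 1, two classes), not EC's code.
final class ReliefSketch {
    // Number of attributes whose genotype codes differ between two instances.
    static int distance(int[] a, int[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) d++;
        }
        return d;
    }

    /** Returns one weight per attribute; a higher weight suggests more relevance to the class. */
    static double[] weights(int[][] x, int[] y) {
        int n = x.length, p = x[0].length;
        double[] w = new double[p];
        for (int i = 0; i < n; i++) {
            int hit = -1, miss = -1;
            int bestHit = Integer.MAX_VALUE, bestMiss = Integer.MAX_VALUE;
            // Find the nearest same-class (hit) and other-class (miss) neighbors.
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                int d = distance(x[i], x[j]);
                if (y[j] == y[i] && d < bestHit) { bestHit = d; hit = j; }
                if (y[j] != y[i] && d < bestMiss) { bestMiss = d; miss = j; }
            }
            if (hit < 0 || miss < 0) continue; // need one neighbor of each kind
            for (int a = 0; a < p; a++) {
                double diffMiss = x[i][a] != x[miss][a] ? 1.0 : 0.0;
                double diffHit = x[i][a] != x[hit][a] ? 1.0 : 0.0;
                // Reward separating classes, penalize varying within a class.
                w[a] += (diffMiss - diffHit) / n;
            }
        }
        return w;
    }
}
```

In a toy dataset where attribute 0 equals the class and attribute 1 is unrelated to it, attribute 0 receives a positive weight and attribute 1 a negative one.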

En-deep - A framework for batch parallel processing of various NLP tasks.

Given a configuration file (a scenario file) that lists pre-programmed tasks, each corresponding to one Java class, En-deep determines the dependencies among the tasks and processes all of them in the correct order. Parallel processing is possible when multiple threads or program instances use the same scenario file. The program can also process multiple files in the same way, selected with wildcard patterns.
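Processing tasks in an order that respects their dependencies amounts to a topological sort of the dependency graph. Below is a minimal Java sketch using Kahn's algorithm. It assumes the dependencies have already been read from the scenario file into a map from each task name to the names of its prerequisites. The map layout, the class name and the task names are hypothetical and do not reflect En-deep's actual scenario format or code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: dependency ordering with Kahn's algorithm (not En-deep's code).
final class TaskOrder {
    /** Orders tasks so each comes after all of its prerequisites; `deps` maps task -> prerequisites. */
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new HashMap<>();            // prerequisites not yet processed
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges: prerequisite -> tasks
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String pre : e.getValue()) {
                unmet.putIfAbsent(pre, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Tasks with no unmet prerequisites can start right away.
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((task, n) -> { if (n == 0) ready.add(task); });

        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            result.add(task);
            // Finishing a task may release tasks that were waiting on it.
            for (String next : dependents.getOrDefault(task, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != unmet.size()) {
            throw new IllegalArgumentException("dependency cycle among tasks");
        }
        return result;
    }
}
```

With hypothetical tasks `tokenize`, `tag` (needs `tokenize`) and `parse` (needs both), the order is `tokenize`, `tag`, `parse`. Tasks that sit in `ready` at the same time do not depend on each other, so that is the point at which several threads could pick up work concurrently.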

Parf - Parallel Random Forest Algorithm

The Random Forests algorithm is among the most accurate known classification algorithms and can classify large quantities of data. It is also inherently parallelisable. The algorithm was originally written in Fortran 77, which is obsolete and lacks many capabilities of modern programming languages. The original code is also not an example of "clear" programming, which makes it very hard to use in education. Within this p
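Random Forests parallelise naturally because each tree is grown on its own bootstrap sample, independently of the other trees. The Java sketch below illustrates only that property. As a simplification, it grows depth-one trees (stumps) on one randomly chosen feature instead of full trees, and it trains them concurrently on a thread pool before combining them by majority vote. The class and method names are hypothetical, and this is not Parf's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: bagged random stumps trained in parallel (simplified stand-in for full trees).
final class ParallelForestSketch {
    /** Depth-one tree: compares one feature with a threshold (binary labels 0/1). */
    static final class Stump {
        final int feature, left, right;
        final double threshold;
        Stump(int feature, double threshold, int left, int right) {
            this.feature = feature; this.threshold = threshold; this.left = left; this.right = right;
        }
        int predict(double[] row) { return row[feature] <= threshold ? left : right; }
    }

    /** Trains one stump on a bootstrap sample using one randomly chosen feature. */
    static Stump trainStump(double[][] x, int[] y, long seed) {
        Random rnd = new Random(seed);
        int n = x.length;
        int[] sample = new int[n];
        for (int i = 0; i < n; i++) sample[i] = rnd.nextInt(n);  // draw rows with replacement
        int f = rnd.nextInt(x[0].length);                         // random feature
        Stump best = null;
        int bestErr = Integer.MAX_VALUE;
        for (int t : sample) {                                    // try each sampled value as threshold
            double thr = x[t][f];
            int[][] counts = new int[2][2];                       // [side][label]
            for (int i : sample) counts[x[i][f] <= thr ? 0 : 1][y[i]]++;
            int left = counts[0][1] > counts[0][0] ? 1 : 0;       // majority label on each side
            int right = counts[1][1] > counts[1][0] ? 1 : 0;
            int err = counts[0][1 - left] + counts[1][1 - right];
            if (err < bestErr) { bestErr = err; best = new Stump(f, thr, left, right); }
        }
        return best;
    }

    /** Trains `trees` stumps concurrently; no task depends on another. */
    static List<Stump> train(double[][] x, int[] y, int trees, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Stump>> futures = new ArrayList<>();
            for (int k = 0; k < trees; k++) {
                final long seed = k;                              // distinct seed per tree
                futures.add(pool.submit(() -> trainStump(x, y, seed)));
            }
            List<Stump> forest = new ArrayList<>();
            for (Future<Stump> future : futures) forest.add(future.get());
            return forest;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Majority vote over the ensemble. */
    static int predict(List<Stump> forest, double[] row) {
        int votes = 0;
        for (Stump s : forest) votes += s.predict(row);
        return 2 * votes > forest.size() ? 1 : 0;
    }
}
```

Because every tree receives its own seed and bootstrap sample, the trees can be grown in any order or on any number of threads without coordination. Only the final vote combines them.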


An algorithm that concurrently runs Weka's version of Apriori on partitions of a dataset
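Running Apriori on partitions works because an itemset that is frequent in the whole dataset must be frequent, at the same relative minimum support, in at least one partition. The locally frequent itemsets can therefore serve as global candidates, whose exact support is then counted over all partitions. The Java sketch below covers only that counting step. It assumes the candidate itemsets are already known, counts them in each partition concurrently with a parallel stream, and sums the partial counts. The class and method names are hypothetical. This is not Weka's Apriori API, and it is not necessarily how this project merges its results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: concurrent support counting over dataset partitions.
final class PartitionedSupport {
    /** Support count of each candidate itemset within one partition. */
    static Map<Set<String>, Integer> countPartition(List<Set<String>> partition,
                                                    List<Set<String>> candidates) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> transaction : partition) {
            for (Set<String> candidate : candidates) {
                if (transaction.containsAll(candidate)) counts.merge(candidate, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Adds two partial count maps without modifying either input. */
    static Map<Set<String>, Integer> mergeCounts(Map<Set<String>, Integer> a,
                                                 Map<Set<String>, Integer> b) {
        Map<Set<String>, Integer> out = new HashMap<>(a);
        b.forEach((itemset, n) -> out.merge(itemset, n, Integer::sum));
        return out;
    }

    /** Splits into `parts` (>= 1) partitions, counts them concurrently, keeps itemsets with support >= minSupport. */
    static Map<Set<String>, Integer> frequent(List<Set<String>> transactions,
                                             List<Set<String>> candidates,
                                             int parts, int minSupport) {
        List<List<Set<String>>> partitions = new ArrayList<>();
        int size = Math.max(1, (transactions.size() + parts - 1) / parts);
        for (int i = 0; i < transactions.size(); i += size) {
            partitions.add(transactions.subList(i, Math.min(i + size, transactions.size())));
        }
        // Partitions are independent, so they can be counted in parallel and summed afterwards.
        Map<Set<String>, Integer> total = partitions.parallelStream()
                .map(p -> countPartition(p, candidates))
                .reduce(new HashMap<>(), PartitionedSupport::mergeCounts);
        Map<Set<String>, Integer> result = new HashMap<>();
        total.forEach((itemset, n) -> { if (n >= minSupport) result.put(itemset, n); });
        return result;
    }
}
```

`mergeCounts` returns a fresh map instead of mutating its inputs. This keeps the shared identity map passed to `reduce` empty, which parallel reduction requires.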