RecommendationEngine - A creative recommendation engine based on Hadoop, powered by an efficient and highly scalable implementation of the item-based collaborative filtering recommendation algorithm


The key component of RecommendationEngine is an efficient and scalable implementation of the item-based collaborative filtering (CF) recommendation algorithm on Hadoop. Item-based CF has become one of the most popular algorithms in recommendation systems. However, it has traditionally run in stand-alone mode, where it is held back by hardware constraints such as limited memory and computing power. Moreover, recommendation systems today are usually required to process large volumes of high-dimensional data, which makes it hard to deliver recommendations quickly. So although algorithms like item-based CF run well in stand-alone mode, they become impractical when the numbers of users and items are huge. This is the scalability problem, and whether it can be solved properly will determine the further development of recommendation systems.
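The scalability argument rests on the fact that item-based CF can be split into independent per-user work followed by a global aggregation, which is exactly the shape of a Hadoop map/reduce job. The sketch below imitates that decomposition in plain Python on a tiny, made-up interaction log: a mapper emits co-occurring item pairs for one user, a reducer sums them, and unseen items are scored by their co-occurrence with items the user already has. The function names (`map_user`, `reduce_pairs`, `recommend`) are hypothetical; this is an illustration of the technique, not RecommendationEngine's actual Hadoop code.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, items):
    """Mapper: for one user's items, emit ((item_a, item_b), 1) for every co-occurring pair."""
    for a, b in combinations(sorted(set(items)), 2):
        # Emit both directions so the co-occurrence matrix is symmetric.
        yield (a, b), 1
        yield (b, a), 1


def reduce_pairs(mapped):
    """Reducer: sum the counts emitted for each item pair."""
    counts = defaultdict(int)
    for pair, count in mapped:
        counts[pair] += count
    return dict(counts)


def recommend(user_items, cooccurrence, top_n=3):
    """Score unseen items by summing their co-occurrence counts with the user's items."""
    scores = defaultdict(int)
    for (a, b), count in cooccurrence.items():
        if a in user_items and b not in user_items:
            scores[b] += count
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical interaction log: user -> items the user interacted with.
    history = {
        "alice": ["a", "b", "c"],
        "bob": ["a", "b"],
        "carol": ["b", "c", "d"],
    }
    # On Hadoop each mapper call could run on a different node; here they are chained.
    mapped = (kv for user, items in history.items() for kv in map_user(user, items))
    cooc = reduce_pairs(mapped)
    print(recommend({"a"}, cooc))  # prints ['b', 'c']
```

Raw co-occurrence counts favour popular items, so a real system would normally normalise them (for example with cosine or Jaccard similarity) before ranking.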



Related Projects

recommendify - ruby/redis based recommendation engine (collaborative filtering)


ger - Good Enough Recommendation (GER) Engine

Providing good recommendations can increase user engagement and provide an opportunity to add value that would otherwise not exist. The main reason many applications don't provide recommendations is the difficulty of either implementing a custom engine or using an existing one. Good Enough Recommendations (GER) is a recommendation engine that is scalable, easy to use and easy to integrate. GER's goal is to generate good enough recommendations for your application or product, so that you can provide value quickly and painlessly.

mahout - Mirror of Apache Mahout

Mahout's goal is to build scalable machine learning libraries. By scalable we mean:

Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow good performance for non-distributed algorithms too.

Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software License.

Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.

Currently Mahout mainly supports four use cases. Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes, for example, text documents and groups them into sets of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.
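Frequent itemset mining, as described above, can be illustrated in its simplest form by counting how often pairs of items appear together in the same group and keeping the pairs that reach a minimum support. The sketch below does only that, for itemsets of size two, on made-up shopping carts; it is a plain-Python illustration of the idea and not Mahout's API. Full algorithms such as Apriori or FP-Growth extend the same counting to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least `min_support` baskets.

    Only pairs (itemsets of size 2) are considered here.
    """
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each unordered pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical shopping-cart contents.
    carts = [
        ["bread", "butter", "milk"],
        ["bread", "butter"],
        ["milk", "eggs"],
        ["bread", "butter", "eggs"],
    ]
    print(frequent_pairs(carts, min_support=2))  # prints {('bread', 'butter'): 3}
```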

recommendable - A recommendation engine using Likes and Dislikes for your Ruby app

Recommendable is a gem that allows you to quickly add a recommendation engine for Likes and Dislikes to your Ruby application using my version of Jaccard similarity and memory-based collaborative filtering. Bundling one of the supported queueing systems (Sidekiq, Resque, DelayedJob or TorqueBox) is highly recommended to avoid having to refresh users' recommendations manually. If you bundle Sidekiq, you should also include 'sidekiq-middleware' in your Gemfile to ensure that a user is not enqueued more than once at a time. If you bundle Resque, include 'resque-loner' for the same purpose. As far as I know, there is currently no way to avoid duplicate jobs in DelayedJob. Queueing with TorqueBox is also supported.
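The description mentions a variant of Jaccard similarity over Likes and Dislikes but does not give the formula. One plausible variant, assumed here purely for illustration and not taken from the gem's source, counts agreements (both users liked, or both disliked, an item) positively and disagreements negatively, then divides by the size of the union of everything either user rated. The function name `like_dislike_similarity` is hypothetical.

```python
def like_dislike_similarity(likes_a, dislikes_a, likes_b, dislikes_b):
    """Jaccard-style similarity between two users' Likes/Dislikes sets.

    Assumed variant (not necessarily Recommendable's exact formula):
    (agreements - disagreements) / |all items either user rated|.
    If no user both likes and dislikes the same item, the result is in [-1, 1].
    """
    agreements = len(likes_a & likes_b) + len(dislikes_a & dislikes_b)
    disagreements = len(likes_a & dislikes_b) + len(dislikes_a & likes_b)
    rated = likes_a | dislikes_a | likes_b | dislikes_b
    if not rated:
        return 0.0  # No ratings at all: treat the users as unrelated.
    return (agreements - disagreements) / len(rated)


if __name__ == "__main__":
    # Alice likes items 1 and 2 and dislikes 3; Bob likes 1 and dislikes 2 and 3.
    print(like_dislike_similarity({1, 2}, {3}, {1}, {2, 3}))  # 2 agreements, 1 disagreement, 3 items
```

Users whose similarity is high would then be the "neighbours" whose Likes feed the memory-based collaborative filtering step.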

mortar-recsys - A customizable recommendation engine for Hadoop and Pig by Mortar Data.

A customizable recommendation engine for Hadoop and Pig by Mortar Data. This project contains several complete, runnable examples of the Mortar recommendation engine on example data, as well as a template project for easily getting started with your own data.


The recommendation systems engine for C#. The engine is a library of already tested algorithms, including collaborative filtering. We will try to add new algorithms to the library.

Jumper - Collaborative search engine in PHP

Jumper 2.0 is a collaborative community search platform that revolutionizes search by crowdsourcing knowledge management powered by a shared bookmarking engine. It is easily and quickly deployed into a community of practice that benefits users with complex and specialized search requirements. Jumper delivers universal search of any databases, flat files, fileshares, content systems, web pages, blogs and wikis, even people - through one simple search box.

Recommendation Engine Demo

How do Amazon's recommendations work? This demo visualizes the item-to-item collaborative filtering mechanism using an item-to-item matrix table. The item-to-item matrix, the vectors and the calculated data values are displayed. There are n different items and...
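The item-to-item matrix described above can be sketched as follows, assuming each item is represented by a vector of per-customer purchases (1 if the customer bought it, 0 otherwise) and each matrix cell holds the cosine similarity of two item vectors. Cosine similarity is an assumption, since the demo's exact measure is not stated, and the purchase data are made up.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def item_to_item_matrix(purchases):
    """Build the item-to-item similarity matrix.

    `purchases[c][i]` is 1 if customer c bought item i, else 0,
    so column i is the vector for item i.
    """
    n_items = len(purchases[0])
    columns = [[row[i] for row in purchases] for i in range(n_items)]
    return [[cosine(columns[i], columns[j]) for j in range(n_items)]
            for i in range(n_items)]


if __name__ == "__main__":
    # Rows are customers, columns are items A, B and C.
    purchases = [
        [1, 1, 0],
        [1, 1, 1],
        [0, 0, 1],
    ]
    for row in item_to_item_matrix(purchases):
        print(["%.2f" % x for x in row])
```

Here A and B were always bought together, so their similarity is 1.00, while C shares one buyer with each and scores 0.50; a customer who bought A would therefore be shown B first.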

spark-movie-lens - An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

This Apache Spark tutorial guides you step by step through using the MovieLens dataset to build a movie recommender with collaborative filtering, using Spark's Alternating Least Squares (ALS) implementation. It is organised in two parts. The first is about getting and parsing movie and rating data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system. This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX, which has also been publicly available at Spark Summit since 2014. Starting from there, I made minor modifications to use a larger dataset, then added code to store and reload the model for later use, and finally a web service using Flask.
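To show what alternating least squares does, here is a deliberately tiny, single-machine sketch with rank 1 (one latent factor per user and per movie) in plain Python. It is not Spark's implementation; the ratings, the regularisation weight `lam` and the iteration count are illustrative assumptions. With a single factor each half-step has a closed form: holding the movie factors v fixed, the user factor that minimises the regularised squared error over the movies user i rated is u_i = Σ_j r_ij v_j / (Σ_j v_j² + λ), and the movie factors are updated symmetrically.

```python
def als_rank1(ratings, n_users, n_items, lam=0.1, iterations=20):
    """Rank-1 alternating least squares on observed (user, item, rating) triples.

    Alternately solves the regularised least-squares problem for all user
    factors (item factors fixed), then for all item factors (user factors fixed).
    `lam` must be positive so users or items without ratings do not divide by zero.
    """
    users = [1.0] * n_users
    items = [1.0] * n_items
    for _ in range(iterations):
        # Fix item factors; solve for each user factor in closed form.
        num = [0.0] * n_users
        den = [lam] * n_users
        for u, i, r in ratings:
            num[u] += r * items[i]
            den[u] += items[i] ** 2
        users = [n / d for n, d in zip(num, den)]
        # Fix user factors; solve for each item factor in closed form.
        num = [0.0] * n_items
        den = [lam] * n_items
        for u, i, r in ratings:
            num[i] += r * users[u]
            den[i] += users[u] ** 2
        items = [n / d for n, d in zip(num, den)]
    return users, items


def predict(users, items, u, i):
    """The predicted rating is the product of the user and item factors."""
    return users[u] * items[i]


if __name__ == "__main__":
    # Made-up ratings; user 1 has not rated movie 2.
    ratings = [(0, 0, 4.0), (0, 1, 2.0), (0, 2, 3.0), (1, 0, 8.0), (1, 1, 4.0)]
    users, items = als_rank1(ratings, n_users=2, n_items=3, lam=0.01, iterations=50)
    print(round(predict(users, items, 1, 2), 2))
```

Because user 1 consistently rates movies twice as high as user 0, the model predicts roughly 6 for the unseen pair. Spark's ALS applies the same alternation with higher-rank factors, solving the per-user and per-item subproblems in parallel across the cluster.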

Seeks - An Open Decentralized Platform for Collaborative Search, Filtering and Content Curation

Seeks acts as a personalizing Web server or proxy between you and your data feeds. Connect most search engines, RSS/ATOM feeds, Twitter / Identica, Youtube / Dailymotion, Wikis, and basically any source of data, and Seeks will produce a fused personalized batch / stream of results to your queries. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience on these results.

Kylin - Extreme OLAP Engine for Big Data

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets; it was originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for more than 10 billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

SpamAssassin - Intelligent Spam Filter

SpamAssassin is a mature, widely-deployed open source project that serves as a mail filter to identify Spam. SpamAssassin uses a variety of mechanisms including header and text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases. SpamAssassin runs on a server, and filters spam before it reaches your mailbox.


DBLens is an Oracle-based toolkit for performing collaborative filtering. DBLens is a collection of PL/SQL code and accompanying shell scripts that provides a flexible and efficient method for performing collaborative filtering.

Apache Mahout - Scalable machine learning library

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

recommendable - A recommendation engine using Likes and Dislikes for your Ruby/Redis application.



Recommend is an open source recommendation engine written in PHP5. The goal of the project is to build a generic recommendation engine accessed through high-level APIs. It includes user tracking, social networks, weighted objects and heuristic validation.

Conjecture - Scalable Machine Learning in Scalding

Conjecture is a framework for building machine learning models in Hadoop using the Scalding DSL. The goal of this project is to enable the development of statistical models as viable components in a wide range of product settings. Applications include classification and categorization, recommender systems, ranking, filtering, and regression (predicting real-valued numbers). Conjecture has been designed with a primary emphasis on flexibility and can handle a wide variety of inputs. Integration with Hadoop and Scalding enables seamless handling of extremely large data volumes and integration with established ETL processes. Predicted labels can either be consumed directly by the web stack using the dataset loader, or models can be deployed and consumed by live web code. Currently, binary classification (assigning one of two possible labels to input data points) is the most mature component of the Conjecture package. There are a few stages involved in training a machine learning model using Conjecture.

Pinot - A realtime distributed OLAP datastore

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

oozie - Oozie - workflow engine for Hadoop
