RecommendationEngine - A creative recommendation engine based on Hadoop, powered by an efficient and highly scalable implementation of the item-based collaborative filtering recommendation algorithm


The core of RecommendationEngine is an efficient and scalable Hadoop-based implementation of the item-based collaborative filtering (CF) recommendation algorithm. Item-based CF has become one of the most popular algorithms in recommendation systems. However, it has traditionally been run in stand-alone mode, where it can be hindered by hardware constraints such as limited memory and computing power. In addition, recommendation systems are now usually required to process large volumes of high-dimensional information, which makes it challenging to provide recommendations quickly. So although excellent algorithms such as item-based CF run well in stand-alone mode, they become impractical with a huge number of users and items. This is the scalability problem, and whether it can be solved properly determines the further development of recommendation systems.
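For readers unfamiliar with item-based CF, the idea can be illustrated with a minimal single-machine sketch. This is not the project's Hadoop code: the toy ratings, the function names (`item_similarities`, `recommend`) and the choice of cosine similarity are all assumptions made for illustration. In a MapReduce setting, the per-user loop would roughly correspond to a map phase and the per-pair aggregation to a reduce phase.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not taken from the project.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0},
    "carol": {"B": 5.0, "C": 1.0},
}


def item_similarities(ratings):
    """Cosine similarity between every pair of items rated by a common user."""
    dot = defaultdict(float)   # (item_i, item_j) -> sum of rating products
    norm = defaultdict(float)  # item -> sum of squared ratings
    for user_ratings in ratings.values():
        items = sorted(user_ratings)
        for i in items:
            norm[i] += user_ratings[i] ** 2
        for idx, i in enumerate(items):
            for j in items[idx + 1:]:
                dot[(i, j)] += user_ratings[i] * user_ratings[j]
    sims = defaultdict(dict)
    for (i, j), d in dot.items():
        s = d / (sqrt(norm[i]) * sqrt(norm[j]))
        sims[i][j] = s
        sims[j][i] = s
    return sims


def recommend(user, ratings, sims, top_n=3):
    """Predict ratings for unseen items as a similarity-weighted average."""
    seen = ratings[user]
    scores = defaultdict(float)
    weights = defaultdict(float)
    for item, rating in seen.items():
        for other, s in sims.get(item, {}).items():
            if other in seen:
                continue  # only score items the user has not rated yet
            scores[other] += s * rating
            weights[other] += s
    ranked = [(i, scores[i] / weights[i]) for i in scores if weights[i] > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    similarities = item_similarities(RATINGS)
    print(recommend("bob", RATINGS, similarities))
```

The pairwise loop over each user's items is the part that grows quickly with the number of users and items, which is why a distributed implementation becomes attractive.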

https://github.com/tinylcy/RecommendationEngine
http://maven.apache.org


   




Related Projects

recommendify - ruby/redis based recommendation engine (collaborative filtering)


A Ruby/Redis-based recommendation engine that uses collaborative filtering.

mahout - Mirror of Apache Mahout


Mahout's goal is to build scalable machine learning libraries. By scalable we mean: Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms. Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license. Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more. Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes, for example, text documents and groups them into sets of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.

recommendable - A recommendation engine using Likes and Dislikes for your Ruby app


Recommendable is a gem that allows you to quickly add a recommendation engine for Likes and Dislikes to your Ruby application using my version of Jaccardian similarity and memory-based collaborative filtering. Bundling a queueing system (Sidekiq, Resque, DelayedJob or Torquebox) is highly recommended to avoid having to manually refresh users' recommendations. If you bundle Sidekiq, you should also include 'sidekiq-middleware' in your Gemfile to ensure that a user will not get enqueued more than once at a time. If you bundle Resque, you should include 'resque-loner' for the same purpose. As far as I know, there is currently no way to avoid duplicate jobs in DelayedJob. Queueing for Torquebox is also supported.
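As a rough illustration of the similarity measure mentioned above, here is a minimal sketch of the plain Jaccard index over sets of liked items. It is not Recommendable's own variant (which the author describes as "my version") and does not use the gem's API; the `LIKES` data and the function names are hypothetical.

```python
def jaccard(a, b):
    """Plain Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets: define similarity as 0 to avoid 0/0
    return len(a & b) / len(union)


# Hypothetical data: user -> set of liked item ids.
LIKES = {
    "ann": {"film1", "film2", "film3"},
    "ben": {"film2", "film3", "film4"},
    "cat": {"film5"},
}


def most_similar_users(user, likes):
    """Rank the other users by Jaccard similarity of their liked sets."""
    others = [(other, jaccard(likes[user], items))
              for other, items in likes.items() if other != user]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(most_similar_users("ann", LIKES))
```

A memory-based recommender would then suggest items liked by the highest-ranked users that the target user has not yet rated.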

Beyond.Thoth


A recommendation systems engine for C#. The engine is a library of already tested algorithms, including collaborative filtering. We will try to add new algorithms to the library.

Jumper - Collaborative search engine in PHP


Jumper 2.0 is a collaborative community search platform that revolutionizes search by crowdsourcing knowledge management powered by a shared bookmarking engine. It is easily and quickly deployed into a community of practice that benefits users with complex and specialized search requirements. Jumper delivers universal search of any databases, flat files, fileshares, content systems, web pages, blogs and wikis, even people - through one simple search box.


Recommendation Engine Demo


How does Amazon's recommendation system work? This demo visualizes the item-to-item collaborative filtering mechanism using an item-to-item matrix table. The item-to-item matrix, the vectors and the calculated data values are displayed. There are n different items and...
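To give a sense of what an item-to-item matrix looks like, the following sketch builds one from co-occurrence counts. The demo's actual computation is not described here, so the use of raw co-occurrence counts, the `BASKETS` data and the function name `cooccurrence_matrix` are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical purchase histories, one list of items per customer.
BASKETS = [
    ["book", "lamp"],
    ["book", "pen", "lamp"],
    ["pen", "lamp"],
]


def cooccurrence_matrix(baskets):
    """Return (items, matrix) where matrix[r][c] counts the baskets
    that contain both items[r] and items[c]."""
    items = sorted({item for basket in baskets for item in basket})
    index = {item: k for k, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for basket in baskets:
        # set() guards against an item listed twice in one basket
        for a, b in combinations(sorted(set(basket)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return items, matrix


if __name__ == "__main__":
    names, table = cooccurrence_matrix(BASKETS)
    print("      " + " ".join(f"{n:>5}" for n in names))
    for name, row in zip(names, table):
        print(f"{name:>5} " + " ".join(f"{v:>5}" for v in row))
```

Each row of the matrix is the vector for one item; items whose vectors have large entries in the same columns tend to be bought together.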

Seeks - An Open Decentralized Platform for Collaborative Search, Filtering and Content Curation


Seeks acts as a personalizing Web server or proxy between you and your data feeds. Connect most search engines, RSS/ATOM feeds, Twitter / Identica, Youtube / Dailymotion, Wikis, and basically any source of data, and Seeks will produce a fused personalized batch / stream of results to your queries. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience on these results.

Kylin - Extreme OLAP Engine for Big Data


Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for 10+ billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

SpamAssassin - Intelligent Spam Filter


SpamAssassin is a mature, widely-deployed open source project that serves as a mail filter to identify Spam. SpamAssassin uses a variety of mechanisms including header and text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases. SpamAssassin runs on a server, and filters spam before it reaches your mailbox.

DBLens


DBLens is an Oracle-based toolkit for performing collaborative filtering. DBLens is a collection of PL/SQL code and accompanying shell scripts that provides a flexible and efficient method for performing collaborative filtering.

Apache Mahout - Scalable machine learning library


Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

recommendable - A recommendation engine using Likes and Dislikes for your Ruby/Redis application.


A recommendation engine using Likes and Dislikes for your Ruby/Redis application.

Recommend


Recommend is an open source recommendation engine written in PHP5. The goal of the project is to build a generic recommendation engine interfaced via high-level APIs. Includes user tracking, social networks, weighted objects, heuristical validation.

Conjecture - Scalable Machine Learning in Scalding


Conjecture is a framework for building machine learning models in Hadoop using the Scalding DSL. The goal of this project is to enable the development of statistical models as viable components in a wide range of product settings. Applications include classification and categorization, recommender systems, ranking, filtering, and regression (predicting real-valued numbers). Conjecture has been designed with a primary emphasis on flexibility and can handle a wide variety of inputs. Integration with Hadoop and Scalding enables seamless handling of extremely large data volumes and integration with established ETL processes. Predicted labels can either be consumed directly by the web stack using the dataset loader, or models can be deployed and consumed by live web code. Currently, binary classification (assigning one of two possible labels to input data points) is the most mature component of the Conjecture package. There are a few stages involved in training a machine learning model using Conjecture.

Pinot - A realtime distributed OLAP datastore


Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

oozie - Oozie - workflow engine for Hadoop


Oozie is a workflow engine for Hadoop.

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine


DSSTNE (pronounced "Destiny") is an open source software library for training and deploying recommendation models with sparse inputs, fully connected hidden layers, and sparse outputs. Models with weight matrices that are too large for a single GPU can still be trained on a single host. DSSTNE has been used at Amazon to generate personalized product recommendations for our customers at Amazon's scale.

Dust - A Polymorphic Engine for Filtering-Resistant Transport Protocols


A Polymorphic Engine for Filtering-Resistant Transport Protocols

Katta - Lucene and more in the cloud.


Katta is a scalable, failure-tolerant, distributed data store for real-time access. Katta serves large, replicated indices as shards to handle high loads and very large data sets. These indices can be of different types. Currently, implementations are available for Lucene and Hadoop mapfiles.