Carrot2 - Search Results Clustering Engine
Carrot2 is an Open Source Search Results Clustering Engine. It can cluster search results from various sources and organize them into small collections of related documents. Carrot2 offers ready-to-use components for fetching search results from various sources, including the Yahoo API, Google API, Bing API, eTools Meta Search, Lucene, Solr, Google Desktop and more.
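To give a rough idea of what clustering search results means, the following Java sketch groups result titles under single terms that several of them share. It is a deliberately naive illustration, not Carrot2's algorithm or API. The class name `NaiveTermClustering`, the tiny stop-word list and the `minSize` threshold are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Naive search-results clustering sketch (hypothetical, NOT Carrot2's algorithm). */
public class NaiveTermClustering {

    // Hypothetical minimal stop-word list so that filler words do not become cluster labels.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "for", "in", "to", "with");

    /** Returns label -> titles for every term that occurs in at least minSize titles. */
    public static Map<String, List<String>> cluster(List<String> titles, int minSize) {
        Map<String, List<String>> byTerm = new TreeMap<>(); // labels kept in alphabetical order
        for (String title : titles) {
            // Collect each distinct, non-stop-word term of the title once.
            Set<String> terms = new LinkedHashSet<>();
            for (String token : title.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                    terms.add(token);
                }
            }
            // A title may land in several clusters (clusters can overlap).
            for (String term : terms) {
                byTerm.computeIfAbsent(term, k -> new ArrayList<>()).add(title);
            }
        }
        // Drop terms shared by too few titles to form a useful cluster.
        byTerm.values().removeIf(docs -> docs.size() < minSize);
        return byTerm;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
                "Java tutorial for beginners",
                "Learn Java programming",
                "Island of Java travel guide",
                "Coffee brewing guide");
        cluster(titles, 2).forEach((label, docs) -> System.out.println(label + " -> " + docs));
    }
}
```

Running `main` prints a `guide` cluster with two titles and a `java` cluster with three. The `java` cluster mixes the programming language with the island, which is one reason more sophisticated algorithms, such as the Lingo algorithm shipped with Carrot2, build cluster labels from frequent phrases rather than single words.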

Carrot2 is implemented in Java. It also has a native C# API implementation, which does not require a Java runtime and performs comparably to the Java version. A REST interface is supported as well, so Carrot2 can be called from languages such as PHP and Ruby.
If you have search instances running on multiple nodes and a search has to be performed across those nodes, you need a way to combine, filter and sort their results. Carrot2 helps to do this job efficiently, and it is well suited to work with Lucene, Solr and Nutch.
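The combine, filter and sort step described above can be sketched in plain Java as follows. This is a hypothetical illustration that does not use Carrot2: the `Hit` record, the `merge` method and the `minScore` cutoff are assumptions made for the example, and records require Java 16 or later. It also assumes that all nodes return comparable scores, which is often not true for raw relevance scores computed on different indexes.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical merger for hit lists returned by several search nodes. */
public class ResultMerger {

    /** One hit returned by a single search node (hypothetical type). */
    public record Hit(String url, String title, double score) {}

    /** Deduplicates by URL, drops hits below minScore and sorts by descending score. */
    public static List<Hit> merge(List<List<Hit>> perNode, double minScore) {
        // Combine: the same URL may come back from several nodes; keep the higher-scoring copy.
        Map<String, Hit> best = new HashMap<>();
        for (List<Hit> hits : perNode) {
            for (Hit hit : hits) {
                best.merge(hit.url(), hit, (a, b) -> a.score() >= b.score() ? a : b);
            }
        }
        // Filter weak hits, then sort the survivors, best first.
        return best.values().stream()
                .filter(hit -> hit.score() >= minScore)
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> nodeA = List.of(
                new Hit("http://example.com/1", "Result 1", 0.9),
                new Hit("http://example.com/2", "Result 2", 0.4));
        List<Hit> nodeB = List.of(
                new Hit("http://example.com/1", "Result 1", 0.7),
                new Hit("http://example.com/3", "Result 3", 0.6));
        // Prints the distinct hits scoring at least 0.5, highest score first.
        merge(List.of(nodeA, nodeB), 0.5).forEach(System.out::println);
    }
}
```

Keeping the highest-scoring duplicate is only one possible policy; a real deployment might instead normalize scores per node before merging.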
Carrot2 could even be called a meta search engine: it has built-in functionality to fetch results from all popular search engines and combine them. It also offers supporting tools, such as a command-line application and a GUI application, for experimenting with the product. Search plug-ins for Firefox and Internet Explorer are also available.
Demo: http://search.carrot2.org/stable/search
Related Products
Ganglia - scalable distributed monitoring system
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
Hadoop Common
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common provides the shared utilities that support the other Hadoop subprojects.
jCharts - Java based charting utility
jCharts is a 100% Java-based charting utility that outputs a variety of charts. Servlets, JSPs and Swing applications can use this library to generate charts. It can generate charts of type Area, Area Stacked, Bar, Bar Clustered, Bar Clustered Horizontal, Bar Horizontal, Bar Stacked, Bar Stacked Horizontal, Combo, Hi/Low Open/Close, Line, Pie 2D, Pie 3D, Point, Radar, XY Plot and many more.
Cascading - Data Processing Workflows on Hadoop
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.
Solr
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
H2 Database
H2 is a very fast, open source database engine. It supports the SQL and JDBC standards.
membase - distributed key-value database
Membase is a distributed key-value database management system optimized for storing data behind interactive web applications. These applications must serve many concurrent users, creating, storing, retrieving, aggregating, manipulating and presenting data in real time. To meet these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput.
Quercus - Java implementation of PHP
Quercus is Caucho Technology's 100% Java implementation of PHP 5. Quercus comes with many PHP modules and extensions like PDF, PDO, MySQL, and JSON. Quercus allows for tight integration of Java services with PHP scripts, so using PHP with JMS or Grails is a quick and painless endeavor. With Quercus, PHP applications automatically take advantage of Java application server features such as connection pooling and clustered sessions.
Epylog - a Syslog parser
Epylog is a syslog parser which runs periodically, looks at your logs, processes some of the entries in order to present them in a more comprehensible format, and then mails you the output. It is written specifically for large network clusters where a lot of machines (around 50 and upwards) log to the same loghost using syslog or syslog-ng.
ejabberd - Jabber/XMPP instant messaging server
ejabberd is a Jabber/XMPP instant messaging server: software that lets two or more people communicate and collaborate in real time using typed text. It is cross-platform, fault-tolerant, clusterable and modular. Client communication can be encrypted. It supports IPv6 and offers a web-based administration interface, a command-line tool and much more.