•        0

GrabberX is a site-mirroring tool. It is used to deal with form/cookie sealed websites, javascript generated links, and so on. The goal is not performance, but a handy tool that can help the crawl of other enterprise search engines.



comments powered by Disqus

Related Projects


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

jQuery Library for SharePoint Web Services

This is a jQuery library which abstracts SharePoint's Web Services and makes them easier to use. It also includes functions which use the various Web Service operations to provide more useful (and cool) capabilities. It works entirely client side and requires no server install.

Norconex HTTP Collector - Enterprise Web Crawler

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repositoriy of your choice (e.g. a search engine). It very flexible, powerful, easy to extend, and portable.


Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Gigablast - Web and Enterprise search engine in C++

Gigablast is one of the remaining four search engines in the United States that maintains its own searchable index of over a billion pages. It is scalable to thousands of servers. Has scaled to over 12 billion web pages on over 200 servers. It supports Distributed web crawler, Document conversion, Automated data corruption detection and repair, Can cluster results from same site, Synonym search, Spell checker and lot more.

Searchdaimon - Enterprise Search

Searchdaimon is an open source search engine for corporate data and websites. It comes with a powerful administrator interface and can index websites and several common enterprise systems like SharePoint, Exchange, SQL databases, Windows file shares etc. It also supports many data sources (e.g., Word, PDF, Excel) and the possibility of faceted search, attribute navigation and collection sorting.


WebSearch.Net is an open-source research platform that provides uniform data source access, data modeling, feature calculation, data mining, etc.

Open Search Server

Open Search Server is both a modern crawler and search engine and a suite of high-powered full text search algorithms. Built using the best open source technologies like lucene, zkoss, tomcat, poi, tagsoup. Open Search Server is a stable, high-performance piece of software.

Carrot2 - Search Results Clustering Engine

Carrot2 is an Open Source Search Results Clustering Engine. It could cluster the search results from various sources and generates small collection of documents. Carrot2 offers ready-to-use components for fetching search results from various sources including YahooAPI, GoogleAPI, Bing API, eTools Meta Search, Lucene, SOLR, Google Desktop and more.

IndexTank - Search Engine powers Reddit

IndexTank search engine powers search in Reddit, Social bookmarking site. IndexTank is acquired by LinkedIn and released the project as open source. It includes features like Variables boosts, Facets, Faceted search, Snippeting, Custom scoring functions, Suggest, and Autocomplete.