GrabberX

GrabberX is a site-mirroring tool. It handles websites protected by forms or cookies, JavaScript-generated links, and so on. The goal is not performance, but a handy tool that helps other enterprise search engines crawl such sites.

http://grabberx.codeplex.com/

Related Projects

grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  •    Python

grab-site is an easy, preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses wpull for crawling. It provides a dashboard with all of your crawls, showing which URLs are being grabbed, how many URLs are left in the queue, and more.

FAST ESP Web Parts for SharePoint Server 2007

This project provides a set of installable Web parts for integrating FAST ESP search capabilities with SharePoint Server 2007. With these Web parts, SharePoint administrators can quickly build ESP-based search sites in SharePoint Server 2007 by simply dropping in and configuring...

Wildcard Search Web Part for SharePoint 2010

The Wildcard Search web part for MOSS 2007 was wildly successful. Although SharePoint 2010 has built-in wildcard searching functionality, the out-of-the-box web part requires the user to add an asterisk to the search query. This web part resolves that issue.

SharePoint Search Service Tool

The SharePoint Search Service Tool is a rich web service client that allows a developer to explore the scopes and managed properties of a given SharePoint Search SSP, build queries in either Keyword or SQL Syntax, submit those queries and examine the raw web service results. ...

Heritrix

  •    Java

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix is designed to respect the robots.txt exclusion directives and META robots tags, and collect material at a measured, adaptive pace unlikely to disrupt normal website activity.


Add web part page quick search extension for SharePoint Server 2007

This small solution provides a quick search feature on the Add Web Part page in SharePoint Server 2007. You can now find the web part you need quickly and easily: type only the first letters of the web part's name, without scrolling through a long page.

SharePoint Search XSL Samples

This project is a place to share examples of XSL that can be applied to SharePoint search web parts. Products include SharePoint Server 2010, Microsoft Office SharePoint Server 2007, Microsoft Search Server 2008, and Microsoft Search Server 2008 Express.

Data Extracting SDK

Data Extracting SDK can help you extract information from web resources in a simple way.

Norconex HTTP Collector - A Web Crawler in Java

  •    Java

Norconex HTTP Collector is a web spider, or crawler, that aims to make life easier for Enterprise Search integrators and developers. It is portable, extensible, and reusable, supports robots.txt, can obtain and manipulate document metadata, is resumable upon failure, and a lot more.

SharePoint 2007 Wildcard Search

A Microsoft Office SharePoint Server Search web part that allows for WildCard Searches and a second web part for the presentation of the search data using an XSL Transform document.

Nutch - Highly extensible, highly scalable Web crawler

  •    Java

Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

FAST Search for Sharepoint MOSS 2010 Query Tool

A tool to query FAST for SharePoint and SharePoint 2010 Enterprise Search. It uses the search web services to run your queries, so you can test them remotely from your local machine. It shows your results, allows you to refine your query (FAST), and pages your results.

SharePoint Column Filtered Search Web Part

SharePoint Column Filtered search provides a filtered view of a SharePoint full-text search. Results are filtered by column values selected at runtime. The web part is configured for one or more libraries and associated columns. The user selects column values for results to ma...

.Net helpers for the SharePoint Server 2007 Search Query Web Service.

This project was created for an MSDN article. The code and article demonstrate a number of helper classes that can be used to easily inject queries to the SharePoint Server 2007 Search Query Web Service.

Gigablast - Web and Enterprise search engine in C++

  •    C++

Gigablast is one of the remaining four search engines in the United States that maintain their own searchable index of over a billion pages. It is scalable to thousands of servers and has scaled to over 12 billion web pages on over 200 servers. It supports a distributed web crawler, document conversion, automated data corruption detection and repair, clustering of results from the same site, synonym search, a spell checker, and a lot more.

Open Search Server

  •    Java

Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. It is built using the best open source technologies, such as Lucene, zkoss, Tomcat, POI, and TagSoup. Open Search Server is a stable, high-performance piece of software.

Business Data - web information retrivial

We are trying to develop an open source website crawler to retrieve business and marketing data from websites or search engines.

SharePoint Search Bench

  •    DotNet

SharePoint Search Bench contains a desktop app for testing and executing searches against a Microsoft Office SharePoint Server (MOSS) environment and a .NET class library API that lets developers execute searches uniformly across both the Search web service and the object model.

Norconex HTTP Collector - Enterprise Web Crawler

  •    Java

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

SharePoint XSL Templates

This project is a place to share useful XSL templates that can be reused in SharePoint Content Query Web Parts (CQWPs), Data View Web Parts (DVWPs), and other XSL-based Web Parts.