
Crawl website with custom URIs and grab content





Related Projects


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

Norconex HTTP Collector - Enterprise Web Crawler

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

Heritrix: Internet Archive Web Crawler

The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.

Hubot - A Chat Bot for your Company

Hubot is a chat bot, modeled after GitHub's Campfire bot. It is extendable with community scripts and your own custom scripts, and can work on many different chat services.


An open source .NET web crawler written in C# using SQL Server 2005/2008. Arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing, and storing Internet content, including e-mail addresses, files, hyperlinks, images, and Web pages.


A bot for MegaUpload that downloads files one after another.

CrawlTrack - Tracks crawler visits

CrawlTrack is a web analytics tool. It can track visitors, referrers, keywords used, country of origin, page views, and more. Its main advantage is that it can track and record crawler activity. It also blocks attacks by hackers.

Gigablast - Web and Enterprise search engine in C++

Gigablast is one of the four remaining search engines in the United States that maintain their own searchable index of over a billion pages. It is scalable to thousands of servers and has scaled to over 12 billion web pages on over 200 servers. It supports distributed web crawling, document conversion, automated data corruption detection and repair, clustering of results from the same site, synonym search, spell checking, and more.

Love Letter IRC Bot

This IRC bot will administrate the card game "Love Letter" published by Alderac Entertainment. Configure it with your IRC server, the channel you intend to play on, tell it who its administrators are, and choose your own name for the bot. Comes with a help file with bot commands.

Open Search Server

Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. It is built using the best open source technologies, such as Lucene, zkoss, Tomcat, POI, and TagSoup. Open Search Server is a stable, high-performance piece of software.