
Ex-Crawler is divided into three subprojects (crawler daemon, distributed GUI client, and (web) search engine), which together provide a flexible and powerful search engine with support for distributed computing. More information: http://ex-crawler.sourceforge.net





Related Projects

Regexspider - Generic web crawler based on regular expression matching

RegExSpider is a generic web crawler for collecting data from the internet based on regular expression matching. Config file: please read the wiki page explaining how to use the configuration. The regular expressions need to be declared in that file; read this for special-character usage.
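The configuration format itself is documented on the project wiki rather than here, so purely as a loose illustration of the general idea (regular expressions declared up front, then applied to downloaded pages), a minimal Python sketch could look like the following; the pattern names and the target URL are invented for the example and are not RegExSpider's actual configuration.

    # Rough illustration of regex-driven data collection; the patterns and the
    # target URL are hypothetical, not RegExSpider's configuration format.
    import re
    from urllib.request import urlopen

    # Patterns a user might declare in a config file. Regex special characters
    # such as . ? * + ( ) must be escaped when they are meant literally.
    PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "title": re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL),
    }

    def collect(url):
        page = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        return {name: pattern.findall(page) for name, pattern in PATTERNS.items()}

    print(collect("http://example.com"))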

Arkdl - Simple Java Downloader able to filter links and define storage structure on disk

The aim is to have a simple application that can: browse a list of URLs; filter the links found according to user-defined filters (name, type...); and store the downloads in a user-defined structure (e.g. by site by date, by site by type...). The main objective is simplicity of use. Data storage and configuration must be in XML files only. Think PortableApps. The wiki is hosted here => http://arkdl.xwiki.com/xwiki/bin/view/Main/WebHome
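As a rough sketch of that filter-and-store idea (the filter rule and the folder schemes below are assumptions for illustration, not Arkdl's actual XML configuration):

    # Illustrative sketch of the link filter and storage layout described above;
    # the filter rule and folder scheme are assumptions, not Arkdl's own format.
    import datetime
    import os
    from urllib.parse import urlparse

    def keep(link, extensions=(".pdf", ".zip")):
        """User-defined filter: keep only links of the wanted name/type."""
        return link.lower().endswith(extensions)

    def storage_path(link, root="downloads", scheme="by_site_by_date"):
        """Map a link to a folder such as downloads/<site>/<YYYY-MM-DD>/<file>."""
        site = urlparse(link).netloc
        name = os.path.basename(urlparse(link).path) or "index"
        if scheme == "by_site_by_date":
            return os.path.join(root, site, datetime.date.today().isoformat(), name)
        ext = os.path.splitext(name)[1].lstrip(".") or "other"
        return os.path.join(root, site, ext, name)            # by site, by type

    links = ["http://example.com/docs/report.pdf", "http://example.com/index.html"]
    for link in filter(keep, links):
        print(link, "->", storage_path(link))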

Java-web-spider - A flexible web spider in Java, with regular expression support for URL and data extraction

As the title says: the need is to crawl the internet and collect useful information. In the design we chose not to cache the results, but to extract them immediately so that further operations can be performed on the data, such as storing it into an underlying system. There is much more to tell, but I am short on time at present, so if you have any questions, feel free to mail me :)

OWASP Code Crawler

A Windows Forms application built using .NET (C#). It is a regex-based grepping tool with reporting functionality, testing utilities and other interesting features. Code Crawler is also extensible; it is built upon an XML database with around 290 library patterns.
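As a loose Python sketch of that pattern-database-plus-grep idea (the XML layout and the single sample pattern are invented for illustration; the real tool is a C#/.NET application with its own database of roughly 290 patterns):

    # Loose sketch of grepping source files with patterns loaded from an XML
    # database; the XML layout and sample pattern are invented for illustration.
    import re
    import xml.etree.ElementTree as ET
    from pathlib import Path

    PATTERN_DB = """<patterns>
      <pattern name="hardcoded-password" regex="password\\s*=\\s*&quot;.+&quot;"/>
    </patterns>"""

    def load_patterns(xml_text):
        root = ET.fromstring(xml_text)
        return [(p.get("name"), re.compile(p.get("regex"), re.IGNORECASE))
                for p in root.findall("pattern")]

    def scan(source_dir, patterns):
        for path in Path(source_dir).rglob("*.cs"):
            for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
                for name, regex in patterns:
                    if regex.search(line):
                        yield path, lineno, name, line.strip()

    for hit in scan(".", load_patterns(PATTERN_DB)):
        print(hit)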

webcrawler-ex - Python web crawler with or without HTMLParser targeting specific sites



A crawler is a program that starts with a URL on the web (e.g. http://python.org), fetches the web page corresponding to that URL, and parses all the links on that page into a repository of links. Next, it fetches the content of one of the URLs from the repository just created, parses the links from this new content into the repository, and continues this process for all links in the repository until it is stopped or a given number of links has been fetched.
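Putting that description into code, here is a minimal sketch of such a crawl loop using only the Python standard library; the seed URL comes from the example above, while the page limit and helper names are illustrative, and this is not the webcrawler-ex source itself.

    # Minimal crawl loop: fetch a page, add its links to the repository,
    # then keep fetching from the repository until the page limit is reached.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collects the href value of every <a> tag encountered."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=20):
        repository = deque([seed])   # links waiting to be fetched
        seen = {seed}                # never fetch the same URL twice
        fetched = 0
        while repository and fetched < max_pages:
            url = repository.popleft()
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except Exception:
                continue             # skip pages that fail to download
            fetched += 1
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)       # resolve relative links
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    repository.append(absolute)
        return seen

    if __name__ == "__main__":
        for link in sorted(crawl("http://python.org")):
            print(link)

The seen set doubles as an index of the repository, so the same URL is never queued or fetched twice; that, together with the max_pages limit, is what stops the loop on pages that link back to each other.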