Crawwwler - Open source large scale web crawler

This project is still in its absolute infancy. craWWWler will be a large-scale web crawler written in C++ (no MFC). It currently has a very basic plugin architecture controlled by a purposely thin manager. The manager is designed to act like an ignition switch, occasional pump, and emergency shutdown. It is responsible for allowing one or more plugins to subscribe to the output of other plugins. This way, plugins do not have to pass large amounts of data to each other via the manager class; data is passed only to interested parties.

WARNING: DO NOT let this loose on the web. Test it on a site you've downloaded onto your local machine and don't let it get anywhere near the WWW yet! It is NOT STABLE! We don't want to go around crashing sites because we don't yet know what we're doing! There are a lot of laws and caveats out there we need to be very aware of. The primary purpose of this software is availability of information, so let's keep it both legal and helpful.

P.S. It's an Eclipse managed build project because I'm new to coding on Linux and don't know any better. Anyone have guidance on this?
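The subscription model described above could be sketched in C++ roughly as follows. This is only an illustration under assumptions, not the project's actual code: the names Plugin, Manager, Fetcher, Recorder, registerPlugin, subscribe, publish and consume are all hypothetical. The point it shows is that the manager only wires subscribers to producers; once wired, a producer hands its output directly to the interested plugins and the data never passes through the manager.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plugin interface; none of these names come from the craWWWler source.
class Plugin {
public:
    using Sink = std::function<void(const std::string&)>;

    explicit Plugin(std::string name) : name_(std::move(name)) {}
    virtual ~Plugin() = default;

    const std::string& name() const { return name_; }

    // Called by the manager at wiring time to attach one subscriber.
    void addSubscriber(Sink sink) {
        assert(sink);  // a subscriber must be callable
        sinks_.push_back(std::move(sink));
    }

    // Handle data produced by a plugin this one subscribed to.
    virtual void consume(const std::string& data) = 0;

protected:
    // Deliver output straight to subscribers; the manager is not involved.
    void publish(const std::string& data) {
        for (const auto& sink : sinks_) sink(data);
    }

private:
    std::string name_;
    std::vector<Sink> sinks_;
};

// Deliberately thin: it knows which plugins exist and connects them, nothing more.
class Manager {
public:
    void registerPlugin(Plugin& plugin) { plugins_[plugin.name()] = &plugin; }

    // Subscribe `subscriber` to the output of `producer`.
    // Returns false if either name is unknown.
    bool subscribe(const std::string& subscriber, const std::string& producer) {
        auto sub = plugins_.find(subscriber);
        auto prod = plugins_.find(producer);
        if (sub == plugins_.end() || prod == plugins_.end()) return false;
        Plugin* target = sub->second;
        prod->second->addSubscriber(
            [target](const std::string& data) { target->consume(data); });
        return true;
    }

private:
    std::map<std::string, Plugin*> plugins_;  // non-owning pointers
};

// Stand-in producer: pretends to fetch a URL. No network access happens here.
class Fetcher : public Plugin {
public:
    Fetcher() : Plugin("fetcher") {}
    void consume(const std::string& url) override { publish("page:" + url); }
};

// Stand-in consumer: records whatever it receives.
class Recorder : public Plugin {
public:
    explicit Recorder(std::string name) : Plugin(std::move(name)) {}
    void consume(const std::string& data) override { seen.push_back(data); }
    std::vector<std::string> seen;
};
```

In this sketch, a Recorder registered with the manager but never subscribed to the Fetcher receives nothing, which mirrors the "data is passed only to interested parties" idea. The plugins must outlive the manager and each other's subscriptions, since only raw pointers are stored.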

http://code.google.com/p/crawwwler

Related Projects

raspBerry+


raspBerry+ is a web-based administration platform for BlackBerry Enterprise Server for MS Exchange (BES). You can activate, kill, delete, add, and get the status of users, their handhelds, and services on a per-group basis. It also comes with a small download area and a comment system.

RASP


RASP's A Sneakernet Proxy; download using a thumbdrive.

RasmusDSP


RasmusDSP is an embeddable Audio/MIDI processor. It contains various filters and generators (including a SoundFont 2.0 compatible synthesizer). It has a script interpreter that is used to describe instruments and route Audio/MIDI signals between processor units.

Rasea


An acronym for cRoss-plAtform accesS control for Enterprise Applications. Rasea aims to become a reference in access control as a service based on the RBAC model.

Rascal


Rascal, the Advanced Scientific CALculator, is a platform-independent modular calculator. Based on modules for integers, doubles, strings, vectors, and matrices, it can be easily extended with existing C or C++ code.



Rars


RARS is the Robot Auto Racing Simulation, in which the drivers are robot programs. It is intended as a competition among programmers. It consists of a simulation of the physics of cars, a graphic display of the race, and a robot driver for each car.

RARPlayer


This small program allows you to play a video directly from a RAR file and do so in real-time. Both VLC and MPlayer are supported video players.

RAReXtract


RAReXtract is a Front-End for the UnRAR command line utility for Mac OS X 10.5 (Leopard). Its purpose is the rapid and convenient extraction of RAR archives with a double click.

RAR Expander


Rar Expander is a MacOSX program which extracts the files contained in single or multi-volume RAR archives. It uses the official unRAR library internally so it is fully compatible with archives produced by WinRAR.

rarcrack


This program uses a brute force algorithm to guess your encrypted compressed file's password. If you forget your encrypted file password, this program is the solution. This program can crack zip, 7z, and rar file passwords.

RArcInfo


RArcInfo is a package for R (http://www.r-project.org) to import data from binary Arc/Info V7.X coverages and E00 files. This will allow R users to use it as a primary GIS tool.

rar brute force shell script - rarbrute


This is rarbrute, a shell script to brute force encrypted rar files under Unix and Linux. A long wordlist and a paper about security in internet cafes are included.

Raquel Database System


The system will: 1. use RAQUEL (= Relational Algebra Query, Update and Executive Language) for programming, implementing Third Manifesto principles; 2. have a 'Lego-like' architecture of building blocks and plug-ins, for wider applicability.

RAPv4


RAPv4 is an engine for building web applications from only a business description (in XML format). NEW 04/2006: Stable 2006 release. Adds new functions like mail, SMS, web services, graph, map engine (GIS), Excel output, QBE... and also a beta release of

Rafkill


2D scroller. Clone of Raptor: Call of the Shadows and Tyrian. Fun game written in C++ using Allegro.

rapple


Lightweight XML-based transformation tool written in C that builds upon expat, tidylib and XSLT to transform authored web content (incl. word processor generated HTML) into styled web content suitable for publication.

RapidSMS


RapidSMS is an open-source internet and communications platform

RapidSmith


RapidSmith is a research-based FPGA CAD tool framework written in Java for modern Xilinx FPGAs. Based on XDL, its objective is to serve as a rapid prototyping platform for research ideas and algorithms relating to low level FPGA CAD tools.

Rapidshare Mass Downloader


This program removes the need for human interaction when downloading files from RapidShare (without a premium account). It downloads all the RapidShare links sequentially to the specified location.

rapido visual profiler


rapido is a visual profiler for linux-x86. It traces function calls using the ptrace interface and displays the collected information in a visual flow chart. rapido does not require recompilation of the application.