Duga3 - an extremely fast bittorrent crawler (and tracker) project

Note

I have recently taken a liking to git and will be using gitorious for all future updates. Feel free to fork, contribute, and send a pull request for merge.

About

Дуга-3 / Duga-3 / Arc-3 is based on another project I started called "k2", which was itself based on something I had done a while back. This is therefore the third incarnation, so the name fits the project again (in more than one way). It was finally committed to SVN on June 15, 2010.

This initial code should be good enough to crawl a large number of RSS feeds on torrent sites, parse and store the majority of each torrent's info, and the like. I managed to get 43 sites working initially, and included 7 plugins, mostly as examples. It uses bz2, cURL, DOM, and MySQLi to achieve its level of speed.

An open tracker is included as part of Duga-3, but it isn't integrated into the crawler in any way. This tracker was forked from the original Whitsoft opentracker code almost three years ago, and has since been almost entirely rewritten to make heavy use of MySQLi and FULLTEXT searching. Right now the tracker supports the draft "IPv6" paper from bittorrent.org, and an unofficial extension known as "compact scraping".

Recent developments

I have started a Drizzle port of this, with no plans to actually release it (yet).

Current "state" of the project

As of June 28, 2010, my best guesses are:

Crawler: beta / stable (mostly stable)
Tracker: alpha / beta

I have also done extensive testing on FreeBSD, Linux, and Win32 installs (specifically using MySQL, nginx, and PHP each time). The only lacking feature is symlinking in the crawler (which can be disabled) on versions of Windows older than Vista, because mklink was introduced in Vista.

Get the code

There are no plans to ever make any tarballed / zipped releases

I am using Subversion to store this project, so Subversion is required in order to get the code. It is freely available on a multitude of platforms and is very easy to use. I also wrote some instructions below for new users. Windows users should use Slik SVN for the instructions below, or something other than TortoiseSVN. Everyone else should follow this link for instructions on installing Subversion for any given OS.

Recommended

Get the entire project by running the checkout:

    svn checkout http://duga3.googlecode.com/svn/trunk/ duga3

Since there are usually daily updates, stay up to date by moving your console into the directory you checked out into and running:

    svn update

DIY

Otherwise, if you can handle it yourself, you can also use export to "check out" the entire project without the .svn folders:

    svn export http://duga3.googlecode.com/svn/trunk/ duga3

If you want just the crawler:

    cd /your/web/root/location
    #example search interface
    svn export http://duga3.googlecode.com/svn/trunk/index.php
    #admin interface, can be run from anywhere
    svn export http://duga3.googlecode.com/svn/trunk/admin/index.php admin/index.php
    #the crawler itself, make this forbidden
    svn export http://duga3.googlecode.com/svn/trunk/lib/crawler lib/crawler

...or maybe just the tracker:

    cd /your/web/root/location
    mkdir tracker #optional
    cd tracker
    #client announce file
    svn export http://duga3.googlecode.com/svn/trunk/announce.php
    #client scrape file
    svn export http://duga3.googlecode.com/svn/trunk/scrape.php
    #the "stats" page you could use as an example to make a bnbt style front-end
    svn export http://duga3.googlecode.com/svn/trunk/tracker.php
    #the tracker itself, make this forbidden
    svn export http://duga3.googlecode.com/svn/trunk/lib/opentracker lib/opentracker

Additional info

Final notes

Please take note of the README and TODO files in both lib/crawler/ and lib/opentracker/!

Known "bug" in crawler: It's possible for fullscrape files to not get deleted, so be sure to clean your CACHEDIR manually every once in a while.

Contact

Thank you to everyone who has sent me positive feedback or just a thanks, but I have removed my email from this page due to increasing levels of spam. My username is on the right ("Owners"), I think you can figure out how to send me an email from there ;)
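
For the known crawler "bug" noted under Final notes (fullscrape files left behind in CACHEDIR), the manual cleanup could be scripted along the following lines. This is only a sketch under stated assumptions: the function name clean_cachedir and the 7-day default are made up for illustration, and because the naming of fullscrape files isn't described here, it removes every regular file under the directory that is older than the cutoff. Point it only at a directory that holds nothing but disposable cache files.

```shell
# clean_cachedir DIR [DAYS]
# Hypothetical helper (not part of Duga-3): deletes regular files under DIR
# whose modification time is more than DAYS days in the past.
# DIR is assumed to be the crawler's CACHEDIR; DAYS defaults to 7 (an assumption).
clean_cachedir() {
    dir=$1
    days=${2:-7}

    # Refuse to run against a missing or non-directory path.
    if [ ! -d "$dir" ]; then
        echo "clean_cachedir: not a directory: $dir" >&2
        return 1
    fi

    # -type f   : regular files only, directories are left in place
    # -mtime +N : last modified more than N days ago
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

It would be called with the path configured as CACHEDIR, for example clean_cachedir /your/cachedir/location 7, either by hand or from a scheduled job.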

http://code.google.com/p/duga3
