Duga3 - an extremely fast BitTorrent crawler (and tracker) project

Note

I have recently taken a liking to git and will be using Gitorious for all future updates. Feel free to fork, contribute, and send a pull request for merge.

About

Дуга-3 / Duga-3 / Arc-3 is based on another project I started called "k2". k2 was in turn based on something else I had done a while back. So this is the third incarnation, hence the name fitting the project again (in more than one way).

The code was finally committed to SVN on June 15, 2010. This initial code should be good enough to crawl a large number of RSS feeds on torrent sites, parse and store the majority of each torrent's info, and the like. I managed to get 43 sites to work initially, and included 7 plugins, mostly for example purposes. It uses bz2, cURL, DOM, and MySQLi to achieve its level of speed.

An open tracker is included as part of Duga-3, but it isn't integrated into the crawler in any way. This tracker was forked from the original Whitsoft opentracker code almost three years ago, and has since been almost entirely rewritten to make heavy use of MySQLi and FULLTEXT searching. Right now the tracker supports the draft "IPv6" paper from bittorrent.org, and an unofficial extension known as "compact scraping".

Recent developments

I have started a Drizzle port of this, with no plans to actually release it (yet).

Current "state" of the project

As of June 28, 2010, my best guesses are:

Crawler: beta / stable (mostly stable)
Tracker: alpha / beta

I have also done extensive testing on FreeBSD, Linux, and Win32 installs (specifically using MySQL, nginx, and PHP each time). The only missing feature is symlinking in the crawler (which can be disabled) on versions of Windows older than Vista, because mklink was introduced in Vista.

Get the code

There are no plans to ever make any tarballed / zipped releases.

I am using Subversion to store this project, so it is required in order to get the code. However, Subversion is freely available on a multitude of platforms and is very easy to use.
I also wrote some instructions below for new users. Windows users should use Slik SVN for the instructions below, or something other than TortoiseSVN. Everyone else should follow this link for instructions on installing Subversion for any given OS.

Recommended

Get the entire project by running the checkout:

    svn checkout http://duga3.googlecode.com/svn/trunk/ duga3

Since there are usually daily updates, stay up to date by moving your console into the directory you checked out into and running:

    svn update

DIY

Otherwise, if you can handle it yourself, you can also use export to "checkout" the entire project without the .svn folders:

    svn export http://duga3.googlecode.com/svn/trunk/ duga3

If you want just the crawler:

    cd /your/web/root/location
    # example search interface
    svn export http://duga3.googlecode.com/svn/trunk/index.php
    # admin interface, can be run from anywhere
    svn export http://duga3.googlecode.com/svn/trunk/admin/index.php admin/index.php
    # the crawler itself, make this forbidden
    svn export http://duga3.googlecode.com/svn/trunk/lib/crawler lib/crawler

...or maybe just the tracker:

    cd /your/web/root/location
    mkdir tracker # optional
    cd tracker
    # client announce file
    svn export http://duga3.googlecode.com/svn/trunk/announce.php
    # client scrape file
    svn export http://duga3.googlecode.com/svn/trunk/scrape.php
    # the "stats" page you could use as an example to make a bnbt style front-end
    svn export http://duga3.googlecode.com/svn/trunk/tracker.php
    # the tracker itself, make this forbidden
    svn export http://duga3.googlecode.com/svn/trunk/lib/opentracker lib/opentracker

Additional info

Final notes

Please take note of the README and TODO files in both lib/crawler/ and lib/opentracker/!

Known "bug" in crawler: it's possible for fullscrape files to not get deleted, so be sure to clean your CACHEDIR manually every once in a while.

Contact

Thank you to everyone who has sent me positive feedback or just a thanks, but I have removed my email from this page due to increasing levels of spam.
My username is on the right ("Owners"), I think you can figure out how to send me an email from there ;)
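
The manual CACHEDIR cleanup mentioned in the final notes could be scripted rather than done by hand. The following is only a minimal sketch and is not part of Duga3: the function name clean_cachedir, the seven-day default, and the assumption that every regular file older than the cutoff is safe to delete are all illustrative. Check lib/crawler/README for how the cache directory is actually used before running anything like it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Duga3: delete regular files in a cache
# directory that were last modified more than a given number of days ago.
# Usage: clean_cachedir /path/to/CACHEDIR [days]
clean_cachedir() {
    dir=$1
    days=${2:-7}   # assumed default age limit; adjust to taste

    # Refuse to run on an empty or missing directory argument so a typo
    # cannot turn into a cleanup of the wrong place.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "clean_cachedir: not a directory: $dir" >&2
        return 1
    fi

    # -type f skips directories and symlinks; -mtime +N matches files
    # last modified more than N full days ago.
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

A function like this could be sourced into a small script and run from a daily scheduled job, passing whatever path the crawler is configured to use as its cache directory.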



http://code.google.com/p/duga3



