Squzer - Distributed Web Crawler

Squzer is Declum's open-source, extensible, scalable, multithreaded, high-quality web crawler project, written entirely in Python.

http://squzer.codeplex.com/
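
The project page itself includes no code samples, but the core of a multithreaded crawler of this kind is a thread-safe URL queue shared by a pool of worker threads. Below is a minimal sketch in Python; all names are illustrative, and this is not Squzer's actual code.

    # Minimal multithreaded crawler sketch (illustrative; not Squzer's code).
    import threading
    import queue
    import urllib.request
    from urllib.parse import urljoin
    from html.parser import HTMLParser

    class LinkParser(HTMLParser):
        """Collect href values from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    MAX_PAGES = 100          # stop discovering new URLs after this many
    todo = queue.Queue()     # thread-safe frontier shared by all workers
    seen = set()
    seen_lock = threading.Lock()

    def worker():
        while True:
            url = todo.get()
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    parser = LinkParser()
                    parser.feed(resp.read().decode("utf-8", errors="replace"))
                for href in parser.links:
                    link = urljoin(url, href)
                    with seen_lock:  # the queue is thread-safe; the set is not
                        if link not in seen and len(seen) < MAX_PAGES:
                            seen.add(link)
                            todo.put(link)
            except Exception:
                pass  # a real crawler would log and retry here
            finally:
                todo.task_done()

    start = "https://example.com/"
    seen.add(start)
    todo.put(start)
    for _ in range(8):  # eight crawl threads share the queue
        threading.Thread(target=worker, daemon=True).start()
    todo.join()  # returns once every queued URL has been processed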

Related Projects

Open Search Server


Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. Built using the best open-source technologies, such as Lucene, ZK, Tomcat, POI, and TagSoup, Open Search Server is a stable, high-performance piece of software.

Gigablast - Web and Enterprise search engine in C++


Gigablast is one of the remaining four search engines in the United States that maintain their own searchable index of over a billion pages. It is scalable to thousands of servers and has scaled to over 12 billion web pages on over 200 servers. It supports distributed web crawling, document conversion, automated data-corruption detection and repair, clustering of results from the same site, synonym search, spell checking, and more.

Norconex HTTP Collector - Enterprise Web Crawler


Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

Norconex HTTP Collector - A Web Crawler in Java


Norconex HTTP Collector is a web spider, or crawler, that aims to make the lives of Enterprise Search integrators and developers easier. It is portable, extensible, and reusable, supports robots.txt, can obtain and manipulate document metadata, is resumable upon failure, and more.
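
As a point of comparison for the robots.txt support mentioned above: Norconex itself is Java, so this is not its API, but in Python the standard library's urllib.robotparser performs the same check a polite crawler makes before each fetch.

    # Checking robots.txt before fetching, using Python's standard library.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser("https://example.com/robots.txt")
    robots.read()  # fetch and parse the robots.txt file

    url = "https://example.com/private/page.html"
    if robots.can_fetch("MyCrawler/1.0", url):
        print("allowed to fetch", url)
    else:
        print("disallowed by robots.txt:", url)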

Grub


Grub Next Generation is a distributed web crawling system (clients/servers) that helps build and maintain an index of the Web. It uses a client-server architecture in which clients crawl the web and update the server. The peer-to-peer grubclient software crawls during computer idle time.
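
The client side of such a loop is straightforward to sketch. In the Python sketch below, the coordinator address and the /checkout and /submit endpoints are hypothetical, serving only to illustrate how clients crawl and report back to the server.

    # Sketch of the client half of a Grub-style client/server crawl.
    # The server address and endpoints are hypothetical.
    import json
    import urllib.request

    SERVER = "http://coordinator.example.com"

    def checkout_urls():
        """Ask the central server for a batch of URLs to crawl."""
        with urllib.request.urlopen(SERVER + "/checkout") as resp:
            return json.load(resp)  # e.g. ["http://a.example/", ...]

    def submit_result(url, body):
        """Report a crawled page back to the server."""
        payload = json.dumps({"url": url, "body": body}).encode("utf-8")
        req = urllib.request.Request(
            SERVER + "/submit", data=payload,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # data present, so this is a POST

    for url in checkout_urls():
        with urllib.request.urlopen(url, timeout=10) as resp:
            submit_result(url, resp.read().decode("utf-8", errors="replace"))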

Archon


Archon is a web-based distributed system controller used to build and manage Linux boxes for Declum. © Declum 2013

mnoGoSearch


mnoGoSearch for UNIX consists of a command-line indexer and a search program that can be run under the Apache Web Server or any other HTTP server supporting the CGI interface. mnoGoSearch for Unix is distributed in source form and can be compiled against a number of databases, depending on the user's choice. It is known to work on a wide variety of modern Unix operating systems, including Linux, FreeBSD, Mac OS X, Solaris, and others.

Ex-Crawler


Ex-Crawler is divided into three subprojects (crawler daemon, distributed GUI client, and (web) search engine), which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net

weibosearch - A distributed Sina Weibo Search spider based on Scrapy and Redis.


A distributed Sina Weibo Search spider based on Scrapy and Redis.
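
The Scrapy-plus-Redis combination typically means the crawl frontier lives in Redis so that spider processes on different machines can share it. Here is that pattern in miniature using the redis-py client; the key names are illustrative and not taken from weibosearch.

    # A shared crawl frontier in Redis: any worker on any machine can
    # enqueue and dequeue. Sketch using the redis-py client.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def enqueue(url):
        """Add a URL once; the Redis set acts as a distributed dupe filter."""
        if r.sadd("crawler:seen", url):  # returns 1 only on first insertion
            r.rpush("crawler:todo", url)

    def crawl_forever(fetch):
        """Worker loop: block until a URL is available, then fetch it.
        `fetch` is any callable returning the links found on a page."""
        while True:
            _, url = r.blpop("crawler:todo")  # blocking pop from shared queue
            for link in fetch(url.decode("utf-8")):
                enqueue(link)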

Nutch


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

ASPseek


ASPseek is Internet search engine software developed by SWsoft. ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do Boolean searches. Search results can be limited to a given time period, site, or Web space (set of sites) and sorted by relevance (PageRank is used) or date.

Fast File Search


Fast File Search is a crawler of FTP servers and SMB shares (Windows shares and UNIX systems running Samba). A WWW interface is provided for searching files. FFS is similar to FemFind but optimized for speed.
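
Crawling an FTP server amounts to a recursive directory walk. A sketch with Python's standard ftplib follows; the hostname is a placeholder, and this is not Fast File Search's own code.

    # Walking an FTP server's tree, the traversal an FTP crawler performs.
    from ftplib import FTP, error_perm

    def walk_ftp(ftp, path="/"):
        """Recursively yield file paths; an entry is treated as a
        directory if the server lets us cwd into it."""
        ftp.cwd(path)
        for name in ftp.nlst():  # evaluated once, so later cwd calls are safe
            if name in (".", ".."):
                continue
            child = path.rstrip("/") + "/" + name
            try:
                ftp.cwd(child)        # succeeds only for directories
                yield from walk_ftp(ftp, child)
            except error_perm:
                yield child           # a plain file: record it in the index

    ftp = FTP("ftp.example.com")      # placeholder host
    ftp.login()                       # anonymous login
    for filename in walk_ftp(ftp):
        print(filename)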

cpmtools - Tools for accessing CP/M file systems


Tools for accessing CP/M file systems

Yioop - Open Source Search Engine Software


Yioop is an open-source PHP search engine capable of crawling, indexing, and providing search results for hundreds of millions of pages on relatively low-end hardware. It can index a variety of text formats (HTML, RSS, PDF, RTF, DOC) and images (GIF, JPEG, PNG, etc.). It can import data from ARC, WARC, MediaWiki, and Open Directory RDF. It is easily localized to many languages. It has built-in support for news feeds, discussion groups, blogs, and wikis. It also supports mixing indexes to create mash-ups.

ragno - Web spider/crawler written in ruby ('ragno' is Italian for 'spider')


Web spider/crawler written in ruby ('ragno' is Italian for 'spider')

grub.org - Distributed Internet Crawler


Grub is a distributed internet crawler/indexer designed to run on multi-platform systems, interfacing with a central server/database.

spider - Web Spider / Crawler written in Go with an API to manage the legs.


Web Spider / Crawler written in Go with an API to manage the legs.

humble-spider - A general document crawler/spider written in Ruby


A general document crawler/spider written in Ruby

Node-Web-Spider - A web spider (crawler) that follows all the links within a domain


A web spider (crawler) that follows all the links within a domain
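
The defining check in such a spider is comparing each resolved link's host against the start page's host. The original project is Node.js; the same idea sketched in Python:

    # Keep a link only when its host matches the start page's host.
    from urllib.parse import urlparse, urljoin

    def same_domain_links(page_url, hrefs):
        """Resolve hrefs against page_url and keep those on the same host."""
        host = urlparse(page_url).netloc
        for href in hrefs:
            absolute = urljoin(page_url, href)
            if urlparse(absolute).netloc == host:
                yield absolute

    print(list(same_domain_links(
        "http://example.com/a/",
        ["b.html", "http://example.com/c", "http://other.org/d"])))
    # -> ['http://example.com/a/b.html', 'http://example.com/c']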

WWW-Crawler-Lite - A single-threaded crawler/spider for the web.


A single-threaded crawler/spider for the web.