NCrawler

A simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. It contains HTML, text, PDF, and IFilter document processors, plus language detection (Google). Pipeline steps to extract, use, and alter information are easy to add.

http://ncrawler.codeplex.com/

Related Projects

goredis-crawler - Cross-platform persistent and distributed web crawler


A cross-platform persistent and distributed web crawler. goredis-crawler is persistent because the queue is stored in a remote database that is automatically re-initialized if interrupted. It is distributed because multiple instances of goredis-crawler work on the same remotely stored queue, so you can start as many crawlers as you want on separate machines to speed up the process. It is also fast because it is threaded and uses connection pools.

TheAllSeeingPie-NCrawler


A Node.js-based web crawler

Crawler-Web - Web component for Crawler which will use the results of the crawler


Web component for Crawler which will use the results of the crawler

Crawler - A simple web crawler that includes a crawler, tokenizer, stemmer, and classifier.


A simple web crawler that includes a crawler, tokenizer, stemmer, and classifier.

Norconex HTTP Collector - A Web Crawler in Java


Norconex HTTP Collector is a web spider, or crawler, that aims to make life easier for Enterprise Search integrators and developers. It is portable, extensible, and reusable; it supports robots.txt, can obtain and manipulate document metadata, is resumable upon failure, and a lot more.



gocrawl - Polite, slim and concurrent web crawler.


gocrawl is a polite, slim and concurrent web crawler written in Go. For a simpler yet more flexible web crawler written in a more idiomatic Go style, you may want to take a look at fetchbot, a package that builds on the experience of gocrawl.

webleech - A web crawler framework, with a sample crawler for PCC (???????)


A web crawler framework, with a sample crawler for PCC (???????)

fess-crawler - Web/FileSystem Crawler Library


Fess Crawler is a crawler framework.

Web-Crawler - Python Web Crawler with a max-depth parameter.


Python Web Crawler with a max-depth parameter.

Web-Crawler - A very simple Web Crawler


A very simple Web Crawler

Search-Engine-Web-Crawler - Search engine, web crawler, and index maker in Java.


Search engine, web crawler, and index maker in Java.

Web-Crawler - A web crawler implemented in Python


A web crawler implemented in Python

Web-crawler - Python's web crawler


Python's web crawler

web-crawler - A web crawler for nodejs


A web crawler for nodejs

web-crawler - A simple web crawler


A simple web crawler

Web-crawler - a simple and basic web crawler


a simple and basic web crawler

Sam-s-web-crawler - a small web crawler written in ruby


a small web crawler written in ruby

simple-web-crawler - simple web crawler in python with a maximum of 10 pages


simple web crawler in python with a maximum of 10 pages