Business Data - web information retrieval


We are developing an open-source web crawler that retrieves business and marketing data from websites and search engines.

http://bdata.codeplex.com/
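The project page does not spell out an API, but a minimal sketch of the kind of retrieval involved could look like the following: fetch a page over HTTP and scan it for contact e-mail addresses, one simple example of business data. The URL and the regular expression are illustrative only, not part of the project.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// fetchEmails downloads a page and scans it for e-mail addresses,
// one simple example of the business data such a crawler might collect.
func fetchEmails(url string) ([]string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	// Naive pattern; real extraction would need more robust parsing.
	re := regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	return re.FindAllString(string(body), -1), nil
}

func main() {
	emails, err := fetchEmails("https://example.com") // placeholder URL
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(emails)
}
```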

Related Projects

goredis-crawler - Cross-platform persistent and distributed web crawler


A cross-platform persistent and distributed web crawler. goredis-crawler is persistent because the queue is stored in a remote database that is automatically re-initialized if interrupted, and distributed because multiple instances work on the remotely stored queue, so you can start as many crawlers as you want on separate machines to speed the process along. It is also fast because it is threaded and uses connection pools.
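As a rough illustration of the persistent, distributed queue described above, here is a minimal Go sketch using the go-redis client. The key name crawl:queue and the timeout are made up; goredis-crawler's actual schema and API will differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// The queue lives in Redis, so any number of crawler processes on any
// machine can push discovered URLs and pop work, and the queue survives
// a crash of any single crawler.
func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Seed the shared queue (any instance may do this).
	if err := rdb.LPush(ctx, "crawl:queue", "https://example.com").Err(); err != nil {
		panic(err)
	}

	for {
		// BRPOP blocks until a URL is available, so idle workers cost nothing.
		res, err := rdb.BRPop(ctx, 5*time.Second, "crawl:queue").Result()
		if errors.Is(err, redis.Nil) {
			fmt.Println("queue drained, exiting")
			return
		}
		if err != nil {
			panic(err)
		}
		url := res[1] // res[0] is the key name
		fmt.Println("crawling", url)
		// ... fetch url, push newly discovered links back with LPush ...
	}
}
```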

Crawler-Web - Web component for Crawler which will use the results of the crawler

simple-crawler - A simple crawler app in Python for a class presentation on crawlers.

crawler - Hacker News crawler & Startup News crawler

crawler-commons - crawler-commons (fork of https://code.google.com/p/crawler-commons/)



Crawler - A simple web crawler that includes a crawler, tokenizer, stemmer, and classifier.

fess-crawler - Web/FileSystem Crawler Library


Fess Crawler is a crawler framework for crawling the web and file systems.

webleech - A web crawler framework, with a sample crawler for PCC

Norconex HTTP Collector - A Web Crawler in Java


Norconex HTTP Collector is a web spider, or crawler, that aims to make life easier for Enterprise Search integrators and developers. It is portable, extensible, and reusable; supports robots.txt; can obtain and manipulate document metadata; is resumable upon failure; and more.
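Of the features listed, robots.txt support is the easiest to illustrate. The following is a deliberately oversimplified Go sketch (not Norconex code, which is Java): it reads only the User-agent: * group and treats every Disallow value as a plain path prefix, ignoring Allow rules and wildcards.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// disallowedPrefixes fetches /robots.txt and returns the Disallow rules
// in the "User-agent: *" group. Deliberately oversimplified.
func disallowedPrefixes(site string) ([]string, error) {
	resp, err := http.Get(site + "/robots.txt")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var rules []string
	inStarGroup := false
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(strings.ToLower(line), "user-agent:"):
			agent := strings.TrimSpace(line[len("user-agent:"):])
			inStarGroup = agent == "*"
		case inStarGroup && strings.HasPrefix(strings.ToLower(line), "disallow:"):
			if path := strings.TrimSpace(line[len("disallow:"):]); path != "" {
				rules = append(rules, path)
			}
		}
	}
	return rules, sc.Err()
}

// allowed reports whether a path escapes every Disallow prefix.
func allowed(path string, rules []string) bool {
	for _, r := range rules {
		if strings.HasPrefix(path, r) {
			return false
		}
	}
	return true
}

func main() {
	rules, err := disallowedPrefixes("https://example.com")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("/private allowed?", allowed("/private", rules))
}
```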

gocrawl - Polite, slim and concurrent web crawler.


gocrawl is a polite, slim and concurrent web crawler written in Go. For a simpler yet more flexible web crawler written in a more idiomatic Go style, you may want to take a look at fetchbot, a package that builds on the experience of gocrawl.
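The "polite" part means never hammering a single host. gocrawl's real API is not shown here; this standalone Go sketch only demonstrates the underlying scheduling idea, with a made-up two-second per-host delay.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
	"time"
)

// Each host gets at most one request per crawlDelay; different hosts
// can be fetched concurrently without waiting on one another.
const crawlDelay = 2 * time.Second

var (
	mu      sync.Mutex
	lastHit = map[string]time.Time{} // host -> scheduled time of last request
)

// waitTurn blocks until the URL's host may be contacted again.
func waitTurn(rawURL string) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return
	}
	mu.Lock()
	wait := crawlDelay - time.Since(lastHit[u.Host])
	if wait < 0 {
		wait = 0
	}
	lastHit[u.Host] = time.Now().Add(wait) // reserve the next slot
	mu.Unlock()
	time.Sleep(wait)
}

func main() {
	urls := []string{
		"https://example.com/a",
		"https://example.com/b", // same host: delayed 2s after /a
		"https://example.org/c", // different host: fetched immediately
	}
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			waitTurn(u)
			fmt.Println(time.Now().Format("15:04:05"), "fetching", u)
		}(u)
	}
	wg.Wait()
}
```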

Ex-Crawler


Ex-Crawler is divided into three subprojects (crawler daemon, distributed GUI client, and (web) search engine), which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net

Squzer - Distributed Web Crawler


Squzer is Declum's open-source, extensible, scalable, multithreaded, quality web crawler project, written entirely in Python.

NCrawler


A simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. It contains HTML, text, PDF, and IFilter document processors and language detection (Google), and it is easy to add pipeline steps to extract, use, and alter information.
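NCrawler itself is C#, but the pipeline idea translates directly: a document flows through an ordered list of steps, and extending the crawler means appending a step. The step names below are illustrative, not NCrawler's.

```go
package main

import (
	"fmt"
	"strings"
)

// Document flows through an ordered list of pipeline steps, each of
// which may enrich or transform it.
type Document struct {
	URL  string
	Text string
	Meta map[string]string
}

type Step interface {
	Process(d *Document)
}

// StripHTMLStep is a toy stand-in for a real HTML processor.
type StripHTMLStep struct{}

func (StripHTMLStep) Process(d *Document) {
	d.Text = strings.ReplaceAll(d.Text, "<b>", "")
	d.Text = strings.ReplaceAll(d.Text, "</b>", "")
}

// LangDetectStep is a toy stand-in for a language-detection processor.
type LangDetectStep struct{}

func (LangDetectStep) Process(d *Document) {
	d.Meta["lang"] = "en" // a real step would call a detector here
}

func run(d *Document, steps []Step) {
	for _, s := range steps {
		s.Process(d) // adding a step is just appending to the slice
	}
}

func main() {
	doc := &Document{URL: "https://example.com", Text: "<b>hello</b>", Meta: map[string]string{}}
	run(doc, []Step{StripHTMLStep{}, LangDetectStep{}})
	fmt.Printf("%+v\n", *doc)
}
```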

Crawler - Simple crawler in Kotlin

Web-Crawler - Python web crawler with a max-depth parameter.
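That project is Python, but the max-depth idea is language-agnostic: a breadth-first crawl that records each page's depth and refuses to expand pages at the limit. In this Go sketch the link graph is faked so the example runs without network access.

```go
package main

import "fmt"

// crawl visits pages breadth-first, refusing to follow links past
// maxDepth. links is a stand-in for real fetching and link extraction.
func crawl(start string, maxDepth int, links func(string) []string) []string {
	type item struct {
		url   string
		depth int
	}
	seen := map[string]bool{start: true}
	queue := []item{{start, 0}}
	var visited []string

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		visited = append(visited, cur.url)
		if cur.depth == maxDepth {
			continue // depth cap: record the page but do not expand it
		}
		for _, next := range links(cur.url) {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return visited
}

func main() {
	// A tiny fake link graph instead of real HTTP fetching.
	graph := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/c"},
	}
	links := func(u string) []string { return graph[u] }
	fmt.Println(crawl("/", 1, links)) // [/ /a /b]; /c is beyond depth 1
}
```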

Norconex HTTP Collector - Enterprise Web Crawler


Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

ufwc-crawler - Crabfarm crawler for the UFCW

manga-crawler - A Ruby crawler to collect manga

crawler - Crawler for Japanese text classification