Business Data - web information retrieval

We are developing an open-source website crawler to retrieve business and marketing data from websites and search engines.

http://bdata.codeplex.com/

Related Projects

Crawler-Web - Web component for Crawler which will use the results of the crawler

simple-crawler - Simple crawler app in Python for a class presentation on crawlers.

crawler - Hacker News crawler and startup news crawler

crawler-commons - crawler-commons (fork of https://code.google.com/p/crawler-commons/)

Crawler - A simple web crawler that includes a crawler, tokenizer, stemmer, and classifier.

webleech - A web crawler framework, with a sample crawler for PCC

Norconex HTTP Collector - A Web Crawler in Java


Norconex HTTP Collector is a web spider, or crawler, that aims to make life easier for Enterprise Search integrators and developers. It is portable, extensible, and reusable, supports robots.txt, can obtain and manipulate document metadata, is resumable upon failure, and a lot more.

Ex-Crawler


Ex-Crawler is divided into 3 subprojects (crawler daemon, distributed GUI client, and web search engine) which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net

Squzer - Distributed Web Crawler


Squzer is Declum's open-source, extensible, scalable, multithreaded web crawler project, written entirely in Python.

NCrawler


Simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. Contains HTML, Text, PDF, and IFilter document processors and language detection (Google). Easy to add pipeline steps to extract, use, and alter information.

Crawler - Simple crawler in Kotlin

Web-Crawler - Python Web Crawler with a max-depth parameter.

Norconex HTTP Collector - Enterprise Web Crawler


Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

ufwc-crawler - Crabfarm crawler for the UFCW

manga-crawler - A Ruby crawler to collect manga

crawler - Crawler for Japanese text classification

Jahia-Crawler - This module integrates the Apache Nutch crawler as an alternative search engine in Jahia.

Web-Crawler - A very simple Web Crawler