Harvest Web Indexing


Harvest is a web indexing package. Originally designed for distributed indexing, it can form a powerful system for indexing both large and small web sites. It now also includes Harvest-NG, a highly efficient, modular, Perl-based web crawler.





Related Projects

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

Educrawler - crawls web content for education



WebHarvest mirror (https://sourceforge.net/projects/web-harvest), with some code modifications of my own, such as a complete port to HttpComponents 4.2.3. I would be glad if the WebHarvest maintainers wanted to merge my branch and take care of my additions.