  •    DotNet

A simple, very efficient multithreaded web crawler with pipeline-based processing, written in C#. It includes HTML, text, PDF, and IFilter document processors, plus language detection (Google). Pipeline steps for extracting, using, and altering information are easy to add.

sandcrawler - sandcrawler.js - the server-side scraping companion.

  •    Javascript

sandcrawler.js is a Node library that aims to give developers concise but exhaustive tools for scraping the web. Disclaimer: this library is an unreleased work in progress.

RuiJi.Net - crawler framework, distributed crawler extractor

  •    CSharp

This project exists thanks to all the people who contribute. RuiJi.Net is a distributed crawler framework written in .NET Core.