Displaying 20 to 25 from 25 results

supercrawler - A web crawler

  •    Javascript

Supercrawler is a Node.js web crawler. It is designed to be highly configurable and easy to use. When Supercrawler successfully crawls a page (which could be an image, a text document or any other file), it will fire your custom content-type handlers. Define your own custom handlers to parse pages, save data and do anything else you need.

krawler - A web crawling framework written in Kotlin

  •    Kotlin

Krawler is a web crawling framework written in Kotlin. It is heavily inspired by crawler4j by Yasser Ganjisaffar. The project is still very new, and those looking for a mature, well tested crawler framework should likely still use crawler4j. For those who can tolerate a bit of turbulence, Krawler should serve as a replacement for crawler4j with minimal modifications to existing applications. Using the Krawler framework is fairly simple. Minimally, there are two methods that must be overridden in order to use the framework. Overriding the shouldVisit method dictates what should be visited by the crawler, and the visit method dictates what happens once the page is visited. Overriding these two methods is sufficient for creating your own crawler, however there are additional methods that can be overridden to privde more robust behavior.

antch - Antch, a fast, powerful and extensible web crawling & scraping framework for Go

  •    Go

Antch, inspired by Scrapy. If you're familiar with scrapy, you can quickly get started. Antch is a fast, powerful and extensible web crawling & scraping framework for Go, used to crawl websites and extract structured data from their pages.

frequent - A utility for crawling websites and building frequency lists of words

  •    Python

frequent is a utility for crawling websites and building word frequency list. Mainly made because I wanted to be able to find top n most common words on different websites, but I imagine there might be more useful applications. Or not.

crawlerr - A simple and fully customizable web crawler/spider for Node

  •    Javascript

crawlerr is simple, yet powerful web crawler for Node.js, based on Promises. This tool allows you to crawl specific urls only based on wildcards. It uses Bloom filter for caching. A browser-like feeling. Creates a new Crawlerr instance for a specific website with custom options. All routes will be resolved to base.