gocrawl - Polite, slim and concurrent web crawler.

  •    Go

gocrawl is a polite, slim and concurrent web crawler written in Go. For a simpler yet more flexible web crawler written in a more idiomatic Go style, you may want to take a look at fetchbot, a package that builds on the experience of gocrawl.
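
A minimal usage sketch of gocrawl's extender pattern, assuming the import paths github.com/PuerkitoBio/gocrawl and github.com/PuerkitoBio/goquery; the seed URL, crawl delay and visit limit below are placeholder values, not recommendations:

package main

import (
	"net/http"
	"time"

	"github.com/PuerkitoBio/gocrawl"
	"github.com/PuerkitoBio/goquery"
)

// ExampleExtender embeds gocrawl's DefaultExtender so only the hooks we
// care about need to be overridden.
type ExampleExtender struct {
	gocrawl.DefaultExtender
}

// Visit runs once per fetched page; returning (nil, true) lets gocrawl
// harvest the outgoing links from the document itself.
func (e *ExampleExtender) Visit(ctx *gocrawl.URLContext, res *http.Response, doc *goquery.Document) (interface{}, bool) {
	// Inspect doc or res here.
	return nil, true
}

func main() {
	opts := gocrawl.NewOptions(new(ExampleExtender))
	opts.CrawlDelay = 1 * time.Second // politeness delay between requests
	opts.MaxVisits = 5                // placeholder limit for the sketch
	c := gocrawl.NewCrawlerWithOptions(opts)
	c.Run("https://example.com/")
}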

fetchbot - A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

  •    Go

Package fetchbot provides a simple and flexible web crawler that follows the robots.txt policies and crawl delays. The package has a single external dependency, robotstxt. It also integrates code from the iq package.
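
A minimal sketch along the lines of fetchbot's short example, assuming the import path github.com/PuerkitoBio/fetchbot; the seed URL is a placeholder, and robots.txt rules and crawl delays are handled by the package itself:

package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/fetchbot"
)

// handler is called for every completed (or failed) request.
func handler(ctx *fetchbot.Context, res *http.Response, err error) {
	if err != nil {
		fmt.Printf("error: %s\n", err)
		return
	}
	fmt.Printf("[%d] %s %s\n", res.StatusCode, ctx.Cmd.Method(), ctx.Cmd.URL())
}

func main() {
	f := fetchbot.New(fetchbot.HandlerFunc(handler))
	queue := f.Start()
	// Enqueue a placeholder seed; fetchbot checks robots.txt before fetching.
	queue.SendStringGet("https://example.com/")
	queue.Close()
}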

robots.js - Parser for robots.txt for node.js

  •    JavaScript

robots.js is a parser for robots.txt files for node.js. Its main class, RobotsParser, provides a set of methods to read, parse and answer questions about a single robots.txt file.
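
robots.js is a JavaScript package; to keep this list's code samples in a single language, here is a rough Go analogue of the same read/parse/answer flow, using the robotstxt package already mentioned under fetchbot (assumed import path github.com/temoto/robotstxt). It is not robots.js's own API, and the URL, path and user-agent are placeholders:

package main

import (
	"fmt"
	"net/http"

	"github.com/temoto/robotstxt"
)

func main() {
	// Read: fetch the robots.txt of a placeholder site.
	res, err := http.Get("https://example.com/robots.txt")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// Parse: build the rule set, taking the HTTP status code into account.
	robots, err := robotstxt.FromResponse(res)
	if err != nil {
		panic(err)
	}

	// Answer questions: may this user-agent fetch this path?
	fmt.Println(robots.TestAgent("/some/path", "MyBot"))
}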

robots-parser - NodeJS robots.txt parser with support for wildcard (*) matching.

  •    JavaScript

NodeJS robots.txt parser. It returns true if crawling the specified URL is allowed for the specified user-agent.
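
Again not the JavaScript package's own API, but a sketch of the same kind of allow check in Go with the robotstxt package named under fetchbot; the rules, path and user-agent are made up for illustration:

package main

import (
	"fmt"

	"github.com/temoto/robotstxt"
)

func main() {
	// A made-up robots.txt body with a disallow rule and a crawl delay.
	robots, err := robotstxt.FromString("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n")
	if err != nil {
		panic(err)
	}

	// Is this path allowed for this user-agent?
	fmt.Println(robots.TestAgent("/private/page.html", "MyBot")) // false
	fmt.Println(robots.TestAgent("/public/page.html", "MyBot"))  // true

	// The matching group also exposes its crawl delay.
	group := robots.FindGroup("MyBot")
	fmt.Println(group.CrawlDelay) // 2s
}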

Robots.txt-Parser-Class - PHP class for parsing robots.txt

  •    PHP

PHP class to parse robots.txt rules according to the Google, Yandex, W3C and The Web Robots Pages specifications. The full list of supported specifications (and what is not supported yet) is available in the project's wiki.

robots-txt - Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

  •    PHP

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers. Please see the CHANGELOG for more information on what has changed recently.
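
This is a PHP package, so as a rough, language-neutral illustration of the three checks it combines (robots.txt, the X-Robots-Tag header and the robots meta tag), here is a simplified sketch in Go that reuses the robotstxt package named under fetchbot; the URL, user-agent and the naive meta-tag string match are illustrative only:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"

	"github.com/temoto/robotstxt"
)

// mayBeCrawled is a simplified sketch of the three checks described above:
// robots.txt rules, the X-Robots-Tag response header, and the robots meta tag.
func mayBeCrawled(pageURL, userAgent string) (bool, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return false, err
	}

	// 1. robots.txt of the page's host.
	res, err := http.Get(u.Scheme + "://" + u.Host + "/robots.txt")
	if err != nil {
		return false, err
	}
	robots, err := robotstxt.FromResponse(res)
	res.Body.Close()
	if err != nil {
		return false, err
	}
	if !robots.TestAgent(u.Path, userAgent) {
		return false, nil
	}

	// 2. X-Robots-Tag header and 3. robots meta tag of the page itself.
	page, err := http.Get(pageURL)
	if err != nil {
		return false, err
	}
	defer page.Body.Close()
	if strings.Contains(strings.ToLower(page.Header.Get("X-Robots-Tag")), "noindex") {
		return false, nil
	}
	body, err := io.ReadAll(page.Body)
	if err != nil {
		return false, err
	}
	// Naive string match; a real implementation would parse the HTML.
	if strings.Contains(strings.ToLower(string(body)), `name="robots" content="noindex"`) {
		return false, nil
	}
	return true, nil
}

func main() {
	ok, err := mayBeCrawled("https://example.com/some/page", "MyBot")
	fmt.Println(ok, err)
}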

robots - Distributed robots.txt parser and rule checker through API access

  •    Java

Distributed robots.txt parser and rule checker through API access. If you are working on a distributed web crawler and want to be polite in your actions, you will find this project very useful. It can also be integrated into any SEO tool to check whether content is being indexed correctly by robots.