colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

crawler - A high performance web crawler in Elixir.

  •    Elixir

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ. The project's README includes a very high-level architecture diagram showing how Crawler works.

freshonions-torscraper - Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi

  •    Python

This is a copy of the source for the http://zlal32teyptf4tvi.onion hidden service, which implements a Tor hidden service crawler/spider and web site. This software is made available under the GNU Affero GPL 3 license. What this means is that if you deploy this software as part of networked software that is available to the public, you must make the source code (including any modifications) available.

node-tarantula - web crawler/spider for nodejs

  •    Javascript

A Node.js crawler/spider that provides a simple interface for crawling the Web. Its API was inspired by crawler4j.

robotstxt - robots.txt file parsing and checking for R

  •    R

Provides functions to download and parse ‘robots.txt’ files. Ultimately the package makes it easy to check whether bots (spiders, crawlers, scrapers, …) are allowed to access specific resources on a domain.
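
The package itself targets R, but the check it automates is easy to illustrate in a language-neutral way. As a sketch of the same allow/deny lookup, here is Python's standard urllib.robotparser (this is not the R package's API, just the underlying idea):

    # The same robots.txt check via Python's standard library,
    # shown only to illustrate what the robotstxt package automates.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()  # download and parse the robots.txt file

    # True if the given user agent may fetch the URL
    print(rp.can_fetch("MyBot", "https://example.com/private/page.html"))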

node-krawler - Fast and lightweight web crawler with built-in cheerio, xml and json parser.

  •    Javascript

mikeal/request is used for fetching web pages, so any option supported by that package can be passed to Krawler's constructor. After Krawler emits the 'data' event, it automatically continues to the next URL, regardless of whether the result was processed. If you would like full control over result handling, you can turn on the custom callback option and drive the program flow by invoking your callback. Be sure to call it in every case, otherwise the queue will get stuck.

email-extractor - extract emails from an entire website

  •    Javascript

Extracts email addresses from all pages of a given website.

not-your-average-web-crawler - A web crawler (for bug hunting) that gathers more than you can imagine

  •    Python

N.Y.A.W.C is a Python library that enables you to test your payloads against all requests of a given domain. It crawls all requests (e.g. GET, POST or PUT) in the specified scope and keeps track of the request and response data. During the crawling process, callbacks enable you to insert your payload at specific places and test whether it worked. N.Y.A.W.C requires Python 2.7/3.3 or higher.
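
As a rough sketch of the callback-driven flow described above (class and module names follow the project's documented examples, but treat them as assumptions rather than a definitive reference):

    # Minimal N.Y.A.W.C sketch; names are based on the project's
    # documented callback-driven design and may not match the
    # current API exactly.
    from nyawc.Options import Options
    from nyawc.Crawler import Crawler
    from nyawc.CrawlerActions import CrawlerActions
    from nyawc.http.Request import Request

    def cb_request_after_finish(queue, queue_item, new_queue_items):
        # Inspect the finished request/response pair here, e.g. to
        # check whether an injected payload was reflected.
        print("Crawled:", queue_item.request.url)
        return CrawlerActions.DO_CONTINUE_CRAWLING

    options = Options()
    options.callbacks.request_after_finish = cb_request_after_finish

    crawler = Crawler(options)
    crawler.start_with(Request("https://example.com/"))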

crawlerr - A simple and fully customizable web crawler/spider for Node

  •    Javascript

crawlerr is a simple yet powerful web crawler for Node.js, based on Promises. It lets you crawl only specific URLs, matched by wildcards, uses a Bloom filter for caching, and offers a browser-like feel. You create a Crawlerr instance for a specific website with custom options; all routes are resolved against that base.