Displaying 1 to 11 from 11 results

colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider.With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

newspaper - 💡 News, full-text, and article metadata extraction in Python 3. Advanced docs:

  •    Python

Newspaper has seamless language extraction and detection. If no language is specified, Newspaper will attempt to auto detect a language. Check out The Documentation for full and detailed guides using newspaper.

ferret - Declarative web scraping

  •    Go

ferret is a web scraping system aiming to simplify data extraction from the web for such things like UI testing, machine learning and analytics. Having its own declarative language, ferret abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself. It's extremely portable, extensible and fast. The following example demonstrates the use of dynamic pages. First of all, we load the main Google Search page, type search criteria into an input box and then click a search button. The click action triggers a redirect, so we wait till its end. Once the page gets loaded, we iterate over all elements in search results and assign the output to a variable. The final for loop filters out empty elements that might be because of inaccurate use of selectors.

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.




Lulu - [Unmaintained] A simple and clean video/music/image downloader 👾

  •    Python

Sorry for this. Lulu is a friendly you-get fork (⏬ Dumb downloader that scrapes the web).

pomp - Screen scraping and web crawling framework

  •    Python

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency. If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.

gopa - [WIP] GOPA, a spider written in Golang, for Elasticsearch

  •    Go

GOPA, A Spider Written in Go. First of all, get it, two opinions: download the pre-built package or compile it yourself.

proxifier - A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data

  •    Go

A fast, modern and intelligent proxy rotator perfect for crawling and scraping public data. Proxifier act as a proxy and remotely send and receive requests and responses from other proxies. Firstly, just download and install proxifier.


antch - Antch, a fast, powerful and extensible web crawling & scraping framework for Go

  •    Go

Antch, inspired by Scrapy. If you're familiar with scrapy, you can quickly get started. Antch is a fast, powerful and extensible web crawling & scraping framework for Go, used to crawl websites and extract structured data from their pages.

go-scrapy - Web crawling and scraping framework for Golang

  •    Go

go-scrapy is a very useful and productive web crawlign framework, used to crawl websites and extract structured data from parsed pages. Please go through examples to get an idea how to use this package.