colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

spider - Unsurprising JavaScript - No longer active

  •    Javascript

The Next-Gen Programming Language for the Web. Note: This project is no longer active.

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.




toapi - Every web site provides APIs.

  •    Python

Toapi gives you the ability to make every website provide APIs. Version v2.0.0 is a complete rewrite.

ruia - Async Python 3.6+ web scraping micro-framework based on asyncio.

  •    Python

Ruia is an async web scraping micro-framework, written with asyncio and aiohttp, that aims to make crawling URLs as convenient as possible.

Gigablast - Web and Enterprise search engine in C++

  •    C++

Gigablast is one of the remaining four search engines in the United States that maintain their own searchable index of over a billion pages. It is scalable to thousands of servers and has scaled to over 12 billion web pages on over 200 servers. It supports distributed web crawling, document conversion, automated data corruption detection and repair, clustering of results from the same site, synonym search, spell checking, and a lot more.


gopa-abandoned - GOPA, a spider written in Go

  •    Go

[狗爬], a spider written in Go. It is safe to press Ctrl+C to stop a running Gopa: Gopa will handle the rest by saving a checkpoint, so you can restore the job later. The world is still in your hands.
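
The checkpoint-on-interrupt behaviour described here can be sketched generically. The following is a minimal Python illustration of the idea, not Gopa's actual code (Gopa is written in Go). The names save_checkpoint, load_checkpoint, and crawl, the fetch_links callback, and the JSON file layout are all hypothetical.

```python
import json
import os
import tempfile
from collections import deque


def save_checkpoint(path, pending, seen):
    """Atomically write the crawl state (hypothetical layout) to a JSON file."""
    state = {"pending": list(pending), "seen": sorted(seen)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # Rename over the target so a half-written file never replaces a good one.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (pending, seen) from a checkpoint, or an empty state if none exists."""
    if not os.path.exists(path):
        return deque(), set()
    with open(path) as handle:
        state = json.load(handle)
    return deque(state["pending"]), set(state["seen"])


def crawl(seeds, fetch_links, checkpoint_path):
    """Breadth-first crawl that saves its state when interrupted with Ctrl+C.

    fetch_links is an assumed callback: it takes a URL and returns the URLs
    found on that page.
    """
    pending, seen = load_checkpoint(checkpoint_path)
    if not pending and not seen:
        pending.extend(seeds)
    try:
        while pending:
            url = pending[0]  # stays queued until fully processed
            if url not in seen:
                for link in fetch_links(url):
                    if link not in seen:
                        pending.append(link)
                seen.add(url)
            pending.popleft()
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in Python; persist state, then re-raise.
        save_checkpoint(checkpoint_path, pending, seen)
        raise
    return seen
```

Calling crawl again with the same checkpoint path picks up from the saved queue. A URL is only removed from the queue after it has been fully processed, so an interrupted page is retried on resume rather than lost.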

web-data-extractor - Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web formats like HTML, XML and JSON

  •    Java

Web Data Extractor - Extracting and parsing structured data with jQuery Selector, XPath or JsonPath from common web formats like HTML, XML and JSON.

gopa - [WIP] GOPA, a spider written in Golang, for Elasticsearch

  •    Go

GOPA, a spider written in Go. First of all, get it. There are two options: download the pre-built package or compile it yourself.

marmot - 💐Marmot | Web Crawler/HTTP protocol Download Package 🐭

  •    Go

If running go get is difficult, you can move the files under GOPATH in this project to your Go environment's GOPATH. HTTP download helper that supports many features, such as cookie persistence and HTTP(S) and SOCKS5 proxies....

supercrawler - A web crawler

  •    Javascript

Supercrawler is a Node.js web crawler. It is designed to be highly configurable and easy to use. When Supercrawler successfully crawls a page (which could be an image, a text document or any other file), it will fire your custom content-type handlers. Define your own custom handlers to parse pages, save data and do anything else you need.
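
The content-type handler pattern described here can be illustrated with a small sketch. This is written in Python for illustration only and is not Supercrawler's API (Supercrawler is a Node.js library). The HandlerRegistry class, its method names, and the handler signature are hypothetical.

```python
class HandlerRegistry:
    """Maps content-type prefixes to callbacks fired after a page is fetched."""

    def __init__(self):
        self._handlers = []  # list of (content-type prefix, callback) pairs

    def add_handler(self, content_type, callback):
        """Register a callback for any media type starting with content_type."""
        self._handlers.append((content_type.lower(), callback))

    def dispatch(self, url, content_type, body):
        """Call every matching handler and return their results in order."""
        # Drop parameters such as "; charset=utf-8" before matching.
        media_type = content_type.split(";", 1)[0].strip().lower()
        results = []
        for prefix, callback in self._handlers:
            if media_type.startswith(prefix):
                results.append(callback(url, media_type, body))
        return results


# Usage: one handler for HTML pages, one for any image type.
registry = HandlerRegistry()
registry.add_handler("text/html", lambda url, media_type, body: ("html", len(body)))
registry.add_handler("image/", lambda url, media_type, body: ("image", url))
```

Matching on a prefix lets one handler cover a whole family of types (for example, every image format), while an exact type such as text/html still works as a prefix of itself.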

NScrapy - NScrapy is a

  •    CSharp

Below is a sample of NScrapy. The sample visits Liepin, a recruitment website. Based on the seed URL defined in the [URL] attribute, NScrapy visits each position's detail page (the ParseItem method) and visits the next page automatically (the VisitPage method). The spider writer does not need to know how spiders distributed across different machines or processes communicate with each other, or how the Downloader process gets the URLs it needs to download. Just tell NScrapy the seed URL, inherit the Spider.Spider class, and write some callbacks; NScrapy will take care of the rest. NScrapy supports different kinds of extensions, including adding your own DownloaderMiddleware and configuring HTTP headers and a user agent pool.