jsonscraper - JSON configurable concurrent scraper

A JSON-configurable concurrent scraper written in Go. Given one or more JSON config files, it produces JSON files with the results.

https://github.com/ssimunic/jsonscraper

Related Projects

web-scraper-chrome-extension - Web data extraction tool implemented as chrome extension

  •    Javascript

Web Scraper is a chrome browser extension built for data extraction from web pages. Using this extension you can create a plan (sitemap) how a web site should be traversed and what should be extracted. Using these sitemaps the Web Scraper will navigate the site accordingly and extract all data. Scraped data later can be exported as CSV. When submitting a bug please attach an exported sitemap if possible.

scraperjs - A complete and versatile web scraper.

  •    Javascript

Scraperjs is a web scraper module that makes scraping the web an easy job. Try to spot the differences.

Soup - Web Scraper in Go, similar to BeautifulSoup

  •    Go

soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.

market_bot - Google Play Android App store scraper

  •    Ruby

Market Bot is a web scraper (web robot, web spider) for the Google Play Android app store. It can collect data on apps, charts, and developers. Google has recently changed the HTML and CSS for the Play Store, which has caused the release version of Market Bot to break. New code is in the master branch (unreleased) to begin fixing this problem. If you are interested in helping, please join the discussion in issue 72.

scrape-it - :crystal_ball: A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.


x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

crawler - A high performance web crawler in Elixir.

  •    Elixir

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ. Below is a very high level architecture diagram demonstrating how Crawler works.

python-codeplex-scraper

This is a simple, lightweight (and probably fragile) web scraper for CodePlex. It allows you to retrieve public information for users and projects.

HTML Scraper

  •    Java

The HTML Scraper is a utility written in Java which acts as a 'screen scraper' for HTML pages.

facebook-page-post-scraper - Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

  •    Python

UPDATE December 2017: Due to a bug on Facebook's end, using this scraper will only return a very small subset of posts (5-10% of posts) over a limited timeframe. Since Facebook now owns CrowdTangle, the (paid) canonical source of historical Facebook data, Facebook doesn't have an incentive to fix the linked bug. On December 12th, a Facebook engineer commented that they are developing a new endpoint for scraping posts chronologically. I will refactor this script once that happens. Until then, there likely will not be any PRs accepted.

scraper - A scraper for EmulationStation written in Go using hashing

  •    Go

An auto-scraper for EmulationStation written in Go using hashes. This currently works with NES, SNES, N64, GB, GBC, GBA, MD, SMS, 32X, GG, PCE, A2600, LNX, MAME/FBA (see below), Dreamcast (bin/gdi), PSX (bin/cue), ScummVM, SegaCD, WonderSwan, and WonderSwan Color ROMs. The script works by crawling a directory of ROM files looking for known extensions. When it finds a file, it hashes the ROM data minus any headers or special file formatting, with the goal of hashing only the data pulled from the original game. It compares this hash to a DB I've compiled to look up the correct game in theGamesDB.net. It downloads the metadata and builds the gamelist.xml file.

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

app-store-scraper - scrape data from the itunes app store

  •    Javascript

Node.js module to scrape application data from the iTunes/Mac App Store. The goal is to provide an interface as close as possible to the google-play-scraper module.

colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

aso - Tools for app store optimization on iTunes and Google Play

  •    Javascript

This Node.js library provides a set of functions to aid App Store Optimization of applications in iTunes and Google Play. The functions use either google-play-scraper or app-store-scraper to gather data, so bear in mind a lot of requests are performed under the hood and you may hit throttling limits when making too many calls in a short period of time.

scala-scraper - A Scala library for scraping content from HTML pages

  •    Scala

A library providing a DSL for loading and extracting content from HTML pages. Take a look at Examples.scala and at the unit specs for usage examples or keep reading for more thorough documentation. Feel free to use GitHub Issues for submitting any bug or feature request and Gitter to ask questions.