webhog - Downloads and stores a given URL (including js, css, and images) for offline use.


webhog is a package that downloads and stores a given URL (including its JS, CSS, and images) for offline use and uploads the result to a given AWS S3 account (more persistence options to come).

Usage: make a POST request to http://localhost:3000/scrape with the header X-API-KEY: SCRAPEAPI, passing a JSON body with the URL you'd like to fetch, e.g. { "url": "http://facebook.com" }. You'll see a line like Ent dir: /blah/blah/blah printed to the console; your assets are saved in that directory. To test, open the generated index.html file.

https://github.com/johnernaut/webhog
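
As a rough illustration of the request described above, here is a minimal Go client sketch. The endpoint, header, and JSON body come from the description; the response handling is an assumption, since webhog's actual response format isn't shown here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// JSON body naming the URL to scrape, as described above.
	body := bytes.NewBufferString(`{"url": "http://facebook.com"}`)

	req, err := http.NewRequest("POST", "http://localhost:3000/scrape", body)
	if err != nil {
		log.Fatal(err)
	}
	// API key header expected by the local webhog server.
	req.Header.Set("X-API-KEY", "SCRAPEAPI")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print whatever the server returns; the saved-asset directory
	// ("Ent dir: ...") is logged on the webhog server's console.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```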

Related Projects

huginn - Create agents that monitor and act on your behalf. Your agents are standing by!

  •    Ruby

Huginn is a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn's Agents create and consume events, propagating them along a directed graph. Think of it as a hackable version of IFTTT or Zapier on your own server. You always know who has your data. You do. Join us in our Gitter room to discuss the project.

Soup - Web Scraper in Go, similar to BeautifulSoup

  •    Go

soup is a small web scraper package for Go, with an interface closely modeled on BeautifulSoup.
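
For a sense of that BeautifulSoup-style interface, here is a short sketch assuming the package is the anaskhan96/soup module; the import path and the target URL are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"

	"github.com/anaskhan96/soup"
)

func main() {
	// Fetch the raw HTML of a page.
	html, err := soup.Get("https://example.com")
	if err != nil {
		log.Fatal(err)
	}

	// Parse it and query elements, BeautifulSoup-style.
	doc := soup.HTMLParse(html)
	for _, link := range doc.FindAll("a") {
		fmt.Println(link.Text(), "->", link.Attrs()["href"])
	}
}
```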

morph - Take the hassle out of web scraping

  •    Ruby

Development is supported on Linux and Mac OS X; to get set up, follow the installation instructions on the Docker site.