Soup - Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go whose interface closely resembles that of BeautifulSoup.



Related Projects

rvest - Simple web scraping for R

rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup. Create an HTML document from a URL, a file on disk, or a string containing HTML with read_html().

artoo - artoo.js - the client-side scraping companion.

artoo.js is a piece of JavaScript code meant to be run in your browser's console to provide you with some scraping utilities. The library's full documentation is available on GitHub Pages.

requests-html - Pythonic HTML Parsing for Humans™

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

atropine - That weird screen-scraping thing

Atropine is a library for assisting with screen-scraping tasks, particularly that of making exhaustive assertions about the structure of HTML documents. It is built on top of the fantastic BeautifulSoup HTML parser.

Beautiful Soup - Python HTML/XML parser

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match "", or "Find the table heading that's got bold text, then give me that text."
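
As a rough illustration of the "find all the links" and "links of class externalLink" queries described above, here is a minimal sketch that uses only Python's standard-library html.parser instead of Beautiful Soup itself. The names LinkCollector, find_links, and SAMPLE are hypothetical and chosen for this example; none of this is Beautiful Soup's API.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, optionally filtered by CSS class."""

    def __init__(self, css_class=None):
        super().__init__()
        self.css_class = css_class  # hypothetical filter, e.g. "externalLink"
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return  # anchors without an href are not links
        # A class attribute may hold several space-separated class names.
        classes = (attr_map.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.links.append(href)


def find_links(html, css_class=None):
    """Return every link target in html, in document order."""
    collector = LinkCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup for illustration only.
SAMPLE = """
<p><a href="/about">About</a>
<a class="externalLink" href="https://example.org/">Elsewhere</a>
<a class="nav externalLink" href="https://example.net/">Other</a></p>
"""

all_links = find_links(SAMPLE)
external_links = find_links(SAMPLE, "externalLink")
```

A tree-building parser such as Beautiful Soup lets you ask these questions after the fact; the event-driven approach sketched here has to decide what to keep while the document streams past.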

dryscrape - [not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages

NOTE: This package is not actively maintained. It uses QtWebKit, which is end-of-life and probably doesn't get security fixes backported. Consider using a similar package like Spynner instead.

dryscrape is a lightweight web scraping library for Python. It uses a headless WebKit instance to evaluate JavaScript on the visited pages. This enables painless scraping of plain web pages as well as JavaScript-heavy “Web 2.0” applications like Facebook.

Scrapy - Web crawling & scraping framework for Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Rawler - The web scraping framework using XAML

This is a web scraping framework built on XAML. It makes web scraping possible using XAML alone.

upton - A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

Upton is a framework for easy web-scraping with a useful debug mode that doesn't hammer your target's servers. It does the repetitive parts of writing scrapers, so you only have to write the unique parts for each site. Just specify a URL to a list of links (or simply a list of links), an XPath expression or CSS selector for the links, and a block of what to do with the content of the pages you've scraped. Upton comes with some pre-written blocks (Procs, technically) for scraping simple lists and tables, such as its list function.

milewski-ctfp-pdf - Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source

This is an unofficial PDF version of "Category Theory for Programmers" by Bartosz Milewski, converted from his blog post series. Conversion is done by scraping the blog with Mercury Web Parser to get clean HTML content, modifying and tweaking it with Beautiful Soup, and finally converting it to LaTeX with Pandoc.

scrubyt - A simple to learn and use, yet powerful web scraping toolkit!

scraper - Simple web scraping for Google Chrome.

pjscrape - A web-scraping framework written in JavaScript, using PhantomJS and jQuery

A web-scraping framework written in JavaScript, using PhantomJS and jQuery.

A Rust library to extract useful data from HTML documents, suitable for web scraping.

A library to extract useful data from HTML documents, suitable for web scraping. Note: The entire API is currently unstable and will change as I use this library more in real-world projects. If you have any suggestions or feedback, please open an issue or send me an email.

ineed - Web scraping and HTML-reprocessing. The easy way.

Web scraping and HTML-reprocessing. The easy way. ineed doesn't build and traverse a DOM tree; it operates on a sequence of HTML tokens instead. All processing is done in one pass, so it's blazing fast! The token stream is produced by parse5, which parses HTML exactly the same way modern browsers do.
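
To make the token-stream idea concrete, the following sketch collects the visible text of a page in a single pass without ever building a tree. It is written in Python with the standard-library html.parser as a stand-in; it is not ineed's API, and html.parser does not follow the browser parsing algorithm the way parse5 does. The names TextExtractor and visible_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text in one pass over the token stream; no tree is built."""

    SKIPPED = {"script", "style"}  # elements whose contents are not visible text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Each text token is kept or dropped as it arrives.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the visible text of html, with text chunks joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The only state carried between tokens is a small counter, which is why this style of processing uses little memory and needs just one pass over the input.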

metainspector - Ruby gem for web scraping purposes

MetaInspector is a gem for web scraping purposes. You give it a URL, and it lets you easily get its title, links, images, charset, description, keywords, meta tags...
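
For a sense of what gathering a page's title, charset, description, and keywords involves, here is a small sketch in Python using only the standard library. It works on an HTML string rather than fetching a URL, and it is not MetaInspector's interface; PageSummary and summarize are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Record the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content is not None:
                # e.g. <meta name="description" content="...">
                self.meta[name.lower()] = content
            elif attr_map.get("charset"):
                # e.g. <meta charset="utf-8">
                self.meta["charset"] = attr_map["charset"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return the page title and a dict of meta values found in html."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}
```

Calling summarize() on a page's HTML returns a plain dictionary, so the caller can look up result["meta"].get("description") without knowing anything about the parser.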

cyborg - Python web scraping framework

Cyborg is an asyncio Python 3 web scraping framework that helps you write programs to extract information from websites by reading and inspecting their HTML.

node-scraper - Easier web scraping using node.js and jQuery

A little module that makes scraping websites a little easier. Uses node.js and jQuery. The first argument is a URL as a string, the second is a callback which exposes a jQuery object with your scraped site as "body", and the third is an object from the request containing info about the URL.