Soup - Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go whose interface closely resembles that of BeautifulSoup.

https://github.com/anaskhan96/soup

Related Projects

rvest - Simple web scraping for R

  •    R

rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup. Create an HTML document from a URL, a file on disk, or a string containing HTML with read_html().

artoo - artoo.js - the client-side scraping companion.

  •    Javascript

artoo.js is a piece of JavaScript code meant to be run in your browser's console to provide you with some scraping utilities. The library's full documentation is available on GitHub Pages.

requests-html - Pythonic HTML Parsing for Humans™

  •    HTML

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

That weird screen-scraping thing

Atropine is a library for assisting with screen-scraping tasks, particularly that of making exhaustive assertions about the structure of HTML documents. It is built on top of the fantastic BeautifulSoup HTML parser.


Beautiful Soup - Python HTML/XML parser

  •    Python

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."

dryscrape - [not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages

  •    Python

NOTE: This package is not actively maintained. It uses QtWebkit, which is end-of-life and probably doesn't get security fixes backported. Consider using a similar package like Spynner instead. dryscrape is a lightweight web scraping library for Python. It uses a headless Webkit instance to evaluate Javascript on the visited pages. This enables painless scraping of plain web pages as well as Javascript-heavy “Web 2.0” applications like Facebook.

Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Rawler - The Web scraping framework using XAML

Rawler is a web scraping framework that uses XAML. It makes web scraping possible using only XAML.

upton - A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

  •    HTML

Upton is a framework for easy web-scraping with a useful debug mode that doesn't hammer your target's servers. It does the repetitive parts of writing scrapers, so you only have to write the unique parts for each site. Just specify a URL to a list of links (or simply a list of links), an XPath expression or CSS selector for the links, and a block of what to do with the content of the pages you've scraped. Upton comes with some pre-written blocks (Procs, technically) for scraping simple lists and tables.

scrape - A simple, higher level interface for Go web scraping.

  •    Go

A simple, higher level interface for Go web scraping. When scraping with Go, I find myself redefining tree traversal and other utility functions.

milewski-ctfp-pdf - Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source

  •    TeX

This is an unofficial PDF version of "Category Theory for Programmers" by Bartosz Milewski, converted from his blog post series. Conversion is done by scraping the blog with Mercury Web Parser to get clean HTML content, modifying and tweaking it with Beautiful Soup, and finally converting it to LaTeX with Pandoc. See scraper.py for additional information.

scraperjs - A complete and versatile web scraper.

  •    Javascript

Scraperjs is a web scraper module that makes scraping the web an easy job.

scrubyt - A simple to learn and use, yet powerful web scraping toolkit!

  •    Javascript

A simple to learn and use, yet powerful web scraping toolkit!

scraper - Simple web scraping for Google Chrome.

  •    Javascript

Simple web scraping for Google Chrome.

select.rs - A Rust library to extract useful data from HTML documents, suitable for web scraping.

  •    Rust

A library to extract useful data from HTML documents, suitable for web scraping. Note: The entire API is currently unstable and will change as I use this library more in real world projects. If you have any suggestions or feedback, please open an issue or send me an email.

ineed - Web scraping and HTML-reprocessing. The easy way.

  •    Javascript

Web scraping and HTML-reprocessing. The easy way. ineed doesn't build and traverse a DOM tree; it operates on a sequence of HTML tokens instead. The whole processing is done in one pass, so it's blazing fast! The token stream is produced by parse5, which parses HTML exactly the same way modern browsers do.

metainspector - Ruby gem for web scraping purposes

  •    Ruby

MetaInspector is a gem for web scraping purposes. You give it a URL, and it lets you easily get its title, links, images, charset, description, keywords, meta tags...

pjscrape - A web-scraping framework written in Javascript, using PhantomJS and jQuery

  •    Javascript

pjscrape is a framework for anyone who's ever wanted a command-line tool for web scraping using Javascript and jQuery. Built for PhantomJS, it allows you to scrape pages in a fully rendered, Javascript-enabled context from the command line, no browser required. Please see http://nrabinowitz.github.io/pjscrape/ for usage, examples, and documentation.