Soup - Web Scraper in Go, similar to BeautifulSoup


soup is a small web scraper package for Go, with an interface closely modeled on that of BeautifulSoup.

https://github.com/anaskhan96/soup
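The fetch-parse-query pattern that soup and BeautifulSoup share — parse a page, then ask it for elements by tag and attribute — can be sketched with nothing but Python's standard library. This is a hedged illustration of the pattern, not soup's actual Go API; the inline HTML string stands in for a fetched page.

```python
# A minimal sketch of the parse-then-query pattern offered by soup (Go)
# and BeautifulSoup (Python), using only the standard library. In real
# use the HTML would be downloaded first (e.g. with urllib.request).
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag in one pass."""
    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the <a> currently open
        self._text = []          # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

page = '<p><a href="/docs">Docs</a> and <a href="/blog">Blog</a></p>'
parser = LinkCollector()
parser.feed(page)
print(parser.links)  # [('/docs', 'Docs'), ('/blog', 'Blog')]
```

Libraries like soup wrap exactly this kind of traversal behind calls such as "find all links", so user code never touches the parser events directly.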

Related Projects

rvest - Simple web scraping for R


rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup. Create an HTML document from a URL, a file on disk, or a string containing HTML with read_html().

requests-html - Pythonic HTML Parsing for Humans™


This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

atropine - That weird screen-scraping thing


Atropine is a library for assisting with screen-scraping tasks, particularly the task of making exhaustive assertions about the structure of HTML documents. It is built on top of the fantastic BeautifulSoup HTML parser.

Beautiful Soup - Python HTML/XML parser


Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."

dryscrape - [not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages


NOTE: This package is not actively maintained. It uses QtWebkit, which is end-of-life and probably doesn't get security fixes backported. Consider using a similar package like Spynner instead. dryscrape is a lightweight web scraping library for Python. It uses a headless Webkit instance to evaluate Javascript on the visited pages. This enables painless scraping of plain web pages as well as Javascript-heavy “Web 2.0” applications like Facebook.



Scrapy - Web crawling & scraping framework for Python


Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
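The crawl loop at the heart of a framework like Scrapy — a frontier queue, a visited set, and a per-page extraction step — can be sketched in a few lines. This is an offline toy, not Scrapy's API: the in-memory `site` dict stands in for real HTTP fetches, and the regex link extractor is a deliberate simplification.

```python
# Toy breadth-first crawl loop over a stubbed "site" so the sketch runs
# offline. A real framework (Scrapy) schedules requests asynchronously
# and parses HTML properly instead of using a regex.
from collections import deque
import re

site = {
    "/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a>',
    "/b": '',
}

def links(html):
    """Crude link extraction; stands in for the 'parse' step."""
    return re.findall(r'href="([^"]+)"', html)

def crawl(start):
    frontier, visited = deque([start]), set()
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue                      # already fetched this page
        visited.add(url)
        for nxt in links(site[url]):      # extract new URLs to follow
            if nxt not in visited:
                frontier.append(nxt)
    return visited

print(sorted(crawl("/")))  # ['/', '/a', '/b']
```

Scrapy layers request scheduling, throttling, and structured-item pipelines on top of this basic loop.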

Rawler - The Web scraping Framework using XAML


Rawler is a web scraping framework using XAML. It makes web scraping possible with XAML alone.

milewski-ctfp-pdf - Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source


This is an unofficial PDF version of "Category Theory for Programmers" by Bartosz Milewski, converted from his blog post series. Conversion is done by scraping the blog with Mercury Web Parser to get clean HTML content, modifying and tweaking it with Beautiful Soup, and finally converting to LaTeX with Pandoc. See scraper.py for additional information.

scrubyt - A simple to learn and use, yet powerful web scraping toolkit!


A simple to learn and use, yet powerful web scraping toolkit!

scraper - Simple web scraping for Google Chrome.


Simple web scraping for Google Chrome.

node-scraper - Easier web scraping using node.js and jQuery


Easier web scraping using node.js and jQuery

select.rs - A Rust library to extract useful data from HTML documents, suitable for web scraping.


A library to extract useful data from HTML documents, suitable for web scraping. Note: the entire API is currently unstable and will change as I use this library in more real-world projects. If you have any suggestions or feedback, please open an issue or send me an email.

ineed - Web scraping and HTML-reprocessing. The easy way.


Web scraping and HTML-reprocessing, the easy way. ineed doesn't build and traverse a DOM tree; it operates on a sequence of HTML tokens instead. The whole processing is done in one pass, so it's blazing fast! The token stream is produced by parse5, which parses HTML exactly the same way modern browsers do.
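The one-pass, token-stream model ineed describes — reacting to tags as they stream by instead of building a tree — is the same model Python's stdlib HTMLParser exposes. This sketch (an illustration of the technique, not ineed's JavaScript API) tallies tags and grabs image sources in a single pass without ever materializing a DOM:

```python
# One-pass token processing: each tag arrives as an event, and nothing
# is retained unless the handler explicitly keeps it. No tree is built.
from collections import Counter
from html.parser import HTMLParser

class OnePass(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()   # how many times each tag appeared
        self.image_srcs = []          # src of every <img> seen

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_srcs.append(src)

p = OnePass()
p.feed('<div><img src="a.png"><p>hi</p><img src="b.png"></div>')
print(p.tag_counts["img"], p.image_srcs)  # 2 ['a.png', 'b.png']
```

Because nothing outlives its event, memory stays flat no matter how large the input document is, which is where the speed claim comes from.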

metainspector - Ruby gem for web scraping purposes


MetaInspector is a gem for web scraping purposes. You give it a URL, and it lets you easily get its title, links, images, charset, description, keywords, meta tags...
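The extraction step behind a page-metadata scraper like MetaInspector — pulling the `<title>` and the name/content pairs of `<meta>` tags out of a page — can be sketched with the Python standard library. MetaInspector itself is a Ruby gem; this only illustrates the kind of extraction it performs:

```python
# Stdlib sketch of page-metadata extraction: one parse collects the
# <title> text and every <meta name=... content=...> pair.
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}            # e.g. {"description": "..."}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in a:
            self.meta[a["name"]] = a.get("content", "")

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

r = MetaReader()
r.feed('<head><title>Home</title>'
       '<meta name="description" content="A demo page"></head>')
print(r.title, r.meta)  # Home {'description': 'A demo page'}
```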

pjscrape - A web-scraping framework written in Javascript, using PhantomJS and jQuery


pjscrape is a framework for anyone who's ever wanted a command-line tool for web scraping using Javascript and jQuery. Built for PhantomJS, it allows you to scrape pages in a fully rendered, Javascript-enabled context from the command line, no browser required. Please see http://nrabinowitz.github.io/pjscrape/ for usage, examples, and documentation.

cyborg - Python web scraping framework


Cyborg is an asyncio Python 3 web scraping framework that helps you write programs to extract information from websites by reading and inspecting their HTML.

sketchy - A task based API for taking screenshots and scraping text from websites.


Sketchy is a task-based API for taking screenshots and scraping text from websites. Sketchy's capture model contains all of the information associated with screenshotting, scraping, and storing HTML files from a provided URL. Screenshots (sketches), text scrapes, and HTML files can be stored either locally or in an S3 bucket. Optionally, token auth can be configured for creating and retrieving captures. Sketchy can also perform callbacks if required.

colly - Fast and Elegant Scraping Framework for Gophers


Colly provides a clean interface for writing any kind of crawler, scraper, or spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing, or archiving.