Displaying 1 to 20 from 99 results

colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider.With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

artoo - artoo.js - the client-side scraping companion.

  •    Javascript

artoo.js is a piece of JavaScript code meant to be run in your browser's console to provide you with some scraping utilities. The library's full documentation is available on github pages.

Embed - Get info from any web service or page

  •    PHP

PHP library to get information from any web page (using oembed, opengraph, twitter-cards, scrapping the html, etc). It's compatible with any web service (youtube, vimeo, flickr, instagram, etc) and has adapters to some sites like (archive.org, github, facebook, etc). This package is installable and autoloadable via Composer as embed/embed.

scrape-it - :crystal_ball: A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long you add a link to your Stack Overflow question.




Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

requests-html - Pythonic HTML Parsing for Humans™

  •    HTML

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

scraperjs - A complete and versatile web scraper.

  •    Javascript

Scraperjs is a web scraper module that make scraping the web an easy job. Try to spot the differences.


headless-chrome-crawler - Distributed crawler powered by Headless Chrome

  •    Javascript

Crawlers based on simple requests to HTML files are generally fast. However, it sometimes ends up capturing empty bodies, especially when the websites are built on such modern frontend frameworks as AngularJS, React and Vue.js. Note: headless-chrome-crawler contains Puppeteer. During installation, it automatically downloads a recent version of Chromium. To skip the download, see Environment variables.

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

webmagic - A scalable web crawler framework for Java.

  •    Java

A crawler framework. It covers the whole lifecycle of crawler: downloading, url management, content extraction and persistent. It can simply the development of a specific crawler.

ferret - Declarative web scraping

  •    Go

ferret is a web scraping system aiming to simplify data extraction from the web for such things like UI testing, machine learning and analytics. Having its own declarative language, ferret abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself. It's extremely portable, extensible and fast. The following example demonstrates the use of dynamic pages. First of all, we load the main Google Search page, type search criteria into an input box and then click a search button. The click action triggers a redirect, so we wait till its end. Once the page gets loaded, we iterate over all elements in search results and assign the output to a variable. The final for loop filters out empty elements that might be because of inaccurate use of selectors.

facebook_data_analyzer - Analyze facebook copy of your data with ruby language

  •    Ruby

Analyze Facebook copy of your data. Download zip file from Facebook and get info about friends, ranking by message, vocabulary, contacts, friends added statistics and more. It won't work if you use different language because of date formatting, different titles on pages. This script uses nokogiri internally to parse data.

panther - A browser testing and web crawling library for PHP and Symfony

  •    PHP

Panther is a convenient standalone library to scrape websites and to run end-to-end tests using real browsers. Panther is super powerful, it leverages the W3C's WebDriver protocol to drive native web browsers such as Google Chrome and Firefox.

Lulu - [Unmaintained] A simple and clean video/music/image downloader 👾

  •    Python

Sorry for this. Lulu is a friendly you-get fork (⏬ Dumb downloader that scrapes the web).

Casperjs - Navigation scripting and testing utility for PhantomJS and SlimerJS

  •    Javascript

CasperJS is a navigation scripting & testing utility for PhantomJS and SlimerJS (still experimental). It eases the process of defining a full navigation scenario and provides useful high-level functions, methods & syntactic sugar for doing common tasks such as Filling forms, Clicking links, Capturing screenshots of a page, Downloading resources, even binary ones, Writing functional test suites, exporting results as JUnit XML (xUnit) and lot more.

ImageScraper - :scissors: High performance, multi-threaded image scraper

  •    Python

A high performance, easy to use, multithreaded command line tool which downloads images from the given webpage. Note that ImageScraper depends on lxml, requests, setproctitle, and future. If you run into problems in the compilation of lxml through pip, install the libxml2-dev and libxslt-dev packages on your system.

Soup - Web Scraper in Go, similar to BeautifulSoup

  •    Go

soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.

Active Statement Generator (moved to activestatement project)

  •    LINQ

!!!! Unfortunately, I must have made something really bad to the Team Foundation Server project, because I started having problems with semi-disappearing files on the server. I chose to move the files to the activestatement project !!!! I'm using this project both to handle a...