RARBG-scraper - With Selenium headless browsing and CAPTCHA solving

A Scrapy project that scrapes RARBG for torrents, using headless browsing with Selenium and CAPTCHA solving with pytesseract and Pillow.

https://github.com/evyatarmeged/RARBG-scraper
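
As a rough illustration of the combination described above (headless browsing with Selenium, CAPTCHA OCR with Pillow and pytesseract), here is a minimal, self-contained sketch; the URL and the CSS selectors are placeholders, not the repository's actual code:

```python
# A minimal sketch of the techniques the project combines: headless Chrome via
# Selenium, plus OCR of a CAPTCHA image with Pillow and pytesseract.
# The URL and CSS selectors below are placeholders, not the repository's code.
from io import BytesIO

import pytesseract
from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")               # browse without a visible window
driver = webdriver.Chrome(options=options)

driver.get("https://example.com/captcha-page")   # placeholder CAPTCHA page
captcha = driver.find_element(By.CSS_SELECTOR, "img.captcha")  # placeholder selector
png_bytes = captcha.screenshot_as_png            # grab the CAPTCHA image as PNG bytes

solution = pytesseract.image_to_string(Image.open(BytesIO(png_bytes))).strip()
driver.find_element(By.CSS_SELECTOR, "input.solution").send_keys(solution)
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
driver.quit()
```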

Related Projects

Jackett - API Support for your favorite torrent trackers.

  •    CSharp

This project is a new fork and is recruiting development help. If you are able to help out, please contact us. Jackett works as a proxy server: it translates queries from apps (Sonarr, Radarr, SickRage, CouchPotato, Mylar, DuckieTV, etc.) into tracker-site-specific HTTP queries, parses the HTML response, and sends the results back to the requesting software. This allows for getting recent uploads (like RSS) and performing searches. Jackett is a single repository of maintained indexer scraping & translation logic, removing that burden from other apps.
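
Since Jackett serves its results over a Torznab-style HTTP API, a client only needs to issue a search request and parse the RSS/XML response. A minimal Python sketch is shown below; the localhost address, port, and API key are assumptions for illustration:

```python
# Assumed local Jackett instance and placeholder API key; adjust for your setup.
import requests
import xml.etree.ElementTree as ET

JACKETT_URL = "http://localhost:9117/api/v2.0/indexers/all/results/torznab/api"
API_KEY = "YOUR_JACKETT_API_KEY"

def search(query):
    """Send a Torznab search query to Jackett and return (title, link) pairs."""
    params = {"apikey": API_KEY, "t": "search", "q": query}
    resp = requests.get(JACKETT_URL, params=params, timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)            # Torznab responses are RSS/XML
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

if __name__ == "__main__":
    for title, link in search("ubuntu"):
        print(title, link)
```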

Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast, high-level web crawling and scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
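
To get a feel for the framework, here is a minimal spider in the style of the official Scrapy tutorial (quotes.toscrape.com is Scrapy's public demo site). Saved as, say, quotes_spider.py, it can be run with `scrapy runspider quotes_spider.py -o quotes.json`:

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    """Crawls Scrapy's demo site and yields structured items."""
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract structured data from each quote block on the page.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the pagination link, if any, and keep crawling.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```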

scraperjs - A complete and versatile web scraper.

  •    Javascript

Scraperjs is a web scraper module that makes scraping the web an easy job.

colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Soup - Web Scraper in Go, similar to BeautifulSoup

  •    Go

soup is a small web scraper package for Go, with an interface very similar to that of BeautifulSoup.


ferret - Declarative web scraping

  •    Go

ferret is a web scraping system that aims to simplify data extraction from the web for things like UI testing, machine learning, and analytics. Having its own declarative language, ferret abstracts away the technical details and complexity of the underlying technologies, helping you focus on the data itself. It's extremely portable, extensible, and fast. The project's introductory example demonstrates working with dynamic pages: it loads the main Google Search page, types search criteria into an input box, and clicks the search button. The click triggers a redirect, so the script waits until it finishes. Once the page has loaded, it iterates over all elements in the search results and assigns the output to a variable; a final for loop filters out empty elements that can result from inaccurate selectors.

scrape-it - :crystal_ball: A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.

django-dynamic-scraper - Creating Scrapy scrapers via the Django admin interface

  •    Python

Creating Scrapy scrapers via the Django admin interface

facebook-page-post-scraper - Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

  •    Python

UPDATE December 2017: Due to a bug on Facebook's end, using this scraper will only return a very small subset of posts (5-10% of posts) over a limited timeframe. Since Facebook now owns CrowdTangle, the (paid) canonical source of historical Facebook data, Facebook doesn't have an incentive to fix the linked bug. On December 12th, a Facebook engineer commented that they are developing a new endpoint for scraping posts chronologically. I will refactor this script once that happens. Until then, there likely will not be any PRs accepted.

scraper - Simple web scraping for Google Chrome.

  •    Javascript

Simple web scraping for Google Chrome.

ineed - Web scraping and HTML-reprocessing. The easy way.

  •    Javascript

Web scraping and HTML-reprocessing, the easy way. ineed doesn't build and traverse a DOM tree; it operates on a sequence of HTML tokens instead. All processing is done in one pass, so it's blazing fast! The token stream is produced by parse5, which parses HTML exactly the same way modern browsers do.

node-scraper - Easier web scraping using node.js and jQuery

  •    Javascript

A little module that makes scraping websites a little easier, using node.js and jQuery. The first argument is a URL string, the second is a callback that exposes a jQuery object with your scraped site as "body", and the third is an object from the request containing info about the URL.

x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

scala-scraper - A Scala library for scraping content from HTML pages

  •    Scala

A library providing a DSL for loading and extracting content from HTML pages. Take a look at Examples.scala and at the unit specs for usage examples or keep reading for more thorough documentation. Feel free to use GitHub Issues for submitting any bug or feature request and Gitter to ask questions.

world_cup_json - Rails backend for a scraper that outputs World Cup data as JSON

  •    Ruby

This should now be working for the World Cup in 2018! It should have all events, goals, and match stats streaming live; please file an issue or hit me up on twitter @mutualarising if anything has gone awry. Note: FIFA is now using much more JS than they were 4 years ago to hide and show information. I'll try to make sure as the tournament goes on that things like penalties are showing up correctly. As always, this runs on a scraper; changes to the HTML structure or banning of the IP address it scrapes from could break it at any time. PRs welcome.

ImageScraper - :scissors: High performance, multi-threaded image scraper

  •    Python

A high-performance, easy-to-use, multithreaded command-line tool that downloads images from a given webpage. Note that ImageScraper depends on lxml, requests, setproctitle, and future. If you run into problems compiling lxml through pip, install the libxml2-dev and libxslt-dev packages on your system.
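
ImageScraper itself is a command-line tool, but the general technique it automates (collect the page's <img> URLs with lxml, then download them concurrently) can be sketched in a few lines. The following is not ImageScraper's own code, just an illustration of that approach:

```python
# Not ImageScraper's own code: a sketch of the technique it automates, using
# requests, lxml, and a thread pool to download a page's images concurrently.
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

import requests
from lxml import html

def scrape_images(page_url, out_dir="images", workers=8):
    os.makedirs(out_dir, exist_ok=True)
    tree = html.fromstring(requests.get(page_url, timeout=30).content)
    # Resolve every <img src> against the page URL.
    urls = [urljoin(page_url, src) for src in tree.xpath("//img/@src")]

    def fetch(url):
        name = os.path.basename(url.split("?")[0]) or "image"
        data = requests.get(url, timeout=30).content
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch, urls))   # drain the iterator so errors surface

if __name__ == "__main__":
    scrape_images("https://example.com/gallery")   # placeholder URL
```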

panther - A browser testing and web crawling library for PHP and Symfony

  •    PHP

Panther is a convenient standalone library for scraping websites and running end-to-end tests using real browsers. Panther is super powerful: it leverages the W3C WebDriver protocol to drive native web browsers such as Google Chrome and Firefox.