cloudflare-scrape - A Python module to bypass Cloudflare's anti-bot page.

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently. This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports JavaScript, though they may add additional techniques in the future.

https://github.com/Anorov/cloudflare-scrape
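
As a rough illustration of how the module is typically used: it is published as the `cfscrape` package, and `cfscrape.create_scraper()` returns a session object that behaves like a Requests session. The helper name `fetch_protected_page` below is hypothetical, and whether the bypass still works against a given site depends on Cloudflare's current checks, so treat this as a minimal sketch rather than a guaranteed recipe.

```python
def fetch_protected_page(url: str) -> bytes:
    """Fetch `url` through a cfscrape session and return the raw body.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the sketch can be loaded without the package
    # installed (third-party dependency: pip install cfscrape).
    import cfscrape

    # create_scraper() returns a requests.Session-like object that tries
    # to solve the anti-bot challenge before returning the real page.
    scraper = cfscrape.create_scraper()
    response = scraper.get(url)
    response.raise_for_status()  # surface HTTP errors instead of hiding them
    return response.content
```

Calling `fetch_protected_page("http://somesite.com")` would then return the page body as bytes, assuming the challenge is solved successfully.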


Related Projects

CloudFlair - 🔎 Find origin servers of websites behind CloudFlare using Internet-wide scan data from Censys

  •    Python

CloudFlair is a tool to find origin servers of websites protected by CloudFlare that are publicly exposed and don't restrict network access to the CloudFlare IP ranges as they should. The tool uses Internet-wide scan data from Censys to find exposed IPv4 hosts presenting an SSL certificate associated with the target's domain name.
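
One step implied by this description is separating candidate hosts that sit inside CloudFlare's own IP ranges from those that do not, since only the latter can be exposed origin servers. The sketch below shows that filtering step with Python's standard `ipaddress` module. It is not CloudFlair's actual code: the function names are hypothetical, and the range list is a small illustrative subset of CloudFlare's published IPv4 ranges that may be out of date.

```python
import ipaddress

# Assumption: a small, possibly stale subset of CloudFlare's published
# IPv4 ranges, used here for illustration only.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "103.21.244.0/22",
        "104.16.0.0/13",
        "172.64.0.0/13",
    )
]


def is_cloudflare_ip(addr: str) -> bool:
    """Return True if `addr` falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUDFLARE_RANGES)


def likely_origins(candidates):
    """Keep only candidate hosts that are outside the listed ranges."""
    return [c for c in candidates if not is_cloudflare_ip(c)]
```

For example, `likely_origins(["104.16.1.1", "203.0.113.10"])` keeps only the second address, because the first lies within `104.16.0.0/13`.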

portia - Visual scraping for Scrapy

  •    Python

Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web page to identify the data you wish to extract, and Portia will understand based on these annotations how to scrape data from similar pages. For more detailed instructions, and alternatives to using Docker, see the Installation docs.

facebook-page-post-scraper - Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

  •    Python

UPDATE December 2017: Due to a bug on Facebook's end, using this scraper will only return a very small subset of posts (5-10% of posts) over a limited timeframe. Since Facebook now owns CrowdTangle, the (paid) canonical source of historical Facebook data, Facebook doesn't have an incentive to fix the linked bug. On December 12th, a Facebook engineer commented that they are developing a new endpoint for scraping posts chronologically. I will refactor this script once that happens. Until then, there likely will not be any PRs accepted.

x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.


sites-using-cloudflare - 💔 Archived list of domains using Cloudflare DNS at the time of the CloudBleed announcement

This is an (archived) list of sites on Cloudflare DNS at the time of the CloudBleed HTTPS traffic leak announcement. Original vuln thread by Google Project Zero. This list is archived and no longer under active maintenance. It may contain stale or inaccurate data that will not be corrected. Do not link to it from press releases, it is not intended for end-users. If people want to find it, they can Google it.

Cloudflare-CNAME-Setup - Cloudflare Partner Panel

  •    PHP

This project allows a Cloudflare Hosting Partner to provide a panel that lets customers set up CNAME records for free. cf.tlo.xyz is a site running the stable version of this panel. The software there is up to date and can be trusted.

ferret - Declarative web scraping

  •    Go

ferret is a web scraping system aiming to simplify data extraction from the web for things such as UI testing, machine learning and analytics. Having its own declarative language, ferret abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself. It's extremely portable, extensible and fast. The project's introductory example demonstrates the use of dynamic pages: it loads the main Google Search page, types search criteria into an input box and then clicks a search button. The click action triggers a redirect, so the script waits until it finishes. Once the page has loaded, it iterates over all elements in the search results and assigns the output to a variable. A final for loop filters out empty elements that may result from inaccurate selectors.

panther - A browser testing and web crawling library for PHP and Symfony

  •    PHP

Panther is a convenient standalone library to scrape websites and to run end-to-end tests using real browsers. Panther is super powerful: it leverages the W3C's WebDriver protocol to drive native web browsers such as Google Chrome and Firefox.

node-google - A Node.js module to search and scrape Google.

  •    Javascript

This module allows you to search Google by scraping the results. It does NOT use the Google Search API. PLEASE DO NOT ABUSE THIS. The intent of using this is convenience vs the cruft that exists in the Google Search API. This is not sponsored, supported, or affiliated with Google Inc.

scrape-it - 🔮 A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.

CloudFlare Developer Toolkit

CloudFlare Developer Toolkit is an open source library that allows Microsoft .NET developers to easily integrate their applications with the CloudFlare service.

scrape - A simple, higher level interface for Go web scraping.

  •    Go

A simple, higher level interface for Go web scraping. When scraping with Go, I find myself redefining tree traversal and other utility functions.

cloud-to-butt - Chrome extension that replaces occurrences of 'the cloud' with 'my butt'

  •    Javascript

Note that there are forks of this extension that simply replace 'cloud' with 'butt'. In my personal opinion, that approach is too broad and it's less funny as a result, but it is clearly a very polarizing issue in the cloud-to-butt user community. Forks are free to do whatever they like. But officially, this extension replaces only the phrase described above, and therefore it did not replace your cloudflare URLs with buttflare URLs. Thank you for your concern. In Chrome, choose Window > Extensions. Drag CloudToButt.crx into the page that appears.

RED_HAWK - All in one tool for Information Gathering, Vulnerability Scanning and Crawling

  •    PHP

RED HAWK's CMS Detector can currently detect a set of CMSs (Content Management Systems). If the website is using some other CMS, the detector will return "could not detect". Want to contribute to RED HAWK or point out something wrong? Just create a new issue here: https://github.com/Tuhinshubhra/RED_HAWK/issues/new I'd love to hear from you.

rvest - Simple web scraping for R

  •    R

rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup. Create an HTML document from a URL, a file on disk or a string containing HTML with read_html().

pjscrape - A web-scraping framework written in Javascript, using PhantomJS and jQuery

  •    Javascript

pjscrape is a framework for anyone who's ever wanted a command-line tool for web scraping using JavaScript and jQuery. Built for PhantomJS, it allows you to scrape pages in a fully rendered, JavaScript-enabled context from the command line, no browser required. Please see http://nrabinowitz.github.io/pjscrape/ for usage, examples, and documentation.

cfapp_sample - A Sample CloudFlare App

  •    Javascript

A Sample CloudFlare App

cf-ui - 💎 Cloudflare UI Framework

  •    Javascript

This repository is no longer maintained. We decided to merge cf-ui into our internal monorepo and we will keep the future development there. We do not accept pull requests here. However, we plan to synchronize our internal changes with this repository. cf-ui is a set of over 50 packages used to build UIs at Cloudflare using projects such as React, Fela, Lerna and more.