cloudflare-scrape - A Python module to bypass Cloudflare's anti-bot page.

  •        1252

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently. This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks whether the client supports JavaScript, though they may add additional techniques in the future.

https://github.com/Anorov/cloudflare-scrape
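
A minimal usage sketch, assuming the module is installed and importable as `cfscrape`, and that `create_scraper()` returns a Requests-style session as the project's README describes. The `fetch_html` helper and the example URL are hypothetical names introduced here for illustration only.

```python
def fetch_html(url, scraper=None):
    """Return the body of `url` as text, using a Cloudflare-aware session.

    `fetch_html` is a hypothetical helper, not part of the module.
    Pass `scraper` to reuse an existing session across requests.
    """
    if scraper is None:
        # Assumption: the package is importable as `cfscrape` and
        # create_scraper() returns a Requests-style session object.
        import cfscrape
        scraper = cfscrape.create_scraper()
    response = scraper.get(url)
    # Surface HTTP errors instead of silently returning an error page.
    response.raise_for_status()
    return response.text


# Example call (placeholder URL; use a site you are permitted to crawl):
# html = fetch_html("https://example.com")
```

Reusing one scraper object for several requests is likely preferable to creating a new one each time, since the session can then keep whatever cookies the anti-bot page sets.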


   




Related Projects

autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  •    Python

This project is made for automatic web scraping, to make scraping easy. It takes a URL or the HTML content of a web page, plus a list of sample data that you want to scrape from that page. This data can be text, a URL, or any HTML tag value on that page. It learns the scraping rules and returns similar elements. You can then use the learned object with new URLs to get similar content, or the exact same element, from those new pages. It's compatible with Python 3.

CloudFlair - 🔎 Find origin servers of websites behind CloudFlare using Internet-wide scan data from Censys

  •    Python

CloudFlair is a tool to find origin servers of websites protected by CloudFlare that are publicly exposed and don't restrict network access to the CloudFlare IP ranges as they should. The tool uses Internet-wide scan data from Censys to find exposed IPv4 hosts presenting an SSL certificate associated with the target's domain name.
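
The certificate-matching idea can be sketched roughly as follows. This is an illustration under stated assumptions, not CloudFlair's actual code: `cert_covers_domain` and `candidate_origins` are hypothetical helpers, the certificate names are assumed to have already been extracted from scan data as plain strings, and wildcards are handled with the common single-label rule.

```python
def cert_covers_domain(cert_names, domain):
    """Return True if any certificate name matches `domain`.

    A leading "*." wildcard matches exactly one extra left-most label:
    "*.example.com" matches "www.example.com", but not "example.com"
    or "a.b.example.com".
    """
    domain = domain.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == domain:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # e.g. ".example.com"
            if domain.endswith(suffix):
                prefix = domain[: -len(suffix)]
                # The wildcard may stand in for one label only.
                if prefix and "." not in prefix:
                    return True
    return False


def candidate_origins(hosts, domain):
    """Keep the IPs from (ip, cert_names) pairs whose certificate covers `domain`."""
    return [ip for ip, names in hosts if cert_covers_domain(names, domain)]
```

A host that passes this filter is only a candidate; confirming that it really is the origin server would still require comparing what it serves with the site behind CloudFlare.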

portia - Visual scraping for Scrapy

  •    Python

Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web page to identify the data you wish to extract, and Portia will understand based on these annotations how to scrape data from similar pages. For more detailed instructions, and alternatives to using Docker, see the Installation docs.

ferret - Declarative web scraping

  •    Go

ferret is a web scraping system. It aims to simplify data extraction from the web for UI testing, machine learning, analytics and more. ferret allows users to focus on the data. It abstracts away the technical details and complexity of underlying technologies using its own declarative language. It is extremely portable, extensible, and fast. It has the ability to scrape JS-rendered pages, handle all page events and emulate user interactions.

r-web-scraping-cheat-sheet - Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium

  •    R

Inspired by Hartley Brody, this cheat sheet is about web scraping using rvest, httr and RSelenium. It covers many topics in this blog. While Hartley uses Python's requests and beautifulsoup libraries, this cheat sheet covers the usage of httr and rvest. While rvest is good enough for many scraping tasks, httr is required for more advanced techniques. Usage of RSelenium (a web driver) is also covered.


facebook-page-post-scraper - Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis

  •    Python

UPDATE December 2017: Due to a bug on Facebook's end, using this scraper will only return a very small subset of posts (5-10% of posts) over a limited timeframe. Since Facebook now owns CrowdTangle, the (paid) canonical source of historical Facebook data, Facebook doesn't have an incentive to fix the linked bug. On December 12th, a Facebook engineer commented that they are developing a new endpoint for scraping posts chronologically. I will refactor this script once that happens. Until then, there likely will not be any PRs accepted.

wgcf - 🚤 Cross-platform, unofficial CLI for Cloudflare Warp

  •    Go

You can find pre-compiled binaries on the releases page. Run wgcf in a terminal without any arguments to display the help screen. All commands and parameters are documented.

metascraper - Easily scrape data from websites using Open Graph, HTML metadata & fallbacks.

  •    HTML

metascraper is a library to easily scrape metadata from an article on the web using Open Graph, JSON+LD, regular HTML metadata, and a series of fallbacks.

x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

sites-using-cloudflare - 💔 Archived list of domains using Cloudflare DNS at the time of the CloudBleed announcement

  •    

This is an (archived) list of sites on Cloudflare DNS at the time of the CloudBleed HTTPS traffic leak announcement. Original vuln thread by Google Project Zero. This list is archived and no longer under active maintenance. It may contain stale or inaccurate data that will not be corrected. Do not link to it from press releases; it is not intended for end-users. If people want to find it, they can Google it.

Cloudflare-CNAME-Setup - Cloudflare Partner Panel

  •    PHP

This project allows a Cloudflare Hosting Partner to provide a panel that lets customers use CNAME setup for free. cf.tlo.xyz is a site running the stable version of this panel. The software there is up to date and you can trust it.

workers-graphql-server - 🔥Lightning-fast, globally distributed Apollo GraphQL server, deployed at the edge using Cloudflare Workers

  •    Javascript

An Apollo GraphQL server, built with Cloudflare Workers. Try a demo by looking at a deployed GraphQL playground. Why this rules: Cloudflare Workers is a serverless application platform for deploying your projects across Cloudflare's massive distributed network. Deploying your GraphQL application to the edge is a huge opportunity to build consistent low-latency API servers, with the added benefits of "serverless" (I know, the project has server in it): usage-based pricing, no cold starts, and instant, easy-to-use deployment software, using Wrangler.

cloudflare-docs - Cloudflare’s developer docs.

  •    Javascript

To get write access to this repo, please reach out to the Developer Docs room in chat. Except as otherwise noted, Cloudflare and any contributors grant you a license to the Cloudflare Developer Documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

libreddit - Private front-end for Reddit written in Rust

  •    Rust

10 second pitch: Libreddit is a portmanteau of "libre" (meaning freedom) and "Reddit". It is a private front-end like Invidious but for Reddit. Browse the coldest takes of r/unpopularopinion without being tracked. A checkmark in the "Cloudflare" category here refers to the use of the reverse proxy, Cloudflare. The checkmark will not be listed for a site which uses Cloudflare DNS but rather the proxying service which grants Cloudflare the ability to monitor traffic to the website.

miniflare - 🔥 Fully-local simulator for Cloudflare Workers

  •    TypeScript

Miniflare is a simulator for developing and testing Cloudflare Workers. It's an alternative to wrangler dev, written in TypeScript, that runs your workers in a sandbox implementing Workers' runtime APIs.

panther - A browser testing and web crawling library for PHP and Symfony

  •    PHP

Panther is a convenient standalone library to scrape websites and to run end-to-end tests using real browsers. Panther is super powerful: it leverages the W3C's WebDriver protocol to drive native web browsers such as Google Chrome and Firefox.

node-google - A Node.js module to search and scrape Google.

  •    Javascript

This module allows you to search Google by scraping the results. It does NOT use the Google Search API. PLEASE DO NOT ABUSE THIS. The intent of using this is convenience vs the cruft that exists in the Google Search API. This is not sponsored, supported, or affiliated with Google Inc.

scrape-it - 🔮 A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.





