haipproxy - Highly available distributed IP proxy pool, powered by Scrapy and Redis

Related Projects

scrapy-redis - Redis-based components for Scrapy.

Redis-based components for Scrapy. You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.

Scrapy - Web crawling & scraping framework for Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

YugaByte Database - Transactional, high-performance database for building internet-scale, globally-distributed applications

A cloud-native database for building mission-critical applications. This repository contains the Community Edition of the YugaByte Database. YugaByte offers both SQL and NoSQL in a single, unified database. It is meant to be a system-of-record/authoritative database that applications can rely on for correctness and availability. It allows applications to easily scale up and scale down in the cloud, on-premises or across hybrid environments without creating operational complexity or increasing the risk of outages.

Redisson - Redis based In-Memory Data Grid for Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

scrapy-redis - Redis-based components for Scrapy that allow distributed crawling

Redis-based components for Scrapy that allow distributed crawling

awesome-scrapy - A curated list of awesome packages, articles, and other cool resources from the Scrapy community


A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast high-level web crawling & scraping framework for Python. Listed packages include scrapyscript, which runs a Scrapy spider programmatically from a script or a Celery task, with no project required.

crawler - A high performance web crawler in Elixir.

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ.

PolarDB - Share-nothing, Distributed database based on PostgreSQL

PolarDB for PostgreSQL (PolarDB for short) is an open-source database system based on PostgreSQL. It extends PostgreSQL to become a share-nothing distributed database, which supports global data consistency and ACID across database nodes, distributed SQL processing, and data redundancy and high availability through Paxos-based replication.

SwiftQ - Distributed Task Queue

SwiftQ is a distributed task queue for server-side Swift applications. Task queues are used as a mechanism to distribute work across machines. SwiftQ uses messages to communicate between clients and workers, with Redis as the message broker. SwiftQ uses the reliable queue pattern, which ensures that all tasks get processed even in the event of networking problems or consumer crashes. SwiftQ can be used for real-time operations as well as delayed execution of tasks. SwiftQ can consist of multiple producers and consumers, allowing for high availability and horizontal scaling. It is recommended that a separate Redis database be used to avoid namespace conflicts.
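
The reliable queue pattern mentioned above can be sketched in a few lines. This is only an illustration, not SwiftQ's implementation: it is written in Python rather than Swift, it uses in-memory deques as a stand-in for two Redis lists, and the class name `ReliableQueue` and its method names are hypothetical. The idea is that a worker moves a task to a "processing" list when it claims it and removes it only after finishing, so a task left behind by a crashed worker can be put back.

```python
from collections import deque


class ReliableQueue:
    """In-memory stand-in for two Redis lists: pending and processing."""

    def __init__(self):
        self.pending = deque()      # tasks waiting for a worker
        self.processing = deque()   # tasks claimed but not yet acknowledged

    def enqueue(self, task):
        # Producers push new tasks onto the left end of the pending list.
        self.pending.appendleft(task)

    def claim(self):
        # Move the oldest pending task to the processing list and return it.
        # On a real Redis server this would be one atomic move
        # (RPOPLPUSH or LMOVE) rather than two separate steps.
        if not self.pending:
            return None
        task = self.pending.pop()
        self.processing.appendleft(task)
        return task

    def ack(self, task):
        # A worker acknowledges a finished task, removing it for good.
        self.processing.remove(task)

    def requeue_unacked(self):
        # Recovery step: tasks left in processing by a crashed worker
        # go back to the front of the line in pending.
        while self.processing:
            self.pending.append(self.processing.pop())


queue = ReliableQueue()
queue.enqueue("send-email")
queue.enqueue("resize-image")

first = queue.claim()       # a worker claims a task, then crashes before ack
queue.requeue_unacked()     # recovery returns the unacknowledged task
retried = queue.claim()     # another worker picks the same task up again
queue.ack(retried)
second = queue.claim()
queue.ack(second)
```

Because a task is only dropped on acknowledgement, a crash between `claim` and `ack` leads to a retry rather than a lost task; the cost is that a task may run more than once.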

Cm_Cache_Backend_Redis - A Zend_Cache backend for Redis with full support for tags (works great with Magento)

There are two supported methods of achieving High Availability and Load Balancing with Cm_Cache_Backend_Redis. You may achieve high availability and load balancing using Redis Sentinel. To enable use of Redis Sentinel, the server specified should be a comma-separated list of Sentinel servers, and the sentinel_master option should be specified to indicate the name of the sentinel master set (e.g. 'mymaster'). If using sentinel_master you may also specify load_from_slaves, in which case a random slave will be chosen for performing reads in order to load balance across multiple Redis instances. The value '1' means reads are loaded only from slaves, and '2' means the master is included in the random read selection.
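
The read-selection rule described above can be illustrated with a small sketch. It is not the backend's actual code (Cm_Cache_Backend_Redis is a PHP library): the function names `parse_sentinels` and `pick_read_server`, the host names, and the fallback to the master when no slave is available are all assumptions made for illustration.

```python
import random


def parse_sentinels(server):
    # Split the comma-separated "server" option into individual Sentinel addresses.
    return [part.strip() for part in server.split(",") if part.strip()]


def pick_read_server(master, slaves, load_from_slaves, rng=random):
    # '1': read only from a randomly chosen slave.
    # '2': include the master in the random selection.
    # Anything else, or '1' with no slaves available, falls back to the
    # master (an assumption of this sketch, not documented behavior).
    if load_from_slaves == "1" and slaves:
        return rng.choice(slaves)
    if load_from_slaves == "2":
        return rng.choice(slaves + [master])
    return master


# Hypothetical addresses, for illustration only.
sentinels = parse_sentinels("sentinel-a:26379, sentinel-b:26379")
master = "redis-master:6379"
slaves = ["redis-slave-1:6379", "redis-slave-2:6379"]

slave_only = pick_read_server(master, slaves, "1")
with_master = pick_read_server(master, slaves, "2")
default = pick_read_server(master, slaves, "")
```

Writes always go to the master in this picture; only the choice of read target is affected by load_from_slaves.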

Tendis - Tendis is a high-performance distributed storage system fully compatible with the Redis protocol

Tendis is a high-performance distributed storage system that is fully compatible with the Redis protocol. It uses RocksDB as the storage engine, and all data is stored on disk through RocksDB. Users can access Tendis using a Redis client, and the application hardly needs to be changed. In addition, Tendis supports storage capacity far exceeding memory, which can greatly reduce user storage costs. Similar to Redis clusters, Tendis uses a decentralized distributed solution. The gossip protocol is used for communication between nodes, and a request sent to any node in the cluster can be routed to the correct node. Cluster nodes automatically discover other nodes, detect faulty nodes, and ensure the application is largely unaffected when the master node fails.

cronsun - A Distributed, Fault-Tolerant Cron-Style Job System.

cronsun is a distributed cron-style job system. It is similar to crontab on a stand-alone *nix machine. The goal of this project is to make it much easier to manage jobs on lots of machines and to provide high availability. cronsun is different from Azkaban, Chronos, and Airflow.

Dqlite - High-availability SQLite

Dqlite is a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices. Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with automatic failover and high availability to keep your application running. It uses C-Raft, an optimised Raft implementation in C, to gain high-performance transactional consensus and fault tolerance while preserving SQLite’s outstanding efficiency and tiny footprint.

scrapy-examples - Multifarious Scrapy examples

Multifarious Scrapy examples with integrated proxies and agents, which make writing a spider more comfortable. The spider crawls several depths and gets real data from depth2.

phxqueue - A high-availability, high-throughput and highly reliable distributed queue based on the Paxos algorithm

PhxQueue is a high-availability, high-throughput and highly reliable distributed queue based on the Paxos protocol. It guarantees At-Least-Once Delivery. It is widely used in WeChat for WeChat Pay, WeChat Media Platform, and many other important businesses.

Squzer - Distributed Web Crawler

Squzer is Declum's open-source, extensible, scalable, multithreaded, high-quality web crawler project, written entirely in Python.

