haipproxy - :sparkling_heart: High available distributed ip proxy pool, powerd by Scrapy and Redis

  •        31

:sparkling_heart: High available distributed ip proxy pool, powerd by Scrapy and Redis

https://spiderclub.github.io/haipproxy/
https://github.com/SpiderClub/haipproxy

Tags
Implementation
License
Platform

   




Related Projects

scrapy-redis - Redis-based components for Scrapy.

  •    Python

Redis-based components for Scrapy. You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls.

Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

YugaByte Database - Transactional, high-performance database for building internet-scale, globally-distributed applications

  •    C++

A cloud-native database for building mission-critical applications. This repository contains the Community Edition of the YugaByte Database.YugaByte offers both SQL and NoSQL in a single, unified db. It is meant to be a system-of-record/authoritative database that applications can rely on for correctness and availability. It allows applications to easily scale up and scale down in the cloud, on-premises or across hybrid environments without creating operational complexity or increasing the risk of outages.

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.


scrapy-redis - Redis-based components for scrapy that allows distributed crawling

  •    Python

Redis-based components for scrapy that allows distributed crawling

crawler - A high performance web crawler in Elixir.

  •    Elixir

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ. Below is a very high level architecture diagram demonstrating how Crawler works.

SwiftQ - Distributed Task Queue

  •    Swift

SwiftQ is a distributed task queue for server side swift applications. Task queues are used as a mechanism to distribute work across machines. SwiftQ uses messages to communicate between clients and workers. In this case the message broker is Redis. SwiftQ uses the reliable queue pattern. This ensures that all tasks get processed even in the event of networking problems or consumer crashes. SwiftQ can be used for real time operations as well as delayed execution of tasks. SwiftQ can consist of multiple producers and consumers allowing for high availability and horizontal scaling. It is recommended that a separate Redis database is used to avoid conflicting name space.

Cm_Cache_Backend_Redis - A Zend_Cache backend for Redis with full support for tags (works great with Magento)

  •    PHP

There are two supported methods of achieving High Availability and Load Balancing with Cm_Cache_Backend_Redis.You may achieve high availability and load balancing using Redis Sentinel. To enable use of Redis Sentinel the server specified should be a comma-separated list of Sentinel servers and the sentinel_master option should be specified to indicate the name of the sentinel master set (e.g. 'mymaster'). If using sentinel_master you may also specify load_from_slaves in which case a random slave will be chosen for performing reads in order to load balance across multiple Redis instances. Using the value '1' indicates to only load from slaves and '2' to include the master in the random read slave selection.

cronsun - A Distributed, Fault-Tolerant Cron-Style Job System.

  •    Go

cronsun is a distributed cron-style job system. It's similar with crontab on stand-alone *nix. The goal of this project is to make it much easier to manage jobs on lots of machines and provides high availability. cronsun is different from Azkaban, Chronos, Airflow.

Dqlite - High-availability SQLite

  •    C

Dqlite is a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices. Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with automatic failover and high-availability to keep your application running. It uses C-Raft, an optimised Raft implementation in C, to gain high-performance transactional consensus and fault tolerance while preserving SQlite’s outstanding efficiency and tiny footprint.

scrapy-examples - Multifarious Scrapy examples

  •    Python

Multifarious scrapy examples with integrated proxies and agents, which make you comfy to write a spider. There are several depths in the spider, and the spider gets real data from depth2.

phxqueue - A high-availability, high-throughput and highly reliable distributed queue based on the Paxos algorithm

  •    C++

PhxQueue is a high-availability, high-throughput and highly reliable distributed queue based on the Paxos protocol. It guarantees At-Least-Once Delivery. It is widely used in WeChat for WeChat Pay, WeChat Media Platform, and many other important businesses.

Squzer - Distributed Web Crawler

  •    Python

Squzer is the Declum's open-source, extensible, scale, multithreaded and quality web crawler project entirely written in the Python language.

haredis - High-availability redis in Node.js.

  •    Javascript

High-availability redis in Node.js.

Patroni - A template for PostgreSQL High Availability with ZooKeeper, etcd, or Consul

  •    Python

Patroni is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in the datacenter-or anywhere else-will hopefully find it useful.

DDMQ - DDMQ is a distributed messaging product with low latency, high throughput and high availability

  •    Java

DDMQ is a distributed messaging product built by DiDi Infrastructure Team based on Apache RocketMQ. As a distributed messaging middleware, DDMQ provides low latency, high throughput and high available messaging service to many important large-scale distributed systems inside DiDi. DDMQ provides realtime messaging, delay-time messaging and transactional messaging to satisfy different scenarios. Through an easy-to-use Web Console and simple SDK Client, developers can experience producing and consuming messages in the most simple and stable way.