haipproxy - :sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis


:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis

https://spiderclub.github.io/haipproxy/
https://github.com/SpiderClub/haipproxy

Related Projects

scrapy-redis - Redis-based components for Scrapy.

  •    Python

Redis-based components for Scrapy. You can start multiple spider instances that share a single Redis queue, which makes it best suited for broad multi-domain crawls.
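
As a rough sketch of how that shared queue is typically wired up (the project, spider, and key names below are placeholders, and a local Redis instance is assumed):

```python
# settings.py -- point Scrapy's scheduler and dupefilter at Redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True                  # keep the queue between runs
REDIS_URL = "redis://localhost:6379"      # the queue shared by every instance

# myspider.py -- every running copy of this spider pops from the same list
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = "myspider"
    redis_key = "myspider:start_urls"     # LPUSH URLs here to feed the crawl

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```

Pushing a URL onto myspider:start_urls (for example with redis-cli) is then picked up by whichever spider instance is free.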

Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
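
For reference, a self-contained spider in the style of Scrapy's own tutorial, scraping the public quotes.toscrape.com demo site:

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # extract structured items from each quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # follow pagination and let Scrapy schedule the next request
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Saved as quotes_spider.py, it can be run without a full project via: scrapy runspider quotes_spider.py -o quotes.json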

YugaByte Database - Transactional, high-performance database for building internet-scale, globally-distributed applications

  •    C++

A cloud-native database for building mission-critical applications. This repository contains the Community Edition of the YugaByte Database. YugaByte offers both SQL and NoSQL in a single, unified database. It is meant to be a system-of-record/authoritative database that applications can rely on for correctness and availability. It allows applications to easily scale up and scale down in the cloud, on-premises, or across hybrid environments without creating operational complexity or increasing the risk of outages.
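
Because the SQL side (YSQL) speaks the PostgreSQL wire protocol, a plain PostgreSQL driver can talk to it. The host, port 5433, and default yugabyte user below are assumptions about a local test cluster:

```python
import psycopg2

# connect through YugaByte's PostgreSQL-compatible YSQL endpoint
conn = psycopg2.connect(host="127.0.0.1", port=5433,
                        dbname="yugabyte", user="yugabyte")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (id, name) VALUES (%s, %s) ON CONFLICT DO NOTHING",
            (1, "alice"))
conn.commit()
cur.execute("SELECT id, name FROM users")
print(cur.fetchall())
conn.close()
```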

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.


scrapy-redis - Redis-based components for Scrapy that allow distributed crawling

  •    Python

Redis-based components for Scrapy that allow distributed crawling.

crawler - A high-performance web crawler in Elixir.

  •    Elixir

A high-performance web crawler in Elixir, with worker pooling and rate limiting via OPQ. The project's README includes a high-level architecture diagram demonstrating how Crawler works.
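
Crawler itself is Elixir, but the two ideas named here, a fixed worker pool and a global rate limit, can be sketched language-agnostically. The following Python toy only illustrates the pattern and is not Crawler's API:

```python
import queue
import threading
import time
from urllib.request import urlopen

NUM_WORKERS = 4          # size of the worker pool
MIN_INTERVAL = 0.5       # global rate limit: at most ~2 fetches per second

tasks = queue.Queue()
_lock = threading.Lock()
_last_fetch = 0.0

def rate_limited_fetch(url):
    global _last_fetch
    with _lock:
        # space fetches at least MIN_INTERVAL apart across all workers
        wait = _last_fetch + MIN_INTERVAL - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        _last_fetch = time.monotonic()
    return urlopen(url, timeout=10).read()

def worker():
    while True:
        url = tasks.get()
        if url is None:                      # poison pill: shut the worker down
            break
        try:
            print(url, len(rate_limited_fetch(url)), "bytes")
        except Exception as exc:
            print(url, "failed:", exc)
        finally:
            tasks.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for url in ["https://example.com", "https://example.org"]:
    tasks.put(url)
tasks.join()
for _ in threads:
    tasks.put(None)
```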

SwiftQ - Distributed Task Queue

  •    Swift

SwiftQ is a distributed task queue for server-side Swift applications. Task queues are used as a mechanism to distribute work across machines. SwiftQ uses messages to communicate between clients and workers; in this case the message broker is Redis. SwiftQ uses the reliable queue pattern, which ensures that all tasks get processed even in the event of networking problems or consumer crashes. SwiftQ can be used for real-time operations as well as delayed execution of tasks. SwiftQ can consist of multiple producers and consumers, allowing for high availability and horizontal scaling. It is recommended that a separate Redis database is used to avoid namespace conflicts.
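
The reliable queue pattern mentioned here is a generic Redis technique. The sketch below shows it with redis-py rather than SwiftQ's Swift API: a task is moved atomically from the pending list to a processing list and removed only after it has been handled, so a crashed consumer never silently loses work.

```python
import redis

r = redis.Redis()
PENDING = "tasks:pending"          # producers LPUSH here
PROCESSING = "tasks:processing"    # in-flight tasks live here until acknowledged

def produce(task: bytes):
    r.lpush(PENDING, task)

def consume_forever(handle):
    while True:
        # atomically pop from PENDING and push onto PROCESSING, blocking if empty
        task = r.brpoplpush(PENDING, PROCESSING, timeout=0)
        try:
            handle(task)
        except Exception:
            # leave the task in PROCESSING; a recovery job can re-queue it later
            continue
        r.lrem(PROCESSING, 1, task)    # acknowledge: remove from PROCESSING
```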

Cm_Cache_Backend_Redis - A Zend_Cache backend for Redis with full support for tags (works great with Magento)

  •    PHP

There are two supported methods of achieving high availability and load balancing with Cm_Cache_Backend_Redis. One is Redis Sentinel: to enable it, the server specified should be a comma-separated list of Sentinel servers, and the sentinel_master option should name the Sentinel master set (e.g. 'mymaster'). When using sentinel_master you may also specify load_from_slaves, in which case a random slave is chosen for reads in order to load balance across multiple Redis instances. A value of '1' loads only from slaves, while '2' includes the master in the random read-slave selection.
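
Underneath those options is ordinary Redis Sentinel behaviour: resolve the named master for writes and, optionally, a replica for reads. The redis-py sketch below (not the PHP backend's configuration) shows that mechanism, with hypothetical sentinel hostnames:

```python
from redis.sentinel import Sentinel

sentinel = Sentinel([("sentinel1", 26379), ("sentinel2", 26379)],
                    socket_timeout=0.5)
master = sentinel.master_for("mymaster")   # writes always go to the current master
replica = sentinel.slave_for("mymaster")   # reads can be balanced onto a replica

master.set("cache:key", "value")
print(replica.get("cache:key"))
```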

cronsun - A Distributed, Fault-Tolerant Cron-Style Job System.

  •    Go

cronsun is a distributed cron-style job system. It's similar to crontab on a stand-alone *nix system. The goal of this project is to make it much easier to manage jobs on many machines and to provide high availability. cronsun is different from Azkaban, Chronos, and Airflow.

scrapy-examples - Multifarious Scrapy examples

  •    Python

Multifarious Scrapy examples with integrated proxies and user agents, which make it comfortable to write a spider. The spiders crawl to several depths, and real data is extracted starting at depth 2.
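
Integrating proxies and user agents in Scrapy is usually done with a downloader middleware along these lines (the proxy list, class name, and priority are placeholders, not taken from scrapy-examples):

```python
# middlewares.py
import random

PROXIES = ["http://127.0.0.1:8118", "http://127.0.0.1:8119"]
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

class RandomProxyUserAgentMiddleware:
    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta['proxy']
        request.meta["proxy"] = random.choice(PROXIES)
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None    # fall through to normal downloading

# settings.py
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RandomProxyUserAgentMiddleware": 350,
# }
```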

phxqueue - A high-availability, high-throughput and highly reliable distributed queue based on the Paxos algorithm

  •    C++

PhxQueue is a high-availability, high-throughput and highly reliable distributed queue based on the Paxos protocol. It guarantees At-Least-Once Delivery. It is widely used in WeChat for WeChat Pay, WeChat Media Platform, and many other important businesses.

Squzer - Distributed Web Crawler

  •    Python

Squzer is Declum's open-source, extensible, scalable, multithreaded, quality web crawler project, written entirely in Python.

haredis - High-availability redis in Node.js.

  •    JavaScript

High-availability redis in Node.js.

Patroni - A template for PostgreSQL High Availability with ZooKeeper, etcd, or Consul

  •    Python

Patroni is a template for you to create your own customized, high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper, etcd, Consul, or Kubernetes. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in the datacenter, or anywhere else, will hopefully find it useful.

DDMQ - DDMQ is a distributed messaging product with low latency, high throughput and high availability

  •    Java

DDMQ is a distributed messaging product built by the DiDi Infrastructure Team on top of Apache RocketMQ. As a distributed messaging middleware, DDMQ provides a low-latency, high-throughput and highly available messaging service to many important large-scale distributed systems inside DiDi. DDMQ provides real-time messaging, delayed messaging and transactional messaging to satisfy different scenarios. Through an easy-to-use web console and a simple SDK client, developers can produce and consume messages in a simple and stable way.

sparrow - Sparrow scheduling platform (U.C. Berkeley).

  •    Python

Sparrow is a high-throughput, low-latency, and fault-tolerant distributed cluster scheduler. Sparrow is designed for applications that require frequent resource allocations for very short jobs, such as analytics frameworks. Sparrow schedules from a distributed set of schedulers that maintain no shared state. Instead, to schedule a job, a scheduler obtains instantaneous load information by sending probes to a subset of worker machines. The scheduler places the job's tasks on the least loaded of the probed workers. This technique allows Sparrow to schedule in milliseconds, two orders of magnitude faster than existing approaches. Sparrow also handles failures: if a scheduler fails, a client simply directs scheduling requests to an alternate scheduler. To read more about Sparrow, check out our paper.
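
As a toy illustration of that probing step (not Sparrow's actual code): sample a few workers, read their instantaneous queue lengths, and place the task on the least loaded of those probed.

```python
import random

def probe(worker):
    # stand-in for the RPC that returns a worker's current queue length
    return worker["queue_len"]

def place_task(workers, probe_count=2):
    probed = random.sample(workers, min(probe_count, len(workers)))
    loads = {w["id"]: probe(w) for w in probed}
    return min(loads, key=loads.get)       # least loaded of the probed workers

workers = [{"id": i, "queue_len": random.randint(0, 10)} for i in range(100)]
print("placing task on worker", place_task(workers))
```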