Highly available distributed IP proxy pool, powered by Scrapy and Redis
https://spiderclub.github.io/haipproxy/
Tags | high-availability scrapy ipproxy distributed redis crawler scheduler spider |
Implementation | Python |
License | MIT |
Platform | Windows Linux |
Redis-based components for Scrapy. You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.
scrapy crawler distributed redis
[Crawler for Golang] Pholcus is a distributed, high-concurrency and powerful web crawler software.
crawler spider multi-interface distributed-crawler high-concurrency-crawler fastest-crawler cross-platform-crawler web-crawler
Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
crawler web-crawler scraping text-extraction spider
A cloud-native database for building mission-critical applications. This repository contains the Community Edition of the YugaByte Database. YugaByte offers both SQL and NoSQL in a single, unified database. It is meant to be a system-of-record/authoritative database that applications can rely on for correctness and availability. It allows applications to easily scale up and scale down in the cloud, on-premises or across hybrid environments without creating operational complexity or increasing the risk of outages.
distributed-database database cassandra redis cpp high-performance newsql sql nosql
Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.
cache distributed-caching distributed-locks redis-client redis-cluster collections java-collections hashmap set queue
Possibly the best practice of Scrapy and renting a house
scrapy scrapy-crawler scrapy-spider docker scrapyd
Redis-based components for Scrapy that allow distributed crawling
A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast high-level web crawling & scraping framework for Python. scrapyscript: Run a Scrapy spider programmatically from a script or a Celery task - no project required.
awesome scrapy awesome-scrapy
A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ.
elixir crawler spider scraper scraper-engine offline files
PolarDB for PostgreSQL (PolarDB for short) is an open-source database system based on PostgreSQL. It extends PostgreSQL to become a shared-nothing distributed database, which supports global data consistency and ACID across database nodes, distributed SQL processing, and data redundancy and high availability through Paxos-based replication.
database distributed-database rdbms postgresql
SwiftQ is a distributed task queue for server-side Swift applications. Task queues are used as a mechanism to distribute work across machines. SwiftQ uses messages to communicate between clients and workers. In this case the message broker is Redis. SwiftQ uses the reliable queue pattern. This ensures that all tasks get processed even in the event of networking problems or consumer crashes. SwiftQ can be used for real-time operations as well as delayed execution of tasks. SwiftQ can consist of multiple producers and consumers, allowing for high availability and horizontal scaling. It is recommended that a separate Redis database is used to avoid namespace conflicts.
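The reliable queue pattern mentioned for SwiftQ can be sketched roughly as follows. This is a minimal Python illustration, not SwiftQ's own code: the class and method names are hypothetical, and two in-memory deques stand in for the two Redis lists. In Redis itself the "reserve" step is typically done atomically with RPOPLPUSH (or LMOVE in newer versions).

```python
from collections import deque


class ReliableQueue:
    """In-memory stand-in for two Redis lists (hypothetical names)."""

    def __init__(self):
        self.pending = deque()     # tasks waiting for a worker
        self.processing = deque()  # tasks handed out but not yet acknowledged

    def push(self, task):
        # Producers add new tasks on the left; workers take from the right.
        self.pending.appendleft(task)

    def reserve(self):
        # Move one task from pending to processing in a single step, so the
        # task is still recorded somewhere if the worker dies after taking it.
        if not self.pending:
            return None
        task = self.pending.pop()
        self.processing.appendleft(task)
        return task

    def ack(self, task):
        # The worker finished: drop the task from the processing list.
        self.processing.remove(task)

    def recover(self):
        # Requeue everything left unacknowledged, most recently reserved
        # first, so the oldest task is again the next one handed out.
        while self.processing:
            self.pending.append(self.processing.popleft())
```

A task is only forgotten after `ack`, so a task reserved by a worker that crashes can be returned to the pending list by `recover` and processed again.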
server-side-swift
There are two supported methods of achieving High Availability and Load Balancing with Cm_Cache_Backend_Redis. You may achieve high availability and load balancing using Redis Sentinel. To enable use of Redis Sentinel, the server specified should be a comma-separated list of Sentinel servers, and the sentinel_master option should be specified to indicate the name of the sentinel master set (e.g. 'mymaster'). If using sentinel_master you may also specify load_from_slaves, in which case a random slave will be chosen for performing reads in order to load balance across multiple Redis instances. The value '1' means reads are loaded only from slaves, and '2' means the master is included in the random read slave selection.
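The load_from_slaves values described above can be illustrated with a small sketch. This is Python written only to show the selection rule, not the library's own implementation: the function name `pick_read_server` and its parameters are hypothetical, and falling back to the master when the option is unset or no slave is available is an assumption.

```python
import random


def pick_read_server(master, slaves, load_from_slaves, rng=random):
    """Choose the server for a read (hypothetical helper, not library code)."""
    if load_from_slaves == 1 and slaves:
        # '1': reads go to a randomly chosen slave only.
        candidates = list(slaves)
    elif load_from_slaves == 2:
        # '2': the master joins the pool of random read candidates.
        candidates = list(slaves) + [master]
    else:
        # Assumption: with the option unset (or no slaves), read from master.
        candidates = [master]
    return rng.choice(candidates)
```

For example, `pick_read_server("m", ["s1", "s2"], 1)` returns one of the two slaves, while the value 2 may also return "m".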
Tendis is a high-performance distributed storage system which is fully compatible with the Redis protocol. It uses RocksDB as the storage engine, and all data is stored to disk through RocksDB. Users can access Tendis using a Redis client, and the application hardly needs to be changed. In addition, Tendis supports storage capacity far exceeding memory, which can greatly reduce user storage costs. Similar to Redis clusters, Tendis uses a decentralized distributed solution. The gossip protocol is used for communication between nodes, and a request sent to any node in the cluster can be routed to the correct node. Cluster nodes support automatic discovery of other nodes, detect faulty nodes, and ensure the application is barely affected when the master node fails.
redis rocksdb nosql high-performance kvstore cpp17 kv tendis
cronsun is a distributed cron-style job system. It is similar to crontab on a stand-alone *nix machine. The goal of this project is to make it much easier to manage jobs on lots of machines and to provide high availability. cronsun is different from Azkaban, Chronos, and Airflow.
crontab job-scheduler cron fault-tolerance
A fast, highly available, fully Redis-compatible store.
redis cache key-value-database key-value-store persistence hashtable dictionary
Dqlite is a fast, embedded, persistent SQL database with Raft consensus that is perfect for fault-tolerant IoT and Edge devices. Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with automatic failover and high availability to keep your application running. It uses C-Raft, an optimised Raft implementation in C, to gain high-performance transactional consensus and fault tolerance while preserving SQLite’s outstanding efficiency and tiny footprint.
sqlite database embedded-database distributed-database embedded distributed
Multifarious Scrapy examples with integrated proxies and agents, which make it comfortable to write a spider. The spider crawls several depths and gets real data from depth2.
PhxQueue is a high-availability, high-throughput and highly reliable distributed queue based on the Paxos protocol. It guarantees At-Least-Once Delivery. It is widely used in WeChat for WeChat Pay, WeChat Media Platform, and many other important businesses.
Squzer is Declum's open-source, extensible, scalable, multithreaded, quality web crawler project, written entirely in Python.
crawler distributed-systems downloader file search spider