scrapy-redis - Redis-based components for Scrapy.

  •        9

Redis-based components for Scrapy. You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls.

http://scrapy-redis.readthedocs.io
https://github.com/rmax/scrapy-redis

Tags
Implementation
License
Platform

   




Related Projects

scrapy-redis - Redis-based components for scrapy that allows distributed crawling

  •    Python

Redis-based components for scrapy that allows distributed crawling

distribute_crawler - 使用scrapy,redis, mongodb,graphite实现的一个分布式网络爬虫,底层存储mongodb集群,分布式使用redis实现,爬虫状态显示使用graphite实现

  •    Python

使用scrapy,redis, mongodb,graphite实现的一个分布式网络爬虫,底层存储mongodb集群,分布式使用redis实现,爬虫状态显示使用graphite实现

Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.


Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

ants-go - open source, distributed, restful crawler engine in golang

  •    Go

I wrote a crawler engine named ants in python base on scrapy. But sometimes, dynamic language is chaos. So I start to write it in a compile language.

scrapyrt - Scrapy realtime

  •    Python

HTTP server which provides API for scheduling Scrapy spiders and making requests with spiders. Allows you to easily add HTTP API to your existing Scrapy project. All Scrapy project components (e.g. middleware, pipelines, extensions) are supported out of the box. You simply run Scrapyrt in Scrapy project directory and it starts HTTP server allowing you to schedule your spiders and get spider output in JSON format.

node-redlock - A node.js redlock implementation for distributed, highly-available redis locks

  •    Javascript

This is a node.js implementation of the redlock algorithm for distributed redis locks. It provides strong guarantees in both single-redis and multi-redis environments, and provides fault tolerance through use of multiple independent redis instances or clusters. Please make sure to use a client with built-in cluster support, such as ioredis.

CurioDB - Distributed & Persistent Redis Clone built with Scala & Akka

  •    Scala

CurioDB is a distributed and persistent Redis clone, built with Scala and Akka. Please note that despite the fancy logo, this is a toy project, hence the name "Curio", and any suitability as a drop-in replacement for Redis is purely incidental.

scrapy-proxies - Random proxy middleware for Scrapy

  •    Python

Processes Scrapy requests using a random proxy from list to avoid IP ban and improve crawling speed. For older versions of Scrapy (before 1.0.0) you have to use scrapy.contrib.downloadermiddleware.retry.RetryMiddleware and scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware middlewares instead.

elasticell - Elastic Key-Value Storage With Strong Consistency and Reliability

  •    Go

Elasticell is a distributed NoSQL database with strong consistency and reliability. Compatible with Redis protocol Use Elasticell as Redis. You can replace Redis with Elasticell to power your application without changing a single line of code in most cases(unsupport-redis-commands).

Go Redis - Type-safe Redis client for Golang

  •    Go

Redis client for Golang. It supports Publish /Subscribe. Transactions. Pipeline and TxPipeline. Scripting. Timeouts. Redis Sentinel. Redis Cluster. Cluster of Redis Servers without using cluster mode and Redis Sentinel. Ring. Instrumentation. Cache friendly. Rate limiting. Distributed Locks.

tidis - Distributed transactional NoSQL database, Redis protocol compatible using tikv as backend

  •    Go

Tidis is a Distributed NoSQL database, providing a Redis protocol API (string, list, hash, set, sorted set), written in Go. Tidis is like TiDB layer, providing protocol transform and data structure compute, powered by TiKV backend distributed storage which use Raft for data replication and 2PC for distributed transaction.

scrapy-zhihu-github - scrapy examples for crawling zhihu and github

  •    Python

scrapy examples for crawling zhihu and github

scrapyd - A service daemon to run Scrapy spiders

  •    Python

Scrapyd is a service for running Scrapy spiders. It allows you to deploy your Scrapy projects and control their spiders using an HTTP JSON API.

scrapy-examples - Multifarious Scrapy examples

  •    Python

Multifarious scrapy examples with integrated proxies and agents, which make you comfy to write a spider. There are several depths in the spider, and the spider gets real data from depth2.

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - Distributed and Scalable Java data structures (Set, SortedSet, Map, ConcurrentMap, List, Queue, Deque, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, HyperLogLog) on top of Redis server. Advanced redis java client. It supports over 28+ data structures and services, Synchronous / asynchronous / reactive interfaces and lot more.