caoliuscrapy - A small crawler that scrapes post content from the Caoliu technical discussion board and displays it locally

  •        201

A small crawler that scrapes post content from the Caoliu technical discussion board and displays it locally

https://github.com/leyle/caoliuscrapy

Related Projects

scrapyrt - Scrapy realtime

  •    Python

An HTTP server that provides an API for scheduling Scrapy spiders and making requests with them. It lets you easily add an HTTP API to your existing Scrapy project. All Scrapy project components (e.g. middleware, pipelines, extensions) are supported out of the box. You simply run Scrapyrt in a Scrapy project directory and it starts an HTTP server that lets you schedule your spiders and get spider output in JSON format.
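
As a minimal sketch of what a request to such a server might look like, the snippet below only builds the URL and sends nothing. It assumes the default Scrapyrt port (9080) and the `crawl.json` endpoint with `spider_name` and `url` parameters; verify these against the version you have installed. The spider name `example` and the helper `build_crawl_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_crawl_url(spider_name, url, host="localhost", port=9080):
    """Return a GET URL asking a Scrapyrt server to run one spider on one page.

    Assumption: the server listens on port 9080 and exposes /crawl.json.
    """
    query = urlencode({"spider_name": spider_name, "url": url})
    return f"http://{host}:{port}/crawl.json?{query}"


# "example" is a placeholder spider name, not one from any real project.
crawl_url = build_crawl_url("example", "http://example.com/")
print(crawl_url)
```

Fetching that URL with any HTTP client while Scrapyrt is running in the project directory should return the scraped items as JSON.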

scrapy-proxies - Random proxy middleware for Scrapy

  •    Python

Processes Scrapy requests using a random proxy from a list, to avoid IP bans and improve crawling speed. For older versions of Scrapy (before 1.0.0) you have to use the scrapy.contrib.downloadermiddleware.retry.RetryMiddleware and scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware middlewares instead.
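
The general idea can be sketched as a downloader middleware that sets `request.meta["proxy"]`, the key Scrapy's HttpProxyMiddleware reads. This is an illustration of the technique, not the scrapy-proxies source: the class names `RandomProxySketch` and `FakeRequest` and the proxy addresses are hypothetical, and `FakeRequest` stands in for a Scrapy request so the snippet runs without Scrapy installed.

```python
import random


class RandomProxySketch:
    """Hypothetical downloader middleware: picks a random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self.proxies = list(proxies)

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware takes the proxy from request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # None lets Scrapy continue processing the request


class FakeRequest:
    """Stand-in for a Scrapy request, with only the attributes used here."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


proxies = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # placeholder addresses
middleware = RandomProxySketch(proxies)
request = FakeRequest("http://example.com/")
middleware.process_request(request, spider=None)
print(request.meta["proxy"])
```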

scrapy-redis - Redis-based components for Scrapy.

  •    Python

Redis-based components for Scrapy. You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.
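
A sketch of the project settings that would point several spider instances at one shared queue. The setting names are the ones I recall from the scrapy-redis README and should be checked against the installed version; the Redis address is a placeholder.

```python
# settings.py fragment (assumed scrapy-redis setting names; verify before use)

# Use the Redis-backed scheduler so every spider instance pulls from one queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Share the duplicate-request filter through Redis as well.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue between runs instead of clearing it on shutdown.
SCHEDULER_PERSIST = True

# Placeholder address of the Redis server shared by all instances.
REDIS_URL = "redis://localhost:6379"
```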

Introduction-to-Tornado - This is the sample code for the Introduction to Tornado book, published by O'Reilly Media

  •    Python

Tornado is a scalable, non-blocking web server and web application framework written in Python. It is also light-weight to deploy, fun to write for, and incredibly powerful. Tornado was written with performance in mind, aiming to solve the C10k problem, so by design it’s an extremely high-performance framework. It’s also packed with handy tools for dealing with social networks, user authentication, and all sorts of asynchronous fun. In this book, we’ll cover the basics of the Tornado framework, starting with the features that make it so great, and working our way towards some real-life examples. We’ll cover the best practices for implementation and deployment, as well as a sampling of uses for the technology. These are the complete code examples for the chapters in the O'Reilly book of the same name, with complete application examples that you can run on your own. These shouldn't require anything beyond the basic install of Tornado and Python 2.6+, except for the MongoDB examples (which obviously require MongoDB, as well as pymongo).


scrapy-redis - Redis-based components for Scrapy that allow distributed crawling

  •    Python

Redis-based components for Scrapy that allow distributed crawling

scrapy-zhihu-github - scrapy examples for crawling zhihu and github

  •    Python

scrapy examples for crawling zhihu and github

scrapyd - A service daemon to run Scrapy spiders

  •    Python

Scrapyd is a service for running Scrapy spiders. It allows you to deploy your Scrapy projects and control their spiders using an HTTP JSON API.
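
As a rough illustration of that JSON API, the snippet below prepares (but does not send) a request that would schedule a spider run. It assumes Scrapyd's default port 6800 and the `schedule.json` endpoint with `project` and `spider` form fields; `myproject`, `myspider`, and the helper `build_schedule_request` are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build a POST request asking Scrapyd to schedule one spider run.

    Assumption: Scrapyd listens on port 6800 and exposes /schedule.json.
    """
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Passing data makes urllib issue a POST instead of a GET.
    return Request(f"{base_url}/schedule.json", data=data)


# Placeholder project and spider names.
schedule_request = build_schedule_request("myproject", "myspider")
print(schedule_request.get_method(), schedule_request.full_url)
```

Passing `schedule_request` to `urllib.request.urlopen` would submit it to a running Scrapyd instance, which replies with a JSON document.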

scrapy-examples - Multifarious Scrapy examples

  •    Python

Assorted Scrapy examples with integrated proxies and agents, which make writing a spider more comfortable. The spiders crawl several depth levels and collect the real data at depth 2.

tornado-boilerplate - A standard layout for Tornado apps

  •    Python

tornado-boilerplate is an attempt to set up a convention for Tornado app layouts, to assist in writing utilities to deploy such applications. A bit of convention can go a long way. This app layout is the one assumed by buedafab.

tornado-celery - Non-blocking Celery client for Tornado

  •    Python

NOTE: Currently callbacks only work with AMQP and Redis backends. To use the Redis backend, you must install tornado-redis.

tornado-production-skeleton - Simple example of a Tornado app in production

  •    Python

This is a skeletal example of one way to run a Tornado application in production. It currently covers running the application under Supervisor. Future additions may include automating initial setup and deploying new code (e.g. with Fabric) and running multiple processes behind a proxy (e.g. nginx). This is our application; it's just the chat demo from the Tornado distribution.

tornado - Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed

  •    Python

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.

crc-gen-verilog-vhdl

  •    C

CRC Generator is a command-line application that generates Verilog or VHDL code for a CRC with any data width between 1 and 1024 and any polynomial width between 1 and 1024. The code is written in C for Win32, but is easily portable to other platforms.

2048 - A small clone of 1024 (https://play.google.com/store/apps/details?id=com.veewo.a1024)

  •    CSS

A small clone of 1024, based on Saming's 2048 (also a clone). Anna Harren and sigod are maintainers for this repository.

python-certifi - (Python Distribution) A carefully curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts

  •    Python

Certifi is a carefully curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts. It has been extracted from the Requests project. Browsers and certificate authorities have concluded that 1024-bit keys are unacceptably weak for certificates, particularly root certificates. For this reason, Mozilla has removed any weak (i.e. 1024-bit key) certificate from its bundle, replacing it with an equivalent strong (i.e. 2048-bit or greater key) certificate from the same CA. Because Mozilla removed these certificates from its bundle, certifi removed them as well.

fastdfs - FastDFS is an open source high performance distributed file system (DFS)

  •    C

FastDFS is an open source high performance distributed file system. Its major functions include file storing, file syncing, and file accessing (file uploading and file downloading), and it addresses the problems of high capacity and load balancing. FastDFS is meant for websites whose services are based on files, such as photo sharing and video sharing sites. FastDFS has two roles: tracker and storage. The tracker takes charge of scheduling and load balancing for file access. The storage stores files, and its function is file management, including file storing, file syncing, and providing a file access interface. It also manages the metadata, which are attributes of the file represented as key-value pairs. For example, in width=1024 the key is "width" and the value is "1024".

django-dynamic-scraper - Creating Scrapy scrapers via the Django admin interface

  •    Python

Creating Scrapy scrapers via the Django admin interface

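distribute_crawler - A distributed web crawler built with scrapy, redis, mongodb, and graphite: a mongodb cluster provides the underlying storage, redis provides the distribution, and graphite displays the crawler status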

  •    Python

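A distributed web crawler built with scrapy, redis, mongodb, and graphite: a mongodb cluster provides the underlying storage, redis provides the distribution, and graphite displays the crawler status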