tumblr_spider - Multithreaded Python crawler for Tumblr

  •        50

A multithreaded Python crawler for Tumblr.

https://github.com/facert/tumblr_spider





Related Projects

Spider - Data Sharding and Partitioning for MySQL

  •    C

Spider is the first step when accessing a remote database, and a storage engine that updates and improves upon the existing architecture of MySQL. Spider supports database transactions and MySQL partitioning. In addition, Spider allows an unlimited number of users to quickly access the MySQL database. Spider operates as a "cluster", acting as a single system that supports unlimited parallel processing.

tumblr-boilerplate - :zap: A true bare bones Tumblr theme for a quick jump-start

  •    HTML

A fully functional bare-bones Tumblr theme that works out of the box. Style it to your needs. The goal of the project was to remove unnecessary code, easing the development process. Tumblr will auto-inject code (such as Open Graph Protocol, Twitter Cards & JavaScript) into the final result for your page. This is out of the theme developer's control. Running the page through an HTML validator or Page Speed may produce warnings & errors.

Spider Compiler

  •    

Spider Compiler parses the input of a spider programming source file and compiles it (with help of csc.exe; the C#-Compiler) to an exe-file. This project is developed in C#.

SPIDER on Rails

  •    Java

SPIDER on Rails (the new name of J2EE Spider) is an open source tool for rapidly developing form-based web applications. See more: http://www.infoq.com/news/2008/03/J2EE-Spider

WLPG Tumblr Plugin

  •    CSharp

The Windows Live Photo Gallery Tumblr publishing plugin makes it easier to publish your Tumblr images directly from Windows Live Photo Gallery. It is written in C# and it's free! The project is developed with VS 2010.


tumblr-utils - Utilities for dealing with Tumblr blogs, Tumblr backup

  •    Python

This is a collection of utilities dealing with Tumblr blogs. These scripts are or have been useful to me over the years.

node-rolling-spider - A library for controlling a Parrot Rolling Spider drone via BLE.

  •    Javascript

There are a few steps you should take when getting started with this. We're going to learn how to get there by building out a simple script that will take off, move forward a little, then land. To connect, you need to create a new Drone instance.

node-readability - Scrape/Crawl article from any site automatically

  •    Javascript

In my case, the spider processes about 1500k documents per day. The maximum crawling speed is 1.2k/minute and the average is 1k/minute. Memory cost is about 200 MB per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

Monkey-Spider

  •    Python

The Monkey-Spider is a crawler-based, low-interaction honeyclient project. It is not restricted to this use, but it is developed as such. The Monkey-Spider crawls Web sites to expose their threats to Web clients.

dhtspider - Bittorrent dht network spider

  •    Javascript

BitTorrent DHT network infohash spider for engiy.com (a BitTorrent resource search engine)

scrapy-examples - Multifarious Scrapy examples

  •    Python

Multifarious Scrapy examples with integrated proxies and agents, which make it comfortable to write a spider. The spider crawls several depths and gets real data from depth2.

php-spider - A configurable and extensible PHP web spider

  •    PHP

The easiest way to install PHP-Spider is with composer. Find it on Packagist. This is a very simple example. This code can be found in example/example_simple.php. For a more complete example with some logging, caching and filters, see example/example_complex.php. That file contains a more real-world example.

Tumblr#

  •    DotNet

A fully featured C# Tumblr client for Windows 8, Windows Phone 8 and WPF. Includes multi-photo posting using multipart forms.

TMTumblrSDK - Unopinionated and flexible library for easily integrating Tumblr data into your iOS or OS X application

  •    Objective-C

An unopinionated and flexible library for easily integrating Tumblr data into your iOS or OS X application. The library uses ARC and requires at least iOS 9 or OS X 10.10. If you have any feature requests, please let us know by creating an issue or submitting a pull request. Please use the Tumblr API responsibly and adhere to the Application Developer and API License Agreement.

tumblr.js - JavaScript client for the Tumblr API

  •    HTML

The official JavaScript client library for the Tumblr API. Check out the detailed documentation here. Different API methods use different kinds of authentication.

tarantula - a big hairy fuzzy spider that crawls your site, wreaking havoc

  •    Ruby

a big hairy fuzzy spider that crawls your site, wreaking havoc

node-crawler - Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  •    Javascript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

spidermonkey - DEFUNCT: PHP Web Spider I started in 2011, accepts regex, css selectors

  •    PHP

DEFUNCT: PHP Web Spider I started in 2011, accepts regex, css selectors

anemone - Anemone web-spider framework

  •    Ruby

Anemone web-spider framework