
dht - BitTorrent DHT Protocol && DHT Spider.

  •    Go

See the video on YouTube. It has two modes: the standard mode and the crawling mode. The standard mode follows the BEPs, so you can use it as a standard DHT server. The crawling mode aims to crawl as much metadata as possible and does not follow the standard BEP protocol. With the crawling mode, you can build another BTDigg.
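In the standard mode, DHT nodes exchange KRPC messages as defined in BEP 5, each of which is a bencoded dictionary. The sketch below is not this project's code (the project is written in Go); it is a minimal Python illustration, assuming the BEP 3 bencoding rules, of how a KRPC `ping` query is encoded. The helper names `bencode` and `ping_query` are hypothetical.

```python
def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts using BitTorrent bencoding (BEP 3)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # BEP 3: dictionary keys are byte strings emitted in sorted (raw byte) order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def ping_query(transaction_id: bytes, node_id: bytes) -> bytes:
    """Build a BEP 5 ping query: {"t": tid, "y": "q", "q": "ping", "a": {"id": node_id}}."""
    if len(node_id) != 20:
        raise ValueError("DHT node ids are 20 bytes long")
    return bencode({"t": transaction_id, "y": "q", "q": "ping", "a": {"id": node_id}})


print(ping_query(b"aa", b"abcdefghij0123456789"))
# b'd1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe'
```

Sent as a UDP datagram to a DHT node, a compliant node is expected to answer with its own id; the crawling mode's non-standard behavior is not modelled here.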

Crawler-Detect - 🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

  •    PHP

CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent and http_from header. It can currently detect thousands of bots/spiders/crawlers. To install, run composer require jaybizzle/crawler-detect 1.* or add "jaybizzle/crawler-detect": "1.*" to your composer.json.
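Detection of this kind boils down to matching header values against a list of known bot signatures. The following is a minimal Python sketch of that idea, not CrawlerDetect's API: the signature list and the `is_crawler` function are hypothetical, and the list is far smaller than the one the library maintains.

```python
import re

# Hypothetical, deliberately tiny signature list; real detectors maintain thousands of patterns.
BOT_SIGNATURES = [
    r"googlebot",
    r"bingbot",
    r"baiduspider",
    r"crawler",
    r"spider",
    r"curl/",
    r"wget/",
]
BOT_REGEX = re.compile("|".join(BOT_SIGNATURES), re.IGNORECASE)


def is_crawler(user_agent: str, http_from: str = "") -> bool:
    """Return True if the User-Agent or the From header matches a bot signature."""
    return any(BOT_REGEX.search(value) for value in (user_agent, http_from) if value)


print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"))  # False
```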

huntsman - Super configurable async web spider

  •    Javascript

Huntsman takes one or more 'seed' URLs via the spider.queue.add() method. Once the process is kicked off with spider.start(), it takes care of extracting links from each page and following only the pages you want.
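The seed-queue-and-filter workflow can be sketched as a breadth-first crawl. This Python version is only an illustration, not Huntsman's API: the `crawl` and `LinkExtractor` names are hypothetical, `fetch` is a caller-supplied function (here a dictionary lookup standing in for HTTP), and "the pages you want" is modelled, as an assumption, by a regular expression matched against each absolute URL.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, follow_pattern):
    """Breadth-first crawl from seed URLs, following only links that match follow_pattern.

    fetch(url) must return an HTML string, or None if the page cannot be fetched.
    """
    wanted = re.compile(follow_pattern)
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the current page
            if absolute not in seen and wanted.search(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory site used instead of real HTTP requests.
SITE = {
    "http://example.com/": '<a href="/blog/1">one</a> <a href="/about">about</a>',
    "http://example.com/blog/1": '<a href="/blog/2">two</a>',
    "http://example.com/blog/2": '<a href="/">home</a>',
}

print(crawl(["http://example.com/"], SITE.get, r"/blog/"))
# ['http://example.com/', 'http://example.com/blog/1', 'http://example.com/blog/2']
```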

glyphhanger - Your web font utility belt

  •    Javascript

Your web font utility belt. It shows which Unicode ranges are used on a website (optionally for a specific font-family, or for each font-family). It can also subset web fonts. It makes julienne fries. Available on npm.
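Computing a unicode-range amounts to collecting the distinct code points in a piece of text and collapsing consecutive runs into ranges. A minimal Python sketch, assuming the desired output is a comma-separated CSS `unicode-range` value; the function name is hypothetical and this is not glyphhanger's implementation.

```python
def unicode_range(text: str) -> str:
    """Collapse the distinct code points in text into a CSS unicode-range value."""
    points = sorted({ord(ch) for ch in text})
    ranges = []
    start = prev = None
    for cp in points:
        if start is None:
            start = prev = cp
        elif cp == prev + 1:
            prev = cp  # extend the current consecutive run
        else:
            ranges.append((start, prev))
            start = prev = cp
    if start is not None:
        ranges.append((start, prev))
    return ",".join(f"U+{a:X}" if a == b else f"U+{a:X}-{b:X}" for a, b in ranges)


print(unicode_range("abcx"))  # U+61-63,U+78
```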

AlipayOrdersSupervisor - ✨ Uses Node to monitor Alipay orders and notify the server instantly, implementing a contract-free payment interface

  •    Javascript

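A script implementing a contract-free Alipay payment interface (Node.js version). Alipay has since strengthened its login verification, which greatly reduces the tool's convenience. Another solution is now available: see "Using Youzan Cloud and Youzan Micro Store to implement personal payment collection", which offers an approach for reference. You can apply it directly to your own system following the method used in that repository, or run that repository as a standalone service.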

tumblr_spider - Multithreaded Tumblr crawler in Python

  •    Python

A multithreaded Tumblr crawler written in Python.

node-readability - Scrape/Crawl article from any site automatically

  •    Javascript

In my case, the spider processes about 1,500k documents per day, with a maximum crawling speed of 1.2k documents per minute and an average of 1k per minute. Memory usage is about 200 MB per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. In my experience, it's better than any other readability module.
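The quoted figures are roughly consistent with each other: a day has 1,440 minutes, so the average rate alone accounts for about the stated daily total, while a sustained peak rate would give an upper bound.

```latex
\begin{aligned}
\text{average: } & 1\,000~\tfrac{\text{docs}}{\text{min}} \times 1\,440~\tfrac{\text{min}}{\text{day}} = 1\,440\,000~\tfrac{\text{docs}}{\text{day}} \approx 1\,500\text{k}~\tfrac{\text{docs}}{\text{day}} \\
\text{peak: } & 1\,200~\tfrac{\text{docs}}{\text{min}} \times 1\,440~\tfrac{\text{min}}{\text{day}} = 1\,728\,000~\tfrac{\text{docs}}{\text{day}}
\end{aligned}
```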

portSpider - 🕷 A lightning-fast multithreaded network scanner framework with modules.

  •    Python

I'm not responsible for anything you do with this program, so please only use it for good and educational purposes. Copyright (c) 2017 by David Schütz. Some rights reserved.

marvel-characters - 💥 All Marvel comic book characters

  •    Javascript

A list of all public comic book character names in the Marvel universe, sourced from the API. Total characters: 1,252. Last updated: Sunday, July 19th, 2015.

snap - Creates a static snapshot of a website. Sort of like wget's mirror mode, but with nice urls

  •    Javascript

Like wget -r <url>, but specifically designed to support "pretty" URLs. With wget, a URL pointing to /foo would result in /foo.html, which means the URL has changed. snap instead creates the directory /foo and saves the file as /foo/index.html, so the URL /foo still works.
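The URL-to-file mapping described above can be sketched as follows. This is a hypothetical Python helper, not snap's code, and it returns paths relative to the snapshot's output directory. Treating paths that already carry a file extension (such as `.css`) as plain files is an assumption added for illustration.

```python
import posixpath
from urllib.parse import urlparse


def snapshot_path(url: str) -> str:
    """Map a URL to a file path that keeps "pretty" URLs working when served statically.

    /foo          -> foo/index.html
    /foo/         -> foo/index.html
    /             -> index.html
    /css/site.css -> css/site.css  (assumption: paths with an extension are kept as-is)
    """
    path = (urlparse(url).path or "/").strip("/")
    if not path:
        return "index.html"
    if posixpath.splitext(path)[1]:
        return path
    return posixpath.join(path, "index.html")


print(snapshot_path("http://example.com/foo"))  # foo/index.html
```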

voxel-spider - blocky spider creatures for your voxel.js game

  •    Javascript

Returns a function, createSpider, from the voxel-engine instance game. Calling createSpider creates a spider.

spider-pig - Get a list of local URL links from a root URL.

  •    Javascript

Get a list of local URL links from a root URL. Works with JavaScript-generated content. Can also act as a live-DOM CSS search across multiple files (find all the templates that are using the CSS selector I want to change). Normalizes all of the matching URLs to full absolute URLs (including protocol, host, path, etc.).
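Normalizing links and keeping only the local ones can be sketched with Python's `urllib.parse`. The `local_links` helper is hypothetical and is not spider-pig's API; it assumes "local" means an http(s) URL on the same host as the root URL, drops `#fragment` parts as a further assumption, and does not execute JavaScript, which spider-pig can.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def local_links(root_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against root_url and keep unique absolute URLs on the same host."""
    root_host = urlparse(root_url).netloc
    result = []
    seen = set()
    for href in hrefs:
        # Make the link absolute, then strip any #fragment.
        absolute, _fragment = urldefrag(urljoin(root_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == root_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(local_links("https://example.com/docs/",
                  ["intro", "/about#team", "https://other.org/x", "mailto:a@b.c", "../contact"]))
# ['https://example.com/docs/intro', 'https://example.com/about', 'https://example.com/contact']
```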

node-tarantula - web crawler/spider for nodejs

  •    Javascript

A Node.js crawler/spider that provides a simple interface for crawling the web. Its API is inspired by crawler4j.

Laravel-Crawler-Detect - A Laravel wrapper for CrawlerDetect - the web crawler detection library

  •    PHP

Run composer require jaybizzle/laravel-crawler-detect 1.* or add "jaybizzle/laravel-crawler-detect": "1.*" to your composer.json file. The last version compatible with Laravel 4 was v1.0.2, so if you need Laravel 4 support, you will have to pin your composer.json to that specific version.

phanos - A simple stress-test tool that just walks around your site.

  •    Javascript

A simple, human-like stress-test tool. It doesn't provide any statistics or special logging functionality; it just walks around your site like a real user.

node-krawler - Fast and lightweight web crawler with built-in cheerio, xml and json parser.

  •    Javascript

mikeal/request is used for fetching web pages, so any option supported by that package can be passed to Krawler's constructor. After Krawler emits the 'data' event, it automatically continues to the next URL, whether or not the result has been processed. If you want full control over result handling, you can turn on the custom callback option and control the program flow by invoking your callback. Don't forget to call it in every case; otherwise the queue will get stuck.
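The custom callback mode amounts to a queue that only advances once the handler signals completion. The following Python sketch illustrates that control flow under stated assumptions: `CallbackQueue`, `fetch` and `handler` are hypothetical names, fetching is a synchronous caller-supplied function rather than mikeal/request, and this is not Krawler's actual implementation.

```python
from collections import deque


class CallbackQueue:
    """Hypothetical crawl queue that advances only when the handler calls done()."""

    def __init__(self, urls, fetch, handler):
        self._pending = deque(urls)
        self._fetch = fetch        # fetch(url) -> page data (caller-supplied)
        self._handler = handler    # handler(url, data, done)
        self._ready = True         # True once the previous item has called done()
        self._pumping = False      # re-entrancy guard
        self.processed = []

    def start(self):
        self._pump()

    @property
    def stalled(self):
        """True while the most recently dispatched URL has not called done()."""
        return not self._ready

    def _make_done(self, url):
        def done():
            self.processed.append(url)
            self._ready = True
            self._pump()
        return done

    def _pump(self):
        if self._pumping:
            return  # done() was called synchronously; the loop below keeps going
        self._pumping = True
        try:
            while self._ready and self._pending:
                self._ready = False
                url = self._pending.popleft()
                self._handler(url, self._fetch(url), self._make_done(url))
        finally:
            self._pumping = False


PAGES = {"a": "<html>A</html>", "b": "<html>B</html>", "c": "<html>C</html>"}


def forgetful_handler(url, data, done):
    if url != "b":  # forgetting done() for "b" leaves the queue stuck
        done()


q = CallbackQueue(["a", "b", "c"], PAGES.get, forgetful_handler)
q.start()
print(q.processed, q.stalled)  # ['a'] True
```

Because the handler receives `done` rather than returning a value, it may also store `done` and call it later, for example after asynchronous processing; the queue simply waits until then.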