spider - Unsurprising JavaScript - No longer active

  •    Javascript

The Next-Gen Programming Language for the Web. Note: This project is no longer active.

huntsman - Super configurable async web spider

  •    Javascript

Huntsman takes one or more 'seed' URLs via the spider.queue.add() method. Once the process is kicked off with spider.start(), it takes care of extracting links from each page and following only the pages you want.
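
To illustrate the seed-and-follow idea, here is a minimal breadth-first crawl sketch in plain JavaScript. It does not use Huntsman's API: the in-memory `site` map stands in for HTTP fetching, and `extractLinks`, `crawl`, and the `allow` filter regex are hypothetical names chosen for this example.

```javascript
// Minimal breadth-first crawl sketch (NOT Huntsman's API).
// Assumption: pages live in an in-memory map instead of being fetched over HTTP.
const site = {
  "http://example.com/": '<a href="/a">A</a> <a href="/admin">Admin</a>',
  "http://example.com/a": '<a href="/b">B</a> <a href="http://other.org/">Out</a>',
  "http://example.com/b": '<a href="/a">A again</a>',
};

// Extract href values with a simple regex (a real crawler would use an HTML parser)
// and resolve each one against the page URL.
function extractLinks(html, baseUrl) {
  const links = [];
  for (const match of html.matchAll(/href="([^"]+)"/g)) {
    links.push(new URL(match[1], baseUrl).href);
  }
  return links;
}

// Crawl outward from the seed URLs, following only links that match `allow`.
function crawl(seeds, allow) {
  const queue = [...seeds];
  const seen = new Set(seeds);
  const visited = [];
  while (queue.length > 0) {
    const url = queue.shift();
    const html = site[url];
    if (html === undefined) continue; // unknown page: skip it
    visited.push(url);
    for (const link of extractLinks(html, url)) {
      if (allow.test(link) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}

// Follow same-site links except anything under /admin.
console.log(crawl(["http://example.com/"], /^http:\/\/example\.com\/(?!admin)/));
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, as /b does to /a here.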

glyphhanger - Your web font utility belt

  •    Javascript

Your web font utility belt. It shows which unicode-ranges are used on a website (optionally for a given font-family, or for each font-family). It can also subset web fonts. It makes julienne fries. Available on npm.
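
A CSS unicode-range value is a comma-separated list of code points and code-point ranges. The sketch below shows one way to derive such a value from the characters in a string; `toUnicodeRange` is a hypothetical helper, not glyphhanger's implementation.

```javascript
// Sketch: turn the characters used in a string into a CSS unicode-range value.
// Illustrative only; this is not how glyphhanger itself is implemented.
function toUnicodeRange(text) {
  // Array.from iterates by code point, so astral characters (e.g. emoji) stay whole.
  const points = [...new Set(Array.from(text, (ch) => ch.codePointAt(0)))]
    .sort((a, b) => a - b);
  if (points.length === 0) return "";

  const hex = (n) => n.toString(16).toUpperCase();
  const format = (start, end) =>
    start === end ? `U+${hex(start)}` : `U+${hex(start)}-${hex(end)}`;

  // Merge runs of consecutive code points into start-end ranges.
  const ranges = [];
  let start = points[0];
  let prev = points[0];
  for (const cp of points.slice(1)) {
    if (cp !== prev + 1) {
      ranges.push(format(start, prev));
      start = cp;
    }
    prev = cp;
  }
  ranges.push(format(start, prev));
  return ranges.join(",");
}

console.log(toUnicodeRange("abcx")); // prints U+61-63,U+78
```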

AlipayOrdersSupervisor - Uses Node to monitor Alipay orders and instantly notify the server, implementing a contract-free payment interface

  •    Javascript

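An Alipay contract-free payment interface script, Node.js version. Alipay has since tightened login verification, which greatly reduces the tool's convenience, so an alternative solution is now available: see "Personal payment collection using Youzan Cloud and Youzan Micro Store", which offers an approach for reference. You can apply the method that repository uses directly to your own system, or run that repository as a standalone service.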

node-readability - Scrape/crawl articles from any site automatically

  •    Javascript

In my case, the spider handles about 1,500k documents per day, with a maximum crawling speed of 1.2k per minute and an average of 1k per minute. Memory usage is about 200 MB per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. In my experience, it works better than other readability modules.

dhtspider - BitTorrent DHT network spider

  •    Javascript

BitTorrent DHT network infohash spider, built for engiy.com (a BitTorrent resource search engine).

marvel-characters - All Marvel comic book characters

  •    Javascript

A list of all public comic book character names in the Marvel universe, sourced from the API. Total characters: 1,252. Last updated: Sunday, July 19th, 2015.

snap - Creates a static snapshot of a website. Sort of like wget's mirror mode, but with nice URLs

  •    Javascript

Like wget -r <url>, but specifically designed to support "pretty" URLs. With wget, a URL pointing to /foo would result in /foo.html, which means the URL has changed. Snap instead creates the directory /foo and saves the file to /foo/index.html, so the URL /foo still works.
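
The /foo to /foo/index.html mapping can be sketched as a small path function. `urlToFilePath` is hypothetical rather than snap's code, and saving a last segment that contains a dot (such as style.css) unchanged is an assumption made for this example.

```javascript
// Sketch of "pretty URL" snapshot mapping: /foo is saved as foo/index.html so a
// static file server still answers /foo. Hypothetical helper, not snap's code.
function urlToFilePath(url) {
  const { pathname } = new URL(url);
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1];

  // Assumption: a last segment with a dot (style.css, logo.png) is a real file.
  if (last !== undefined && last.includes(".")) {
    return segments.join("/");
  }
  // Everything else becomes a directory containing index.html.
  return [...segments, "index.html"].join("/");
}

console.log(urlToFilePath("http://example.com/foo")); // prints foo/index.html
console.log(urlToFilePath("http://example.com/"));    // prints index.html
```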

voxel-spider - blocky spider creatures for your voxel.js game

  •    Javascript

Returns a function, createSpider, from the voxel-engine instance game. Call createSpider to create a spider.

spider2 - A second-generation spider that crawls any article site, automatically reading titles and articles.

  •    Javascript

A second-generation spider that crawls any article site, automatically reading the title and content. In my case, the spider handles about 700 thousand documents per day (22 million per month), with a maximum crawling speed of 450 per minute and an average of 80 per minute. Memory usage is about 200 megabytes per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. In my experience, it works better than other readability modules.

spider-pig - Get a list of local URL links from a root URL.

  •    Javascript

Get a list of local URL links from a root URL. Works with JavaScript-generated content. Can also act as a live-DOM CSS search across multiple files (find all the templates that use the CSS selector you want to change). Normalizes all matching URLs to full absolute URLs (including protocol, host, and path).
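
Resolving links against the page URL and keeping only the local ones can be sketched with the WHATWG `URL` class built into Node.js. `localAbsoluteLinks` is a hypothetical helper, not spider-pig's internals; treating "local" as same-origin and dropping #fragments are assumptions.

```javascript
// Sketch: normalize hrefs to absolute URLs and keep only local (same-origin) ones.
// Hypothetical helper; illustrates the normalization step, not spider-pig's code.
function localAbsoluteLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const result = new Set(); // a Set removes duplicates after normalization
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative paths like "guide" or "/about"
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.origin !== base.origin) continue; // external, mailto:, javascript:, ...
    url.hash = ""; // assumption: /page#a and /page#b count as the same page
    result.add(url.href);
  }
  return [...result];
}

console.log(localAbsoluteLinks("https://example.com/docs/intro", [
  "guide", "/about#team", "https://example.com/about", "https://other.org/",
]));
```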

node-tarantula - web crawler/spider for Node.js

  •    Javascript

A Node.js crawler/spider that provides a simple interface for crawling the Web. Its API is inspired by crawler4j.

phanos - A simple stress test tool that just walks around your site.

  •    Javascript

A simple, human-like stress test tool. It doesn't provide any statistics or special logging functionality; it simply walks through your site the way a real user would.

wanimal-spider - An image spider for a certain Lofter Blog called WANIMAL.

  •    Javascript

An image spider for a certain Lofter blog called WANIMAL. Note: you should redirect STDOUT using >.

scrap.js - Scraping tool for Node.js

  •    Javascript

One example uses jQuery to traverse the page and shows how to download binary files. Another shows how to download the page as a string and use regular expressions with jsMatch to extract meaningful parts.

node-krawler - Fast and lightweight web crawler with built-in cheerio, XML, and JSON parsers.

  •    Javascript

mikeal/request is used for fetching web pages, so any option that package accepts can be passed to Krawler's constructor. After Krawler emits the 'data' event, it automatically continues to the next URL, regardless of whether the result has been processed. If you want full control over result handling, turn on the custom callback option; you then control the program flow by invoking your callback. Be sure to call it in every case; otherwise the queue will get stuck.
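
Callback-driven flow control of this kind follows a common pattern: the next URL is taken only after the handler calls its callback. The `CallbackQueue` class below is a hypothetical sketch of that pattern, not node-krawler's API; it also shows why a handler that never calls `done()` leaves the queue stuck.

```javascript
// Sketch of callback-driven flow control: the next item starts only after the
// handler calls done(). Hypothetical class, not node-krawler's API.
class CallbackQueue {
  constructor(items, handler, onFinish) {
    this.items = [...items];
    this.handler = handler;
    this.onFinish = onFinish;
  }

  start() {
    this.next();
  }

  next() {
    if (this.items.length === 0) {
      this.onFinish();
      return;
    }
    const item = this.items.shift();
    let called = false;
    const done = () => {
      if (called) return; // ignore duplicate calls for the same item
      called = true;
      setImmediate(() => this.next()); // defer to avoid deep recursion
    };
    // If the handler never calls done(), next() is never reached again: the queue stalls.
    this.handler(item, done);
  }
}

const processed = [];
new CallbackQueue(
  ["/a", "/b", "/c"],
  (url, done) => {
    processed.push(url);
    setTimeout(done, 10); // simulate asynchronous result handling
  },
  () => console.log("finished:", processed)
).start();
```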

email-extractor - Extract emails from an entire website

  •    Javascript

es6-crawler-detect - An ES6 adaptation of the original PHP library CrawlerDetect that helps you detect bots/crawlers/spiders via the user agent

  •    Javascript

This library is an ES6 version of the original PHP class @CrawlerDetect. It helps you detect bots, crawlers, and spiders by scanning only the user-agent string, either passed directly or read from the global request.headers. If you find a bot/spider/crawler user agent that CrawlerDetect fails to detect, please submit a pull request with the regex pattern added to the data array in ./crawler/crawlers.js.
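
User-agent detection of this kind comes down to testing the user-agent string against one combined regular expression. The pattern list below is a tiny hypothetical sample, not the contents of ./crawler/crawlers.js, and `isCrawler` is an illustrative function rather than es6-crawler-detect's API.

```javascript
// Sketch of user-agent based bot detection. The patterns are a tiny hypothetical
// sample, not the data array shipped in ./crawler/crawlers.js.
const CRAWLER_PATTERNS = ["googlebot", "bingbot", "crawler", "spider", "curl\\/"];
// One case-insensitive alternation; no "g" flag, so test() keeps no state between calls.
const crawlerRegex = new RegExp(CRAWLER_PATTERNS.join("|"), "i");

// Accepts a raw user-agent string or a headers object
// (Node.js lowercases incoming header names, hence "user-agent").
function isCrawler(source) {
  const ua = typeof source === "string" ? source : (source && source["user-agent"]) || "";
  return crawlerRegex.test(ua);
}

console.log(isCrawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // prints true
console.log(isCrawler({ "user-agent": "curl/8.4.0" })); // prints true
```

Adding support for a new bot in this scheme means appending one pattern to the list, which mirrors the pull-request workflow the project describes.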