
Styleguide

  •    Javascript

Devbridge Styleguide helps you create, share, and automate a living visual style library of your brand. Share your digital brand standards, improve team collaboration, and implement an independent, easily extendable modular structure. Note: do not download files directly from the Git repository unless you know what you are doing.

weixin - A collection of helper tools for WeChat mini-games (加减大师, 包你懂我, 大家来找茬腾讯版, 头脑王者, 好友画我, 悦动音符, 我最在行, 星途WeGoing, 猜画小歌, 知乎答题王, 腾讯中国象棋, 跳一跳, 题多多黄金版)

  •    Javascript

A collection of helper tools for WeChat mini-games (加减大师, 包你懂我, 大家来找茬腾讯版, 头脑王者, 好友画我, 悦动音符, 我最在行, 星途WeGoing, 猜画小歌, 知乎答题王, 腾讯中国象棋, 跳一跳, 题多多黄金版).

TheA11yMachine - The A11y Machine is an automated accessibility testing tool which crawls and tests pages of any web application to produce detailed reports

  •    Javascript

Accessibility is not only a concern for disabled people. Bots such as DuckDuckGo, Google, or Bing can be considered disabled users too, so by respecting accessibility standards you're likely to get a better ranking. It also helps you clean up your code. Accessibility issues are often left unaddressed for budget reasons; in fact, most of the cost is spent looking for errors on your website. The A11y Machine greatly helps with this task, so you can focus on fixing your code and reap the benefits. If you would like to validate your pages against the HTML5 recommendation, you need to install Java.




huntsman - Super configurable async web spider

  •    Javascript

Huntsman takes one or more 'seed' URLs with the spider.queue.add() method. Once the process is kicked off with spider.start(), it takes care of extracting links from each page and following only the pages we want.
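
The seed-and-follow flow described above can be sketched without the library. The snippet below is a minimal, self-contained illustration and not Huntsman's implementation: `pages`, `crawl`, and `shouldFollow` are hypothetical names, and an in-memory object stands in for real HTTP fetches.

```javascript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
const pages = {
  "/": ["/wiki/A", "/about"],
  "/wiki/A": ["/wiki/B", "/"],
  "/wiki/B": ["/wiki/A", "/login"],
  "/about": [],
  "/login": [],
};

// Breadth-first crawl: start from the seed URLs, extract links from each
// page, and enqueue only the links accepted by the shouldFollow predicate.
function crawl(seeds, shouldFollow) {
  const queue = [...seeds];
  const seen = new Set(seeds); // avoids visiting the same URL twice
  const visited = [];
  while (queue.length > 0) {
    const url = queue.shift();
    visited.push(url);
    for (const link of pages[url] || []) {
      if (!seen.has(link) && shouldFollow(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}

// Seed with "/" and follow only URLs under /wiki/.
console.log(crawl(["/"], (url) => url.startsWith("/wiki/")));
```

The predicate plays the role of the "only the pages we want" filter; a real spider would replace the `pages` lookup with a fetch and a link extractor.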

node-readability - Scrape/Crawl articles from any site automatically

  •    Javascript

In my case, the spider processes about 1,500k documents per day, with a maximum crawling speed of 1.2k per minute and an average of 1k per minute. Memory cost is about 200 MB per spider kernel, and the accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  •    Python

grab-site is an easy, preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses wpull for crawling. It provides a dashboard with all of your crawls, showing which URLs are being grabbed, how many URLs are left in the queue, and more.


github-dependency-crawl - Crawl GitHub issues to build a dependency graph

  •    Javascript

Crawl GitHub issues to build a dependency graph, where keys indicate issues in the graph and each key maps to a list of its dependencies.
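
As a rough illustration of that graph shape, the sketch below uses a plain object whose keys are issues and whose values are dependency lists. The key format and the `transitiveDeps` helper are assumptions made for this example, not the module's actual output format or API.

```javascript
// Hypothetical graph: each key is an issue, each value lists the issues it depends on.
const graph = {
  "org/repo/1": ["org/repo/2", "org/repo/3"],
  "org/repo/2": ["org/repo/3"],
  "org/repo/3": [],
};

// Collect every issue reachable from `issue`, i.e. its direct and
// indirect dependencies. The `seen` set also guards against cycles.
function transitiveDeps(graph, issue, seen = new Set()) {
  for (const dep of graph[issue] || []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      transitiveDeps(graph, dep, seen);
    }
  }
  return [...seen];
}

console.log(transitiveDeps(graph, "org/repo/1"));
```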

snap - Creates a static snapshot of a website. Sort of like wget's mirror mode, but with nice URLs

  •    Javascript

Like wget -r <url> but specifically designed to support "pretty" URLs. With wget, a URL pointing to /foo would result in /foo.html, which means the URL has now changed. snap instead creates the directory /foo and saves the file to /foo/index.html, so the URL /foo still works.
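
The difference between the two layouts can be sketched as a pair of path-mapping functions. This is only an illustration of the idea under simple assumptions: `flatPath` and `prettyPath` are hypothetical names, not part of snap, and URLs that already carry a file extension (such as stylesheets) are not handled.

```javascript
// Flat layout: "/foo" is saved as "/foo.html", so the URL that serves it changes.
function flatPath(urlPath) {
  const trimmed = urlPath.replace(/\/+$/, ""); // drop trailing slashes
  return trimmed === "" ? "/index.html" : trimmed + ".html";
}

// Directory layout: "/foo" is saved as "/foo/index.html", so "/foo" keeps working.
function prettyPath(urlPath) {
  const trimmed = urlPath.replace(/\/+$/, "");
  return trimmed + "/index.html";
}

for (const p of ["/", "/foo", "/docs/intro/"]) {
  console.log(p, "->", flatPath(p), "|", prettyPath(p));
}
```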

spider2 - A 2nd generation spider to crawl any article site, automatically reading the title and article.

  •    Javascript

A 2nd generation spider to crawl any article site, automatically reading the title and content. In my case, the spider processes about 700 thousand documents per day (22 million per month), with a maximum crawling speed of 450 per minute and an average of 80 per minute. Memory cost is about 200 megabytes per spider kernel, and the accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

node-amazon-products - A node.js module to crawl product IDs from Amazon.

  •    CoffeeScript

A node.js module to crawl product IDs from Amazon. The Amazon Product Advertising API can access only pages 1 to 10. Using this module, you can retrieve pages 1 to 400, as on an Amazon product list page.

node-amazon-reviews - A node.js module to crawl product reviews from Amazon.

  •    CoffeeScript

A node.js module to crawl product reviews from Amazon. The Amazon Product Advertising API provides almost all attributes of a product, but review data cannot be gathered through the API. Use this module if you want to get product reviews.

crawl - Lightweight library for scalable crawlers in Go.

  •    Go

Lightweight library for crawlers in Go. HTML parsing and extracting is done thanks to goquery.

crawl - Utility to crawl and diff websites for node.js

  •    Javascript

NOTE: This project is no longer being maintained by me. If you are interested in taking over maintenance of this project, let me know. Crawl, as its name implies, will crawl around a website, discovering all of the links and their relationships starting from a base URL. The output of crawl is a JSON object representing a sitemap of every resource within a site, including each link's outbound references and any inbound referrers.
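
To make the idea of outbound references and inbound referrers concrete, here is a small sketch that derives both from a map of discovered links. The `outbound`/`inbound` field names and the `buildSitemap` helper are assumptions for illustration; they are not the actual schema of crawl's JSON output.

```javascript
// Hypothetical links discovered while crawling from a base URL:
// each page maps to the pages it links to.
const links = {
  "/": ["/a", "/b"],
  "/a": ["/b"],
  "/b": ["/"],
};

// Build a sitemap-like object recording, for each resource, its outbound
// references and the pages that refer to it (inbound referrers).
function buildSitemap(links) {
  const sitemap = {};
  const entry = (url) => {
    if (!sitemap[url]) sitemap[url] = { outbound: [], inbound: [] };
    return sitemap[url];
  };
  for (const [from, targets] of Object.entries(links)) {
    entry(from).outbound = [...targets];
    for (const to of targets) entry(to).inbound.push(from);
  }
  return sitemap;
}

console.log(JSON.stringify(buildSitemap(links), null, 2));
```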

diffbot-php-client - The official Diffbot client library

  •    PHP

This package is a slightly overengineered Diffbot API wrapper. It uses PSR-7 and PHP-HTTP friendly client implementations to make API calls. To learn more about Diffbot see here and their homepage. Right now it only supports Analyze, Product, Image, Discussion, Crawl, Search, and Article APIs, but can also accommodate Custom APIs. Video and Bulk API support coming soon. Full documentation available here.

parsehub - ParseHub Node.js Client

  •    Javascript

This is the unofficial Node.js client for ParseHub, a platform for scraping websites and creating APIs out of the extracted data. Require the module and initialize an instance using your ParseHub API key.

phantomCrawl

  •    Javascript

This project is licensed under the MIT Licence. See LICENCE.txt for details.