Displaying 1 to 20 from 48 results

x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Looking for a career upgrade? Check out the available Node.js & Javascript positions at these innovative companies.Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

Styleguide

  •    Javascript

Devbridge Styleguide helps you create, share, and automate a living visual style library of your brand. Share your digital brand standards, improve team collaboration, and implement an independent easily-extendable modular structure. Note, do not download files directly from git repository, unless you know what you are doing.

cloudflare-scrape - A Python module to bypass Cloudflare's anti-bot page.

  •    Python

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes their techniques periodically, so I will update this repo frequently. This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports Javascript, though they may add additional techniques in the future.

scrape-it - :crystal_ball: A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long you add a link to your Stack Overflow question.




node-google - A Node.js module to search and scrape Google.

  •    Javascript

This module allows you to search google by scraping the results. It does NOT use the Google Search API. PLEASE DO NOT ABUSE THIS. The intent of using this is convenience vs the cruft that exists in the Google Search API.This is not sponsored, supported, or affiliated with Google Inc.

wring - Extract content from webpages using CSS Selectors, XPath, and JS expressions

  •    PureScript

Wring utilizes PhantomJS for some of its commands. To use these, install it using your system package manager by running something like brew install phantomjs on OS X, or apt-get install phantomjs on Ubuntu. You can make sure it's on your PATH by running phantomjs -v.

node-readability - Scrape/Crawl article from any site automatically

  •    Javascript

In my case, the speed of spider is about 1500k documents per day, and the maximize crawling speed is 1.2k /minute, avg 1k /minute, the memory cost are about 200 MB on each spider kernel, and the accuracy is about 90%, the rest 10% can be fixed by customizing Score Rules or Selectors. it's better than any other readability modules.


scrape-url - Scrape URLs with CSS selectors

  •    Javascript

Scrape URLs with CSS selectors and returns elements with jQuery-like interface.See example.js for more information.

cheerio-advanced-selectors - Add advanced selector support to cheerio

  •    Javascript

This module is inspired by cheerio-eq with the added support for many different selectors.Gotcha: The result returned from .load() isn't a cheerio object but a custom function used to wrap the cheerio-advanced-selector logic (see issue 2).

awwwards-stream - scrape Awwwards data

  •    Javascript

Creates a readable stream of Awwwards.com data by scraping their HTML.⚠️ This is fragile and should only be used for offline experimentation / artistic purposes. It is not an official API and you should rate limit your requests to keep stress off the Awwwards servers. It may break at any point and should not be used in a live Node.js server.

node-linkscrape - A Node.js module to scrape and normalize links from an HTML string.

  •    Javascript

This module allows scrapes links from an HTML string and normalizes them. It does not actually perform the HTTP request. Use superagent or request for that.You must pass in the URL (of where the HTML string came from) to the scrape() method so that it can normalize the links.

npm-user - Get user info of an npm user

  •    Javascript

Since npm has no API for this we're forced to scrape the profile page.Use the faster npm-email if you only need the email.

metahub - github metadata cache/mirror

  •    Javascript

A per-repo, always-up-to-date cache of Github's meta-data (like issues, PRs, and comments). Great for speeding up requests and avoiding running over GitHub's API limits.Long-running services that want to frequently calculate/respond to metrics based on Github Issues, PRs, and comments. Metahub was created for Mary Poppins, a tool for helping to manage issues and PRs on populat Github repos.