x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

Styleguide

  •    Javascript

Devbridge Styleguide helps you create, share, and automate a living visual style library of your brand. Share your digital brand standards, improve team collaboration, and implement an independent, easily extendable modular structure. Note: do not download files directly from the Git repository unless you know what you are doing.

cloudflare-scrape - A Python module to bypass Cloudflare's anti-bot page.

  •    Python

A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes its techniques periodically, so I will update this repo frequently. This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks whether the client supports JavaScript, though they may add additional techniques in the future.

scrape-it - A Node.js scraper for humans.

  •    Javascript

A Node.js scraper for humans. Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.

autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  •    Python

This project is made for automatic web scraping, to make scraping easy. It takes a URL or the HTML content of a web page, plus a list of sample data we want to scrape from that page. The samples can be text, a URL, or any HTML tag value on that page. It learns the scraping rules and returns similar elements. You can then use the learned object with new URLs to get similar content, or the exact same elements, from those new pages. It is compatible with Python 3.
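To make the idea of learning rules from sample data concrete, here is a toy sketch in Python using only the standard library's html.parser. It is not autoscraper's algorithm or API: the names learn_rules and apply_rules are hypothetical, and the sketch matches sample text only (not URLs or attribute values). It records the tag-and-class path above each text node that equals a sample, then reuses those paths on another page.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TextCollector(HTMLParser):
    """Records each non-empty text node with the (tag, class) path above it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, class) pairs
        self.nodes = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def _text_nodes(html):
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_rules(html, samples):
    """Hypothetical: return the element paths whose text equals a sample."""
    wanted = set(samples)
    return {path for path, text in _text_nodes(html) if text in wanted}


def apply_rules(html, rules):
    """Hypothetical: return every text node (in order) found at a learned path."""
    return [text for path, text in _text_nodes(html) if path in rules]


page1 = """
<ul>
  <li><a class="title" href="/a">Alpha</a></li>
  <li><a class="title" href="/b">Beta</a></li>
</ul>
<p class="footer">Contact</p>
"""
page2 = """
<ul>
  <li><a class="title" href="/c">Gamma</a></li>
  <li><a class="title" href="/d">Delta</a></li>
</ul>
"""

rules = learn_rules(page1, ["Alpha"])   # learn from one sample value
print(apply_rules(page2, rules))        # ['Gamma', 'Delta']
```

Keying rules on the full ancestor path keeps the sketch simple; a real tool would need to tolerate small structural differences between pages.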

metascraper - Easily scrape data from websites using Open Graph, HTML metadata & fallbacks.

  •    HTML

A library to easily scrape metadata from an article on the web using Open Graph, JSON-LD, regular HTML metadata, and a series of fallbacks.
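As an illustration of "Open Graph first, regular HTML metadata as a fallback", here is a minimal Python sketch built on the standard library's html.parser. It is not metascraper's API (metascraper is a JavaScript library); the function scrape_metadata and the fallback order it uses are assumptions, and JSON-LD parsing is left out.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collects <meta> name/property values and the first <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # lower-cased name/property -> content
        self.title = None     # text of the first <title>, if any
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = (a.get("property") or a.get("name") or "").lower()
            content = a.get("content")
            # Keep the first occurrence of each key.
            if key and content and key not in self.meta:
                self.meta[key] = content.strip()
        elif tag == "title" and self.title is None:
            self.title = ""
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _first(*candidates):
    """Return the first non-empty candidate, or None."""
    for value in candidates:
        if value:
            return value
    return None


def scrape_metadata(html):
    """Hypothetical: Open Graph values first, plain HTML metadata as fallback."""
    parser = _MetaCollector()
    parser.feed(html)
    parser.close()
    m = parser.meta
    title_tag = parser.title.strip() if parser.title else None
    return {
        "title": _first(m.get("og:title"), title_tag),
        "description": _first(m.get("og:description"), m.get("description")),
        "image": _first(m.get("og:image")),
    }


doc = """<html><head>
<title>Plain Title</title>
<meta name="description" content="From the description tag">
<meta property="og:title" content="OG Title">
</head><body></body></html>"""

print(scrape_metadata(doc))
# {'title': 'OG Title', 'description': 'From the description tag', 'image': None}
```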

node-google - A Node.js module to search and scrape Google.

  •    Javascript

This module allows you to search Google by scraping the results. It does NOT use the Google Search API. PLEASE DO NOT ABUSE THIS. It is intended as a convenient alternative to the cruft that exists in the Google Search API. This is not sponsored, supported, or affiliated with Google Inc.

wring - Extract content from webpages using CSS Selectors, XPath, and JS expressions

  •    PureScript

Wring utilizes PhantomJS for some of its commands. To use these, install it using your system package manager by running something like brew install phantomjs on OS X, or apt-get install phantomjs on Ubuntu. You can make sure it's on your PATH by running phantomjs -v.

node-readability - Scrape/Crawl article from any site automatically

  •    Javascript

In my case, the spider processes about 1,500k documents per day, with a maximum crawling speed of 1.2k documents per minute and an average of 1k per minute. Memory use is about 200 MB per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. In my experience, it's better than any other readability module.

scrape-url - Scrape URLs with CSS selectors

  •    Javascript

Scrapes URLs with CSS selectors and returns the elements with a jQuery-like interface. See example.js for more information.

cheerio-advanced-selectors - Add advanced selector support to cheerio

  •    Javascript

This module is inspired by cheerio-eq, with added support for many different selectors. Gotcha: the result returned from .load() isn't a cheerio object but a custom function used to wrap the cheerio-advanced-selectors logic (see issue 2).

awwwards-stream - scrape Awwwards data

  •    Javascript

Creates a readable stream of Awwwards.com data by scraping their HTML. ⚠️ This is fragile and should only be used for offline experimentation or artistic purposes. It is not an official API, and you should rate-limit your requests to reduce the load on the Awwwards servers. It may break at any point and should not be used in a live Node.js server.
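One simple way to rate-limit requests is to enforce a minimum interval between them. The Python sketch below is generic and not part of awwwards-stream; the Throttle class and the interval you pass it are hypothetical. The clock and sleep functions are injectable so the timing logic can be checked without real delays.

```python
import time


class Throttle:
    """Keeps successive calls to wait() at least `min_interval` seconds apart.

    `clock` and `sleep` default to time.monotonic and time.sleep; they can be
    replaced with fakes so the timing logic is testable without real delays.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None   # clock reading when the previous request was allowed

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() right before each HTTP request; for example, Throttle(min_interval=2.0) allows at most one request every two seconds.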

node-linkscrape - A Node.js module to scrape and normalize links from an HTML string.

  •    Javascript

This module scrapes links from an HTML string and normalizes them. It does not perform the HTTP request itself; use superagent or request for that. You must pass the URL the HTML string came from to the scrape() method so that it can normalize the links.
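The source URL is required because relative hrefs can only be resolved against the page they came from. Here is a minimal Python sketch of that idea, using html.parser and urllib.parse.urljoin from the standard library. The function scrape_links is hypothetical and is not node-linkscrape's scrape() method; its normalization rules (drop fragments, keep only http and https links, remove duplicates) are assumptions, and it ignores any <base href> element.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Collects the raw href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href.strip())


def scrape_links(html, base_url):
    """Hypothetical: return absolute, de-duplicated http(s) links from `html`.

    `base_url` is the address the HTML came from; relative hrefs are resolved
    against it. No HTTP request is made here.
    """
    parser = _HrefCollector()
    parser.feed(html)
    parser.close()
    seen, links = set(), []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue   # skip mailto:, javascript:, etc.
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


sample_html = ('<a href="/about">About</a> <a href="post.html#top">Post</a> '
               '<a href="https://other.org/x">X</a> <a href="/about">Again</a> '
               '<a href="mailto:me@example.com">Mail</a>')
print(scrape_links(sample_html, "https://example.com/blog/index.html"))
# ['https://example.com/about', 'https://example.com/blog/post.html', 'https://other.org/x']
```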

npm-user - Get user info of an npm user

  •    Javascript

Since npm has no API for this, we're forced to scrape the profile page. Use the faster npm-email if you only need the email.