
x-ray - The next web scraper. See through the <html> noise.

  •    Javascript

Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

wring - Extract content from webpages using CSS Selectors, XPath, and JS expressions

  •    PureScript

Wring utilizes PhantomJS for some of its commands. To use these, install it using your system package manager by running something like brew install phantomjs on OS X, or apt-get install phantomjs on Ubuntu. You can make sure it's on your PATH by running phantomjs -v.

cheerio-advanced-selectors - Add advanced selector support to cheerio

  •    Javascript

This module is inspired by cheerio-eq, with added support for many different selectors. Gotcha: The result returned from .load() isn't a cheerio object but a custom function used to wrap the cheerio-advanced-selectors logic (see issue 2).

cheerio-eq - Add :eq() selector functionality to cheerio

  •    Javascript

Add :eq() selector functionality to cheerio. If you are looking for a more advanced solution with support for multiple advanced selectors, check out the cheerio-advanced-selectors module.

node-scrap - A simple screen scraper module that uses jQuery style semantics.

  •    Javascript

A simple screen scraper module that uses jQuery style semantics. In every screen scraper program that I wrote, I had to include request and cheerio. I would then have to check the response error object and the response code. It became a bit annoying. Hence this package.

grunt-dom-munger - Grunt task to read and manipulate HTML with CSS selectors.

  •    Javascript

Read and manipulate HTML with CSS selectors. For example, read <script> tags from your HTML. Remove nodes, add nodes, and more.

node-nom - Dead simple site scraper for Node.js

  •    Javascript

Om nom nom. Super simple asynchronous screen scraper for Node.js. Nom uses cheerio to provide the core jQuery API for grabbing and manipulating the response.

marky-markdown-lite - A version of marky-markdown that does less.

  •    Javascript

A version of marky-markdown that does less. This little module converts markdown to HTML with markdown-it (a fast and CommonMark compliant parser), then parses that HTML into a queryable DOM object using cheerio.

domtosource - This JavaScript module wraps around Cheerio and magically calculates the line and column numbers where DOM elements appear in the HTML source code

  •    HTML

This module wraps around Cheerio and magically calculates the line and column numbers where DOM elements appear in the HTML source code. Its domtosource.find() function takes two parameters.
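A minimal sketch of the line-and-column idea, assuming the character offset of an element in the source string is already known. The helper name `offsetToLineColumn` is hypothetical and written only for illustration; it is not part of domtosource or Cheerio, and the module's real implementation may work differently.

```javascript
// Hypothetical helper (not domtosource's API): convert a 0-based character
// offset into a 1-based line and column position within the source string.
function offsetToLineColumn(source, offset) {
  if (offset < 0 || offset > source.length) {
    throw new RangeError('offset is outside the source string');
  }
  // Everything before the offset tells us how many newlines precede it.
  const before = source.slice(0, offset);
  const lines = before.split('\n');
  return {
    line: lines.length,
    column: lines[lines.length - 1].length + 1,
  };
}

const html = '<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>';
const position = offsetToLineColumn(html, html.indexOf('<li>two'));
console.log(position); // { line: 3, column: 3 }
```

The sketch only handles `\n` line endings; sources with `\r\n` would report columns correctly but would need care if the offsets come from a tool that normalizes line endings.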

nodejs-web-scraper-cookbook - 📝 Resources for web scraping with node.js


The web is a wealth of information, but not all of it is easily accessible in a "data format" like RSS or an API. Web scrapers can turn inaccessible information into actionable inputs. In this repo I'll try to provide some examples of various strategies for web scraping. I'll also document the tools and packages I find useful, along with other tips. This repo is primarily concerned with web scraping using Node.js.

css-razor - css-razor uses cheerio.js to parse static html to remove unused selectors in CSS.

  •    Javascript

css-razor is a fast way to remove unused selectors from CSS. Essentially, it accomplishes the same goal as uncss, but in a different way. Rather than loading a webpage in phantomjs and using document.querySelector to determine if a selector is being used, css-razor uses cheeriojs to parse static HTML and CSS files to remove unused selectors. For example, you can build an HTML file from a React app created with create-react-app; the resulting HTML file can then be used for server rendering and for detecting selectors with css-razor.

backbone-serverside-adapters - Node.js server-side Backbone adapters

  •    Javascript

This project helps to run isomorphic Backbone applications on Node.js servers. It intends to provide a feasible subset of a set of adapters to fill in the blanks of missing DOM and HTML5 APIs on the server side. For more context on these problems, see, e.g., my Mozilla Hacks article or HTML5 DevConf presentation, or, for another viewpoint, some of Spike's isomorphic app presentations. This project exists for those who don't want to build a full abstraction layer on top of Backbone to hide jQuery, XHR and Backbone History API problems.

cheers - Scrape a website efficiently, block by block, page by page. Based on cheerio and curl.

  •    HTML

Scrape a website efficiently, block by block, page by page. This is a Cheerio based scraper, useful to extract data from a website using CSS selectors. The motivation behind this package is to provide a simple cheerio-based scraping tool, able to divide a website into blocks, and transform each block into a JSON object using CSS selectors.

cheerio-req - An http request module sending back a Cheerio object.

  •    Javascript

An HTTP request module sending back a Cheerio object.

node-krawler - Fast and lightweight web crawler with built-in cheerio, xml and json parser.

  •    Javascript

mikeal/request is used for fetching web pages, so any desired option from that package can be passed to Krawler's constructor. After Krawler emits the 'data' event, it automatically continues to the next URL. It does not care whether the result was processed or not. If you would like to have full control over the result handling, you can turn on the custom callback option. Then you can control the program flow by invoking your callback. Don't forget to call it in every case, otherwise the queue will get stuck.
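The callback-driven flow described here can be illustrated with a small, dependency-free sketch. It is not Krawler's actual API: `processQueue`, `handler`, and `onDone` are hypothetical names, and the example URLs are placeholders. It only shows why a queue that waits for a callback stalls if the callback is never invoked.

```javascript
// Hypothetical sketch, not Krawler's API: process URLs one at a time and
// only advance to the next URL when the handler invokes its callback.
function processQueue(urls, handler, onDone) {
  const results = [];
  let index = 0;

  function next() {
    if (index >= urls.length) {
      onDone(results);
      return;
    }
    const url = urls[index++];
    let called = false;
    handler(url, function done(result) {
      if (called) return; // ignore accidental double calls
      called = true;
      results.push(result);
      next(); // the queue advances only from here
    });
  }

  next();
}

processQueue(
  ['http://example.com/a', 'http://example.com/b'],
  (url, done) => done(url.toUpperCase()), // always call done, even on failure
  (results) => console.log(results)
);
```

If the handler returns without calling `done` on some code path (an error branch, for instance), `next()` is never reached and the remaining URLs are never processed, which is the stuck queue the description warns about.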

cheerio-repl - Node JS REPL for interacting with a live Cheerio DOM.

  •    Javascript

A Node JS REPL for interacting with a live Cheerio DOM.

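Vue-StudyMaps - An aggregator app built with Vue.js. It uses a crawler to fetch the websites you regularly browse, saving you the time of opening each page one by one.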

  •    Javascript

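An aggregator app built with Vue.js. It uses a crawler to fetch the websites you regularly browse, saving you the time of opening each page one by one.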

gulp-cheerio - Manipulate HTML and XML files with Cheerio in Gulp.

  •    Javascript

This is a plugin for gulp which allows you to manipulate HTML and XML files using cheerio. The main run function passed to cheerio now receives either two or three arguments ($, file[, done]) instead of one or two arguments ($[, done]). Make sure you update your build scripts accordingly.
