lightcrawler - Crawl a website and run it through Google Lighthouse

  •        55

https://github.com/github/lightcrawler

Dependencies:

async : ^2.4.1
cheerio : ^1.0.0-rc.1
colors : ^1.1.2
lighthouse : ^2.1.0
simplecrawler : ^1.1.3
yargs : ^8.0.2

Related Projects

lighthouse - Auditing, performance metrics, and best practices for Progressive Web Apps

  •    Javascript

Lighthouse analyzes web apps and web pages, collecting modern performance metrics and insights on developer best practices. Lighthouse is integrated directly into the Chrome Developer Tools, under the "Audits" panel.

lighthousebot - Run Lighthouse in CI, as a web service, using Docker. Pass/Fail GH pull requests.

  •    Javascript

This repo contains the frontend and backend for running Lighthouse in CI and integration with Github Pull Requests. An example web service is hosted for demo purposes. Note: This repo was previously named "lighthouse-ci".

headless-chrome-crawler - Distributed crawler powered by Headless Chrome

  •    Javascript

Crawlers based on simple requests to HTML files are generally fast. However, they sometimes end up capturing empty bodies, especially when the websites are built on modern frontend frameworks such as AngularJS, React and Vue.js. Note: headless-chrome-crawler contains Puppeteer. During installation, it automatically downloads a recent version of Chromium. To skip the download, see Environment variables.

lighthouse-ci - Run Lighthouse in CI using Docker

  •    Javascript

This repo contains the frontend and backend for the Lighthouse CI server. Please note: This drop in service is considered Beta. There are no SLAs or uptime guarantees. If you're interested in running your own CI server in a Docker container, check out Running your own CI server.

crawler - An easy to use, powerful crawler implemented in PHP. Can execute Javascript.

  •    PHP

This package provides a class to crawl links on a website. Under the hood Guzzle promises are used to crawl multiple urls concurrently. Because the crawler can execute JavaScript, it can crawl JavaScript rendered sites. Under the hood Chrome and Puppeteer are used to power this feature.


catgate - CatGate is a small crawler framework based on Chrome extension

  •    Vue

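CatGate is a small crawler framework based on a Chrome extension. It is a data scraping tool built as a browser plugin. Because it runs as a browser plugin, there is no need to simulate login, and it can imitate real user behavior and characteristics as faithfully as possible.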

enhanced-github - :rocket: Chrome extension to display size of each file, download link and copy file contents directly to clipboard

  •    Javascript

Note: For private repos (Issue #6), a GitHub Access Token is required. Follow the steps mentioned below to add your GitHub Access Token. Since this extension fetches data using the GitHub public v3 API to show file size and download_url, it consumes the free quota, which is very small (see GitHub API Rate Limiting).

ferret - Declarative web scraping

  •    Go

ferret is a web scraping system aiming to simplify data extraction from the web for things such as UI testing, machine learning and analytics. Having its own declarative language, ferret abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself. It's extremely portable, extensible and fast. The following example demonstrates the use of dynamic pages. First of all, we load the main Google Search page, type search criteria into an input box and then click a search button. The click action triggers a redirect, so we wait until it finishes. Once the page is loaded, we iterate over all elements in the search results and assign the output to a variable. The final for loop filters out empty elements that might appear because of inaccurate selectors.

morkdown - A simple Markdown editor

  •    Javascript

Morkdown is primarily designed to render GitHub Flavored Markdown (GFM), so it's ideal for your README.md. When rendering the Markdown, it uses the same syntax highlighter as GitHub (the Python Pygments library) and the styling is near identical to GitHub. Markdown content is parsed using marked (via brucedown), a JavaScript Markdown parser capable of parsing GFM. Morkdown is a Google Chrome App coupled to a Node server and uses CodeMirror for the editor panel.

archived-morkdown - A simple Markdown editor

  •    Javascript

Morkdown is primarily designed to render GitHub Flavored Markdown (GFM), so it's ideal for your README.md. When rendering the Markdown, it uses the same syntax highlighter as GitHub (the Python Pygments library) and the styling is near identical to GitHub. Markdown content is parsed using marked (via brucedown), a JavaScript Markdown parser capable of parsing GFM. Morkdown is a Google Chrome App coupled to a Node server and uses CodeMirror for the editor panel.

rendora - dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern javascript websites

  •    Go

Rendora can be seen as a reverse HTTP proxy server sitting between your backend server (e.g. Node.js/Express.js, Python/Django, etc.) and potentially your frontend proxy server (e.g. nginx, traefik, apache, etc.), or even facing the outside world directly. It does nothing but pass requests and responses through unchanged, except when it detects whitelisted requests according to the config. In that case, Rendora instructs a headless Chrome instance to request and render the corresponding page and then returns the server-side rendered page to the client (i.e. the frontend proxy server or the outside world). This simple functionality makes Rendora a powerful dynamic renderer without changing anything in either the frontend or the backend code. Dynamic rendering means that the server provides server-side rendered HTML to web crawlers such as GoogleBot and BingBot, and at the same time provides the typical initial HTML to normal users so that it is rendered on the client side. Dynamic rendering is meant to improve SEO for websites written in modern JavaScript frameworks like React, Vue, Angular, etc.
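The routing decision behind dynamic rendering can be sketched roughly as follows. This is a minimal illustration in Python, not Rendora's code or configuration format: the marker list, the function names and the two callbacks are all hypothetical, and matching on the User-Agent header is only one assumed way to decide which requests count as whitelisted.

```python
# Hypothetical crawler markers; NOT Rendora's actual configuration.
CRAWLER_MARKERS = ("googlebot", "bingbot")


def is_whitelisted(user_agent, markers=CRAWLER_MARKERS):
    """Return True if the User-Agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in markers)


def route(user_agent, render_ssr, pass_through):
    """Choose between server-side rendering and plain proxying.

    render_ssr and pass_through are hypothetical callbacks standing in for
    "render the page in headless Chrome" and "forward the request unchanged".
    """
    if is_whitelisted(user_agent):
        return render_ssr()
    return pass_through()


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1)"
    print(route(crawler, lambda: "ssr", lambda: "proxy"))
```

Keeping the decision in one small predicate means everything else in the proxy path stays a plain pass-through, which matches the description above of a proxy that only intervenes for whitelisted requests.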

markdown-new-tab - Google Chrome Extension — 🗒️ ⏰ ☑️ Take down notes, save reminders, paste links, create checklists or tables with markdown [M↓] directly in your 'New Tab' page

  •    Javascript

Take down notes 🗒️, save reminders ⏰, paste links 🔗, create checklists ☑️ or tables, all using markdown... directly in your 'New Tab' page! Markdown New Tab is a replacement for the default 'New Tab' page on Google Chrome 🆕 🎉. Refer to this brilliant cheat sheet to get familiar with the markdown syntax.

mobile-chrome-apps - Chrome apps on Android and iOS

  •    Python

The Chrome Apps for Mobile Toolchain is no longer being actively developed. We intend to keep it functional, but do not intend on adding any new features. Chrome Apps for Mobile is a project based on Apache Cordova to run your Chrome Apps on both Android and iOS. The project provides a native application wrapper around your Chrome App, allowing you to distribute it via the Google Play Store and the Apple App Store. Cordova plugins give your App access to a wide range of APIs, including many of the core Chrome APIs. The newest version of Chrome Apps for Mobile includes Chrome APIs for identity, Google Cloud Messaging (GCM) and rich notifications.

vive-diy-position-sensor - Code & schematics for position tracking sensor using HTC Vive's Lighthouse system and a Teensy board

  •    C++

The Lighthouse position tracking system consists of:
  – two stationary infrared-emitting base stations (we'll use an existing HTC Vive setup),
  – an IR receiving sensor and processing module (this is what we'll create).
The base stations are usually placed high in the room corners and "overlook" the room. Each station has an IR LED array and two rotating laser planes, horizontal and vertical. Each cycle, after the LED array flash (sync pulse), the laser planes sweep the room horizontally/vertically with constant rotation speed. This means that the time between the sync pulse and the laser plane "touching" the sensor is proportional to the horizontal/vertical angle from the base station's center direction. Using this timing information, we can calculate 3d lines from each base station to the sensor, the crossing of which yields the 3d coordinates of our sensor (see calculation details). The great thing about this approach is that it doesn't depend on light intensity and can be made very precise with cheap hardware.
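The timing-to-angle relationship described above can be sketched as a one-line conversion. This is an illustrative Python sketch, not the project's firmware: the function name and parameters are hypothetical, and both the rotation period and the delay at which the laser plane crosses the center direction are assumed inputs that would have to come from the actual base station.

```python
import math


def sweep_angle(t_sync_s, t_hit_s, rotation_period_s, center_delay_s):
    """Angle in radians between the sensor and the station's center direction.

    t_sync_s: time of the sync pulse (seconds)
    t_hit_s: time the laser plane hit the sensor (seconds)
    rotation_period_s: time of one full rotor revolution (assumed input)
    center_delay_s: delay after sync at which the plane points along the
        center direction (assumed input)
    """
    elapsed = t_hit_s - t_sync_s
    # Constant rotation speed: a full period corresponds to 2*pi radians.
    return 2.0 * math.pi * (elapsed - center_delay_s) / rotation_period_s


if __name__ == "__main__":
    period = 1.0 / 60.0   # illustrative value only
    center = period / 4.0  # illustrative value only
    angle = sweep_angle(0.0, center + period / 8.0, period, center)
    print(math.degrees(angle))
```

One such angle per laser plane (horizontal and vertical) defines a 3d line from a base station to the sensor; intersecting the lines from two stations is the step the description refers to as yielding the sensor's coordinates.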

lighthouse - A simple scriptable popup dialog to run on X.

  •    C

A simple flexible popup dialog to run on X. In the demo a hotkey is mapped to lighthouse | sh with lighthouserc using cmd.py, which is included in config/lighthouse/ and installed by lighthouse-install.

github-gmail - [Chrome extension] Open GitHub notifications with shortcuts in Gmail.

  •    Javascript

If you no longer want to receive notifications for a certain thread on GitHub, click the Mute thread button or use the shortcut shift + h. This opens a background window to load the mute thread request, which closes itself when done. Install the extension via the Chrome webstore.

html-pdf-chrome - HTML to PDF converter via Chrome/Chromium

  •    TypeScript

HTML to PDF converter via Chrome/Chromium. Note: It is strongly recommended that you keep Chrome running side-by-side with Node.js. There is significant overhead starting up Chrome for each PDF generation which can be easily avoided.

hindsight - Internet history forensics for Google Chrome/Chromium

  •    Python

Hindsight is a free tool for analyzing web artifacts. It started with the browsing history of the Google Chrome web browser and has expanded to support other Chromium-based applications (with more to come!). Hindsight can parse a number of different types of web artifacts, including URLs, download history, cache records, bookmarks, autofill records, saved passwords, preferences, browser extensions, HTTP cookies, and Local Storage records (HTML5 cookies). Once the data is extracted from each file, it is correlated with data from other history files and placed in a timeline. The only field you are required to complete is "Profile Path". This is the location of the Chrome profile you want to analyze (the default profile paths for different OSes are listed at the bottom of this page). Click "Run" and you'll be taken to the results page, where you can save the results to a spreadsheet (or other formats).

Google Advance Search

An easy way to build document searches on Google, read your emails, and much more.

chrome-github-mate - Chrome extension to make single file download effortless and with more features

  •    Javascript

Chrome extension to enhance the GitHub experience. GitHub makes it very easy to download a codebase as a zip, but it's painful to download a stand-alone file. Octo Mate makes downloading a file easy: just click the file's icon.