
pangu.js - Why can't you people just add a space?

  •    Javascript

Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
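The spacing rule described above can be sketched in a few lines. This is a minimal illustration in Python, not pangu.js's actual implementation: the character ranges and the `add_spacing` name are assumptions chosen for the example, and it handles only letters and digits, not the symbols the real library covers.

```python
import re

# Assumption: a simplified CJK range (kana, CJK ideographs, Hangul syllables).
CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
# Assumption: "half-width" is reduced to ASCII letters and digits here.
HALF = r"A-Za-z0-9"

_cjk_then_half = re.compile(f"([{CJK}])([{HALF}])")
_half_then_cjk = re.compile(f"([{HALF}])([{CJK}])")


def add_spacing(text: str) -> str:
    """Insert a space at every boundary between CJK and half-width characters."""
    text = _cjk_then_half.sub(r"\1 \2", text)
    text = _half_then_cjk.sub(r"\1 \2", text)
    return text


print(add_spacing("我用Python寫程式"))
```

Text that is already spaced is left unchanged, because neither pattern matches across an existing space.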

node-read - Get Readable Content from any page

  •    Javascript

Get Clean Reading Content from every web page

prose - A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction

  •    Go

prose is a Go library for text processing (primarily English at the moment) that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use. See the GoDoc documentation for more information.

code-review-tips - Common problems to look for in a code review

  •    Javascript

Code reviews can inspire dread in both reviewer and reviewee. Having your code analyzed can feel as invasive as being screened by the TSA as you go off to your vacation. Even worse, reviewing other people's code can feel like a painful and ambiguous exercise, searching for problems and not even knowing where to begin. This project aims to provide some solid tips for how to review the code that you and your team write. All examples are written in JavaScript, but the advice should be applicable to any project of any language. This is by no means an exhaustive list, but hopefully this will help you catch as many bugs as possible long before users ever see your feature.

node-readability - Scrape/crawl articles from any site automatically

  •    Javascript

In my case, the spider processes about 1500k documents per day, with a maximum crawling speed of 1.2k per minute and an average of 1k per minute. Memory cost is about 200 MB per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

go-readability - Go package that cleans an HTML page for better readability.

  •    HTML

Go-Readability is a Go package that finds the main readable content and the metadata of an HTML page. It works by removing clutter such as buttons, ads, background images, and scripts. The package is based on Readability.js by Mozilla and was ported line by line so that it looks and works as similarly as possible. This way, hopefully every web page that Readability.js can parse can be parsed by go-readability as well.
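The clutter-removal idea can be illustrated with a rough sketch. The Python below is not go-readability's or Readability.js's algorithm (both also score candidate nodes); it only shows the first step, dropping the contents of clutter elements. `CLUTTER_TAGS`, `TextExtractor`, and `readable_text` are names invented for this example.

```python
from html.parser import HTMLParser

# Assumption: an illustrative tag set, not the list go-readability actually uses.
CLUTTER_TAGS = {"script", "style", "button", "nav", "aside", "form"}


class TextExtractor(HTMLParser):
    """Collect text while skipping anything nested inside a clutter element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one clutter element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def readable_text(html: str) -> str:
    """Return the page text with clutter elements removed, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A real extractor would go further and rank the remaining blocks by text density or similar signals to pick the main article body.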

relaxed.ruby.style - A Relaxed Style Guide for Ruby & Configuration for RuboCop


A more liberal style guide for RuboCop. It comes with a config file that deactivates some of RuboCop's features. It is meant as a less restrictive foundation that you can use directly or base your style discussions on. RuboCop is an amazing tool, but some of its default rules feel overly strict, which can distract you from the helpful messages.

read-body - A simple site to pull the body of the post out and reformat for readability

  •    Javascript

A basic almost-clone of Readability. This service tries to find the body of the content at a URL and presents it stripped back, with minimal styling. The page is cached for a period of time, so subsequent requests will be faster.
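The caching behaviour mentioned above can be sketched as a small time-to-live cache. This Python example is an assumption about how such a cache might look, not read-body's actual code; `TTLCache`, `get_or_fetch`, and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched page bodies for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the slow fetch
        body = fetch(url)  # missing or expired: fetch and store again
        self._entries[url] = (now, body)
        return body
```

Only the first request for a URL within the time window pays the cost of fetching and reformatting; later requests are served from memory until the entry expires.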

spider2 - A 2nd generation spider to crawl any article site and automatically read the title and article.

  •    Javascript

A 2nd generation spider to crawl any article site, automatically reading the title and content. In my case, the spider processes about 700 thousand documents per day (22 million per month), with a maximum crawling speed of 450 per minute and an average of 80 per minute. Memory cost is about 200 megabytes per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

goreadability - Port of arc90's readability project to Go

  •    Go

goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on ruby-readability.

readabilitySAX - a fast and platform independent readability port (JS)

  •    HTML

This is a port, to a SAX parser, of the algorithm used by the Readability bookmarklet to extract relevant pieces of information from websites. The advantage over other ports, e.g. arrix/node-readability, is a smaller memory footprint and much faster execution. In my tests, most pages, even large ones, were finished within 15ms (on node, see below for more information). It works with Rhino, so it runs on YQL, which may have interesting uses. It also works within a browser.

readable-proxy - Node proxy server attempting to fetch readable contents from any provided URL.

  •    Javascript

Proxy server to retrieve a readable version of any provided url, powered by Node, PhantomJS and Readability.js. Note about CORS: by design, the server will allow any origin to access it, so browsers can consume it from pages hosted on a different domain.
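The CORS note can be made concrete with a small sketch. readable-proxy itself runs on Node; the Python below only illustrates the "allow any origin" policy using the standard library. `cors_headers`, `ReadableHandler`, `serve`, the JSON body, and port 8000 are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cors_headers():
    """Headers that let pages from any origin read the response."""
    return [("Access-Control-Allow-Origin", "*")]


class ReadableHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder body; a real proxy would return the extracted article here.
        body = b'{"status": "ok"}'
        self.send_response(200)
        for name, value in cors_headers():
            self.send_header(name, value)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), ReadableHandler).serve_forever()
```

The wildcard origin is convenient for a public read-only service, but it also means any website can call the proxy from a visitor's browser, so it should not sit in front of anything that requires credentials.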

graby - Graby extracts article content from web pages. This is a fork of Full-Text RSS v3.3

  •    PHP

Graby helps you extract article content from web pages. It is a fork of Full-Text RSS v3.3, whose code is really hard to read and understand if you want to see how things work internally, and which has no tests at all.

php-readability - A fork of https://bitbucket.org/fivefilters/php-readability

  •    PHP

This is an extract of the Readability class from this full-text-rss fork. It can be described as a better version of the original php-readability. The default php-readability lib is really old and needs to be improved. I found a great fork of full-text-rss from @Dither which improves the Readability class.

readability - Readability is an Elixir library for extracting and curating articles.

  •    Elixir

Note: Readability requires Elixir 1.3 or higher. If the result differs from what you expect, you can pass options.
