simple-html-tokenizer - A lightweight JavaScript library for tokenizing non-`<script>` HTML expected to be found in the `<body>` of a document


Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
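As a rough illustration of what tokenizing template-style HTML involves, here is a minimal, self-contained sketch. The names `tokenizeHtml` and `parseAttributes` and the token shapes are hypothetical and chosen for this example; this is not the simple-html-tokenizer API. The sketch assumes well-formed input, silently skips a stray `<`, and does not handle `>` inside attribute values, `<script>` content, or character references.

```javascript
// Illustrative sketch only: NOT the simple-html-tokenizer API.
// Splits a raw attribute string such as ' class="a" disabled' into [name, value] pairs.
function parseAttributes(raw) {
  const attributes = [];
  // A name, optionally followed by = and a double-quoted, single-quoted, or bare value.
  const attrPattern = /([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s]+)))?/g;
  let m;
  while ((m = attrPattern.exec(raw)) !== null) {
    const value = m[2] ?? m[3] ?? m[4] ?? "";
    attributes.push([m[1], value]);
  }
  return attributes;
}

// Turns an HTML fragment into a flat list of tokens.
function tokenizeHtml(html) {
  const tokens = [];
  // Alternatives, in order: comment, end tag, start tag, run of text.
  const pattern =
    /<!--([\s\S]*?)-->|<\/([a-zA-Z][^\s>]*)\s*>|<([a-zA-Z][^\s/>]*)([^>]*?)(\/?)>|([^<]+)/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const [, comment, endTag, startTag, rawAttrs, selfClosing, text] = match;
    if (comment !== undefined) {
      tokens.push({ type: "Comment", chars: comment });
    } else if (endTag !== undefined) {
      tokens.push({ type: "EndTag", tagName: endTag });
    } else if (startTag !== undefined) {
      tokens.push({
        type: "StartTag",
        tagName: startTag,
        attributes: parseAttributes(rawAttrs),
        selfClosing: selfClosing === "/",
      });
    } else {
      tokens.push({ type: "Chars", chars: text });
    }
  }
  return tokens;
}

console.log(tokenizeHtml('<p class="note">Hi <br/>there</p>'));
```

A regular expression is enough for a sketch like this, but a production tokenizer would normally be written as a character-by-character state machine so that it can handle the edge cases listed above.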



Related Projects

parse5 - HTML parsing/serialization toolset for Node

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant. parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular2, Polymer and many more.

Java tokenizer and parser tools

A Java suite for parsing arbitrary text data: not just HTML, XML, or Java, but all of them. Use it when the JDK tokenizers are too limited, when JavaCC, JTB, etc. are too complicated, or when you need dynamic parser configuration.

parsekit - Objective-C Tokenizer and Parser Generator. Supports Grammars.

Objective-C Tokenizer and Parser Generator. Supports Grammars.

sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.

SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training. SentencePiece implements sub-word units (also known as wordpieces [Wu et al.] [Schuster et al.] and byte-pair-encoding (BPE) [Sennrich et al.]) with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing. This is not an official Google product.
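To make the byte-pair-encoding idea concrete, the sketch below learns BPE merges from a tiny word list: it repeatedly finds the most frequent adjacent symbol pair and fuses it into one symbol. The function name `learnBpeMerges` and its return shape are hypothetical, and this is a textbook-style simplification rather than SentencePiece's implementation (it works on a pre-split word list instead of raw sentences and ignores word frequencies).

```javascript
// Illustrative BPE merge learning; NOT SentencePiece's actual implementation.
function learnBpeMerges(words, numMerges) {
  // Each word starts as an array of single-character symbols.
  let corpus = words.map((w) => Array.from(w));
  const merges = [];
  const SEP = "\u0000"; // assumed never to occur in the input

  for (let i = 0; i < numMerges; i++) {
    // Count every adjacent symbol pair in the corpus.
    const counts = new Map();
    for (const symbols of corpus) {
      for (let j = 0; j + 1 < symbols.length; j++) {
        const key = symbols[j] + SEP + symbols[j + 1];
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }

    // Pick the most frequent pair; on ties the pair seen first wins.
    let best = null;
    let bestCount = 0;
    for (const [key, count] of counts) {
      if (count > bestCount) {
        best = key;
        bestCount = count;
      }
    }
    if (best === null) break; // nothing left to merge

    const [left, right] = best.split(SEP);
    merges.push([left, right]);

    // Replace each occurrence of the pair with the merged symbol.
    corpus = corpus.map((symbols) => {
      const out = [];
      for (let j = 0; j < symbols.length; j++) {
        if (j + 1 < symbols.length && symbols[j] === left && symbols[j + 1] === right) {
          out.push(left + right);
          j++; // skip the right-hand symbol that was just merged
        } else {
          out.push(symbols[j]);
        }
      }
      return out;
    });
  }
  return { merges, corpus };
}

console.log(learnBpeMerges(["abab", "abc"], 1));
```

The learned merge list is what a tokenizer would later replay, in order, to segment unseen text into sub-word units.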

php-token-stream - Wrapper around PHP's tokenizer extension.

Wrapper around PHP's tokenizer extension.

Neko HTML Parser - simple HTML scanner

NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements. Automatically closes elements with optional end tags and can handle mismatched inline element tags.

Backbone.ModelBinder - Simple, flexible and powerful Model-View binding for Backbone.

Backbone is a great platform for writing client-side applications, but I've found that as views grow in complexity, synchronizing my models and views can be a pain. I've spent the past few months trying to use existing view-model binding libraries that others were kind enough to create and share with the world. Unfortunately, in the majority of my Backbone applications I wasn't able to leverage the existing view-model binding libraries due to various limitations. I created a new `Backbone.ModelBinder`

html-as-custom-elements - HTML as Custom Elements

A demo is available, which shows implementation efforts for a number of fairly simple elements, and outlines the missing platform features each of them highlights. Even these simple elements have highlighted one major area of missing functionality in custom elements, which has been written up in the document "Gap Analysis: Accessibility". One of the axioms of the extensible web project is that high-level, declarative APIs should be able to be explained in terms of lower-level, imperative APIs. Not just lower-level algorithms, but APIs: the capabilities that we encapsulate inside a given HTML element should also be exposed directly to JavaScript authors. And those APIs should be factored into small, composable pieces, that build on each other to eventually produce the declarative edifice that is HTML. In this way, authors can reuse these platform capabilities without jumping through hoops (like instantiating an HTMLAnchorElement just to parse a URL) or rebuilding large parts of the platform for themselves (like creating their own scrolling logic just to get pull-to-refresh behavior).

react-html-email - Create elegant HTML email templates using React.

Modern HTML emails are a tangle of archaic HTML and inline styles. This library encapsulates the cruft into simple React components and helps avoid common pitfalls. react-html-email provides a set of components for a standard 600px table layout (inspired by HTML Email Boilerplate). React's Supported Tags and Attributes are extended to include a few deprecated attributes useful for legacy clients. In addition, a style prop validator is included which uses Campaign Monitor's CSS Support Guide to check for potential compatibility issues across email clients.

Workflow HTML / Versioning HTML Module for DotNetNuke - by Effority.Net

Based on the "Text/HTML Core Module" the Text/HTML Workflow Module offers simple versioning and approval abilities for your Text/HTML Module content.

startbootstrap-simple-sidebar - An off canvas sidebar navigation Bootstrap HTML template created by Start Bootstrap

Simple Sidebar is an off canvas sidebar navigation template for Bootstrap created by Start Bootstrap. After downloading, simply edit the HTML and CSS files included with the template in your favorite text editor to make changes. These are the only files you need to worry about; you can ignore everything else! To preview the changes you make to the code, you can open the index.html file in your web browser.

PHP Simple HTML DOM Parser

A simple HTML DOM parser written in PHP5+ that supports invalid HTML and provides a very easy way to handle HTML elements.

mammoth.js - Convert Word documents (.docx files) to HTML

Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document, and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to an h1 element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.

Inquisitor - Site plugin for ElasticSearch to help understand and debug queries.

Inquisitor is a tool to help understand and debug your queries in ElasticSearch. It supports JSON parsing and formatting, automatic highlighting, formatted search results, analyzer testing, and tokenizer testing.


Pure Python implementation of GOLD Parser Engine. GOLD Parser Engine is a LALR(1) parser with DFA tokenizer. It uses compiled grammar table generated by GOLD Parser Builder (not included - available on

Indian Speech Synthesis System (festival)

festival-in will have different speech synthesis systems for respective Indian languages, based on the "festival" TTS (Text-To-Speech engine), under its umbrella. It will have modules (tokenizer and lexical) for respective Indian languages.


Friso is a Chinese tokenizer developed in C. It uses the popular mmseg algorithm to tokenize the Chinese characters.

twitter-korean-text - Korean tokenizer

Scala library to process Korean text

parsekit - Objective-C Tokenizer and Parser Generator. Supports Grammars.

I've forked ParseKit into a new faster/cleaner/smaller library called PEGKit. ParseKit should be considered deprecated, and PEGKit should be used for all new development.