
SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation

  •    C++

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems where the vocabulary size is fixed before the neural model is trained. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) and the unigram language model), extended to train directly from raw sentences. This makes it possible to build a purely end-to-end system that does not depend on language-specific pre- or postprocessing.
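
To illustrate what "subword units trained directly from raw sentences" can mean, here is a minimal BPE sketch in Python. It is not SentencePiece's implementation or API; the function names (`learn_bpe`, `apply_merge`, `encode`, `decode`) are hypothetical, and the only detail borrowed from SentencePiece is the idea of treating whitespace as an ordinary symbol (written here as U+2581) so that no language-specific pre-tokenizer is needed and detokenization is a plain string join.

```python
from collections import Counter

SPACE = "\u2581"  # visible stand-in for whitespace, so spaces survive tokenization


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(sentences, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent symbol pair."""
    # Start from single characters of the raw sentences; no word splitting is done.
    corpus = [list(s.replace(" ", SPACE)) for s in sentences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Highest count wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def encode(sentence, merges):
    """Split a raw sentence into subword pieces by replaying the learned merges in order."""
    symbols = list(sentence.replace(" ", SPACE))
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def decode(pieces):
    """Detokenize: join the pieces and turn the whitespace symbol back into spaces."""
    return "".join(pieces).replace(SPACE, " ")
```

With merges learned from a couple of raw sentences, `decode(encode(text, merges))` returns the original text, which is the property that makes the pipeline independent of language-specific postprocessing. The number of merges plays the role of the predetermined vocabulary budget.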

SymSpell - 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

Spelling correction and fuzzy search, 1 million times faster through the Symmetric Delete spelling correction algorithm. The algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent.
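
The idea of generating only deletes on both sides can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SymSpell's code or API: the helper names (`deletes`, `build_index`, `lookup`) are hypothetical, word frequencies are ignored, and candidates are verified with the restricted (optimal string alignment) variant of Damerau-Levenshtein distance.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return `word` plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def build_index(words, max_distance):
    """Precompute the delete variants of every dictionary word, once."""
    index = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(term, index, max_distance):
    """Return sorted (distance, word) pairs within max_distance of `term`."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # Shared delete variants can over-generate, so verify with the real distance.
    scored = ((damerau_levenshtein(term, c), c) for c in candidates)
    return sorted((dist, c) for dist, c in scored if dist <= max_distance)
```

For example, after `index = build_index(["hello", "help", "world"], 2)`, calling `lookup("helo", index, 2)` matches both `hello` and `help` at distance 1. The lookup only generates deletes of the input term, which is far fewer strings than enumerating all inserts, replaces, and transposes over an alphabet.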

esapp - An unsupervised Chinese word segmentation tool.

  •    C++

For a usage example, see test_package/example.cpp. The recommended way to use the ESA++ package in your project is to install it with Conan.




iparser - Yet another dependency parser, integrated with tokenizer, tagger and visualization tool.

  •    Python

Yet another multilingual dependency parser, integrated with a tokenizer, a part-of-speech tagger, and a visualization tool. IParser can parse raw sentences into dependency trees in CoNLL format and can visualize the trees in your browser. iparser is currently in a prototype state; it comes with no warranty and may not be ready for practical use.

WordSegmentationTM - Fast Word Segmentation with Triangular Matrix

  •    CSharp

Fast word segmentation using a Triangular Matrix approach. It is 2x faster, has lower memory consumption (constant O(1) vs. linear O(n)), scales better, and is more GC friendly. For word segmentation using a Dynamic Programming approach, have a look at WordSegmentationDP.






