SymSpell - 1 million times faster through Symmetric Delete spelling correction algorithm


The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (which generates deletes, transposes, replaces, and inserts) and is language independent. Lookup provides very fast spelling correction of single words.

https://medium.com/@wolfgarbe/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f
https://github.com/wolfgarbe/SymSpell
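
The idea described above can be illustrated with a minimal Python sketch. It is not the SymSpell implementation or its API: the function names (`deletes`, `build_index`, `lookup`, `osa_distance`) are hypothetical, word frequencies are ignored, and the optimal string alignment variant of Damerau-Levenshtein distance is assumed for the final verification step. The sketch generates only deletions, both for dictionary words (once, at index time) and for the input term (at lookup time), and treats any dictionary word that shares a deletion with the input as a candidate.

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def osa_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            # Adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def build_index(words, max_distance):
    """Map every delete variant of every dictionary word back to the word."""
    index = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def lookup(term, index, max_distance):
    """Return (distance, word) pairs within max_distance, closest first."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    scored = ((osa_distance(term, c), c) for c in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world"], max_distance=2)
print(lookup("helo", index, 2))
print(lookup("wrold", index, 2))
```

Because only deletions are generated, the number of variants per term stays small and does not depend on the alphabet size, which is one way to read the "language independent" claim above. The trade-off in this sketch is a larger precomputed index.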

Related Projects

fuzzysearch - :pig: Tiny and fast fuzzy search in Go

  •    Go

Inspired by bevacqua/fuzzysearch, a fuzzy matching library written in JavaScript, but with some extras such as ranking by Levenshtein distance (see RankMatch()) and finding matches in a list of words (see Find()). Fuzzy searching allows for flexibly matching a string with partial input, which is useful for filtering data very quickly based on lightweight user input.

fuzzywuzzy - Fuzzy String Matching in Python

  •    Python

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
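
For readers unfamiliar with Levenshtein distance, the following is a minimal Python sketch of the classic dynamic-programming computation. It is not fuzzywuzzy's API or its scoring formula: the names `levenshtein` and `similarity` are hypothetical, and the normalized score is just one simple way to turn a distance into a value between 0 and 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative score: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(similarity("fuzzy", "fuzzy"))
```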

gse - Efficient text segmentation in Go; supports English, Chinese, Japanese and other languages

  •    Go

Efficient text segmentation in Go; supports English, Chinese, Japanese and other languages. The dictionary is implemented with a double-array trie, and the segmentation algorithm finds the shortest path based on word frequency, using dynamic programming.
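
The frequency-based shortest-path approach mentioned above can be sketched roughly as follows. This is not gse's code or API. The function name `segment`, the toy frequency table, and the fixed penalty for unknown characters are assumptions made for illustration, and a plain Python dict stands in for the double-array trie. Each dictionary word costs the negative log of its relative frequency, and dynamic programming finds the cheapest way to cover the text.

```python
import math


def segment(text, freq):
    """Split text into words by minimizing the total -log(relative frequency) cost."""
    total = sum(freq.values())
    max_len = max(map(len, freq))          # longest dictionary word
    unknown_cost = math.log(total) + 1.0   # assumed penalty for a character not in the dictionary

    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i]: cheapest cost of segmenting text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word in that segmentation

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                cost = math.log(total) - math.log(freq[word])
            elif end - start == 1:
                cost = unknown_cost  # fall back to a single unknown character
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


# Toy frequency table, invented for this example.
freq = {"this": 20, "is": 30, "a": 40, "test": 10, "at": 5, "est": 1}
print(segment("thisisatest", freq))
```

Frequent words get low costs, so the path through "a" and "test" wins over the path through "at" and "est" even though both cover the same characters.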

textdistance - Compute distance between sequences

  •    Python

TextDistance is a Python library for comparing the distance between two or more sequences using many algorithms. It is a work in progress. Currently, all algorithms compare two strings as arrays of bits.

tntsearch - A fully featured full text search engine written in PHP

  •    PHP

We also created some demo pages that show tolerant retrieval with n-grams in action. The package has a bunch of helper functions, such as Jaro-Winkler and cosine similarity, for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers aren't enough, the engine lets you easily plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.

jieba - Jieba Chinese word segmentation

  •    Python

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

Chinese-Word-Vectors - 100+ pre-trained Chinese word vectors

  •    Python

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.

rmmseg-cpp - a re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++

  •    Ruby

A re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++.

jellyfish - 🎐 a python library for doing approximate and phonetic matching of strings.

  •    Python

Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk <james.p.turk@gmail.com> and Michael Stephens.

Arab Techies

  •    Javascript

A collection of open source libraries and tools that provide solutions for common problems in processing Arabic text, especially in web applications, including text normalization, phrase segmentation, text indexing, stop word lists, and common spelling mistakes.

fast-levenshtein - Efficient JavaScript implementation of the Levenshtein algorithm with locale-specific collator support

  •    Javascript

An efficient JavaScript implementation of the Levenshtein algorithm with locale-specific collator support. If you are not using a module loader system, the API is accessible via the window.Levenshtein object.