textdistance - Compute distance between sequences

  •    Python

TextDistance -- Python library for computing the distance between two or more sequences using many algorithms. Work in progress. Currently, all algorithms compare two strings as arrays of bits.
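As a rough illustration of the kind of metric such a library computes, here is a pure-Python Levenshtein distance. The project describes itself as comparing sequences, so this sketch accepts any sequence (strings, lists of words, tuples) rather than only strings. It is a self-contained sketch, not TextDistance's implementation, and the function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the items of a seen so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when items are equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Distance scaled to [0, 1] by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(levenshtein("test", "text"))                                 # 1
print(levenshtein("the cat sat".split(), "the cat ran".split()))  # 1
print(normalized_distance("test", "text"))                         # 0.25
```

Normalizing by the longer length is one common choice; it makes scores comparable across pairs of different sizes.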

SymSpell - 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

Spelling correction & Fuzzy search: 1 million times faster through the Symmetric Delete spelling correction algorithm. The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (generating deletes, transposes, replaces and inserts) and is language independent.
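The core idea can be sketched as a toy in Python, assuming a small in-memory word-to-frequency dictionary. Every string reachable from each dictionary word by up to `max_distance` deletions is precomputed once. At lookup time the same is done for the query term, and any dictionary word sharing a delete variant is verified with a real distance function. Only deletes are generated, never inserts, replaces or transposes, which is what keeps candidate generation small. The class and function names are hypothetical, and the verification step uses the restricted (optimal string alignment) variant of Damerau-Levenshtein as an assumption; this is not SymSpell's own code.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set[str]:
    """Every string reachable from word by deleting up to max_distance characters."""
    result = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), k):
            result.add("".join(c for i, c in enumerate(word) if i not in positions))
    return result


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


class SymmetricDeleteIndex:
    """Toy index: the deletes of every dictionary word are precomputed once."""

    def __init__(self, frequencies: dict[str, int], max_distance: int = 2):
        self.frequencies = frequencies
        self.max_distance = max_distance
        self.index = defaultdict(set)  # delete variant -> dictionary words producing it
        for word in frequencies:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def lookup(self, term: str) -> list[tuple[str, int, int]]:
        """Return (word, distance, frequency) tuples, closest and most frequent first."""
        candidates: set[str] = set()
        # Only deletes of the query are generated: no inserts, replaces or transposes.
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        results = []
        for word in candidates:
            dist = osa_distance(term, word)  # a shared delete only suggests a match
            if dist <= self.max_distance:
                results.append((word, dist, self.frequencies[word]))
        results.sort(key=lambda r: (r[1], -r[2]))
        return results


index = SymmetricDeleteIndex({"hello": 100, "help": 80, "hell": 50, "world": 90})
print(index.lookup("helo"))      # [('hello', 1, 100), ('help', 1, 80), ('hell', 1, 50)]
print(index.lookup("wrold")[0])  # ('world', 1, 90)
```

The trade-off is memory for speed: the index grows with the number of delete variants per word, but lookups touch only a handful of hash buckets instead of scanning the dictionary.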




LinSpell - Fast approximate strings search & spelling correction

  •    CSharp

The LinSpell spelling correction algorithm does not require edit candidate generation (as in Norvig's algorithm) or specialized data structures such as a BK-tree. In most cases LinSpell is faster and requires less memory than a BK-tree or Norvig's algorithm. LinSpell is language and character set independent. The word frequency list was created by intersecting two word lists: through this reciprocal filtering, only words that appear in both lists are used. Additional filters were applied, and the resulting list was truncated to the ≈ 80,000 most frequent words.
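A linear-scan lookup in the spirit of this description can be sketched as follows, assuming the dictionary is an in-memory mapping from word to frequency. Every word is compared with the term, but cheap bounds skip most of the work: a length difference larger than the maximum distance rules a word out immediately, and the distance computation stops once a whole row of the matrix exceeds the limit. The names are hypothetical, plain Levenshtein distance is used for brevity, and this is not LinSpell's source.

```python
from typing import Dict, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_distance: int) -> Optional[int]:
    """Levenshtein distance, or None once it is certain to exceed max_distance."""
    if abs(len(a) - len(b)) > max_distance:
        return None  # the length gap alone already exceeds the limit
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        if min(curr) > max_distance:
            return None  # every alignment passes through this row, so all are too costly
        prev = curr
    return prev[-1] if prev[-1] <= max_distance else None


def linear_lookup(
    term: str, frequencies: Dict[str, int], max_distance: int = 2
) -> Optional[Tuple[str, int, int]]:
    """Scan every dictionary word; return (word, distance, frequency) of the best match."""
    best = None
    for word, freq in frequencies.items():
        dist = bounded_levenshtein(term, word, max_distance)
        if dist is None:
            continue
        # Prefer the smaller distance, then the higher frequency.
        if best is None or (dist, -freq) < (best[1], -best[2]):
            best = (word, dist, freq)
            max_distance = dist  # later words must be at least this close to win
    return best


print(linear_lookup("helo", {"hello": 100, "help": 80, "hell": 50, "world": 90}))
# ('hello', 1, 100)
```

Tightening `max_distance` to the best distance found so far makes the early exits fire more often as the scan proceeds, with no index to build or store.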

SymSpellCompound - Compound-aware automatic spelling correction

SymSpellCompound supports compound-aware automatic spelling correction of multi-word input strings. It is built on top of SymSpell, the 1 million times faster spelling correction algorithm. Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can be mixed within the same word.
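The two error types that set compound-aware correction apart, splitting errors and concatenation errors, can be illustrated with a much simpler sketch that only checks dictionary membership. This is an assumption-laden toy, not SymSpellCompound's algorithm: it does no per-word spelling correction, looks at most one token ahead, merges two tokens only when at least one of them is unknown, and scores candidate splits by a hypothetical product of word frequencies.

```python
def correct_compound(text: str, frequencies: dict[str, int]) -> str:
    """Repair splitting and concatenation errors using dictionary membership only."""
    tokens = text.split()
    output: list[str] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        # Splitting error: "jum ps" -> "jumps", if the merged form is a known word
        # and at least one half is not (so that "in to" is left alone).
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            merged = token + nxt
            if merged in frequencies and (token not in frequencies or nxt not in frequencies):
                output.append(merged)
                i += 2
                continue
        if token in frequencies:
            output.append(token)
        else:
            # Concatenation error: "quickbrown" -> "quick brown", choosing the
            # split whose two parts have the highest frequency product.
            best_split = None
            best_score = 0
            for k in range(1, len(token)):
                left, right = token[:k], token[k:]
                if left in frequencies and right in frequencies:
                    score = frequencies[left] * frequencies[right]
                    if score > best_score:
                        best_split, best_score = (left, right), score
            output.extend(best_split if best_split else [token])
        i += 1
    return " ".join(output)


freqs = {"the": 500, "quick": 40, "brown": 30, "fox": 20, "jumps": 10}
print(correct_compound("the quickbrown fox jum ps", freqs))  # the quick brown fox jumps
```

A real compound corrector also has to correct misspelled parts and weigh competing interpretations against each other, which is where SymSpell's fast single-word lookup comes in.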

strsim-rs - Rust implementations of string similarity metrics

  •    Rust

To see the documentation for an older version listed in the changelog, change the version in the URL. If you have Docker installed but don't want to install Rust itself, you can run $ ./dev to get a development CLI.
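As an illustration of one of the simplest string similarity metrics, Hamming distance counts the positions at which two strings differ and is only defined for inputs of equal length. The minimal Python sketch below shows the metric itself; it is not a port of the Rust crate's code or API, and the function name is hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


print(hamming("karolin", "kathrin"))  # 3
```

Unlike edit distances, Hamming distance allows no insertions or deletions, so a single shifted character makes every following position differ.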

damlev - The fastest JS implementation of the Damerau-Levenshtein edit distance

  •    TypeScript

This is the fastest implementation of Damerau-Levenshtein for JavaScript, an optimization of David Hamp-Gonsalves' port.
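Damerau-Levenshtein distance extends Levenshtein distance by also counting a transposition of two adjacent characters as a single edit. The sketch below is the unrestricted variant, following the well-known Lowrance-Wagner dynamic-programming formulation, and is written in Python rather than TypeScript. Whether damlev computes this variant or the restricted "optimal string alignment" one is an open assumption here, so treat this as an illustration of the metric, not of the package's code; the function name is hypothetical.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (Lowrance-Wagner algorithm).

    Counts insertions, deletions, substitutions and transpositions of adjacent
    characters; unlike the restricted variant, a transposed pair may be edited again.
    """
    max_dist = len(a) + len(b)
    # Matrix shifted by one so that row/column 0 can hold the max_dist sentinel.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j
    last_row = {}  # character -> last row (1-based) in a where it occurred
    for i in range(1, len(a) + 1):
        last_match_col = 0  # last column (1-based) in b that matched a[i - 1]
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            m = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                            # substitution or match
                d[i + 1][j] + 1,                           # insertion
                d[i][j + 1] + 1,                           # deletion
                d[k][m] + (i - k - 1) + 1 + (j - m - 1),   # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("ab", "ba"))   # 1
print(damerau_levenshtein("CA", "ABC"))  # 2 (the restricted variant gives 3)
```

The "CA" to "ABC" pair is the standard case where the two variants disagree: the unrestricted version may transpose "CA" to "AC" and then insert "B" between them.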


ceja - PySpark phonetic and string matching algorithms

  •    Python

Run pip install ceja to install the library, then import the functions with import ceja. After importing, you can call functions such as ceja.nysiis and ceja.jaro_winkler_similarity.
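ceja targets PySpark, but the metric behind a function such as ceja.jaro_winkler_similarity can be shown on two plain strings without Spark. The sketch below implements Jaro similarity and the Winkler prefix boost. The function names and default parameters (prefix scale 0.1, common prefix capped at 4 characters, the values usually quoted for Winkler's formulation) are assumptions, not ceja's code, and implementations differ in details such as a threshold below which no boost is applied.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost reflects the observation that typos tend to occur later in a word, which is why Jaro-Winkler is popular for matching names.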