
textdistance - Compute distance between sequences

  •    Python

TextDistance is a Python library for comparing the distance between two or more sequences using many algorithms. Work in progress. Currently, all algorithms compare two strings as arrays of bits.

SymSpell - SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster (than the standard approach with deletes + transposes + replaces + inserts) and language independent. Lookup provides a very fast spelling correction of single words.
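
The delete-only idea described above can be illustrated with a minimal Python sketch. This is not SymSpell's actual code or API: the function names `deletes`, `build_index` and `candidates` are hypothetical, and the sketch only produces candidate words. A real implementation would still verify each candidate with a Damerau-Levenshtein distance check and rank by word frequency.

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        # Delete one more character from every string produced in the previous round.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(dictionary, max_distance):
    """Map each delete variant to the dictionary words that produce it (done once, up front)."""
    index = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def candidates(term, index, max_distance):
    """Look up the delete variants of the input term; no inserts, replaces or transposes are generated."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["hello", "help", "world"], max_distance=1)
print(candidates("helo", index, max_distance=1))
print(candidates("wrold", index, max_distance=1))
```

Because both the dictionary and the input are only ever shortened, the number of variants per word depends on its length and the maximum distance, not on the size of the alphabet.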

Buffer - Swift μ-framework for efficient array diffs, collection observation and cell configuration.

  •    Swift

Swift μ-framework for efficient array diffs, collection observation and data source implementation. Buffer is designed to be very granular and has APIs with very different degrees of abstraction.




levenshtein - This is the Go implementation to calculate Levenshtein Distance.

  •    Go

The library is fully capable of working with non-ASCII strings, but the strings are not normalized. That is left as a user-dependent use case. Please normalize the strings before passing them to the library if you have such a requirement. The words selected are "levenshtein" and "frankenstein".

levenshtein.c - Levenshtein algorithm in C

  •    C

Vladimir Levenshtein’s edit distance algorithm as a C library. There’s also a CLI, levenshtein(1), and a JavaScript version.

tpyo - A small script that enables you to make typos in JavaScript property names

  •    Javascript

Ever wanted to use Math.SQUIRTLE instead of Math.SQRT2? Think Function.prototype.apple looks shinier than apply? Or do you prefer Array.prototype.faReech over forEach? Look no further: tpyo’s got your back. tpyo (pronounced ‘typo’) is the result of combining the power of ES6 proxies with Levenshtein string distance. It’s a small script that enables you to make typos in JavaScript property names.


pybktree - Python BK-tree data structure to allow fast querying of "close" matches

  •    Python

pybktree is a generic, pure Python implementation of a BK-tree data structure, which allows fast querying of "close" matches (for example, matches with small hamming distance or Levenshtein distance). This module is based on the algorithm by Nick Johnson in his blog article on BK-trees. For large trees and fairly small N when calling find(), using a BKTree is much faster than doing a linear search. This is especially good when you're de-duping a few hundred thousand photos -- with a linear search that would become a very slow, O(N²) operation. With a BKTree, it's more like O(N log N).
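
To show why a BK-tree can skip most comparisons, here is a minimal, self-contained Python sketch of the data structure. It is not pybktree's source: the `BKTree` class below is a simplified stand-in written for illustration, and it assumes the items are integer hashes compared by Hamming distance.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


class BKTree:
    """Simplified BK-tree; each node is a tuple (item, {distance_to_item: child_node})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.distance(item, node[0])
            child = node[1].get(d)
            if child is None:
                node[1][d] = (item, {})
                return
            node = child

    def find(self, item, n):
        """Return sorted (distance, item) pairs for stored items within distance n."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(item, candidate)
            if d <= n:
                results.append((d, candidate))
            # Triangle inequality: only subtrees whose edge lies in [d - n, d + n] can hold matches.
            for edge, child in children.items():
                if d - n <= edge <= d + n:
                    stack.append(child)
        return sorted(results)


tree = BKTree(hamming)
for value in [0, 4, 5, 14, 15]:
    tree.add(value)
print(tree.find(13, 1))  # [(1, 5), (1, 15)]
```

The pruning step is the whole point: the smaller N is relative to the spread of distances in the tree, the more subtrees are skipped.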

pyphonetics - A Python 3 phonetics library.

  •    Python

More will be added in the future. The module is available on PyPI; install it with pip install pyphonetics.

smetrics - String metrics library written in Go.

  •    Go

This library contains implementations of the Levenshtein distance, Jaro-Winkler and Soundex algorithms written in Go (golang). Other algorithms related to string metrics (or string similarity, whatever) are welcome. The Levenshtein distance is calculated with the Wagner-Fischer algorithm. It runs in O(mn) time and needs O(2m) space, where m is the length of the shorter string. This is kinda optimized, so it should be used in most cases.
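
The two-row space bound mentioned above can be sketched in a few lines of Python. This is an illustration of the Wagner-Fischer technique, not a port of the smetrics code; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(a, b):
    """Wagner-Fischer edit distance keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row has len(b) + 1 cells
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # deleting the first i characters of a
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only the previous and current rows are alive at any time, which is where the 2m figure comes from.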

StringDistances.jl - String Distances

  •    Julia

The function compare returns a similarity score between two strings. The function always returns a score between 0 and 1, with a value of 0 being completely different and a value of 1 being completely similar. Q-gram distances compare the set of all substrings of length q in each string.
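
As a rough illustration of a q-gram based score in the 0 to 1 range, the following Python sketch computes a Jaccard similarity over q-gram sets. It is not the StringDistances.jl implementation; the names `qgrams` and `jaccard_similarity`, the default q of 2, and the choice of Jaccard as the set comparison are assumptions made for the example.

```python
def qgrams(s, q):
    """Set of all substrings of length q in s (empty if s is shorter than q)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_similarity(a, b, q=2):
    """Share of q-grams common to both strings: 0.0 is disjoint, 1.0 is identical sets."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # assumption: two strings with no q-grams count as fully similar
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(jaccard_similarity("night", "nacht"))
print(jaccard_similarity("night", "night"))  # 1.0
```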

text-metrics - Calculate various string metrics efficiently in Haskell

  •    Haskell

The library provides efficient implementations of various string metric algorithms. It works with strict Text values. edit-distance allows specifying costs for every operation when calculating Levenshtein distance (insertion, deletion, substitution, and transposition). This is rarely needed in real-world applications though, IMO.
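
For readers unfamiliar with per-operation costs, here is a small Python sketch of an edit distance with configurable insertion, deletion, substitution and adjacent-transposition costs. It is neither the text-metrics nor the edit-distance API; the function `weighted_distance` and its parameter names are hypothetical, and the transposition rule used is the restricted (optimal string alignment) variant.

```python
def weighted_distance(a, b, ins=1, dele=1, sub=1, trans=1):
    """Edit distance from a to b with a separate cost for each operation."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * dele  # delete every character of a's prefix
    for j in range(1, cols):
        d[0][j] = j * ins   # insert every character of b's prefix
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            best = min(
                d[i - 1][j] + dele,
                d[i][j - 1] + ins,
                d[i - 1][j - 1] + (0 if same else sub),
            )
            # Swap of two adjacent characters, counted as a single operation.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, d[i - 2][j - 2] + trans)
            d[i][j] = best
    return d[-1][-1]


print(weighted_distance("ab", "ba"))           # 1, one transposition
print(weighted_distance("ab", "ba", trans=5))  # 2, cheaper to substitute twice
```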

stopwords - Removes most frequent words (stop words) from a text content

  •    Go

stopwords is a Go package that removes stop words from text content. If instructed to do so, it will remove HTML tags and parse HTML entities. The objective is to prepare a text for use by natural language processing algorithms or text comparison algorithms such as SimHash. If the function is used with an unsupported language, it doesn't fail, but applies the English filter to the content.

go-fuzzywuzzy - Port of SeatGeek's fuzzywuzzy

  •    Go

This is a port of SeatGeek's fuzzywuzzy, a fuzzy string matching library. Also included is a port of python-levenshtein, a wicked-fast implementation of Levenshtein edit distance.

affinegap - :triangular_ruler: A Cython implementation of the affine gap string distance

  •    Python

Help us fix the problem as quickly as possible by following Mozilla's guidelines for reporting bugs. Copyright (c) 2016 Forest Gregg and Dedupeio. Released under the MIT License.

LinSpell - Fast approximate strings search & spelling correction

  •    CSharp

The LinSpell spelling correction algorithm does not require edit candidate generation or specialized data structures like BK-tree or Norvig's algorithm. In most cases LinSpell is faster and requires less memory compared to BK-tree or Norvig's algorithm. LinSpell is language and character set independent. The word frequency list was created by intersecting the two lists mentioned below. By reciprocal filtering, only those words that appear in both lists are used. Additional filters were applied and the resulting list was truncated to the ≈ 80,000 most frequent words.

SymSpellCompound - SymSpellCompound: compound aware automatic spelling correction

  •    

SymSpellCompound supports compound-aware automatic spelling correction of multi-word input strings. It is built on top of SymSpell's 1 million times faster spelling correction algorithm. Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can be mixed within the same word.

levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix

  •    Go

This package implements distance and similarity metrics for strings, based on the Levenshtein measure, in Go. v1.2.1 Stable: Guaranteed no breaking changes to the API in future v1.x releases. Probably safe to use in production, though provided on "AS IS" basis.