simhash-js - Simhash implementation in Javascript


A Javascript implementation of Charikar's hash for identification of similar documents. Consider two documents A and B that differ in just a single byte.
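As a rough illustration of the idea behind Charikar's hash, here is a minimal Python sketch. It is not the simhash-js code: the whitespace tokenizer, the use of MD5 as the per-token hash, and the names `simhash` and `hamming_distance` are all assumptions made for this example. Each token votes on every bit of a 64-bit fingerprint, so documents sharing most tokens tend to end up with fingerprints that differ in only a few bits.

```python
import hashlib

BITS = 64  # fingerprint width; fixed so it matches the 8 digest bytes used below


def simhash(text):
    """Return a 64-bit Charikar-style fingerprint of whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        # Assumption: MD5 truncated to 64 bits serves as the per-token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each token votes +1 for a set bit and -1 for a clear bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # keep the bits where the "set" votes won
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents would then be compared with `hamming_distance(simhash(doc_a), simhash(doc_b))`; a small distance suggests near-duplicates, with the threshold left to the application.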



Related Projects

simhash - Simhash value computation for Chinese documents

  •    C++


xxHash - Extremely fast non-cryptographic hash algorithm

  •    C

xxHash is an extremely fast hash algorithm, running at RAM speed limits. It successfully completes the SMHasher test suite, which evaluates the collision, dispersion and randomness qualities of hash functions. The code is highly portable, and hashes are identical on all platforms (little / big endian). Q.Score is a measure of the quality of the hash function. It depends on successfully passing the SMHasher test set. 10 is a perfect score. Algorithms with a score < 5 are not listed in the project's comparison table.

imagehash - 🌄 Perceptual image hashing for PHP

  •    PHP

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar. Perceptual hashes are a different concept compared to cryptographic hash functions like MD5 and SHA1. With cryptographic hashes, the hash values are random. The data used to generate the hash acts like a random seed, so the same data will generate the same result, but different data will create different results. Comparing two SHA1 hash values really only tells you two things. If the hashes are different, then the data is different. And if the hashes are the same, then the data is likely the same. In contrast, perceptual hashes can be compared -- giving you a sense of similarity between the two data sets.

SetSimilaritySearch - All-pair set similarity search on millions of sets in Python and on a laptop (faster than MinHash LSH)

  •    Python

Efficient set similarity search algorithms in Python. For even better performance see the Go Implementation. A popular way to measure the similarity between two sets is Jaccard similarity, which gives a fractional score between 0 and 1.0.
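For reference, Jaccard similarity is the size of the intersection divided by the size of the union. The helper below is a minimal sketch, not part of SetSimilaritySearch; the name `jaccard` and the choice to score two empty sets as 1.0 are assumptions of this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two iterables, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)
```

For example, the sets {1, 2, 3} and {2, 3, 4} share two of four distinct elements, so their similarity is 0.5. Computing this for every pair is quadratic in the number of sets, which is the cost that all-pair search algorithms aim to avoid.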

Essentials - General purpose utilities and hash functions for Android and Java (aka java-common)

  •    Java

Essentials is a collection of general-purpose classes we found useful on many occasions. This project is bare bones compared to the rich menu offered by Guava or Apache Commons. Essentials is not a framework; rather, it is a small set of utilities that make standard Java approaches more convenient or more efficient.

butteraugli - butteraugli estimates the psychovisual difference between two images

  •    C++

Butteraugli is a project that estimates the psychovisual similarity of two images. It gives a score for the images that is reliable in the domain of barely noticeable differences. Butteraugli not only gives a scalar score, but also computes a spatial map of the level of differences. One of the main motivations for this project is the statistical differences in location and density of different color receptors, particularly the low density of blue cones in the fovea. Another motivation comes from more accurate modeling of ganglion cells, particularly the frequency space inhibition.

multihash - Self describing hashes - for future proofing

  •    Shell

Multihash is a protocol for differentiating outputs from various well-established cryptographic hash functions, addressing size + encoding considerations. It is useful to write applications that future-proof their use of hashes, and allow multiple hash functions to coexist. See jbenet/random-ideas#1 for a longer discussion.
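A multihash value prefixes the raw digest with a hash function code and the digest length, so a reader can tell which function produced it. The sketch below is a simplified illustration for SHA2-256 (code 0x12, 32-byte digest), not a complete implementation: the function names are hypothetical, and it assumes both prefix fields fit in a single byte, whereas the format encodes them as varints.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def multihash_sha256(data):
    """Return <code><digest length><digest> for the SHA2-256 digest of data."""
    digest = hashlib.sha256(data).digest()
    # Simplification: code and length are both < 128, so each is one byte.
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def decode_multihash(value):
    """Split a single-byte-prefixed multihash into (code, length, digest)."""
    code, length, digest = value[0], value[1], value[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length prefix")
    return code, length, digest
```

Because the function code travels with the digest, an application can later introduce a different hash function without making previously stored values ambiguous.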

Zero-Allocation-Hashing - Zero-allocation hashing for Java

  •    Java

Zero-allocation, pretty fast implementations of non-cryptographic hash functions for byte sequences or blocks of memory

Testing cryptographic hash functions

  •    C

Utilities for measuring characteristics of cryptographic hash functions: avalanche property, (partial) collision search, Maurer's universal statistical test, Filiol's Mobius ANF statistical analysis. EdonC, EdonR, MD5 and SHA-1 hash-plugins included.

smhasher - Automatically exported from

  •    C++

This is the home for the MurmurHash family of hash functions along with the SMHasher test suite used to verify them. SMHasher is released under the MIT license. All MurmurHash versions are public domain software, and the author disclaims all copyright to their code. SMHasher is a test suite designed to test the distribution, collision, and performance properties of non-cryptographic hash functions - it aims to be the DieHarder of hash testing, and does a pretty good job of finding flaws with a number of popular hashes.


  •    Javascript

Sifter is a client and server-side library (via UMD) for textually searching arrays and hashes of objects by property, or by multiple properties. It's designed specifically for autocomplete. The process is three-step: score, filter, sort. Searching returns meta information and an "items" array that contains objects with the index (or key, if searching a hash) and a score that represents how good a match the item was. Items that did not match are not returned.

get-in - Functions for hash map (assoc array) traversal.

  •    PHP

Functions for hash map (assoc array) traversal. When dealing with nested associative structures, traversing them can become quite a pain. Mostly because of the amount of isset checking that is necessary.
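The idea can be sketched in Python as a hypothetical analogue; this is not the PHP library's code, and the name `get_in` and its default-value behavior are assumptions made for the example. A single call walks a list of keys and falls back to a default, replacing a chain of existence checks.

```python
def get_in(data, keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


config = {"db": {"primary": {"host": "localhost"}}}
host = get_in(config, ["db", "primary", "host"])        # "localhost"
port = get_in(config, ["db", "primary", "port"], 5432)  # falls back to 5432
```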

malsub - A Python RESTful API framework for online malware analysis and threat intelligence services

  •    Python

malsub is a Python 3.6.x framework that wraps several web services of online malware and URL analysis sites through their RESTful Application Programming Interfaces (APIs). It supports submitting files or URLs for analysis, retrieving reports by hash values, domains, IPv4 addresses or URLs, downloading samples and other files, making generic searches and getting API quota values. The framework is designed in a modular way so that new services can be added with ease by following the provided template module and functions to make HTTP GET and POST requests and to pretty print results. This approach avoids having to write individual and specialized wrappers for each and every API by leveraging what they have in common in their calls and responses. The framework is also multi-threaded and dispatches service API functions across a thread pool for each input argument: for example, it spawns a pool of threads for each file provided for submission or for each hash value provided for report retrieval. Most of these services require API keys that are generated after registering an account on their respective websites; the keys need to be specified in the apikey.yaml file according to the given structure. Note that some of the already bundled services are limited in supported operations because they were developed with free API keys. API keys associated with paid subscriptions are allowed to make additional calls not open to the public and may not be restricted by a given quota. Still, malsub can process multiple input arguments and pause between requests as a workaround for cooldown periods.

identicon.js - GitHub-style identicons as PNGs or SVGs in JS

  •    Javascript

GitHub-style identicons as PNGs or SVGs in JS. This little library will produce the same shape and (nearly) the same color as GitHub when given the same hash value. Supports PNG and SVG output formats. Note that GitHub uses an internal database identifier for the hash, so you can't simply md5 the username and get the same result. The creative visual design is borrowed from Jason Long of Git and GitHub fame.

satellite-js - Modular set of functions for SGP4 and SDP4 propagation of TLEs.

  •    Javascript

A library to make satellite propagation via TLEs possible on the web. Provides the functions necessary for SGP4/SDP4 calculations as callable JavaScript. Also provides functions for coordinate transforms. The internals of this library are nearly identical to Brandon Rhodes's sgp4 Python library. However, it is encapsulated in a standard JS library (self-executing function) and exposes only the functionality needed to track satellites and propagate paths. The only change I made to Brandon Rhodes's code was to change the positional parameters of functions to key:value objects. This reduces the complexity of functions that require 50+ parameters and doesn't require the parameters to be placed in an exact order.

Hash Calculator

  •    CSharp

WPF Windows 7 program to compute SHA1, MD5 & CRC32 hash functions.

kissdb - (Keep It) Simple Stupid Database

  •    C

KISSDB is about the simplest key/value store you'll ever see, anywhere. It's written in plain vanilla C using only the standard string and FILE I/O functions, and should port to just about anything with a disk or something that acts like one. It stores keys and values of fixed length in a stupid-simple file format based on fixed-size hash tables. If a hash collision occurs, a new "page" of hash table is appended to the database. The format is append-only. There is no delete. Puts that replace an existing value, however, will not grow the file, as they overwrite the existing entry.

Invoke-TheHash - PowerShell Pass The Hash Utils

  •    PowerShell

Invoke-TheHash contains PowerShell functions for performing pass the hash WMI and SMB tasks. WMI and SMB connections are accessed through the .NET TCPClient. Authentication is performed by passing an NTLM hash into the NTLMv2 authentication protocol. Local administrator privilege is not required client-side.