dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution

  •    Python

dedupe is a Python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on structured data. It takes human-labeled training data and works out the best rules for your dataset, so it can find similar records quickly and automatically, even in very large databases.
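
To give a feel for what "fuzzy matching" means here, the following is a minimal sketch that uses only the standard library's `difflib`. It is not dedupe's API or its learned model: the function names, the greedy clustering and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """True if two strings are close after simple normalization."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def cluster_records(names, threshold=0.85):
    """Greedily group names: each joins the first cluster whose first member it resembles."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters
```

A fixed threshold like this has to be tuned by hand for each dataset, which is the step a trained approach aims to replace.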

free-style - Make CSS easier and more maintainable by using JavaScript

  •    TypeScript

Free-Style is designed to make CSS easier and more maintainable by using JavaScript. There's a great presentation by Christopher Chedeau you should check out.

csvdedupe - Command line tool for deduplicating CSV files

  •    Python

Command line tools that use the dedupe Python library to deduplicate CSV files. csvdedupe takes a messy input file or STDIN pipe and identifies duplicates.

duplicate-kriller - A fast file deduplicator

  •    Rust

Replaces files that have identical content with hardlinks, so that file data of all copies is stored only once, saving disk space. Useful for reducing sizes of multiple backups, messy collections of photos and music, countless copies of node_modules, and anything else that's usually immutable (since all hardlinked copies of a file will change when any one of them is changed). Works on macOS and Linux. Windows is not supported.
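
The hardlink replacement described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the tool's implementation: the function names and the `.dedupe-tmp` suffix are hypothetical, and the sketch trusts a SHA-256 match instead of comparing file contents byte by byte.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(paths):
    """Replace later files with hardlinks to the first file with the same content.

    Returns the number of files replaced.
    """
    first_seen = {}  # digest -> path of the first file with that content
    replaced = 0
    for path in paths:
        digest = file_digest(path)
        original = first_seen.setdefault(digest, path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already linked to it
        # Link under a temporary name, then atomically swap it in,
        # so `path` never disappears if linking fails.
        tmp = path + ".dedupe-tmp"
        os.link(original, tmp)
        os.replace(tmp, path)
        replaced += 1
    return replaced
```

After the call, every duplicate path still exists but shares one inode with the first copy, which is why writing through any of them changes all of them.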

pkgcount - Produce a report on number of duplicate packages in node_modules.

  •    JavaScript

Easily see how many packages, which versions and how many copies of each package are installed in your node_modules hierarchy. By default, pkgcount uses coloured output as a simple visual aid to help identify packages with high levels of duplication. Packages are shaded from yellow to red based on the number of duplicates.

file-dedupe - Fast duplicate file detection library

  •    JavaScript

findup is quite fast - it is within 2x of the fastest duplicate finders written in C/C++. Based on the V8 profiler output, about 40% of the time is spent on I/O, 13% on crypto and 11% on file traversal, so any further gains in performance will need to come from I/O optimizations rather than code optimizations. BTW, you may notice that file-dedupe defaults to sync I/O. This is because the async I/O seems to have significant overhead for typical FS tasks. You can test this out by passing the --async flag on your system.
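
Since most of the time goes to I/O, one common way to cut work is to avoid reading files that cannot be duplicates. The sketch below groups files by size first and hashes only the sizes that collide, using plain synchronous reads. It is an assumption about a reasonable approach, not a description of how file-dedupe works internally; `find_duplicates` is a hypothetical name.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return sorted lists of paths under `root` whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip the read
        by_hash = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Only the `stat` call touches files with a unique size, so the expensive reads are limited to real candidates.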

postcss-merge-selectors - PostCSS plugin to combine selectors that have identical rules

  •    JavaScript

PostCSS plugin to combine selectors that have identical rules. Can be configured to only merge rules whose selectors match specific filters. This plugin isn't smart. It hasn't got a chuffing clue what your CSS is trying to achieve. Combining selectors might satisfy your urge to be tidy, but the warm fluffy feeling will subside pretty quickly when your new bijou CSS causes styles to be applied differently. In order to merge two selectors we have to move one of them. That means they may now override rules that used to be after them, or they may be overridden by rules that used to be before them. I recommend you use the selectorFilter option to only target specific selectors and the promote option if you need to move the resulting selectors further down the stylesheet. Test the resulting CSS carefully.
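
The ordering hazard is easier to see with a toy model. The sketch below represents a stylesheet as a list of (selector, declarations) pairs and merges identical rules into the position of the first one. It is not the plugin's code: the data model, both function names and the simplified "last matching rule wins" cascade (which ignores specificity) are assumptions made for illustration.

```python
def merge_identical_rules(rules):
    """Merge rules with identical declarations into the first such rule.

    `rules` is a list of (selector, declarations) pairs in source order;
    declarations is a tuple of (property, value) pairs.
    """
    merged = []          # output rules as [selectors, declarations]
    index_by_decls = {}  # declarations -> position in `merged`
    for selector, decls in rules:
        if decls in index_by_decls:
            merged[index_by_decls[decls]][0].append(selector)
        else:
            index_by_decls[decls] = len(merged)
            merged.append([[selector], decls])
    return [(", ".join(sels), decls) for sels, decls in merged]


def cascade(rules, element_selectors, prop):
    """Value of `prop` for an element matching `element_selectors`; last rule wins."""
    value = None
    for selector, decls in rules:
        if any(s.strip() in element_selectors for s in selector.split(",")):
            for name, val in decls:
                if name == prop:
                    value = val
    return value


red = (("color", "red"),)
blue = (("color", "blue"),)
sheet = [(".a", red), (".b", blue), (".c", red)]
merged = merge_identical_rules(sheet)  # [(".a, .c", red), (".b", blue)]

# An element with both classes "b" and "c":
before = cascade(sheet, {".b", ".c"}, "color")   # "red": .c came after .b
after = cascade(merged, {".b", ".c"}, "color")   # "blue": .c moved above .b
```

Moving `.c` up to join `.a` lets `.b` override it, which is exactly the kind of change in applied styles the warning describes.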

apollo-dedup-batch-http-link - @tipe/apollo-dedup-batch-http-link: batches multiple operations into a single HTTP dedup request

  •    JavaScript

apollo-dedup-batch-http-link: batches multiple operations into a single HTTP dedup request. Instead of sending a single operation, it sends an array of operations to the server.
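
As a rough picture of what "an array of operations" could look like on the wire, here is a sketch that builds one JSON array body from several operations. It does not use or mirror the link's API. In particular, treating "dedup" as "drop operations that are exact repeats" is an assumption, and `build_batch_body` and the `positions` mapping are hypothetical.

```python
import json


def build_batch_body(operations):
    """Return (body, positions) for one batched request.

    `body` is a JSON array of the distinct operations; `positions[i]` is the
    index in that array whose result answers `operations[i]`.
    """
    distinct = []
    seen = {}  # canonical JSON of an operation -> index in `distinct`
    positions = []
    for op in operations:
        key = json.dumps(op, sort_keys=True)
        if key not in seen:
            seen[key] = len(distinct)
            distinct.append(op)
        positions.append(seen[key])
    return json.dumps(distinct), positions
```

A server that answers with an array of results in the same order lets the client use `positions` to hand each caller its own result.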

postcss-discard-duplicates - Discard duplicate rules in your CSS files with PostCSS.

  •    JavaScript

Discard duplicate rules in your CSS files with PostCSS. It has to assume that your rules have already been transformed by another processor, otherwise it would be responsible for too many things.

dupe-krill - A fast file deduplicator

  •    Rust

Replaces files that have identical content with hardlinks, so that file data of all copies is stored only once, saving disk space. Useful for reducing sizes of multiple backups, messy collections of photos and music, countless copies of node_modules, and anything else that's usually immutable (since all hardlinked copies of a file will change when any one of them is changed). Works on macOS and Linux. Windows is not supported.

dedupe-examples - Examples for using the dedupe library

  •    Python

Example scripts for dedupe, a library that uses machine learning to perform de-duplication and entity resolution quickly on structured data. We recommend using virtualenv and virtualenvwrapper for working in a virtualized development environment. Read how to set up virtualenv.

pgdedupe - A simple command line interface to the datamade/dedupe library.

  •    Jupyter

A work-in-progress to provide a standard interface for deduplication of large databases with custom pre-processing and post-processing steps. In addition to running a database-level deduplication with dedupe, this script adds custom pre- and post-processing steps to improve the run-time and results, making this a hybrid between fuzzy matching and record linkage.

rabin - node native addon for rabin fingerprinting data streams

  •    C++

Node native addon module (C/C++) for Rabin fingerprinting data streams. Uses the implementation of Rabin fingerprinting from LBFS.
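
LBFS uses fingerprints of a sliding window to choose chunk boundaries from the content itself, so an insertion early in a stream does not shift every later chunk. The sketch below shows that idea with a simpler Rabin-Karp-style polynomial rolling hash modulo a prime. It is not the GF(2) polynomial arithmetic of true Rabin fingerprinting and not the addon's API; the function name and the `window`, `mask`, `base` and `mod` values are arbitrary choices for illustration.

```python
def chunk_boundaries(data, window=16, mask=0x3F, base=257, mod=(1 << 61) - 1):
    """Return end offsets of content-defined chunks in `data` (bytes)."""
    if len(data) < window:
        return [len(data)] if data else []
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * base + b) % mod
    boundaries = []
    for i in range(window, len(data) + 1):
        # Here h is the hash of data[i - window:i].
        if h & mask == 0 and i != len(data):
            boundaries.append(i)
        if i < len(data):
            # Slide the window: drop data[i - window], append data[i].
            h = ((h - data[i - window] * top) * base + data[i]) % mod
    boundaries.append(len(data))  # the final chunk always ends at the end
    return boundaries
```

Because a boundary depends only on the `window` bytes before it, prepending data to a stream leaves every existing boundary in place relative to the original content.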

uniqs - Tiny utility to de-duplicate lists

  •    JavaScript

This package has been written to accompany utilities like flatten as an alternative to full-blown libraries like underscore or lodash.
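
For comparison, de-duplicating a list while keeping first-seen order is a one-liner in Python. This is a sketch of the general technique, not a port of the package, and `unique` is a hypothetical name.

```python
def unique(items):
    """Return the distinct items of `items`, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))
```

The items must be hashable for this to work; for example, `unique([3, 1, 3, 2, 1])` returns `[3, 1, 2]`.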