IntraArchiveDeduplicator - Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching


Tool for managing data-deduplication within extant compressed archive files, with a heavy focus on Manga/Comic-book archive files. This is a rather exotic tool that is intended to allow fairly fast duplicate detection for files within compressed archives.

https://github.com/fake-name/IntraArchiveDeduplicator
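As a rough illustration of exact duplicate detection inside compressed archives, the sketch below hashes every member of a set of ZIP files and reports content that appears more than once. It is not taken from the project: the function names, the use of SHA-256, and the in-memory demo archives are all assumptions made for the example.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def hash_archive_members(archive, label):
    """Yield (sha256_hex, label, member_name) for every file in a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no content to compare
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            yield digest, label, info.filename


def find_duplicates(archives):
    """Map each content hash seen more than once to the places it occurs.

    `archives` is a dict of label -> path or file-like object (hypothetical API).
    """
    seen = defaultdict(list)
    for label, archive in archives.items():
        for digest, lbl, name in hash_archive_members(archive, label):
            seen[digest].append((lbl, name))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}


def make_zip(files):
    """Build an in-memory ZIP from a dict of name -> bytes (demo helper only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    a = make_zip({"001.png": b"page one", "002.png": b"page two"})
    b = make_zip({"cover.png": b"page one", "extra.png": b"bonus"})
    for digest, locations in find_duplicates({"a.zip": a, "b.zip": b}).items():
        print(digest[:12], locations)
```

Hashing the decompressed bytes rather than the archive itself means the same page is recognised even when it is stored under a different name or in a differently compressed archive. Near-duplicates (resized or re-encoded images) need a fuzzy comparison instead, which is where a perceptual hash and a BK tree come in.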

Related Projects

phpMyFAQ - Open Source FAQ system for PHP and MySQL, PostgreSQL and other databases

  •    PHP

phpMyFAQ is a multilingual, completely database-driven FAQ system. It supports various databases for storing all data; PHP 5.6+ is needed to access this data. phpMyFAQ also offers a multi-language Content Management System with a WYSIWYG editor and an Image Manager, real-time search support with Elasticsearch, flexible multi-user support with user- and group-based permissions on categories and records, a wiki-like revision feature, a news system, user tracking, 40+ supported languages, enhanced automatic content negotiation, HTML5/CSS3-based responsive templates, PDF support, a backup system, a dynamic sitemap, related FAQs, tagging, RSS feeds, built-in spam protection systems, OpenLDAP and Microsoft Active Directory support, and an easy-to-use installation script. phpMyFAQ is only supported on PHP 5.6.0 and up, and you need a database as well. Supported databases are MySQL, Percona Server, PostgreSQL, Microsoft SQL Server, SQLite3 and MariaDB. If you want to use Elasticsearch as the main search engine, you need Elasticsearch 2.x as well. Check our detailed requirements on phpmyfaq.de for more information.

lightnet - 🌓 Bringing pjreddie's DarkNet out of the shadows #yolo

  •    C

LightNet provides a simple and efficient Python interface to DarkNet, a neural network library written by Joseph Redmon that's well known for its state-of-the-art object detection models, YOLO and YOLOv2. LightNet's main purpose for now is to power Prodigy's upcoming object detection and image segmentation features. However, it may be useful to anyone interested in the DarkNet library. Once you've downloaded LightNet, you can install a model using the lightnet download command. This will save the models in the lightnet/data directory. If you've installed LightNet system-wide, make sure to run the command as administrator.

fake2db - create custom test databases that are populated with fake data

  •    Python

Generate databases filled with fake but valid data for test purposes, using the most popular patterns (AFAIK). Currently supported: sqlite, mysql, postgresql, mongodb, redis, couchdb. Installation through PyPI retrieves 'fake-factory' as the main dependency.

Spilo - Highly available elephant herd: HA PostgreSQL cluster using Docker

  •    Python

Spilo is a Docker image that provides PostgreSQL and Patroni bundled together. Patroni is a template for PostgreSQL HA. Multiple Spilos can form a resilient, highly available PostgreSQL cluster. For this, you'll need to start all participating Spilos with identical etcd addresses and cluster names.


Mosaictor

  •    

Mosaictor is a pet project of mine that I started halfway through my education. It is a photo mosaic creator that uses locally saved files and files obtained through Google Image Search. Currently, the quality of the code is as terrible as the project's name, so I'm looking to improve both.

PostGIS - Spatial and Geographic objects for PostgreSQL

  •    C

PostGIS is a spatial database extender for PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL. PostGIS adds extra types (geometry, geography, raster and others) to the PostgreSQL database. It also adds functions, operators, and index enhancements that apply to these spatial types.

Sphinx - Search server

  •    C++

Sphinx is a free, open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

docker-postgresql - Dockerfile to build a PostgreSQL container image which can be linked to other containers

  •    Shell

Dockerfile to create a Docker container image for PostgreSQL. PostgreSQL is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance.

rdedup - Data deduplication engine, supporting optional compression and public key encryption.

  •    Rust

rdedup is a data deduplication engine and backup software. See the wiki for current project status.

Bulk File Manager - Bulk File Renamer/Deduplicator in .NET

  •    CSharp

This is a file deduplication utility that also offers bulk name management options. It eases the cleanup of a large volume of duplicate files, or of a small number of very large duplicate files, where manual cleaning would be difficult or tedious. It also provides name-based sorting for a large batch of files spread across directories.

JSmartWebMediaGallery

  •    Java

Dynamic web site (photo gallery, photo album) for publishing media (photos and movies). Features: albums, thumbnail generation, search, and categorization of media files in unlimited categories and sub-categories (like a tree). Tech: Java + JSP + PostgreSQL.

enas - TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"

  •    Python

Authors' implementation of "Efficient Neural Architecture Search via Parameter Sharing" (2018) in TensorFlow. Includes code for CIFAR-10 image classification and Penn Tree Bank language modeling tasks.

libpostal - A C library for parsing/normalizing street addresses around the world

  •    C

Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, simpler, and more consistent internationally. The core library is written in pure C. Language bindings for Python, Ruby, Go, Java, PHP, and NodeJS are officially supported and it's easy to write bindings in other languages.

platformio

  •    Python


jsonplaceholder - A simple online fake REST API server

  •    Javascript

JSONPlaceholder is a simple fake REST API for testing and prototyping. It's like an image placeholder but for web developers.

postgres-operator - Postgres operator creates and manages PostgreSQL clusters running in Kubernetes

  •    Go

The operator watches for additions, updates, and deletions of PostgreSQL cluster manifests and changes the running clusters accordingly. For example, when a user submits a new manifest, the operator fetches that manifest and spawns a new Postgres cluster along with all necessary entities such as Kubernetes StatefulSets and Postgres roles. See this Postgres cluster manifest for settings that a manifest may contain. The operator also watches for updates to its own configuration and alters running Postgres clusters if necessary. For instance, if a pod's Docker image is changed, the operator carries out a rolling update: it re-spawns the pods of each StatefulSet it manages, one by one, with the new Docker image.

match - :crystal_ball: Scalable reverse image search built on Kubernetes and Elasticsearch

  •    Python

Match makes it easy to search for images that look similar to each other. Using a state-of-the-art perceptual hash, it is invariant to scaling and 90 degree rotations. Its HTTP API is quick to integrate and flexible for a number of reverse image search applications. Kubernetes and Elasticsearch allow Match to scale to billions of images with ease while giving you full control over where your data is stored. Match uses the awesome ascribe/image-match under the hood for most of the image search legwork.
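To give a feel for why a perceptual hash can be insensitive to scaling, here is a toy "average hash" in pure Python. This is a deliberately simple stand-in, not the algorithm used by Match or image-match, and it does not handle rotation. The function names and the nested-list image format are assumptions for the example.

```python
def average_hash(pixels, size=4):
    """Return a size*size-bit integer hash of a grayscale image.

    `pixels` is a list of rows of brightness values. Its width and height
    must be multiples of `size` to keep the block averaging simple.
    """
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // size, width // size
    # Shrink to size x size cells by averaging each block of pixels.
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def upscale(pixels, factor):
    """Enlarge an image by repeating every pixel `factor` times in each axis."""
    return [[v for v in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]


if __name__ == "__main__":
    image = [[x + 4 * y for x in range(4)] for y in range(4)]  # small gradient
    bigger = upscale(image, 3)
    flipped = image[::-1]
    print(hamming(average_hash(image), average_hash(bigger)))   # same picture, larger
    print(hamming(average_hash(image), average_hash(flipped)))  # different picture
```

Because the hash is computed from block averages relative to the overall mean, enlarging the image leaves every bit unchanged, while a genuinely different picture flips many bits. Similar images can then be found by looking for hashes within a small Hamming distance of each other.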

Spell Corrector

  •    DotNet

A spell corrector that uses a Bayesian algorithm and a BK (Burkhard-Keller) tree.
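A BK tree indexes items by their distance to a parent node, so a lookup can skip whole subtrees using the triangle inequality. The following is a minimal sketch, assuming Levenshtein edit distance as the metric. The class and method names are illustrative and not taken from this project, and the Bayesian ranking step is not shown.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Burkhard-Keller tree over any metric `distance(a, b)`."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None  # each node is (item, {edge_distance: child_node})

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.distance(item, node[0])
            if d == 0:
                return  # item already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (item, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, item) pairs within `max_dist` of `query`."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            item, children = stack.pop()
            d = self.distance(query, item)
            if d <= max_dist:
                results.append((d, item))
            # Triangle inequality: matches can only sit under edges whose
            # label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


if __name__ == "__main__":
    tree = BKTree(levenshtein)
    for word in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
        tree.add(word)
    print(tree.search("bok", 1))
```

The same structure works for any true metric. Swapping `levenshtein` for a Hamming distance over integer hashes gives the kind of fuzzy image lookup that BK trees are also used for.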

Manticore Search - High performance full-text search engine with SQL and JSON support

  •    C++

Manticore Search is an open-source, high-performance engine oriented toward full-text search. It is a fork of Sphinx Search. Manticore Search is written in C++, which means speed and low resource consumption, and you don't have to worry about a garbage collector suddenly causing trouble.