cuckoo-index - Cuckoo Index: A Lightweight Secondary Index Structure

  •        38

NOTE This is not an officially supported Google product. Cuckoo Index (CI) is a lightweight secondary index structure that represents the many-to-many relationship between keys and partitions of columns in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions.

https://github.com/google/cuckoo-index

Tags
Implementation
License
Platform

   




Related Projects

cfilter - Cuckoo Filter implementation in Go, better than Bloom Filters (unmaintained, unfortunately)

  •    Go

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Some possible use-cases that depend on approximated set-membership queries would be databases, caches, routers, and storage systems where it is used to decide if a given item is in a (usually large) set, with some small false positive probability. Alternatively, given it is designed to be a viable replacement to Bloom filters, it can also be used to reduce the space required in probabilistic routing tables, speed longest-prefix matching for IP addresses, improve network state management and monitoring, and encode multicast forwarding information in packets, among many other applications. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

cuckoofilter

  •    C++

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

Pilosa - Distributed bitmap index that dramatically accelerates queries across multiple, massive data sets

  •    Go

Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets. Pilosa collapses the speed and batch layer by abstracting the index from data storage, optimizing it for massive scale, and making your data instantly queryable and continuously analyzable.

elasticsearch-reindex - Simple re-indexing

  •    Java

Note: This script will build and install the plugin assuming elasticsearch is found in /usr/share/elasticsearch. The script will call 'sudo' on the install part, so the script should be run as a user with sudo privileges. Since maven will be used to build the plugin, it requires maven to be installed, which can be installed with the command below on a debian/ubuntu system. This refeeds all documents in index 'indexold' with type 'typeold' into the index 'indexnew' with type 'typenew'. But only documents matching the specified filter will be refeeded. The internal Java API will be used which should be efficient. In this example, the term filter is used to limit the documents that will be reindexed, you can leave out the filter to copy all documents to the new index.


Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

BoomFilters - Probabilistic data structures for processing continuous, unbounded streams.

  •    Go

Boom Filters are probabilistic data structures for processing continuous, unbounded streams. This includes Stable Bloom Filters, Scalable Bloom Filters, Counting Bloom Filters, Inverse Bloom Filters, Cuckoo Filters, several variants of traditional Bloom filters, HyperLogLog, Count-Min Sketch, and MinHash.Classic Bloom filters generally require a priori knowledge of the data set in order to allocate an appropriately sized bit array. This works well for offline processing, but online processing typically involves unbounded data streams. With enough data, a traditional Bloom filter "fills up", after which it has a false-positive probability of 1.

cuckoo - a memory-bound graph-theoretic proof-of-work system

  •    C++

Cuckoo Cycle is the first graph-theoretic proof-of-work, and the most memory bound, yet with instant verification. Unlike Hashcash, Cuckoo Cycle is immune from quantum speedup by Grover's search algorithm. With a 42-line complete specification, Cuckoo Cycle is less than half the size of either SHA256, Blake2b, or SHA3 (Keccak) as used in Bitcoin, Equihash and ethash. Simplicity matters.

cuckoo - Cuckoo Sandbox is an automated dynamic malware analysis system

  •    Javascript

PLEASE NOTE: Cuckoo Sandbox 2.x is currently unmaintained. Any open issues or pull requests will most likely not be processed, as a current full rewrite of Cuckoo is undergoing and will be announced soon. Cuckoo Sandbox is the leading open source automated malware analysis system.

ElasticSearch - Distributed, RESTful search and analytics engine

  •    Java

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

thingscoop - Search and filter videos based on objects that appear in them using convolutional neural networks

  •    Python

Thingscoop is a command-line utility for analyzing videos semantically - that means searching, filtering, and describing videos based on objects, places, and other things that appear in them.When you first run thingscoop on a video file, it uses a convolutional neural network to create an "index" of what's contained in the every second of the input by repeatedly performing image classification on a frame-by-frame basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a supercut of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you to compose queries that test for the presence or absence of labels with the logical operators ! (not), || (or) and && (and). For example, to search a video the presence of the sky and the absence of the ocean: thingscoop search 'sky && !ocean' <file>.

elasticsearch-analysis-jieba - The plugin includes the `jieba` analyzer, `jieba` tokenizer, and `jieba` token filter, and have two mode you can choose

  •    Java

The plugin includes the `jieba` analyzer, `jieba` tokenizer, and `jieba` token filter, and have two mode you can choose. one is `index` which means it will be used when you want to index a document. another is `search` mode which used when you want to search something.

Cuckoo - Boilerplate-free mocking framework for Swift!

  •    Swift

Cuckoo was created due to lack of a proper Swift mocking framework. We built the DSL to be very similar to Mockito, so anyone using it in Java/Android can immediately pick it up and use it. Cuckoo has two parts. One is the runtime and the other one is an OS X command-line tool simply called CuckooGenerator.

cuckoo-droid - CuckooDroid - Automated Android Malware Analysis with Cuckoo Sandbox.

  •    Python

Contributed By Check Point Software Technologies LTD. CuckooDroid is an extension of Cuckoo Sandbox the Open Source software for automating analysis of suspicious files, CuckooDroid brigs to cuckoo the capabilities of execution and analysis of android application.

compass - Searchengine built on top of Lucene

  •    Java

Compass is a real time searchengine. It is built on top of lucene. It is transactional, distributed, supports Spring MVC, integrates with Hibernate.

ionic-ion-swipe-cards - Swipeable card based layout for Ionic and Angular

  •    Javascript

Include `ionic.swipecards.js` after the rest of your Ionic and Angular includes. Then use the following AngularJS directives:```html<swipe-cards> <swipe-card ng-repeat="card in cards" on-destroy="cardDestroyed($index)" on-card-swipe="cardSwiped($index)"> Card content here </swipe-card></swipe-cards>```To add new cards dynamically, just add them to the cards array:```javascript$scope.cards = [ { // card 1 }, { // card 2 }];$scope.cardDestroyed = function(index) { $scope.cards.splice(index

Solr/Lucene on Azure

  •    Lucene

This project hosts Solr/Lucene in Windows Azure using multi-instance replication for index-serving and single-instance for index generation with a persistent index mounted in Azure storage. Typical scenarios could be a commercial and publisher sites that need to scale the traffic with increasing query volume and need to index maximum 16 TB of data and require couple of index updates per day.

dex - Index and query analyzer for MongoDB: compares MongoDB log files and index entries to make index recommendations

  •    Python

Dex is a MongoDB performance tuning tool that compares queries to the available indexes in the queried collection(s) and generates index suggestions based on simple heuristics. Currently you must provide a connection URI for your database. Dex uses the URI you provide as a helpful way to determine when an index is recommended. Dex does not take existing indexes into account when actually constructing its ideal recommendation.

zombodb - Making Postgres and Elasticsearch work together like it's 2018

  •    C

ZomboDB brings powerful text-search and analytics features to Postgres by using Elasticsearch as an index type. Its comprehensive query language and SQL functions enable new and creative ways to query your relational data. From a technical perspective, ZomboDB is a 100% native Postgres extension that implements Postgres' Index Access Method API. As a native Postgres index type, ZomboDB allows you to CREATE INDEX ... USING zombodb on your existing Postgres tables. At that point, ZomboDB takes over and fully manages the remote Elasticsearch index and guarantees transactionally-correct text-search query results.

full text index for sqlite3

  •    C++

FT3's goal is to add full text index support to sqlite3 databases.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.