RoaringBitmap - A better compressed bitset in Java

  •        276

Roaring bitmaps are compressed bitmaps (also called bitsets) which tend to outperform conventional compressed bitmaps such as WAH or Concise.

http://roaringbitmap.org/
https://github.com/RoaringBitmap/RoaringBitmap
https://github.com/lemire/RoaringBitmap

Tags
Implementation
License
Platform

   




Related Projects

roaring - Roaring bitmaps in Go (golang)

  •    Go

This is a go port of the Roaring bitmap data structure.Roaring bitmaps are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Metamarkets' Druid, LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, Pilosa, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin.

tranquility - Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime

  •    Scala

Tranquility helps you send event streams to Druid, the raddest data store ever (http://druid.io/), in real-time. It handles partitioning, replication, service discovery, and schema rollover for you, seamlessly and without downtime. Tranquility is written in Scala, and bundles idiomatic Java and Scala APIs that work nicely with Finagle, Samza, Spark, Storm, and Trident. You only need to include the modules you are actually using.

Gimel - PayPal's Big Data Processing Framework

  •    Scala

Gimel provides unified Data API to access data from any storage like HDFS, GS, Alluxio, Hbase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle, MySQL, etc.

javaewah - A compressed alternative to the Java BitSet class

  •    Java

The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. The goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (implemented in Java as the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.

Druid IO - Real Time Exploratory Analytics on Large Datasets

  •    Java

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.


Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product which can be used for specific datastore queries or multiple datastores correlated queries. It supports relational databases, non-relational databases and even datastore which does not support SQL (such as Elasticsearch, Druid) . In addition, a SQL query can join or union data from multiple datastores in Quicksql. For example, you can perform unified SQL query on one situation that a part of data stored on Elasticsearch, but the other part of data stored on Hive. The most important is that QSQL is not dependent on any intermediate compute engine, users only need to focus on data and unified SQL grammar to finished statistics and analysis. An architecture diagram helps you access Quicksql more easily.

pydruid - A Python connector for Druid

  •    Python

pydruid exposes a simple API to create, execute, and analyze Druid queries. pydruid can parse query results into Pandas DataFrame objects for subsequent data analysis -- this offers a tight integration between Druid, the SciPy stack (for scientific computing) and scikit-learn (for machine learning). pydruid can export query results into TSV or JSON for further processing with your favorite tool, e.g., R, Julia, Matlab, Excel. It provides both synchronous and asynchronous clients. Additionally, pydruid implements the Python DB API 2.0, a SQLAlchemy dialect, and a provides a command line interface to interact with Druid.

bitset - Go package implementing bitsets

  •    Go

Package bitset implements bitsets, a mapping between non-negative integers and boolean values. It should be more efficient than map[uint] bool.It provides methods for setting, clearing, flipping, and testing individual integers.

druid - Metamarkets Druid Data Store

  •    Java

Metamarkets Druid Data Store

Druid, The Database Manager

  •    Java

Druid is a GUI tool for database build and management. Users can add/change/delete DB objects (tables, fields, etc). Druid generates for you: SQL scripts, docs in XHTML, PDF, DocBook, etc; code in C, C++ amp; Java Beans even for JDO and support Castor amp; OJB

Signoz - Open-source Observability platform and an alternative to DataDog, NewRelic

  •    Javascript

SigNoz is an opensource observability platform. SigNoz uses distributed tracing to gain visibility into your systems and powers data using Kafka (to handle high ingestion rate and backpressure) and Apache Druid (Apache Druid is a high performance real-time analytics database), both proven in the industry to handle scale.

dvi2bitmap

  •    C++

dvi2bitmap is a utility to convert TeX DVI files directly to bitmaps, without going through the complicated (and slow!) route of conversion via PostScript and PNM. The prime motivation for this is to prepare mathematical equations for inclusion in HTML files, but there is a broad range of uses beyond that. dvi2bitmap... * is written in portable C++, and the program acts as a wrapper round the libdvi2bitmap library (both static and shareable), which abstracts DVI and PK files and their cont

android-DisplayingBitmaps

  •    Java

Sample demonstrating how to load large bitmaps efficiently off the main UI thread, caching bitmaps (both in memory and on disk), managing bitmap memory and displaying bitmaps in UI elements such as ViewPager and ListView/GridView. This is a sample application for the Android Training class Displaying Bitmaps Efficiently.

PyLucene - Python extension for accessing Java Lucene

  •    Python

PyLucene is a Python extension for accessing Java Lucene. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Java Lucene, PyLucene is not a Lucene port but a Python wrapper around Java Lucene. PyLucene embeds a Java VM with Lucene into a Python process.

Ferret - The extensible information retrieval library for ruby.

  •    Ruby

Ferret is an information retrieval library in the same vein as Apache Lucene. Originally it was a full port of Lucene but it now uses it's own file format and indexing algorithm although it is still very similar in many ways to Lucene. Everything you can do in Lucene you should be able to do in Ferret.

golucene - Go (Golang) port of Apache Lucene

  •    Go

Is Lucy faster than Lucene? It's written in C, after all. That depends. As of this writing, Lucy launches faster than Lucene thanks to tighter integration with the system IO cache, but Lucene is faster in terms of raw indexing and search throughput once it gets going. These differences reflect the distinct priorities of the most active developers within the Lucy and Lucene communities more than anything else.

BitmapMerger - Play with bitmaps

  •    Java

Bitmap Merger is a simple project help you to merge two bitmaps without memory exceptions. The bitmaps are processed in background threads thereby taking the load away from UI thread. Along with merge, it also contains the image decoder for decoding images from resources/disk and are sampled to prevent OutOfMemoryError.

spark - .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

  •    CSharp

.NET for Apache Spark provides high performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

EWAHBoolArray - A compressed bitmap class in C++.

  •    C++

The class EWAHBoolArray is a compressed bitset data structure. It supports several word sizes by a template parameter (16-bit, 32-bit, 64-bit). You should expect the 64-bit word-size to provide better performance, but higher memory usage, while a 32-bit word-size might compress a bit better, at the expense of some performance.The library also provides a basic BoolArray class which can serve as a traditional bitmap.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.