
Gaffer - A large-scale entity and relation database supporting aggregation of properties

  •    Java

Gaffer is a graph database framework. It allows the storage of very large graphs containing rich properties on the nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet. It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.

incubator-hudi - Upserts And Incremental Processing on Big Data

  •    Java

Hoodie is an Apache Spark library that provides the ability to efficiently do incremental processing on datasets in HDFS.

hoodie - Spark Library for Hadoop Upserts And Incrementals

  •    Java

Hoodie is an Apache Spark library that provides the ability to efficiently do incremental processing on datasets in HDFS.

iceberg - Iceberg is a table format for large, slow-moving tabular data

  •    Java

Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de facto standard table layout built into Hive, Presto, and Spark. Iceberg is under active development at Netflix.

parquet-rs - Apache Parquet implementation in Rust

  •    Rust

See the crate documentation for the available API. To update the Parquet format to a newer version, check whether a newer parquet-format version is available, then update the version of the parquet-format crate in Cargo.toml.

hudi - Spark Library for Hadoop Upserts And Incrementals

  •    Java

Hoodie is an Apache Spark library that provides the ability to efficiently do incremental processing on datasets in HDFS.

eel-sdk - Big Data Toolkit for the JVM

  •    Scala

Eel is a toolkit for manipulating data in the Hadoop ecosystem. By Hadoop ecosystem we mean file formats common to the big-data world, such as Parquet, ORC, and CSV, in locations such as HDFS or Hive tables. In contrast to distributed batch or streaming engines such as Spark or Flink, Eel is an SDK intended to be used directly in process. Eel is a lower-level API than engines like Spark and is aimed at use cases where you want something like a file API. Here are some of our notes comparing Eel to other tools that offer similar functionality.

devops-python-tools - DevOps CLI Tools for Hadoop, Spark, HBase, Log Anonymizer, Ambari Blueprints, AWS CloudFormation, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Elasticsearch, Solr, Travis CI, Pig, IPython - Python / Jython Tools

  •    Python

A few of the Big Data, NoSQL & Linux tools I've written over the years. All programs have --help to list the available options. For many more tools, see the DevOps Perl Tools and Advanced Nagios Plugins Collection repos, which contain many Hadoop, NoSQL, web and infrastructure tools and Nagios plugins.