Gaffer is a graph database framework. It allows the storage of very large graphs containing rich properties on the nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet. It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.
accumulo graph graph-database hadoop big-data aggregation hbase parquet spark
Hoodie is an Apache Spark library that provides the ability to efficiently do incremental processing on datasets in HDFS.
hadoop spark parquet analytics-database ingestion hoodie hudi columnar storage
Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de facto standard table layout built into Hive, Presto, and Spark. Iceberg is under active development at Netflix.
spark hadoop parquet avro
See the crate documentation for the available API. To update the Parquet format to a newer version, check whether a newer parquet-format version is available, then update the version of the parquet-format crate in Cargo.toml.
parquet hadoop
Eel is a toolkit for manipulating data in the Hadoop ecosystem. By Hadoop ecosystem we mean file formats common to the big-data world, such as Parquet, ORC and CSV, in locations such as HDFS or Hive tables. In contrast to distributed batch or streaming engines such as Spark or Flink, Eel is an SDK intended to be used directly in process. Eel is a lower-level API than engines like Spark and is aimed at use cases where you want something like a file API. Here are some of our notes comparing Eel to other tools that offer similar functionality.
parquet orc hive kudu kafka big-data etl hadoop
A few of the Big Data, NoSQL & Linux tools I've written over the years. All programs have --help to list the available options. For many more tools, see the DevOps Perl Tools and Advanced Nagios Plugins Collection repos, which contain many Hadoop, NoSQL, Web and infrastructure tools and Nagios plugins.
ambari cloudformation hbase json avro parquet spark pyspark travis-ci pig elasticsearch solr xml hadoop hdfs dockerhub docker aws