node-parquet - NodeJS module to access apache parquet format files

  •        955

Parquet is a columnar storage format available to any project in the Hadoop ecosystem. This Node.js module provides native bindings to the Parquet functions from parquet-cpp. A pure JavaScript Parquet format driver (still in development) is also provided.
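To illustrate what "columnar" means here, the following is a minimal sketch that pivots row-oriented records into one array per field. It is not the node-parquet API and not Parquet's actual on-disk encoding (which adds row groups, pages, encodings, and compression). The function name `toColumns` and the sample fields are hypothetical, and the sketch assumes every row has the same fields.

```javascript
// Row-oriented records, as an application would typically hold them.
// The field names are made up for illustration.
const rows = [
  { id: 1, name: 'alpha', score: 0.5 },
  { id: 2, name: 'beta', score: 0.75 },
  { id: 3, name: 'gamma', score: 1.0 },
];

// Pivot an array of row objects into an object of column arrays.
// Assumption: all rows share the same set of fields; otherwise the
// column arrays would end up with different lengths.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) {
        columns[field] = [];
      }
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);
// All values of one field now sit next to each other, so a reader that
// only needs `score` can skip `id` and `name` entirely.
console.log(columns.score);
```

Storing values of the same field together is what lets columnar formats read only the columns a query needs and compress similar values well.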

https://github.com/skale-me/node-parquet

Dependencies:

hexdump-nodejs : ^0.1.0
minimist : ^1.2.0
nan : ^2.5.0
varint : ^5.0.0


Related Projects

incubator-parquet-mr - Mirror of Apache Parquet

  •    Java

Parquet is a very active project, and new features are being added quickly; below is the state as of June 2013.

Type-specific encoding: in trunk, expected release 1.0
Hive integration: in trunk (pull request 28, https://github.com/Parquet/parquet-mr/pull/28), expected release 1.0
Pig integration: in trunk, …

influxdb_iox - Pronounced (influxdb eye-ox), short for iron oxide

  •    Rust

Persistence is through Parquet files in object storage. It is a design goal to support integration with other big data systems through object storage and Parquet specifically. For more details on the motivation behind the project and some of our goals, read through the InfluxDB IOx announcement blog post. If you prefer a video that covers a little bit of InfluxDB history and high level goals for InfluxDB IOx you can watch Paul Dix's announcement talk from InfluxDays NA 2020. For more details on the motivation behind the selection of Apache Arrow, Flight and Parquet, read this.

parquet-go - Golang version of Read/Write parquet file

  •    Go

parquet-go is a pure-Go implementation for reading and writing Parquet files. See the examples in example/.

Gaffer - A large-scale entity and relation database supporting aggregation of properties

  •    Java

Gaffer is a graph database framework. It allows the storage of very large graphs containing rich properties on the nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet. It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.

parquet-format - Mirror of Apache Parquet

  •    Java

Parquet is a columnar storage format that supports nested data. This provides all generated metadata code.


parquet-mr - Mirror of Apache Parquet

  •    Java

Parquet is a columnar storage format that supports nested data. This provides the Java implementation.

incubator-hudi - Upserts And Incremental Processing on Big Data

  •    Java

Hoodie is an Apache Spark library that provides the ability to efficiently do incremental processing on datasets in HDFS.

Quilt - Data Engineering Infrastructure

  •    Python

With Quilt you can build, push, and install data packages. Data packages are versioned, reusable data structures that can be loaded into Python. Quilt is designed to support reproducible, auditable, and compliant workflows. Quilt consists of three source-level components: a data catalog, a data registry, and a data compiler.

spindle - Next-generation web analytics processing with Scala, Spark, and Parquet.

  •    Javascript

Spindle is Brandon Amos' 2014 summer internship project with Adobe Research and is not under active development. Analytics platforms such as Adobe Analytics are growing to process petabytes of data in real-time. Delivering responsive interfaces querying this amount of data is difficult, and there are many distributed data processing technologies such as Hadoop MapReduce, Apache Spark, Apache Drill, and Cloudera Impala to build low-latency query systems.

sparser - Sparser: Raw Filtering for Faster Analytics over Raw Data

  •    C

This code base implements Sparser, raw filtering for faster analytics over raw data. Sparser can parse JSON, Avro, and Parquet data up to 22x faster than the state of the art. For more details, check out our paper published at VLDB 2018. Then enter 1 at the Sparser> prompt.

docpad - Empower your website frontends with layouts, meta-data, pre-processors (markdown, jade, coffeescript, etc

  •    CoffeeScript

Hi! I'm DocPad, I streamline the web development process and help close the gap between experts and beginners. I've been used in production by big and small companies for over a year and a half now to create plenty of amazing and powerful web sites and applications quicker than ever before. What makes me different is instead of being a box to cram yourself into and hold you back, I'm a freeway to what you want to accomplish, just getting out of your way and allowing you to create stuff quicker than ever before without limits. Leave the redundant stuff up to me, so you can focus on the awesome stuff. Discover my features below, or skip ahead to the installation instructions to get started with a fully functional pre-made website in a few minutes from reading this.

Vespa - Yahoo's big data serving engine

  •    Java

Vespa is an engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time. Vespa is the serving platform for Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, and Flickr.

Kylin - Extreme OLAP Engine for Big Data

  •    Java

Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for 10+ billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

eland - Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

  •    Python

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API. Where possible the package uses existing Python APIs and data structures to make it easy to switch from numpy, pandas, and scikit-learn to their Elasticsearch-powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to access large datasets stored in Elasticsearch.

eta - Embedded JS template engine for Node, Deno, and the browser

  •    TypeScript

Eta is a lightweight and blazing fast embedded JS templating engine that works inside Node, Deno, and the browser. Created by the developers of Squirrelly, it's written in TypeScript and emphasizes phenomenal performance, configurability, and low bundle size. Simply put, Eta is super: super lightweight, super fast, super powerful, and super simple. Like with EJS, you don't have to worry about learning an entire new templating syntax. Just write JavaScript inside your templates.

email-templates - :mailbox: Create, preview, and send custom email templates for Node

  •    Javascript

Create, preview, and send custom email templates for Node.js. Highly configurable and supports automatic inline CSS, stylesheets, embedded images and fonts, and much more! Made for sending beautiful emails with Lad. NEW: v3.x is released (you'll need Node v6.4.0+); see breaking changes below. 2.x branch docs available if necessary.

node-msgpack - A space-efficient object serialization library for NodeJS

  •    Javascript

node-msgpack is an addon for NodeJS that provides an API for serializing and de-serializing JavaScript objects using the MessagePack library. The performance of this addon compared to the native JSON object isn't too bad, and the space required for serialized data is far less than JSON. node-msgpack is currently slower than the built-in JSON.stringify() and JSON.parse() methods. In recent versions of node.js, the JSON functions have been heavily optimized. node-msgpack is still more compact, and we are currently working on performance improvements. Testing shows that, over 500k iterations, msgpack.pack() is about 5x slower than JSON.stringify(), and msgpack.unpack() is about 3.5x slower than JSON.parse().
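A comparison like the one quoted above can be approximated with a small timing harness. The following is a minimal sketch, not the project's own benchmark: `timeIt`, `compareCodecs`, and `jsonCodec` are hypothetical names, and the only assumption about node-msgpack is that the module object exposes `pack()` and `unpack()` as named in the text. To keep the sketch runnable without the addon, the demo compares the built-in JSON functions against themselves.

```javascript
// Run fn(arg) `iterations` times and return the elapsed milliseconds.
function timeIt(fn, arg, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    fn(arg);
  }
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6;
}

// Compare two codecs of the shape { pack, unpack }.
// Returns how many times slower `candidate` is than `baseline`
// (a ratio above 1 means the candidate is slower).
function compareCodecs(baseline, candidate, value, iterations) {
  const basePacked = baseline.pack(value);
  const candPacked = candidate.pack(value);

  const basePack = timeIt((v) => baseline.pack(v), value, iterations);
  const candPack = timeIt((v) => candidate.pack(v), value, iterations);
  const baseUnpack = timeIt((p) => baseline.unpack(p), basePacked, iterations);
  const candUnpack = timeIt((p) => candidate.unpack(p), candPacked, iterations);

  return {
    packRatio: candPack / basePack,
    unpackRatio: candUnpack / baseUnpack,
  };
}

// Baseline codec built from the JSON functions that ship with Node.
const jsonCodec = { pack: JSON.stringify, unpack: JSON.parse };

// Demo: JSON against itself, so both ratios should hover around 1.
// To measure node-msgpack, pass its module object as the second argument
// (assumption: it provides pack() and unpack()).
const sample = { id: 42, tags: ['a', 'b', 'c'], nested: { ok: true } };
const demo = compareCodecs(jsonCodec, jsonCodec, sample, 10000);
console.log(demo);
```

Timings from a loop like this vary with the payload shape, the Node version, and JIT warm-up, so the ratios are indicative rather than exact.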

essential-image-optimization - Essential Image Optimization - an eBook

  •    CSS

Bring up a terminal and type node --version. Node should respond with a version at or above 0.10.x. If you require Node, go to nodejs.org and click on the big green Install button.





