geotrellis-netcdf - Scala/Spark Project For Reading NetCDF


This repository contains an example project that demonstrates how to read NetCDF data from S3 or a local filesystem into a Spark/Scala program using NetCDF Java, and how to manipulate the data using GeoTrellis. The ability to read NetCDF data easily and efficiently into a GeoTrellis program lets those who are familiar with GeoTrellis and its surrounding tools branch into climate research, and lets climate researchers take advantage of the many benefits that GeoTrellis can provide. Because GeoTrellis is a raster-oriented library, the approach demonstrated in this repository is to use the NetCDF library to load and query datasets and present the results as Java arrays, which can be readily turned into GeoTrellis tiles. Once the data have been transformed into GeoTrellis tiles, they can be masked, summarized, and/or manipulated like any other GeoTrellis raster data. The results are shown below. The last section briefly discusses ideas for improving the S3 Reader.
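The array-to-tile step described above can be illustrated with a minimal, library-free sketch. It assumes a NetCDF read has already produced a flat, row-major array of floats; `SimpleTile`, `maskNoData`, `mean`, and the fill value `1.0e20f` are hypothetical names and values chosen for illustration, not the GeoTrellis or NetCDF Java API. In the actual project a GeoTrellis tile would play the role of `SimpleTile`.

```scala
// Hypothetical stand-in for a raster tile: a flat, row-major array plus its dimensions.
// This is NOT the GeoTrellis Tile API; it only illustrates the data layout.
final case class SimpleTile(cols: Int, rows: Int, data: Array[Float]) {
  require(data.length == cols * rows, "array length must equal cols * rows")

  // Cell at column `col`, row `row`, using row-major indexing.
  def get(col: Int, row: Int): Float = data(row * cols + col)

  // Replace every cell equal to `nodata` with NaN so summaries can skip it.
  def maskNoData(nodata: Float): SimpleTile =
    copy(data = data.map(v => if (v == nodata) Float.NaN else v))

  // Mean of all non-NaN cells, or None if every cell is masked.
  def mean: Option[Double] = {
    val valid = data.filterNot(_.isNaN)
    if (valid.isEmpty) None else Some(valid.map(_.toDouble).sum / valid.length)
  }
}

object SimpleTileExample {
  // Assumed input: a 2-row by 3-column slice flattened row by row,
  // with 1.0e20f as an assumed fill (nodata) value.
  val flat: Array[Float] = Array(1.0f, 2.0f, 1.0e20f, 4.0f, 5.0f, 6.0f)

  // Wrap the array as a tile and mask the fill value.
  val tile: SimpleTile = SimpleTile(cols = 3, rows = 2, data = flat).maskNoData(1.0e20f)
}
```

Keeping the data as one flat array with explicit `cols` and `rows` avoids copying when the values are handed from the reader to the raster layer; only the masking step allocates a new array.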



Related Projects

geotrellis - GeoTrellis is a geographic data processing engine for high performance applications.

  •    Scala

GeoTrellis is a Scala library and framework that uses Spark to work with raster data. It is released under the Apache 2 License. GeoTrellis reads, writes, and operates on raster data as fast as possible. It implements many Map Algebra operations as well as vector-to-raster and raster-to-vector operations.

NetCDF library for .NET


NetCDF (network Common Data Form) is a software library and a standard binary data format supported by Unidata that enables the creation, access, and network sharing of array-oriented scientific data. This project is dedicated to ...

NCO netCDF Operators

  •    C

The netCDF Operators, or NCO, are a suite of file operators which facilitate manipulation and analysis of self-describing data stored in the (freely available) netCDF format. Volunteers welcome! See homepage for details and download links.

netcdf4-python - netcdf4-python: python/numpy interface to the netCDF C library

  •    Python

Python/numpy interface to the netCDF C library. For details on the latest updates, see the Changelog.

SDS: Scientific DataSet library and tools


The SDS library makes it easy for .Net developers to read, write and share scalars, vectors, matrices and multidimensional grids which are very common in scientific modelling. It supports CSV, NetCDF and other file format

PostGIS - Spatial and Geographic objects for PostgreSQL

  •    C

PostGIS is a spatial database extender for PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL. PostGIS adds extra types (geometry, geography, raster and others) to the PostgreSQL database. It also adds functions, operators, and index enhancements that apply to these spatial types.

xarray - N-D labeled arrays and datasets in Python

  •    Python

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data at which pandas excels. Our approach adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.


  •    C

Repository of python interfaces to major scientific libraries, like HDF, netCDF and MPI. The interfaces are based on SWIG or PyRex, and should be portable to most Linux/Unix platforms.


  •    Fortran

EXODUS II is a model developed to store and retrieve transient data for finite element analyses. It is used for preprocessing, postprocessing, and code-to-code data transfer. ExodusII is based on netCDF. Includes the nemesis parallel extension.

Climate Diagnostics Suite (CDS)

  •    C

CDS plots and analyzes output from numerical climate models. It is capable of comparing climate model simulations and/or observational data. It consists mostly of a series of MathWorks Matlab language scripts and reads data in NetCDF format.


  •    C

CDO is a collection of command line Operators to manipulate and analyse Climate and forecast model Data. Supported data formats are GRIB, netCDF, SERVICE, EXTRA and IEG.


  •    C

cdfread is a program for people working with mass spectrometry datasets. cdfread implements the routines to read mass spectra and mass chromatograms from data files in netCDF ("Andi-MS") format. Centroid and profile data are supported.



MEXCDF is a MATLAB interface to netCDF files.


  •    Python

NCVTK: A VTK-based tool to visualize data stored in the NetCDF file format.


  •    C

Compares 2 NetCDF data files in-place to find specific variables, dimensions and/or attributes that differ. Highly recommended for regression testing models that generate massive datasets.


  •    C

ncclamp is a command-line tool for NetCDF files that allows you to replace values in-place for a variable given the old value to be replaced, the new value, and a comparison operator. The change is applied across all of the variable's dimensions.

snappydata - SnappyData - The Spark Database. Stream, Transact, Analyze, Predict in one cluster

  •    Scala

Apache Spark is a general purpose parallel computational engine for analytics at scale. At its core, it has a batch design center and is capable of working with disparate data sources. While this provides rich unified access to data, this can also be quite inefficient and expensive. Analytic processing requires massive data sets to be repeatedly copied and data to be reformatted to suit Spark. In many cases, it ultimately fails to deliver the promise of interactive analytic performance. For instance, each time an aggregation is run on a large Cassandra table, it necessitates streaming the entire table into Spark to do the aggregation. Caching within Spark is immutable and results in stale insight. At SnappyData, we take a very different approach. SnappyData fuses a low latency, highly available in-memory transactional database (GemFireXD) into Spark with shared memory management and optimizations. Data in the highly available in-memory store is laid out using the same columnar format as Spark (Tungsten). All query engine operators are significantly more optimized through better vectorization and code generation. The net effect is an order of magnitude performance improvement when compared to native Spark caching, and more than two orders of magnitude better Spark performance when working with external data sources.

spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository.

spark-cassandra-connector - DataStax Spark Cassandra Connector

  •    Scala

Lightning-fast cluster computing with Apache Spark™ and Apache Cassandra®. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
