spark - .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

  •    CSharp

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark for working with structured data, and Spark Structured Streaming for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

Mobius - C# and F# language binding and extensions to Apache Spark

  •    CSharp

Mobius provides a C# language binding to Apache Spark, enabling the implementation of Spark driver programs and data processing operations in the languages supported by the .NET framework, such as C# or F#. For more code samples, refer to the Mobius\examples directory or the Mobius\csharp\Samples directory.




Gimel - PayPal's Big Data Processing Framework

  •    Scala

Gimel provides a unified Data API to access data from any storage, such as HDFS, GS, Alluxio, HBase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle, MySQL, etc.

LearningSpark - Scala examples for learning to use Spark

  •    Scala

This project contains snippets of Scala code for illustrating various Apache Spark concepts. It is intended to help you get started with learning Apache Spark (as a Scala programmer) by providing a super easy on-ramp that doesn't involve Unix, cluster configuration, building from sources or installing Hadoop. Many of these activities will be necessary later in your learning experience, after you've used these examples to achieve basic familiarity. It is intended to accompany a number of posts on the blog A River of Bytes.


registry - Schema Registry

  •    Java

Registry is a versioned entity framework that allows you to build various registry services such as Schema Registry, ML Model Registry, etc.
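To make the idea of a versioned entity concrete, here is a minimal in-memory sketch. It is not Registry's API: the class `VersionedRegistry` and its methods `register` and `get` are hypothetical names chosen for illustration, and it assumes that versions are numbered from 1 and that re-registering an unchanged definition does not create a new version.

```python
class VersionedRegistry:
    """Toy in-memory registry: each name maps to an append-only list of versions.

    Hypothetical illustration only; this is not the API of the Registry project.
    """

    def __init__(self):
        self._entities = {}

    def register(self, name, definition):
        """Store a definition under a name and return its 1-based version number."""
        versions = self._entities.setdefault(name, [])
        # Assumption: an unchanged latest definition keeps its existing version.
        if versions and versions[-1] == definition:
            return len(versions)
        versions.append(definition)
        return len(versions)

    def get(self, name, version=None):
        """Return a specific version, or the latest one when version is None."""
        versions = self._entities[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]
```

A schema registry or a model registry could then differ only in what is stored as the definition, for example a schema document in one case and a model artifact reference in the other.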

streamline - StreamLine - Streaming Analytics

  •    Java

Develop and deploy Streaming Analytics applications visually, with bindings for streaming engines and multiple sources/sinks, a rich set of streaming operators, and operational lifecycle management. Streaming Analytics Manager makes it easy to develop and monitor streaming applications, and also provides analytics of the data that's being processed by a streaming application.

realtime-dashboard-example - This is a real-time dashboard example using Spark Streaming and Node.js

  •    Java

Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. At AppsFlyer, we use Spark for many of our offline processing services. Spark Streaming joined our technology stack a few months ago for real-time workflows, reading directly from Kafka to provide value to our clients in near-real-time.

fdp-modelserver - An umbrella project for multiple implementations of model serving

  •    Scala

kafkastreamserver - implementation of model scoring and queryable state using Kafka Streams. Also includes an implementation of a custom Kafka Streams store.

real-time-stream-processing-engine - This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch

  •    Scala

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch. Prerequisites for this project: Elasticsearch setup: i) Download Elasticsearch 5.0.0-alpha5 or the latest version and unzip it.

trapezium - Framework to build batch, streaming and api services to deploy machine learning models using Spark and Akka compute

  •    Scala

Trapezium is a Maven project. The following instructions will create the Trapezium jar for your repository. On all your Spark nodes, create a file /opt/bda/environment and add the environment for your cluster, e.g., DEV|QA|UAT|PROD. You can do this through a setup script so that any new node added to your cluster will have this file created automatically. This file allows Trapezium to read data from different data sources or data locations based on your environment.
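The environment-file mechanism described above can be sketched as follows. This is an illustration of the general pattern, not Trapezium's implementation: the functions `read_environment` and `data_location` and the paths in `DATA_LOCATIONS` are hypothetical, and the sketch assumes the file contains nothing but the environment name.

```python
from pathlib import Path

# Hypothetical mapping from environment name to a data location. These paths
# are illustrative only and are not taken from Trapezium.
DATA_LOCATIONS = {
    "DEV": "/data/dev/input",
    "QA": "/data/qa/input",
    "UAT": "/data/uat/input",
    "PROD": "/data/prod/input",
}


def read_environment(path="/opt/bda/environment"):
    """Return the environment name stored in the file, e.g. 'DEV'.

    Assumes the file holds only the name; surrounding whitespace and
    letter case are normalized before the lookup.
    """
    name = Path(path).read_text(encoding="utf-8").strip().upper()
    if name not in DATA_LOCATIONS:
        raise ValueError(f"unknown environment {name!r} in {path}")
    return name


def data_location(path="/opt/bda/environment"):
    """Pick the data location that matches this node's environment."""
    return DATA_LOCATIONS[read_environment(path)]
```

Because every node carries the same small file, the same application jar can be deployed unchanged to each cluster and still resolve the data location appropriate to that cluster.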