spark - .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

  •    CSharp

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and SparkSQL aspects of Apache Spark for working with structured data, and Spark Structured Streaming for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

MCW-Big-data-and-visualization - MCW Big data and visualization

  •    JavaScript

Margie's Travel (MT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers. They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter the flight information and produce a prediction as to whether the departing flight will encounter a delay of 15 minutes or more, given the weather forecast for the departure hour.

sparklens - Qubole Sparklens tool for performance tuning Apache Spark

  •    Scala

Sparklens is a profiling tool for Spark with a built-in Spark Scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of Spark applications: it helps in understanding how efficiently a given Spark application is using the compute resources provided to it. Maybe your application will run faster with more executors, and maybe it won't; Sparklens can answer this question by looking at a single run of your application. It helps you narrow down to the few stages (or the driver, or skew, or a lack of tasks) that are limiting your application from scaling out, and provides contextual information about what could be going wrong with those stages. Above all, it helps you approach Spark application tuning as a well-defined method/process instead of something you learn by trial and error, saving both developer and compute time.
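
One common way to attach Sparklens is through Spark's extraListeners setting, so it can observe a single run and produce its report when the application finishes. A minimal Scala sketch, assuming the listener class name com.qubole.sparklens.QuboleJobListener from the Sparklens README and that the Sparklens jar is on the classpath:

    import org.apache.spark.sql.SparkSession

    // Register the Sparklens listener so it records scheduler events
    // during this run; the analysis is produced when the app finishes.
    // The listener class name is taken from the Sparklens README.
    val spark = SparkSession.builder()
      .appName("sparklens-profiled-app")
      .config("spark.extraListeners", "com.qubole.sparklens.QuboleJobListener")
      .getOrCreate()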

Spark-Example - Spark1

  •    Scala

Examples for Spark 1.6 and Spark 2.2, covering Kafka, Flume, Structured Streaming, Jedis, Elasticsearch, MySQL, and DataFrames.

spark-select - A library for Spark DataFrame using MinIO Select API

  •    Scala

MinIO Spark select enables retrieving only the required data from an object using the Select API. S3 Select is supported for CSV, JSON, and Parquet files, using the minioSelectCSV, minioSelectJSON, and minioSelectParquet values to specify the data format.
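
A minimal Scala sketch of reading through those format values, with a hypothetical bucket path and schema:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("spark-select-demo").getOrCreate()

    // Hypothetical schema and object path; "minioSelectCSV" routes the
    // read through the Select API so only the needed data is retrieved.
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", IntegerType)
    ))

    val people = spark.read
      .format("minioSelectCSV")
      .schema(schema)
      .load("s3://my-bucket/people.csv")

    people.filter(people("age") > 19).show()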

db2-event-store-akka-streams - Use Akka to implement a WebSockets endpoint and stream data to Db2 Event Store

  •    Jupyter

In this code pattern, we will build a Scala app that uses Akka to implement a WebSockets endpoint which streams data to a Db2 Event Store database. For our data, we'll use online retail order details in CSV format. We'll use Jupyter notebooks with Scala and Brunel to visualize the Event Store data. Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions in its documentation.
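
A minimal Scala sketch of the WebSockets side using Akka HTTP, with a stand-in sink where the Db2 Event Store insert would go (the endpoint path, port, and sink are hypothetical):

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.model.ws.{Message, TextMessage}
    import akka.http.scaladsl.server.Directives._
    import akka.stream.scaladsl.{Flow, Sink}

    object WebSocketIngest extends App {
      implicit val system: ActorSystem = ActorSystem("ingest")

      // Stand-in for the Db2 Event Store insert: each WebSocket text
      // frame is treated as one CSV order-detail row.
      val eventStoreSink = Sink.foreach[String](row => println(s"would insert: $row"))

      val ingestFlow: Flow[Message, Message, Any] =
        Flow[Message]
          .collect { case TextMessage.Strict(text) => text }
          .alsoTo(eventStoreSink)
          .map(_ => TextMessage("ack"))

      val route = path("ws") { handleWebSocketMessages(ingestFlow) }
      Http().newServerAt("localhost", 8080).bind(route)
    }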

Spark-Structured-Streaming-Examples - Spark Structured Streaming / Kafka / Cassandra / Elastic

  •    Scala

Because checkpointing enables us to process our data exactly once, we need to delete the checkpointing folders to re-run our examples. The source data, radio-station records stored in a Parquet file, is used to emulate a stream via the .option("maxFilesPerTrigger", 1) setting.
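
A minimal Scala sketch of that emulation technique, with a hypothetical schema and paths (streaming file sources need the schema up front):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("emulated-stream").getOrCreate()

    // Hypothetical schema for the radio-station records.
    val radioSchema = StructType(Seq(
      StructField("station", StringType),
      StructField("title", StringType)
    ))

    // maxFilesPerTrigger=1 feeds one file per micro-batch, turning a
    // static Parquet directory into an emulated stream.
    val radioEvents = spark.readStream
      .schema(radioSchema)
      .option("maxFilesPerTrigger", 1)
      .parquet("data/radio")

    radioEvents.writeStream
      .format("console")
      .option("checkpointLocation", "checkpoint/radio") // delete this folder to re-run
      .start()
      .awaitTermination()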

sparkplug - Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

  •    Scala

Spark package to "plug" holes in data using SQL-based rules. At Indix, we work with a lot of data, and our data pipelines run a wide variety of ML models against it. There are cases where we have to "plug" or override certain values or predictions in our data. This may be due to bugs or deficiencies in our current models, or simply to the inherent quality of the source/raw data.
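
A sketch of the underlying idea in plain Spark SQL rather than the sparkplug API itself (the rule condition, column, and override value are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("plug-demo").getOrCreate()
    import spark.implicits._

    val products = Seq(
      ("book-1", "book", -1.0),
      ("book-2", "book", 12.5)
    ).toDF("id", "category", "price")

    // A "plugging" rule: a SQL condition plus an override value.
    // Here, invalid book prices are replaced with a default.
    val plugged = products.withColumn(
      "price",
      when(expr("price <= 0 AND category = 'book'"), lit(9.99))
        .otherwise(col("price"))
    )

    plugged.show()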