We have collection of more than 1 Million open source products ranging from Enterprise product to
small libraries in all platforms. We aggregate information from all open source repositories.
Search and find the best for your needs. Check out projects section.
.NET for Apache Spark provides high performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.
Margie's Travel (MT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves, and provide added value to their corporate customers. They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter in the flight information and produce a prediction as to whether the departing flight will encounter a 15-minute or longer delay, considering the weather forecasted for the departure hour.
Sparklens is a profiling tool for Spark with built-in Spark Scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of spark applications. It helps in understanding how efficiently is a given spark application using the compute resources provided to it. May be your application will run faster with more executors and may be it wont. Sparklens can answer this question by looking at a single run of your application. It helps you narrow down to few stages (or driver, or skew or lack of tasks) which are limiting your application from scaling out and provides contextual information about what could be going wrong with these stages. Primarily it helps you approach spark application tuning as a well defined method/process instead of something you learn by trial and error, saving both developer and compute time.
MinIO Spark select enables retrieving only required data from an object using Select API. S3 Select is supported with CSV, JSON and Parquet files using minioSelectCSV, minioSelectJSON and minioSelectParquet values to specify the data format.
In this code pattern, we will build a Scala app that uses Akka to implement a WebSockets endpoint which streams data to a Db2 Event Store database. For our data, we'll use online retail order details in CSV format. We'll use Jupyter notebooks with Scala and Brunel to visualize the Event Store data. Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions here.
As checkpointing enables us to process our data exactly once, we need to delete the checkpointing folders to re run our examples. Coming from radio stations stored inside a parquet file, the stream is emulated with .option("maxFilesPerTrigger", 1) option.
Spark package to "plug" holes in data using SQL based rules. At Indix, we work with a lot of data. Our data pipelines run a wide variety of ML models against our data. There are cases where we have to "plug" or override certain values or predictions in our data. This maybe due to bugs or deficiencies in our current models or just the inherent quality in the source/raw data.