
spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. The library is published to the spark-packages repository: https://spark-packages.org/package/JohnSnowLabs/spark-nlp.
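
A minimal sketch of using Spark NLP annotators as ordinary Spark ML pipeline stages through the Python API; the example text and column names are illustrative, and it assumes a recent spark-nlp release where sparknlp.start() bootstraps the session:

    import sparknlp
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer
    from pyspark.ml import Pipeline

    # sparknlp.start() creates (or reuses) a SparkSession with the Spark NLP jars attached.
    spark = sparknlp.start()

    data = spark.createDataFrame(
        [("Spark NLP annotates text inside ordinary Spark ML pipelines.",)], ["text"]
    )

    # DocumentAssembler turns raw text into Spark NLP's annotation format;
    # downstream annotators consume and produce annotation columns.
    document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

    # Annotators are plain Spark ML stages, so they compose with Pipeline.
    pipeline = Pipeline(stages=[document_assembler, tokenizer])
    pipeline.fit(data).transform(data).select("token.result").show(truncate=False)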

fdp-modelserver - An umbrella project for multiple implementations of model serving

  •    Scala

- kafkastreamserver - implementation of model scoring and queryable state using Kafka Streams. It also includes an implementation of a custom Kafka Streams store.

rasterframes - Geospatial Raster support for Spark DataFrames

  •    Scala

RasterFrames™ brings the power of Spark DataFrames to geospatial raster data, empowered by the map algebra and tile layer operations of GeoTrellis. Please see the Getting Started section of the Users' Manual to start using RasterFrames.
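
A minimal sketch of reading raster data into a DataFrame and applying a tile-level operation through the Python bindings (pyrasterframes); the module and function names here follow a later pyrasterframes release and the GeoTIFF URI is hypothetical, so treat this as an assumption-laden illustration rather than the manual's exact API:

    from pyrasterframes.utils import create_rf_spark_session
    from pyrasterframes.rasterfunctions import rf_tile_mean

    # Build a SparkSession with the RasterFrames extensions registered.
    spark = create_rf_spark_session()

    # Read a (cloud-optimized) GeoTIFF as a DataFrame of raster tiles.
    uri = "https://example.com/scene-band-4.tif"  # hypothetical scene
    rf = spark.read.raster(uri)

    # GeoTrellis-backed tile operations are exposed as DataFrame functions.
    rf.select(rf_tile_mean(rf.proj_raster)).show()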




sparklens - Qubole Sparklens tool for performance tuning Apache Spark

  •    Scala

Sparklens is a profiling tool for Spark with a built-in Spark scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of Spark applications: it helps you understand how efficiently a given Spark application uses the compute resources provided to it. Maybe your application will run faster with more executors, and maybe it won't; Sparklens can answer this question by looking at a single run of your application. It helps you narrow down to the few stages (or the driver, skew, or a lack of tasks) that are limiting your application from scaling out, and provides contextual information about what could be going wrong with those stages. Primarily, it helps you approach Spark application tuning as a well-defined method/process instead of something you learn by trial and error, saving both developer and compute time.
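
Sparklens is attached to a normal run as a Spark listener; a minimal PySpark sketch, assuming the Sparklens jar is already on the classpath (for example via --packages with the Qubole coordinates, whose exact version string may vary):

    from pyspark.sql import SparkSession

    # Registering the Sparklens listener is enough; its scalability report is
    # printed after the application finishes.
    spark = (
        SparkSession.builder
        .appName("sparklens-profiled-app")
        .config("spark.extraListeners", "com.qubole.sparklens.QuboleJobListener")
        .getOrCreate()
    )

    # Run the application as usual; Sparklens observes the scheduler events.
    spark.range(0, 10_000_000).selectExpr("sum(id)").show()
    spark.stop()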

sparkflow - Easy to use library to bring TensorFlow to Apache Spark

  •    Python

This is an implementation of TensorFlow on Spark. The goal of the library is to provide a simple, understandable interface for using TensorFlow on Spark. With SparkFlow, you can easily integrate your deep learning model with a Spark ML Pipeline. Underneath, SparkFlow uses a parameter server to train the TensorFlow network in a distributed manner. Through the API, the user can specify the style of training, whether that is Hogwild or asynchronous with locking. While there are other libraries that use TensorFlow on Apache Spark, SparkFlow's objective is to work seamlessly with ML Pipelines, provide a simple interface for training TensorFlow graphs, and give basic abstractions for faster development. For training, SparkFlow uses a parameter server which lives on the driver and allows for asynchronous training. This provides faster training time when working with big data.
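
A minimal sketch in the spirit of the SparkFlow README: a small TensorFlow (1.x-style) graph is serialized with build_graph and wrapped in the SparkAsyncDL estimator so it slots into a Spark ML Pipeline. The module paths, parameter names, and the mnist_train.csv input are recalled from that README and should be treated as assumptions that may differ across versions:

    import tensorflow as tf
    from pyspark.ml.feature import OneHotEncoder, VectorAssembler
    from pyspark.ml.pipeline import Pipeline
    from pyspark.sql import SparkSession

    from sparkflow.graph_utils import build_graph
    from sparkflow.tensorflow_async import SparkAsyncDL


    def small_model():
        # A tiny TF1-style network; SparkFlow trains it through its parameter server.
        x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
        y = tf.placeholder(tf.float32, shape=[None, 10], name='y')
        hidden = tf.layers.dense(x, 256, activation=tf.nn.relu)
        out = tf.layers.dense(hidden, 10)
        tf.argmax(out, 1, name='out')
        return tf.losses.softmax_cross_entropy(y, out)


    spark = SparkSession.builder.appName("sparkflow-example").getOrCreate()
    df = spark.read.option("inferSchema", "true").csv("mnist_train.csv")  # label + 784 pixels

    # Assemble pixel columns into a feature vector and one-hot encode the label.
    va = VectorAssembler(inputCols=df.columns[1:785], outputCol='features')
    encoded = OneHotEncoder(inputCol='_c0', outputCol='labels', dropLast=False)

    spark_model = SparkAsyncDL(
        inputCol='features',
        tensorflowGraph=build_graph(small_model),
        tfInput='x:0',
        tfLabel='y:0',
        tfOutput='out:0',
        tfLearningRate=0.001,
        iters=20,
        predictionCol='predicted',
        labelCol='labels'
    )

    # SparkAsyncDL is a regular Spark ML stage, so it composes with other stages.
    fitted = Pipeline(stages=[va, encoded, spark_model]).fit(df)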





