
pachyderm - Reproducible Data Science at Scale!

  •    Go

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.

NakedTensor - Bare bone examples of machine learning in TensorFlow

  •    Python

This is a bare bones example of TensorFlow, a machine learning package published by Google. You will not find a simpler introduction to it. In each example, a straight line is fit to some data. Values for the slope and y-intercept of the line that best fit the data are determined using gradient descent. If you do not know about gradient descent, check out the Wikipedia page.
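The fitting procedure described above can be sketched without TensorFlow at all. The following is a minimal illustration in plain Python, not code from the NakedTensor repository; the function name `fit_line`, the learning rate, the step count, and the sample data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = m*x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of mean((m*x + b - y)^2) with respect to m and b.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Sample data lying exactly on y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

With noise-free data like this, the recovered slope and y-intercept should approach the values used to generate the points.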




spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.

Stream-Framework - Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis

  •    Python

Stream Framework is a Python library which allows you to build activity streams & newsfeeds using Cassandra and/or Redis. If you're not using Python, have a look at Stream, which supports Node, Ruby, PHP, Python, Go, Scala, Java and REST. Stream Framework's authors also offer a web service for building scalable newsfeeds & activity streams at Stream. It allows you to create your feeds by talking to a beautiful and easy-to-use REST API. There are clients available for Node, Ruby, PHP, Python, Go, Scala and Java. The Get Started page explains the API & concept in a few clicks. It's a lot easier to use, is free up to 3 million feed updates, and saves you the hassle of maintaining Cassandra, Redis, Faye, RabbitMQ and Celery workers.

Hue - The open source Apache Hadoop UI

  •    Java

Hue is a web application for interacting with Apache Hadoop. It supports a FileBrowser for accessing HDFS, a JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), a Job Designer for creating MapReduce/Streaming/Java jobs, an HBase Browser for exploring and modifying HBase tables and data, an Oozie App for submitting and scheduling workflows and bundles, a Pig/HBase/Sqoop2 shell, a Beeswax application for executing Hive queries, and a Search app for querying Solr and Solr Cloud.

hpat - A compiler-based big data framework in Python

  •    Python

High Performance Analytics Toolkit (HPAT) scales analytics/ML codes in Python to bare-metal cluster/cloud performance automatically. It compiles a subset of Python (Pandas/Numpy) to efficient parallel binaries with MPI, requiring only minimal code changes. HPAT is orders of magnitude faster than alternatives like Apache Spark. HPAT's documentation can be found here.


Big Data Twitter Demo

This demo analyzes tweets in real-time, even including a dashboard. The tweets are also archived in Azure DB/Blob and Hadoop where Excel can be used for BI!

conjure-up - Deploying complex solutions, magically.

  •    Python

Installing big software like whoa. This is the runtime application for processing spells to get those big software solutions up and going with as little hindrance as possible.

usql - U-SQL Examples and Issue Tracking

  •    CSharp

U-SQL is a new language from Microsoft for processing big data. U-SQL combines the familiar syntax of SQL with the expressiveness of custom code written in C#, on top of a scale-out runtime that can handle any size data.

clusterdock - clusterdock is a framework for creating Docker-based container clusters

  •    Python

clusterdock is a Python 3 project that enables users to build, start, and manage Docker container-based clusters. It uses a pluggable system for defining new types of clusters using folders called topologies and is a swell project, if I may say so myself.

acousticbrainz-server - The server components for the AcousticBrainz project

  •    Python

The server components for the AcousticBrainz project. Full installation instructions are available in the INSTALL.md file. After installing, continue with the following steps.

listenbrainz-server - Server for the ListenBrainz project

  •    Python

The ListenBrainz project is similar to the original AudioScrobbler®. Unlike the original project, ListenBrainz is open source and publishes its data as open data. A team of former Last.fm and current MusicBrainz hackers created the first version of ListenBrainz in a weekend. Since the original project was created, technology has advanced at an incredibly rapid pace, which made re-creating the original project fairly straightforward.

dvid - Distributed, Versioned, Image-oriented Dataservice

  •    Go

Status: In production use at Janelia. See wiki page for outside lab use of DVID. See the DVID Wiki for more information including installation and examples of use.

hazelcast-csharp-client - Hazelcast IMDG .NET Client

  •    CSharp

C# client implementation for Hazelcast, the open source in-memory data grid. A comparison of features supported by various clients can be found here. The Hazelcast .NET Client supports .NET Framework 4.0+ and .NET Core 2.0+.

hazelcast-go-client - Hazelcast IMDG Go Client

  •    Go

Go client implementation for Hazelcast, the open source in-memory data grid. The Go client is implemented using the Hazelcast Open Binary Client Protocol.

hazelcast-python-client - Hazelcast IMDG Python Client

  •    Python

Python client implementation for Hazelcast, the open source in-memory data grid. Please take a look at our Getting Started guide.

aws-etl-orchestrator - A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda

  •    Python

Extract, transform, and load (ETL) operations collectively form the backbone of any modern enterprise data lake. They transform raw data into useful datasets and, ultimately, into actionable insight. An ETL job typically reads data from one or more data sources, applies various transformations to the data, and then writes the results to a target where data is ready for consumption. The sources and targets of an ETL job could be relational databases in Amazon Relational Database Service (Amazon RDS) or on-premises, a data warehouse such as Amazon Redshift, or object storage such as Amazon Simple Storage Service (Amazon S3) buckets. Amazon S3 as a target is especially commonplace in the context of building a data lake in AWS. AWS offers AWS Glue, a fully managed extract, transform, and load service that helps customers author and deploy ETL jobs and makes it easy to prepare and load their data for analytics. Other AWS services can also be used to implement and manage ETL jobs. They include AWS Database Migration Service (AWS DMS), Amazon EMR (using the Steps API), and even Amazon Athena.
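The read, transform, write shape of an ETL job described above can be illustrated with a small in-memory sketch. This is not code from aws-etl-orchestrator and it calls no AWS service; the function names `extract`, `transform`, and `load`, the CSV columns, and the list standing in for a target are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw source: a CSV export with one incomplete row.
RAW_CSV = "order_id,amount\n1,10.50\n2,\n3,4.25\n"


def extract(raw_csv):
    """Read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Drop rows with a missing amount and convert fields to typed values."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({"order_id": int(row["order_id"]),
                        "amount": float(row["amount"])})
    return cleaned


def load(rows, target):
    """Write the results to the target (here, a plain list)."""
    target.extend(rows)


warehouse = []  # stands in for a real target such as a table or bucket
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse)
```

In a real deployment each stage would be a separate job against real sources and targets; the sketch only shows how the three stages hand data to one another.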

k8s-ingress-claim - An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains

  •    Go

k8s-ingress-claim provides an admission control policy that safeguards against accidental duplicate claiming of hosts/domains that have already been claimed by existing ingresses. This is implemented as an External Admission Webhook with the k8s-ingress-claim service running as a deployment on each cluster.
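The core check behind such a policy can be sketched as a lookup of already-claimed hosts. The following is only an illustration of the idea, not the project's implementation and not the Kubernetes webhook API; the dictionary shapes, the function names `find_conflicts` and `admit`, and the rule that an ingress never conflicts with itself are assumptions.

```python
def find_conflicts(new_ingress, existing_ingresses):
    """Return {host: owner} for hosts already claimed by a different ingress.

    Ingresses are modelled as hypothetical dicts: {"name": str, "hosts": [str]}.
    """
    claimed = {}
    for ingress in existing_ingresses:
        for host in ingress["hosts"]:
            # The first ingress seen for a host is treated as its owner.
            claimed.setdefault(host, ingress["name"])
    return {
        host: claimed[host]
        for host in new_ingress["hosts"]
        # Assumption: updating an ingress that already owns the host is allowed.
        if host in claimed and claimed[host] != new_ingress["name"]
    }


def admit(new_ingress, existing_ingresses):
    """Return (allowed, conflicts) for an incoming ingress."""
    conflicts = find_conflicts(new_ingress, existing_ingresses)
    return (not conflicts, conflicts)


existing = [
    {"name": "shop", "hosts": ["shop.example.com"]},
    {"name": "blog", "hosts": ["blog.example.com"]},
]
rejected = admit({"name": "new-shop",
                  "hosts": ["shop.example.com", "new.example.com"]}, existing)
accepted = admit({"name": "docs", "hosts": ["docs.example.com"]}, existing)
print(rejected, accepted)
```

A real webhook would also need to account for namespaces and return a structured admission response; both are left out here to keep the check itself visible.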