gleam - Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly

  •        30

Gleam is a high-performance, efficient distributed execution system that is also simple, generic, flexible, and easy to customize. Gleam is built in Go, and user-defined computations can be written in Go, Unix pipe tools, or any streaming program.



Related Projects

glow - Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Storm, etc

  •    Go

Glow provides a library for easily computing in parallel threads or distributing work to clusters of machines. It is written in pure Go. I am also working on a Go+Luajit system, which is more flexible and more performant.

disco - a Map/Reduce framework for distributed computing

  •    Erlang

Disco is a distributed map-reduce and big-data framework. Like the original framework, which was publicized by Google, Disco supports parallel computations over large data sets on an unreliable cluster of computers. This makes it a good fit for analyzing and processing large datasets without having to deal with difficult technical questions related to distributed computing, such as communication protocols, load balancing, locking, job scheduling, or fault tolerance, all of which are taken care of by Disco. Note: To install Disco, you cannot use the zip or tar.gz packages generated by GitHub; clone this repository instead.

Hadoop Common

  •    Java

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common supports the other Hadoop subprojects.

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.


  •    CSS

Source repo for the book that my students and I are writing in my course at Northeastern University, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing, on the topic of programming models for distributed systems. This is a book about the programming constructs we use to build distributed systems. These range from the small (RPC, futures, actors) to the large (systems built from these components, such as MapReduce and Spark). We explore issues and concerns central to distributed systems, such as consistency, availability, and fault tolerance, through the lens of the programming models and frameworks that programmers use to build these systems.

Jepsen - A framework for distributed systems verification, with fault injection

  •    Clojure

Jepsen is a Clojure library. A test is a Clojure program which uses the Jepsen library to set up a distributed system, run a bunch of operations against that system, and verify that the history of those operations makes sense. Jepsen has been used to verify everything from eventually-consistent commutative databases to linearizable coordination systems to distributed task schedulers. It can also generate graphs of performance and availability, helping you characterize how a system responds to different faults.

Atomix - Scalable, fault-tolerant distributed systems protocols and primitives for the JVM

  •    Java

Atomix is an event-driven framework for coordinating fault-tolerant distributed systems built on the Raft consensus algorithm. It provides the building blocks that solve many common distributed systems problems including group membership, leader election, distributed concurrency control, partitioning, and replication.

OrangeFS - Scale-out Network File System

  •    C

OrangeFS is a scale-out network file system designed for use on high-end computing (HEC) systems that provides very high-performance access to multi-server-based disk storage, in parallel. The OrangeFS server and client are user-level code, making them very easy to install and manage. OrangeFS has optimized MPI-IO support for parallel and distributed applications, and it is leveraged in production installations and used as a research platform for distributed and parallel storage.

service-fabric - Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale

  •    C++

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale. Service Fabric runs on Windows and Linux, on any cloud, any datacenter, across geographic regions, or on your laptop. Learn about Service Fabric's Core Subsystems, mapped to this repo's folder structure.

barefoot - Java library for integrating the map into software and services with state-of-the-art online and offline map matching that can be used stand-alone and in the cloud

  •    Java

An open source Java library for online and offline map matching with OpenStreetMap. Together with its extensive set of geometric and spatial functions, an in-memory map data structure and basic machine learning functions, it is a versatile basis for scalable location-based services and spatio-temporal data analysis on the map. It is designed for use in parallel and distributed systems and, hence, includes a stand-alone map matching server and can be used in distributed systems for map matching services in the cloud. Barefoot consists of a software library and a (Docker-based) map server that provides access to street map data from OpenStreetMap and is flexible to be used in distributed cloud infrastructures as map data server or side-by-side with Barefoot's stand-alone servers for offline (matcher server) and online map matching (tracker server), or other applications built with Barefoot library. Access to map data is provided with a fast and flexible in-memory map data structure. Together with GeographicLib [1] and ESRI's geometry API [2], it provides an extensive set of geographic and geometric operations for spatial data analysis on the map.

Ganglia - scalable distributed monitoring system

  •    C

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

distsys-class - Class materials for a distributed systems lecture series


This outline accompanies a 12-16 hour overview class on distributed systems fundamentals. The course aims to introduce software engineers to the practical basics of distributed systems, through lecture and discussion. Participants will gain an intuitive understanding of key distributed systems terms, survey the algorithmic landscape, and explore production concerns. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

orleans - Orleans - Distributed Virtual Actor Model

  •    CSharp

Orleans is a framework that provides a straightforward approach to building distributed high-scale computing applications, without the need to learn and apply complex concurrency or other scaling patterns. It was created by Microsoft Research, implements the Virtual Actor Model, and is designed for use in the cloud.

rDSN - Robust Distributed System Nucleus (rDSN) is an open framework for quickly building and managing high performance and robust distributed systems

  •    C++

Robust Distributed System Nucleus (rDSN) is a framework for quickly building robust distributed systems. It has a microkernel for pluggable components, including applications, distributed frameworks, devops tools, and local runtime/resource providers, enabling their independent development and seamless integration. The project was originally developed for Microsoft Bing, and has since been adopted in production both inside and outside Microsoft. The core of rDSN is a service kernel with which we can develop (via the Service API and Tool API) and plug in many different application, framework, tool, and local runtime modules, so that they can seamlessly benefit each other. Here is an incomplete list of the pluggable modules.

micro - A microservice toolkit for distributed systems development

  •    Go

Micro is a microservice toolkit. Its purpose is to simplify distributed systems development. Check out go-micro if you want to start writing services in Go now, or ja-micro for Java. Examples of how to use micro with other languages can be found in examples/sidecar.

LuMongo - Real-Time Distributed Search

  •    Java

LuMongo is a real-time distributed search and storage system based on Lucene. LuMongo is designed from the ground up to scale both vertically and horizontally across servers. LuMongo stores Lucene indexes directly in MongoDB. Documents can be stored natively in MongoDB. When stored natively, documents can be queried as normal out of MongoDB, and Map-Reduce and the Aggregation Framework can be used.

harvey - A distributed operating system

  •    C

Welcome to Harvey, we are delighted that you are interested in the project. Harvey is a distributed operating system. It’s most directly descended from Plan 9 from Bell Labs. This heritage spans from its distributed application architecture all the way down to much of its code. However, Harvey aims to be a more practical, general-purpose operating system, so it also uses ideas and code from other systems.