SeaweedFS - Simple and highly scalable distributed file system

SeaweedFS is a simple and highly scalable distributed file system. It has two objectives: to store billions of files, and to serve the files fast. Instead of supporting full POSIX file system semantics, SeaweedFS chooses to implement only a key-to-file mapping. Similar to the word "NoSQL", you can call it "NoFS".

Instead of managing all file metadata in a central master, the central master only manages file volumes and lets the volume servers manage files and their metadata. This relieves concurrency pressure on the central master and spreads file metadata across the volume servers, allowing faster file access (just one disk read operation).
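The split between master and volume servers can be sketched in Go as a two-step lookup. This is a minimal in-memory illustration, not SeaweedFS's actual code or API: the types `master`, `volumeServer`, and `needle`, the address string, and the assumption that each volume server keeps its file index in memory are all hypothetical, and a byte slice stands in for the on-disk volume file.

```go
package main

import (
	"errors"
	"fmt"
)

// needle records where one file lives inside a volume (hypothetical type).
type needle struct {
	offset int64
	size   int
}

// volumeServer owns the per-file metadata for the volumes it hosts.
type volumeServer struct {
	addr  string
	index map[uint32]map[uint64]needle // volume id -> file key -> location
	data  map[uint32][]byte            // volume id -> volume contents
}

// master only knows which server hosts which volume, never individual files.
type master struct {
	volumes map[uint32]*volumeServer
}

var errNotFound = errors.New("not found")

func newVolumeServer(addr string) *volumeServer {
	return &volumeServer{
		addr:  addr,
		index: make(map[uint32]map[uint64]needle),
		data:  make(map[uint32][]byte),
	}
}

// lookup resolves a volume id to the server that hosts it.
func (m *master) lookup(volumeID uint32) (*volumeServer, error) {
	vs, ok := m.volumes[volumeID]
	if !ok {
		return nil, errNotFound
	}
	return vs, nil
}

// write appends the content to the volume and records its location.
func (vs *volumeServer) write(volumeID uint32, key uint64, content []byte) {
	if vs.index[volumeID] == nil {
		vs.index[volumeID] = make(map[uint64]needle)
	}
	vs.index[volumeID][key] = needle{
		offset: int64(len(vs.data[volumeID])),
		size:   len(content),
	}
	vs.data[volumeID] = append(vs.data[volumeID], content...)
}

// read consults the index, then does one slice read that stands in for
// the single disk read.
func (vs *volumeServer) read(volumeID uint32, key uint64) ([]byte, error) {
	n, ok := vs.index[volumeID][key]
	if !ok {
		return nil, errNotFound
	}
	return vs.data[volumeID][n.offset : n.offset+int64(n.size)], nil
}

func main() {
	vs := newVolumeServer("127.0.0.1:8080") // hypothetical address
	m := &master{volumes: map[uint32]*volumeServer{3: vs}}
	vs.write(3, 42, []byte("hello"))

	srv, err := m.lookup(3) // step 1: ask the master for the volume's server
	if err != nil {
		panic(err)
	}
	content, err := srv.read(3, 42) // step 2: the volume server finds the file
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s served by %s\n", content, srv.addr)
}
```

The master's map grows with the number of volumes rather than the number of files, which is the property the paragraph above relies on.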

SeaweedFS started by implementing Facebook's Haystack design paper. SeaweedFS is currently growing, with more features on the way.



Related Projects

Jepsen - A framework for distributed systems verification, with fault injection

Jepsen is a Clojure library. A test is a Clojure program which uses the Jepsen library to set up a distributed system, run a bunch of operations against that system, and verify that the history of those operations makes sense. Jepsen has been used to verify everything from eventually-consistent commutative databases to linearizable coordination systems to distributed task schedulers. It can also generate graphs of performance and availability, helping you characterize how a system responds to different faults.


Yet-Another File System is a distributed filesystem implementation in C for my Distributed Systems class #columbia

replfs - A replicated filesystem implemented in C and C++ for CS244B - distributed systems

storr - :package: Object cacher for R

Simple object cacher for R. storr acts as a very simple key-value store (supporting get/set/del for arbitrary R objects as data). The actual storage can be transient or persistent, local or distributed without changing the interface. To allow for distributed access, data is returned by content rather than simply by key (with a key/content lookup step) so that if another process changes the data, storr will retrieve the current version. storr always goes back to the common storage (database, filesystem, whatever) for the current object to hash mapping, ensuring consistency when using multiple processes. However, when retrieving or writing the data given a hash we can often avoid accessing the underlying storage. This means that repeated lookups happen quickly while still being able to reflect changes elsewhere; time savings can be substantial where large objects are being stored.
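The key-to-hash and hash-to-data split described above can be sketched as follows. storr itself is an R package; this Go version is only an illustration of the idea, and the names `backend`, `store`, `hashOf`, and the `reads` counter are hypothetical. The sketch assumes SHA-256 as the content hash and uses in-memory maps in place of a real shared storage.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// backend stands in for the common storage shared by all processes.
type backend struct {
	keys    map[string]string // key -> content hash
	objects map[string][]byte // content hash -> data
	reads   int               // object fetches from shared storage
}

// store is one process's view: the shared backend plus a local cache
// addressed by content hash.
type store struct {
	shared *backend
	cache  map[string][]byte
}

func newBackend() *backend {
	return &backend{keys: map[string]string{}, objects: map[string][]byte{}}
}

func newStore(b *backend) *store {
	return &store{shared: b, cache: map[string][]byte{}}
}

func hashOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// set writes the object under its hash and points the key at that hash.
func (s *store) set(key string, data []byte) {
	h := hashOf(data)
	s.shared.objects[h] = data
	s.shared.keys[key] = h
	s.cache[h] = data
}

// get always asks the shared storage for the current hash, but fetches the
// object itself only when that hash is not cached locally.
func (s *store) get(key string) ([]byte, bool) {
	h, ok := s.shared.keys[key]
	if !ok {
		return nil, false
	}
	if data, hit := s.cache[h]; hit {
		return data, true
	}
	s.shared.reads++
	data := s.shared.objects[h]
	s.cache[h] = data
	return data, true
}

func main() {
	shared := newBackend()
	a, b := newStore(shared), newStore(shared)

	a.set("x", []byte("version 1"))
	v, _ := b.get("x") // fetched from shared storage
	fmt.Printf("%s (object fetches: %d)\n", v, shared.reads)
	v, _ = b.get("x") // same hash, served from b's cache
	fmt.Printf("%s (object fetches: %d)\n", v, shared.reads)

	a.set("x", []byte("version 2")) // another process changes the data
	v, _ = b.get("x")               // new hash, so b fetches again
	fmt.Printf("%s (object fetches: %d)\n", v, shared.reads)
}
```

Because the cache is keyed by hash rather than by key, a stale entry can never be returned for a key whose content has changed; it simply stops being referenced.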

namazu - :fish: 鯰: Programmable fuzzy scheduler for testing distributed systems

Namazu (formerly named Earthquake) is a programmable fuzzy scheduler for testing real implementations of distributed systems such as ZooKeeper. Namazu permutes Java function calls, Ethernet packets, filesystem events, and injected faults in various orders so as to find implementation-level bugs in the distributed system. Namazu can also control the non-determinism of thread interleaving (by calling sched_setattr(2) with randomized parameters), so it can also be used for testing standalone multi-threaded software.

storoom - a simple distributed storage

storoom is a simple distributed storage system that stores data as key-value pairs. It contains: client/clientlib; stornode, a node that stores data to the physical storage system; and storoom, a controller of stornode(s) or storoom(s). Developed in

dsync - A distributed sync package.

A distributed locking and syncing package for Go. dsync is a package for doing distributed locks over a network of n nodes. It is designed with simplicity in mind and hence offers limited scalability (n <= 16). Each node will be connected to all other nodes and lock requests from any node will be broadcast to all connected nodes. A node will succeed in getting the lock if n/2 + 1 nodes (whether or not including itself) respond positively. If the lock is acquired it can be held for as long as the client desires and needs to be released afterwards. This will cause the release to be broadcast to all nodes after which the lock becomes available again.
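The n/2 + 1 rule can be illustrated with a small in-process sketch. This is not dsync's API: the `node` type and the `quorum`, `acquire`, and `release` functions are hypothetical, the "broadcast" is a plain loop instead of network calls, and releasing partial grants after a failed attempt is an assumption of this sketch rather than something the description above states.

```go
package main

import "fmt"

// quorum is the number of positive responses needed out of n nodes.
func quorum(n int) int { return n/2 + 1 }

// node stands in for one participant's local lock state.
type node struct {
	up    bool   // whether the node is reachable
	owner string // current lock holder, "" when free
}

// tryLock grants the lock if the node is reachable and the lock is free.
func (nd *node) tryLock(who string) bool {
	if !nd.up || nd.owner != "" {
		return false
	}
	nd.owner = who
	return true
}

// unlock frees the lock only if who currently holds it on this node.
func (nd *node) unlock(who string) {
	if nd.owner == who {
		nd.owner = ""
	}
}

// acquire asks every node and succeeds once n/2 + 1 of them grant the lock.
// On failure it gives back the grants it did receive.
func acquire(nodes []*node, who string) bool {
	granted := 0
	for _, nd := range nodes {
		if nd.tryLock(who) {
			granted++
		}
	}
	if granted >= quorum(len(nodes)) {
		return true
	}
	release(nodes, who)
	return false
}

// release tells every node to drop who's lock.
func release(nodes []*node, who string) {
	for _, nd := range nodes {
		nd.unlock(who)
	}
}

func main() {
	nodes := make([]*node, 5)
	for i := range nodes {
		nodes[i] = &node{up: true}
	}
	nodes[0].up, nodes[1].up = false, false // two of five nodes unreachable

	fmt.Println("quorum for 5 nodes:", quorum(5))
	fmt.Println("a acquires:", acquire(nodes, "a"))
	fmt.Println("b acquires while a holds:", acquire(nodes, "b"))
	release(nodes, "a")
	fmt.Println("b acquires after release:", acquire(nodes, "b"))
}
```

Requiring a strict majority means two clients can never both collect enough grants at the same time, and the lock stays available as long as more than half of the nodes are reachable.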

glow - Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Storm, etc

Glow provides a library to easily compute in parallel threads or distributed across clusters of machines. It is written in pure Go. I am also working on a Go+LuaJIT system, which is more flexible and more performant.

HBase - Hadoop database

HBase supports Bigtable-scale tables: billions of rows by millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable that runs on top of HDFS (Hadoop Distributed Filesystem). It features compression and per-column in-memory operation. Data can be replicated between nodes. HBase is used at Facebook and Twitter.

DVFS - A Distributed Virtual FileSystem

chirp-tools - Additional system tools for the Chirp Distributed Filesystem

Trieste - Distributed filesystem written in Python


[Academic] Optimized chunk-based distributed filesystem with configurable replication

DFS - distributed filesystem

openafs - OpenAFS distributed filesystem

ccd - encrypted redundant distributed filesystem on top of couchdb

NACS - High secure encoded distributed filesystem

LegionFS - Yet another distributed filesystem

mingfs - mfs is a distributed filesystem written in Go.
