Gleam is a high-performance, efficient distributed execution system that is also simple, generic, flexible, and easy to customize. Gleam is built in Go, and user-defined computations can be written in Go, with Unix pipe tools, or as any streaming program.
distributed-computing map-reduce distributed-systems distributed-system distributed

HPCC is a proven, battle-tested platform for manipulating, transforming, querying, and warehousing Big Data. It supports two types of configuration. Thor is responsible for consuming vast amounts of data and for transforming, linking, and indexing that data; it functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.
hadoop-alternative distributed-file-system map-reduce machine-learning

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common supports the other Hadoop subprojects.
cluster map-reduce distributed-file-system

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing the advantages of Scala to your MapReduce jobs.
hadoop map-reduce cascading

A search engine which can hold 100 trillion lines of log data.
poseidon search-engine big-data map-reduce

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and it supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
cluster cluster-computing data-analytics analytics hdfs map-reduce big-data

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc (Dataproc), which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.
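mrjob writes Hadoop Streaming jobs in Python, but the Streaming contract it targets is language-agnostic. The mapper reads input lines on stdin and writes tab-separated `key\tvalue` lines to stdout. Hadoop then sorts those lines by key before the reducer reads them on stdin. The following is a minimal Go sketch of a word-count mapper and reducer that follow that convention. It is not mrjob's API. The `mapper`, `reducer`, and `runLocal` names, and the `map`/`reduce` command-line modes, are hypothetical. `runLocal` imitates the shuffle by sorting in memory.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// mapper reads raw text lines and emits one "word\t1" line per word,
// following the Hadoop Streaming convention of tab-separated key/value pairs.
func mapper(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		for _, word := range strings.Fields(strings.ToLower(sc.Text())) {
			fmt.Fprintf(w, "%s\t1\n", word)
		}
	}
	return sc.Err()
}

// reducer expects its input sorted by key (the framework does this between
// the map and reduce phases) and sums the counts of each run of equal keys.
func reducer(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	var current string
	total, started := 0, false
	flush := func() {
		if started {
			fmt.Fprintf(w, "%s\t%d\n", current, total)
		}
	}
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), "\t")
		if !ok {
			continue // skip malformed lines
		}
		var n int
		if _, err := fmt.Sscan(val, &n); err != nil {
			continue // skip non-numeric counts
		}
		if !started || key != current {
			flush()
			current, total, started = key, 0, true
		}
		total += n
	}
	flush()
	return sc.Err()
}

// runLocal simulates a single-node job: map, sort (standing in for the
// shuffle), then reduce.
func runLocal(input string) (string, error) {
	var mapped strings.Builder
	if err := mapper(strings.NewReader(input), &mapped); err != nil {
		return "", err
	}
	lines := strings.Split(strings.TrimSpace(mapped.String()), "\n")
	sort.Strings(lines)
	var out strings.Builder
	err := reducer(strings.NewReader(strings.Join(lines, "\n")), &out)
	return out.String(), err
}

func main() {
	var err error
	switch {
	case len(os.Args) > 1 && os.Args[1] == "map":
		err = mapper(os.Stdin, os.Stdout)
	case len(os.Args) > 1 && os.Args[1] == "reduce":
		err = reducer(os.Stdin, os.Stdout)
	default:
		var out string
		out, err = runLocal("the cat\nthe dog\n")
		fmt.Print(out)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running the binary with no arguments prints the local word count. The reducer only works because its input arrives grouped by key, which is why the local simulation must sort before reducing.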
map-reduce hadoop-streaming aws python-mapreduce

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.
database distributed-database newsql oltp hbase hadoop map-reduce

Apache Tez is an extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves on the MapReduce paradigm by dramatically increasing its speed while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third-party data access applications developed for the broader Hadoop ecosystem.
map-reduce batch-processing data-processing big-data hadoop yarn directed-acyclic-graph

Hue is a Web application for interacting with Apache Hadoop. It includes a FileBrowser for accessing HDFS, a JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), a Job Designer for creating MapReduce/Streaming/Java jobs, an HBase Browser for exploring and modifying HBase tables and data, an Oozie App for submitting and scheduling workflows and bundles, a Pig/HBase/Sqoop2 shell, the Beeswax application for executing Hive queries, and a Search app for querying Solr and Solr Cloud.
hadoop-tools hadoop-client big-data map-reduce

PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig or Cascading, but you don't need to know much about either of them to use it. It provides the ability to write map-reduce queries as programs, not as scripts.
map-reduce

Pangool is a framework on top of Hadoop that implements Tuple MapReduce. It supports secondary sorting, built-in reduce-side joins, built-in serialization support for Thrift and ProtoStuff, and a lot more.
map-reduce sort tuple

Calculate multidimensional sums in a LevelDB and get live updates. Create a new sums db. Make sure your db has been given the powers of level-sublevel.
leveldb level sublevel sum rolling moving map-reduce

Phadoop allows you to write map/reduce tasks for Hadoop in PHP. I created it to give a tech talk about Hadoop at the company I worked for. It is not yet ready for production use, but it can help you experiment with Hadoop in PHP. You can find more examples in the examples directory.
hadoop map-reduce

Timothy's primary goal is to make Hadoop's Yellow Elephant rich and famous. Status and counters for the job can be updated using the this.updateStatus and this.updateCounter functions.
hadoop map-reduce

AIStore (AIS for short) is a storage solution built from scratch for AI applications. At its core, it is open-source object storage with extensions tailored for AI and, specifically, for petascale deep learning. As a storage system, AIS is a distributed object store with a RESTful S3-like API and the full range of capabilities one would normally expect from an object store: eventual consistency, a flat namespace, versioning, and all the usual read/write and control primitives to read and write objects and to create, destroy, list, and configure the buckets that contain those objects.
high-performance erasure-coding high-availability object-storage map-reduce scale-out

This is an implementation of the appengine datastore mapper functionality of the appengine map-reduce framework, developed in Go. It is in alpha status while I develop the API and should not be used in production yet. However, it has been used successfully to export datastore entities to JSON for import into BigQuery, to stream inserts directly into BigQuery, and for schema migrations and lightweight aggregation reporting.
datastore-mapper datastore-entities cloud-storage shards bigquery datastore appengine map-reduce

Distributed reduce on top of hypercore. We believe ops doesn't need to be complicated. If hypercore is distributed streams, hyperreduce is a distributed reducer for streams. We needed this to turn our feed of server errors into a single meaningful value.
hyper reducer map-reduce hypercore dat
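The hyperreduce description boils down to a reducer for streams: fold each incoming event into an accumulator and keep the latest aggregate available. The following is a minimal single-process Go sketch of that idea, using a channel as the stream. It is not hyperreduce's (JavaScript, hypercore-based) API, and it is not distributed. The `ServerError`, `ErrorStats`, `reduceStream`, and `feedOf` names are hypothetical.

```go
package main

import "fmt"

// ServerError is a hypothetical event standing in for one entry in a feed
// of server errors.
type ServerError struct {
	Service string
	Message string
}

// ErrorStats is the single aggregate value the feed is reduced to. It is a
// plain value type, so every snapshot passed to onUpdate is independent.
type ErrorStats struct {
	Total    int
	LastSeen string
}

// reduceStream folds each value received on in into acc with step, reports
// the running result through onUpdate, and returns the final value once in
// is closed.
func reduceStream[T, A any](in <-chan T, acc A, step func(A, T) A, onUpdate func(A)) A {
	for v := range in {
		acc = step(acc, v)
		onUpdate(acc)
	}
	return acc
}

// feedOf emits the given errors on a channel from a goroutine and closes
// the channel when done.
func feedOf(errs []ServerError) <-chan ServerError {
	ch := make(chan ServerError)
	go func() {
		defer close(ch)
		for _, e := range errs {
			ch <- e
		}
	}()
	return ch
}

func main() {
	feed := feedOf([]ServerError{
		{"api", "timeout"},
		{"db", "connection reset"},
		{"api", "HTTP 500"},
	})
	final := reduceStream(feed, ErrorStats{}, func(s ErrorStats, e ServerError) ErrorStats {
		s.Total++
		s.LastSeen = e.Service
		return s
	}, func(s ErrorStats) {
		fmt.Printf("running total: %d\n", s.Total)
	})
	fmt.Printf("final: %+v\n", final)
}
```

The accumulator is kept as a value type on purpose. If it were a map mutated in place, every snapshot handed to `onUpdate` would alias the same data.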