StratoSphere - Cloud Computing Framework for Big Data Analytics


The Stratosphere system is an open-source cluster/cloud computing framework for Big Data analytics. It comprises an extensible higher-level language (Meteor) to quickly compose queries for common and recurring use cases; a parallel programming model (PACT, an extension of MapReduce) to run user-defined operations; and an efficient, massively parallel runtime (Nephele) for fault-tolerant execution of acyclic data flows.

Stratosphere (named after the layer of the atmosphere above the clouds) explores the power of massively parallel computing for complex information management applications.
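To make the PACT idea more concrete, here is a minimal, single-process sketch of three input contracts (Map, Reduce, and Match, the last being one of the contracts PACT adds on top of MapReduce). The function names `map_contract`, `reduce_contract`, and `match_contract`, and the orders/customers data, are hypothetical illustrations, not Stratosphere's Java API. Each contract only fixes which records a user-defined function (UDF) sees together; in the real system that is what leaves the runtime free to partition and parallelize the work, whereas this sketch simply runs everything sequentially.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[Any, Any]  # a (key, value) pair


def map_contract(records: Iterable[Any], udf: Callable[[Any], Iterable[Record]]) -> Iterator[Record]:
    """Map: apply the user-defined function to each record independently."""
    for record in records:
        yield from udf(record)


def reduce_contract(records: Iterable[Record],
                    udf: Callable[[Any, List[Any]], Iterable[Record]]) -> Iterator[Record]:
    """Reduce: group records by key and apply the UDF once per key group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    for key, values in groups.items():
        yield from udf(key, values)


def match_contract(left: Iterable[Record], right: Iterable[Record],
                   udf: Callable[[Any, Any, Any], Iterable[Record]]) -> Iterator[Record]:
    """Match: apply the UDF to every pair of records (one per input) sharing a key."""
    index: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    for key, left_value in left:
        for right_value in index.get(key, []):
            yield from udf(key, left_value, right_value)


def parse_order(line: str) -> List[Record]:
    """Hypothetical UDF: turn a 'customer_id,amount' line into one record."""
    customer_id, amount = line.split(",")
    return [(int(customer_id), int(amount))]


customers = [(1, "alice"), (2, "bob")]
raw_orders = ["1,30", "2,5", "1,12", "3,99"]  # customer 3 has no match and is dropped

orders = map_contract(raw_orders, parse_order)
named = match_contract(orders, customers, lambda _key, amount, name: [(name, amount)])
totals = dict(reduce_contract(named, lambda name, amounts: [(name, sum(amounts))]))
print(totals)  # {'alice': 42, 'bob': 5}
```

The three stages form a small acyclic data flow (parse, join, aggregate), which is the kind of program the text describes PACT compiling for the Nephele runtime.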

https://www.stratosphere.eu/
https://github.com/stratosphere-eu/stratosphere

Related Projects

DataflowJavaSDK - Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines


Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify the use of Apache Beam on the Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.

OpenNebula - Data Center Management Solution


OpenNebula provides a solution for building and managing virtualized enterprise data centers and cloud infrastructures to enable on-premise IaaS clouds. OpenNebula's interoperability makes the move to the cloud an evolution: it leverages existing IT assets, protects your investments, and avoids vendor lock-in. OpenNebula was designed to address the requirements of business use cases from leading companies across multiple industries, such as hosting, telecom, e-government, utility computing, and many more.

congress - Congress


Congress is an open policy framework for the cloud. With Congress, a cloud operator can declare, monitor, enforce, and audit "policy" in a heterogeneous cloud environment. Congress gets inputs from a cloud's various services; for example, in OpenStack, Congress fetches information about VMs from Nova, network state from Neutron, and so on. Congress then feeds the input data from those services into its policy engine, where it verifies that the cloud's actual state abides by the cloud operator's policies. Congress is designed to work with any policy and any cloud service. Its job is to help people manage that plethora of state across all cloud services with a succinct policy language.
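The check described above (compare the cloud's actual state against the operator's policy) can be sketched in plain Python. This is not Congress's policy language or API: the `Vm`/`Port` records, the `allowed` table, and the `violations` function are hypothetical stand-ins for data fetched from services such as Nova and Neutron. The docstring mirrors how such a rule might read in a Datalog-style language, purely as an analogy.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Vm:
    vm_id: str
    owner: str


@dataclass(frozen=True)
class Port:
    vm_id: str
    network: str


# Hypothetical snapshot of state pulled from the compute and networking services.
vms = [Vm("vm1", "alice"), Vm("vm2", "bob"), Vm("vm3", "carol")]
ports = [Port("vm1", "net-public"), Port("vm2", "net-private"), Port("vm3", "net-private")]

# Hypothetical operator policy: the networks each owner may attach VMs to.
allowed: Dict[str, Set[str]] = {
    "alice": {"net-public", "net-private"},
    "bob": {"net-private"},
    "carol": {"net-public"},
}


def violations(vms: List[Vm], ports: List[Port], allowed: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (vm_id, network) pairs that break the policy, analogous to a rule like
    error(vm, net) :- vm(vm, owner), port(vm, net), not allowed(owner, net)."""
    owner_of = {vm.vm_id: vm.owner for vm in vms}
    return sorted(
        (port.vm_id, port.network)
        for port in ports
        if port.vm_id in owner_of
        and port.network not in allowed.get(owner_of[port.vm_id], set())
    )


print(violations(vms, ports, allowed))  # [('vm3', 'net-private')]
```

Keeping the policy as data (the `allowed` table) separate from the state snapshot reflects the separation the description draws between operator policy and the cloud's actual state.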

DataflowSDK-examples - Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines


Google Cloud Dataflow provides a simple, powerful programming model for building both batch and streaming parallel data processing pipelines. This repository hosts example pipelines that use the Cloud Dataflow SDK and demonstrate the basic functionality of Google Cloud Dataflow. A good starting point for new users is our set of word count (Java, Python) examples, which compute word frequencies. This series of four successively more detailed pipelines is described in detail in the accompanying walkthrough.
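The word count idea itself (split lines into words, count each distinct word, format the results) can be shown without any SDK. The sketch below is plain Python, not the Dataflow or Beam API; the function names and the choice to lowercase words and keep only letters and apostrophes are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List


def extract_words(line: str) -> List[str]:
    """Split a line into lowercase words (letters and apostrophes only)."""
    return [word.lower() for word in re.findall(r"[A-Za-z']+", line)]


def count_words(lines: Iterable[str]) -> Counter:
    """Count how often each word occurs across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(extract_words(line))
    return counts


def format_counts(counts: Counter) -> List[str]:
    """Render one 'word: count' string per word, sorted alphabetically."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


lines = ["To be, or not to be:", "that is the question."]
for row in format_counts(count_words(lines)):
    print(row)
```

In a real pipeline each of these steps would become a separate transform so that the runner can distribute the per-line work and the per-word aggregation across workers.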

solr-scale-tk - Fabric-based framework for deploying and managing SolrCloud clusters in the cloud.


Setup: Make sure you're running Python 2.7 and have installed the Fabric and boto dependencies. On the Mac, you can do:

```
sudo easy_install fabric
sudo easy_install boto
```

For more information about Fabric, see: http://docs.fabfile.org/en/1.8/

Clone the pysolr project from GitHub and set it up as well:

```
git clone https://github.com/toastdriven/pysolr.git
cd pysolr
sudo python setup.py install
```

Note: you do not need to know any Python in order to use this framework.

Local Setup: The framewor



Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines


Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

BC-BSP - Big Cloud Bulk Synchronous Parallel


Big Cloud Bulk Synchronous Parallel

OpenStack - Software for building Private and Public Clouds


OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.

dasein-cloud-opsource


Dasein Cloud implementation for the OpSource Cloud (now Dimension Data). For more information on Dasein Cloud, see the Dasein Cloud home page at https://github.com/greese/dasein-cloud.

openQRM - Datacenter Management and Cloud Computing Platform


openQRM is a web-based open-source datacenter management and cloud computing platform that integrates flexibly with your existing datacenter components. openQRM supports all major virtualization technologies. It automates provisioning, virtualization, storage, and configuration management, and it takes care of high availability. A self-service cloud portal with an integrated billing system enables end users to request new managed servers and application stacks on demand.

OpenMobster - Open Source Mobile Cloud Platform


OpenMobster is an open-source enterprise backend for mobile apps. It provides a bi-directional data synchronization service that lets mobile apps synchronize their locally stored databases with enterprise services in the cloud, such as server apps, CRM, and ERP. It supports a platform-agnostic, cloud-initiated push notification system, and it has a framework for creating end-to-end location-aware apps.

DataflowPythonSDK - Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines


Google Cloud Dataflow SDK for Python is based on Apache Beam and targeted at executing Python pipelines on Google Cloud Dataflow. Google Cloud Dataflow for Python is now the Apache Beam Python SDK, and code development has moved to the Apache Beam repo.

healthnmon


Healthnmon aims to deliver a 'Cloud Resource Monitor', an extensible service for the OpenStack cloud operating system that monitors cloud resources and infrastructure through a pluggable framework for 'Inventory Management', 'Alerts and Notifications', and 'Utilization Data'.

Dasein Cloud - Open Source Cloud Abstraction Library


Dasein Cloud (pronounced "da z-eye-n") is a Java-based cloud abstraction layer that enables programmers to build applications under a "write-once, run against any cloud" philosophy. It provides an abstract model under which most IaaS and some PaaS services can be modeled. Developers write applications to the Dasein Cloud model and it is then translated into the underlying cloud provider model.
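The "write once, run against any cloud" approach is essentially an abstract model plus per-provider translations of it. The following minimal sketch shows that pattern in Python; `ComputeService`, `ProviderA`, `ProviderB`, and `deploy_web_tier` are hypothetical names and do not reflect Dasein Cloud's actual (Java) interfaces.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    server_id: str
    image: str


class ComputeService(ABC):
    """Hypothetical provider-neutral model that application code is written against."""

    @abstractmethod
    def launch(self, image: str) -> Server: ...

    @abstractmethod
    def list_servers(self) -> List[Server]: ...


class ProviderA(ComputeService):
    """Stand-in for one cloud; it names servers 'a-1', 'a-2', ..."""

    def __init__(self) -> None:
        self._servers: List[Server] = []

    def launch(self, image: str) -> Server:
        server = Server(f"a-{len(self._servers) + 1}", image)
        self._servers.append(server)
        return server

    def list_servers(self) -> List[Server]:
        return list(self._servers)


class ProviderB(ComputeService):
    """Stand-in for another cloud with a different ID scheme ('i-1000', ...)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Server] = {}

    def launch(self, image: str) -> Server:
        server = Server(f"i-{1000 + len(self._by_id)}", image)
        self._by_id[server.server_id] = server
        return server

    def list_servers(self) -> List[Server]:
        return list(self._by_id.values())


def deploy_web_tier(cloud: ComputeService, count: int) -> List[str]:
    """Application logic written once against the abstract model."""
    return [cloud.launch("web-image").server_id for _ in range(count)]


print(deploy_web_tier(ProviderA(), 2))  # ['a-1', 'a-2']
print(deploy_web_tier(ProviderB(), 2))  # ['i-1000', 'i-1001']
```

Only the provider classes know each cloud's conventions; `deploy_web_tier` never changes when a new provider is added.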

datalab - Interactive tools and developer experiences for Big Data on Google Cloud Platform.


Google Cloud DataLab provides a productive, interactive, and integrated tool to explore, visualize, analyze and transform data, bringing together the power of Python, SQL, and the Google Cloud Platform with services such as BigQuery and Storage.

Sensorbee - Lightweight stream processing engine for IoT


Sensorbee is designed for low-latency processing of streaming data at the edge of the network. IoT devices frequently generate large volumes of unstructured streaming data, such as video and audio streams. Even if the data streams are structured, they may be meaningless if their temporal characteristics are not considered. Cloud-based services are generally not good at processing these kinds of data. Preprocessing data streams before they are sent to the cloud makes large scale data processing in the cloud more efficient and reduces the usage of network bandwidth.
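One common form of the edge preprocessing described above is windowed aggregation: many raw readings are collapsed into one compact summary per time window before anything is sent upstream. The sketch below assumes time-ordered input and a tumbling (fixed, non-overlapping) window; the `Reading`/`Summary` types and `tumbling_summaries` function are hypothetical and are not Sensorbee's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Reading:
    timestamp: float  # seconds
    value: float


@dataclass(frozen=True)
class Summary:
    window_start: float
    count: int
    mean: float
    maximum: float


def _summarize(start: float, values: List[float]) -> Summary:
    return Summary(start, len(values), sum(values) / len(values), max(values))


def tumbling_summaries(readings: Iterable[Reading], window: float) -> Iterator[Summary]:
    """Collapse time-ordered readings into one Summary per fixed, non-overlapping window."""
    current_start: Optional[float] = None
    values: List[float] = []
    for reading in readings:
        start = (reading.timestamp // window) * window
        if current_start is not None and start != current_start:
            yield _summarize(current_start, values)  # window closed: emit and reset
            values = []
        current_start = start
        values.append(reading.value)
    if values and current_start is not None:
        yield _summarize(current_start, values)  # flush the last (possibly partial) window


readings = [Reading(i * 0.5, v) for i, v in enumerate([1.0, 3.0, 2.0, 4.0, 6.0, 8.0])]
for summary in tumbling_summaries(readings, window=1.0):
    print(summary)
```

Here six readings become three summaries; with real sensor rates (many readings per window) the reduction in data sent to the cloud is correspondingly larger.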

FOSS-Cloud - Virtualization- and Cloud-Services


FOSS-Cloud is software that enables you to build your own private or public cloud. The FOSS-Cloud environment (software and hardware) is an integrated, redundant server infrastructure for providing cloud services: Windows- or Linux-based SaaS, terminal servers, Virtual Desktop Infrastructure (VDI), or virtual server environments. It makes virtual machines available that can be accessed both internally and from the Internet.

CloudStack - Infrastructure-as-a-Service (IaaS) software platform


CloudStack CE is an open source Infrastructure-as-a-Service (IaaS) software platform, which enables users to build, manage and deploy compute cloud environments.

Cloud Business Services


ISV Application Accelerator for the business management of cloud applications built on Windows Azure or any hosting platform. Cloud Business Services is a strong framework for integrating business management (billing, metering, monitoring, and subscription management).

Sia - Your decentralized private cloud


Sia is a new decentralized cloud storage platform that radically alters the landscape of cloud storage. By leveraging smart contracts, client-side encryption, and sophisticated redundancy (via Reed-Solomon codes), Sia allows users to safely store their data with hosts that they do not know or trust. The result is a cloud storage marketplace where hosts compete to offer the best service at the lowest price. And since there is no barrier to entry for hosts, anyone with spare storage capacity can join the network and start making money.
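The redundancy idea behind Reed-Solomon codes is that k data symbols are spread over n > k shares so that any k shares suffice to rebuild the data. The sketch below is a simplified, systematic Reed-Solomon code over the prime field GF(257) using Lagrange interpolation, with one symbol per byte for clarity. It is an illustration only: production implementations typically work over GF(2^8) and encode whole file pieces, and the `encode`/`decode` names here are hypothetical, not part of Sia.

```python
from typing import List, Tuple

P = 257  # prime modulus; every byte value 0..255 is a valid field element

Share = Tuple[int, int]  # (x, y): evaluation point and polynomial value


def _interpolate(points: List[Share], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # pow(..., -1, P): modular inverse
    return total


def encode(data: bytes, n: int) -> List[Share]:
    """Systematic encoding: shares 0..k-1 hold the data itself, shares k..n-1 are parity."""
    k = len(data)
    if not 0 < k <= n < P:
        raise ValueError("need 0 < len(data) <= n < 257")
    points = list(enumerate(data))
    return [(x, _interpolate(points, x)) for x in range(n)]


def decode(shares: List[Share], k: int) -> bytes:
    """Recover the k data bytes from any k shares with distinct x values."""
    if len(shares) < k:
        raise ValueError("not enough shares to recover the data")
    points = shares[:k]
    return bytes(_interpolate(points, x) for x in range(k))


data = b"Sia"
shares = encode(data, n=5)  # 3 data shares + 2 parity shares
survivors = [shares[1], shares[3], shares[4]]  # any 2 of the 5 shares may be lost
print(decode(survivors, k=len(data)))  # b'Sia'
```

Because the data can be rebuilt from any k of the n shares, no single host holding a share has to be trusted to stay online, which is the property the description relies on.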