StratoSphere - Cloud Computing Framework for Big Data Analytics


The Stratosphere System is an open-source cluster/cloud computing framework for Big Data analytics. It comprises an extensible higher-level language (Meteor) to quickly compose queries for common and recurring use cases, a parallel programming model (PACT, an extension of MapReduce) to run user-defined operations, and an efficient massively parallel runtime (Nephele) for fault-tolerant execution of acyclic data flows.

Stratosphere (named after the layer of the atmosphere above the clouds) explores the power of massively parallel computing for complex information management applications.
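To make the MapReduce model that PACT extends more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps. It is plain Python and does not use Stratosphere, PACT, or Nephele; the names `map_phase`, `shuffle`, `reduce_phase`, `tokenize`, and `count` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, map_fn):
    """Apply the user-defined map function to every record and flatten the results."""
    return chain.from_iterable(map_fn(record) for record in records)


def shuffle(pairs):
    """Group values by key, as a runtime would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply the user-defined reduce function to each key group."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# User-defined operations (hypothetical): a word count.
def tokenize(line):
    return [(word.lower(), 1) for word in line.split()]


def count(_key, values):
    return sum(values)


lines = ["big data analytics", "Big Data in the cloud"]
word_counts = reduce_phase(shuffle(map_phase(lines, tokenize)), count)
print(word_counts)
```

In a real cluster framework the three phases would run on many machines; the sketch only shows how user-defined functions plug into a fixed data flow.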

https://www.stratosphere.eu/
https://github.com/stratosphere-eu/stratosphere

Related Projects

OpenNebula - Data Center Management Solution


OpenNebula provides a solution for building and managing virtualized enterprise data centers and cloud infrastructures to enable on-premise IaaS clouds. OpenNebula's interoperability makes cloud an evolution by leveraging existing IT assets, protecting your investments, and avoiding vendor lock-in. OpenNebula was designed to address the requirements of business use cases from leading companies across multiple industries, such as Hosting, Telecom, eGovernment, Utility Computing, and many more.

congress - Congress


Congress is an open policy framework for the cloud. With Congress, a cloud operator can declare, monitor, enforce, and audit "policy" in a heterogeneous cloud environment. Congress gets inputs from a cloud's various services; for example, in OpenStack, Congress fetches information about VMs from Nova and network state from Neutron. Congress then feeds input data from those services into its policy engine, where it verifies that the cloud's actual state abides by the cloud operator's policies. Congress is designed to work with any policy and any cloud service. Congress's job is to help people manage that plethora of state across all cloud services with a succinct policy language.
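The idea of checking actual cloud state against a declared policy can be sketched in a few lines. The example below is plain Python, not Congress's policy language or API; the data layout, the sample policy ("restricted VMs must not be attached to a shared network"), and the function name `violations` are all assumptions made for illustration.

```python
# Hypothetical snapshots of state pulled from two cloud services.
vms = [
    {"id": "vm-1", "network": "net-public"},
    {"id": "vm-2", "network": "net-private"},
    {"id": "vm-3", "network": "net-public"},
]
networks = {
    "net-public": {"shared": True},
    "net-private": {"shared": False},
}
# Hypothetical policy input: VMs the operator wants kept off shared networks.
restricted_vms = {"vm-1", "vm-2"}


def violations(vms, networks, restricted_vms):
    """Return the IDs of restricted VMs that are attached to a shared network."""
    return sorted(
        vm["id"]
        for vm in vms
        if vm["id"] in restricted_vms and networks[vm["network"]]["shared"]
    )


found = violations(vms, networks, restricted_vms)
print(found)
```

Here the policy is hard-coded as a Python function; a policy framework would instead let the operator state the rule declaratively and evaluate it against data from any service.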

solr-scale-tk - Fabric-based framework for deploying and managing SolrCloud clusters in the cloud.


Setup
========

Make sure you're running Python 2.7 and have installed Fabric and boto dependencies. On the Mac, you can do:

```
sudo easy_install fabric
sudo easy_install boto
```

For more information about fabric, see: http://docs.fabfile.org/en/1.8/

Clone the pysolr project from github and set it up as well:

```
git clone https://github.com/toastdriven/pysolr.git
cd pysolr
sudo python setup.py install
```

Note, you do not need to know any Python in order to use this framework.

Local Setup
========

The framewor

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines


Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

BC-BSP - Big Cloud Bulk Synchronous Parallel


Big Cloud Bulk Synchronous Parallel

OpenStack - Software for building Private and Public Clouds


OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.

dasein-cloud-opsource


Dasein Cloud implementation for the OpSource Cloud (now Dimension Data). For more information on Dasein Cloud, see the Dasein Cloud home page at https://github.com/greese/dasein-cloud.

openQRM - Datacenter Management and Cloud Computing Platform


openQRM is a web-based open source datacenter management and cloud computing platform that integrates flexibly with your existing datacenter components. openQRM supports all major virtualization technologies. It automates provisioning, virtualization, storage and configuration management, and it takes care of high availability. A self-service cloud portal with an integrated billing system enables end users to request new managed servers and application stacks on demand.

OpenMobster - Open Source Mobile Cloud Platform


OpenMobster is an open source Enterprise Backend for Mobile Apps. It provides a bi-directional data synchronization service for mobile apps to synchronize their locally stored database with Enterprise services in the Cloud such as server apps, CRM, ERP, etc. It supports a platform-agnostic Cloud-initiated Push Notification System. It has a framework for creating end-to-end Location Aware Apps.

Dasein Cloud - Open Source Cloud Abstraction Library


Dasein Cloud (pronounced "da z-eye-n") is a Java-based cloud abstraction layer that enables programmers to build applications under a "write-once, run against any cloud" philosophy. It provides an abstract model under which most IaaS and some PaaS services can be modeled. Developers write applications to the Dasein Cloud model and it is then translated into the underlying cloud provider model.

healthnmon


Healthnmon aims to deliver 'Cloud Resource Monitor', an extensible service for the OpenStack cloud operating system that monitors cloud resources and infrastructure through a pluggable framework for 'Inventory Management', 'Alerts and notifications' and 'Utilization Data'.

datalab - Interactive tools and developer experiences for Big Data on Google Cloud Platform.


Google Cloud DataLab provides a productive, interactive, and integrated tool to explore, visualize, analyze and transform data, bringing together the power of Python, SQL, and the Google Cloud Platform with services such as BigQuery and Storage.

FOSS-Cloud - Virtualization- and Cloud-Services


FOSS-Cloud is software that enables you to build your own private or public cloud. The FOSS-Cloud environment (software and hardware) is an integrated and redundant server infrastructure that provides cloud services, Windows- or Linux-based SaaS, Terminal Server, Virtual Desktop Infrastructure (VDI), or virtual server environments. It makes virtual machines available that can be accessed internally as well as from the Internet.

CloudStack - Infrastructure-as-a-Service (IaaS) software platform


CloudStack CE is an open source Infrastructure-as-a-Service (IaaS) software platform, which enables users to build, manage and deploy compute cloud environments.

Cloud Business Services


ISV Application Accelerator for business management of Cloud Applications built on Windows Azure or any hosting platform. Cloud Business Services is a strong framework for integrating business management (billing, metering, monitoring and subscription management).

Sia - Your decentralized private cloud


Sia is a new decentralized cloud storage platform that radically alters the landscape of cloud storage. By leveraging smart contracts, client-side encryption, and sophisticated redundancy (via Reed-Solomon codes), Sia allows users to safely store their data with hosts that they do not know or trust. The result is a cloud storage marketplace where hosts compete to offer the best service at the lowest price. And since there is no barrier to entry for hosts, anyone with spare storage capacity can join the network and start making money.
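The redundancy idea can be illustrated with a much simpler scheme than Reed-Solomon. The sketch below swaps in single XOR parity, which tolerates the loss of exactly one shard, purely to show how data spread across untrusted hosts can be rebuilt when a host disappears. It is not Sia's implementation, and the function names `split_with_parity` and `recover` are hypothetical.

```python
def split_with_parity(data: bytes, n_shards: int):
    """Split data into n equal-size shards plus one XOR parity shard."""
    shard_len = -(-len(data) // n_shards)  # ceiling division
    padded = data.ljust(shard_len * n_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_shards)]
    parity = bytes(shard_len)  # all-zero starting value
    for shard in shards:
        parity = bytes(a ^ b for a, b in zip(parity, shard))
    return shards + [parity]


def recover(shards, missing_index):
    """Rebuild one lost shard by XOR-ing all surviving shards (data and parity)."""
    shard_len = len(next(s for s in shards if s is not None))
    rebuilt = bytes(shard_len)
    for i, shard in enumerate(shards):
        if i != missing_index:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, shard))
    return rebuilt


original = b"hello decentralized storage"
stored = split_with_parity(original, 4)  # 4 data shards + 1 parity shard

# Simulate one host going offline.
lost = 2
damaged = [s if i != lost else None for i, s in enumerate(stored)]
repaired = recover(damaged, lost)
print(repaired == stored[lost])
```

Reed-Solomon codes generalize this: with k data shards and m parity shards, any k of the k + m shards are enough to reconstruct the data, so several hosts can fail at once.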

Stratos - Highly-extensible PaaS Framework


Apache Stratos is a highly-extensible PaaS framework that helps to run Apache Tomcat, PHP, and MySQL applications, and can be extended to support many more environments on all major cloud infrastructures. For developers, Stratos provides a cloud-based environment for developing, testing, and running scalable applications. IT providers benefit from high utilization rates, automated resource management, and platform-wide insights, including monitoring and billing.

turbinia


Turbinia is an open-source framework for deploying, managing, and running forensic workloads on cloud platforms. It is intended to automate running of common forensic processing tools (e.g. Plaso, TSK, strings) to help with processing evidence in the Cloud, scaling the processing of large amounts of evidence, and decreasing response time by parallelizing processing where possible. Turbinia is composed of different components for the client, server and the workers. These components can be run on local physical machines or in the Cloud. The Turbinia client makes requests to process evidence to the Turbinia server. The Turbinia server creates logical jobs from these incoming user requests, and the jobs create and schedule forensic processing tasks to be run by the workers. The evidence to be processed will be split up by the jobs when possible, and many tasks can be created in order to process the evidence in parallel. One or more workers run continuously to process tasks from the server. Any new evidence created or discovered by the tasks will be fed back into Turbinia for further processing.
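The job-to-task split described above can be sketched with a local worker pool. This is a minimal illustration in plain Python, not Turbinia's API: the function names `strings_task` and `run_job`, the in-memory evidence dictionary, and the use of a thread pool in place of distributed workers are all assumptions.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Runs of 4 or more printable ASCII bytes, similar in spirit to the `strings` tool.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def strings_task(item):
    """Hypothetical task: extract printable ASCII runs from one evidence item."""
    name, blob = item
    return name, [match.decode("ascii") for match in PRINTABLE.findall(blob)]


def run_job(evidence, max_workers=4):
    """Split the evidence into one task per item and run the tasks on a worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(strings_task, evidence.items()))


# Hypothetical evidence: file name mapped to raw bytes.
evidence = {
    "disk1.img": b"\x00\x01password=hunter2\x00\xff\x10abc",
    "disk2.img": b"\x7fGET /index.html\x00\x00ok",
}
results = run_job(evidence)
print(results)
```

Each evidence item becomes an independent task, so adding workers shortens the total processing time as long as the items can be handled separately.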

dasein-cloud-mock


Mock Dasein Cloud implementation to enable applications using Dasein Cloud to execute unit tests without the need for a live cloud. For more information on Dasein Cloud, see the Dasein Cloud home page at https://github.com/greese/dasein-cloud.

Synnefo - Open source Cloud Software, Used to create massively scalable IaaS clouds


Synnefo is a complete open source cloud stack written in Python that provides Compute, Network, Image, Volume and Storage services, similar to the ones offered by AWS. Synnefo manages multiple Ganeti clusters at the backend for handling low-level VM operations and uses Archipelago to unify cloud storage. To boost 3rd-party compatibility, Synnefo exposes the OpenStack APIs to users.