s3committer - Hadoop output committers for S3

This project has Hadoop OutputCommitter implementations for S3. Callers should use S3DirectoryOutputCommitter for single-directory outputs, or S3PartitionedOutputCommitter for partitioned data.

https://github.com/rdblue/s3committer

Related Projects

s3-lambda - Lambda functions over S3 objects with concurrency control (each, map, reduce, filter)

  •    JavaScript

s3-lambda enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for quickly prototyping complex data jobs without an infrastructure like Hadoop or Spark. At Littlstar, we use s3-lambda for all sorts of data pipelining and analytics.

genie - Distributed Big Data Orchestration Service

  •    Java

Genie is a federated job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them. See the official website to find documentation about Genie and specific documentation for various releases.

netflix-1080p - Chrome extension to play Netflix in 1080p and 5.1

  •    JavaScript

The player code tests your User-agent for the string "CrOS" anywhere in it. If the search returns true, it appends the 1080p profile to the playback profile array (this is what the line a && this.oo.push(x.V.TH); does). If it returns false, it does nothing. The playback profile array is set up like so: this.oo = [x.V.vA, x.V.wA];, where x.V.vA is the SD profile and x.V.wA is the 720p profile. After reading this, you might think the easy solution is to change the User-agent so that it contains the string "CrOS". It is not that simple. ChromeOS apparently has a different DRM implementation than Chrome, even though both use Widevine. I could never get it to work when I tried; Netflix always threw license errors. The next easiest thing to do is delete the conditional and make the 1080p profile a part of the regular profiles (this.oo = [x.V.vA, x.V.wA]; -> this.oo = [x.V.vA, x.V.wA, x.V.TH];). This works perfectly, but only for the majority of Netflix content. A few videos, like Disney movies, have manifests completely restricted to Edge, to the point where you can't obtain them without an Edge ESN.
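The profile-selection logic described above can be sketched as follows. This is only an illustration in Python, not the extension's code (which patches Netflix's JavaScript). The function names are hypothetical, and the string constants are stand-ins for the minified identifiers x.V.vA, x.V.wA and x.V.TH quoted in the description.

```python
# Hypothetical stand-ins for the minified profile identifiers quoted in the
# description (x.V.vA, x.V.wA, x.V.TH). The real values are not given.
SD_PROFILE = "vA"
HD_720_PROFILE = "wA"
HD_1080_PROFILE = "TH"


def stock_profiles(user_agent):
    """Mimic the unpatched behaviour: 1080p is added only for 'CrOS' agents."""
    profiles = [SD_PROFILE, HD_720_PROFILE]
    if "CrOS" in user_agent:
        profiles.append(HD_1080_PROFILE)
    return profiles


def patched_profiles(user_agent):
    """Mimic the patched behaviour: the conditional is gone, 1080p is always listed."""
    return [SD_PROFILE, HD_720_PROFILE, HD_1080_PROFILE]
```

The patched version ignores the User-agent entirely, which matches the described approach of deleting the conditional rather than spoofing "CrOS".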

oneline

  •    Java

Oneline provides a simple, high-performance platform for EC2, JDBC, Hadoop, S3, Solr, Flex, XSLT, J2EE, Windows Mobile SMS, Blog, Yahoo EMail, Google EMail, and many more platform components.


Ambari - Monitor Hadoop Cluster

  •    Java

The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. The set of Hadoop components that are currently supported by Ambari includes HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, Sqoop.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

Grauenwolf's .NET Wrapper for the Netflix API

This is a .NET Wrapper for the Netflix API. Currently it supports low level requests including OAuth signing. A high level object model is planned.

recipes-rss - RSS Reader Recipes that uses several of the Netflix OSS components

  •    Java

RSS is a Netflix Recipes application demonstrating how all of the following Netflix Open Source components can be tied together. Shared classes between edge and middletier.

rend - A memcached proxy that manages data chunking and L1 / L2 caches

  •    Go

Rend is currently in production at Netflix and serving live member traffic. Caching is used in several ways at Netflix. Some people use it as a true working set cache, while others use it as the only storage mechanism for their service. Others use it as a session cache. This means that some services can continue as usual with some data loss, while others will permanently lose data and start to serve fallbacks. Rend is built to complement EVCache, which is the main caching solution in use at Netflix.

Netflix-Prize - The code I used to get in the top #150 in the Netflix Prize

  •    C

I'm not aware of folks having published their code for the Netflix Prize. Here's mine. Under the team name "Hi!", I competed alone in college. I did it mostly for fun, and to learn modern machine learning techniques. It was an incredibly valuable, but strenuous, time. Well worth it on all fronts, though. I peaked out at #45 or so, and then dropped out to work on my senior thesis, and came in #145 or so. What I learned in the process was that smarter wasn't always better -- make an algorithm, and then scale it up, and then make a dozen tweaks to it, and then average all of the results together. That's how you climbed the leaderboard. As for the technical nitty-gritty, everything that's speed sensitive is written in Cython, which was the best balance of speed and convenience in 2009. If I were to do it all again, I would use Numba (http://github.com/numba/numba).
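The "average all of the results together" step can be illustrated with a minimal sketch. The description does not say how the author weighted the models, so the equal-weight mean here is an assumption, and the function name blend is hypothetical.

```python
from statistics import mean


def blend(predictions_per_model):
    """Average predictions item by item across models.

    predictions_per_model is a list with one inner list per model variant,
    each holding that model's predicted rating for the same items in the
    same order. Equal weighting is an assumption made for this sketch.
    """
    return [mean(item_predictions) for item_predictions in zip(*predictions_per_model)]


# Three tweaked variants of one algorithm, each predicting two ratings.
variants = [
    [3.0, 4.0],
    [4.0, 2.0],
    [5.0, 3.0],
]
blended = blend(variants)
```

Here blended holds one averaged rating per item, so each variant's individual errors are partly smoothed out.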

netflix-categories - Netflix links to all hidden categories

Netflix links to all hidden categories

react-native-netflix - React Native App from my video Course on Youtube

  •    JavaScript

React Native App with the same style as Netflix for iOS. I released a series of videos on YouTube with a walkthrough explaining every part of this application. A few components must be installed with react-native link; check out the following list.

spring-hadoop - Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring

  •    Java

The Spring for Apache Hadoop project provides extensions to Spring, Spring Batch, and Spring Integration to build manageable and robust pipeline solutions around Hadoop. Spring for Apache Hadoop extends Spring Batch by providing support for reading from and writing to HDFS, running various types of Hadoop jobs (Java MapReduce, Streaming, Hive, Spark, Pig) and using HBase. An important goal is to provide excellent support for non-Java developers, so that they can be productive with Spring Hadoop without writing any Java code to use the core feature set.

gis-tools-for-hadoop - The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data

The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools make use of the Geoprocessing Tools for Hadoop toolbox to provide access to the Hadoop system from the ArcGIS Geoprocessing environment. Start out by navigating to samples and following the instructions provided with each sample. There are also tutorials for using the GP tools and aggregation methods.

parkour - Hadoop MapReduce in idiomatic Clojure.

  •    Clojure

Hadoop MapReduce in idiomatic Clojure. Parkour takes your Clojure code’s functional gymnastics and sends it free-running across the urban environment of your Hadoop cluster. Parkour is a Clojure library for writing distributed programs in the MapReduce pattern which run on the Hadoop MapReduce platform. Parkour does its best to avoid being yet another “framework” – if you know Hadoop, and you know Clojure, then you’re most of the way to knowing Parkour. By combining functional programming, direct access to Hadoop features, and interactive iteration on live data, Parkour supports rapid development of highly efficient Hadoop MapReduce applications.

hadoop-docker - Hadoop docker image

  •    Shell

A few weeks ago we released an Apache Hadoop 2.3 Docker image, which quickly became the most popular Hadoop image in the Docker registry. Following the success of our previous Hadoop Docker images, and the feedback and feature requests we received, we aligned with the Hadoop release cycle and have released an Apache Hadoop 2.7.1 Docker image. Like the previous version, it's available as a trusted and automated build on the official Docker registry.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services

  •    Python

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc (Dataproc) which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.

spring-hadoop-samples - Spring Hadoop Samples

  •    Java

This repository contains several sample applications that show how you can use Spring for Apache Hadoop. Hadoop has a poor out-of-the-box programming model. Writing applications for Hadoop generally turns into a collection of scripts calling Hadoop command line applications. Spring for Apache Hadoop provides a consistent programming model and declarative configuration model for developing Hadoop applications.

S3-Uploads - The WordPress Plugin to Store Uploads on Amazon S3

  •    PHP

S3-Uploads is a WordPress plugin to store uploads on S3. It aims to be a lightweight "drop-in" for storing uploads on Amazon S3 instead of the local filesystem. It's focused on providing a highly robust S3 interface with no "bells and whistles", WP-Admin UI or much otherwise. It comes with some helpful WP-CLI commands for generating IAM users, listing files on S3 and migrating your existing library to S3.