pig-redis - Redis bulk-loader for Apache Pig

  •    Java

A UDF StoreFunc for Apache Pig designed to bulk-load data into Redis. Inspired by wonderdog, the Infochimps bulk-loader for Elasticsearch. Dependencies are retrieved automatically using Ivy.

splittablegzip - Splittable Gzip codec for Hadoop

  •    Java

In many Hadoop production environments you get gzipped files as the raw input. Usually these are Apache HTTPD logfiles. When you put these gzipped files into Hadoop, you are stuck with exactly one map task per input file. In many scenarios this is fine. However, when this very first map task does a lot of work, it may well be advantageous to divide the work over multiple tasks, even if there is a penalty for scaling out this way.
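
To make the trade-off concrete, here is a minimal Python sketch of why a gzip file normally maps to one task and what the "penalty" for splitting can look like. It assumes the penalty is that every task decompresses from the start of the file and discards the data that belongs to earlier splits. The names `read_split` and `split_ranges` are hypothetical, the ranges are expressed in uncompressed offsets for simplicity, and nothing here is the codec's actual implementation or Hadoop's API.

```python
import gzip
import io


def read_split(gz_bytes: bytes, start: int, end: int) -> bytes:
    """Return uncompressed bytes [start, end) of a gzip stream.

    A gzip stream cannot be entered in the middle, so each split
    decompresses from byte 0 and throws away everything before `start`.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        remaining = start
        while remaining > 0:
            # Wasted work: this prefix belongs to earlier splits.
            chunk = stream.read(min(remaining, 64 * 1024))
            if not chunk:
                return b""  # split starts past the end of the data
            remaining -= len(chunk)
        return stream.read(end - start)


def split_ranges(total: int, parts: int):
    """Cut [0, total) into at most `parts` contiguous ranges (hypothetical helper)."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


# Simulated access log, compressed as a single gzip file.
log = b"".join(b"GET /page/%d HTTP/1.1\n" % i for i in range(1000))
compressed = gzip.compress(log)

# Four independent "tasks", each paying the re-decompression cost.
pieces = [read_split(compressed, lo, hi) for lo, hi in split_ranges(len(log), 4)]
assert b"".join(pieces) == log
```

In this sketch the last task decompresses almost the whole file just to reach its own range, so total decompression work grows with the number of splits. That only pays off when the per-record work in the map task outweighs the repeated decompression.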

devops-python-tools - DevOps CLI Tools for Hadoop, Spark, HBase, Log Anonymizer, Ambari Blueprints, AWS CloudFormation, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Elasticsearch, Solr, Travis CI, Pig, IPython - Python / Jython Tools

  •    Python

A few of the Big Data, NoSQL & Linux tools I've written over the years. All programs have --help to list the available options. For many more tools, see the DevOps Perl Tools and Advanced Nagios Plugins Collection repos, which contain many Hadoop, NoSQL, Web and infrastructure tools and Nagios plugins.