logstash - Logstash - transport and process your logs, events, or other data

  •    Ruby

Logstash is part of the Elastic Stack along with Beats, Elasticsearch, and Kibana. It is an open-source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash" (ours is Elasticsearch, naturally). Logstash has over 200 plugins, and you can easily write your own. The license is Apache 2.0, so you are free to use it in almost any way you like.
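To make the ingest-transform-ship shape concrete, here is a rough Go sketch of a three-stage pipeline. It is not Logstash code (real Logstash pipelines are declared in Logstash's own configuration DSL using input, filter, and output plugins); it only mirrors the stages described above, and the event fields are invented.

```go
// Conceptual sketch only: real Logstash pipelines are declared in Logstash's
// own configuration DSL (input / filter / output plugins), not in Go. This
// miniature just mirrors the three-stage shape described above.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// event is a hypothetical stand-in for a Logstash event.
type event struct {
	Message   string    `json:"message"`
	Timestamp time.Time `json:"@timestamp"`
}

func main() {
	scanner := bufio.NewScanner(os.Stdin) // "input" stage: read raw lines
	for scanner.Scan() {
		ev := event{Message: scanner.Text(), Timestamp: time.Now().UTC()} // "filter" stage: enrich
		out, _ := json.Marshal(ev)                                        // serialize the event
		fmt.Println(string(out)) // "output" stage: ship it (stdout here, Elasticsearch in real life)
	}
}
```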

noflo - Flow-based programming for JavaScript

  •    CoffeeScript

In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented. This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
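NoFlo itself is a JavaScript/CoffeeScript library, but the paradigm is easy to see in a short, hedged Go sketch (not NoFlo's API): each component below is a black box that only reads its input port and writes its output port, and the connections between components are wired up externally in main.

```go
// Generic flow-based-programming sketch in Go, not NoFlo code: components are
// black boxes connected by channels, and the network is defined outside them.
package main

import (
	"fmt"
	"strings"
)

// upcase is one component: it knows nothing about who feeds it or who consumes it.
func upcase(in <-chan string, out chan<- string) {
	for s := range in {
		out <- strings.ToUpper(s)
	}
	close(out)
}

// printer is another component, reusable in any network that sends it strings.
func printer(in <-chan string, done chan<- struct{}) {
	for s := range in {
		fmt.Println(s)
	}
	close(done)
}

func main() {
	a, b := make(chan string), make(chan string)
	done := make(chan struct{})

	// The connections between components are specified here, externally to the components.
	go upcase(a, b)
	go printer(b, done)

	for _, word := range []string{"flow", "based", "programming"} {
		a <- word
	}
	close(a)
	<-done
}
```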

ETL Framework

  •    

This ETL Framework supports dynamic configurations and centralized logging for SSIS solutions, with the goal of minimizing ETL total cost of ownership (TCO). It consists of an ETL Framework database, SSRS reports, and template SSIS packages.

Bender - Serverless ETL Framework

  •    Java

This project provides an extendable Java framework for creating serverless ETL functions on AWS Lambda. Bender handles the complex plumbing and provides the interfaces necessary to build modules for all aspects of the ETL process.
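As a rough illustration of what "modules for all aspects of the ETL process" can mean, here is a hedged Go sketch of pluggable deserializer, operation, and transport stages. The interface and type names are hypothetical and are not Bender's actual Java interfaces.

```go
// Hypothetical sketch of pluggable ETL stages; the names below are
// illustrative only and are not Bender's actual Java interfaces.
package main

import (
	"fmt"
	"strings"
)

type Deserializer interface{ Deserialize(raw string) map[string]string }
type Operation interface{ Apply(ev map[string]string) map[string]string }
type Transport interface{ Send(ev map[string]string) error }

// kvDeserializer parses "k=v,k=v" records into events.
type kvDeserializer struct{}

func (kvDeserializer) Deserialize(raw string) map[string]string {
	ev := map[string]string{}
	for _, pair := range strings.Split(raw, ",") {
		if k, v, ok := strings.Cut(pair, "="); ok {
			ev[k] = v
		}
	}
	return ev
}

// addField tags every event with a static field.
type addField struct{ key, val string }

func (o addField) Apply(ev map[string]string) map[string]string { ev[o.key] = o.val; return ev }

// stdoutTransport "ships" events by printing them.
type stdoutTransport struct{}

func (stdoutTransport) Send(ev map[string]string) error { fmt.Println(ev); return nil }

func main() {
	var d Deserializer = kvDeserializer{}
	ops := []Operation{addField{key: "source", val: "lambda"}}
	var t Transport = stdoutTransport{}

	for _, raw := range []string{"user=ada,action=login", "user=bob,action=logout"} {
		ev := d.Deserialize(raw)
		for _, op := range ops {
			ev = op.Apply(ev)
		}
		_ = t.Send(ev)
	}
}
```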




eol-globi-data - Global Biotic Interactions provides access to existing species interaction datasets

  •    Java

Please see http://globalbioticinteractions.org or http://github.com/jhpoelen/eol-globi-data/wiki for more information. Unless otherwise noted, source code is released under GPLv3 and data is available under the Creative Commons Attribution 4.0 International License. We do our best to ensure that references to the original data sources are preserved and attributed. If you feel there are better ways to do this, please let us know.

csvplus - csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins

  •    Go

Package csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream-processing operations, indices, and joins. The library is primarily designed for ETL-like processes. It is most useful where the advanced searching and joining capabilities of a fully featured SQL database are not required, but the data transformations still involve SQL-like operations.
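For orientation, the sketch below shows the kind of index-and-join work the package targets, written against only the standard encoding/csv package (eager rather than lazy, and not csvplus's fluent API); the file names and columns are invented.

```go
// Standard-library sketch of an index-and-join step; csvplus wraps this kind
// of work in a lazy, fluent API. Files and columns here are made up.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func readAll(path string) [][]string {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	return rows
}

func main() {
	// orders.csv: order_id,customer_id,total ; customers.csv: customer_id,name
	orders := readAll("orders.csv")
	customers := readAll("customers.csv")

	// Build an index on customer_id, then join each order against it.
	names := map[string]string{}
	for _, c := range customers[1:] { // skip header row
		names[c[0]] = c[1]
	}
	for _, o := range orders[1:] {
		fmt.Printf("order %s for %s: %s\n", o[0], names[o[1]], o[2])
	}
}
```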

waterdrop - An easy-to-use, scalable big data processing tool

  •    Scala



Transformalize - Configurable Extract, Transform, and Load

  •    CSharp

Transformalize automates moving data into data warehouses, search engines, and other value-adding systems. This section introduces <connections/>, <entities/>, and the tfl.exe command line interface.

metorikku - A simplified, lightweight ETL Framework based on Apache Spark

  •    Scala

Metorikku is a library that simplifies writing and executing ETLs on top of Apache Spark. A user writes a simple YAML configuration file that includes SQL queries and runs Metorikku on a Spark cluster. The platform also includes a way to write tests for metrics using MetorikkuTester. To run Metorikku, you must first define two configuration files.

Hydrograph - A visual ETL development and debugging tool for big data

  •    Java

Hydrograph is a powerful ETL tool that allows developers to create complex graphs using a simple drag-and-drop interface. Users build ETL graphs by using the Hydrograph UI to link together input, transformation, and output components. Users can customize a variety of pre-built components or contribute back to Hydrograph by developing additional inputs, outputs, and transformations. To execute ETL jobs, Hydrograph leverages Apache Spark as the backend engine, which allows it to handle a variety of workload sizes and provides a flexible deployment model.

We welcome your interest in Capital One's Open Source Projects (the "Project"). Any contributor to the Project must accept and sign a CLA indicating agreement to the license terms. Except for the license granted in this CLA to Capital One and to recipients of software distributed by Capital One, you reserve all right, title, and interest in and to your contributions; this CLA does not impact your rights to use your own contributions for any other purpose.

dig-etl-engine - Download DIG to run on your laptop or server.

  •    

myDIG is a tool for building pipelines that crawl the web, extract information, build a knowledge graph (KG) from the extractions, and provide an easy-to-use interface to query the KG. The project web page is DIG. You can install myDIG on a laptop or server and use it to build a domain-specific search application for any corpus of web pages, CSV, JSON, and a variety of other files.

ChoETL - ETL Framework for .NET

  •    CSharp

Simple, intuitive extract, transform, and load (ETL) library for .NET. Extremely fast, flexible, and easy to use. Cinchoo ETL is a code-based ETL framework for extracting data from multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment. You can have data in your data warehouse in no time.

hotsub - [Project renamed from awsub!] Command-line tool to run batch jobs with an ETL framework on AWS or other cloud computing resources

  •    Go

Check the releases at https://github.com/otiai10/awsub/releases and choose the binary for your OS.