CloverETL - Rapid Data Integration


A Java-based data integration framework for transforming, mapping, and manipulating data in various formats (CSV, FIXLEN, XML, XBASE, COBOL, LOTUS, etc.). It can be used standalone or embedded as a library, and it connects to RDBMS/JMS/SOAP/LDAP/S3/HTTP/FTP/ZIP/TAR.



Related Projects

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

gramene-etl - Tools for extraction, transform, and load (ETL) of Gramene data

Tools for extraction, transform, and load (ETL) of Gramene data

sparrow-etl - A lightweight Java ETL for Data Extraction/Transformation/Loading

A lightweight Java ETL for Data Extraction/Transformation/Loading

ruby-data-fu - Ideas for a Ruby Data-Fu presentation, command-line data processing, ETL, etc.

Ideas for a Ruby Data-Fu presentation, command-line data processing, ETL, etc.

Apache Storm - Distributed and fault-tolerant realtime computation

Storm is a distributed real time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real time processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

SiteWhere - The Open Platform for Internet of Things (IoT)

SiteWhere is an open source platform for capturing, storing, integrating, and analyzing data from IoT devices. SiteWhere is a multi-tenant, application enablement platform for the Internet of Things (IoT) providing device management, complex event processing (CEP) and integration through a modern, scalable architecture. SiteWhere provides REST APIs for all system functionality.

SSISTProbe - Data Integration & Extraction Testing

In today's corporate world, data is spread across multiple data sources (e.g., DB2, Oracle, Sybase) and business users wish to generate various types of business reports from a single source, no matter how the data is distributed. This scenario calls for data integration from multip...

logstash - Logstash - transport and process your logs, events, or other data

Logstash is part of the Elastic Stack along with Beats, Elasticsearch and Kibana. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash." (Ours is Elasticsearch, naturally.) Logstash has over 200 plugins, and you can write your own very easily as well. The license is Apache 2.0, meaning you are pretty much free to use it however you want.

RapidMiner -- Data Mining, ETL, OLAP, BI

No. 1 in Business Analytics: Data Mining, Predictive Analytics, ETL, Reporting, and Dashboards in one tool. 1000+ methods: data mining, business intelligence, ETL, data analysis + Weka + R, forecasting, visualization.

mwsoft64 - Wilson lab software for data extraction and processing

Wilson lab software for data extraction and processing

Apache Tajo - A big data warehouse system on Hadoop

Apache Tajo is a robust, relational, distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Palo ETL Server

Palo ETL Server is a Java-based tool for extraction, transformation, and loading of mass data into the Palo OLAP Server. Palo ETL Server is one part of the Palo Suite.

heroku-buildpack-elt - Heroku Buildpack for ELT jobs

Most data warehousing is done using an ETL pattern. ETL stands for Extract, Transform, and Load and usually refers to extracting data, modifying it "in flight" (either on a file system or in memory), and loading that data into a different data store. This pattern is mostly useful for converting incompatible data formats, such as CSV and XML, into something more structured, such as SQL.
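The Extract, Transform, and Load steps described above can be sketched with the Python standard library alone. This is only an illustration, not code from the buildpack: the function name run_etl, the payments table, and the name/amount columns are all hypothetical.

```python
import csv
import io
import sqlite3


def run_etl(csv_text, conn):
    """Hypothetical ETL sketch: CSV text in, rows in a SQL table out."""
    # Extract: parse the CSV text into dictionaries keyed by column header.
    rows = csv.DictReader(io.StringIO(csv_text))

    # Transform ("in flight", in memory): tidy names and cast amounts to int.
    cleaned = [(row["name"].strip().title(), int(row["amount"])) for row in rows]

    # Load: write the transformed rows into a different data store (SQLite).
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded = run_etl("name,amount\n alice ,10\nBOB,25\n", connection)
    print(loaded, connection.execute("SELECT * FROM payments").fetchall())
```

An ELT job, as in the buildpack's name, would instead load the raw rows first and run the transformation inside the target database.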

data-connectors-api-examples - A set of code snippets for calling the Data Connections API

* This API uses a WSSE authentication header on every call; each example class has a method called getWSSEHeader for this purpose
* The code examples cannot be run without Partner API credentials (a Username and Secret). These must be obtained through an Adobe Partner Integration Manager after appropriate agreements are in place.
* Each example passes JSON encoded data as a String and receives JSON encoded data as a String
* parsing the JSON data is left as an exercise for the developer so you
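As a rough idea of what a method like getWSSEHeader might compute, here is a sketch of the generic WSSE UsernameToken scheme, where the password digest is Base64(SHA-1(nonce + created + secret)). The source does not state the exact encoding this API expects, so the function name build_wsse_header, its parameters, and the header layout are assumptions, not the repository's implementation.

```python
import base64
import hashlib
import os
from datetime import datetime, timezone


def build_wsse_header(username, secret, nonce=None, created=None):
    """Build a generic WSSE UsernameToken header value (assumed scheme)."""
    # Use a random nonce and the current UTC time unless fixed values are given.
    nonce = nonce if nonce is not None else os.urandom(16)
    created = created or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # PasswordDigest = Base64(SHA-1(nonce + created + secret))
    sha1 = hashlib.sha1(nonce + created.encode("utf-8") + secret.encode("utf-8"))
    digest = base64.b64encode(sha1.digest()).decode("ascii")
    nonce_b64 = base64.b64encode(nonce).decode("ascii")

    return (
        f'UsernameToken Username="{username}", '
        f'PasswordDigest="{digest}", '
        f'Nonce="{nonce_b64}", '
        f'Created="{created}"'
    )


if __name__ == "__main__":
    # Placeholder credentials; real ones come from the partner program.
    print(build_wsse_header("example-user", "example-secret"))
```

The resulting string is typically sent in an X-WSSE request header; check the API's own documentation for the header name and nonce encoding it requires.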

Gate - General Architecture for Text Engineering

GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks, with components such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model, and persist the data structure. It can analyze text or speech. It has built-in support for machine learning and also supports other machine learning implementations via plugins.

restful-etl - A promising asynchronous, non-blocking data integration library

A promising asynchronous, non-blocking data integration library


Brute force your OpenERP data integration with OOOR inside the Kettle ETL (aka Pentaho Data Integration - PDI)

Conjecture - Scalable Machine Learning in Scalding

Conjecture is a framework for building machine learning models in Hadoop using the Scalding DSL. The goal of this project is to enable the development of statistical models as viable components in a wide range of product settings. Applications include classification and categorization, recommender systems, ranking, filtering, and regression (predicting real-valued numbers). Conjecture has been designed with a primary emphasis on flexibility and can handle a wide variety of inputs. Integration with Hadoop and Scalding enables seamless handling of extremely large data volumes, as well as integration with established ETL processes. Predicted labels can either be consumed directly by the web stack using the dataset loader, or models can be deployed and consumed by live web code. Currently, binary classification (assigning one of two possible labels to input data points) is the most mature component of the Conjecture package. There are a few stages involved in training a machine learning model using Conjecture.

Open-XML-SDK - Open XML SDK by Microsoft Open Technologies, Inc.

The Open XML SDK provides open-source libraries for working with Open XML Documents (DOCX, XLSX, and PPTX). It supports scenarios such as:
- High-performance generation of word-processing documents, spreadsheets, and presentations
- Document modification, such as removing tracked revisions or removing unacceptable content from documents
- Data and content querying and extraction, such as transformation from DOCX to HTML, or extraction of data from spreadsheets