CloverETL - Rapid Data Integration

  •    Java

A Java-based data integration framework that can transform, map, and manipulate data in various formats (CSV, FIXLEN, XML, XBASE, COBOL, LOTUS, etc.). It can be used standalone or embedded as a library, and it connects to RDBMS/JMS/SOAP/LDAP/S3/HTTP/FTP/ZIP/TAR.

BIRT

  •    Java

BIRT is an Eclipse-based open source reporting system for web applications, especially those based on Java and J2EE. BIRT has two main components: a report designer based on Eclipse, and a runtime component that you can add to your app server. BIRT also offers a charting engine that lets you add charts to your own application.

kiba - Data processing & ETL framework for Ruby

  •    Ruby

If you need help, please ask your question with the tag kiba-etl on StackOverflow so that others can benefit from your contribution! I monitor this specific tag and will reply to you. Writing reliable, concise, well-tested & maintainable data-processing code is tricky.

data-integration - A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  •    Python

Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code. PostgreSQL as a data processing engine.

DataSphereStudio - DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling

  •    Java

DataSphere Studio (DSS for short) is a one-stop data application development and management portal, self-developed by WeBank as part of its WeDataSphere big data platform. Built on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.

SpagoBI - Business Intelligence Suite

  •    Java

SpagoBI is the only entirely open source Business Intelligence suite. It covers all the analytical areas of Business Intelligence projects, with innovative themes and engines. SpagoBI offers a wide range of entirely open source analytical tools like Reporting, OLAP, Chart, Data mining, Real-time monitoring console, ETL.

Pentaho

  •    Java

Pentaho is the open source business intelligence leader. Thousands of organizations globally depend on Pentaho to make faster and better business decisions that positively impact their bottom lines. Download the Pentaho BI Suite today if you want to speed your BI development, deploy on-premise or in the cloud or cut BI licensing costs by up to 90%.

Transporter - Sync data between persistence engines, like ETL only not stodgy

  •    Go

Compose Transporter helps with database transformations from one store to another. It can also sync from one store to one or several others. Transporter allows the user to configure a number of data adaptors as sources or sinks. These can be databases, files or other resources. Data is read from the sources, converted into a message format, and then sent down to the sink, where the message is converted into a writable format for its destination. The user can also create data transformations in JavaScript, which sit between the source and sink and manipulate or filter the message flow.


Kylin - Extreme OLAP Engine for Big Data

  •    Java

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for more than 10 billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust, relational and distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Scriptella - ETL (Extract-Transform-Load) and Script Execution Tool

Scriptella is an ETL (Extract-Transform-Load) and script execution tool. Its primary focus is simplicity. It doesn't require the user to learn another complex XML-based language to use it, but allows the use of SQL or another scripting language suitable for the data source to perform required transformations.

LSC engine - LDAP Synchronization Connector

  •    Java

LDAP Synchronization Connector reads from any data source, including databases, LDAP directories or files, then transforms this data and compares it to an LDAP directory. These connectors can then be used to continuously synchronize a data source to a directory, for a one-shot import, or just to compare differences by outputting CSV or LDIF format reports.

MentDB - Mentalese Database Engine

  •    C

The platform provides tools for AI, SOA, ETL, ESB, database, web application, data quality, predictive analytics, chatbot ..., in a revolutionary data language (MQL). The server is based on a new generation of AI algorithm, and on an innovative SOA layer to reach the WWD.

Apache Hudi - Streaming Data Lake Platform

  •    Java

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS or any Hadoop FileSystem compatible storage). Hudi can help your organization build an efficient data lake, solving some of the most complex, low-level storage management problems, while putting data into the hands of your data analysts, engineers and scientists much quicker.

omniparser - omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc

  •    Go

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON. In the example folders above you will find pairs of input files and their schema files. Then in the .snapshots sub directory, you'll find their corresponding output files.

koop - Transform, query, and download geospatial data on the web.

  •    Shell

Koop is a highly extensible JavaScript toolkit for connecting incompatible spatial APIs. Out of the box it exposes a Node.js server that can translate GeoJSON into the Geoservices specification supported by the ArcGIS family of products. Koop can be extended to translate data from any source to any API specification. Don't let API incompatibility get in your way: start using one of Koop's data providers or write your own. Visit the demo at http://koop.dc.esri.com.

WeDataSphere - WeDataSphere is a financial level one-stop open-source suitcase for big data platforms

WeDataSphere includes DataSphere Studio, Linkis, Scriptis, Qualitis, Schedulis, and Exchangis. DataSphere Studio is positioned as a data application development portal whose closed loop covers the entire process of data application development. With a unified UI, its workflow-like graphical drag-and-drop development experience covers the entire lifecycle of data application development: data import, desensitization and cleansing, data analysis, data mining, quality inspection, visualization, scheduling, and data output applications.

Zingg - Scalable fuzzy matching for data mastering, deduplication and entity resolution

  •    Java

Zingg is a scalable fuzzy matching tool for data mastering, deduplication and entity resolution. Real-world data contains multiple records belonging to the same customer. These records can be in a single system or in multiple systems, and they have variations across fields, which makes it hard to combine them, especially with growing data volumes. Zingg integrates different records of an entity such as a customer, patient, supplier, or product, in the same or disparate data sources.

SQL Parallel Boost

Compared to the single-thread approach of SQL Server itself, SQL Parallel Boost facilitates the parallel execution of any data modification operations (UPDATE, INSERT, DELETE) - making best use of all available CPU resources. This results in performance gains of up to factor...

SvcPerf - E2E ETW trace analysis tool

End-to-End ETW trace viewer for manifest based traces.