CloverETL - Rapid Data Integration


A Java-based data integration framework for transforming, mapping, and manipulating data in various formats (CSV, FIXLEN, XML, XBASE, COBOL, LOTUS, etc.). It can be used standalone or embedded as a library, and connects to RDBMS/JMS/SOAP/LDAP/S3/HTTP/FTP/ZIP/TAR.

BIRT


BIRT is an Eclipse-based open source reporting system for web applications, especially those based on Java and J2EE. BIRT has two main components: a report designer based on Eclipse, and a runtime component that you can add to your app server. BIRT also offers a charting engine that lets you add charts to your own application.

Apache Tajo - A big data warehouse system on Hadoop


Apache Tajo is a robust relational and distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Kylin - Extreme OLAP Engine for Big Data


Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets, originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for more than 10 billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.




SpagoBI - Business Intelligence Suite


SpagoBI is the only entirely open source Business Intelligence suite. It covers all the analytical areas of Business Intelligence projects, with innovative themes and engines. SpagoBI offers a wide range of entirely open source analytical tools like Reporting, OLAP, Chart, Data mining, Real-time monitoring console, ETL.

Pentaho


Pentaho is the open source business intelligence leader. Thousands of organizations globally depend on Pentaho to make faster and better business decisions that positively impact their bottom lines. Download the Pentaho BI Suite today if you want to speed your BI development, deploy on-premise or in the cloud or cut BI licensing costs by up to 90%.

SQL Parallel Boost


Compared to the single-thread approach of SQL Server itself, SQL Parallel Boost facilitates the parallel execution of any data modification operations (UPDATE, INSERT, DELETE) - making best use of all available CPU resources. This results in performance gains of up to factor...

SvcPerf - E2E ETW trace analysis tool


End-to-end ETW trace viewer for manifest-based traces.



SSIS Dimension Merge SCD Component


A custom Data Flow component for SQL Server Integration Services (SSIS) that replaces the standard SCD Wizard with a superior experience, from the configuration UI to runtime performance. Performs 100x faster than the standard component, and edits are non-destructive.

GDAL SSIS


GDAL SSIS is a collection of geospatial components for SQL Server Integration Services (SSIS) that leverages GDAL to support a large number of GIS data formats.

Static Analyzer For Integration Services Packages


The parser checks best-practice guidelines in Business Intelligence projects. It can analyze packages created with Visual Studio 2010.

XPerfUI


GUI wrapper for the XPerf performance analysis command-line tool.

Microsoft SQL Server Metadata-Driven ETL Management Studio (MDDE)


Originally an internal MSIT solution that has been released as an open source project, the Microsoft SQL Server Metadata-Driven ETL Management Studio (a.k.a. MDDE) provides a tool for rapidly generating SQL Server Integration Services (SSIS) packages from a shared central metadat

FluentETL - Data automation made easy for coders


Automate data transfers with a few lines of code. Replace SSIS with concise yet powerful .NET code. Much simpler and easier to learn than Rhino ETL, yet flexible enough to use just about any data source.

python_mozetl - ETL jobs for Firefox Telemetry


This repository is a collection of ETL jobs for Firefox Telemetry. Jobs committed to python_mozetl can be scheduled via Airflow or ATMO. We provide a testing suite and code review, which makes your job more maintainable. Centralizing our jobs in one repository allows for code reuse and easier collaboration.

ETW2JSON - Tool and library to convert ETW logs to JSON files


ETW2JSON is a tool that converts ETW Log Files (.ETL) to JSON using the Newtonsoft.Json library. It can be used as a stand-alone command line tool that takes the locations of ETL files and an output path as input, or it can take your implementation of Newtonsoft.Json's JsonWriter class. Converting ETW Log Files (.ETL) to JSON makes accessible a plethora of data that was previously restricted to expert ETW tools or libraries. The goal of this tool is to make ETW data accessible to a larger developer and operations audience by converting it to a ubiquitous, human-readable format.

camus-compressor - Camus Compressor merges files created by Camus and saves them in a compressed format


Camus Compressor merges files created by Camus and saves them in a compressed format. Camus is used heavily at Allegro for dumping more than 200 Kafka topics onto HDFS. The script runs every 15 minutes and creates one file per Kafka partition, which results in about 76800 small files per day. Most of the files do not exceed the Hadoop block size. This is a clear Hadoop antipattern that leads to performance issues, for example an excessive number of mappers when executing SQL queries.
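
The figures in this description can be cross-checked with a little arithmetic. The short sketch below is only an illustration: the variable names are made up for this example, and it assumes the job runs around the clock and that every partition produces a file on every run. Under those assumptions, 76800 files per day corresponds to roughly 800 Kafka partitions across all topics.

```python
# Back-of-the-envelope check of the numbers quoted above.
# Assumptions (not stated in the source): the job runs 24 hours a day
# and every partition writes exactly one file per run.
MINUTES_PER_DAY = 24 * 60
RUN_INTERVAL_MINUTES = 15   # "runs every 15 minutes"
FILES_PER_DAY = 76_800      # "about 76800 small files per day"

runs_per_day = MINUTES_PER_DAY // RUN_INTERVAL_MINUTES
files_per_run = FILES_PER_DAY // runs_per_day  # one file per partition

print(runs_per_day)   # 96
print(files_per_run)  # 800
```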

storagetapper - StorageTapper is a scalable realtime MySQL change data streaming and transformation service


StorageTapper is a scalable realtime MySQL change data streaming and transformation service. The service reads data from MySQL, transforms it into an Avro schema serialized format, and publishes these events to Kafka. Consumers can then consume these events directly from Kafka.