incubator-doris - Palo, an MPP data warehouse

  •    C++

Palo is an MPP-based interactive SQL data warehouse for reporting and analysis. It mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed as a simple, single, tightly coupled system that does not depend on other systems. It provides both high-concurrency, low-latency point queries and high-throughput ad-hoc analytical queries, and it supports both batch data loading and near-real-time mini-batch loading. Palo also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying, and using) and the ability to meet many data serving requirements in a single system. At Baidu, the largest Chinese search engine, we run a two-tiered data warehousing system for data processing, reporting, and analysis. Similar to the lambda architecture, the whole data warehouse comprises data processing and data serving. Data processing does the heavy lifting of big data: cleaning, merging, transforming, and analyzing data and preparing it for end-user queries. Data serving answers queries against that data for different use cases. Currently, data processing relies on batch and stream processing technologies such as Hadoop, Spark, and Storm; Palo is the SQL data warehouse that serves online and interactive reporting and analysis queries.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust, relational, distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Cascalog - Data processing on Hadoop

  •    Clojure

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

AsterixDB - Big Data Management System (BDMS)

  •    Java

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. It is a highly scalable data management system that can store, index, and manage semi-structured data, but it also supports a full-power query language with the expressiveness of SQL (and more).

VXQuery - Query XML Data

  •    Java

Apache VXQuery will be a standards-compliant XML Query processor implemented in Java. The focus is on evaluating queries over large amounts of XML data, specifically over large collections of relatively small XML documents. To achieve this, queries will be evaluated on a cluster of shared-nothing machines.

SQL Parallel Boost


Compared to the single-thread approach of SQL Server itself, SQL Parallel Boost facilitates the parallel execution of any data modification operations (UPDATE, INSERT, DELETE) - making best use of all available CPU resources. This results in performance gains of up to factor...

MDX Parser, Builder, DOM and OLAP visual controls with Writeback for Silverlight

  •    CSharp

It is a component library for OLAP, .NET & Silverlight (C#).
* MDX DOM, Parser, Generator, Query Designer
* Description of supported MDX Syntax
* Dynamic Pivot Grid - Pivot Table with Writeback
* OLAP metadata choice controls
See also: http://code.google.com/p/ranet-uilibrary-olap/

IIS Log reader


The IIS Log reader library reads IIS log data into an intuitive domain model. It is useful in ETL/SSIS applications and offers a simple, intuitive API.
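To make the idea of reading log lines into a domain model concrete, here is a minimal Python sketch. It is not the library's API (the library targets .NET); it assumes the logs use the W3C extended format, in which a `#Fields:` directive names the columns, and the `LogEntry` class and `parse_w3c_log` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """Hypothetical domain object for one logged request."""
    date: str
    time: str
    method: str
    uri: str
    status: int


def parse_w3c_log(lines):
    """Parse W3C extended log lines into LogEntry objects.

    Assumes the #Fields directive lists at least: date, time,
    cs-method, cs-uri-stem and sc-status.
    """
    fields = []
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive defines the column order for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            # Other directives (#Software, #Version, #Date) carry no request data.
            continue
        row = dict(zip(fields, line.split()))
        entries.append(
            LogEntry(
                date=row["date"],
                time=row["time"],
                method=row["cs-method"],
                uri=row["cs-uri-stem"],
                status=int(row["sc-status"]),
            )
        )
    return entries
```

An ETL step could then filter or aggregate the returned objects, for example counting entries whose `status` is 500 or higher, before loading them into a warehouse table.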



A SQL Server 2005 Dynamic Management View Performance Data Warehouse

Learning Management Infrastructure (LMI)


The Learning Management Infrastructure is a set of open source tools that enables school districts to better deal with data storage, data integration, reporting, managing student achievement, communicating with stakeholders, and coordinating curriculum.

xmla4js - Javascript interface for XML for Analysis

  •    Javascript

Xmla4js is a standalone JavaScript library that provides basic XML for Analysis (XML/A) capabilities, allowing JavaScript developers to access data and metadata from OLAP providers for use in rich (web) applications. XML/A is an industry-standard protocol for communicating with OLAP servers over HTTP. It defines a SOAP web service that allows clients to obtain metadata and to execute MDX (multi-dimensional expressions) queries. XML is used as the data exchange format.
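As a rough illustration of what travels over the wire, the following Python sketch builds the SOAP envelope for an XML/A `Execute` request that carries an MDX statement. It does not use or mirror the Xmla4js API; the function name `build_execute_envelope` and the example catalog and query are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_envelope(mdx: str, catalog: str) -> str:
    """Return a SOAP envelope wrapping an XML/A Execute request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")

    # The MDX text travels inside Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    statement = ET.SubElement(command, f"{{{XMLA_NS}}}Statement")
    statement.text = mdx

    # Properties name the catalog to query and the shape of the result.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Format").text = "Multidimensional"

    return ET.tostring(envelope, encoding="unicode")


# Hypothetical catalog and cube names, for illustration only.
request_xml = build_execute_envelope(
    "SELECT [Measures].MEMBERS ON COLUMNS FROM [Sales]", "SalesCatalog"
)
```

A client would POST this document to the server's XML/A endpoint with a `text/xml` content type and parse the XML response; the endpoint URL depends on the OLAP server and is not shown here.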

transmart-core - Core components and documentation of the tranSMART platform

  •    Groovy

This is the repository containing the core components and documentation of the tranSMART platform, an open source data sharing and analytics platform for translational biomedical research. tranSMART is maintained by the tranSMART Foundation. Official releases can be found on the tranSMART Foundation website, and the tranSMART Foundation's development repositories can be found at https://github.com/transmart/. All the instructions on how to install, build and run a private instance of tranSMART, get set up for developing or upgrade to the latest version of tranSMART from an older version are available in the documentation. For details on contributing code changes via pull requests, see the Contributing document.

ixmp - The ix modeling platform for integrated and cross-cutting scenario analysis

  •    Python

The ix modeling platform (ixmp) is a data warehouse for high-powered scenario analysis, with interfaces to Python and R for efficient scientific workflows and effective data pre- and post-processing, and a structured database backend for version-controlled data management. This repository contains the core and application programming interfaces (API) for the ix modeling platform (ixmp), as well as a number of tutorials and examples for a generic model instance based on Dantzig's transport problem.

OLAP-cube - is a hypercube of data

  •    Javascript

An OLAP cube is a multidimensional array of data you can explore and analyze. Here you will find an engine that could feed a graphic viewer. Attribute structure holds necessary information to clone a table excluding its data.
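The notion of a cube as a multidimensional array can be sketched in a few lines of Python. This is a generic illustration, not the OLAP-cube package's API (which is JavaScript); the dimension names, the fact rows, and the helpers `roll_up` and `slice_cube` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: coordinates along (year, region, product) plus a sales measure.
DIMENSIONS = ("year", "region", "product")
FACTS = [
    (("2023", "EU", "widget"), 10),
    (("2023", "US", "widget"), 20),
    (("2024", "EU", "gadget"), 5),
    (("2024", "EU", "widget"), 7),
]


def roll_up(facts, dimensions, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    indexes = [dimensions.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, measure in facts:
        totals[tuple(coords[i] for i in indexes)] += measure
    return dict(totals)


def slice_cube(facts, dimensions, name, value):
    """Keep only the cells whose coordinate on dimension `name` equals `value`."""
    i = dimensions.index(name)
    return [(coords, measure) for coords, measure in facts if coords[i] == value]


sales_by_year = roll_up(FACTS, DIMENSIONS, ("year",))
eu_only = slice_cube(FACTS, DIMENSIONS, "region", "EU")
```

Here `roll_up` collapses the cube onto the chosen dimensions and `slice_cube` fixes one coordinate; a viewer would typically combine the two to drill into the data.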

Transformalize - Configurable Extract, Transform, and Load

  •    CSharp

Transformalize automates moving data into data warehouses, search engines, and other value-adding systems. This section introduces <connections/>, <entities/>, and the tfl.exe command line interface.

domainmod - DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location

  •    PHP

DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data. Currently the Data Warehouse only supports web servers running WHM/cPanel. Not sure if DomainMOD is what you're looking for? Don't want to take the time to install it only to find out that it's not? We hate when that happens ourselves, which is why we've set up a live demo so you don't waste your time.


  •    CSharp

AppDynamics provides a rich source of information about your monitored applications, including the performance of individual business activities, dependency flow between application components, and details on every business transaction in an instrumented environment. AppDynamics APM provides a rich toolkit for turning the vast corpus of data captured by AppDynamics into valuable insights.

bigquery-utils - Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery

  •    TSQL

BigQuery is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in. This repository provides useful utilities to assist you in migration and usage of BigQuery. All UDFs within this repository will be automatically created under the bqutil project under publicly shared datasets. Queries can then reference the shared UDFs via bqutil.<dataset>.<function>().
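The `bqutil.<dataset>.<function>()` naming pattern can be illustrated with a small Python helper that assembles such a call for use in a query string. The helper `shared_udf_call` is hypothetical, and the dataset `fn` and function `int` used below are placeholders; check the repository for the UDFs that actually exist.

```python
import re

# Conservative check for unquoted identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def shared_udf_call(dataset: str, function: str, *sql_args: str) -> str:
    """Build a call to a shared UDF as bqutil.<dataset>.<function>(<args>)."""
    for part in (dataset, function):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"not a simple identifier: {part!r}")
    return f"bqutil.{dataset}.{function}({', '.join(sql_args)})"


# Placeholder dataset and function names, for illustration only.
query = f"SELECT {shared_udf_call('fn', 'int', '1.684')} AS result"
```

The arguments are passed as SQL text, so callers remain responsible for quoting literals or using query parameters.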