VXQuery - Query XML Data


Apache VXQuery will be a standards-compliant XML Query processor implemented in Java. The focus is on the evaluation of queries on large amounts of XML data. Specifically, the goal is to evaluate queries on large collections of relatively small XML documents. To achieve this, queries will be evaluated on a cluster of shared-nothing machines. However, there are no XQuery processors available today that are capable of processing these datasets in parallel and making the contained information accessible.
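The idea of running one query over many small XML documents in parallel can be sketched roughly as follows. This is only an illustration in Python, not VXQuery's API: worker threads stand in for the shared-nothing cluster machines, a plain Python function stands in for an XQuery expression, and the sample documents and the names `hot_stations` and `query_collection` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical sample collection: many small, independent XML documents.
DOCUMENTS = [
    "<reading><station>A</station><temp>21</temp></reading>",
    "<reading><station>B</station><temp>35</temp></reading>",
    "<reading><station>C</station><temp>30</temp></reading>",
]

def hot_stations(xml_text, threshold=25):
    """Evaluate the query against a single document and return its matches."""
    root = ET.fromstring(xml_text)
    temp = int(root.findtext("temp"))
    return [root.findtext("station")] if temp > threshold else []

def query_collection(documents, workers=2):
    """Run the per-document query in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(hot_stations, documents)
    # map() keeps the input order, so the merged result is deterministic.
    return [station for partial in partials for station in partial]

print(query_collection(DOCUMENTS))
```

Because each document is small and independent, no worker needs data held by another, which is the property a shared-nothing cluster relies on.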

http://vxquery.apache.org
https://github.com/apache/vxquery





Related Projects

Bagri - XML/Document DB on top of distributed cache

  •    Java

Bagri is a document database built on top of a distributed cache solution such as Hazelcast or Coherence. The system processes semi-structured, schema-less documents and performs distributed queries on them in real time. It scales horizontally very well through data sharding, in which all documents are distributed evenly across the distributed cache partitions.
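A minimal sketch of the sharding idea, assuming documents are assigned to partitions by hashing a document key. This is a generic illustration in Python and not Bagri's actual partitioning scheme; `partition_for`, `distribute`, and the document keys are hypothetical.

```python
import zlib
from collections import Counter

def partition_for(doc_id: str, partitions: int) -> int:
    """Map a document key to a partition index using a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % partitions

def distribute(doc_ids, partitions):
    """Count how many documents land in each partition."""
    return Counter(partition_for(doc_id, partitions) for doc_id in doc_ids)

# Hypothetical document keys spread over four partitions.
doc_ids = [f"doc-{n}.xml" for n in range(1000)]
print(sorted(distribute(doc_ids, 4).items()))
```

A stable hash sends the same key to the same partition on every node, so any node can locate a document without a central lookup. How even the spread is depends on the hash function and the keys.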

Jailer - Database Subsetting and Browsing Tool

  •    Java

Database Subsetting and Browsing Tool. Exports consistent, referentially intact row-sets from relational databases (JDBC). Removes data w/o violating integrity. Generates DbUnit datasets, hierarchically structured XML and topologically sorted SQL-DML.

Apache Torque - Torque is an object-relational mapper for Java.

  •    Java

Torque is an object-relational mapper for Java.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Apache Xerces for Perl XML Parser - Perl API to the Apache Xerces XML parser.

  •    Perl

Perl API to the Apache Xerces XML parser.


Xml Data Mapper - a simple xml based ORM, database to object, POCO / DTO mapper

  •    ASPNET

The Cached Xml Data Mapper is a simple XML-based ORM. It converts DataReader and DataTable objects to custom DTOs / POCOs / objects and vice versa. Unlike ORMs that are difficult to choose from, complex to understand, and leave a heavy memory footprint, this one is simple and...

ActiveRecord - ActiveRecord pattern for .NET

  •    CSharp

The Castle ActiveRecord project is an implementation of the ActiveRecord pattern for .NET. The ActiveRecord pattern consists of instance properties representing a record in the database, instance methods acting on that specific record, and static methods acting on all records. Castle ActiveRecord is built on top of NHibernate, but its attribute-based mapping frees the developer from writing XML for database-to-object mapping, which is needed when using NHibernate directly.
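The three parts of the pattern named above can be sketched as follows. This is a generic illustration in Python with the standard-library `sqlite3` module, not Castle ActiveRecord's API (which is C# on NHibernate); the `Post` class, its `posts` table, and the method names are hypothetical.

```python
import sqlite3

class Post:
    """Hypothetical record class; each instance mirrors one row of `posts`."""

    # Shared in-memory database, used only for this illustration.
    _db = sqlite3.connect(":memory:")
    _db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")

    def __init__(self, title, id=None):
        # Instance properties represent the columns of one record.
        self.id = id
        self.title = title

    def save(self):
        """Instance method: insert or update this specific record."""
        if self.id is None:
            cur = self._db.execute(
                "INSERT INTO posts (title) VALUES (?)", (self.title,))
            self.id = cur.lastrowid
        else:
            self._db.execute(
                "UPDATE posts SET title = ? WHERE id = ?", (self.title, self.id))
        self._db.commit()

    @staticmethod
    def find_all():
        """Static method: acts on all records of the table."""
        rows = Post._db.execute(
            "SELECT id, title FROM posts ORDER BY id").fetchall()
        return [Post(title, id) for (id, title) in rows]

post = Post("First draft")
post.save()                # inserts a new row
post.title = "Final version"
post.save()                # updates the same row
print([p.title for p in Post.find_all()])
```

The mapping here is written by hand in SQL; the attribute-based mapping described above replaces that step with declarations on the class.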

gpdb - Greenplum Database

  •    C

The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes. The Greenplum project is released under the Apache 2 license. We want to thank all our current community contributors and are really interested in all new potential contributions. For the Greenplum Database community, no contribution is too small; we encourage all types of contributions.

Generic Data Storage Component for .Net


This component lets you easily save data objects, via XML serialization, in generic data storages (e.g. an XML file, a SQL Server database ...).

incubator-doris - Palo, an MPP data warehouse

  •    C++

Palo is an MPP-based interactive SQL data warehouse for reporting and analysis. Palo mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides both high-concurrency, low-latency point queries and high-throughput ad-hoc analysis queries. It supports both batch data loading and near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying, and using) and meeting many data serving requirements in a single system. At Baidu, the largest Chinese search engine, we run a two-tiered data warehousing system for data processing, reporting, and analysis. Similar to the lambda architecture, the whole data warehouse comprises data processing and data serving. Data processing does the heavy lifting of big data: cleaning data, merging and transforming it, analyzing it, and preparing it for use by end-user queries. Data serving is designed to serve queries against that data for different use cases. Currently, data processing includes batch and stream data processing technologies such as Hadoop, Spark, and Storm; Palo is a SQL data warehouse for serving online and interactive data reporting and analysis queries.

Fluent NHibernate - Fluent mapping for model

  •    CSharp

Fluent, XML-less, compile safe, automated, convention-based mappings for NHibernate. Fluent NHibernate offers an alternative to NHibernate's standard XML mapping files. Rather than writing XML documents, you write mappings in strongly typed C# code. This allows for easy refactoring, improved readability and more concise code.

Web-Karma - Information Integration Tool

  •    Java

See the Karma tutorial at https://github.com/szeke/karma-tcdl-tutorial. Also check out our DIG web site, where we use Karma extensively to process more than 90M web pages. Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information by modeling it according to an ontology of their choice using a graphical user interface that automates much of the process. Karma learns to recognize the mapping of data to ontology classes and then uses the ontology to propose a model that ties together these classes. Users then interact with the system to adjust the automatically generated model. During this process, users can transform the data as needed to normalize data expressed in different formats and to restructure it. Once the model is complete, users can publish the integrated data as RDF or store it in a database.

ServingXML

  •    Java

ServingXML is an open source, Apache 2.0 licensed framework for flat/XML data transformations. It defines an extensible markup vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML processing in pipelines.

Flat Database


A simple project that makes it quick and easy to implement a flat-file database for an application.

Transaction Processing over XML (TPoX)

  •    Java

TPoX is an XML database benchmark based on a financial application scenario. It is used to evaluate the performance of XML database systems, focusing on XQuery, SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, and other aspects.

database to xml


db2xml is a Java 2 Swing application that imports one or all tables from any type of database (local or online) into local XML data files (one XML file per table). All you need is a native driver (an ODBC driver is already included in the JRE).

Apache Axiom - XML object model supporting deferred parsing

  •    Java

Apache Axiom is an XML object model supporting deferred parsing.

Xml to database converter


XML2DB is a simple tool that can be used to convert an XML Schema to an RDBMS table structure and import the corresponding XML file into a database.

gawk libraries for XML, PostgreSQL,...

  •    Awk

Dynamically loaded extension libraries for GNU AWK

Lutece - Portal Management in Java

  •    Java

Lutece is a portal engine which allows you to easily create your websites or intranets based upon HTML, XML content. Lutece provides a user friendly interface for portal management and therefore no specific technical skills are required. The data presented in each page of the portal coexists inside blocks called portlets. The separation of content and view is possible by opting for XML as the exchange format. The XML files consist of structured data, extracted from the database.