TableQA - AI Tool for querying natural language on tabular data.

  •        111

AI Tool for querying natural language on tabular data.Built using QA models from transformers.


  • Supports detection from multiple csvs (csvs can also be read from Amazon s3)
  • Supports FuzzyString implementation. i.e, incomplete column values in query can be automatically detected and filled in the query.
  • Supports Databases - SQLite, Postgresql, MySQL, Amazon RDS (Postgresql, MySQL).
  • Open-Domain, No training required.
  • Add manual schema for customized experience
  • Auto-generate schemas in case schema not provided
  • Data visualisations.



Related Projects

decaNLP - The Natural Language Decathlon: A Multitask Challenge for NLP

  •    Python

The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling(QA‑SRL), zero-shot relation extraction (QA‑ZRE), goal-oriented dialogue (WOZ, semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple. Similarly, the paper focused on multitask learning as a form of question answering, but this framework can be easily adapted for different approached to single-task or multitask learning.

OmniSciDB - Open Source Analytical Database & SQL Engine

  •    C++

OmniSciDB is the foundation of the OmniSci platform. OmniSciDB is SQL-based, relational, columnar and specifically developed to harness the massive parallelism of modern CPU and GPU hardware. OmniSciDB can query up to billions of rows in milliseconds, and is capable of unprecedented ingestion speeds, making it the ideal SQL engine for the era of big, high-velocity data.

question_generation - Neural question generation using transformers

  •    Jupyter

Question generation is the task of automatically generating questions from a text paragraph. The most straight-forward way for this is answer aware question generation. In answer aware question generation the model is presented with the answer and the passage and asked to generate a question for that answer by considering the passage context. While there are many papers available for QG task, it's still not as mainstream as QA. One of the reasons is most of the earlier papers use complicated models/processing pipelines and have no pre-trained models available. Few recent papers, specifically UniLM and ProphetNet have SOTA pre-trained weights availble for QG but the usage seems quite complicated. This project is aimed as an open source study on question generation with pre-trained transformers (specifically seq-2-seq models) using straight-forward end-to-end methods without much complicated pipelines. The goal is to provide simplified data processing and training scripts and easy to use pipelines for inference.

Haystack - Build a natural language interface for your data

  •    Python

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform Question Answering or semantic document search, you can use the State-of-the-Art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects like Huggingface's Transformers, Elasticsearch, or Milvus.

Apache Hive - The Apache Hive (TM) data warehouse software facilitates querying and managing large d

  •    Java

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage.

Ephyra - Question Answering System

  •    Java

Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.

tapas - End-to-end neural table-text understanding models.

  •    Python

Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training. The easiest way to try out TAPAS with free GPU/TPU is in our Colab, which shows how to do predictions on SQA.

spago - Self-contained Machine Learning and Natural Language Processing library in Go

  •    Go

A Machine Learning library written in pure Go designed to support relevant neural architectures in Natural Language Processing. spaGO is self-contained, in that it uses its own lightweight computational graph framework for both training and inference, easy to understand from start to finish.

Sequence-Semantic-Embedding - Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, and question answering etc

  •    Python

SSE(Sequence Semantic Embedding) is an encoder framework toolkit for natural language processing related tasks. It's implemented in TensorFlow by leveraging TF's convenient deep learning blocks like DNN/CNN/LSTM etc. Depending on each specific task, similar semantic meanings can have different definitions. For example, in the category classification task, similar semantic meanings means that for each correct pair of (listing-title, category), the SSE of listing-title is close to the SSE of corresponding category. While in the information retrieval task, similar semantic meaning means for each relevant pair of (query, document), the SSE of query is close to the SSE of relevant document. While in the question answering task, the SSE of question is close to the SSE of correct answers.


  •    PHP

phpMyAdmin is a free software tool written in PHP intended to handle the administration of MySQL over the World Wide Web. phpMyAdmin supports a wide range of operations with MySQL. The most frequently used operations are supported by the user interface (managing databases, tables, fields, relations, indexes, users, permissions, etc), while you still have the ability to directly execute any SQL statement.

transformers - 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX

  •    Python

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

Sequel Pro - MySQL Database management for Mac

  •    PHP

Sequel Pro is a fast, easy-to-use Mac database management application for working with MySQL databases. It gives you direct access to your MySQL databases on local and remote server. It supports Full table management including indexes and MySQL Views.

Coolstorage - ORM library for .NET

  •    CSharp

The main strength of Vici CoolStorage is the ease of use. Most ORM tools still require a lot of unneeded code to accomplish basic data persistence tasks, but Vici CoolStorage is designed to relieve the programmer from these tedious and error-prone tasks, making it very intuitive to use.

Apache ShardingSphere - Distributed Database Ecosphere

  •    Java

Apache ShardingSphere is an open-source ecosystem consisted of a set of distributed database solutions, including 3 independent products, JDBC, Proxy & Sidecar (Planning). They all provide functions of data scale-out, distributed transaction and distributed governance, applicable in a variety of situations such as Java isomorphism, heterogeneous language and cloud-native.

csv-schema - Analyzes a CSV file and generates database table schema, all within the browser

  •    Javascript

This application parses CSV files (including huge ones) within the browser. It analyzes each field to suggest the best database field type, max length, and whether or not there are any null values. From there, you can rename fields, ignore them, override field types/lengths, etc. and generate database table creation sql for MySQL, MariaDB, Postres, Oracle, or SQLite3. This application uses PapaParse for CSV parsing and Knex.js for SQL query building.

NHibernate - object-relational mapper for .NET

  •    CSharp

NHibernate is a mature, open source object-relational mapper for the .NET framework. NHibernate is a port of Hibernate Core for Java to the .NET Framework. It handles persisting plain .NET objects to and from an underlying relational database.

join-monster - A GraphQL to SQL query execution layer for query planning and batch data fetching.

  •    Javascript

Efficient query planning and data fetching for SQL. Use JOINs and/or batched requests to retrieve all your data. It takes a GraphQL query and your schema and automatically generates the SQL. Send that SQL to your database and get back all the data needed to resolve with only one or a few round-trips to the database. ...and get back correctly hydrated data.

CockroachDB - Cloud-native SQL database.

  •    Go

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters.CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

ClickHouse - Column-oriented database management system that allows generating analytical data reports in real time

  •    C++

ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows to generate analytical reports using SQL queries in real-time. Large queries are parallelized using multiple cores, taking all the necessary resources available on the current server. It supports distributed processing, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user.

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.