ISO-3166-Countries-with-Regional-Codes - ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

  •    Ruby

These lists are the result of merging data from two sources: the Wikipedia ISO 3166-1 article, for alpha and numeric country codes, and the UN Statistics site, for countries' regional and sub-regional codes. In addition to countries, the lists include dependent territories. The International Organization for Standardization (ISO) site provides partial data (capitalised and sometimes stripped of non-Latin ornamentation), but sells the complete data set as a Microsoft Access 2003 database. Other sites give you the numeric and character codes, but there appeared to be no sites that included the associated UN-maintained regional codes in their data sets. I scraped the data, all of which is already publicly available, from the two websites above to produce ready-to-use, complete data sets that will hopefully save time for anyone with similar needs.
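As a rough illustration of how the merged country and regional codes might be used, the sketch below groups alpha-2 codes by region. The inline three-row sample and its column names (`alpha-2`, `region`, and so on) are assumptions made for this example; check the header row of the actual CSV file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-row excerpt. The column names are assumptions for
# illustration; the real file's header row is the authority.
SAMPLE_CSV = """name,alpha-2,alpha-3,country-code,region,sub-region
France,FR,FRA,250,Europe,Western Europe
Japan,JP,JPN,392,Asia,Eastern Asia
Germany,DE,DEU,276,Europe,Western Europe
"""


def group_by_region(csv_text):
    """Map each region name to the list of alpha-2 codes it contains."""
    regions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        regions[row["region"]].append(row["alpha-2"])
    return dict(regions)


print(group_by_region(SAMPLE_CSV))
```

To run this against the real data, pass the contents of the downloaded CSV file to `group_by_region` in place of the sample string.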

awesome-json-datasets - A curated list of awesome JSON datasets that don't require authentication.

  •    Javascript

A curated list of awesome JSON datasets that don't require authentication. Pro Tip: Check out Blockchain Data API for more options.

browser-compat-data - This repository contains compatibility data for Web technologies as displayed on MDN

  •    Javascript

This repository contains compatibility data for Web technologies. Browser compatibility data describes which platforms (where "platforms" are usually, but not always, web browsers) support particular Web APIs. This data can be used in documentation, to build compatibility tables listing browser support for APIs. For example: Browser support for WebExtension APIs.

quickdraw-dataset - Documentation on how to access and use the Quick, Draw! Dataset.


The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. You can browse the recognized drawings on quickdraw.withgoogle.com/data. We're sharing them here for developers, researchers, and artists to explore, study, and learn from. If you create something with this dataset, please let us know by e-mail or at A.I. Experiments.


  •    Python

This is the official documentation for Quandl's Python package. The package can be used to interact with the latest version of the Quandl RESTful API. It is compatible with Python v2.7.x and v3.x+. Setting quandl.ApiConfig.api_version is optional; however, it is strongly recommended to avoid issues with rate-limiting. For premium databases, datasets and datatables, quandl.ApiConfig.api_key must be set to identify you to our API. Please see the API Documentation for more detail.

fashion-mnist - An MNIST-like fashion product database and benchmark.

  •    Python

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
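To make the "28x28 grayscale image" structure concrete, here is a minimal sketch that parses image data laid out in the IDX format used by the original MNIST files. It assumes the Fashion-MNIST image files follow that same layout (a 16-byte big-endian header followed by one unsigned byte per pixel). The function name and the synthetic input are hypothetical stand-ins, not part of the dataset's tooling.

```python
import struct

IDX_IMAGE_MAGIC = 0x00000803  # unsigned-byte data, three dimensions


def parse_idx_images(raw):
    """Parse IDX-formatted image bytes into a list of row-major pixel lists."""
    # Header: magic number, image count, rows, columns (big-endian uint32).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError("not an IDX image file")
    size = rows * cols
    pixels = raw[16:]
    if len(pixels) != count * size:
        raise ValueError("unexpected payload length")
    return [list(pixels[i * size:(i + 1) * size]) for i in range(count)]


# Synthetic stand-in for a real file: two blank 28x28 images.
fake_file = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 28, 28) + bytes(2 * 28 * 28)
images = parse_idx_images(fake_file)
print(len(images), len(images[0]))
```

Each parsed image is a flat list of 784 intensity values (0 to 255). A real file would typically be decompressed first and its bytes passed in place of `fake_file`.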

faker - Faker is a Python package that generates fake data for you.

  •    Python

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence layer to stress-test it, or anonymize data taken from a production service, Faker is for you. Faker is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.

dataset - JavaScript library that makes managing the data behind client-side visualisations easy

  •    Javascript

Dataset is a JavaScript library that makes managing the data behind client-side visualisations easy, including realtime data. It takes care of the loading, parsing, sorting, filtering and querying of datasets as well as the creation of derivative datasets. The following builds do not have any of the dependencies built in. It is your own responsibility to include them as appropriate script elements in your page.

Mobius - C# and F# language binding and extensions to Apache Spark

  •    CSharp

Mobius provides a C# language binding to Apache Spark, enabling the implementation of Spark driver programs and data processing operations in languages supported by the .NET framework, such as C# or F#. For more code samples, refer to the Mobius\examples directory or the Mobius\csharp\Samples directory.

pandas-datareader - Extract data from a wide range of Internet sources into a pandas DataFrame.

  •    HTML

Up-to-date remote data access for pandas that works with multiple versions of pandas. As of v0.6.0, Yahoo!, Google Options, Google Quotes, and EDGAR have been immediately deprecated due to large changes in their APIs and the lack of a stable replacement.

keen-js - Keen.io JavaScript SDKs

  •    Javascript

If you haven't done so already, log in to Keen to create a project. The Project ID and API Keys are available on the Access page of the Project Console. You will need these for the next steps. What is an event? An event is a record of something important happening in the life of your app or service, such as a click, a purchase, or a device activation.

caffenet-benchmark - Evaluation of the CNN design choices performance on ImageNet-2012.

  •    Jupyter

Welcome to the evaluation of CNN design choices' performance on ImageNet-2012. Here you can find the prototxt files of the tested nets and full training logs. Update 2: Some of the pretrained models are in the Releases section. They are licensed for unrestricted use.

fma - FMA: A Dataset For Music Analysis

  •    Jupyter

By Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson (EPFL LTS2). The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads.

PyTorch-NLP - Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers)

  •    Python

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community and add datasets and neural network layers! Chat with us on Gitter and join the Google Group; we're eager to collaborate with you.

Xml To Csv Conversion Tool


This project contains an API that you can use to convert data stored in XML to comma-separated values (CSV). A Windows Forms client application is also included. It is written in C# 4.0.
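The general idea of an XML-to-CSV conversion can be sketched as follows. This is a Python illustration using only the standard library, not the project's C# API. The sample document, the `xml_to_csv` helper, and the choice to take column names from the first record's child elements are all assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: one flat record per <person> element.
SAMPLE_XML = """<people>
  <person><name>Ada</name><city>London</city></person>
  <person><name>Linus</name><city>Helsinki</city></person>
</people>"""


def xml_to_csv(xml_text, record_tag):
    """Flatten each <record_tag> child of the root into one CSV row."""
    root = ET.fromstring(xml_text)
    records = root.findall(record_tag)
    # Assumption: the first record's child tags define the columns.
    fields = [child.tag for child in records[0]] if records else []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Missing child elements become empty cells.
        writer.writerow({f: rec.findtext(f, default="") for f in fields})
    return out.getvalue()


print(xml_to_csv(SAMPLE_XML, "person"))
```

This sketch only handles flat records. Nested elements or attributes would need an explicit flattening rule before they could be written as columns.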


  •    DotNet

DataFromFile is a small class (written in C#) which makes it easy to deal with data files such as Excel (xls or xlsx) or CSV.