
awesome-json-datasets - A curated list of awesome JSON datasets that don't require authentication.

  •    Javascript

A curated list of awesome JSON datasets that don't require authentication. Pro Tip: Check out Blockchain Data API for more options.
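
Because these endpoints need no credentials, a plain HTTP GET is enough to pull one into Python. Below is a minimal standard-library sketch. It assumes the endpoint returns UTF-8 encoded JSON, and `fetch_json` is a hypothetical helper name, not part of the list:

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0):
    """Hypothetical helper: GET a URL without any credentials and decode the body as JSON."""
    # Only an Accept header is sent; no API key or auth token is attached.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        # Assumes the server sends UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```

Call `fetch_json(url)` with any dataset URL from the list. The result is ordinary Python `dict`/`list` objects, ready to load into whatever analysis tool you prefer.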

pix2code - pix2code: Generating Code from a Graphical User Interface Screenshot

  •    Python

Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that deep learning methods can be leveraged to train a model end-to-end to automatically generate code from a single input image with over 77% accuracy for three different platforms (i.e., iOS, Android and web-based technologies). The following software is shared for educational purposes only. The author and their affiliated institution are not responsible in any manner whatsoever for any damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of the use or inability to use this software.

PyDataset - Instant access to many datasets in Python.

  •    Python

Provides instant access to many datasets right from Python (as pandas DataFrames). That's it.

awesome-public-datasets - A topic-centric list of high-quality open datasets in public domains


NOTICE: This repo is automatically generated by apd-core. Please DO NOT modify this file directly. We have provided a new way to contribute to Awesome Public Datasets; the original route of opening pull requests directly on the repo is permanently closed. This is a topic-centric list of high-quality public data sources, collected and tidied from blogs, answers, and user responses. Most of the datasets listed below are free; however, some are not. Other amazingly awesome lists can be found in sindresorhus's awesome list.

DataFrames.jl - Library for working with tabular data in Julia

  •    Julia

Tools for working with tabular data in Julia. Maintenance: DataFrames is maintained collectively by the JuliaData collaborators. Responsiveness to pull requests and issues can vary, depending on the availability of key collaborators.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

rdataretriever - R interface to the Data Retriever

  •    R

R interface to the Data Retriever. The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores it in a local database or CSV files. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

datasette - An instant JSON API for your SQLite databases

  •    Python

Datasette provides an instant, read-only JSON API for any SQLite database. It also provides tools for packaging the database up as a Docker container and deploying that container to hosting providers such as Zeit Now. Datasette requires Python 3.5 or higher.
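
The core idea, turning the result of a SQL query against SQLite into JSON, can be sketched with Python's standard `sqlite3` and `json` modules. This is not Datasette's code or its output format: the `rows_as_json` helper and its `{"columns": ..., "rows": ...}` shape are assumptions for illustration, and SQLite's `PRAGMA query_only` stands in for Datasette's read-only behaviour:

```python
import json
import sqlite3


def rows_as_json(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Hypothetical helper: run one query and serialize its column names and rows as JSON."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    rows = [list(row) for row in cursor.fetchall()]
    return json.dumps({"columns": columns, "rows": rows})


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE birds (name TEXT, count INTEGER)")
    conn.executemany("INSERT INTO birds VALUES (?, ?)", [("wren", 3), ("robin", 5)])
    conn.commit()
    # From here on this connection refuses writes; an INSERT would raise sqlite3.OperationalError.
    conn.execute("PRAGMA query_only = ON")
    print(rows_as_json(conn, "SELECT name, count FROM birds ORDER BY count DESC"))
    # {"columns": ["name", "count"], "rows": [["robin", 5], ["wren", 3]]}
```

A web layer would simply map each request to a query and return this string with a JSON content type; the read-only guard is what makes it safe to expose.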

retriever - Quickly download, clean up, and install ecological datasets into a database management system

  •    Python

Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing and importing publicly available data is time-consuming because many datasets lack machine-readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. The automation of this process reduces the time for a user to get most large datasets up and running by hours, and in some cases days. Precompiled binary installers are also available for Windows, OS X, and Ubuntu/Debian on the releases page. These do not require a Python installation. Download the installer for your operating system and follow the instructions on the download page.
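
The clean-and-import step can be illustrated with a small standard-library sketch that normalizes column names, maps missing-value markers to NULL, and loads a CSV into SQLite. It is not the Data Retriever's implementation or API; the `MISSING` markers, the helper names and the sample data are assumptions:

```python
import csv
import io
import re
import sqlite3

# Assumed missing-value markers; real datasets document their own conventions.
MISSING = {"", "NA", "N/A"}


def clean_name(name: str) -> str:
    """Lowercase a header and collapse runs of non-alphanumerics into single underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_value(value: str):
    """Strip whitespace and turn missing-value markers into None (stored as NULL)."""
    value = value.strip()
    return None if value in MISSING else value


def load_csv(text: str, conn: sqlite3.Connection, table: str) -> int:
    """Create `table` from the CSV header, insert the cleaned rows, and return the row count."""
    reader = csv.reader(io.StringIO(text))
    columns = [clean_name(header) for header in next(reader)]
    column_sql = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = [[clean_value(v) for v in row] for row in reader if row]  # skip blank lines
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

For example, a header `Site ID,Mass (g)` becomes the columns `site_id` and `mass_g`, and an `NA` cell is stored as NULL. Values are kept as text here; type inference is a separate step a real tool would add.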

node-quandl - A nodejs module for interacting with the Quandl API.

  •    Javascript

A Node.js module for interacting with the Quandl API. Configuration: simply require the quandl module, instantiate a new Quandl object, configure it if necessary, and start making calls. The auth token and API version are configurable.
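
The configuration pattern described here (create a client, optionally set the auth token and API version, then make calls) is sketched below in Python for consistency with the other examples. It is not node-quandl's API: the `QuandlClient` class, the `api_key` query parameter and the base URL layout are assumptions, and the sketch only builds request URLs rather than sending them:

```python
from urllib.parse import urlencode


class QuandlClient:
    """Hypothetical client that stores an auth token and API version and builds request URLs."""

    def __init__(self, auth_token=None, api_version="v3", base="https://www.quandl.com/api"):
        self.auth_token = auth_token
        self.api_version = api_version
        self.base = base  # assumed base URL layout

    def configure(self, **options):
        """Update known settings only; unknown names raise AttributeError. Returns self for chaining."""
        for key, value in options.items():
            if not hasattr(self, key):
                raise AttributeError(f"unknown option: {key}")
            setattr(self, key, value)
        return self

    def dataset_url(self, source, code, fmt="json", **query):
        """Build a dataset URL; the token is appended as an assumed `api_key` parameter."""
        if self.auth_token:
            query["api_key"] = self.auth_token
        url = f"{self.base}/{self.api_version}/datasets/{source}/{code}.{fmt}"
        query_string = urlencode(query)
        return f"{url}?{query_string}" if query_string else url
```

Keeping the token and version on the instance means every later call picks them up automatically, which mirrors the "configure once, then call" flow the module describes.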

MNIST-Sequence - A tool to generate image dataset for sequences of handwritten digits using MNIST database

  •    Python

The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. The goal of this project is to use the above database of handwritten digit images to generate images representing sequences of handwritten digits. The project also provides a utility to generate and save labelled training/test image datasets of MNIST sequences.
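
The composition step can be sketched without any image library by treating each digit image as a list of pixel rows and concatenating the rows left to right. This is an illustration, not the project's code; the `compose_sequence` name, the random-gap spacing and the optional centring to a fixed width are assumptions (real MNIST digits are 28×28 grayscale images, usually handled as NumPy arrays):

```python
from __future__ import annotations

import random


def compose_sequence(
    digits: list[int],
    images_by_label: dict[int, list[list[list[int]]]],
    spacing_range: tuple[int, int] = (0, 4),
    width: int | None = None,
    rng: random.Random | None = None,
) -> list[list[int]]:
    """Place one randomly chosen image per digit side by side, separated by blank columns."""
    rng = rng if rng is not None else random.Random()
    chosen = [rng.choice(images_by_label[d]) for d in digits]
    rows: list[list[int]] = [[] for _ in range(len(chosen[0]))]  # all images share one height
    for i, image in enumerate(chosen):
        if i > 0:
            gap = rng.randint(*spacing_range)  # random spacing between neighbouring digits
            for row in rows:
                row.extend([0] * gap)
        for row, source_row in zip(rows, image):
            row.extend(source_row)
    if width is not None:
        extra = width - len(rows[0])
        if extra < 0:
            raise ValueError("sequence is wider than the requested width")
        left = extra // 2  # centre the sequence, padding with background (0) pixels
        rows = [[0] * left + row + [0] * (extra - left) for row in rows]
    return rows
```

The label for each generated image is simply the `digits` list that was passed in, so a dataset generator only needs to loop over random digit sequences and store each (image, digits) pair.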

datasets - source{d} datasets ("big code") for source code analysis and machine learning on source code

  •    Jupyter

source{d} datasets for source code analysis and machine learning on source code. This repository contains all the needed tools and scripts to reproduce the datasets.

RData.jl - Read R data files from Julia

  •    Julia

Read R data files (.rda, .RData) and optionally convert the contents into Julia equivalents. Can read any R data archive, although not all R types can be converted into Julia.

Datasets - Machine learning datasets used in tutorials on MachineLearningMastery.com


This repository contains a copy of machine learning datasets used in tutorials on MachineLearningMastery.com. This repository was created to ensure that the datasets used in tutorials remain available and are not dependent upon unreliable third parties.

ck-env - Customizable cross-platform package and environment manager with automatic detection, installation and coexistence of multiple versions of software, including libraries, compilers, tools and data sets, across diverse Linux, Windows, MacOS and Android-based hardware to support collaborative and reproducible CK research workflows

  •    Python

This is a Collective Knowledge repository providing functionality for portable, customizable, reproducible and automated experimental workflows. It lets users automatically detect multiple versions of different software (compilers, libraries, tools, models, data sets) using CK software detection plugins, or install missing packages in a unified way across diverse hardware with Linux, Windows, MacOS and Android operating systems. It also allows users to collect information about a given platform in a unified way.
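
Detecting coexisting versions can be approximated by scanning the directories on `PATH` for versioned executable names such as `gcc`, `gcc-9` and `gcc-12`. This filename heuristic is only an illustration and not how CK's detection plugins work; the `find_versions` helper and the `tool-X.Y` naming convention are assumptions:

```python
import os
import re


def find_versions(tool, search_dirs=None):
    """Return {path: version} for executables named `tool` or `tool-X[.Y...]`."""
    pattern = re.compile(rf"^{re.escape(tool)}(?:-(\d+(?:\.\d+)*))?$")
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    found = {}
    for directory in search_dirs:
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # skip missing or unreadable PATH entries
        for name in names:
            match = pattern.match(name)
            path = os.path.join(directory, name)
            if match and os.path.isfile(path) and os.access(path, os.X_OK):
                # An unversioned name (plain `gcc`) is reported as the default install.
                found[path] = match.group(1) or "default"
    return found
```

A real detector would go further, for example running each candidate with a version flag and recording its install prefix, so that several versions can be registered side by side.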

big-data-exploration - [Archive] Intern project - Big Data Exploration using MongoDB

  •    Javascript

This project seeks to discover, investigate, and answer questions about big datasets while using MongoDB for storage and computation. This summer internship project also shows how to answer questions about big datasets stored in MongoDB using MongoDB's frameworks and connector. Both the MongoDB native aggregation framework and Hadoop were used to explore the data.
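
To show what a `$match`, `$group`, `$sort` aggregation pipeline computes, here is a toy in-memory evaluator for a tiny subset of those stages. It is not MongoDB and not the project's code; in practice the same pipeline list would be sent to the server through the driver. The `run_pipeline` helper and the sample trip documents are hypothetical:

```python
def run_pipeline(docs, pipeline):
    """Toy evaluator for equality $match, $group with $sum, and $sort on top-level fields."""
    for stage in pipeline:
        (op, spec), = stage.items()  # each stage is a one-key dict
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"][1:]  # "$borough" -> "borough"
            groups = {}
            for d in docs:
                key = d.get(key_field)
                out = groups.setdefault(key, {"_id": key})
                for field, accumulator in spec.items():
                    if field == "_id":
                        continue
                    (acc_op, arg), = accumulator.items()
                    if acc_op != "$sum":
                        raise NotImplementedError(acc_op)
                    # {"$sum": 1} counts documents; {"$sum": "$fare"} adds a field's values.
                    value = arg if isinstance(arg, (int, float)) else d.get(arg[1:], 0)
                    out[field] = out.get(field, 0) + value
            docs = list(groups.values())
        elif op == "$sort":
            # Apply keys from last to first; Python's stable sort yields a multi-key order.
            for field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise NotImplementedError(op)
    return docs
```

Filtering first with `$match` keeps the `$group` stage small, which is also why real pipelines usually put selective matches early.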

openml-r - R package to interface with OpenML

  •    Jupyter

OpenML.org is an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments, and organize them online to work and collaborate with other researchers. The R interface allows users to query for data sets with specific properties and to download and upload data sets, tasks, flows and runs. To cite the OpenML R package in publications, please use our paper entitled "OpenML: An R Package to Connect to the Machine Learning Platform OpenML" [bibtex].