Displaying 1 to 20 from 60 results

awesome-json-datasets - A curated list of awesome JSON datasets that don't require authentication.

  •    Javascript

A curated list of awesome JSON datasets that don't require authentication. Pro Tip: Check out Blockchain Data API for more options.

pix2code - pix2code: Generating Code from a Graphical User Interface Screenshot

  •    Python

Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that deep learning methods can be leveraged to train a model end-to-end to automatically generate code from a single input image with over 77% of accuracy for three different platforms (i.e. iOS, Android and web-based technologies). The following software is shared for educational purposes only. The author and its affiliated institution are not responsible in any manner whatsoever for any damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of the use or inability to use this software.

PyDataset - Instant access to many datasets in Python.

  •    Python

Provides instant access to many datasets right from Python (in pandas DataFrame structure). That's it. See more examples.

awesome-public-datasets - A topic-centric list of high-quality open datasets in public domains


NOTICE: This repo is automatically generated by apd-core. Please DO NOT modify this file directly. We have provided a new way to contribute to Awesome Public Datasets. The original PR entrance directly on repo is closed forever. This list of a topic-centric public data sources in high quality. They are collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free, however, some are not. Other amazingly awesome lists can be found in sindresorhus's awesome list.

Hub - Fastest dataset optimization and management for machine and deep learning

  •    Python

Note: the translations of this document may not be up-to-date. For the latest version, please check the README in English. Software 2.0 needs Data 2.0, and Hub delivers it. Most of the time Data Scientists/ML researchers work on data management and preprocessing instead of training models. With Hub, we are fixing this. We store your (even petabyte-scale) datasets as single numpy-like array on the cloud, so you can seamlessly access and work with it from any machine. Hub makes any data type (images, text files, audio, or video) stored in cloud usable as fast as if it were stored on premise. With same dataset view, your team can always be in sync.

datasets - 🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

  •    Python

🤗Datasets also provides access to +15 evaluation metrics and is designed to let the community easily add and share new datasets and evaluation metrics. 🤗Datasets originated from a fork of the awesome TensorFlow Datasets and the HuggingFace team want to deeply thank the TensorFlow Datasets team for building this amazing library. More details on the differences between 🤗Datasets and tfds can be found in the section Main differences between 🤗Datasets and tfds.

datasets - TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

  •    Python

TensorFlow Datasets provides many public datasets as tf.data.Datasets. To install and use TFDS, we strongly encourage to start with our getting started guide. Try it interactively in a Colab notebook.

doccano - Open source annotation tool for machine learning practitioners.

  •    Python

doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours. You can try the annotation demo.

CodeSearchNet - Datasets, tools, and benchmarks for representation learning of code.

  •    Jupyter

We would like to thank all participants for their submissions and we hope that this challenge provided insights to practitioners and researchers about the challenges in semantic code search and motivated new research. We would like to encourage everyone to continue using the dataset and the human evaluations, which we now provide publicly. Please, see below for details, specifically the Evaluation section. No new submissions to the challenge will be accepted.

DataFrames.jl - Library for working with tabular data in Julia

  •    Julia

Tools for working with tabular data in Julia. Maintenance: DataFrames is maintained collectively by the JuliaData collaborators. Responsiveness to pull requests and issues can vary, depending on the availability of key collaborators.

geobr - Easy access to official spatial data sets of Brazil in R and Python

  •    R

geobr is a computational package to download official spatial data sets of Brazil. The package includes a wide range of geospatial data in geopackage format (like shapefiles but better), available at various geographic scales and for various years with harmonized attributes, projection and topology (see detailed list of available data sets below). The package is currently available in R and Python.

projects - 🪐 End-to-end NLP workflows from prototype to production

  •    Python

spaCy projects let you manage and share end-to-end spaCy workflows for different use cases and domains, and orchestrate training, packaging and serving your custom pipelines. You can start off by cloning a pre-defined project template, adjust it to fit your needs, load in your data, train a pipeline, export it as a Python package, upload your outputs to a remote storage and share your results with your team. ⚠️ spaCy project templates require spaCy v3. You can install it from pip with pip install spacy or conda with conda install spacy -c conda-forge. Make sure to use a fresh virtual environment.

PaperRobot - Code for PaperRobot: Incremental Draft Generation of Scientific Ideas

  •    Python

You can click the following links for detailed installation instructions. PubMed Paper Reading Dataset This dataset gathers 14,857 entities, 133 relations, and entities corresponding tokenized text from PubMed. It contains 875,698 training pairs, 109,462 development pairs, and 109,462 test pairs.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

rdataretriever - R interface to the Data Retriever

  •    R

R interface to the Data Retriever.The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a local database or csv files. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

datasette - An instant JSON API for your SQLite databases

  •    Python

Datasette provides an instant, read-only JSON API for any SQLite database. It also provides tools for packaging the database up as a Docker container and deploying that container to hosting providers such as Zeit Now.Datasette requires Python 3.5 or higher.

retriever - Quickly download, clean up, and install ecological datasets into a database management system

  •    Python

Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing and importing publicly available data is time consuming because many datasets lack machine readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. The automation of this process reduces the time for a user to get most large datasets up and running by hours, and in some cases days. Precompiled binary installers are also available for Windows, OS X, and Ubuntu/Debian on the releases page. These do not require a Python installation. Download the installer for your operating system and follow the instructions at on the download page.

node-quandl - A nodejs module for interacting with the Quandl API.

  •    Javascript

###Description A nodejs module for interacting with the Quandl API. ###Configuration Simply require the quandl module, instantiate a new Quandl object, configure it if necessary, and start making calls. The auth token and api version are configurable.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.