
ISO-3166-Countries-with-Regional-Codes - ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

  •    Ruby

These lists are the result of merging data from two sources: the Wikipedia ISO 3166-1 article for alpha and numeric country codes, and the UN Statistics site for countries' regional and sub-regional codes. In addition to countries, the lists include dependent territories. The International Organization for Standardization (ISO) site provides partial data (capitalised and sometimes stripped of non-Latin ornamentation), but sells the complete data set as a Microsoft Access 2003 database. Other sites give you the numeric and character codes, but there appeared to be no sites that included the associated UN-maintained regional codes in their data sets. I scraped the data from the above two websites, all of which is already publicly available, to produce ready-to-use complete data sets that will hopefully save time for anyone with similar needs.
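The merge described above can be sketched as a join on the numeric country code. This is a minimal illustration, not the project's actual script: the two sample rows, the field names (`alpha-2`, `country-code`, `sub-region`, and so on), and the helper `merge_by_numeric_code` are all assumptions made for the example.

```python
# Hypothetical sample rows; the real data sets cover every country and territory.
iso_rows = [
    {"name": "France", "alpha-2": "FR", "alpha-3": "FRA", "country-code": "250"},
    {"name": "Japan", "alpha-2": "JP", "alpha-3": "JPN", "country-code": "392"},
]
un_rows = [
    {"country-code": "250", "region": "Europe", "sub-region": "Western Europe"},
    {"country-code": "392", "region": "Asia", "sub-region": "Eastern Asia"},
]


def merge_by_numeric_code(iso_rows, un_rows):
    """Join ISO code rows with UN regional rows on the shared numeric code."""
    regions = {row["country-code"]: row for row in un_rows}
    merged = []
    for row in iso_rows:
        # Rows without a regional match keep None for the regional fields.
        region = regions.get(row["country-code"], {})
        merged.append({
            **row,
            "region": region.get("region"),
            "sub-region": region.get("sub-region"),
        })
    return merged


merged = merge_by_numeric_code(iso_rows, un_rows)
for row in merged:
    print(row["alpha-3"], row["region"], row["sub-region"])
```

Joining on the numeric code rather than the country name avoids mismatches caused by differences in spelling or capitalisation between the two sources.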

awesome-json-datasets - A curated list of awesome JSON datasets that don't require authentication.

  •    Javascript

A curated list of awesome JSON datasets that don't require authentication. Pro Tip: Check out Blockchain Data API for more options.

browser-compat-data - This repository contains compatibility data for Web technologies as displayed on MDN

  •    Javascript

This repository contains compatibility data for Web technologies. Browser compatibility data describes which platforms (where "platforms" are usually, but not always, web browsers) support particular Web APIs. This data can be used in documentation, to build compatibility tables listing browser support for APIs. For example: Browser support for WebExtension APIs.

dataset - JavaScript library that makes managing the data behind client-side visualisations easy

  •    Javascript

Dataset is a JavaScript library that makes managing the data behind client-side visualisations easy, including realtime data. It takes care of the loading, parsing, sorting, filtering and querying of datasets as well as the creation of derivative datasets. The following builds do not have any of the dependencies built in. It is your own responsibility to include them as appropriate script elements in your page.




quickdraw-dataset - Documentation on how to access and use the Quick, Draw! Dataset.

The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. You can browse the recognized drawings on quickdraw.withgoogle.com/data. We're sharing them here for developers, researchers, and artists to explore, study, and learn from. If you create something with this dataset, please let us know by e-mail or at A.I. Experiments.
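As a rough sketch of what working with timestamped vector drawings might look like, the example below summarizes one record. The record is made up, and the layout is an assumption: one JSON object per line, with fields `word`, `countrycode`, and `drawing`, where each stroke is a triple of x, y, and time arrays. Check the dataset documentation for the actual schema.

```python
import json

# One made-up record; field names and layout are assumptions for illustration.
sample_line = json.dumps({
    "word": "cat",
    "countrycode": "DE",
    "recognized": True,
    "drawing": [
        [[10, 20, 30], [5, 15, 25], [0, 40, 80]],  # stroke 1: x, y, t (ms)
        [[30, 40], [25, 20], [200, 260]],          # stroke 2: x, y, t (ms)
    ],
})


def summarize_drawing(line):
    """Parse one JSON line and return basic statistics about the drawing."""
    record = json.loads(line)
    strokes = record["drawing"]
    point_count = sum(len(xs) for xs, _ys, _ts in strokes)
    start = min(ts[0] for _xs, _ys, ts in strokes)
    end = max(ts[-1] for _xs, _ys, ts in strokes)
    return {
        "word": record["word"],
        "country": record["countrycode"],
        "strokes": len(strokes),
        "points": point_count,
        "duration_ms": end - start,
    }


summary = summarize_drawing(sample_line)
print(summary)
```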

quandl-python

  •    Python

This is the official documentation for Quandl's Python package. The package can be used to interact with the latest version of the Quandl RESTful API. It is compatible with Python v2.7.x and v3.x+. Setting quandl.ApiConfig.api_version is optional; however, it is strongly recommended to avoid issues with rate-limiting. For premium databases, datasets, and datatables, quandl.ApiConfig.api_key must be set to identify you to the API. Please see the API Documentation for more detail.

fashion-mnist - A MNIST-like fashion product database.

  •    Python

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and the same structure of training and testing splits.
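Because the dataset is meant as a drop-in replacement for MNIST, a loader written for MNIST-style files should work unchanged. The sketch below assumes the files use the IDX binary layout of the original MNIST (big-endian header, magic number 2051 for images and 2049 for labels) and runs on synthetic bytes rather than the real files. The function names are hypothetical, and real downloads would typically need to be decompressed first.

```python
import struct


def parse_idx_images(data):
    """Parse IDX image bytes into a list of flat per-image byte strings."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    size = rows * cols
    pixels = data[16:]
    if len(pixels) != count * size:
        raise ValueError("pixel data length does not match the header")
    images = [pixels[i * size:(i + 1) * size] for i in range(count)]
    return images, rows, cols


def parse_idx_labels(data):
    """Parse IDX label bytes into a list of integer class labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError(f"unexpected label magic number: {magic}")
    labels = list(data[8:])
    if len(labels) != count:
        raise ValueError("label count does not match the header")
    return labels


# Synthetic stand-in for real files: one all-black and one all-white image.
fake_images = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(784) + bytes([255]) * 784
fake_labels = struct.pack(">II", 2049, 2) + bytes([0, 9])

images, rows, cols = parse_idx_images(fake_images)
labels = parse_idx_labels(fake_labels)
print(len(images), rows, cols, labels)
```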


faker - Faker is a Python package that generates fake data for you.

  •    Python

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you. Faker is heavily inspired by PHP Faker, Perl Faker, and by Ruby Faker.

raccoon_dataset - The dataset is used to train my own raccoon detector and I blogged about it on Medium

  •    Jupyter

This is a dataset that I collected to train my own Raccoon detector with TensorFlow's Object Detection API. Images are from Google and Pixabay. In total, there are 200 images (160 are used for training and 40 for validation). See LICENSE for details. Copyright (c) 2017 Dat Tran.
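A 160/40 split like the one described can be reproduced with a seeded shuffle. This is only a sketch: the file names and the `split_dataset` helper are hypothetical, and the author's actual split may have been made differently.

```python
import random


def split_dataset(items, train_count, seed=0):
    """Shuffle a copy of items with a fixed seed and split it in two."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_count], shuffled[train_count:]


# Hypothetical file names standing in for the 200 collected images.
image_names = [f"raccoon-{i}.jpg" for i in range(1, 201)]
train, validation = split_dataset(image_names, train_count=160)
print(len(train), len(validation))
```

Fixing the seed keeps the split stable between runs, so validation results stay comparable.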

Objectron - Objectron is a dataset of short, object-centric video clips

  •    Jupyter

Objectron is a dataset of short, object-centric video clips with pose annotations. The clips are accompanied by AR session metadata that includes camera poses, sparse point clouds, and characterization of the planar surfaces in the surrounding environment. In each video, the camera moves around the object, capturing it from different angles. The data also contain manually annotated 3D bounding boxes for each object, which describe the object’s position, orientation, and dimensions. The dataset consists of 15K annotated video clips supplemented with over 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes. In addition, to ensure geo-diversity, our dataset is collected from 10 countries across five continents. Along with the dataset, we are also sharing a 3D object detection solution for four categories of objects: shoes, chairs, mugs, and cameras. These models are trained using this dataset, and are released in MediaPipe, Google's open source framework for cross-platform customizable ML solutions for live and streaming media.
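To illustrate how a position, orientation, and set of dimensions determine a 3D bounding box, the sketch below computes the eight corners of a box. It is a simplification under stated assumptions, not the dataset's annotation format: orientation is reduced to a single rotation angle about the vertical (y) axis, and the function `box_corners` is hypothetical. A full orientation would normally be a 3x3 rotation matrix or a quaternion.

```python
import math


def box_corners(center, yaw, size):
    """Return the 8 corners of a box rotated by yaw radians about the y axis."""
    cx, cy, cz = center
    width, height, depth = size
    cos_a, sin_a = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner in the box's own frame, relative to its center.
                x, y, z = sx * width, sy * height, sz * depth
                # Rotate about the y axis, then translate to the center.
                rx = cos_a * x + sin_a * z
                rz = -sin_a * x + cos_a * z
                corners.append((cx + rx, cy + y, cz + rz))
    return corners


corners = box_corners(center=(0.0, 0.0, 0.0), yaw=0.0, size=(2.0, 4.0, 6.0))
print(len(corners), corners[-1])
```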

waymo-open-dataset - Waymo Open Dataset

  •    C++

The Waymo Open Dataset was first launched in August 2019 with a perception dataset comprising high resolution sensor data and labels for 1,950 segments. We have released the Waymo Open Dataset publicly to aid the research community in making advancements in machine perception and autonomous driving technology. We expanded the Waymo Open Dataset to also include a motion dataset comprising object trajectories and corresponding 3D maps for over 100,000 segments. We have updated this repository to add support for this new dataset. Please refer to the Quick Start.

datasets - TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

  •    Python

TensorFlow Datasets provides many public datasets as tf.data.Datasets. To install and use TFDS, we strongly encourage starting with our getting started guide. Try it interactively in a Colab notebook.

doccano - Open source annotation tool for machine learning practitioners.

  •    Python

doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours. You can try the annotation demo.

covid-chestxray-dataset - We are building an open database of COVID-19 cases with chest X-ray or CT images

  •    Jupyter

Project Summary: To build a public open dataset of chest X-ray and CT images of patients who are positive for or suspected of having COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS). Data will be collected from public sources as well as through indirect collection from hospitals and physicians. All images and data will be released publicly in this GitHub repo. Lung Bounding Boxes and Chest X-ray Segmentation (license: CC BY 4.0) contributed by General Blockchain, Inc.

COVID-19 - Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE

This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). DATA SOURCES: This is a complete list of all sources ever used in the data set since January 21, 2020. Some sources listed here (e.g. ECDC, US CDC, BNO News) are not currently relied upon as a source of data.

Mobius - C# and F# language binding and extensions to Apache Spark

  •    CSharp

Mobius provides a C# language binding to Apache Spark, enabling Spark driver programs and data processing operations to be implemented in languages supported by the .NET framework, such as C# or F#. For more code samples, refer to the Mobius\examples directory or the Mobius\csharp\Samples directory.





