fma - FMA: A Dataset For Music Analysis

  •    Jupyter

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2. The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads. Below the abstract from the paper.

ITK - Insight Segmentation and Registration Toolkit -- Mirror

  •    C++

The National Library of Medicine Insight Segmentation and Registration Toolkit (ITK), or Insight Toolkit, is an open-source, cross-platform C++ toolkit for segmentation and registration. Segmentation is the process of identifying and classifying data found in a digitally sampled representation. Typically the sampled representation is an image acquired from such medical instrumentation as CT or MRI scanners. Registration is the task of aligning or developing correspondences between data. For example, in the medical environment, a CT scan may be aligned with an MRI scan in order to combine the information contained in both. The toolkit may be built from source using CMake.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

food-inspections-evaluation - This repository contains the code to generate predictions of critical violations at food establishments in Chicago

  •    HTML

This is our model for predicting which food establishments are at most risk for the types of violations most likely to spread food-borne illness. Chicago Department of Public Health staff use these predictions to prioritize inspections. During a two-month pilot period, we found that using these predictions meant that inspectors found critical violations much faster. You can help improve the health of our city by improving this model. This repository contains a training and test set, along with the data used in the current model.

repairnator - Software development bot that automatically repairs programs and build failures on Travis Continuous Integration

  •    Java

Repairnator is a software development bot that automatically repairs build failures on continuous integration. It monitors failing Travis CI builds in Java projects, tries to locally reproduce the failing builds and finally attempts to repair them with the state-of-the-art of automated program repair research. Repairnator is a community effort, driven by Martin Monperrus at KTH Royal Institute of Technology. See the usage section of our documentation.

itk-js - Run C++ spatial analysis code in Node.js or a Web Browser

  •    C++

itk.js combines Emscripten and ITK to enable high-performance spatial analysis in a JavaScript runtime environment. The project provides tools to a) build C/C++ code to JavaScript (asm.js) and WebAssembly, b) bridge local filesystems, native JavaScript data structures, and traditional file formats, c) transfer data efficiently in and out of the Emscripten runtime, and d) asynchronously execute processing pipelines in a background thread. itk.js can be used to execute ITK, VTK or arbitrary C++ codes in the browser or on a workstation / server with Node.js.

itk-jupyter-widgets - Interactive Jupyter widgets to visualize images in 2D and 3D

  •    Python

Interactive Jupyter widgets to visualize images in 2D and 3D. These widgets are designed to support image analysis with the Insight Toolkit (ITK), but they also work with other spatial analysis tools in the scientific Python ecosystem.

events - Materials related to events I might attend, and to talks I am giving


Many conferences and other events require submissions of proposals, but very few of these are being made public in a timely fashion or at all. Plus, many of these submission systems have technical issues or do not provide submitters with a copy of their submission. I have thus started this repo to keep track of my submissions to such closed systems, and I will also use it for hosting the materials for some invited talks. Many more of my talks are available elsewhere, as detailed here. Some of them have been recorded.

open-science-prize - Auxiliary infrastructure for the Open Science Prize


The winner of phase 2 has been announced: http://nextstrain.org/ . Feedback on all aspects of the prize (organization, scope, rules, timing, prize numbers and amounts etc.) is still welcome. The 6 winning teams of Phase I of the Prize have now been announced, and they will now compete for the single prize in phase II. Feedback on the process so far is being collected here.

tomviz - Cross platform, open source application for the processing, visualization, and analysis of 3D tomography data

  •    C++

The Tomviz project is developing a cross platform, open source application for the processing, visualization, and analysis of 3D tomographic data. It features a complete pipeline capable of processing data from alignment, reconstruction, and segmentation through to displaying, visualizing, and interacting with 3D reconstructions of tomographic data. Many of the data operators are available as editable Python scripts that can be modified in the interface to experiment with different techniques. The pipeline can be saved to disk, and a number of common file formats are supported for importing and exporting data. The Tomviz project was founded by Marcus D. Hanwell and Utkarsh Ayachit at Kitware, David A. Muller (Cornell University), and Robert Hovden (University of Michigan), funded by DOE Office of Science contract DE-SC0011385.

open-computational-neuroscience-resources - A publicly-editable collection of open computational neuroscience resources


Computational neuroscience means one of two things: 1. analysis of neuroscientific data, whether it be fMRI imaging data, electrode recordings from a mouse running in a maze, statistical modeling of that data, or something else, and 2. simulation of neural systems, including modeling many compartments of a single neuron, or large networks of model neurons with simple individual behavior. These endeavors require expensive data from wet-lab experiments, but much of the work can be accomplished using everyday, consumer-grade laptop and desktop computers! Indeed, the biggest barrier to entry is not hardware, data, or expense, but rather time and passion to learn the tools needed for such computational science. Coupled with the great tools coming out of the modern Data Science movement, open data, open simulation models, and open analysis and simulation tools for computational neuroscience make it easier than ever to learn or even contribute to the study of the brain! The resources below should be more than enough to provide anyone with the means to begin learning or working in computational neuroscience, at no cost other than time and a modern computer. Note: This is intended as a list of resources to help with neuroscientific pursuits, as opposed to artificial intelligence pursuits. More broadly, I've made a similar repo-list of general open science resources here.

open-science-resources - A publicly-editable collection of open science resources, including tools, datasets, meta-resources, etc


Scientific data and tools should, as much as possible, be free as in beer and free as in freedom. The vast majority of science today is paid for by taxpayer-funded grants; at the same time, the incredible successes of science are strong evidence for the benefit of collaboration in knowledgeable pursuits. Within the scientific academy, sharing of expertise, data, tools, etc. is prolific, but only recently with the rise of the Open Access movement has this sharing come to embrace the public. Even though most research data is never shared, both the public and even scientists in their own fields are often unaware of just how much data, tools, and other resources are made freely available for analysis! This list is a small attempt at bringing light to data repositories and computational science tools that are often siloed according to each scientific discipline, in the hopes of spurring along both public and professional contributions to science. These categories are very non-exclusive, as many resources could fit multiple categories. If you're interested in computational neuroscience specifically, I've made a similar repo-list of open computational neuroscience resources here.

clowder - A data management system that allows users to share, annotate, organize and analyze large collections of datasets

  •    JavaScript

A customizable and scalable data management system you can install in the cloud or on your own hardware. More information is available at https://clowder.ncsa.illinois.edu/. For a full list of metadata extractors you can deploy to your instance, please take a look at the NCSA repositories or the Brown Dog wiki. If you have extractors available somewhere else, please get in touch with the team so we can add them to these lists.

neuroglia - a Python machine learning library for neurophysiology data

  •    Python

Neuroglia is a Python machine learning library for neurophysiology data. It features scikit-learn compatible transformers for extracting features from extracellular electrophysiology & optical physiology data for machine learning pipelines. We plan to update this tool occasionally, with no fixed schedule. Community involvement is encouraged through both issues and pull requests.
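
To illustrate what a "scikit-learn compatible transformer" means in this setting, here is a minimal sketch, not taken from Neuroglia itself. The class name `SpikeCountBinner` and its parameters are hypothetical; the sketch only mimics the `fit`/`transform` convention (by duck typing, without importing scikit-learn) and bins spike times into fixed-width counts.

```python
from bisect import bisect_right


class SpikeCountBinner:
    """Hypothetical transformer: turns lists of spike times into binned counts.

    It follows the scikit-learn fit/transform convention by duck typing and
    is an assumption for illustration, not part of Neuroglia's API.
    """

    def __init__(self, bin_width=0.5, t_start=0.0, t_stop=2.0):
        self.bin_width = bin_width
        self.t_start = t_start
        self.t_stop = t_stop

    def fit(self, X, y=None):
        # Precompute bin edges; nothing is learned from the data itself.
        n_bins = round((self.t_stop - self.t_start) / self.bin_width)
        self.edges_ = [self.t_start + i * self.bin_width for i in range(n_bins + 1)]
        return self

    def transform(self, X):
        # X holds one list of spike times (in seconds) per neuron or trial.
        rows = []
        for spikes in X:
            counts = [0] * (len(self.edges_) - 1)
            for t in spikes:
                # Keep only spikes inside the half-open window [t_start, t_stop).
                if self.edges_[0] <= t < self.edges_[-1]:
                    counts[bisect_right(self.edges_, t) - 1] += 1
            rows.append(counts)
        return rows

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

Because the object exposes `fit`, `transform`, and `fit_transform`, the resulting count matrix could in principle be handed to a downstream estimator; a real implementation would more likely subclass scikit-learn's base classes and return arrays.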

prodigenr - Project directory generator R package

  •    R

This R package is part of a series of (planned) packages that are aimed at creating a toolkit for doing reproducible and open science. Many researchers (especially in biomedicine, medicine, or health, which is my area of research) have little to no knowledge on what open science is or what reproducibility is, let alone how to do it. My goal is to create an (opinionated) toolkit to automate and simplify the process of doing open and reproducible science. This specific package is a project directory generator (prodigenr). It will create a standardized project folder structure with the necessary template files for managing and analyzing data and for creating common scientific output (posters, slides, abstracts, manuscripts). Because of the standardized structure and because of the focus on "one project, one scientific output", this allows the final code and documents to be fairly modular, self-contained, easy to share and make public... and be as reproducible as possible. This folder structure also makes use of the existing and established applications and workflows (RStudio, devtools, and usethis). This package aims to make it easier to adhere to open scientific practices by following a standard, consistent, and established folder and file structure for data analysis projects.
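
The idea of a project directory generator can be sketched in a few lines. The example below is written in Python rather than R and is not prodigenr's implementation: the function `create_project` and the folder and file names in `DEFAULT_LAYOUT` are assumptions chosen only to show the pattern of stamping out a fixed folder structure with template files.

```python
from pathlib import Path

# Hypothetical layout for illustration; prodigenr's real template may differ.
DEFAULT_LAYOUT = {
    "R": ["README.md"],
    "data": ["README.md"],
    "doc": ["manuscript.Rmd"],
}


def create_project(root, name, layout=DEFAULT_LAYOUT):
    """Create <root>/<name> with one subfolder per layout key and empty template files."""
    project = Path(root) / name
    # Refuse to overwrite an existing project directory.
    project.mkdir(parents=True, exist_ok=False)
    (project / "README.md").write_text(f"# {name}\n")
    for folder, files in layout.items():
        (project / folder).mkdir()
        for filename in files:
            (project / folder / filename).touch()
    return project
```

Calling `create_project("/tmp", "my-analysis")` would produce one self-contained folder per project, which is the property the "one project, one scientific output" convention relies on.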

openml-r - R package to interface with OpenML

  •    Jupyter

OpenML.org is an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments and organize them online to work and collaborate with other researchers. The R interface allows querying for data sets with specific properties, as well as downloading and uploading data sets, tasks, flows and runs. To cite the OpenML R package in publications, please use our paper entitled OpenML: An R Package to Connect to the Machine Learning Platform OpenML [bibtex].

meta-review - Manuscript describing open collaborative writing with Manubot

  •    Jupyter

This manuscript presents the benefits of writing collaborative reviews in the open and the Manubot System for automating large portions of the build process. The results are derived from the authors' experience with the collaborative Deep Review. Feedback and minor contributions (e.g. typo corrections) are welcome. Major contributions are not being solicited at this time. To see what's incoming, check the open pull requests.