shields - Concise, consistent, and legible badges in SVG and raster format

  •    JavaScript

This is home to Shields.io, a service for concise, consistent, and legible badges in SVG and raster format, which can easily be included in GitHub readmes or any other web page. The service supports dozens of continuous integration services, package registries, distributions, app stores, social networks, code coverage services, and code analysis services. Every month it serves over 470 million images. Browse a complete list of badges.
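As a rough illustration of how such a badge can be included in a readme, the sketch below builds a URL for a Shields.io static badge. It assumes the `https://img.shields.io/badge/<label>-<message>-<color>` path form and its escaping convention (a literal dash is doubled, a literal underscore is doubled, a space becomes an underscore). The function name `static_badge_url` is hypothetical and is not part of the Shields.io codebase.

```python
from urllib.parse import quote


def static_badge_url(label: str, message: str, color: str) -> str:
    """Build a static badge URL (hypothetical helper, assumed path format)."""

    def escape(text: str) -> str:
        # Assumed escaping: "-" -> "--", "_" -> "__", then " " -> "_".
        # Dashes and underscores are doubled before spaces are converted,
        # so a space never turns into a doubled underscore.
        text = text.replace("-", "--").replace("_", "__").replace(" ", "_")
        # Percent-encode anything else that is not URL-safe (e.g. "%").
        return quote(text, safe="")

    return (
        "https://img.shields.io/badge/"
        f"{escape(label)}-{escape(message)}-{quote(color, safe='')}"
    )


if __name__ == "__main__":
    print(static_badge_url("build", "passing", "brightgreen"))
    print(static_badge_url("code coverage", "95%", "green"))
```

The resulting URL can be used as the source of an image in a readme or on any web page.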

crossref - Download metadata for all DOIs using the Crossref API

  •    Jupyter

This repository downloads Crossref metadata using the Crossref API. The items retrieved are stored in MongoDB to preserve their raw structure. This design allows for flexible downstream analyses. MongoDB is run via Docker. It's available on the host machine at http://localhost:27017/.

data-versioning - Collecting thoughts about data versioning


Version control is a huge part of reproducible research and open source software development. Versioning provides a complete history of some digital object (e.g., a software program, a research project, etc.) and, importantly, allows one to trace what changes have been made to that object, when those changes were made, and (with the appropriate metadata) why those changes were made. This document holds some of my current thinking about version control for data. Especially in the social sciences, researchers depend on large, public datasets (e.g., Polity, Quality of Government, Correlates of War, ANES, ESS, etc.) as source material for quantitative research. These datasets typically evolve (new data are added over time, corrections are made to data values, etc.) and new releases are periodically made public. Some of these datasets are complex collaborative efforts (see, for example, Quality of Government), while others are public releases of single-institution data collection efforts (e.g., ANES). While collaborative datasets create a more obvious use case for version control, single-institution datasets might also be improved by version control. This is particularly important because old releases of these vital datasets are often not archived (e.g., ANES), meaning that it is essentially impossible to recover a prior version of a given ANES dataset after a new release has occurred. This post is meant to steer thinking about how to manage the creation, curation, revision, and dissemination of these kinds of datasets. While the ideas here might also apply to how researchers manage their own data, they probably apply more at the stage of data creation than at later data use, after a dataset is essentially complete or frozen.
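To make "tracing what changes have been made" concrete, here is a minimal sketch that compares two releases of a tabular dataset and reports which records were added, removed, or changed. It is only an illustration under stated assumptions: the releases are CSV text, each record has a unique key column, and the function name `diff_releases` and the sample values are made up for this example rather than taken from any of the datasets mentioned above.

```python
import csv
import io


def diff_releases(old_csv: str, new_csv: str, key: str) -> dict:
    """Compare two CSV releases keyed on a unique column (hypothetical helper)."""
    # Index every row of each release by its key value.
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}

    return {
        # Keys present only in the new release.
        "added": sorted(new.keys() - old.keys()),
        # Keys present only in the old release.
        "removed": sorted(old.keys() - new.keys()),
        # Keys present in both releases whose values differ.
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    # Made-up example data, not drawn from any real dataset.
    release_1 = "id,score\nA,1\nB,2\nC,3\n"
    release_2 = "id,score\nA,1\nB,5\nD,4\n"
    print(diff_releases(release_1, release_2, key="id"))
```

A report like this records what changed and, if stored with a release date, when. The "why" still has to come from metadata written by the people maintaining the dataset.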