
roadoi - Use oaDOI.org with R

  •    R

roadoi interacts with the oaDOI API, a simple interface that links DOIs to open access versions of scholarly works. oaDOI powers Unpaywall. Use the oadoi_fetch() function in this package to get open access status information and full-text links from oaDOI.

esbulk - elasticsearch bulk indexing for newline delimited JSON.

  •    Go

Please note that, in such a case, some documents are indexed and some are not. Your index will be in an inconsistent state, since there is no transactional bracket around the indexing process. However, using the defaults (parallelism: number of cores) on a single-node setup will just work. For larger clusters, increase the number of workers until you see full CPU utilization. After that, more workers won't buy any more speed.

estab - Export elasticsearch as TSV or line delimited JSON.

  •    Go

to retrieve large numbers of documents from Elasticsearch efficiently ... Or if your system speaks dpkg or rpm, there is a release.

metha - Command line OAI harvester and client with built-in cache.

  •    Go

The metha command line tools can gather information on OAI-PMH endpoints and harvest data incrementally. All downloaded files are written to a directory below a base directory. The base directory is ~/.metha by default and can be adjusted with the METHA_DIR environment variable.
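The base directory rule described above (environment variable first, then a default under the home directory) can be sketched in a few lines of Python. This is only an illustration of the lookup order, not metha's own code; the function name resolve_base_dir is hypothetical.

```python
import os
from pathlib import Path


def resolve_base_dir(env=None):
    """Return the cache base directory: $METHA_DIR if set, else ~/.metha.

    `env` is a mapping of environment variables; it defaults to os.environ
    and is a parameter only so the lookup is easy to try out.
    """
    env = os.environ if env is None else env
    override = env.get("METHA_DIR")
    if override:
        # An explicit setting wins over the default location.
        return Path(override).expanduser()
    return Path.home() / ".metha"


if __name__ == "__main__":
    print(resolve_base_dir())
```

Setting METHA_DIR before running the script changes the printed path; leaving it unset falls back to the .metha directory in the user's home.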




ntto - Small n-triples to line delimited JSON converter and prefix cutter.

  •    Go

RPM and DEB packages can be found under releases. should work as well.

siskin - Tasks around metadata.

  •    Python

Various tasks for heterogeneous metadata handling for Project finc. Based on luigi from Spotify. Currently, Python 2 and 3 support is on the way and the installation might be a bit flaky.

solrbulk - SOLR bulk indexing utility for the command line.

  •    Go

solrbulk expects as input a file with line-delimited JSON. Each line represents a single document. solrbulk takes care of reformatting the documents into the bulk JSON format that SOLR understands. solrbulk will send documents in batches and in parallel. The number of documents per batch can be set via -size, the number of workers with -w.
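The general pattern described here (one JSON document per line, grouped into fixed-size batches, sent by several workers at once) might look roughly like the Python sketch below. It is not solrbulk's implementation: the helper names, the default values for size and workers, and the send callable (a stand-in for whatever posts one batch to the server) are all assumptions made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_docs(lines):
    """Parse one JSON document per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(docs, size):
    """Group documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(lines, send, size=1000, workers=4):
    """Pass every batch to `send` (hypothetical) using a pool of threads.

    The default size and worker count are illustrative, not solrbulk's.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all batches and re-raises any worker exception.
        return list(pool.map(send, batches(read_docs(lines), size)))
```

For a dry run, any callable works as send, for example index_all(open("docs.ldj"), send=len, size=2, workers=2), which returns the size of each batch instead of indexing anything.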

span - Span formats.

  •    Go

The span tools convert to and from an intermediate schema and support license tagging and quality assurance. or via deb or rpm packages.


siegfried - signature-based file format identification

  •    Go

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file. See the CHANGELOG for the full history.

sfm-ui - The [new] Social Feed Manager user interface application.

  •    Python

Social Feed Manager (SFM) harvests social media data from multiple platforms' public APIs to help archivists, librarians, and researchers build social media collections. More information about the project itself. This is a re-architected version of an earlier Social Feed Manager which had been in use at GWU Libraries since 2012.

diy-headline-roulette - Make your own simple game using the Trove API

  •    Javascript

Headline Roulette is a simple game that challenges you to guess the publication date of a newspaper article chosen at random from Trove. Here you can build your own customised version of Headline Roulette with nothing more than a GitHub account.

diy-trove-exhibition

  •    CSS

Before you get started you'll need to have some Trove lists ready to provide the content for your exhibition. Trove lists are just collections of items found on Trove -- they're easy to create, manage, and edit. You could create an exhibition with just one list, but it's probably better to divide your content between a number of lists. Each list is displayed as a 'topic' in the exhibition.

trovebuildabot - Build your own Trove collection Twitter bot.

  •    Python

If you're an organisation that contributes collection records to Trove you can use this code to build your own Twitter bot that responds to user queries and tweets random collection items. To set up your bot you need to create a Twitter account and then generate the necessary authentication keys for both Twitter and Trove.

Catmandu - a data processing toolkit developed by Ghent University Library

  •    Perl

and export records into formats such as JSON, YAML, CSV, XLS, RDF and many more. In the example above, we renamed all the 'title' fields in the dataset into the 'my_title' field.
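The renaming of 'title' to 'my_title' mentioned here is a simple per-record transformation. As a rough illustration of the idea only, written in Python rather than in Catmandu's own fix syntax, and with a hypothetical helper name and made-up sample records:

```python
def rename_field(record, old, new):
    """Return a copy of `record` with key `old` renamed to `new`.

    Records that lack `old` are returned unchanged (as a copy).
    """
    renamed = dict(record)
    if old in renamed:
        renamed[new] = renamed.pop(old)
    return renamed


# Illustrative sample data, not taken from any real dataset.
records = [{"title": "Moby Dick", "year": 1851}, {"year": 1900}]
converted = [rename_field(r, "title", "my_title") for r in records]
```

Each input record is left untouched; the converted list holds the renamed copies, which could then be written out in any export format.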

Catmandu-MARC - Catmandu modules for working with MARC data

  •    Perl

With Catmandu, LibreCat tools abstract digital library and research services as data warehouse processes. As stores we reuse MongoDB or ElasticSearch providing us with developer friendly APIs. Catmandu works with international library standards such as MARC, MODS and Dublin Core, protocols such as OAI-PMH, SRU and open repositories such as DSpace and Fedora. And, of course, we speak the evolving Semantic Web. Follow us on http://librecat.org and read an introduction into Catmandu data processing at https://github.com/LibreCat/Catmandu/wiki.

LibreCat - A publication management system

  •    Perl

Development started in 2013 in Bielefeld, and the code was made available on GitHub from the start. Since 2015 the code has been in production at Bielefeld. In 2016 Ghent University started using the cataloging backend in production.

librisxl - Libris XL

  •    Groovy

The applications above depend on the Whelk Core repository. Core metadata to be loaded is managed in the definitions repository.

edtf-humanize - This gem adds a humanize method to EDTF dates.

  •    Ruby

The EDTF-humanize gem adds a humanize method to EDTF::Decade, EDTF::Century, EDTF::Interval, EDTF::Set, EDTF::Season, EDTF::Unknown, and Date (ISO 8601 compliant) objects. It uses the edtf-ruby gem for parsing EDTF date strings into Date and EDTF objects. See the edtf-ruby project's documentation for details about supported EDTF string formats and other implementation details.

metadata-qa-marc - Metadata assessment for MARC records

  •    Java

A metadata quality assurance framework. It checks metrics of metadata records such as completeness, uniqueness, and a problem catalog.

iromlab - Loader software for automated imaging of optical media with Nimbie disc robot

  •    HTML

Iromlab (Image and Rip Optical Media Like A Boss) provides a simple and straightforward way to save the content of offline optical media from the KB collection. Internally it wraps around a number of widely used software tools. Iromlab automatically detects whether a carrier contains data, audio, or both. The content of data sessions is extracted and saved as an ISO image using IsoBuster. For audio sessions all tracks are ripped to WAV or FLAC format with dBpoweramp. Iromlab supports 'enhanced' audio CDs that contain both audio and data sessions. ('Mixed mode' discs, which contain one session that holds both audio and data tracks, are currently not supported.) ISO images are verified using Isolyzer. Audio files are checked for errors using shntool (WAV format) or flac (FLAC format).