Montreal-Forced-Aligner - Command line utility for forced alignment using Kaldi

  •        93

The Montreal Forced Aligner is a command line utility for performing forced alignment of speech datasets using Kaldi. Please see the documentation for installation and usage.



Related Projects

aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  •    Python

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
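A synchronization map of this kind can be pictured as a list of timed text fragments. The sketch below is illustrative only: the `Fragment` class, the `fragment_at` helper, and the timings are hypothetical, and none of them is taken from aeneas's own API or output format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    """One entry of a (hypothetical) sync map: a text fragment and its time span."""
    begin: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def fragment_at(sync_map: List[Fragment], t: float) -> Optional[Fragment]:
    """Return the fragment being narrated at time t, or None if there is none.

    Assumes sync_map is sorted by begin time and fragments do not overlap.
    """
    starts = [f.begin for f in sync_map]
    i = bisect_right(starts, t) - 1  # last fragment starting at or before t
    if i >= 0 and t < sync_map[i].end:
        return sync_map[i]
    return None


# Made-up timings for three text fragments
sync_map = [
    Fragment(0.0, 2.5, "First fragment"),
    Fragment(2.5, 5.0, "Second fragment"),
    Fragment(5.0, 9.0, "Third fragment"),
]
```

With a map like this, a player could highlight the text that matches the current playback position by calling `fragment_at(sync_map, position)`.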

vhost - virtual domain hosting

  •    Javascript

Create a new middleware function that hands off the request to handle when the incoming host for the request matches hostname. The function is called as handle(req, res, next), like a standard middleware. hostname can be a string or a RegExp object. When hostname is a string, it can contain * to match one or more characters in that section of the hostname. When hostname is a RegExp, it is forced to be case-insensitive (since hostnames are) and to match from the start to the end of the hostname.
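The matching rules described above can be sketched in a few lines. This is a Python approximation of the described behavior, not vhost's actual JavaScript code; the function name `hostname_matcher` is hypothetical, and treating `*` as "one or more non-dot characters" is an assumption drawn from the phrase "in that section of the hostname".

```python
import re


def hostname_matcher(hostname):
    """Build a case-insensitive, fully anchored pattern from a string or regex."""
    if isinstance(hostname, re.Pattern):
        # Reuse the regex source; case-insensitivity and anchoring are forced below
        source = hostname.pattern
    else:
        # Escape the literal hostname, then turn each escaped '*' into a
        # capture group matching one or more characters within one label
        source = re.escape(hostname).replace(r"\*", "([^.]+)")
    return re.compile(r"\A(?:" + source + r")\Z", re.IGNORECASE)


wildcard = hostname_matcher("*.example.com")
pattern = hostname_matcher(re.compile(r"api\d+\.example\.com"))
```

Here `wildcard` accepts `foo.example.com` in any letter case but rejects the bare `example.com`, and `pattern` cannot match a longer host that merely contains `api2.example.com`, because the whole hostname has to match.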

bwa - Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

  •    C

Note: minimap2 has replaced BWA-MEM for PacBio and Nanopore read alignment. It retains all major BWA-MEM features, but is ~50 times as fast, more versatile, more accurate and produces better base-level alignment. BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features, such as support for long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100bp Illumina reads.

haven - Haven is for people who need a way to protect their personal spaces and possessions without compromising their own privacy, through an Android app and on-device sensors

  •    Java

Haven is for people who need a way to protect their personal spaces and possessions without compromising their own privacy. It is an Android application that leverages on-device sensors to provide monitoring and protection of physical spaces. Haven turns any Android phone into a motion, sound, vibration and light detector, watching for unexpected guests and unwanted intruders. We designed Haven for investigative journalists, human rights defenders, and people at risk of forced disappearance to create a new kind of herd immunity. By combining the array of sensors found in any smartphone, with the world's most secure communications technologies, like Signal and Tor, Haven prevents the worst kind of people from silencing citizens without getting caught in the act. View our full Haven App Overview presentation for more about the origins and goals of the project.

Dictionary Maker

  •    Java

DictionaryMaker is a graphical tool for creating electronic pronunciation dictionaries (for natural languages). The system allows a user to develop a pronunciation dictionary without requiring expert linguistic knowledge or programming expertise.

Tag Aligner

  •    C++

Parallel text aligner designed to generate translation memories (TMX files) from two files tagged with any kind of XML-based tags. The application uses the tag structure and the text block length to perform the alignment.

g2p-seq2seq - G2P with Tensorflow

  •    Python

The tool performs Grapheme-to-Phoneme (G2P) conversion using a transformer model from the tensor2tensor toolkit [1]. Many approaches to sequence modeling and transduction problems use recurrent neural networks. The transformer architecture, however, eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output [2]. This implementation is based on Python TensorFlow, which allows efficient training on both CPU and GPU.

codetriage - Discover the best way to get started contributing to Open Source projects

  •    Ruby

When patients come into the emergency room, they don't see a doctor immediately; they go to a triage nurse. The nurse knows enough about medical problems to properly assign that person to the doctor who can help them the quickest. Since the doctors are the most limited resource, triage nurses help to assign them as effectively as possible. Triage in open source means looking at open issues and adding useful information for maintainers. While you might not maintain a repository, you can help those who do by diagnosing issues and reviewing pull requests. Triage is an important part of open source. It can be difficult to keep up with bugs and assess the validity of contributions. Code introduced to fix one problem can easily generate more problems than it solves, so it's important for maintainers to look closely at bug reports and code contributions. Unfortunately, as the size of a project grows, the demands placed on the maintainers grow. This means they are forced to choose between spending enormous amounts of time reviewing each GitHub issue, only skimming over issues, or worse, ignoring issues.

Connection Manager

  •    Python

Do you have to connect to a lot of systems, using ssh, telnet, vnc and/or rdesktop? The Connection Manager lets you configure all of these connections, and access them by name, so you aren't forced to remember how to connect to each system.

Tails - Live Operating System supports Privacy and Anonymity

  •    C

Tails is a live operating system that you can start on almost any computer from a DVD, USB stick, or SD card. It aims to preserve your privacy and anonymity, and helps you use the Internet anonymously and circumvent censorship. All connections to the Internet are forced to go through the Tor network. It leaves no trace on the computer you are using unless you explicitly ask it to. It uses cryptographic tools to encrypt your files, emails and instant messaging.

InfusionSoftDotNet Library


This library provides a DLL to ease the pain for .NET developers of accessing the InfusionSoft API. No longer will you need to "roll your own" code to access the API, nor are you forced to use PHP (the library provided by InfusionSoft).

SimplePHPEasyPlus - A simple, pragmatic numeric operation api written in PHP. It does addition.

  •    PHP

In the early stages of the Internet, developers were forced to work with poor, dry, imperative, horrific languages. Everything had to be done through austere functions and operators. There were no objects. No interfaces. No dependency injection. For example, to make something as simple as an addition, our dads had to write: 1+1. Yeah, really.

trail-map - Trails to help designers and developers learn various topics.

  • has forced us to formalize our answers to such questions. This repository contains trails to help designers and developers learn.

Task Coach

  •    Objective-C

Free flexible open source todo manager featuring hierarchical tasks

VSCodeNotebook - 📝 Use VS Code as a reliable note-taking/journal application

  •    Python

VSCode Notebook is an attempt to use VSCode as a complete note taking application. This is a VSCode port of the popular SublimeNotebook project. Because of these reasons, I had to lose my notes a number of times and was forced to start from scratch. This was frustrating, and finally, I decided to do something about it.

commit - 📈 Level up your dev skills every day.


Please note, this repo is now retired. Because the size of the git tree became unmanageable, GitHub asked us to discontinue use of this repo: they were forced to perform manual maintenance, and it was disrupting the overall GitHub service. This repo previously logged the progress of the Enki community's personal learning habit, tracking progress with a commit for each workout. Commit to a daily habit and learn something new each day.

Tox - The future of online communications.

  •    C

With the rise of government surveillance programs, Tox, a FOSS initiative, aims to be an easy to use, all-in-one communication platform that ensures full privacy and secure message delivery. Tox must use UDP simply because hole punching with TCP is not as reliable. However, Tox does use TCP relays as a fallback if it encounters a firewall that prevents UDP hole punching.

Champollion Tool Kit


Built around LDC's champollion sentence aligner kernel, Champollion Tool Kit (CTK) aims to provide ready-to-use parallel text sentence alignment tools for as many language pairs as possible.

Custom SharePoint List Item Attachments versions


Recently, in one of my engagements, I worked on a custom requirement to maintain separate file versions for SPListItem attachments. This forced me to publish the code so I could share it with the community.

jsgamebench - Exercise web browsers under game-like conditions

  •    Javascript

This is an archived project and is not currently being developed by Facebook. Please do not file issues or pull requests against this repo. If you wish to continue developing this code yourself, we recommend you fork it. For each render path, JSGameBench draws as many moving, animating sprites as possible at 30fps against a background with both axis-aligned and rotated sprites. We try both because significant performance differences between the two indicate flaws or oversights in current rendering techniques. More importantly, while animation can be used instead of sprite rotations, it is often an unacceptable trade-off that game developers should not be forced to make.