sonus - :speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection

  •        161

Sonus lets you quickly and easily add a VUI (Voice User Interface) to any hardware or software project. Just like Alexa, Google Now, and Siri, Sonus is always listening offline for a customizable hotword. Once that hotword is detected, your speech is streamed to the cloud recognition service of your choice, and you get the results back. Running npm install is generally sufficient; this module, however, also requires you to install SoX.

https://github.com/evancohen/sonus#readme

Dependencies:

@google-cloud/speech : ^0.10.3
node-record-lpcm16 : ^0.3.0
snowboy : ^1.2.0

Related Projects

Porcupine - On-device wake word detection engine powered by deep learning.

  •    C

Try out Porcupine using its interactive web demo (you need a working microphone) or by downloading its Android demo application. The demo application allows you to test Porcupine on a variety of wake words in any environment.

voice-elements - :speaker: Web Component wrapper to the Web Speech API, that allows you to do voice recognition and speech synthesis using Polymer

  •    HTML

Web Component wrapper to the Web Speech API, that allows you to do voice recognition (speech to text) and speech synthesis (text to speech) using Polymer.

SpeechKITT - 🗣 A flexible GUI for Speech Recognition

  •    Javascript

Speech KITT makes it easy to add a GUI to sites using Speech Recognition. Whether you are using annyang, a different library, or webkitSpeechRecognition directly, KITT will take care of the GUI. Speech KITT provides a graphical interface for the user to start or stop Speech Recognition and see its current status. It can also help guide the user on how to interact with your site using their voice, providing instructions and sample commands. It can even be used to carry on a natural conversation with the user, asking questions the user can answer by voice, and then asking follow-up questions.

annyang - :speech_balloon: Speech recognition for your site

  •    Javascript

A tiny JavaScript SpeechRecognition library that lets your users control your site with voice commands. annyang has no dependencies, weighs just 2 KB, and is free to use and modify under the MIT license.


Jovo Framework - Build cross-platform voice applications for Amazon Alexa and Google Home

  •    Javascript

Jovo is the first open source framework that lets you build voice apps for both Amazon Alexa and Google Assistant with only one code base. Besides cross-platform development, Jovo also offers a variety of integrations and easy prototyping capabilities.

Voice Conference Manager

  •    Java

Voice Conference Manager uses VoiceXML and CCXML to control speech recognition, text to speech, and voice biometrics for a telephone conference service. Say the names or numbers of people and VCM places them into the call. It can be hosted on public servers.

juliusjs - A speech recognition library for the web

  •    Javascript

JuliusJS is an opinionated port of Julius to JavaScript. It actively listens to the user to transcribe what they are saying through a callback.

Stephanie - Open-source platform built specifically for voice-controlled applications as well as to automate daily tasks imitating much of a virtual assistant's work

  •    Python

Stephanie is an open-source platform built specifically for voice-controlled applications as well as to automate daily tasks, imitating much of a virtual assistant's work. Use your voice to ask for information, update social networks, get weather updates, live football scores, movie information, and restaurant suggestions, write a note, or even chit-chat for fun, and much more.

autosub - Command-line utility for auto-generating subtitles for any video file

  •    Python

Autosub is a utility for automatic speech recognition and subtitle generation. It takes a video or an audio file as input, performs voice activity detection to find speech regions, makes parallel requests to Google Web Speech API to generate transcriptions for those regions, (optionally) translates them to a different language, and finally saves the resulting subtitles to disk. It supports a variety of input and output languages (to see which, run the utility with the argument --list-languages) and can currently produce subtitles in either the SRT format or simple JSON.
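As a rough illustration of the last step in that pipeline (saving transcribed speech regions as SRT), here is a minimal Python sketch. It is not Autosub's own code: the function names `srt_timestamp` and `to_srt` and the `(start, end, text)` segment tuples are hypothetical, and the sketch assumes the regions have already been detected and transcribed.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT cues numbered from 1.

    The segment tuples are a hypothetical shape chosen for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "hello"), (2.0, 3.0, "world")]))
```

Each cue consists of a sequence number, a start and end timestamp joined by `-->`, and the cue text, which is the basic layout SRT players expect.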

Google Speech Recognition Example

  •    

Google Speech Recognition contains a working example of an application that uses the Google speech recognition API. The app contains all the DLLs necessary to record, decode, and send your voice request to the Google service and receive a text representation of what you've said. It's developed i...

tensorflow-speech-recognition - 🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

  •    Python

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks. Replaces caffe-speech-recognition; see that project for some background.

Simon - Speech Recognition and Dictation System

  •    C++

Simon is an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect. It is a real dictation system.

Mycroft - an Artificial intelligence for everyone

  •    Python

Mycroft is an Artificial intelligence for everyone. It uses open software to process natural language, determine your intent, and take action. It can integrate a host of professional functions: control scenes to conserve power, or grant office access with your voice. It can control all of your media and devices with the sound of your voice. Adjust your thermostat, turn on your lights, water your lawn, play your favorite movie, and a lot more.

Deeplearning4J - Neural Net Platform in Java and Scala

  •    Java

Deeplearning4J is an open source, distributed neural net library written in Java and Scala. It integrates with Hadoop and Spark and runs on several backends that enable use of CPUs and GPUs. It provides versatile n-dimensional array class for Java and Scala.

stt-benchmark - speech to text benchmark framework

  •    Python

This is a minimalist and extensible framework for benchmarking different speech-to-text engines. It has been developed and tested on Ubuntu 18.04 with Python 3.6. This framework has been developed by Picovoice as part of the Cheetah project. Cheetah is Picovoice's speech-to-text engine, specifically designed for IoT applications. Deep learning has been the main driver of recent improvements in speech recognition, but because of the stringent compute and storage limitations of IoT platforms, those improvements have mostly benefited cloud-based engines. Picovoice's proprietary deep learning technology enables transferring these improvements to IoT platforms with a much lower CPU/memory footprint. The goal is to be able to run Cheetah on any platform with a C compiler and a few MB of memory.

VocalKit - Objective-C shim layer for Speech Recognition

  •    C

VocalKit is a wrapper for available open source speech-related packages. Its goal is to ease the development of voice recognition solutions for the iPhone by providing a nice, simple Objective-C API. Currently VocalKit is in an Alpha version and just wraps Pocket Sphinx. When enabled, it will post notifications for the recognized speech. It does not currently configure Pocket Sphinx programmatically; it just configures ps from a file. It also does not trigger speech processing when the user stops speaking. Help from anyone interested is greatly appreciated.

Festvox - Builds New Synthetic Voices

  •    C++

The Festvox project aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Festvox is the base for most of the Speech Synthesis libraries.

Concrete Voice - Complete Text to Speech System

  •    

Concrete Voice is a text-to-speech solution using Microsoft text-to-speech technologies. I started this project because I could not find a quality text-to-speech program to use. The commercial products are embarrassing to think they would ask money for something I would not ev...

Julius - Large Vocabulary CSR Engine

  •    C

"Julius" is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. Based on word N-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 60k-word dictation task.