The primary purpose of this benchmark framework is to provide a scientific comparison between different wake-word detection engines in terms of accuracy and runtime metrics. Currently, the framework is configured for Alexa as the test wake word, but it can be configured for any other wake word as described here. Common Voice is used as the background dataset, i.e., a dataset without utterances of the wake word. It can be downloaded from here. Only recordings with at least two up-votes and no down-votes are used (this reduces the size of the dataset to ~125 hours).
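
The vote filter described above can be sketched roughly as follows. This is a minimal illustration, not the framework's own code: it assumes the Common Voice metadata is a tab-separated file with `path`, `up_votes`, and `down_votes` columns (column names and delimiter may differ between dataset releases), and the helper names `is_clean` and `select_clean_recordings` are hypothetical.

```python
import csv
import io


def is_clean(row, min_up_votes=2, max_down_votes=0):
    """Return True if a metadata row passes the vote filter.

    Assumes the row has 'up_votes' and 'down_votes' fields (an assumption
    about the metadata layout, not something the framework guarantees).
    """
    return (
        int(row["up_votes"]) >= min_up_votes
        and int(row["down_votes"]) <= max_down_votes
    )


def select_clean_recordings(metadata_file, delimiter="\t"):
    """Yield the audio path of every recording that passes the vote filter."""
    reader = csv.DictReader(metadata_file, delimiter=delimiter)
    for row in reader:
        if is_clean(row):
            yield row["path"]


# Tiny in-memory stand-in for a metadata file (made-up rows).
SAMPLE = (
    "path\tup_votes\tdown_votes\n"
    "a.mp3\t2\t0\n"
    "b.mp3\t1\t0\n"
    "c.mp3\t5\t1\n"
    "d.mp3\t3\t0\n"
)

selected = list(select_clean_recordings(io.StringIO(SAMPLE)))
print(selected)  # ['a.mp3', 'd.mp3']
```

With a real download, the same function could be given an open file handle for the dataset's metadata file instead of the in-memory sample.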