
Language Detection - Language Detection Library in Java

  •    Java

This is a language detection library implemented in plain Java. It detects the language of a text using a naive Bayesian filter and achieves over 99% precision for 53 languages.
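
As a rough illustration of how a naive Bayesian filter can pick a language, here is a minimal sketch in Python. It is not the library's Java API: the character-bigram features, add-one smoothing, uniform prior, class name, and tiny training strings are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangDetector:
    """Toy detector (hypothetical class, not part of any library)."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # every n-gram seen in any language

    def train(self, lang, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = self.totals.get(lang, 0) + len(grams)
        self.vocab.update(grams)

    def detect(self, text):
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, counts in self.counts.items():
            # Uniform prior assumed, so only the likelihood terms are summed.
            score = 0.0
            for gram in char_ngrams(text, self.n):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((counts[gram] + 1) / (self.totals[lang] + vocab_size))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


detector = NaiveBayesLangDetector()
detector.train("en", "the quick brown fox jumps over the lazy dog and this is the language")
detector.train("de", "der schnelle braune fuchs springt über den faulen hund und das ist die sprache")
print(detector.detect("this is the house"))
print(detector.detect("das ist der hund"))
```

A real detector would be trained on far larger per-language profiles; the two sentences here only make the scoring step visible.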




tensorflow_fasttext - Simple embedding based text classifier inspired by fastText, implemented in tensorflow

  •    Python

This project is based on the ideas in Facebook's fastText but is implemented in TensorFlow; it is not an exact replica of fastText. Classification works by embedding each word, taking the mean embedding over the full text, and passing that mean to a linear classifier. The embeddings are trained together with the classifier. You can also enable 2+ character n-grams, which are hashed and then embedded in a similar manner to the original words. Note that n-grams make training much slower but yield only marginal improvements in performance, at least in English.
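
The forward pass described above (embed, average, classify linearly, with hashed n-grams as extra features) can be sketched in plain Python as follows. This is not the project's TensorFlow code: the vocabulary, dimensions, random untrained weights, CRC32 hashing, and character bigrams are all assumptions chosen for illustration, and training is omitted.

```python
import math
import random
import zlib

EMBED_DIM = 4     # assumed embedding size
NUM_BUCKETS = 50  # assumed number of hash buckets for n-grams
NUM_CLASSES = 2   # assumed number of output classes

rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


vocab = {"good": 0, "bad": 1, "movie": 2}          # hypothetical vocabulary
word_emb = random_matrix(len(vocab), EMBED_DIM)    # one vector per known word
ngram_emb = random_matrix(NUM_BUCKETS, EMBED_DIM)  # one vector per hash bucket
weights = random_matrix(NUM_CLASSES, EMBED_DIM)    # linear classifier weights
biases = [0.0] * NUM_CLASSES


def char_ngrams(word, n=2):
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bucket(ngram):
    # CRC32 is a stable hash; Python's built-in hash() varies between runs.
    return zlib.crc32(ngram.encode("utf-8")) % NUM_BUCKETS


def embed_text(text, use_ngrams=True):
    """Mean of word embeddings plus (optionally) hashed n-gram embeddings."""
    vectors = []
    for word in text.lower().split():
        if word in vocab:
            vectors.append(word_emb[vocab[word]])
        if use_ngrams:
            vectors.extend(ngram_emb[bucket(g)] for g in char_ngrams(word))
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict_proba(text):
    """Linear layer over the mean embedding, followed by a softmax."""
    mean = embed_text(text)
    logits = [sum(w * x for w, x in zip(row, mean)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(predict_proba("good movie"))
```

Because the weights are random, the printed probabilities carry no meaning; in the real project the embeddings and the classifier are learned jointly from labeled text.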

witch-language - Easy language identification of 380 languages

  •    Python

Massively multilingual, easy language identification. It currently works on 380 languages. Pass the --help command-line flag to see more options. Python 3 gives much better performance than Python 2 for this task.