This is a language detection library implemented in plain Java. It detects the language of a text using a naive Bayesian filter, with over 99% precision for 53 languages.
language-identification language-detection natural-language-processing nlp
jchardet is a Java port of the source of Mozilla's automatic charset detection algorithm.
language-identification language-detection text-catagorization internationalization charset charset-detection
TextCat, written in Perl, helps to identify 69 natural languages.
language-identification language-detection text-catagorization
juniversalchardet is a Java port of "universalchardet", the encoding detector library of Mozilla.
charset charset-detection encoding language-identification language-detection
This project is based on the ideas in Facebook's fastText but implemented in TensorFlow. However, it is not an exact replica of fastText. Classification is done by embedding each word, taking the mean embedding over the full text, and classifying that mean with a linear classifier. The embedding is trained together with the classifier. You can also choose to use character n-grams of two or more characters. These n-grams are hashed and then embedded in a similar manner to the original words. Note that n-grams make training much slower but yield only marginal improvements in performance, at least in English.
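The classification scheme described above can be illustrated with a minimal pure-Python sketch of the forward pass only. This is not the project's code: it does not use TensorFlow, the weights are random and untrained, and every name and constant here (EMBED_DIM, NUM_BUCKETS, NUM_CLASSES, bucket, features, classify) is hypothetical. As a simplification, whole words are hashed into the same embedding table as the character bigrams rather than looked up in a vocabulary.

```python
import random
import zlib

EMBED_DIM = 8        # hypothetical embedding size
NUM_BUCKETS = 1000   # hypothetical number of hash buckets
NUM_CLASSES = 3      # hypothetical number of output labels


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, e.g. 'cat' -> ['ca', 'at']."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bucket(token):
    """Map a token to a bucket index; crc32 is stable across runs, unlike hash()."""
    return zlib.crc32(token.encode("utf-8")) % NUM_BUCKETS


def features(text, use_ngrams=True):
    """Bucket indices for every word, plus its character bigrams if enabled."""
    ids = []
    for word in text.lower().split():
        ids.append(bucket("w:" + word))  # assumption: words are hashed too
        if use_ngrams:
            ids.extend(bucket("g:" + gram) for gram in char_ngrams(word))
    return ids


def mean_embedding(ids, table):
    """Average the embedding rows selected by ids (zero vector if none)."""
    if not ids:
        return [0.0] * EMBED_DIM
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)]


def classify(text, table, weights, biases):
    """Apply a linear classifier to the mean embedding; return the best class index."""
    mean = mean_embedding(features(text), table)
    scores = [sum(w * x for w, x in zip(row, mean)) + b
              for row, b in zip(weights, biases)]
    return scores.index(max(scores))


# Random, untrained parameters: the predicted class is meaningless until trained.
rng = random.Random(0)
table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_BUCKETS)]
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES
print(classify("language identification is fun", table, weights, biases))
```

In a real model the embedding table and the classifier weights would be learned jointly by gradient descent. Enabling n-grams multiplies the number of lookups per word, which fits the note above that n-grams slow training down.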
fasttext tensorflow language-identification text-classifier
Massively multilingual, easy language identification. It currently works on 380 languages. You can add the --help command-line flag to see more options. Using Python3 gives much better performance than Python2 for this task.
language-identification bad-pun