CLD - Language Detector library ported from Chrome browser


This is a straight port of the CLD (Compact Language Detector) library embedded in Google's Chromium browser. The library detects the language of the UTF-8 text it is given (plain text or HTML). It is implemented in C++, with very basic Python bindings.
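To give a feel for what "detecting the language of a text" involves, here is a minimal sketch in Python. It is not CLD's algorithm and does not use CLD's Python bindings; it is a toy character-trigram comparison, and the names `SAMPLES`, `PROFILES`, `trigrams`, and `detect`, as well as the tiny sample sentences, are assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny reference samples (ASCII only).
# A production detector such as CLD relies on far larger precomputed tables.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the way "
          "that we read and write the text",
    "fr": "le renard brun saute par dessus le chien paresseux et nous lisons "
          "et nous ecrivons le texte que vous voulez",
    "de": "der schnelle braune fuchs springt ueber den faulen hund und wir "
          "lesen und schreiben den text fuer sie",
}


def trigrams(text):
    """Count character trigrams in lowercased, whitespace-normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# One trigram profile per language, built once from the samples above.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect(text):
    """Return the language code whose profile shares the most trigrams with text."""
    grams = trigrams(text)
    scores = {
        lang: sum(min(count, profile[gram]) for gram, count in grams.items())
        for lang, profile in PROFILES.items()
    }
    # With no overlap at all, every score is 0 and the first key is returned.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for phrase in ("the dog and the fox",
                   "der hund und der fuchs",
                   "le chien et le renard"):
        print(phrase, "->", detect(phrase))
```

The sketch only ranks three languages and reports no confidence value, so it should be read as an illustration of the general idea rather than as a substitute for the library.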




Related Projects

S-Space - A scalable software library for semantic spaces

The S-Space Package is a collection of algorithms for building semantic spaces, as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.

Semantic Vectors - Creating and Searching Semantic Vectors using Lucene

The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The library is used in semantic analysis and text mining.

OpenCCG: The OpenNLP CCG Library

OpenCCG, the OpenNLP CCG Library, is a collection of natural language processing components and tools which provide support for parsing and realization with Combinatory Categorial Grammar (CCG).

Modular Audio Recognition Framework

MARF is a general cross-platform framework, implemented in Java, with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition, along with sample applications (identification, NLP, etc.) that demonstrate its use.

OpenCog - Framework to build Artificial Intelligence Programs

The OpenCog Framework is a platform to build and share artificial intelligence programs. It includes components for procedural and declarative knowledge representation (AtomSpace), task scheduling (CogServer), AI algorithm containers (MindAgents), connectors to instant messaging and virtual world systems, and other components. MindAgents and other add-ons explore a wide variety of AI techniques including evolutionary program learning (MOSES), natural language processing, and others.


ImageMagick is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.


GraphicsMagick is the Swiss Army knife of image processing. It provides a robust and efficient collection of tools and libraries that support reading, writing, and manipulating images in over 88 major formats, including important formats like DPX, GIF, JPEG, JPEG-2000, PNG, PDF, PNM, and TIFF.


Valgrind is an award-winning instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.


OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP compon

Helsinki Finite-State Technology

The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.