
Semantic Vectors - Creating and Searching Semantic Vectors Using Lucene


The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words through statistical computations applied to a large corpus of text. The library is used for semantic analysis and text mining.
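Random Projection (also called random indexing) can be sketched in a few lines. Each document gets a sparse random index vector with a handful of +1/-1 entries, and each term vector is the sum of the index vectors of the documents that contain it, so terms used in the same documents end up pointing in similar directions. The Java below is a minimal sketch assuming a tiny in-memory corpus. It does not use Lucene or the Semantic Vectors API, and the class name, the dimension (512) and the seed count (8) are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class RandomIndexingSketch {
    // Hypothetical parameters: vector dimension and number of non-zero seeds per index vector.
    static final int DIM = 512;
    static final int SEEDS = 8;

    /** Builds a sparse ternary random index vector: SEEDS random positions set to +1 or -1. */
    static float[] indexVector(Random rnd) {
        float[] v = new float[DIM];
        for (int i = 0; i < SEEDS; i++) {
            int pos = rnd.nextInt(DIM);
            v[pos] += rnd.nextBoolean() ? 1f : -1f;
        }
        return v;
    }

    /** Term vector = sum of the index vectors of every document the term occurs in. */
    static Map<String, float[]> termVectors(String[] docs, long seed) {
        Random rnd = new Random(seed);
        Map<String, float[]> terms = new HashMap<>();
        for (String doc : docs) {
            float[] docVec = indexVector(rnd);
            for (String tok : doc.toLowerCase().split("\\W+")) {
                if (tok.isEmpty()) continue;
                float[] tv = terms.computeIfAbsent(tok, k -> new float[DIM]);
                for (int i = 0; i < DIM; i++) tv[i] += docVec[i];
            }
        }
        return terms;
    }

    /** Cosine similarity; returns 0 if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        String[] docs = {
            "lucene indexes text documents",
            "lucene searches text documents",
            "cats chase mice"
        };
        Map<String, float[]> tv = termVectors(docs, 42L);
        System.out.println(cosine(tv.get("lucene"), tv.get("text")));
        System.out.println(cosine(tv.get("lucene"), tv.get("cats")));
    }
}
```

Because "lucene" and "text" occur in exactly the same documents, their vectors are identical and their cosine is 1, while "lucene" and "cats" share no documents and score close to 0.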

S-Space - A scalable software library for semantic spaces


The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.
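The distributional idea can be illustrated with a simple co-occurrence model: each word's feature vector counts the words that appear within a fixed window around it, so words used in similar contexts get similar vectors. This is a minimal Java sketch of that generic technique, not the S-Space API; the class name, the window size of 2 and the toy sentence are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class CooccurrenceSketch {
    /** For every word, counts how often each other word appears within `window` positions of it. */
    static Map<String, Map<String, Integer>> build(String[] tokens, int window) {
        Map<String, Map<String, Integer>> vectors = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            Map<String, Integer> features = vectors.computeIfAbsent(tokens[i], k -> new HashMap<>());
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j != i) features.merge(tokens[j], 1, Integer::sum);
            }
        }
        return vectors;
    }

    /** Cosine similarity between two sparse count vectors (0 if they share no features). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        return dot == 0 ? 0 : dot / (norm(a) * norm(b));
    }

    static double norm(Map<String, Integer> v) {
        double s = 0;
        for (int c : v.values()) s += (double) c * c;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        String[] tokens = "the cat drinks milk . the dog drinks water . the cat eats fish".split(" ");
        Map<String, Map<String, Integer>> v = build(tokens, 2);
        System.out.println(v.get("cat"));
        System.out.printf("cat~dog: %.3f%n", cosine(v.get("cat"), v.get("dog")));
    }
}
```

Here "cat" and "dog" share the context words "the", "drinks" and ".", which gives a cosine of about 0.667. Practical semantic-space algorithms typically add weighting and dimensionality reduction on top of raw counts like these.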

Language Detection - Language Detection Library in Java


This is a language detection library implemented in plain Java. It detects the language of a text using a naive Bayesian filter and achieves over 99% precision for 53 languages.
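The naive Bayesian approach can be sketched as follows: count character n-grams per language in training text, then pick the language that maximises the sum of the log-probabilities of the input's n-grams. This minimal Java sketch assumes character trigrams, add-one smoothing, a uniform prior over languages and two tiny training strings. It is not the library's own implementation and does not reproduce its language profiles or API; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesLangSketch {
    static final int N = 3; // character trigrams (a hypothetical choice)

    // language -> (trigram -> count), total trigrams per language, and the shared trigram vocabulary
    final Map<String, Map<String, Integer>> counts = new HashMap<>();
    final Map<String, Integer> totals = new HashMap<>();
    final Set<String> vocab = new HashSet<>();

    /** Lower-cases the text and pads it with spaces so word boundaries also form n-grams. */
    static String pad(String text) {
        return " " + text.toLowerCase() + " ";
    }

    void train(String lang, String text) {
        String t = pad(text);
        Map<String, Integer> c = counts.computeIfAbsent(lang, k -> new HashMap<>());
        for (int i = 0; i + N <= t.length(); i++) {
            String g = t.substring(i, i + N);
            c.merge(g, 1, Integer::sum);
            totals.merge(lang, 1, Integer::sum);
            vocab.add(g);
        }
    }

    /** Returns the language maximising sum of log P(trigram | language), with add-one smoothing. */
    String detect(String text) {
        String t = pad(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        double v = vocab.size() + 1; // +1 reserves probability mass for unseen trigrams
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            double total = totals.getOrDefault(e.getKey(), 0);
            double score = 0;
            for (int i = 0; i + N <= t.length(); i++) {
                int k = e.getValue().getOrDefault(t.substring(i, i + N), 0);
                score += Math.log((k + 1) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesLangSketch nb = new NaiveBayesLangSketch();
        nb.train("en", "the quick brown fox jumps over the lazy dog and the cat");
        nb.train("de", "der schnelle braune fuchs springt auf den faulen hund und die katze");
        System.out.println(nb.detect("the dog and the fox"));
        System.out.println(nb.detect("der hund und die katze"));
    }
}
```

Working in log space keeps the product of many small probabilities from underflowing to zero on longer inputs.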

CLD - Language Detector library ported from Chrome browser


This is a straight port of the CLD (Compact Language Detector) library embedded in Google's Chromium browser. The library detects the language of UTF-8 text (plain text or HTML). It is implemented in C++, with very basic Python bindings.

Jobads-opensire - Meaning-aware text mining on job ads (open source snippets from the sire-project.e


This project will hold the open source parts of the SIRE project (namely a 'word space'-based dictionary of skills and occupations and the tools to consult it). Planned release: mid-2011. It is part of the PhD thesis I am currently writing in computational linguistics ("the semantics of job ads", Paris University 10). For more information, feel free to contact me through http://www.purl.org/net/romainloth

Clip-lm - Decision tree-based syntactic language model


This is a decision tree-based language model that can optionally use syntactic tags (e.g., part-of-speech tags) as well as other information, such as morphological features and prosody.
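A decision-tree language model estimates P(w | h) by passing the history h down a tree of yes/no questions (for example, about the previous word's part-of-speech tag) and using the word distribution stored at the leaf it reaches, so histories that answer the questions alike share statistics. This minimal Java sketch assumes a hand-built one-question tree, a history made of just the previous word and tag, and add-one smoothing. A real decision-tree LM would grow the tree from training data, a step omitted here, and nothing below is Clip-lm's code; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class DecisionTreeLmSketch {
    /** Hypothetical minimal history: the previous word and its part-of-speech tag. */
    static final class History {
        final String prevWord, prevTag;
        History(String prevWord, String prevTag) { this.prevWord = prevWord; this.prevTag = prevTag; }
    }

    /** Internal nodes ask a yes/no question about the history; leaves hold next-word counts. */
    static final class Node {
        final Predicate<History> question; // null for a leaf
        final Node yes, no;
        final Map<String, Integer> counts = new HashMap<>();
        int total;

        Node(Predicate<History> question, Node yes, Node no) {
            this.question = question; this.yes = yes; this.no = no;
        }
        Node() { this(null, null, null); }

        Node leafFor(History h) {
            if (question == null) return this;
            return question.test(h) ? yes.leafFor(h) : no.leafFor(h);
        }
    }

    final Node root;
    final Set<String> vocab = new HashSet<>();

    DecisionTreeLmSketch(Node root) { this.root = root; }

    /** Training: route the history to its leaf and count the word there. */
    void observe(History h, String word) {
        Node leaf = root.leafFor(h);
        leaf.counts.merge(word, 1, Integer::sum);
        leaf.total++;
        vocab.add(word);
    }

    /** P(word | history): add-one smoothed relative frequency at the history's leaf. */
    double prob(History h, String word) {
        Node leaf = root.leafFor(h);
        int k = leaf.counts.getOrDefault(word, 0);
        return (k + 1.0) / (leaf.total + vocab.size() + 1.0);
    }

    /** Builds a one-question tree ("is the previous tag DT?") and trains it on a toy tagged sentence. */
    static DecisionTreeLmSketch demo() {
        Node root = new Node(h -> h.prevTag.equals("DT"), new Node(), new Node());
        DecisionTreeLmSketch lm = new DecisionTreeLmSketch(root);
        String[] words = {"the", "dog", "barks", "the", "cat", "sleeps"};
        String[] tags = {"DT", "NN", "VBZ", "DT", "NN", "VBZ"};
        String prevW = "<s>", prevT = "<s>";
        for (int i = 0; i < words.length; i++) {
            lm.observe(new History(prevW, prevT), words[i]);
            prevW = words[i];
            prevT = tags[i];
        }
        return lm;
    }

    public static void main(String[] args) {
        DecisionTreeLmSketch lm = demo();
        // "a" never appeared in training, but its DT tag routes it to the same leaf as "the"
        System.out.println(lm.prob(new History("a", "DT"), "dog"));   // 0.25
        System.out.println(lm.prob(new History("dog", "NN"), "the")); // 0.3
    }
}
```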

Book-reader - Automated Book Reader for the Visually challenged


A project that will open the world of all printed books to the visually challenged. Our bold promise: by 15 August 2010, a readily downloadable book reader for Tamil and Kannada will be made available that pronounces at least 95% of words correctly. (Required hardware: a desktop computer, a scanner and speakers.)

Python-depparse - Dependency parsers written in Python


This project aims to provide readable, usable dependency parsers for natural language text (or any other domain where suitable labeled training data are available). The project includes implementations of a maximum spanning tree (MST) parser and a stack-based, shift-reduce parser. Python's multiprocessing module is used to provide data parallelism on multicore machines, so Python 2.6+ is required. Other than that, the code is self-contained; that is, no special machine learning or parsing lib
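The stack-based, shift-reduce strategy can be illustrated with the arc-standard transition system, one common variant: SHIFT moves the next word onto a stack, LEFT_ARC makes the top word the head of the word below it, and RIGHT_ARC makes the word below the head of the top word. In a trained parser a classifier chooses each transition; this minimal sketch simply applies a given transition sequence. It is written in Java to match the other examples here (python-depparse itself is Python), it is an assumption that the project's shift-reduce parser uses exactly these transitions, and the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ArcStandardSketch {
    enum Transition { SHIFT, LEFT_ARC, RIGHT_ARC }

    /**
     * Applies arc-standard transitions to a sentence of n words numbered 1..n (0 is ROOT).
     * Returns heads, where heads[i] is the head of word i (heads[0] stays -1).
     */
    static int[] parse(int n, List<Transition> transitions) {
        int[] heads = new int[n + 1];
        Arrays.fill(heads, -1);
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0); // ROOT starts on the stack
        int next = 1;  // index of the first word still in the buffer
        for (Transition t : transitions) {
            switch (t) {
                case SHIFT:
                    if (next > n) throw new IllegalStateException("SHIFT with empty buffer");
                    stack.push(next++);
                    break;
                case LEFT_ARC: {
                    // the top of the stack becomes the head of the item below it, which is removed
                    if (stack.size() < 2) throw new IllegalStateException("LEFT_ARC needs two items");
                    int top = stack.pop();
                    int below = stack.pop();
                    if (below == 0) throw new IllegalStateException("ROOT cannot be a dependent");
                    heads[below] = top;
                    stack.push(top);
                    break;
                }
                case RIGHT_ARC: {
                    // the item below the top becomes the head of the top, which is removed
                    if (stack.size() < 2) throw new IllegalStateException("RIGHT_ARC needs two items");
                    int top = stack.pop();
                    heads[top] = stack.peek();
                    break;
                }
            }
        }
        return heads;
    }

    public static void main(String[] args) {
        // "She eats fish": 1 = She, 2 = eats, 3 = fish
        List<Transition> seq = List.of(
            Transition.SHIFT, Transition.SHIFT, Transition.LEFT_ARC,
            Transition.SHIFT, Transition.RIGHT_ARC, Transition.RIGHT_ARC);
        System.out.println(Arrays.toString(parse(3, seq))); // [-1, 2, 0, 2]
    }
}
```

Each transition is constant-time, so parsing is linear in sentence length, which contrasts with MST parsing, where the parser searches over all candidate head-dependent arcs.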

Oa4j - Java Client for OpenAmplify


Overview
OA4J is a Java client for version 2.1 of the OpenAmplify web service. It requires Java 1.6+ and an API key, which is free and takes a couple of minutes to obtain.

Installation
Add oa4j-x.x.x.jar to your classpath.

Usage
import static java.lang.System.out;
import java.net.URL;
import com.linguamathematica.oa4j.Analysis;
import com.linguamathematica.oa4j.AnalysisService;
import com.linguamathematica.oa4j.DefaultAnalysisService;

public class Test {
    public sta

Tikka-postagger - Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Lo


An unsupervised Bayesian part-of-speech (POS) tagger. See http://aclweb.org/anthology-new/D/D10/D10-1020.pdf for details.
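An HMM tagger scores a tag sequence with transition probabilities P(tag | previous tag) and emission probabilities P(word | tag). Once those parameters are fixed, Viterbi decoding is a standard way to find the single most likely tag sequence. This Java sketch shows only that generic decoding step with made-up toy probabilities. It does not implement the unsupervised Bayesian learning or the context model described in the linked paper, and it is an assumption that Tikka-postagger decodes this way; the class name and parameters are hypothetical.

```java
import java.util.Arrays;

public class ViterbiSketch {
    /**
     * Viterbi decoding for an HMM, computed in log space to avoid underflow.
     * start[s] = P(first state s), trans[p][s] = P(s | p), emit[s][o] = P(o | s);
     * obs holds observation (word) indices. Returns the most probable state sequence.
     */
    static int[] viterbi(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int numStates = start.length, len = obs.length;
        if (len == 0) return new int[0];
        double[][] score = new double[len][numStates]; // best log-prob of a path ending in state s at t
        int[][] back = new int[len][numStates];         // predecessor state on that best path
        for (int s = 0; s < numStates; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < numStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < numStates; p++) {
                    double cand = score[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) { best = cand; arg = p; }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }
        // pick the best final state, then follow the back-pointers to the start
        int[] path = new int[len];
        for (int s = 1; s < numStates; s++) {
            if (score[len - 1][s] > score[len - 1][path[len - 1]]) path[len - 1] = s;
        }
        for (int t = len - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        // Toy, made-up parameters. States: 0 = DET, 1 = NOUN, 2 = VERB. Words: 0 = "the", 1 = "dog", 2 = "barks".
        double[] start = {0.8, 0.1, 0.1};
        double[][] trans = {{0.0, 0.9, 0.1}, {0.1, 0.2, 0.7}, {0.5, 0.3, 0.2}};
        double[][] emit = {{0.9, 0.05, 0.05}, {0.05, 0.6, 0.35}, {0.05, 0.25, 0.7}};
        System.out.println(Arrays.toString(viterbi(start, trans, emit, new int[]{0, 1, 2}))); // [0, 1, 2]
    }
}
```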