S-Space - A scalable software library for semantic spaces
The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly-scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantic for words as high dimensional feature vectors.
http://code.google.com/p/airhead-research/
comments powered by Disqus
Related Products
Semantic Vectors - Creating and Searching Semantic Vector using Lucene
The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. This library is used in semantic analysis and text mining.
OpenCog - Framework to build Artificial Intelligence Programs
The OpenCog Framework is a platform to build and share artificial intelligence programs. It includes components for procedural and declarative knowledge representation (AtomSpace), task scheduling (CogServer), AI algorithm containers (MindAgents), connectors to instant messaging and virtual world systems, and other components. MindAgents and other add-ons explore a wide variety of AI techniques including evolutionary program learning (MOSES), natural language processing, and others.
CLD - Language Detector library ported from Chrome browser
This is a straight port from the CLD (Compact Language Detector) library embedded in Google's Chromium browser. The library detects the language from provided UTF8 text (plain text or HTML). It is implemented in C++, with very basic Python bindings.
Language Detection - Language Detection Library in Java
This is a language detection library implemented in plain Java. It detects language of a text using naive Bayesian filter. It is 99% over precision for 53 languages.
OpenPipe - Document Pipeline
OpenPipe is an open source scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps / operations performed on a document to convert from its raw form to something ready to be put into the index.
The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction or submitting the document to a search engine.
Gate - General Architecture for Text Engineering
GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model and persist the data structure. It could analyze text or speech. It has built-in support for machine learning and also adds support for different implementation of machine learning via plugin.
Carrot2 - Search Results Clustering Engine
Carrot2 is an Open Source Search Results Clustering Engine. It could cluster the search results from various sources and generates small collection of documents. Carrot2 offers ready-to-use components for fetching search results from various sources including YahooAPI, GoogleAPI, Bing API, eTools Meta Search, Lucene, SOLR, Google Desktop and more.
Constellio - Enterprise Search engine
Constellio Open Source Enterprise Search is based on Apache Solr and using Google Search Appliances connectors architecture, it allows, with a single click, to find all relevant content in your organization (Web, email, ECM, CRM etc.).
Solr
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
ElasticSearch
ElasticSearch is an Open Source (Apache 2 license), distributed, RESTful Search Engine built for the cloud.