
This project segments text into tokens according to its context and semantics. The segmenter uses forward maximum matching and CRF (conditional random field) algorithms to split text.
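As an illustration of the forward maximum matching step mentioned above, here is a minimal sketch in Python. It is not this project's code: the function name `forward_max_match`, the `dictionary` set, and the `max_len` parameter are all hypothetical, and the CRF stage is not shown. The sketch scans left to right and, at each position, takes the longest dictionary word that starts there, falling back to a single character when nothing matches.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation (hypothetical helper, not the project's API).

    At each position, take the longest substring found in `dictionary`;
    if none matches, emit a single character and move on.
    """
    if max_len is None:
        # Assumption: the longest dictionary entry bounds the search window.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


words = {"研究", "研究生", "生命", "的", "起源"}
print(forward_max_match("研究生命的起源", words))
```

On this input the greedy rule picks 研究生 first and so cannot recover 研究 / 生命. Ambiguities of this kind are one reason a dictionary matcher is commonly paired with a statistical model such as a CRF.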





Related Projects

S-Space - A scalable software library for semantic spaces

The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.

Semantic Vectors - Creating and Searching Semantic Vectors Using Lucene

The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. This library is used in semantic analysis and text mining.

OpenCCG: The OpenNLP CCG Library

OpenCCG, the OpenNLP CCG Library, is a collection of natural language processing components and tools which provide support for parsing and realization with Combinatory Categorial Grammar (CCG).

X Simple Input Method

With xsim you can input Chinese in the X Window System using Pinyin, Wubi, or any input method you add yourself.

Modular Audio Recognition Framework

MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.

Wikipedia Miner Toolkit

The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations, a promising resource for NLP and related research.


MeCab is a fast and customizable Japanese morphological analyzer. MeCab is designed for general-purpose use and is applied to a variety of NLP tasks, such as Kana-Kanji conversion. MeCab provides parameter estimation functionality based on CRFs and HMMs.


Amine is a Multi-Layer Java Open Source Platform dedicated to the development of various kinds of Intelligent Systems (Knowledge-Based, Ontology-Based, Conceptual Graph Based, NLP, etc.) and Intelligent Agents. See: //amine-platform.sourceforge.net/

Sudokuki - essential sudoku game

Sudokuki is a free graphical SUDOKU game: Sudokuki solves even the most difficult sudoku grids for you - Generate a sudoku - Play sudoku - Print a sudoku... Available in 15 languages. Just download and play! You can also play with Arabic or Chinese numerals. Sudokuki is Free Software developed in Java. Have fun!

WenQuanYi (Spring of Letters)

This project aims to develop the most complete, standards-compliant, high-quality Chinese (and CJKV) fonts and resources, including bitmap and outline fonts of various styles. We also develop web-based tools to facilitate online font-development collaboration.