chinese-seg - A Chinese text segmentation module with some built-in plugins.


This module is inspired by node-segment but rewritten from scratch. The code is still under heavy development. DO NOT USE IT NOW.

https://github.com/jacobbubu/chinese-seg

Dependencies:

errno : ~0.1.0
hanzenkaku : ~1.0.1
xtend : ~2.1.2
deep-equal : ~0.2.1





Related Projects

Chinese-Word-Vectors - 100+ Chinese Word Vectors 上百种预训练中文词向量

  •    Python

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.


jieba - Jieba (结巴) Chinese word segmentation

  •    Python

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

gse - Go efficient text segmentation; supports English, Chinese, Japanese, and other languages. Go 语言高性能分词

  •    Go

Efficient text segmentation in Go; supports English, Chinese, Japanese, and other languages. The dictionary is implemented with a double-array trie, and the segmenter algorithm is a shortest path based on word frequency plus dynamic programming.
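The word-frequency shortest path with dynamic programming mentioned above can be illustrated with a minimal Python sketch. This is not gse's code: the toy dictionary `FREQ`, the function name `segment`, and the fixed cost for unknown single characters are all assumptions made for illustration, and a plain dict stands in for the double-array trie.

```python
import math

# Hypothetical toy dictionary (word -> frequency); not taken from gse.
FREQ = {
    "我": 50, "来到": 20, "来": 30, "到": 30, "北京": 40,
    "清华": 10, "大学": 25, "清华大学": 15, "北": 5, "京": 5,
}
TOTAL = sum(FREQ.values())
MAX_WORD_LEN = max(len(word) for word in FREQ)
# Assumed smoothing: an unknown single character costs more than any known word.
UNKNOWN_COST = -math.log(0.5 / TOTAL)


def segment(text):
    """Split text into the word sequence with the lowest total cost.

    Each word costs -log(frequency / TOTAL), so the cheapest path through
    the character positions is the most probable segmentation.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: lowest cost to segment text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last word in that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in FREQ:
                cost = -math.log(FREQ[word] / TOTAL)
            elif end - start == 1:
                cost = UNKNOWN_COST  # fall back to a single unknown character
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

With this toy dictionary, `segment("我来到北京清华大学")` prefers the longer, more frequent entries such as "清华大学" over splitting into "清华" and "大学".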

SymSpell - SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster (than the standard approach with deletes + transposes + replaces + inserts) and language independent. Lookup provides a very fast spelling correction of single words.
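The idea behind Symmetric Delete can be sketched in a few lines of Python. This is an illustrative sketch, not SymSpell's implementation or API: the function names `deletes`, `build_index`, `osa_distance`, and `lookup` are hypothetical, and candidates are verified here with the optimal string alignment variant of Damerau-Levenshtein distance. Only deletions are generated, both for dictionary words at index time and for the query at lookup time.

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(words, max_distance=2):
    """Map every delete-variant of every dictionary word back to the original words."""
    index = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def osa_distance(a, b):
    """Optimal string alignment distance (insert, delete, substitute, adjacent transpose)."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        prev2, prev = prev, cur
    return prev[len(b)]


def lookup(term, index, max_distance=2):
    """Return sorted (distance, word) pairs for dictionary words within max_distance of term."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    scored = ((osa_distance(term, word), word) for word in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


# Hypothetical sample dictionary for the sketch.
index = build_index(["hello", "help", "world", "word"])
```

Because a dictionary word and a misspelling within the distance limit always share at least one delete-variant, the lookup never needs to generate transposes, replaces, or inserts; the true distance check only filters the small candidate set.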

Awesome-Chinese-NLP - A curated list of resources for Chinese NLP 中文自然语言处理相关资料


BaiduLac by 百度 Baidu's open-source lexical analysis tool for Chinese, including word segmentation, part-of-speech tagging & named entity recognition.

rmmseg-cpp - a re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++

  •    Ruby

A re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++.

node-segment - A Chinese word segmentation module based on Node.js

  •    Javascript

A Chinese word segmentation module.

OpenCC - A project for conversion between Traditional and Simplified Chinese

  •    C++

Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for conversion between Traditional Chinese and Simplified Chinese, supporting character-level conversion, phrase-level conversion, variant conversion, and regional idioms among Mainland China, Taiwan, and Hong Kong.

Simplified Chinese Mozilla and Firefox


This project builds Simplified Chinese versions of Mozilla and Firefox for Chinese users. It also includes Simplified Chinese versions of Mozilla and Firefox add-ons.

Traditional Chinese to Simplified Chinese converter


A Python script to convert Traditional Chinese text to Simplified Chinese. A character relation table is included.




