ansj_seg - ansj分词.ict的真正java实现.分词效果速度都超过开源版的ict. 中文分词,人名识别,词性标注,用户自定义词典

  •        99

best java chinese word seg !





Related Projects

ansj_seg - ansj??.ict???java??.?????????????ict. ????,????,????,???????

  •    Java

ansj??.ict???java??.?????????????ict. ????,????,????,???????

Awesome-Chinese-NLP - A curated list of resources for Chinese NLP 中文自然语言处理相关资料


BaiduLac by 百度 Baidu's open-source lexical analysis tool for Chinese, including word segmentation, part-of-speech tagging & named entity recognition.

deepnlp - Deep Learning NLP Pipeline implemented on Tensorflow

  •    Python

Deep Learning NLP Pipeline implemented on Tensorflow. Following the 'simplicity' rule, this project aims to use the deep learning library of Tensorflow to implement new NLP pipeline. You can extend the project to train models with your own corpus/languages. Pretrained models of Chinese corpus are distributed. Free RESTful NLP API are also provided. Visit for details. 下载预训练模型 If you install deepnlp via pip, the pre-trained models are not distributed due to size restriction. You can download full models for 'Segment', 'POS' en and zh, 'NER' zh, zh_entertainment, zh_o2o, 'Textsum' by calling the download function.


  •    C++

Davepy is a Chinese Pinyin Input Method Editor (IME), which supports smoothly converting from Chinese Pinyin to Chinese Hanzi. In which some statistic language modeling approaches are introduced and some NLP technique will be added into it continually.

jieba - 结巴中文分词

  •    Python

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

cnn-text-classification-tf-chinese - CNN for Chinese Text Classification in Tensorflow

  •    Python

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post. It is slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.

gse - Go efficient text segmentation; support english, chinese, japanese and other. Go 语言高性能分词

  •    Go

Go efficient text segmentation; support english, chinese, japanese and other. Dictionary with double array trie (Double-Array Trie) to achieve, Sender algorithm is the shortest path based on word frequency plus dynamic programming.

nlp_fundamentals - 📘 Contains a series of hands-on notebooks for learning the fundamentals of NLP

  •    Jupyter

Natural language processing (NLP) has made substantial advances in the past few years due to the success of modern techniques that are based on deep learning. With the rise of the popularity of NLP and the availability of different forms of large-scale data, it is now even more imperative to understand the inner workings of NLP techniques and concepts, from first principles, as they find their way into real-world usage and applications that affect society at large. Building intuitions and having a solid grasp of concepts are both important for coming up with innovative techniques, improving research, and building safe, human-centered AI and NLP technologies. We introduce a new series called Fundamentals of NLP where we aim to teach about important NLP techniques and concepts starting from the first principles. We will introduce the theoretical aspect and motivation of each concept covered throughout the series. Then we will obtain hands-on experience by using bootstrap methods, industry-standard tools, and other open-source libraries to implement the different techniques. Along the way, we will also cover best practices, share important references, point out common mistakes to avoid when training and building NLP models, and discuss what lies ahead.


  •    Python

SegyMAT is a set of Matlab/Octave m-files to read and write SEG Y data following SEG Y Revision 0 and 1



This project is used to segment text into tokens according its context and semantic. the segment use front-maximum matching and CRF algorithms to split text.

duckling - Probabilistic parser

  •    Clojure

Duckling is a Clojure library that parses text into structured data: “the first Tuesday of October” => {:value "2014-10-07T00:00:00.000-07:00" :grain :day}<div class="doc-website-link"> <p>You can try it out at <a href=""></a></p></div>See our [blog post announcement]( for more context.Duckling is shipped with modules that parse temporal expressions i

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.