
Chinese-Word-Vectors - 100+ pre-trained Chinese word vectors

  •    Python

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.
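
As an illustration of what an analogical reasoning evaluation measures, here is a minimal sketch in pure Python. It assumes the common vector-offset scoring rule (answer "a is to b as c is to ?" with the word whose vector is closest in cosine similarity to b - a + c); the listing does not say which rule the project's toolkit uses. The function names and the toy two-dimensional vectors are hypothetical and are not taken from the project or from CA8.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset rule.

    The three question words are excluded from the candidates,
    which is the usual convention for this kind of benchmark.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    best_word, best_score = None, float("-inf")
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}
toy_questions = [("man", "woman", "king", "queen")]
```

With real embeddings, `vectors` would be loaded from a pre-trained file and `questions` from an analogy dataset; the accuracy over all questions is the score reported.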

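Jiagu - Jiagu deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering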

  •    Python

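Jiagu is a deep learning toolkit for natural language processing. It covers knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering.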




SymSpell - 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

Spelling correction and fuzzy search: 1 million times faster through the Symmetric Delete spelling correction algorithm. The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent.
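
The idea behind symmetric deletes can be sketched in a few lines of Python. This is a simplified illustration, not SymSpell's actual implementation or API: all function names are hypothetical, and the distance used for verification is the optimal string alignment variant of Damerau-Levenshtein. Only deletions are generated, both for dictionary terms (once, at index time) and for the query (at lookup time); candidates that share a delete variant are then verified with the real edit distance.

```python
def delete_variants(word, max_distance):
    """Return the word plus every string formed by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def build_index(dictionary, max_distance):
    """Map each delete variant to the set of dictionary terms that produce it."""
    index = {}
    for term in dictionary:
        for variant in delete_variants(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, replace, adjacent transpose."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def lookup(query, index, max_distance):
    """Return sorted (distance, term) pairs within max_distance of the query."""
    candidates = set()
    for variant in delete_variants(query, max_distance):
        candidates |= index.get(variant, set())
    # Sharing a delete variant is necessary but not sufficient,
    # so each candidate is checked against the true edit distance.
    scored = ((osa_distance(query, term), term) for term in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world"], max_distance=2)
suggestions = lookup("helo", index, max_distance=2)
```

The cost of generating inserts, replaces, and transposes for every query is traded for a larger precomputed index, which is where the speed-up described above comes from.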


sub-character-cws - Sub-Character Representation Learning

  •    Python

Code and corpora for the paper "Dual Long Short-Term Memory Networks for Sub-Character Representation Learning" (accepted at ITNG 2018). We propose learning character-level and sub-character-level representations jointly to capture deeper levels of semantic meaning. Applied to Chinese Word Segmentation as a case study, our solution achieved state-of-the-art results on both Simplified and Traditional Chinese, without extra Traditional-to-Simplified Chinese conversion.

friso - High-performance Chinese tokenizer with GBK and UTF-8 charset support, developed in ANSI C

  •    C

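Friso is an open-source, high-performance Chinese word segmenter written in C and implemented with the popular mmseg algorithm. Its design and implementation are fully modular, so it can easily be embedded in other programs such as MySQL and PHP. The source code compiles on a variety of platforms without modification. After loading 200,000 dictionary entries, memory usage holds steady at 14.5 MB.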

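PKUSeg-python - Python version: a high-accuracy Chinese word segmentation tool that is simple to use and substantially improves segmentation accuracy over existing open source tools.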

  •    Python

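Python version: a high-accuracy Chinese word segmentation tool that is simple to use and substantially improves segmentation accuracy over existing open source tools.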





