gonlpir - Golang wapper for NLPIR/ICTCLAS2015.

  •        12

Golang wapper for NLPIR/ICTCLAS2015.




Related Projects

SymSpell - SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

  •    CSharp

The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster (than the standard approach with deletes + transposes + replaces + inserts) and language independent. Lookup provides a very fast spelling correction of single words.

Chinese-Word-Vectors - 100+ Chinese Word Vectors 上百种预训练中文词向量

  •    Python

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.

rmmseg-cpp - an re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++

  •    Ruby

an re-implementation of rmmseg (Chinese word segmentation library for Ruby) in C++

Thai word segmentation utility

  •    Ruby

Word segmentation utility for Thai language written in C

JVnSegmenter: Vietnamese Word Segmenter

  •    Java

JVnSegmenter is a Java-based and open-source Vietnamese word segmentation tool. The segmentation model was trained on about 8,000 sentences using Conditional Random Fields (FlexCRFs). This tool would be useful for Vietnamese NLP community.

NCRFpp - NCRF++, an Open-source Neural Sequence Labeling Toolkit

  •    Python

Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. State-of-the-art sequence labeling models mostly utilize the CRF structure with input word features. LSTM (or bidirectional LSTM) is a popular deep learning based feature extractor in sequence labeling task. And CNN can also be used due to faster computation. Besides, features within word are also useful to represent word, which can be captured by character LSTM or character CNN structure or human-defined neural features. NCRF++ is a PyTorch based framework with flexiable choices of input features and output structures. The design of neural sequence labeling models with NCRF++ is fully configurable through a configuration file, which does not require any code work. NCRF++ is a neural version of CRF++, which is a famous statistical CRF framework.

wordvectors - Pre-trained word vectors of 30+ languages

  •    Python

This project has two purposes. First of all, I'd like to share some of my experience in nlp tasks such as segmentation or word vectors. The other, which is more important, is that probably some people are searching for pre-trained word vector models for non-English languages. Alas! English has gained much more attention than any other languages has done. Check this to see how easily you can get a variety of pre-trained English word vectors without efforts. I think it's time to turn our eyes to a multi language version of this. Nearing the end of the work, I happened to know that there is already a similar job named polyglot. I strongly encourage you to check this great project. How embarrassing! Nevertheless, I decided to open this project. You will know that my job has its own flavor, after all.

jieba - 结巴中文分词

  •    Python

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

gse - Go efficient text segmentation; support english, chinese, japanese and other. Go 语言高性能分词

  •    Go

Go efficient text segmentation; support english, chinese, japanese and other. Dictionary with double array trie (Double-Array Trie) to achieve, Sender algorithm is the shortest path based on word frequency plus dynamic programming.

sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.

  •    C++

SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training. SentencePiece implements sub-word units (also known as wordpieces [Wu et al.] [Schuster et al.] and byte-pair-encoding (BPE) [Sennrich et al.]) with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing.This is not an official Google product.

LightNet - LightNet: Light-weight Networks for Semantic Image Segmentation (Cityscapes and Mapillary Vistas Dataset)

  •    Python

This repository contains the code (in PyTorch) for: "LightNet: Light-weight Networks for Semantic Image Segmentation " (underway) by Huijun Liu @ TU Braunschweig. Semantic Segmentation is a significant part of the modern autonomous driving system, as exact understanding the surrounding scene is very important for the navigation and driving decision of the self-driving car. Nowadays, deep fully convolutional networks (FCNs) have a very significant effect on semantic segmentation, but most of the relevant researchs have focused on improving segmentation accuracy rather than model computation efficiency. However, the autonomous driving system is often based on embedded devices, where computing and storage resources are relatively limited. In this paper we describe several light-weight networks based on MobileNetV2, ShuffleNet and Mixed-scale DenseNet for semantic image segmentation task, Additionally, we introduce GAN for data augmentation[17] (pix2pixHD) concurrent Spatial-Channel Sequeeze & Excitation (SCSE) and Receptive Field Block (RFB) to the proposed network. We measure our performance on Cityscapes pixel-level segmentation, and achieve up to 70.72% class mIoU and 88.27% cat. mIoU. We evaluate the trade-offs between mIoU, and number of operations measured by multiply-add (MAdd), as well as the number of parameters.

robot-surgery-segmentation - Wining solution and its improvement for MICCAI 2017 Robotic Instrument Segmentation Sub-Challenge

  •    Jupyter

Here we present our wining solution and its improvement for MICCAI 2017 Robotic Instrument Segmentation Sub-Challenge. In this work, we describe our winning solution for MICCAI 2017 Endoscopic Vision Sub-Challenge: Robotic Instrument Segmentation and demonstrate further improvement over that result. Our approach is originally based on U-Net network architecture that we improved using state-of-the-art semantic segmentation neural networks known as LinkNet and TernausNet. Our results shows superior performance for a binary as well as for multi-class robotic instrument segmentation. We believe that our methods can lay a good foundation for the tracking and pose estimation in the vicinity of surgical scenes.

semantic-segmentation-pytorch - Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

  •    Python

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset. This module differs from the built-in PyTorch BatchNorm as the mean and standard-deviation are reduced across all devices during training. The importance of synchronized batch normalization in object detection has been recently proved with a an extensive analysis in the paper MegDet: A Large Mini-Batch Object Detector, and we empirically find that it is also important for segmentation.

Arab Techies

  •    Javascript

A collection of open source libraries and tools that provide solutions for common problems in processing Arabic text, especially in web applications. text normalization, phrase segmentation, text indexing, stop word lists, common spelling mistakes.

Awesome-Chinese-NLP - A curated list of resources for Chinese NLP 中文自然语言处理相关资料


BaiduLac by 百度 Baidu's open-source lexical analysis tool for Chinese, including word segmentation, part-of-speech tagging & named entity recognition.