h2h_converter - Convert Sino-Korean words written in Hangul to Chinese characters (called Hanja in Korean) using neural networks

Around two-thirds of Korean words are Sino-Korean. For that reason, although the official script of the Korean language is Hangul, Chinese characters are still widely used. Converting Chinese characters (Hanja in Korean) to Hangul is trivial because most Hanja have a single Hangul equivalent. The reverse is not, because a single Hangul syllable can correspond to many different Hanja. There is an existing project, UTagger, for Hangul-to-Hanja conversion. I use neural networks to tackle the task.

The KRV Bible is in the public domain, and I have refined it for this purpose. Each line holds two sentences separated by a tab: Sino-Korean words in the first sentence are written in Hanja in the second sentence. See data/bible_ko.tsv.
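The tab-separated layout described above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: the names parse_pairs, has_hanja and converted_tokens are hypothetical, the sample line is an illustration rather than a line copied from data/bible_ko.tsv, and the code assumes that both sentences split into the same tokens on whitespace.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each 'hangul<TAB>hanja' line into a (hangul, hanja) pair."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and all(parts):
            pairs.append((parts[0], parts[1]))
    return pairs


def has_hanja(token: str) -> bool:
    """True if the token contains a CJK Unified Ideograph (U+4E00..U+9FFF).

    Simplification: compatibility ideographs outside this block are ignored.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def converted_tokens(hangul: str, hanja: str) -> List[Tuple[str, str]]:
    """Pair up whitespace tokens and keep those written in Hanja.

    Assumption: both sentences tokenize identically on whitespace.
    """
    return [(h, j) for h, j in zip(hangul.split(), hanja.split()) if has_hanja(j)]


# Illustrative sample line (assumed format: Hangul sentence, tab, Hanja sentence).
sample = ["태초에 하나님이 천지를 창조하시니라\t太初에 하나님이 天地를 創造하시니라\n"]

for hangul, hanja in parse_pairs(sample):
    print(converted_tokens(hangul, hanja))
```

To run the same functions on the real data, open data/bible_ko.tsv with encoding="utf-8" and pass the file object to parse_pairs.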

https://github.com/Kyubyong/h2h_converter

Related Projects

Internet Online Go IGS android weiqi baduk

Go (Japanese: 囲碁), known in Chinese as weiqi (simplified Chinese: 围棋; traditional Chinese: 圍棋; pinyin: wéiqí; Wade-Giles: wei ch'i) and in Korean as baduk (hangul: 바둑), is an ancient board game for two players that is noted for being rich in strategy despite its simpl...

Traditional Chinese to Simplified Chinese converter

A Python script to convert traditional Chinese text to simplified Chinese. A character relation table is included.

Pinyin4j.Net

Pinyin4j is a Java library supporting conversion between Chinese characters and the most popular Pinyin systems. The output format of the pinyin can also be customized. Pinyin4j.Net is developed in C#.

toMOTko

  •    C++

toMOTko is a small flashcard application for learning foreign language vocabulary. It is specifically designed for people learning one or more foreign languages. Its Unicode support is convenient for Japanese, Korean, Chinese and similar scripts. It is available for multiple platforms: Windows, MacOS, Linux, Zaurus, and some Nokia phones. The interface is available in French, English, Spanish, and partially in German and Japanese.


Rasa_NLU_Chi - Turn Chinese natural language into structured data 中文自然语言理解

  •    Python

For training, please build the MITIE Wordrep Tool. Note that a Chinese corpus must be tokenized before it is fed into the tool for training. A closed-domain corpus that closely matches the use case works best. A trained model built from the Chinese Wikipedia Dump and Baidu Baike can be downloaded from 中文Blog.

Hong Zi

No more poor man's Chinese TeX! Now a full-fledged Chinese Metafont shall be available... with your help. Hong Zi is (OK, OK, shall be) a set of Metafont programs and TeX scripts which create Chinese characters so that you can use them in your documents.

sunpinyin - A statistical language model based Chinese input method

  •    C++

SunPinyin is an SLM (Statistical Language Model) based input method engine. To model the Chinese language, it uses a backoff bigram and trigram language model. Currently, SunPinyin 2.0 is available on IBus, SCIM, and as a standalone XIM Server.

Econ NetVert

  •    CSharp

Econ NetVert is a free .NET source code converter that converts between C# and Visual Basic .NET. Conversion of simple statements, code files, ASP.NET files and Visual Studio projects is supported, as are the language features of .NET 1.1 and .NET 2.0. Econ NetVert contains a GUI...

friso

Friso is a Chinese tokenizer developed in C. It uses the popular mmseg algorithm to tokenize the Chinese characters.

pangu.js - 為什麼你們就是不能加個空格呢? (Why can't you just add a space?)

  •    Javascript

Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).

HanYu Dictionary

  •    Java

HanYu Dictionary is a small, simple and easy-to-use two-way Chinese-English dictionary. It also supports searching by pinyin and by traditional Chinese characters. It uses CEDict for its dictionary definitions.

Dragon Character Training

  •    Java

Dragon Character Training is a PalmOS program using stroke recognition to help you learn to read and write Chinese characters. It is good for learning Mandarin vocabulary and characters.

MyChineseFlashCards

  •    Java

The goal of MyChineseFlashCards is to help users learn, hear and access the 1000 most used Chinese characters. The user can navigate through the characters (flash cards) in different orders and choose between three learning strategies (learning, recognition, writing).

sdlpal - SDL-based reimplementation of the classic Chinese-language RPG "Xiān jiàn Qí Xiá Zhuàn" (also known as PAL)

  •    Objective-C

SDLPAL is an SDL-based, open-source, cross-platform reimplementation of the classic Chinese RPG Xiān jiàn Qí Xiá Zhuàn (Chinese: 仙剑奇侠传/仙劍奇俠傳), also known as Chinese Paladin or Legend of Sword and Fairy, or PAL for short. SDLPAL was originally created by Wei Mingzhi, starting in 2009. It is now owned by the SDLPAL development team. Please see AUTHORS for the full author list.

ODF-UOF Converter

  •    Java

ODF-UOF Converter provides a way to convert documents (text, spreadsheet, presentation) between the Open Document Format for Office Applications (ODF) and the XML-based Chinese office file format (UOF).

Dictionary Lookup Tool

  •    CSharp

Dictionary tool to assist Chinese and Japanese language learners when viewing web pages or other text documents. It automatically looks up and translates text on the clipboard. Requires the .NET Framework and language support for Chinese/Japanese.