h2h_converter - Convert Sino-Korean words written in Hangul to Chinese characters (called Hanja in Korean) using neural networks

  •        427

Around 2/3 of Korean words are Sino-Korean. For that reason, although the official script of the Korean language is Hangul, Chinese characters are still widely used. Converting Chinese characters (Hanja in Korean) to Hangul is trivial because most Hanja have a single Hangul equivalent. The reverse is not, because one Hangul syllable can correspond to many different Hanja. There has been a project, UTagger, for Hangul-to-Hanja conversion. I use neural networks to tackle the task. The KRV Bible is in the public domain, and I have refined it for this purpose. Each line contains two sentences separated by a tab: Sino-Korean words in the first sentence are written in Hanja in the second sentence. Check data/bible_ko.tsv.
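To illustrate why one direction is trivial and the other is not, here is a minimal Python sketch. It is not the project's neural model. It assumes the two-column, tab-separated layout described above and assumes that each Hanja replaces exactly one Hangul syllable, so both sentences can be aligned character by character. The sample lines and the helper names (`is_hanja`, `read_pairs`, `build_tables`) are hypothetical and are not taken from data/bible_ko.tsv.

```python
from collections import defaultdict

# Hypothetical sample in the assumed "Hangul<TAB>Hanja" layout.
# 수도 can mean 首都 (capital city) or 水道 (water supply).
SAMPLE = "수도에 갔다\t首都에 갔다\n수도를 고쳤다\t水道를 고쳤다\n"


def is_hanja(ch):
    """Return True if ch lies in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def read_pairs(lines):
    """Yield (hangul_sentence, hanja_sentence) from tab-separated lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        hangul, hanja = line.split("\t")
        yield hangul, hanja


def build_tables(pairs):
    """Collect character-level mappings in both directions."""
    hanja_to_hangul = defaultdict(set)
    hangul_to_hanja = defaultdict(set)
    for hangul, hanja in pairs:
        # Assumption: one Hanja per Hangul syllable, so lengths match.
        if len(hangul) != len(hanja):
            continue
        for h, c in zip(hangul, hanja):
            if is_hanja(c):
                hanja_to_hangul[c].add(h)
                hangul_to_hanja[h].add(c)
    return hanja_to_hangul, hangul_to_hanja


hanja_to_hangul, hangul_to_hanja = build_tables(read_pairs(SAMPLE.splitlines()))

# Each Hanja in the sample has exactly one reading ...
print({c: sorted(r) for c, r in hanja_to_hangul.items()})
# ... but each Hangul syllable has several Hanja candidates.
print({h: sorted(c) for h, c in hangul_to_hanja.items()})
```

In this sample, a lookup table is enough for the Hanja-to-Hangul direction, while the Hangul-to-Hanja direction needs sentence context to choose among candidates, which is the part a learned model has to handle.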

https://github.com/Kyubyong/h2h_converter


   




Related Projects

Internet Online Go IGS android weiqi baduk


Go (Japanese: 囲碁), known in Chinese as weiqi (simplified Chinese: 围棋; traditional Chinese: 圍棋; pinyin: wéiqí; Wade-Giles: wei ch'i) and in Korean as baduk (hangul: 바둑), is an ancient board game for two players that is noted for being rich in strategy despite its simpl...

Traditional Chinese to Simplified Chinese converter


A Python script to convert traditional Chinese text to simplified Chinese. A character relation table is included.

Pinyin4j.Net


Pinyin4j is a Java library supporting conversion between Chinese characters and the most popular Pinyin systems. The output format of the pinyin can also be customized. Pinyin4j.Net is developed in C#.

toMOTko

  •    C++

toMOTko is a small flashcard application for learning foreign language vocabulary. It is specifically designed for people learning one or more foreign languages. Its Unicode support is convenient for Japanese, Korean, or Chinese characters, among others. It is available for multiple platforms: Windows, MacOS, Linux, Zaurus, and some Nokia phones. The interface is available in French, English, Spanish, and partially in German and Japanese.


Rasa_NLU_Chi - Turn Chinese natural language into structured data 中文自然语言理解

  •    Python

For training, please build the MITIE Wordrep Tool. Note that the Chinese corpus should be tokenized before it is fed into the tool for training. A closed-domain corpus that closely matches the use case works best. A trained model built from the Chinese Wikipedia Dump and Baidu Baike can be downloaded from 中文Blog.

Hong Zi


No more poor man's Chinese TeX! Now a full-fledged Chinese Metafont shall be available... with your help. Hong Zi is (OK, OK, shall be) a set of Metafont programs and TeX scripts that create Chinese characters so that you can use them in your documents.

khaiii - Kakao Hangul Analyzer III

  •    Python

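khaiii takes its name from the initials of "Kakao Hangul Analyzer III" and is the third morphological analyzer developed by Kakao. The name also follows on from dha2 (Daumkakao Hangul Analyzer 2), the second version of the morphological analyzer. Whereas the earlier versions perform analysis based on dictionaries and rules, khaiii uses data-driven (that is, machine learning) algorithms. The training corpus is the final output of the 21st Century Sejong Project distributed by the National Institute of Korean Language, with errors corrected and some content added by us at Kakao.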

sunpinyin - A statistical language model based Chinese input method

  •    C++

SunPinyin is an SLM (Statistical Language Model) based input method engine. To model the Chinese language, it uses a backoff bigram and trigram language model. Currently, SunPinyin 2.0 is available on IBus, SCIM, and as a standalone XIM Server.

Econ NetVert

  •    CSharp

Econ NetVert is a free .NET source code converter that converts between C# and Visual Basic .NET. It supports conversion of simple statements, code files, ASP.NET files, and Visual Studio projects, and it supports .NET 1.1 and .NET 2.0 language features. Econ NetVert contains a GUI...

friso


Friso is a Chinese tokenizer developed in C. It uses the popular mmseg algorithm to tokenize the Chinese characters.

pangu.js - 為什麼你們就是不能加個空格呢? (Why can't you just add a space?)

  •    Javascript

Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).

HanYu Dictionary

  •    Java

HanYu Dictionary is a small, simple, and easy-to-use two-way Chinese-English dictionary. It also supports search by pinyin and by traditional Chinese characters. It uses CEDict for its dictionary definitions.

Dragon Character Training

  •    Java

Dragon Character Training is a PalmOS program using stroke recognition to help you learn to read and write Chinese characters. It is good for learning Mandarin vocabulary and characters.

MyChineseFlashCards

  •    Java

The goal of MyChineseFlashCards is to help users learn, hear, and access the 1000 most used Chinese characters. The user can navigate through the characters (flash cards) in different orders and choose between three learning strategies (learning, recognition, writing).

sdlpal - SDL-based reimplementation of the classic Chinese-language RPG "Xiān jiàn Qí Xiá Zhuàn" (also known as PAL)

  •    Objective-C

SDLPAL is an SDL-based, open-source, cross-platform reimplementation of the classic Chinese RPG Xiān jiàn Qí Xiá Zhuàn (Chinese: 仙剑奇侠传/仙劍奇俠傳), also known as Chinese Paladin or Legend of Sword and Fairy, or PAL for short. SDLPAL was originally created by Wei Mingzhi, starting in 2009. It is now owned by the SDLPAL development team. Please see AUTHORS for the full author list.

ODF-UOF Converter

  •    Java

ODF-UOF Converter provides a way to convert documents (text, spreadsheet, presentation) between the Open Document Format for Office Applications (ODF) and the XML-based Chinese office file format (UOF).





