CJK Decomposition Data


The CJK Decomposition Data File is a graphical analysis of the approximately 75,000 Chinese and Japanese characters in Unicode.

http://cjkdecomp.codeplex.com/

Related Projects

ordb-unihan - An ORM for the published Unihan database

Unihan-Normalize - Process, normalize, and store the Unihan database

Unicode-Unihan - Release history of Unicode-Unihan

budou - Budou is an automatic organizer tool for beautiful line breaking in CJK (Chinese, Japanese, and Korean)


English uses spacing and hyphenation as cues to allow for beautiful and legible line breaks. Certain CJK languages have neither, which makes line breaking notoriously more difficult: breaks occur at arbitrary positions, often in the middle of a word. This is a long-standing issue in typography on the web, and it degrades readability. Budou automatically translates CJK sentences into organized HTML code with lexical chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses the Google Cloud Natural Language API (NL API) to analyze the input sentence, and it concatenates words into meaningful chunks using part-of-speech (pos) tagging and syntactic information. Processed chunks are wrapped in SPAN tags, so that semantic units are no longer split at the end of a line once their display property is set to inline-block in CSS.
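The wrapping step described above can be illustrated with a minimal sketch. This is not Budou's actual code or API: the function name wrap_chunks, the CSS class name "chunk", and the pre-segmented input list are all assumptions made for illustration, and the segmentation step (which Budou delegates to the NL API) is skipped entirely.

```python
from html import escape

# Hypothetical CSS rule: keeps each chunk from being split across lines.
CHUNK_CSS = ".chunk { display: inline-block; }"


def wrap_chunks(chunks, css_class="chunk"):
    """Wrap each pre-segmented chunk in a SPAN tag.

    `chunks` is assumed to be a list of strings that some earlier
    analysis step has already split into lexical units.
    """
    cls = escape(css_class, quote=True)
    # Escape chunk text so characters such as '<' cannot break the markup.
    return "".join(f'<span class="{cls}">{escape(c)}</span>' for c in chunks)


if __name__ == "__main__":
    print(wrap_chunks(["今日は", "天気です。"]))
```

With the CSS rule applied, a browser may break a line between two spans but not inside one, which is the effect the description attributes to inline-block.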

unihan_extractor - Extract selected character data from unihan characters



unihan_utils - Unicode Unihan Database Utility

wcwidth-cjk - Run command with CJK-friendly wcwidth(3) to fix ambiguous width chars

Lingua-CJK-Tokenizer - Release history of Lingua-CJK-Tokenizer

Encode-Detect-CJK - Release history of Encode-Detect-CJK

RST-Tables-CJK - Create and edit reStructuredText tables easily (CJK)

IME-Test - Windows IME sample code for CJK text input.

lucene-lastuni - Lucene CJK analyzer that tokenizes the last character as a unigram.

php-mpdf - A PHP class to generate PDF files from HTML with Unicode/UTF-8 and CJK support

fonts-TTF-hanazono - Free TrueType fonts for CJK ideograms

hwr - Handwriting recognition tools for CJK Languages

Codecs33 - CJK library files missing in the embedded Python of Sublime Text 3

Codecs26 - CJK library files missing in the embedded Python of Sublime Text 2

PagesFontSetter - Setting the font for CJK or Latin text separately.