hangulize2 - A Reboot of Hangulize


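For a system of transcribing foreign words into Hangul to take hold, ordinary people must be able to find established usages quickly and easily whenever they want to write a foreign word in Hangul. Deciding usages by holding regular meetings has its limits. The loanword transcription review process should be automated, so that the Hangul transcription appears as soon as the foreign word is entered. Where a usage has already been established it should be followed, and where none exists a recommended transcription should be shown according to each language's transcription rules. If programmers and linguists work on this together, it need not remain a fantasy. Hangulize 2 is a tool that converts foreign words into Hangul.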
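The lookup-then-fallback flow described above can be sketched as follows. This is a minimal illustration, not Hangulize's actual API: the function `transcribe`, the tables `ESTABLISHED` and `RULES`, the language code `"toy"`, and the syllable table are all hypothetical placeholders, and real transcription rules are far more involved.

```python
from typing import Callable, Dict, Tuple

# Hypothetical list of already-established usages, keyed by (language, word).
ESTABLISHED: Dict[Tuple[str, str], str] = {("toy", "banana"): "바나나"}

# Toy rule table for an imaginary language; NOT a real transcription rule set.
TOY_SYLLABLES = {"ka": "카", "na": "나", "ma": "마"}


def toy_rule(word: str) -> str:
    """Map fixed two-letter chunks to Hangul syllables; unknown chunks pass through."""
    word = word.lower()
    chunks = [word[i:i + 2] for i in range(0, len(word), 2)]
    return "".join(TOY_SYLLABLES.get(chunk, chunk) for chunk in chunks)


# Hypothetical registry of per-language rule functions.
RULES: Dict[str, Callable[[str], str]] = {"toy": toy_rule}


def transcribe(word, lang, established, rules):
    """Return (hangul, source): an established usage if one exists, else a rule-based one."""
    key = (lang, word.lower())
    if key in established:
        return established[key], "established"
    rule = rules.get(lang)
    if rule is None:
        raise LookupError(f"no transcription rules for language {lang!r}")
    return rule(word), "recommended"


print(transcribe("Banana", "toy", ESTABLISHED, RULES))
print(transcribe("kana", "toy", ESTABLISHED, RULES))
```

Returning the source alongside the transcription lets a caller show whether a result is an established usage or only a rule-based recommendation.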




Related Projects

lingo - Linguistics module for Node - inflection, transformation, i18n and more

  •    Javascript

Lingo is a linguistics module, currently providing inflection and some string transformations. Eventually I would like to extend its capabilities and add additional languages.

linguistics - A generic, language-neutral framework for extending Ruby objects with linguistic methods

  •    Ruby

Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities. The Linguistics module comes with a language-independent mechanism for extending core Ruby classes with linguistic methods.

Korean Table

  •    CSharp

Korean Table is a memory trainer based on an ancient Korean method: colored stones are shown on a colored table for a moment, and the player must remember each stone's position and color.

twitter-korean-text - Korean tokenizer

  •    Scala

Scala library to process Korean text

Tools for Field Linguistics

  •    Java

This site is devoted to the collaborative creation of tools, protocols and procedures for field linguistics and language analysis. We are especially interested in tools for annotating or manipulating text, audio and video-based language archives.

A set of linguistics tools

  •    Shell

Linguistico is a linguistics tools project based on the Italian language. Tools include dictionaries, thesauri, word definitions, scripts, programs, ... For: OpenOffice.org, LibreOffice, Thunderbird, Mozilla Firefox; MySpell, MyThes, Aspell, Hunspell.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

  •    Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.
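To make the basic tasks mentioned above concrete, here is a small sketch of n-gram extraction and a frequency list. It uses only the Python standard library and does not use or mirror PyNLPl's own API; the function names `ngrams` and `frequency_list` are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples (empty if n > len(tokens))."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(tokens, n):
    """Return (n-gram, count) pairs sorted from most to least frequent."""
    return Counter(ngrams(tokens, n)).most_common()


tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
print(frequency_list(tokens, 2)[0])
```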

Parsimmon - Parsimmon is a wee linguistics toolkit for iOS written in Swift.

  •    Swift

Parsimmon is a wee linguistics toolkit for iOS written in Swift. We currently support Swift 2.0. If you are looking for Objective-C, please use version 0.3.4 or earlier.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises awesome resources, libraries, and information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

OLS Transcription Project


The primary goal of the OLS Transcription Project (olstrans) is to provide high-quality, technically-accurate transcripts of the audio recordings provided by the OLS staff.

pangu.js - Why can't you just add a space? (為什麼你們就是不能加個空格呢?)

  •    Javascript

Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
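The spacing idea described above can be sketched with two regular-expression passes. This is a simplified illustration and not pangu.js's implementation: the function name `add_spacing` is hypothetical, the character ranges (kana, CJK Unified Ideographs, Hangul syllables) are an assumed approximation of "CJK", and only ASCII letters and digits are treated as half-width characters, so symbols are not handled.

```python
import re

# Assumed character classes; real tools cover more blocks and symbols.
CJK = r"\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3"
HALF = r"A-Za-z0-9"

_CJK_THEN_HALF = re.compile(f"([{CJK}])([{HALF}])")
_HALF_THEN_CJK = re.compile(f"([{HALF}])([{CJK}])")


def add_spacing(text: str) -> str:
    """Insert one space at every boundary between a CJK and a half-width character."""
    text = _CJK_THEN_HALF.sub(r"\1 \2", text)
    return _HALF_THEN_CJK.sub(r"\1 \2", text)


print(add_spacing("我用Python3寫程式"))
```

Because a space is inserted only where the two character classes touch directly, running the function on already-spaced text leaves it unchanged.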


  •    Java

The term "question-answer paradigm" (QAP) is used in psycho-linguistics to refer to a familiar mode of human discourse. QAParadigm facilitates a common QAP experiment that involves a visual display, then a verbal question about the display.

Field Linguistics Tools


This is a project to convert linguistic field data into other usable formats.


  •    Java

Emdros is a corpus query system for storage and retrieval of linguistic analyses of text. It is especially applicable in corpus linguistics dealing with syntax, morphology, phonology, and/or discourse. It is also a generally useful text database engine.

toki pona - corpus linguistic tools and various experiments


Various mini-linguistics tools targeting toki pona, a very small fake language. Of potential interest to hobby linguists and conlang enthusiasts.



  •    C++

Seman is a set of linguistic tools for analyzing Russian or German texts; it contains lexicons and grammars. The project is of interest as a baseline for many research projects in computational linguistics.

KH Coder

  •    Perl

KH Coder is free software for quantitative content analysis or text data mining. It is also used for computational linguistics. You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. KH Coder provides various kinds of search and statistical analysis functions using back-end tools such as Stanford POS Tagger, Snowball stemmer, MySQL and R.

ONZE Miner

  •    Java

NB: ONZE Miner has been renamed LaBB-CAT, and active support has been moved to another SourceForge project: http://labbcat.sourceforge.net

ONZE Miner was a browser-based linguistics research tool that stores audio recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media can be viewed or exported in a variety of format