homoglyphs - Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group

Homoglyphs is a Python library for getting homoglyphs and converting text to ASCII. Its character categories are aliases from ISO 15924.
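
As a rough illustration of the ASCII-conversion idea, here is a minimal standard-library sketch. It does not use the homoglyphs package or reproduce its API; the `LOOKALIKES` table and the `to_ascii_sketch` function are hypothetical names invented for this example, and the table covers only a handful of Cyrillic letters.

```python
import unicodedata

# Hypothetical, hand-picked table (an assumption for this sketch):
# a few Cyrillic letters mapped to their Latin look-alikes.
# A real homoglyph library ships a far larger mapping.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def to_ascii_sketch(text: str) -> str:
    """Replace known look-alikes, strip accents, drop anything still non-ASCII."""
    replaced = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Encoding with errors="ignore" discards the marks and other non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Four Cyrillic letters that render like the Latin word "pace".
print(to_ascii_sketch("\u0440\u0430\u0441\u0435"))  # pace
```

Characters missing from the table are silently dropped here, which is a simplification; a fuller tool would more likely report them or restrict the lookup by script category.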

https://github.com/orsinium/homoglyphs

Related Projects

mimic - [ab]using Unicode to create tragedy

  •    Python

There are many more characters in the Unicode character set that look, to some extent or another, like others – homoglyphs. Mimic substitutes common ASCII characters for obscure homoglyphs.

UTF8-CPP - UTF-8 with C++ in a Portable Way

  •    C++

UTF8-CPP is a small generic library to handle UTF-8 encoded Unicode strings.

utf8

  •    Javascript

utf8.js is a well-tested UTF-8 encoder/decoder written in JavaScript. Unlike many other JavaScript solutions, it is designed to be a proper UTF-8 encoder/decoder: it can encode/decode any scalar Unicode code point values, as per the Encoding Standard. The library also exposes its semantic version number as a string.
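
To make "scalar Unicode code point values" concrete, the sketch below uses Python's built-in UTF-8 codec rather than utf8.js itself. The helper names `utf8_roundtrip` and `is_encodable` are hypothetical. Lone surrogates are not scalar values, and Python's strict codec refuses to encode them.

```python
def utf8_roundtrip(text: str) -> bytes:
    """Encode text as UTF-8 and check that decoding restores it."""
    encoded = text.encode("utf-8")
    assert encoded.decode("utf-8") == text
    return encoded


def is_encodable(text: str) -> bool:
    """Return True if every code point in text is a Unicode scalar value."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        # Raised for lone surrogates such as U+D800.
        return False
    return True


print(utf8_roundtrip("\U0001F600"))  # b'\xf0\x9f\x98\x80'
print(is_encodable("\ud800"))        # False
```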

Awesome-Unicode - :joy: :ok_hand: A curated list of delightful Unicode tidbits, packages and resources

  •    Javascript

A curated list of delightful Unicode tidbits, packages and resources. Please read the contribution guidelines before contributing. Key Unicode terminology is defined in the glossary.


TCPDF - PHP class for generating PDF

  •    PHP

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports UTF-8, Unicode, RTL languages, XHTML, JavaScript, digital signatures, barcodes and much more.

jsesc - Given some data, jsesc returns the shortest possible stringified & ASCII-safe representation of that data

  •    Javascript

For any input, jsesc generates the shortest possible valid printable-ASCII-only output. Here’s an online demo. jsesc’s output can be used instead of JSON.stringify’s to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or a UTF-8 encoder.
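
The same ASCII-safe idea can be sketched in Python, as an analogy only: this is not jsesc, and it makes no attempt to produce the shortest possible output. It relies on `json.dumps` with `ensure_ascii=True`, which escapes every non-ASCII character, including U+2028 and U+2029. The `ascii_safe_json` wrapper is a hypothetical name.

```python
import json


def ascii_safe_json(data) -> str:
    """Serialize data to JSON that contains only ASCII characters."""
    # ensure_ascii=True is the default; it is spelled out here for clarity.
    return json.dumps(data, ensure_ascii=True)


sample = {"text": "caf\u00e9", "separators": "\u2028\u2029"}
encoded = ascii_safe_json(sample)

# Non-ASCII characters appear only as \uXXXX escapes in the output,
# and parsing the output restores the original data.
print(encoded.isascii())              # True
print(json.loads(encoded) == sample)  # True
```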

utf8proc - a clean C library for processing UTF-8 Unicode data

  •    C

utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects. The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

art - 🎨 ASCII Art Library For Python

  •    Python

ASCII art is also known as "computer text art". It involves the smart placement of typed special characters or letters to make a visual shape that is spread over multiple lines of text. The set_default function was added in version 2.2 to change default values.

python-unicodecsv - Python2's stdlib csv module is nice, but it doesn't support unicode

  •    Python

unicodecsv is a drop-in replacement for Python 2.7's csv module that supports unicode strings without hassle. Supported versions are Python 2.6, 2.7, 3.3, 3.4, 3.5, and PyPy 2.4.0. Python 2's csv module doesn't easily deal with unicode strings, leading to the dreaded "'ascii' codec can't encode characters in position ..." exception.
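
For comparison, here is a small sketch of the same round trip using Python 3's standard csv module, which works on text strings directly. It does not use unicodecsv, and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample data containing non-ASCII characters.
rows = [["name", "city"], ["Zo\u00eb", "Krak\u00f3w"]]

# newline="" leaves line endings untouched, as the csv docs recommend for files.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# Rewind and read the rows back; no codec errors are involved at this level.
buffer.seek(0)
read_back = list(csv.reader(buffer))

print(read_back == rows)  # True
```

When writing to a real file in Python 3, the encoding is chosen when the file is opened, for example with `open(path, "w", newline="", encoding="utf-8")`.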

camomile

Camomile is a Unicode library for OCaml. Camomile provides a Unicode character type, UTF-8, UTF-16, and UTF-32 strings, conversion to/from about 200 encodings, collation and locale-sensitive case mappings, and more.

branchless-utf8 - Branchless UTF-8 decoder

  •    C

Branchless UTF-8 decoder

langid.py - Stand-alone language identification system

  •    Python

langid.py is a standalone Language Identification (LangID) tool. All that is required to run langid.py is Python 2.7 or later and numpy. The main script langid/langid.py is cross-compatible with both Python 2 and Python 3, but the accompanying training tools are still Python 2-only.

zotero-better-bibtex - Make Zotero effective for us LaTeX holdouts

  •    TypeScript

This extension aims to make Zotero (and soon Juris-M) effective for us text-based authoring holdouts; currently, that translates to the LaTeX/Markdown crowd. To get started, read the Installation instructions. At its core, it behaves like any Zotero import/export module; anywhere you can export or import bibliography items in Zotero, you'll find Better Bib(La)TeX listed as one of the choices. If nothing else, you could keep your existing workflow as-is, and just enjoy the improved LaTeX ↔ unicode translation on import and export and more accurate field mapping. Zotero does all its work in UTF-8 Unicode, which is absolutely the right thing to do. Unfortunately, for those shackled to BibTeX and who cannot (yet) move to BibLaTeX, unicode is a major PITA. Also, Zotero supports some simple HTML markup in your references that Bib(La)TeX won't understand.

Image to Text Art (HTML Art, Unicode Art, Ascii Art)

Image to Text Art is a class library, WinForms project, and example ASP.NET site that turns images supported by the Bitmap class into HTML art, Unicode art, and ASCII art.

WARTS

  •    Java

WARTS is a pure Java database utility that can perform character-encoding-aware data synchronization. It was developed to correctly transfer non-ASCII characters from an Oracle database that used ASCII encoding to a UTF-8 Oracle instance.

goquery - A little like that j-thing, only in Go.

  •    Go

goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.

JiLetters - kids can learn the alphabet

  •    Python

JILetters is a small Python-based application to help young children learn the letters of the (Western) alphabet, building on the application lletters. When a letter of the alphabet is pressed, the child is presented with a graphic depicting that lett

python3_with_pleasure - A short guide on features of Python 3

Python became a mainstream language for machine learning and other scientific fields that operate heavily on data; it boasts various deep learning frameworks and a well-established set of tools for data processing and visualization. However, the Python ecosystem is split between Python 2 and Python 3, and Python 2 is still used among data scientists. By the end of 2019 the scientific stack will stop supporting Python 2. As for numpy, after 2018 any new feature releases will only support Python 3.