juniversalchardet - Java port of universalchardet, the encoding detector library of Mozilla.


juniversalchardet is a Java port of "universalchardet", the encoding detector library of Mozilla.

https://github.com/albfernandez/juniversalchardet
https://code.google.com/archive/p/juniversalchardet/

Related Projects

Charset detector

  •    Delphi

Library for automatic charset detection of a given text or file. The input buffer is analysed to guess the encoding used. The result (charset name or code page ID) can be used as a control parameter for charset conversion. Make your programs Unicode aware!
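The detect-then-convert flow described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the Delphi library's API: the names `guess_encoding` and `convert` and the fallback candidate order are assumptions made for the example, and real detectors use statistical models rather than simple trial decoding.

```python
import codecs

# Byte-order marks are checked first. UTF-32 must come before UTF-16
# because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical fallback order: strict encodings first, permissive ones last.
CANDIDATES = ["ascii", "utf-8", "cp1252"]


def guess_encoding(data: bytes) -> str:
    """Return a codec name that can decode `data` (a guess, not a proof)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in CANDIDATES:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 maps every byte, so it never fails


def convert(data: bytes, target: str = "utf-8") -> bytes:
    """Use the guessed charset as the control parameter for conversion."""
    return data.decode(guess_encoding(data)).encode(target)
```

Trial decoding can only say which encodings are possible, so the candidate order decides ambiguous cases; that is the main limitation of this sketch compared with a statistical detector.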

UIMA - Unstructured information management architecture

  •    Java

UIMA analyzes large volumes of unstructured information in order to discover knowledge that is relevant to an end user. It is a framework built from a set of components. The components include language identification, language-specific segmentation, sentence boundary detection, entity detection (person/place names), etc. The framework manages these components and the data flows between them.

Java port of Mozilla charset detector

  •    Java

Java port of Mozilla's automatic charset detection algorithm. See http://www.mozilla.org/projects/intl/chardet.html for the details of the original code and author.

franc - Natural language detection

  •    Javascript

Detect the language of text. Based on the UDHR (Universal Declaration of Human Rights), the most translated document in the world.


modernish - cross-platform POSIX shell feature detection and language extension library

  •    Shell

modernish is an ambitious, as-yet experimental, cross-platform POSIX shell feature detection and language extension library. It aims to extend the shell language with extensive feature testing and language enhancements, using the power of aliases and functions to extend the shell language using the shell language itself. The name is a pun on Modernizr, the JavaScript feature testing library, -sh, the common suffix for UNIX shell names, and -ish, still not quite a modern programming language but perhaps a little closer. jQuery is another source of general inspiration; like it, modernish adds a considerable feature set by using the power of the language it's implemented in to extend/transcend that same language.

Java UTF-7 Charset support

  •    Java

Charset implementation adding encoding and decoding support for UTF-7 (as in RFC 2152, in two variants) and modified UTF-7 (RFC 3501) to Java. The two variants of UTF-7 supported differ in the encoding chosen for Set O (optional direct characters).
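To illustrate what RFC 2152 UTF-7 looks like on the wire, here is a small sketch. It uses Python's built-in `utf-7` codec rather than the Java library described above, and the helper names are made up for the example. Modified UTF-7 (RFC 3501) is not covered, since the Python standard library has no public codec for it.

```python
def to_utf7(text: str) -> bytes:
    """Encode text as RFC 2152 UTF-7 using Python's built-in codec."""
    return text.encode("utf-7")


def from_utf7(data: bytes) -> str:
    """Decode RFC 2152 UTF-7 bytes back to text."""
    return data.decode("utf-7")


def is_seven_bit(data: bytes) -> bool:
    """UTF-7 output should only contain 7-bit (ASCII) byte values."""
    return all(b < 128 for b in data)


if __name__ == "__main__":
    sample = "Hi Mom -\u263a-!"          # example string from RFC 2152
    encoded = to_utf7(sample)
    print(encoded, is_seven_bit(encoded))
    print(from_utf7(encoded) == sample)
```

Non-ASCII characters are carried as base64-encoded UTF-16 between `+` and `-`; whether optional (Set O) characters such as `!` are written directly or base64-encoded is the choice that distinguishes the two variants mentioned above.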

cz2cz tools

  •    C

cz2cz is software for converting text files between various character encodings (ISO-8859-2, Win-1250, UTF-8, ...). Its main feature is autodetection of the charset used in a text file. It is available only in Czech (and is useful for Czech users only).

Charset Guessing Library

  •    C

A C/C++ library to guess the encoding and charset of a string

Highlight.js - Javascript Syntax Highlighter

  •    Javascript

Highlight.js is a syntax highlighter written in JavaScript. It works in the browser as well as on the server. It works with pretty much any markup, doesn’t depend on any framework and has automatic language detection. It supports 176 languages and 79 styles, multi-language code highlighting and a lot more.

SeetaFaceEngine

  •    C++

SeetaFace Engine is an open source C++ face recognition engine, which can run on CPU with no third-party dependence. It contains three key parts, i.e., SeetaFace Detection, SeetaFace Alignment and SeetaFace Identification, which are necessary and sufficient for building a real-world face recognition application system. SeetaFace Detection implements a funnel-structured (FuSt) cascade schema for real-time multi-view face detection, which achieves a good trade-off between detection accuracy and speed. State-of-the-art accuracy can be achieved on the public dataset FDDB at high speed. See SeetaFace Detection for more details.

File Encoding Checker

  •    

File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify. File Encoding Checker requires .NET 2 or above to run.

Suricata IDS - Network threat detection engine

  •    C

The Suricata engine is capable of real time intrusion detection (IDS), inline intrusion prevention (IPS), network security monitoring (NSM) and offline pcap processing. Suricata inspects the network traffic using a powerful and extensive rules and signature language, and has powerful Lua scripting support for detection of complex threats.

whatlanguage - A language detection library for Ruby that uses bloom filters for speed.

  •    Ruby

Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory benefits. It works well on texts of over 10 words in length (e.g. blog posts or comments) and very poorly on short or Twitter-esque text, so be aware. Works with Dutch, English, Farsi, French, German, Italian, Pinyin, Swedish, Portuguese, Russian, Arabic, Finnish, Greek, Hebrew, Hungarian, Korean, Norwegian, Polish and Spanish out of the box.
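How a Bloom filter can back a word-based language guess is sketched below. This is not whatlanguage's Ruby implementation: the class, the SHA-256-based hashing, the filter size, and the tiny word lists in the usage example are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit set; membership tests may give false positives,
    but never false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filters(wordlists: dict) -> dict:
    """Build one filter per language from {language: iterable of words}."""
    filters = {}
    for lang, words in wordlists.items():
        bloom = BloomFilter()
        for word in words:
            bloom.add(word.lower())
        filters[lang] = bloom
    return filters


def guess_language(text: str, filters: dict) -> str:
    """Pick the language whose filter matches the most words in `text`."""
    words = text.lower().split()
    scores = {lang: sum(w in f for w in words) for lang, f in filters.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical, tiny word lists; a real library would ship large ones.
    filters = build_filters({
        "english": ["the", "and", "of", "is"],
        "dutch": ["de", "het", "en", "een"],
    })
    print(guess_language("the cat and the dog", filters))
```

Each filter stores only bits, not the words themselves, which is where the memory saving comes from. With few words to score, a single match or false positive can swing the result, which fits the note above about short texts.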

Snort

  •    C

Snort is a libpcap-based sniffer/logger which can be used as a network intrusion detection and prevention system. It uses a rule-based detection language as well as various other detection mechanisms and is highly extensible.

CyberChef - The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

  •    Javascript

CyberChef is a simple, intuitive web app for carrying out all manner of "cyber" operations within a web browser. These operations include simple encoding like XOR or Base64, more complex encryption like AES, DES and Blowfish, creating binary and hexdumps, compression and decompression of data, calculating hashes and checksums, IPv6 and X.509 parsing, changing character encodings, and much more. The tool is designed to enable both technical and non-technical analysts to manipulate data in complex ways without having to deal with complex tools or algorithms. It was conceived, designed, built and incrementally improved by an analyst in their 10% innovation time over several years.

Filesystem Charset Converter

  •    C

Filesystem Charset Converter (fcc) converts file and directory names from one charset to another.

python-magic - A python wrapper for libmagic

  •    Python

python-magic is a Python interface to the libmagic file type identification library. libmagic identifies file types by checking their headers according to a predefined list of file types. This functionality is exposed to the command line by the Unix command file. There is also a Magic class that provides more direct control, including overriding the magic database file and turning on character encoding detection. This is not recommended for general use. In particular, it's not safe for sharing across multiple threads and will throw if this is attempted.
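The idea of identifying a file by its header can be sketched with a small signature table. This is a standard-library illustration only; it does not call python-magic or libmagic, the function names are hypothetical, and the table covers just a handful of well-known signatures, whereas libmagic's database is far larger and supports more complex rules.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def identify(header: bytes) -> str:
    """Return a MIME type guess based on the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown or too short


def identify_file(path: str) -> str:
    """Read only the start of the file; the signatures above are short."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Only the leading bytes are inspected, so the file extension plays no role in the result.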