HebMorph - Hebrew text analyzer for search engines

An open-source effort to make Hebrew properly searchable by various information retrieval (IR) libraries while maintaining decent recall, precision, and relevance in retrieval. It includes a Hebrew analyzer for Lucene, which already produces much better results for Hebrew text than the default Lucene implementation.





Related Projects

Lusql - Pluggable pipelined threaded extract transform load (ETL), default from JDBC to Lucene

LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple core


Hspell-gui is a graphical front end to hspell, a Hebrew spell checker and linguistic analyzer, using the gnome-2.0 graphics library.

Lucene-ext-ko - Korean support in Lucene

There are many Korean lexical analyzers, but they are very expensive and difficult to use. By building on the Lucene library, we want to make it possible to analyze any Korean text for free.

Lucene-skos - A SKOS analyzer module for Apache Lucene and Solr

A SKOS analyzer module for Apache Lucene and Solr. What is SKOS? The Simple Knowledge Organization System (SKOS) is a model for expressing the basic structure and content of controlled vocabularies (classification schemes, thesauri, taxonomies, etc.). As an application of the Resource Description Framework (RDF), SKOS allows these vocabularies to be published as dereferenceable resources on the Web, which makes them easily retrievable and reusable in applications. SKOS plays a major role in the ongoin

Luke - Lucene Index Toolbox

Lucene is an open-source, mature, high-performance Java search engine. It is highly flexible and scales from hundreds to millions of documents. Luke is a handy development and diagnostic tool that accesses existing Lucene indexes and lets you display and modify their content in several ways: browse by document number or by term; view documents / copy to clipboard; retrieve a ranked list of most frequent terms; execute a search and browse the results; analyze search results; sel

Hebstem - A Hebrew stemmer for search engines (Lucene, Xapian, etc.) based on libhspell

Background: Hebrew morphology is complicated and does not lend itself to simple prefix/suffix stemming, as done in most search engines (e.g. Lucene, Xapian) via, for example, the Snowball algorithm collection. libhspell is a free (GPL) library providing lexicon-based, morphologically correct Hebrew stemming. It supports the ISO-8859-8 encoding only, and the official undiacriticized (niqqud-less) spelling of Hebrew words. The Project: The hebstem project aims to create the glue necessary to use libhspe

Russianmorphology - Russian morphology for Lucene

Russian and English morphology for Java and the Lucene 3.0 framework, based on an open-source dictionary from the АОТ site. It uses dictionary-based morphology with some heuristics for unknown words. It supports homonyms: for example, for the Russian word "вина" it gives two variants, "вино" and "вина". How to use: First download morph-1.0.jar and add it to your class path. Then download the Russian or English package. If you use Maven, you can add the dependency <!-- For Russian morphology --> <dependency> <group

Solr-query-expander - A Query Expander for Semantic Expansion between English and Chinese based on S

A Query Expander for Semantic Expansion between English and Chinese based on SKOS. SQE: The Solr Query Expander (SQE) is a Solr plugin based on lucene-skos that, using English and Chinese SKOS vocabularies loaded at design time, expands user queries to other languages and/or related terms. IKAnalyzer: The stable version of IKAnalyzer, IKAnalyzer3.2.5, is a Chinese analyzer, but it currently cannot analyze and segment mixed English and Chinese phrases well during query expansion. The Solr Query Expande

Hickwall-analyzer - Chinese Segmentation Based on Apache Lucene Analyzer

The rapid expansion of information on the internet has made effective means of access to information necessary, and the search engine is one of them. The segmentation system, a very important part of a search engine, is primarily responsible for dividing text into words, building an index on the processed information, and saving it to the database. It is thanks to the index built on the segmented text that, when users are using search engines such as Baidu (www.baidu.com), Google (www.google.cn) an

Arabicanalyzer - ArabicAnalyzer for Lucene

A library for adding Arabic language support to Lucene.