ahocorapy - Pure Python Aho-Corasick library.


ahocorapy is a pure Python implementation of the Aho-Corasick algorithm. Given a list of keywords, one can check whether at least one of the keywords exists in a given text in linear time. We started working on this at the beginning of 2016. Our requirements included Unicode support combined with Python 2.7. That was impossible with C-extension-based libraries (like pyahocorasick). Pure Python libraries were very slow or unusable due to memory explosion. Since then, another pure Python library, py-aho-corasick, has been released. The repository also contains some discussion about different implementations. There is also acora, but it includes the note ('current construction algorithm is not suitable for really large sets of keywords'), which really was the case the last time I tested, because RAM ran out quickly.

https://github.com/abusix/ahocorapy
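
To illustrate the idea behind the description above, here is a minimal sketch of an Aho-Corasick automaton in plain Python. It is not ahocorapy's code or API; the names build_automaton and search_all are hypothetical and chosen only for this example. The sketch builds a keyword trie, adds failure links with a breadth-first pass, and then scans the text once, so the scan time grows linearly with the text length plus the number of matches reported.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with failure links (illustrative sketch only)."""
    # Each state holds its outgoing transitions, its failure link,
    # and the keywords that end at this state.
    states = [{"next": {}, "fail": 0, "out": []}]
    for word in keywords:
        current = 0
        for char in word:
            if char not in states[current]["next"]:
                states.append({"next": {}, "fail": 0, "out": []})
                states[current]["next"][char] = len(states) - 1
            current = states[current]["next"][char]
        states[current]["out"].append(word)

    # Breadth-first pass: the failure link of a state points to the longest
    # proper suffix of its path that is also a prefix of some keyword.
    queue = deque(states[0]["next"].values())
    while queue:
        state = queue.popleft()
        for char, child in states[state]["next"].items():
            queue.append(child)
            fallback = states[state]["fail"]
            while fallback and char not in states[fallback]["next"]:
                fallback = states[fallback]["fail"]
            states[child]["fail"] = states[fallback]["next"].get(char, 0)
            # Inherit the keywords that end at the failure target.
            states[child]["out"] += states[states[child]["fail"]]["out"]
    return states


def search_all(states, text):
    """Yield (keyword, start_index) for every keyword occurrence in text."""
    state = 0
    for index, char in enumerate(text):
        # Follow failure links until a transition on char exists or we hit the root.
        while state and char not in states[state]["next"]:
            state = states[state]["fail"]
        state = states[state]["next"].get(char, 0)
        for word in states[state]["out"]:
            yield word, index - len(word) + 1


automaton = build_automaton(["he", "she", "his", "hers"])
matches = sorted(search_all(automaton, "ushers"))
print(matches)  # [('he', 2), ('hers', 2), ('she', 1)]
```

To answer only the question "does at least one keyword occur?", it is enough to stop at the first result, for example with any(True for _ in search_all(automaton, text)).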


   




Related Projects

aho-corasick - Java implementation of the Aho-Corasick algorithm for efficient string matching

  •    Java

Java library for efficient string matching against a large set of keywords

ahocorasick - A Golang implementation of the Aho-Corasick string matching algorithm

  •    Go

A Golang implementation of the Aho-Corasick string matching algorithm

AHO Corasick .net

  •    C#

Aho-Corasick search algorithm implementation using .NET C#, with path compression.

ac - Aho-Corasick Automaton with Double Array Trie (Multi-pattern substitute in go)

  •    Go

Aho-Corasick Automaton with Double Array Trie (Multi-pattern substitute in go)


Tandem Repeat Occurrence Locator

  •    C++

The Tandem Repeat Occurrence Locator (TROLL) is a lightweight SSR finder based on a slight modification of the Aho-Corasick algorithm.

rake-nltk - Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  •    Python

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text. If you see a stopwords error, it means that you have not downloaded the stopwords corpus from NLTK. You can download it with the NLTK downloader.

SwiftGraph - A Graph Data Structure in Pure Swift

  •    Swift

SwiftGraph is a pure Swift (no Cocoa) implementation of a graph data structure, appropriate for use on all platforms Swift supports (iOS, macOS, Linux, etc.). It includes support for weighted, unweighted, directed, and undirected graphs. It uses generics to abstract away both the type of the vertices, and the type of the weights. It includes copious in-source documentation, unit tests, as well as search functions for doing things like breadth-first search, depth-first search, and Dijkstra's algorithm. Further, it includes utility functions for topological sort, Jarnik's algorithm to find a minimum-spanning tree, detecting a DAG (directed-acyclic-graph), and enumerating all cycles.

rdflib - RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information

  •    Python

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information as graphs. The current version of RDFLib is 4.2.2, see the CHANGELOG.md file for what's new.

VivaGraphJS - Graph drawing library for JavaScript

  •    Javascript

VivaGraphJS is designed to be extensible and to support different rendering engines and layout algorithms. Underlying algorithms have been broken out into ngraph. The larger family of modules can be found by querying npm for "ngraph".

Python Prompt Toolkit - Library for building powerful interactive command lines in Python

  •    Python

prompt_toolkit is a library for building powerful interactive command lines and terminal applications in Python. ptpython is an interactive Python shell, built on top of prompt_toolkit. prompt_toolkit could be a replacement for GNU readline, but it can be much more than that.

RAKE - A python implementation of the Rapid Automatic Keyword Extraction

  •    Python

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons. The source code is released under the MIT License.

rivalcfg - CLI tool and Python library to configure SteelSeries gaming mice

  •    Python

Rivalcfg is a Python library and a CLI utility program that allows you to configure SteelSeries gaming mice on Linux and Windows (probably works on BSD and Mac OS too, but not tested). I first created this program to configure my Rival 100 and the original Rival mice, then I added support for other Rival devices thanks to contributors. Today this project aims to support any SteelSeries gaming mice (Rival, Sensei,...).

jGABL

  •    Java

jGABL (the Java Graph Algorithm Base Library) is a Java library for the implementation of graph algorithms. It covers a hierarchy of graph concepts, various graph implementations, and algorithm animation.

caniusepython3 - Can I Use Python 3?

  •    Python

You can read the documentation on how to use caniusepython3 from its PyPI page. A web interface is also available. As long as a trove classifier for some version of Python 3 is specified then the project is considered to support Python 3 (project owners: it is preferred you at least specify Programming Language :: Python :: 3 as that is how you end up listed on the Python 3 Packages list on PyPI; you can represent Python 2 support with Programming Language :: Python). Note that Python 3.0 through 3.3 have reached their End Of Life.

hashring - Consistent hashing "hashring" implementation in golang (using the same algorithm as libketama)

  •    Go

Implements consistent hashing that can be used when the number of server nodes can increase or decrease (like in memcached). The hashing ring is built using the same algorithm as libketama. This is a port of Python hash_ring library https://pypi.python.org/pypi/hash_ring/ in Go with the extra methods to add and remove nodes.

Tantivy - Full-text search engine library inspired by Lucene and written in Rust

  •    Rust

Tantivy is a full-text search engine library written in Rust. It is closer to Lucene than to Elasticsearch and Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

localshop - local pypi server (custom packages and auto-mirroring of pypi)

  •    Python

A PyPI server which automatically proxies and mirrors PyPI packages based upon packages requested. It has support for multiple indexes and team based access and also supports the uploading of local (private) packages. If you want more flexibility you can load your custom settings file by mounting a docker volume and creating a localshop.conf.py. This file will be loaded by localshop at the end of the settings file.

facebook-sdk - Python SDK for Facebook's Graph API

  •    Python

This client library is designed to support the Facebook Graph API and the official Facebook JavaScript SDK, which is the canonical way to implement Facebook authentication. You can read more about the Graph API by accessing its official documentation. This library uses the Apache License, version 2.0. Please see the library's individual files for more information.

pugixml - Light-weight, simple and fast XML parser for C++ with XPath support

  •    C++

pugixml is a C++ XML processing library, which consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with Unicode interface variants and conversions between different Unicode encodings (which happen automatically during parsing/saving). pugixml is used by a lot of projects, both open-source and proprietary, for its performance and easy-to-use interface.





