ahocorapy - Pure Python Aho-Corasick library

ahocorapy is a pure Python implementation of the Aho-Corasick algorithm. Given a list of keywords, one can check whether at least one of them occurs in a given text, in time linear in the length of the text.

We started working on this at the beginning of 2016. Our requirements included Unicode support combined with Python 2.7. That was impossible with libraries based on C extensions (like pyahocorasick), and the pure Python libraries available were either very slow or unusable due to memory explosion. Since then another pure Python library, py-aho-corasick, has been released. Its repository also contains some discussion of different implementations. There is also acora, but it includes the note 'current construction algorithm is not suitable for really large sets of keywords', which really was the case the last time I tested it: RAM ran out quickly.
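To illustrate the keyword check described above, here is a minimal, self-contained sketch of the Aho-Corasick idea: a trie of the keywords with failure links, scanned once over the text. It is not ahocorapy's implementation or API; the names `build_automaton` and `find_first` are hypothetical and chosen only for this example. It works on Python `str` values, so Unicode keywords need no special handling.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with failure links (hypothetical helper)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> keywords that end at this state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass to compute failure links.
    # Children of the root keep the default failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_first(text, automaton):
    """Return (keyword, start_index) for the first match, or None."""
    goto, fail, out = automaton
    state = 0
    for index, ch in enumerate(text):
        # Follow failure links until a transition on ch exists.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            word = out[state][0]
            return word, index - len(word) + 1
    return None


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_first("ushers", automaton))
```

The automaton is built once and can be reused for many texts. Each character of the text is consumed exactly once, and the failure links avoid restarting the scan after a mismatch, which is where the linear-time behaviour comes from.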




