
SeerSuite is an application toolkit for building digital libraries and search engines, e.g., CiteSeerX.

http://citeseerx.sourceforge.net

Tags | |

Implementation | Java |

License | Apache |

Platform |

Summary

Whalebot is an open-source web crawler. It is intended to be simple, fast and memory efficient. It was created as a targeted spider, but you may also use it as a general-purpose crawler. The current release is 0.02. Current state: bold items are done, normal items are TODO. If something is broken or you have an idea, please visit http://groups.google.com/group/whalebot

Usages

It was used for collecting papers on a target topic from http://citeseerx.ist.psu.edu for my master's degree work. Candidates for the logo were collected using whalebot.

Eating

Tags | cpp crawler fetcher spider |

This implementation allows the detection, in quasi-linear time, of all the maximal repeats in one or more strings. Let S be a string of length n over a finite alphabet Σ. Si refers to the i-th character of S. Si..j refers to the substring of S starting at position i and ending at position j. Each position 0 ≤ i < n represents a unique suffix Si..n-1 of S. LCi refers to the left context of suffix i. We denote by £ the special left context LC0. We note with the special symbol $ the character

Tags | maximal repeat repeats string |

py-rstr-max: detection of all maximal repeats in strings, a Python implementation

What does it look for?

This implementation allows the detection, in linear time, of all the maximal repeats in one or more strings. The complete extraction is done in quasi-linear time |n + z|, where z is the number of maximal repeats in S. This implementation uses the computation of suffix array in linear
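To make the notion of a maximal repeat concrete, here is a minimal brute-force sketch in Python. It is not the py-rstr-max algorithm (which relies on a suffix array) and it is far slower than linear time. The function name `maximal_repeats` is hypothetical, and the sketch assumes the usual convention that a repeat is maximal when its occurrences have at least two distinct left contexts and at least two distinct right contexts, with the string boundary counted as a context of its own.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Return {repeat: start positions} for every maximal repeat of s.

    Brute force: enumerates all substrings, so only suitable for short inputs.
    """
    n = len(s)
    occurrences = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occurrences[s[i:j]].append(i)

    result = {}
    for sub, starts in occurrences.items():
        if len(starts) < 2:
            continue  # not a repeat
        # None stands for the string boundary (no character on that side).
        left = {s[i - 1] if i > 0 else None for i in starts}
        right = {s[i + len(sub)] if i + len(sub) < n else None for i in starts}
        # Extending by one character on either side would lose an occurrence.
        if len(left) > 1 and len(right) > 1:
            result[sub] = starts
    return result


if __name__ == "__main__":
    print(maximal_repeats("banana"))
```

For "banana" the repeat "an" is not reported because every occurrence is followed by "a", so it can always be extended to "ana".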

Tags | maximum motif non-gapped repeats string |

An implementation of the DC3 algorithm for Suffix Array Construction, including computation of longest common prefixes. Based on: "Simple Linear Work Suffix Array Construction" (2003) by Juha Karkkainen, Peter Sanders, and Stefan Burkhardt http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.137.7871
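As an illustration of what a suffix array and its longest-common-prefix (LCP) array contain, the following Python sketch may help. It deliberately does not implement DC3: the suffix array is built by plain sorting (much slower than linear), and the LCP array uses Kasai's algorithm. The function names `suffix_array` and `lcp_array` are hypothetical and not taken from the project.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    lcp = [0] * n
    h = 0  # current common-prefix length, reused between iterations (Kasai)
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa, lcp_array(text, sa))
```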

Tags | algorithm dc3 lcs strings suffixarray suffixtree |

About

madIS is an extensible relational database system built on top of the SQLite database with extensions implemented in Python (via the APSW SQLite wrapper). In usage, madIS feels like Hive (Hadoop's SQL with User Defined Functions language), without the overhead but also without the distributed processing capabilities. Nevertheless, madIS can easily handle tens of millions of rows on a single desktop/laptop computer. madIS' main goal is to promote the handling of data related tasks within an exte
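The general idea of extending SQLite with Python user-defined functions can be sketched with the standard library alone. This example uses Python's built-in `sqlite3` module rather than APSW, and it does not use any madIS API; the `word_count` function and the `docs` table are made up for illustration.

```python
import sqlite3


def word_count(text):
    """Hypothetical UDF: number of whitespace-separated words in a text value."""
    return len(text.split()) if text else 0


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("word_count", 1, word_count)

conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("a b c",), ("hello world",)],
)

rows = conn.execute(
    "SELECT id, word_count(body) FROM docs ORDER BY id"
).fetchall()
print(rows)
```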

Tags | analysis data database etl frp processing relational relational_programming udf workflows |

A Bazaar-Style Open Source Project: We are all familiar with the basic concept of a web-based CMS. Implementations in various forms and permutations in Java Websphere, Zend PHP, Joomla, Drupal, Python Zope and Python Plone already exist and are very well known. So, why are we reinventing the wheel? In this open source project, we have implemented, in a Django-based CMS, a powerful diff (http://en.wikipedia.org/wiki/Diff) algorithm described by Eugene Myers in one of his papers http://citeseerx.is
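For readers unfamiliar with the Myers approach, the sketch below shows its greedy core in Python: for each edit count d it tracks the furthest point reachable on every diagonal and stops when the end of both sequences is reached. It only returns the length of the shortest edit script (insertions plus deletions), not the script itself, and the function name `myers_edit_distance` is hypothetical. It is not the project's Django code.

```python
def myers_edit_distance(a, b):
    """Minimum number of insertions and deletions that turn sequence a into b."""
    n, m = len(a), len(b)
    # v[k] = furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: insert an element of b
            else:
                x = v[k - 1] + 1    # step right: delete an element of a
            y = x - k
            # Follow the diagonal for free while the elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


if __name__ == "__main__":
    print(myers_edit_distance("ABCABBA", "CBABAC"))
```

Because matching runs cost nothing, the work grows with the number of differences rather than with the product of the input lengths, which is what makes this family of algorithms attractive for revision control.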

Tags | cms diff django myersalgorithm revisioncontrol |

In mathematics and computational geometry, a Delaunay triangulation for a set P of points in the plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P). Delaunay triangulations maximize the minimum angle of all the angles of the triangles in the triangulation; they tend to avoid skinny triangles. The triangulation was invented by Boris Delaunay in 1934. Source: http://en.wikipedia.org/wiki/Delaunay_triangulation This is a parallel implementati
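The empty-circumcircle condition in the definition above can be checked directly. The following Python sketch uses the standard determinant test and a brute-force validity check; it is sequential, does not construct a triangulation, and is unrelated to the project's OpenMP code. The names `orientation`, `in_circumcircle` and `is_delaunay` are hypothetical.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    # The sign convention assumes counter-clockwise order; flip otherwise.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


def is_delaunay(points, triangles):
    """Check every triangle (index triples) against every other point."""
    for i, j, k in triangles:
        for idx, p in enumerate(points):
            if idx in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (2, 1), (2, -1)]
    print(is_delaunay(pts, [(0, 3, 2), (3, 1, 2)]))  # split along the short diagonal
    print(is_delaunay(pts, [(0, 1, 2), (0, 3, 1)]))  # split along the long diagonal
```

The two calls triangulate the same four points in the two possible ways; only the split that avoids the skinny triangles satisfies the condition.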

Tags | delaunay openmp triangulation |

This Matlab toolbox converts JSON formats into (json2mat) and from Matlab structures. The name, JSON4MAT, is a wink to an old, outdated but similarly minded XML toolbox, XML4MAT (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.9787), from the days when XML was emerging as the interoperable format of choice and Matlab did have a native XML parser. JSON4MAT was originally developed to support another Matlab toolbox, COUCH4MAT.

Web directory: http://json4mat.googlecode.com/hg/

Manual / tuto

A C++ implementation of an LRU cache that has a maximum size (but no expiration time). The cache is implemented as a single hash table and provides fast, constant-time insertion, retrieval and query operations.

Usage

    #include <iostream>
    #include "lru.hpp"
    using namespace std;

    int compute_value(int key)
    {
        return 100 + key;
    }

    int main()
    {
        // Create a cache that can hold 10 (int, int) pairs at most
        typedef plb::LRUCacheH4<int, int> lru_cache;
        lru_cache cache(10);

        // Insert a (key, value) pair into the cach
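The eviction policy itself is easy to see in a few lines. The sketch below is written in Python on top of `collections.OrderedDict`; it is not the `plb::LRUCacheH4` API, and the class name `LRUCache` with its `get` and `put` methods is hypothetical. It assumes that both reads and writes mark an entry as recently used.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put(1, 101)
    cache.put(2, 102)
    cache.get(1)        # key 1 becomes the most recently used
    cache.put(3, 103)   # evicts key 2
    print(1 in cache, 2 in cache, 3 in cache)
```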

Tags | cache cplusplus hashtable LRU |

A tool to parse the CiteSeerX metadata dump, download papers, and save them to a database.