An open source .NET web crawler written in C# using SQL 2005/2008. Arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages. Its features include
Tags | crawler webcrawler searchengine search-engine full-text-search |
Implementation | CSharp |
License | GPLv2 |
Platform | Windows |
Open Search Server is both a modern crawler and search engine and a suite of high-powered full text search algorithms. Built using the best open source technologies like lucene, zkoss, tomcat, poi, tagsoup. Open Search Server is a stable, high-performance piece of software.
crawler webcrawler searchengine search-engine full-text-search spiderNorconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repositoriy of your choice (e.g. a search engine). It very flexible, powerful, easy to extend, and portable.
crawler webcrawler spider full-text-search searchengine search-engineNutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
crawler webcrawler searchengine search-engine full-text-searchASPseek is an Internet search engine software developed by SWsoft.ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do a Boolean search. Search results can be limited to time period given, site or Web space (set of sites) and sorted by relevance (PageRank is used) or date.
crawler webcrawler searchengine search-engine full-text-search spiderHeritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix is designed to respect the robots.txt exclusion directives and META robots tags, and collect material at a measured, adaptive pace unlikely to disrupt normal website activity.
crawler webcrawler searchengine search-engine full-text-searchmnoGoSearch for UNIX consists of a command line indexer and a search program which can be run under Apache Web Server, or any other HTTP server supporting CGI interface. mnoGoSearch for Unix is distributed in sources and can be compiled with a number of databases, depending on user's choice. It is known to work on a wide variety of the modern Unix operating systems including Linux, FreeBSD, Mac OSX, Solaris and others.
crawler webcrawler searchengine search-engine full-text-searchGrub Next Generation is distributed web crawling system (clients/servers) which helps to build and maintain index of the Web. It is client-server architecture where client crawls the web and updates the server. The peer-to-peer grubclient software crawls during computer idle time.
crawler webcrawler searchengine search-engine full-text-search spiderElasticlunr.js is a lightweight full-text search engine in Javascript for browser search and offline search. Elasticlunr.js is developed based on Lunr.js, but more flexible than lunr.js. Elasticlunr.js provides Query-Time boosting and field search. Elasticlunr.js is a bit like Solr, but much smaller and not as bright, but also provide flexible configuration and query-time boosting.
searchengine search-engine offline-search solr javascript-searchengineManticore Search is an open source high performance full-text search oriented engine. It is a fork of Sphinx Search. Manticore Search is written in C++. It means speed and low resource consumption, it means you don’t have to worry about a garbage collector that suddenly makes a trouble.
searchengine search-engine full-text-search geo-searchApache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
search-engine searchengine full-text-searchSolr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
search-engine searchengine full-text-search facet distributed analyticsThe open source project strus provides a collection of C++ (C++98) libraries and command line tools for building a full-text search engine. The strus search engine can be build using any key value store database that provides an upper bound seek function for the stored key/value pairs. Currently there exists an implementation based on the LevelDB library.
searchengine search-engine full-text-search libraryMeiliSearch is a powerful, fast, open-source, easy to use, and deploy search engine. The search and indexation are fully customizable and handles features like typo-tolerance, filters, and synonyms. It features include Search as-you-type experience, Full text search, Supports Synonym, Highly customizable, RESTfull API and lot more.
search-engine typo-tolerance instant synonyms highlighting searchengine full-text-searchSphinix is free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.
searchengine search-engine full-text-search standaloneMG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms. The main points of MG4J are Powerful indexing, Multi-index interval semantics, Virtual fields, Clustering and lot more.
searchengine search-engine full-text-search lucene-alternative searchFull text, faceted, dependency free search engine in javascript. Created to perform fast search on small json dataset (up to 1000 elements).
searchengine search-engine searchThe Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset.
searchengine search-engine full-text-search lucene-alternative search information-retrievalLux is an open source XML search engine using Lucene /Solr and Saxon XQuery/XSLT processor. Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr, and leverage its application framework in order to deliver a REST service, application server, and supporting tools.
search-engine searchengine xml-searchengine xml-search xml-databaseXapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
searchengine search-engine full-text-search lucene-alternativeJumper 2.0 is a collaborative community search platform that revolutionizes search by crowdsourcing knowledge management powered by a shared bookmarking engine. It is easily and quickly deployed into a community of practice that benefits users with complex and specialized search requirements. Jumper delivers universal search of any databases, flat files, fileshares, content systems, web pages, blogs and wikis, even people - through one simple search box.
searchengine search-engine full-text-search collaborative community bookmarking-engine
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.