algolia-webcrawler - Simple Node.js worker that crawls sitemaps in order to keep an Algolia index up-to-date

A simple Node.js worker that crawls sitemaps to keep an Algolia index up to date. It uses CSS selectors to locate the text content to index on each page.

https://github.com/DeuxHuitHuit/algolia-webcrawler
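
The crawl described above (read a sitemap, fetch each page, pull out the text to index) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names extractSitemapUrls, extractTagText, and buildRecord are hypothetical, a regular expression on a tag name stands in for the CSS selectors the project uses, and hashing the URL with MD5 to get a stable record ID is an assumption suggested only by the md5 entry in the dependency list.

```javascript
const crypto = require('crypto');

// Collect every <loc> URL from a sitemap XML string.
// Note: XML entities such as &amp; are left as-is in this sketch.
function extractSitemapUrls(xml) {
  const urls = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = re.exec(xml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Hypothetical stand-in for a CSS selector: return the text of the
// first <tag>...</tag> element, with nested markup stripped.
function extractTagText(html, tag) {
  const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, 'i');
  const match = re.exec(html);
  if (!match) {
    return '';
  }
  return match[1]
    .replace(/<[^>]+>/g, ' ') // drop nested tags
    .replace(/\s+/g, ' ')     // collapse whitespace
    .trim();
}

// Build one index record per page. Using an MD5 hash of the URL as
// the objectID is an assumption, chosen so that re-crawling a page
// updates the same record instead of creating a duplicate.
function buildRecord(url, html) {
  return {
    objectID: crypto.createHash('md5').update(url).digest('hex'),
    url,
    title: extractTagText(html, 'title'),
    text: extractTagText(html, 'main'),
  };
}
```

Each record could then be sent to an Algolia index with the algoliasearch client. That step is left out here because it needs credentials and network access.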

Dependencies:

algoliasearch : ^3.26.0
cheerio : ^1.0.0-rc.2
html-entities : ^1.2.1
lodash : ^4.17.5
md5 : ^2.2.1
optimist : ^0.6.1
trim : 0.0.1
update-notifier : ^2.4.0

Related Projects

algoliasearch-client-php - Algolia Search API Client for PHP

  •    PHP

Algolia Search is a hosted full-text, numerical, and faceted search engine capable of delivering realtime results from the first keystroke. The Algolia Search API Client for PHP lets you easily use the Algolia Search REST API from your PHP code.

Open Search Server

  •    C++

Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. It is built on open source technologies such as Lucene, ZK, Tomcat, POI, and TagSoup. Open Search Server is a stable, high-performance piece of software.

Arachnode.net

  •    CSharp

An open source .NET web crawler written in C# using SQL Server 2005/2008. Arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing, and storing Internet content, including e-mail addresses, files, hyperlinks, images, and Web pages.

algoliasearch-client-javascript - 🔎 Algolia Search API Client for JavaScript platforms

  •    Javascript

Algolia Search is a hosted full-text, numerical, and faceted search engine capable of delivering realtime results from the first keystroke. The Algolia Search API Client for JavaScript lets you easily use the Algolia Search REST API from your JavaScript code. The JavaScript client works both on the frontend (browsers) and on the backend (Node.js) with the same API.

github-awesome-autocomplete - Add instant search capabilities to GitHub's search bar

  •    Javascript

By working every day on building the best search engine, we've become obsessed with our own search experience on the websites and mobile applications we use. GitHub is a big part of our day and we use its search bar constantly, but it was not optimal for our needs. So we rebuilt GitHub's search the way we thought it should be, and we now share it with the community through these Chrome, Firefox, and Safari extensions. Algolia provides a developer-friendly SaaS API for database search. It enables any website or mobile application to easily provide its end users with instant, relevant search. With Algolia's find-as-you-type experience, users can find what they're looking for in just a few keystrokes. Feel free to give Algolia a try with our 14-day free trial.

ASPseek

  •    C++

ASPseek is Internet search engine software developed by SWsoft. ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do Boolean searches. Search results can be limited to a given time period, site, or Web space (set of sites) and sorted by relevance (PageRank is used) or date.

react-instantsearch - ⚡ Lightning-fast search for React and React Native apps, by Algolia.

  •    Javascript

React InstantSearch is a library for building blazing fast search-as-you-type search UIs with Algolia. React InstantSearch is available on the npm registry. It relies on algoliasearch to communicate with Algolia APIs.

algoliasearch-wordpress - 💊 Algolia Search plugin for WordPress is a drop-in replacement for WordPress search

  •    Javascript

Search by Algolia is the smartest way to improve search on your site. Autocomplete is included, along with full control over look, feel and relevance. The plugin provides relevant search results in milliseconds, ensuring that your users can find your best posts at the speed of thought. It also comes with native typo-tolerance and is language-agnostic, so that every WordPress user, no matter where they are, can benefit from it.

Norconex HTTP Collector - Enterprise Web Crawler

  •    Java

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

Nutch - Highly extensible, highly scalable Web crawler

  •    Java

Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

Heritrix

  •    Java

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix is designed to respect the robots.txt exclusion directives and META robots tags, and collect material at a measured, adaptive pace unlikely to disrupt normal website activity.

instantsearch.js - ⚡ Lightning-fast search for your apps, by Algolia

  •    Javascript

InstantSearch.js is a library for building blazing fast search-as-you-type search UIs with Algolia. To learn more about the library, follow the getting started guide or check how to add it to your own project.

vue-instantsearch - 👀 Algolia components for building search UIs with Vue.js

  •    Javascript

InstantSearch projects: Vue InstantSearch | InstantSearch.js | React InstantSearch | Angular InstantSearch | InstantSearch Android | InstantSearch iOS. Built by Algolia.

mnoGoSearch

  •    C

mnoGoSearch for UNIX consists of a command-line indexer and a search program that can be run under Apache Web Server or any other HTTP server that supports the CGI interface. mnoGoSearch for UNIX is distributed as source code and can be compiled with a number of databases, depending on the user's choice. It is known to work on a wide variety of modern Unix operating systems, including Linux, FreeBSD, Mac OS X, Solaris, and others.

Grub

  •    CSharp

Grub Next Generation is a distributed web crawling system (clients/servers) that helps to build and maintain an index of the Web. It uses a client-server architecture in which the client crawls the web and updates the server. The peer-to-peer grubclient software crawls during computer idle time.

autocomplete.js - 🔮 Fast and full-featured autocomplete library

  •    Javascript

This JavaScript library adds a fast and fully-featured auto-completion menu to your search box displaying results "as you type". It can easily be combined with Algolia's realtime search engine. The library is available as a jQuery plugin, an Angular.js directive or a standalone library. The autocomplete.js library must be included after jQuery, Zepto or Angular.js (with jQuery).

Deathwatch WebCrawler

  •    VB

Deathwatch WebCrawler is a personal search engine that runs on any Windows machine with the .NET Framework installed.

jekyll-theme-basically-basic - Your new Jekyll default theme

  •    CSS

If you're running Jekyll v3.5+ and self-hosting, you can quickly install the theme as a Ruby gem. If you're hosting with GitHub Pages, you can install it as a remote theme or copy all of the theme files directly into your project. GitHub Pages has added full support for any GitHub-hosted theme.