
algolia-webcrawler - Simple node worker that crawls sitemaps in order to keep an Algolia index up-to-date

  •    JavaScript

Simple node worker that crawls sitemaps to keep an Algolia index up-to-date. It uses simple CSS selectors to find the actual text content to index.
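The selector-driven approach could look something like the configuration sketch below. This is a hypothetical illustration of the idea only, not the project's actual config schema: the field names (sitemapUrl, index, selectors) are invented here for clarity.

```json
{
  "sitemapUrl": "https://example.com/sitemap.xml",
  "index": "my_algolia_index",
  "selectors": {
    "title": "h1",
    "text": "article .content",
    "description": "meta[name=description]"
  }
}
```

The crawler would fetch each URL listed in the sitemap, apply each CSS selector to the page, and push the extracted text to the named index.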

redditLikedSavedImageDownloader - Download all of your reddit Liked/Upvoted and Saved images to disk for hoarding!

  •    Python

This repository includes a simple web server interface. Unlike the main script, the server is supported on Python 3 only. To use it, install Tornado with pip3 install tornado, then run python3 LikedSavedDownloaderServer.py. The interface can then be viewed by visiting http://localhost:8888 in any web browser.
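The setup steps above as a shell transcript (assuming pip3 and python3 are on your PATH and you are in the repository root):

```shell
# Install the server's only extra dependency (Python 3 only).
pip3 install tornado

# Start the web server interface from the repository root.
python3 LikedSavedDownloaderServer.py

# Then open http://localhost:8888 in any web browser.
```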

pilgrim - Bookmarklet and manual webcrawler to aid in web research

  •    JavaScript

Pilgrim is a prototype tool for assisting in web-based research. This project was initiated with generous support from the Knight Foundation Prototype Fund.


krawler - A web crawling framework written in Kotlin

  •    Kotlin

Krawler is a web crawling framework written in Kotlin. It is heavily inspired by crawler4j by Yasser Ganjisaffar. The project is still very new, and those looking for a mature, well-tested crawler framework should likely still use crawler4j. For those who can tolerate a bit of turbulence, Krawler should serve as a replacement for crawler4j with minimal modifications to existing applications.

Using the Krawler framework is fairly simple. Minimally, two methods must be overridden in order to use the framework: the shouldVisit method dictates what should be visited by the crawler, and the visit method dictates what happens once the page is visited. Overriding these two methods is sufficient for creating your own crawler; however, additional methods can be overridden to provide more robust behavior.
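The two required overrides described above can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not Krawler's actual API: the base class SketchCrawler and the method signatures here are hypothetical stand-ins for the types the real framework supplies.

```kotlin
// Hypothetical base class standing in for Krawler's crawler type;
// in the real framework, fetching and scheduling are provided for you.
abstract class SketchCrawler {
    // Decide whether a discovered URL should be crawled.
    abstract fun shouldVisit(url: String): Boolean
    // Handle a page once it has been fetched.
    abstract fun visit(url: String, content: String)
}

// A crawler that stays on one host and records which pages it saw.
class ExampleCrawler : SketchCrawler() {
    val visited = mutableListOf<String>()

    override fun shouldVisit(url: String): Boolean =
        url.startsWith("https://example.com/")

    override fun visit(url: String, content: String) {
        visited.add(url)
        println("visited $url (${content.length} chars)")
    }
}

fun main() {
    val crawler = ExampleCrawler()
    val link = "https://example.com/docs"
    if (crawler.shouldVisit(link)) {
        crawler.visit(link, "<html>...</html>")
    }
    // An off-host link is filtered out by shouldVisit.
    check(!crawler.shouldVisit("https://other.org/page"))
    check(crawler.visited == listOf(link))
}
```

The division of labor mirrors crawler4j: shouldVisit is a pure filter over candidate URLs, while visit holds all per-page processing, so swapping frameworks mostly means changing which base class you extend.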