Grub

  •    CSharp

Grub Next Generation is a distributed web crawling system (clients/servers) that helps build and maintain an index of the Web. It uses a client-server architecture in which clients crawl the web and report updates to the server. The peer-to-peer grubclient software crawls during computer idle time.
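The client-server split described above can be sketched roughly as follows. This is an illustrative Python model only, not Grub's actual protocol or code: `CrawlServer`, `run_client`, and the in-memory `FAKE_WEB` are hypothetical names, and no real network access takes place.

```python
from collections import deque

class CrawlServer:
    """Hypothetical coordinator: hands out URLs and stores what clients report."""

    def __init__(self, seeds):
        self.queue = deque(seeds)   # URLs waiting to be crawled
        self.seen = set(seeds)      # URLs already queued at some point
        self.index = {}             # url -> links found on that page

    def next_url(self):
        # Return the next URL to crawl, or None when nothing is left.
        return self.queue.popleft() if self.queue else None

    def report(self, url, links):
        # Record the client's result and queue any links not seen before.
        self.index[url] = links
        for link in links:
            if link not in self.seen:
                self.seen.add(link)
                self.queue.append(link)

def run_client(server, fetch_links):
    """Hypothetical client loop: ask for work, crawl, report back."""
    while (url := server.next_url()) is not None:
        server.report(url, fetch_links(url))

# Stand-in for the real web (assumption: page name -> outgoing links).
FAKE_WEB = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

server = CrawlServer(["a"])
run_client(server, lambda url: FAKE_WEB.get(url, []))
```

In this sketch the server owns the queue and the index, so several clients could call `next_url` and `report` independently; a real system would also need locking, retries, and a network transport.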

Arachnode.net

  •    CSharp

An open source .NET web crawler written in C# using SQL Server 2005/2008. Arachnode.net is a comprehensive .NET web crawler for downloading, indexing, and storing Internet content, including e-mail addresses, files, hyperlinks, images, and Web pages.

FileMasta - Search servers for video, music, books, software, games, subtitles and much more

  •    CSharp

FileMasta is a search engine that lets you find a file among millions of files located on FTP servers. The search engine database contains regularly updated information on the contents of thousands of FTP servers worldwide. We don't search the contents of the files. We host no content; we only provide access to already available files, in the same way Google and other search engines do.

MOSS 2007 - C# Protocol Handler

  •    CSharp

Sample code with supporting documentation that enables the creation of a Microsoft Office SharePoint Server 2007 Protocol Handler entirely in .NET Managed (C#) code.

WebCrawler - A simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await

  •    CSharp

A simple web crawler that returns crawled links as an IObservable, using Reactive Extensions and async/await.
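The idea of pushing links to a consumer as they are discovered can be sketched in Python with an async generator, which is only a loose analogue of an IObservable and not this project's API. The `PAGES` dictionary, `fetch`, `crawl`, and `collect` are hypothetical names, and the pages are served from memory rather than over HTTP.

```python
import asyncio
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Stand-in for a website (assumption: path -> HTML body).
PAGES = {
    "/": '<a href="/about">About</a> <a href="/docs">Docs</a>',
    "/about": '<a href="/">Home</a>',
    "/docs": '<a href="/about">About</a>',
}

async def fetch(url):
    # Placeholder for a real asynchronous HTTP request.
    await asyncio.sleep(0)
    return PAGES.get(url, "")

async def crawl(start):
    """Yield each newly discovered link as soon as it is found."""
    seen = {start}
    pending = [start]
    while pending:
        url = pending.pop(0)
        parser = LinkParser()
        parser.feed(await fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
                yield link

async def collect(start):
    return [link async for link in crawl(start)]

found = asyncio.run(collect("/"))
```

A consumer can process links with `async for link in crawl("/")` without waiting for the whole crawl to finish, which is the property the observable-based design is after.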

AzureSearchCrawler - A simple web crawler, using Abot, that indexes page contents into Azure Search.

  •    CSharp

Azure Search is a cloud search service for web and mobile app development. This project helps you get content from a website into an Azure Search index. It uses Abot to crawl websites. For each page it extracts the content in a customizable way and indexes it into Azure Search. This project is intended as a demo or a starting point for a real crawler. At a minimum, you'll want to replace the console messages with proper logging, and customize the text extraction to improve results for your use case.
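The "extract the content, then index it" step can be sketched as below. This is a Python illustration under stated assumptions, not the project's code: `TextExtractor` and `to_index_document` are hypothetical, the `id` and `content` field names are made up for the example, and the upload to Azure Search is deliberately left out.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def to_index_document(url, html):
    """Build a dictionary shaped like a search document (field names are assumptions)."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return {"id": url, "content": " ".join(extractor.parts)}

doc = to_index_document(
    "https://example.com/",
    "<html><head><style>p{}</style></head>"
    "<body><h1>Hello</h1><p>World</p><script>var x;</script></body></html>",
)
```

Customizing the extraction, as the description suggests, would mean changing what `handle_data` keeps, for example dropping navigation menus or footers before the text is indexed.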

Mechanize

  •    CSharp

Stateful programmatic web browsing, based on Python-Mechanize, which is based on Andy Lester’s Perl module WWW::Mechanize.

RuiJi.Net - crawler framework, distributed crawler extractor

  •    CSharp

RuiJi.Net is a distributed crawling framework written in .NET Core.