Arachnode.net

An open source .NET web crawler written in C# using SQL 2005/2008. Arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages. Its features include

  • .NET architecture
  • Configurable Rules and Actions
  • Lucene.NET Integration
  • SQL Server 2008 and full-text indexing
  • .DOC/.PDF/.PPT/.XLS Indexing
  • HTML to XML and XHTML
  • Multi-threading and Throttling
  • Respectful Crawling
  • Analysis Services
  • SQL Server 2008 and SSIS
  • EXIF data extraction



http://arachnode.net/

Bookmark and Share          6814



comments powered by Disqus


Related Products

Nutch

Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

Read more

Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Read more

IndexTank - Search Engine powers Reddit

IndexTank search engine powers search in Reddit, Social bookmarking site. IndexTank is acquired by LinkedIn and released the project as open source. It includes features like Variables boosts, Facets, Faceted search, Snippeting, Custom scoring functions, Suggest, and Autocomplete.

Read more

Solr

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Read more

Sphinix

Sphinix is free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Read more

Katta - Lucene and more in the cloud.

Katta is a scalable, failure tolerant, distributed, data storage for real time access. Katta serves large, replicated, indices as shards to serve high loads and very large data sets. These indices can be of different type. Currently implementations are available for Lucene and Hadoop mapfiles.

Read more

Open Search Server

Open Search Server is both a modern crawler and search engine and a suite of high-powered full text search algorithms. Built using the best open source technologies like lucene, zkoss, tomcat, poi, tagsoup. Open Search Server is a stable, high-performance piece of software.

Read more

Grub

Grub Next Generation is distributed web crawling system (clients/servers) which helps to build and maintain index of the Web. It is client-server architecture where client crawls the web and updates the server. The peer-to-peer grubclient software crawls during computer idle time.

Read more

compass

Compass is a real time searchengine. It is built on top of lucene. It is transactional, distributed, supports Spring MVC, integrates with Hibernate.

Read more

Gigablast - Web and Enterprise search engine in C++

Gigablast is one of the remaining four search engines in the United States that maintains its own searchable index of over a billion pages. It is scalable to thousands of servers. Has scaled to over 12 billion web pages on over 200 servers. It supports Distributed web crawler, Document conversion, Automated data corruption detection and repair, Can cluster results from same site, Synonym search, Spell checker and lot more.

Read more

Related Tags
Browse projects by tags.

We have collection of more than 1 Million open source products ranging from Enterprise product to small libraries in all platforms. We aggregate information from all open source repositories. Search and find the best for your needs.