
Gigablast - Web and Enterprise search engine in C++

  •    C++

Gigablast is one of the four remaining search engines in the United States that maintains its own searchable index of over a billion pages. It scales to thousands of servers and has indexed over 12 billion web pages across more than 200 servers. Features include a distributed web crawler, document conversion, automated detection and repair of data corruption, clustering of results from the same site, synonym search, a spell checker, and more.

spoon - A package for building dedicated proxy pools for different sites.

  •    Python

Spoon is a library for building a distributed proxy pool for each site you assign. It runs only on Python 3. Simply run pip install spoonproxy, or clone the repo and add it to your PYTHONPATH.

NScrapy - NScrapy is a distributed web crawling framework in C#

  •    CSharp

Below is a sample of NScrapy. The sample visits Liepin, a recruitment website. Starting from the seed URL defined in the [URL] attribute, NScrapy visits each position's detail page (the ParseItem method) and follows the next page automatically (the VisitPage method). The spider author does not need to know how spiders distributed across machines or processes communicate with each other, or how the downloader process obtains the URLs to download: just provide the seed URL, inherit the Spider.Spider class, and write a few callbacks, and NScrapy takes care of the rest. NScrapy supports several kinds of extension, including custom DownloaderMiddleware, configurable HTTP headers, and a user-agent pool.
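The callback-based workflow described above might be sketched roughly as follows. This is a hedged illustration, not code from the NScrapy repository: the [URL] attribute, Spider.Spider base class, and the ParseItem/VisitPage callback names come from the description above, while the ResponseContext parameter type, the namespace, and the selector calls inside the methods are assumptions made for the sketch.

```csharp
using NScrapy.Spider; // assumed namespace; check the actual NScrapy package

// Seed URL declared via the [URL] attribute, as described above.
// The URL value here is purely illustrative.
[URL("https://www.liepin.com/zhaopin/")]
public class LiepinSpider : Spider
{
    // Callback for each position detail page reached from the seed URL.
    public void ParseItem(ResponseContext response)
    {
        // Extract the fields you care about from the detail page here,
        // e.g. job title, company, salary.
    }

    // Callback that follows the "next page" link so pagination is
    // handled without the spider knowing about the distributed downloader.
    public void VisitPage(ResponseContext response)
    {
        // Hand the next-page URL back to NScrapy for scheduling.
    }
}
```

The point of the design is that the spider only declares a seed URL and two callbacks; URL scheduling, downloading, and cross-process communication stay inside the framework.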