
Scrapy - Web crawling & scraping framework for Python

  •    Python

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Photon - Incredibly fast crawler designed for recon.

  •    Python

The extracted information is saved in an organized manner or can be exported as JSON. You can control timeouts and delays, add seeds, exclude URLs matching a regex pattern, and more. The extensive range of options provided by Photon lets you crawl the web exactly the way you want.
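As a rough illustration of excluding URLs that match a regex pattern, here is a minimal sketch using only the Python standard library. It is not Photon's code or command-line interface; the function name `filter_urls` and the sample URLs are assumptions made for the example.

```python
import re


def filter_urls(urls, exclude_pattern):
    """Return the URLs that do NOT match exclude_pattern.

    Hypothetical helper for illustration only; not part of Photon.
    """
    excluded = re.compile(exclude_pattern)
    # re.search matches anywhere in the URL, not just at the start.
    return [url for url in urls if not excluded.search(url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/",
        "http://example.com/logout",
        "http://example.com/blog",
    ]
    for url in filter_urls(candidates, r"/logout"):
        print(url)
```

A crawler would typically apply a filter like this before queueing newly discovered links, so excluded pages are never requested.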




gain - Web crawling framework based on asyncio.

  •    Python

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Proxy settings can be added to a spider.

toapi - Every web site provides APIs.

  •    Python

Toapi gives you the ability to make every web site provide an API. Version v2.0.0 is a complete rewrite.

ruia - Async Python 3.6+ web scraping micro-framework based on asyncio.

  •    Python

Ruia is an async web scraping micro-framework, written with asyncio and aiohttp, that aims to make crawling URLs as convenient as possible.


BlackWidow - A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website

  •    Python

BlackWidow is a Python-based web application spider that gathers subdomains, URLs, dynamic parameters, email addresses and phone numbers from a target website. This project also includes the Inject-X fuzzer to scan dynamic URLs for common OWASP vulnerabilities. This software is released under the GNU General Public License v3.0. See LICENSE.md for details.

tumblr_spider - Multithreaded Tumblr crawler in Python

  •    Python

A multithreaded Tumblr crawler written in Python.

Squzer - Distributed Web Crawler

  •    Python

Squzer is Declum's open-source, extensible, scalable, multithreaded web crawler project, written entirely in Python.

grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  •    Python

grab-site is an easy preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses wpull for crawling. It provides a dashboard with all of your crawls, showing which URLs are being grabbed, how many URLs are left in the queue, and more.

portSpider - 🕷 A lightning fast multithreaded network scanner framework with modules.

  •    Python

I'm not responsible for anything you do with this program, so please only use it for good and educational purposes. Copyright (c) 2017 by David Schütz. Some rights reserved.

spider.py - [Reference Only] An asynchronous, multiprocessed, python based spider framework.

  •    Python

An asynchronous, multiprocessed Python spider framework. The spider is separated into two parts: the actual engine and the extractors. The engine submits the requests and handles all of the processes and connections. The extractors are functions that are registered to be called after a page has been loaded and parsed.
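The engine/extractor split described above can be sketched roughly as follows. This is not spider.py's actual API: the names `extractor`, `EXTRACTORS`, `LinkParser` and `run_engine` are hypothetical, and the sketch is synchronous and reads pages from a dictionary instead of the network to stay short and self-contained.

```python
from html.parser import HTMLParser

# Hypothetical registry of extractor functions (not spider.py's real API).
EXTRACTORS = []


def extractor(func):
    """Register func to be called after a page has been loaded and parsed."""
    EXTRACTORS.append(func)
    return func


class LinkParser(HTMLParser):
    """Minimal parser that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


@extractor
def collect_links(url, parsed):
    return {"url": url, "links": parsed.links}


@extractor
def count_links(url, parsed):
    return {"url": url, "link_count": len(parsed.links)}


def run_engine(pages):
    """Engine: 'load' each page, parse it, then call every extractor.

    pages maps URL -> HTML and stands in for real network requests.
    """
    results = []
    for url, html in pages.items():
        parsed = LinkParser()
        parsed.feed(html)
        for func in EXTRACTORS:
            results.append(func(url, parsed))
    return results


if __name__ == "__main__":
    demo = {"http://example.com/": '<a href="/a">A</a><a href="/b">B</a>'}
    for item in run_engine(demo):
        print(item)
```

Keeping the extractors as plain registered functions means new extraction logic can be added without touching the engine loop.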

freshonions-torscraper - Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi

  •    Python

This is a copy of the source for the http://zlal32teyptf4tvi.onion hidden service, which implements a Tor hidden service crawler / spider and web site. This software is made available under the GNU Affero GPL 3 License. This means that if you deploy this software as part of networked software that is available to the public, you must make the source code available (including any modifications).

hacker-news-digest - 📰 A responsive interface of Hacker News with summaries and illustrations

  •    Python

This service extracts summaries and illustrations from Hacker News articles for people who want to get the most out of Hacker News while cutting down the time spent deciding which articles to read and which to skip.

Miyo - 💖 Miyo is the back end of ouenu of a star fans app

  •    Python

Miyo is the back end of ouenu of a star fans app, which was a startup project.

dht-spider - A simple BitTorrent magnet link crawler based on the DHT protocol

  •    Python

A simple BitTorrent magnet link crawler based on the DHT protocol.

jd_product_spider - JD.com product crawler service

  •    Python

A diagram is missing here and will be added later... All product categories on JD.com have three levels.




