Photon - Incredibly fast crawler designed for recon.

  •        9

The extracted information is saved in an organized manner or can be exported as json. Control timeout, delay, add seeds, exclude URLs matching a regex pattern and other cool stuff. The extensive range of options provided by Photon lets you crawl the web exactly the way you want.

https://github.com/s0md3v/Photon

Tags
Implementation
License
Platform

   




Related Projects

OSINT-Framework - OSINT Framework

  •    Javascript

OSINT framework focused on gathering information from free tools or resources. The intention is to help people find free OSINT resources. Some of the sites included might require registration or offer more data for $$$, but you should be able to get at least a portion of the available information for no cost. Feedback or new tool suggestions are extremely welcome! Please feel free to submit a pull request or open an issue on github or reach out on Twitter.

xray - XRay is a tool for recon, mapping and OSINT gathering from public networks.

  •    Go

XRay is a tool for network OSINT gathering, its goal is to make some of the initial tasks of information gathering and network mapping automatic. The shodan.io API key parameter ( -shodan-key KEY ) is optional, however if not specified, no service fingerprinting will be performed and a lot less information will be shown (basically it just gonna be DNS subdomain enumeration).

Raccoon - A high performance offensive security tool for reconnaissance and vulnerability scanning

  •    Python

Raccoon is a tool made for reconnaissance and information gathering with an emphasis on simplicity. It will do everything from fetching DNS records, retrieving WHOIS information, obtaining TLS data, detecting WAF presence and up to threaded dir busting and subdomain enumeration. Every scan outputs to a corresponding file. As most of Raccoon's scans are independent and do not rely on each other's results, it utilizes Python's asyncio to run most scans asynchronously.

BlackWidow - A Python based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website

  •    Python

BlackWidow is a python based web application spider to gather subdomains, URL's, dynamic parameters, email addresses and phone numbers from a target website. This project also includes Inject-X fuzzer to scan dynamic URL's for common OWASP vulnerabilities. This software is released under the GNU General Public License v3.0. See LICENSE.md for details.

RED_HAWK - All in one tool for Information Gathering, Vulnerability Scanning and Crawling

  •    PHP

RED HAWK's CMS Detector currently is able to detect the following CMSs (Content Management Systems) in case the website is using some other CMS, Detector will return could not detect. Want to contribute to RED HAWK or point out something wrong? Just create a new issue here: https://github.com/Tuhinshubhra/RED_HAWK/issues/new I'd love to hear from you.


Norconex HTTP Collector - A Web Crawler in Java

  •    Java

Norconex HTTP Collector is a web spider, or crawler that aims to make Enterprise Search integrators and developers's life easier. It is Portable, Extensible, reusable, Robots.txt support, Obtain and manipulate document metadata, Resumable upon failure and lot more.

node-crawler - Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  •    Javascript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Norconex HTTP Collector - Enterprise Web Crawler

  •    Java

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repositoriy of your choice (e.g. a search engine). It very flexible, powerful, easy to extend, and portable.

go_spider - [爬虫框架 (golang)] An awesome Go concurrent Crawler(spider) framework

  •    Go

A crawler of vertical communities achieved by GOLANG. Latest stable Release: Version 1.2 (Sep 23, 2014).

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

awesome-osint - :scream: A curated list of amazingly awesome OSINT

  •    

Please read CONTRIBUTING if you wish to add tools or resources. This list was taken directly from i-inteligence's OSINT Tools and Resources Handbook. I-intelligence is dedicated to helping you improve your ability to collect, analyze, manage, share and communicate information, whether in support of government policy or in pursuit of competitive advantage.

crawler - A high performance web crawler in Elixir.

  •    Elixir

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ. Below is a very high level architecture diagram demonstrating how Crawler works.

creepy - A geolocation OSINT tool

  •    Python

Geolocation OSINT tool. Creepy is a geolocation OSINT tool. Gathers geolocation related information from online sources, and allows for presentation on map, search filtering based on exact location and/or date, export in csv format or kml for further analysis in Google Maps.

dcrawl - Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names

  •    Go

dcrawl is a simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names. dcrawl takes one site URL as input and detects all <a href=...> links in the site's body. Each found link is put into the queue. Successively, each queued link is crawled in the same way, branching out to more URLs found in links on each site's body.

tinfoleak - The most complete open-source tool for Twitter intelligence analysis

  •    Python

tinfoleak is an open-source tool within the OSINT (Open Source Intelligence) and SOCMINT (Social Media Intelligence) disciplines, that automates the extraction of information on Twitter and facilitates subsequent analysis for the generation of intelligence. Taking a user identifier, geographic coordinates or keywords, tinfoleak analyzes the Twitter timeline to extract great volumes of data and show useful and structured information to the intelligence analyst. tinfoleak is included in several Linux Distros: Kali, CAINE, BlackArch and Buscador. It is currently the most comprehensive open-source tool for intelligence analysis on Twitter.

Crawler-Detect - 🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

  •    PHP

CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent and http_from header. Currently able to detect 1,000's of bots/spiders/crawlers. Run composer require jaybizzle/crawler-detect 1.* or add "jaybizzle/crawler-detect" :"1.*" to your composer.json.

Monkey-Spider

  •    Python

The Monkey-Spider is a crawler based low-interaction Honeyclient Project. It is not only restricted to this use but it is developed as such. The Monkey-Spider crawles Web sites to expose their threats to Web clients.

dhtspider - Bittorrent dht network spider

  •    Javascript

Bittorrent dht network infohash spider, for engiy.com[a bittorrent resource search engine]