Crawler-Detect - 🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

  •        75

CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent and http_from header. It can currently detect thousands of bots/spiders/crawlers. To install, run composer require jaybizzle/crawler-detect 1.* or add "jaybizzle/crawler-detect": "1.*" to your composer.json.
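To illustrate the general idea of detecting a crawler from the user agent and From header, here is a minimal sketch in Python. It is not CrawlerDetect's code or API: the pattern list, the function names (`find_crawler`, `is_crawler`) and the plain-dict header format are all assumptions made for this example, and the real library ships a far larger pattern list.

```python
import re

# Hypothetical, heavily abbreviated pattern list (an assumption for this sketch).
CRAWLER_PATTERNS = [r"googlebot", r"bingbot", r"spider", r"crawl", r"slurp"]

# One case-insensitive alternation built from all patterns.
_CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)


def find_crawler(headers):
    """Return the first matching crawler token, or None if nothing matches.

    `headers` is assumed to be a plain dict keyed by the exact header
    names "User-Agent" and "From"; missing headers count as empty.
    """
    # Check both headers mentioned in the description in a single search.
    haystack = " ".join(headers.get(name, "") for name in ("User-Agent", "From"))
    match = _CRAWLER_RE.search(haystack)
    return match.group(0) if match else None


def is_crawler(headers):
    """Return True if either header matches a known crawler pattern."""
    return find_crawler(headers) is not None


if __name__ == "__main__":
    request_headers = {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    }
    print(is_crawler(request_headers), find_crawler(request_headers))
```

Matching on header strings alone is only a heuristic, since any client can send any User-Agent value.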

http://crawlerdetect.io
https://github.com/JayBizzle/Crawler-Detect

Related Projects

agent - 👮 A PHP desktop/mobile user agent parser with support for Laravel, based on Mobile Detect

  •    PHP

A PHP desktop/mobile user agent parser with support for Laravel, based on Mobile Detect with desktop support and additional functionality. Check for a certain property in the user agent.

Detect.js - 🔍 Library to detect browser, OS and device based on the UserAgent string

  •    Javascript

โ—๏ธ NOTE: THIS PLUGIN IS NO LONGER MAINTAINED. If you encounter a bug then you're probably on your own. Try Bowser as an alternative. Note: Detect.js is a JavaScript library to detect platforms, versions, manufacturers and types based on the navigator.userAgent string. This code is based on, and modified from, the original work of Tobie Langel's UA-Parser: https://github.com/tobie/ua-parser. UA-Parser is subsequently a port of BrowserScope's user agent string parser.

Norconex HTTP Collector - A Web Crawler in Java

  •    Java

Norconex HTTP Collector is a web spider, or crawler, that aims to make life easier for Enterprise Search integrators and developers. It is portable, extensible and reusable, supports robots.txt, can obtain and manipulate document metadata, is resumable upon failure, and a lot more.

node-crawler - Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  •    Javascript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)


Norconex HTTP Collector - Enterprise Web Crawler

  •    Java

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data into a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable.

go_spider - [Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework

  •    Go

A crawler for vertical communities, implemented in Go. Latest stable release: Version 1.2 (Sep 23, 2014).

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

crawler - A high performance web crawler in Elixir.

  •    Elixir

A high performance web crawler in Elixir, with worker pooling and rate limiting via OPQ.

RED_HAWK - All in one tool for Information Gathering, Vulnerability Scanning and Crawling

  •    PHP

RED HAWK's CMS Detector can currently detect a set of CMSs (Content Management Systems). If the website is using some other CMS, the detector will return "could not detect". Want to contribute to RED HAWK or point out something wrong? Just create a new issue here: https://github.com/Tuhinshubhra/RED_HAWK/issues/new I'd love to hear from you.

device_detector - DeviceDetector is a precise and fast user agent parser and device detector written in Ruby

  •    Ruby

DeviceDetector is a precise and fast user agent parser and device detector written in Ruby, backed by the largest and most up-to-date user agent database. DeviceDetector will parse any user agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model. DeviceDetector detects thousands of user agent strings, even from rare and obscure browsers and devices.

Squzer - Distributed Web Crawler

  •    Python

Squzer is Declum's open-source, extensible, scalable, multithreaded, high-quality web crawler project, written entirely in Python.

grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  •    Python

grab-site is an easy, preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses wpull for crawling. It also provides a dashboard with all of your crawls, showing which URLs are being grabbed, how many URLs are left in the queue, and more.

Mobile-Detect

  •    PHP

Mobile_Detect is a lightweight PHP class for detecting mobile devices (including tablets). It uses the User-Agent string combined with specific HTTP headers to detect the mobile environment.

ATSCAN - Advanced Search & Mass Exploit Scanner

  •    Perl

โ— Search engine Google / Bing / Ask / Yandex / Sogou โ— Mass Dork Search โ— Multiple instant scans. โ— Mass Exploitation โ— Use proxy. โ— Random user agent. โ— Random engine. โ— Extern commands execution. โ— XSS / SQLI / LFI / AFD scanner. โ— Filter wordpress and Joomla sites. โ— Find Admin page. โ— Decode / Encode Base64 / MD5 โ— Ports scan. โ— Collect IPs โ— Collect E-mails. โ— Auto detect errors. โ— Auto detect Cms. โ— Post data. โ— Auto sequence repeater. โ— Validation. โ— Post and Get method โ— Interactive and Normal interface. โ— And more...

device-detector - The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc

  •    PHP

The Universal Device Detection library that parses User Agents and detects devices (desktop, tablet, mobile, tv, cars, console, etc.), clients (browsers, feed readers, media players, PIMs, ...), operating systems, brands and models. Instead of using the full power of DeviceDetector it might in some cases be better to use only specific parsers. If you aim to check if a given useragent is a bot and don't require any of the other information, you can directly use the bot parser.

Smart TCL

  •    Perl

A complete TCL script for eggdrop bots 1.6.x. Features: op, botnet, bnc, mass-commands, user-protector, anti-spam, anti-clone, repeat-kicker, extra-bitch, onjoin, topic-locker, limit, split-detect, split-protect, upgrade, shell-commands, auto-add (bots), remote-sends.

browscap - 📃 The main project repository

  •    PHP

This tool is used to build and maintain browscap files. Only one CLI command is currently available.

NWebCrawler

  •    C#

This is a web crawler program written in C#.