
scrapy-redis - Redis-based components for Scrapy.

  •    Python

Redis-based components for Scrapy. You can start multiple spider instances that share a single Redis queue. Best suited to broad multi-domain crawls.
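A minimal sketch of how the shared queue is enabled (setting names follow the scrapy-redis README; the Redis URL is an assumed local default):

```python
# settings.py — point Scrapy's scheduler and duplicate filter at Redis
# so every spider instance pulls requests from the same shared queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the request queue between runs instead of clearing it on close.
SCHEDULER_PERSIST = True

# Assumed local Redis instance.
REDIS_URL = "redis://localhost:6379"
```

Any number of spider processes started with these settings will de-duplicate and schedule requests through the same Redis keys, which is what makes the multi-instance crawl work.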

feeds - DIY Atom feeds in times of social media and paywalls

  •    Python

Once upon a time every website offered an RSS feed to keep readers updated about new articles and blog posts via their feed readers. Those times are long gone: the once-iconic orange RSS icon has been replaced by "social share" buttons. feeds aims to bring back the good old reading times. It creates Atom feeds for websites that don't offer them (anymore), so you can read new articles from your favorite websites in your feed reader (e.g. Tiny Tiny RSS) even if the website doesn't officially support it.
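As a sketch of what such a generated feed boils down to, here is a minimal Atom document built with Python's standard library (element names come from the Atom namespace; the feed title and entries are invented for illustration):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_feed(title, entries):
    """Build a minimal Atom feed; `entries` is a list of (title, url) pairs."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for entry_title, url in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = entry_title
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=url)
    return ET.tostring(feed, encoding="unicode")

# Hypothetical site and article, for illustration only.
xml = build_feed("Example site", [("First post", "https://example.com/1")])
```

A real generator would of course scrape the article titles and links from the target site first; this only shows the output format a feed reader consumes.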

buscaimoveis-scraper - A project that collects listings for properties for sale on major platforms such as OLX, Zap Imóveis, etc.

  •    Python

A project for scraping listings of properties for sale on well-known platforms such as OLX and ZAP Imóveis. Note: for now only property sales in the Distrito Federal are scraped, but support for other states is coming soon.

books_crawler - A Scrapy crawler for http://books.toscrape.com

  •    Python

A Scrapy crawler for http://books.toscrape.com

experiments - Some research experiments

  •    Jupyter

Some research experiments I have done over the years. Most of the notes can be found on City of Wings.

scrapy-bench - A CLI for benchmarking Scrapy.

  •    Python

A command-line interface for benchmarking Scrapy that reflects real-world usage. First, download a static snapshot of the Books to Scrape website; this can be done with wget.
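The snapshot step might look like this (a sketch using standard wget flags; the target is the site named above):

```shell
# Mirror the static site locally, fetching page assets and
# rewriting links so the copy works offline.
wget --mirror --convert-links --page-requisites http://books.toscrape.com
```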

caoliuscrapy - A small crawler that scrapes post content from the Caoliu technical discussion board and displays it locally

  •    Python


scrapy-bhinneka-crawler - Scraping bhinneka.com, just for fun

  •    Python

This crawler scrapes the online shop Bhinneka. It saves each item's name, link, categories, and price to MySQL. As of 19 January 2013, the script had collected 14,567 items.
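The storage side can be sketched with sqlite3 standing in for MySQL (a self-contained substitute; the table and column names are hypothetical, since the original schema isn't shown):

```python
import sqlite3

# Hypothetical schema mirroring the four fields the crawler stores.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (name TEXT, link TEXT, categories TEXT, price REAL)"
)

def save_item(item):
    """Insert one scraped item; `item` is a dict with the four fields above."""
    conn.execute(
        "INSERT INTO items (name, link, categories, price) VALUES (?, ?, ?, ?)",
        (item["name"], item["link"], item["categories"], item["price"]),
    )
    conn.commit()

# Invented example item, for illustration only.
save_item({"name": "Example Gadget", "link": "http://example.com/p/1",
           "categories": "gadgets", "price": 99000.0})
```

In the real project the same insert would run from a Scrapy item pipeline against MySQL rather than an in-memory database.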

scrapy-blog-crawler - Crawl a blog URL, find all URLs on it, then save them to MySQL.

  •    Python

This crawler fetches a blog URL, extracts all the URLs found on that page, and inserts them into MySQL.
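The core extraction step — collecting every URL on a fetched page — can be sketched with the standard library's HTMLParser (a simplified stand-in for the Scrapy spider; the MySQL insert is left out):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Invented snippet of blog HTML, for illustration only.
parser = LinkCollector()
parser.feed('<p><a href="/post/1">One</a> <a href="http://example.com">Two</a></p>')
```

In a Scrapy spider the equivalent job is usually done with `response.css("a::attr(href)")` or a `LinkExtractor`.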

Inventus - Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers

  •    Python

Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers. It's a Scrapy spider, meaning it's easily modified and extended to your needs. Inventus requires Scrapy to be installed before it can be run. First, clone the repo and enter it.

RARBG-scraper - With Selenium headless browsing and CAPTCHA solving

  •    Python

Scrapes RARBG for torrents using Scrapy, with headless browsing via Selenium and CAPTCHA solving via pytesseract and Pillow.