
huginn - Create agents that monitor and act on your behalf. Your agents are standing by!

  •    Ruby

Huginn is a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn's Agents create and consume events, propagating them along a directed graph. Think of it as a hackable version of IFTTT or Zapier on your own server. You always know who has your data. You do. Join us in our Gitter room to discuss the project.
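Huginn's core idea of agents that emit events and propagate them along a directed graph can be sketched in a few lines. This is an illustrative Python sketch of the concept, not Huginn's actual Ruby internals; the agent names and handler signature are hypothetical.

```python
# Agents consume an event, produce zero or more new events, and pass them
# downstream along directed edges -- the event-propagation model Huginn uses.

class Agent:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler      # turns an incoming event into 0+ new events
        self.receivers = []         # downstream agents (directed edges)

    def emit(self, event):
        for receiver in self.receivers:
            for produced in receiver.handler(event):
                receiver.emit(produced)

# Example graph: a watcher feeds a filter that only passes "error" events
# on to a logger agent.
log = []
watcher = Agent("watcher", lambda e: [e])
filterer = Agent("filter", lambda e: [e] if "error" in e["text"] else [])
logger = Agent("logger", lambda e: (log.append(e), [])[1])

watcher.receivers.append(filterer)
filterer.receivers.append(logger)

watcher.emit({"text": "disk error on host-1"})  # reaches the logger
watcher.emit({"text": "all good"})              # dropped by the filter
```

In Huginn itself, agents run on a schedule, persist their events, and are wired together through the web UI rather than in code.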

autoscraper - A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  •    Python

This project makes automatic web scraping easy. It takes a URL or the HTML content of a web page, along with a list of sample data we want to scrape from that page; the data can be text, a URL, or any HTML tag value. It learns the scraping rules and returns the similar elements. You can then use this learned object with new URLs to get similar content or the exact same elements from those new pages. It's compatible with Python 3.
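The "learn a rule from one sample, then reuse it" workflow can be sketched with the standard library alone. This is a minimal illustration of the idea, not autoscraper's real implementation: it learns which (tag, class) pair held the sample value, then applies that rule to find the similar elements.

```python
# Learn a scraping rule from one sample value, then reuse it to extract
# "similar elements" -- a toy version of autoscraper's approach.
from html.parser import HTMLParser

class RuleLearner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, class) of currently open elements
        self.texts = []        # ((tag, class), text) pairs seen in the page

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class", "")))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            self.texts.append((self.stack[-1], data.strip()))

def learn_rule(html, sample):
    """Return the (tag, class) pair that contained the sample text."""
    p = RuleLearner(); p.feed(html)
    for rule, text in p.texts:
        if text == sample:
            return rule
    return None

def apply_rule(html, rule):
    """Return every text node matching a learned (tag, class) rule."""
    p = RuleLearner(); p.feed(html)
    return [text for r, text in p.texts if r == rule]

train = '<ul><li class="price">$10</li><li class="price">$20</li></ul>'
rule = learn_rule(train, "$10")      # learn from a single sample
print(apply_rule(train, rule))       # ['$10', '$20']
```

The real library builds more robust rules (and handles attributes and URLs, not just text), but the flow of learn-once, apply-to-new-pages is the same.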

Soup - Web Scraper in Go, similar to BeautifulSoup

  •    Go

soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.

morph - Take the hassle out of web scraping

  •    Ruby

Development is supported on Linux and Mac OS X. Just follow the instructions on the Docker site.

r-web-scraping-cheat-sheet - Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium

  •    R

Inspired by Hartley Brody, this cheat sheet is about web scraping using rvest, httr and RSelenium, and it covers many of the topics in his blog. While Hartley uses Python's requests and BeautifulSoup libraries, this cheat sheet covers the usage of httr and rvest. rvest is good enough for many scraping tasks, but httr is required for more advanced techniques. Usage of RSelenium (a web driver) is also covered.

webchem - Chemical Information from the Web

  •    R

webchem is an R package to retrieve chemical information from the web. The package interacts with a suite of web APIs. Functions that hit a specific API have a prefix and suffix separated by an underscore (prefix_suffix()), following the format source_functionality; e.g. cs_compinfo uses ChemSpider to retrieve compound information.

falkor - Open Source web scraping API. Falkor turns web pages into queryable JSON

  •    Clojure


pyparsing-webscraping-appcontrol-datawrangling - Slides and code for my talk: Using PyParsing For Web Scraping, Application Control and Data Wrangling

  •    Python

When scraping websites one must always observe the terms of service of that website. The spider that I provide in this repo is for educational purposes only.

webhog - Downloads and stores a given URL (including js, css, and images) for offline use.

  •    Javascript

webhog is a package that downloads and stores a given URL (including JS, CSS, and images) for offline use and uploads it to a given AWS S3 account (more persistence options to come). Usage: make a POST request to http://localhost:3000/scrape with the header X-API-KEY: SCRAPEAPI, passing a JSON body with the URL you'd like to fetch, e.g. { "url": "http://facebook.com" }. You'll notice an Ent dir: /blah/blah/blah printed to the console; your assets are saved there. To test, open the given index.html file.

robotstxt - robots.txt file parsing and checking for R

  •    R

Provides functions to download and parse ‘robots.txt’ files. Ultimately, the package makes it easy to check whether bots (spiders, crawlers, scrapers, …) are allowed to access specific resources on a domain.
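The same kind of check is available in Python's standard library, which makes the idea easy to demonstrate offline. This is an analogue of what the R package does, not the robotstxt package itself:

```python
# Parse a robots.txt body and ask whether a given agent may fetch a path --
# the core check the robotstxt R package provides.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://example.com/public/page"))   # True
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
```

In practice you would download the file from https://example.com/robots.txt first; the parsing and permission check are the same.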

chesf - CHeSF is the Chrome Headless Scraping Framework, a very very alpha code to scrape javascript intensive web pages

  •    Python

In the era of Big Data, the web is an endless source of information. For this reason, there are plenty of good tools and frameworks for scraping web pages, so in an ideal world there should be no need for a new web scraping framework. Nevertheless, there are always subtle differences between theory and practice, and web scraping is no exception.

decryptr - An extensible API for breaking captchas

  •    R

decryptr is an R package to break captchas. It is also an extensible tool, built in a way that enables anyone to contribute their own captcha-breaking code. Simple, right? The decrypt() function is this package's workhorse: it takes a captcha (either the path to a captcha file or a captcha object read with read_captcha()) and breaks it with a model (either the name of a known model, the path to a model file, or a model object created with train_model()).

PacPaw - Pawn package manager for SA-MP

  •    Python

PacPaw is a Pawn package manager for SA-MP, written in Python and still under development. It relies mainly on web scraping with BeautifulSoup. In addition, it helps scripters gather snippets based on Pawn and the function references documented for SA-MP.

NBA_Predictions - Reworked NBA Predictions (in Python)

  •    Python

(Note) Please note that this was written around January 2015. These scripts rely heavily on web scraping to access the required data, and websites regularly change their layouts and locations. Because of this, the scraping may fail, which prevents updated predictions from being made. I will eventually get around to making sure this runs this fall. I've decided to rework my NBA prediction code from R to Python, mostly to see if I could do it, and also to see if I could speed it up a bit. I'll update here with current speed/accuracy results as the 2014-15 season plays out. The structure and format are pretty much the same, except that the code is cleaner. I still need to comment it a bit more, but it's Git ready for now.

SforSwagBot - A telegram chat bot for : Getting lyrics, Getting nearby restaurants and their menu and random quotes

  •    Python

A Telegram chat bot for getting lyrics, finding nearby restaurants and their menus, and fetching random quotes.

Rcrawler - An R web crawler and scraper

  •    R

Rcrawler is an R package for crawling websites and extracting structured data, which can be used for a wide range of applications such as web mining, text mining, web content mining, and web structure mining. So what is the difference between Rcrawler and rvest? rvest extracts data from one specific page by navigating through selectors, whereas Rcrawler automatically traverses and parses all the web pages of a website and extracts all the data you need from them at once with a single command: for example, collecting all published posts on a blog, extracting all products on a shopping website, or gathering comments and reviews for your opinion-mining studies. More than that, Rcrawler can help you study website structure by building a network representation of a site's internal and external hyperlinks (nodes and edges). Help us improve Rcrawler by asking questions, reporting issues, and suggesting new features. If you have a blog, write about it, or just share it with your colleagues.
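The "traverse all pages of a website" behaviour is, at heart, a breadth-first search over internal links. Here is a minimal Python sketch of that idea (not Rcrawler's R implementation); a fake in-memory site stands in for the HTTP fetches so the example is self-contained.

```python
# Breadth-first crawl over a site's internal link graph. In a real crawler,
# SITE.get(page) would be "fetch the page and extract its links".
from collections import deque

SITE = {  # page -> links found on that page (hypothetical site)
    "/":        ["/posts", "/about"],
    "/posts":   ["/posts/1", "/posts/2"],
    "/posts/1": ["/"],
    "/posts/2": ["/posts/1"],
    "/about":   [],
}

def crawl(start):
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)                 # data extraction would happen here
        for link in SITE.get(page, []):
            if link not in seen:           # never revisit a page
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/"))
```

Rcrawler layers parallel fetching, politeness settings, and per-page extraction on top of this traversal, but the visited-set-plus-queue core is the same.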

requestsR - R interface to Python requests module

  •    R

R has a number of great packages for interacting with web data, but it still lags behind Python, in large part because of the power and ease of use of the Requests module. This package aims to port those powers to R; I like to think of it as the Bo Jackson of web interaction tools.

instago - Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram

  •    Go

Get Instagram media (photos and videos), stories, story highlights, postlives (live streams shared to stories after they end), following and followers in Go. The following three values are required to access the Instagram API.

feedbridge - Plugin based RSS feed generator for sites that don't offer any

  •    Go

Feedbridge is a tool (hosted version / demo: feedbridge.notmyhostna.me) to provide RSS feeds for sites that don't have one, or only offer a feed of headlines. For each site (or kind of site) you want to generate a feed for, you'll have to implement a plugin with a custom scraping strategy. Feedbridge doesn't persist old items, so if something is no longer on the site you are scraping, it won't be in the feed; this is similar to how most feeds these days only carry the latest items. It publishes Atom, RSS 2.0, and JSON Feed Version 1 conformant feeds. There are a bunch of web apps doing something similar, some of which even let you drag and drop selectors to create a feed. That didn't work well for the site I was trying it on, so I decided to build this. (It was also fun to do.)
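The plugin idea, one small scraping function per site registered with the core, which turns whatever the plugin returns into feed entries, can be sketched briefly. This Python sketch is illustrative only; feedbridge is written in Go and its actual plugin interface differs.

```python
# A registry of per-site scraping plugins; the core builds a feed from
# whatever items a plugin returns. Names here are hypothetical.

PLUGINS = {}

def plugin(name):
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register

@plugin("example-news")
def scrape_example_news():
    # A real plugin would fetch and parse the site with its own strategy;
    # here we just return (title, url) items directly.
    return [("Headline A", "https://example.com/a"),
            ("Headline B", "https://example.com/b")]

def build_feed(name):
    items = PLUGINS[name]()
    entries = "".join(
        f"<item><title>{t}</title><link>{u}</link></item>" for t, u in items
    )
    return f"<rss version='2.0'><channel>{entries}</channel></rss>"

print(build_feed("example-news"))
```

Because the feed is rebuilt from the plugin's current output on every run, items that disappear from the site disappear from the feed, exactly the no-persistence behaviour described above.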

anirip - :clapper: A Crunchyroll show/season ripper

  •    Go

anirip is a Crunchyroll episode/subtitle ripper written in Go. It performs all actions associated with downloading video segments, subtitle files, and metadata, and muxes them together appropriately. 1) Install ffmpeg if it doesn't already exist on your system. We will be using this tool primarily for dumping episode content and editing video metadata.
