Devbridge Styleguide helps you create, share, and automate a living visual style library of your brand. Share your digital brand standards, improve team collaboration, and implement an independent, easily extendable modular structure. Note: do not download files directly from the git repository unless you know what you are doing.
styleguide webstyleguide gulp sass snippet snippets colors typography scrape crawl generate styleguide-generator frontend front-end automated automatization tool

Accessibility is not only a concern for disabled people; bots such as DuckDuckGo, Google, or Bing can be considered users too, and by respecting these standards you're likely to earn a better ranking. It also helps keep your code clean. Accessibility issues are often left unaddressed for budget reasons, yet most of the cost is spent looking for errors on your website. The A11y Machine greatly helps with this task, so you can focus on fixing your code and reap the benefits. If you would like to validate your pages against the HTML5 recommendation, you need to install Java.
test accessibility wcag crawl

Huntsman takes one or more 'seed' URLs with the spider.queue.add() method. Once the process is kicked off with spider.start(), it will take care of extracting links from each page and following only the pages we want.
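A minimal usage sketch, assuming a node.js setup: only spider.queue.add() and spider.start() are named in the description above, while the huntsman.spider() constructor and the spider.on() route handler are assumptions about the rest of the API.

    // Minimal sketch; only queue.add() and start() come from the description above.
    const huntsman = require('huntsman');
    const spider = huntsman.spider();

    // Assumed: register a handler so only pages whose URI matches this
    // pattern are processed, i.e. "following only the pages we want".
    spider.on('/articles/', function (err, res) {
      if (err) return console.error(err);
      console.log('fetched', res.uri);
    });

    spider.queue.add('http://example.com/'); // seed URL
    spider.start();                          // kick off the crawl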
spider crawler crawl huntsman robot aync

In my case, the spider processes about 1500k documents per day; the maximum crawling speed is 1.2k per minute (1k per minute on average), the memory cost is about 200 MB per spider kernel, and the accuracy is about 90%. The remaining 10% can be fixed by customizing Score Rules or Selectors. It performs better than any other readability module.
read scrape grab article spider crawl readable readability

grab-site is an easy preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses wpull for crawling. It includes a dashboard with all of your crawls, showing which URLs are being grabbed, how many URLs are left in the queue, and more.
archiving crawl spider crawler warc

Crawl GitHub issues to build a dependency graph, where keys indicate issues in the graph and each maps to a list of its dependencies.
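As a rough sketch of that output shape (the repository name and issue numbers below are made up for illustration), the graph might look like this:

    // Hypothetical graph: keys are issues, each mapping to a list of the
    // issues it depends on.
    const graph = {
      "owner/repo#42": ["owner/repo#17", "owner/repo#23"],
      "owner/repo#17": [],
      "owner/repo#23": ["owner/repo#5"]
    };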
crawl github dependency graph dependencies tree issue issues

Like wget -r <url>, but specifically designed to support "pretty" URLs. With wget, a URL pointing to /foo would result in /foo.html, which means the URL has now changed. With snap, it will create the directory /foo and save the file to /foo/index.html so that the URL /foo still works.
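The rule it applies can be illustrated with a small sketch (this is not snap's own code, just the path mapping the description implies):

    // A "pretty" URL path becomes a directory containing index.html.
    function savedPath(urlPath) {
      const trimmed = urlPath.replace(/^\/+|\/+$/g, '');
      return trimmed ? trimmed + '/index.html' : 'index.html';
    }

    console.log(savedPath('/foo')); // foo/index.html
    console.log(savedPath('/'));    // index.html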
spider crawl mirror

A 2nd-generation spider to crawl any article site, automatically reading the title and content. In my case, the spider processes about 700 thousand documents per day (22 million per month); the maximum crawling speed is 450 per minute (80 per minute on average), the memory cost is about 200 megabytes per spider kernel, and the accuracy is about 90%. The remaining 10% can be fixed by customizing Score Rules or Selectors. It performs better than any other readability module.
crawl crawling spider spidering readability scrape

A robots.txt parser written in CoffeeScript.
robots.txt parser web crawl seo

A node.js module to crawl product IDs from Amazon. The Amazon Product Advertising API can only access pages 1 to 10; using this module you can retrieve pages 1 to 400 of an Amazon product list page.
amazon product crawl

A node.js module to crawl product reviews from Amazon. The Amazon Product Advertising API provides almost all attributes of a product, but review data cannot be gathered through the API. Use this module if you want to get product reviews.
amazon product review crawl

A lightweight library for crawlers in Go. HTML parsing and extraction are done thanks to goquery.
crawl crawler

NOTE: This project is no longer being maintained by me. If you are interested in taking over maintenance of this project, let me know. Crawl, as its name implies, will crawl around a website, discovering all of the links and their relationships starting from a base URL. The output of crawl is a JSON object representing a sitemap of every resource within a site, including each link's outbound references and any inbound referrers.
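A sketch of that sitemap shape, assuming illustrative URLs and field names (the actual keys in crawl's output may differ):

    // Each resource lists its outbound references and inbound referrers.
    const sitemap = {
      "http://example.com/": {
        outbound: ["http://example.com/about"],
        inbound: []
      },
      "http://example.com/about": {
        outbound: [],
        inbound: ["http://example.com/"]
      }
    };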
crawl differencer diff web website

This package is a slightly overengineered Diffbot API wrapper. It uses PSR-7 and PHP-HTTP friendly client implementations to make API calls. To learn more about Diffbot see here and their homepage. Right now it only supports Analyze, Product, Image, Discussion, Crawl, Search, and Article APIs, but can also accommodate Custom APIs. Video and Bulk API support coming soon. Full documentation available here.
diffbot crawling crawl scrape scraping scraper scraped-data machine-learning nlp ai artificial-intelligence bot