robotparser-scala implements a parser for the robots.txt file format in Scala. Parsing a robots.txt file gives you a RobotsTxt instance. By default, the character encoding is UTF-8.
https://github.com/bizreach/robotparser-scala
Tags | crawler parser sitemap |
Implementation | Scala |
License | Apache |
Platform | OS-Independent |
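To illustrate what parsing robots.txt involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not robotparser-scala's API, which is not shown on this page; the robots.txt content, the `mybot` user agent, and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been decoded (e.g. from UTF-8)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


robots = parse_robots(ROBOTS_TXT)

# Rules are checked in file order, so the Disallow line wins for /private/.
blocked = robots.can_fetch("mybot", "https://example.com/private/page.html")
allowed = robots.can_fetch("mybot", "https://example.com/index.html")

# Sitemap entries are collected separately from the allow/disallow rules.
sitemaps = robots.site_maps()  # available in Python 3.8+

print(blocked, allowed, sitemaps)
```

The same three questions (is this URL allowed, for which user agent, and which sitemaps are declared) are what a crawler typically asks of any robots.txt parser.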
The Crawler is a microservice that can be deployed, for example, using Docker. When the Crawler component is started, it searches for an MCP and connects to it. By default the local host is searched for an MCP, but you can configure one yourself. Every loader and parser microservice must read this crawl profile information. Because that information is required many times, we avoid a request to the crawler index by adding the crawler profile to each contract of a crawl job in the crawler_pending and loader_pending queues.
This library is now community-maintained. If you are interested in helping, please contact @gourlaysama or mention it on Gitter. As of Scala 2.11, this library is a separate jar that can be omitted from Scala projects that do not use Parser Combinators.
parser-combinators parsing
For the 90's people, I'm keeping this repository compatible with PHP 5.2. If you need a PSR-0 and Composer compatible version, a fork maintained by Evert Pot is available. Include the Sitemap.php file in your PHP document and call the Sitemap class with your base domain.
generating-sitemaps google-sitemap sitemap-php sitemap sitemap-files
MVC Sitemap makes it a snap for your ASP.NET MVC based web site to expose a sitemap xml file to search engine crawlers. Simply place a [Sitemap] attribute on all Actions you want crawled and create an action for the sitemap - it's that easy.
sitemap
sitemap.js is a high-level sitemap-generating framework that makes creating sitemap XML files easy. Description specifications. Required fields are thumbnail_loc, title, and description.
sitemap sitemap-xml nodejs sitemap-generator sitemap.xml
Sitemap and sitemap index builder. After that, make sure your application autoloads Composer classes by including vendor/autoload.php.
sitemap
parboiled2 is a Scala 2.11+ library enabling lightweight and easy-to-use, yet powerful, fast and elegant parsing of arbitrary input text. It implements a macro-based parser generator for Parsing Expression Grammars (PEGs), which runs at compile time and translates a grammar rule definition (written in an internal Scala DSL) into corresponding JVM bytecode. PEGs are an alternative to Context-Free Grammars (CFGs) for formally specifying syntax; they make a good replacement for regular expressions and have some advantages over the "traditional" way of building parsers via CFGs (like not needing a separate lexer/scanner phase).
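To make the PEG idea concrete, here is a small hand-written sketch in Python, not parboiled2's Scala DSL and without any compile-time code generation. It assumes a toy grammar invented for this example (`Expr <- Term (('+' / '-') Term)*`, `Term <- Factor ('*' Factor)*`, `Factor <- Number / '(' Expr ')'`, `Number <- [0-9]+`) with no whitespace handling. Each rule is a function that works directly on characters, so there is no separate lexer phase.

```python
from typing import Optional, Tuple

# A rule returns (value, next_position) on success or None on failure.
Result = Optional[Tuple[int, int]]

DIGITS = "0123456789"


def number(s: str, i: int) -> Result:
    # Number <- [0-9]+
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return (int(s[i:j]), j) if j > i else None


def factor(s: str, i: int) -> Result:
    # Factor <- Number / '(' Expr ')'
    # Ordered choice: the second alternative is tried only if the first fails.
    r = number(s, i)
    if r is not None:
        return r
    if i < len(s) and s[i] == "(":
        r = expr(s, i + 1)
        if r is not None and r[1] < len(s) and s[r[1]] == ")":
            return r[0], r[1] + 1
    return None


def term(s: str, i: int) -> Result:
    # Term <- Factor ('*' Factor)*
    r = factor(s, i)
    if r is None:
        return None
    value, i = r
    while i < len(s) and s[i] == "*":
        r = factor(s, i + 1)
        if r is None:
            break  # repetition stops; the dangling '*' is not consumed
        value, i = value * r[0], r[1]
    return value, i


def expr(s: str, i: int) -> Result:
    # Expr <- Term (('+' / '-') Term)*
    r = term(s, i)
    if r is None:
        return None
    value, i = r
    while i < len(s) and s[i] in "+-":
        op = s[i]
        r = term(s, i + 1)
        if r is None:
            break
        value = value + r[0] if op == "+" else value - r[0]
        i = r[1]
    return value, i


def parse(s: str) -> Optional[int]:
    # The whole input must be consumed (end-of-input check).
    r = expr(s, 0)
    return r[0] if r is not None and r[1] == len(s) else None


print(parse("2+3*(4-1)"))
```

The ordered choice in `factor` is the main difference from a CFG: alternatives are tried in a fixed order and the first match wins, which is why a PEG is never ambiguous.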
Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
scraper framework crawler scraping crawling spider parser
An automatic ASP.NET sitemap generator, including dynamic URLs, completely configurable. JUST SET AND FORGET!
sitemap sitemap-generator
SiteMap Editor for Microsoft Dynamics CRM 2011 helps developers and customizers configure the Site Map in a graphical way. You'll no longer have to create a solution, add a component, export, update the XML and reimport the solution to update the SiteMap.
crm crm-2011 sitemap xrm
An ASP.NET MVC breadcrumbs & SiteMapPath with controller, action and routeValues implementation
action-sitemap breadcrumb-mvc controller-sitemap mvc-sitemap route-breadcrumb route-with-sitemap
[Crawler for Golang] Pholcus is a distributed, high concurrency and powerful web crawler software.
crawler spider multi-interface distributed-crawler high-concurrency-crawler fastest-crawler cross-platform-crawler web-crawler
Argonaut is a JSON library for Scala, providing a rich library for parsing, printing and manipulation as well as convenient codecs for translation to and from scala data types. Argonaut is licenced under BSD3 (see LICENCE). See more at http://argonaut.io.
Sitemaps.NET is a website plugin that automatically generates an XML sitemap of your content. Sitemaps.NET reuses ASP.NET's sitemap functionality and automatically mirrors changes in your site to search engines. Features include: - Quickly generate XML sitemaps for search eng...
sitemap
Need the easiest way to add a sitemap to your DotNetNuke module? After a very simple install process, drop this on a page and voila - you have a sitemap. Confi
WatchersNET.SiteMap - A Modern SiteMap / TreeView Module and Skin Object for DotNetNuke®
dotnetnuke dotnetnuke-extension dotnetnuke-module dotnetnuke-skins sitemap sitemap-generator
Simple nested UL/LI emitting composite web control which you can bind to a SiteMap provider. I have provided some basic CSS and jQuery scripts to style it into a tree view. Code has been derived from this sample http://bryantlikes.com/archive/2006/02/17/4839.aspx
html sitemap
💡 If you are using a Jekyll version less than 3.5.0, use the gems key instead of plugins. Because the sitemap is added to site.pages, you may have to modify any templates that iterate through all pages (for example, to build a menu of all of the site's content).
sitemap jekyll-plugin
This package can generate a sitemap without you having to add URLs to it manually. This works by crawling your entire site. The generator can execute JavaScript on each page, so links injected into the DOM by JavaScript will be crawled as well.
laravel sitemap google seo xml
Scrooge is a thrift code generator written in Scala, which currently generates code for Scala, Java, Cocoa, Android and Lua. It's meant to be a replacement for the apache thrift code generator, and generates conforming, compatible binary codecs by building on top of libthrift. It integrates with the finagle project, exporting stats and finagle APIs, and makes it easy to build high throughput, low latency, robust thrift servers and clients.
finagle thrift code-generation android cocoa