html-metadata - Metadata HTML scraper and parser for Node.js (supports Promises and callback style)

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. It currently supports Schema.org microdata (via a third-party library); native implementations of BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag or meta description tags). You can also pass an options object containing extra parameters as the first argument, because some websites require the user agent or cookies to be set before they return a response.
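To make "extracting embedded metadata" concrete, here is a minimal, dependency-free JavaScript sketch that groups `<meta>` tags by standard using their name prefixes (`og:` for Open Graph, `twitter:` for Twitter, `dc.` for Dublin Core, `citation_` for Highwire Press) and collects JSON-LD script blocks. This is not html-metadata's API or implementation (the library parses the DOM with cheerio, listed in its dependencies); the names `extractMetadata`, `parseAttributes` and `PREFIXES` are hypothetical, and the regexes assume simple, well-formed markup with quoted attribute values.

```javascript
// Hypothetical sketch, NOT html-metadata's implementation: groups <meta> tags
// by standard via name prefixes. Assumes simple markup with quoted attributes.

// Illustrative subset of prefixes used to assign a tag to a standard.
const PREFIXES = {
  openGraph: 'og:',
  twitter: 'twitter:',
  dublinCore: 'dc.',
  highwirePress: 'citation_',
};

// Collect every quoted attribute of one tag string into a plain object.
function parseAttributes(tag) {
  const attrs = {};
  const re = /([a-zA-Z_:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

function extractMetadata(html) {
  const result = { general: {}, jsonLd: [] };
  for (const key of Object.keys(PREFIXES)) result[key] = {};

  // General metadata: the <title> element's text.
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (title) result.general.title = title[1].trim();

  // Every <meta ...> tag; Open Graph uses property=, most others use name=.
  for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
    const attrs = parseAttributes(tag);
    const name = (attrs.property || attrs.name || '').toLowerCase();
    if (!name || attrs.content === undefined) continue;
    const standard = Object.keys(PREFIXES).find((k) => name.startsWith(PREFIXES[k]));
    if (standard) {
      result[standard][name] = attrs.content;
    } else if (name === 'description') {
      result.general.description = attrs.content;
    }
  }

  // JSON-LD lives in <script type="application/ld+json"> blocks.
  const ldRe = /<script[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m;
  while ((m = ldRe.exec(html)) !== null) {
    try {
      result.jsonLd.push(JSON.parse(m[1]));
    } catch (e) {
      // Skip malformed JSON-LD instead of failing the whole page.
    }
  }
  return result;
}
```

Calling `extractMetadata(html)` on a page's source returns an object keyed by standard (`openGraph`, `twitter`, `dublinCore`, `highwirePress`, `general`, `jsonLd`). A real DOM parser such as cheerio is the safer choice in practice, since regexes miss unquoted attributes, comments and other irregular markup.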

https://github.com/wikimedia/html-metadata

Dependencies:

bluebird : 3.5.1
cheerio : 1.0.0-rc.2
microdata-node : 0.2.1
preq : 0.5.4

Related Projects

extruct - Extract embedded metadata from HTML markup

  •    Python

extruct is a library for extracting embedded metadata from HTML markup. It also has a built-in HTTP server to test its output as JSON.

jekyll-seo-tag - A Jekyll plugin to add metadata tags for search engines and social networks to better index and display your site's content

  •    Ruby

A Jekyll plugin to add metadata tags for search engines and social networks to better index and display your site's content. While you could theoretically add the necessary metadata tags yourself, Jekyll SEO Tag provides a battle-tested template of crowdsourced best practices.

schema-generator - PHP Model Scaffolding from Schema.org vocabulary

  •    PHP

Schema.org is a vocabulary representing common data structures and their relations. Schema.org can be exposed as JSON-LD, microdata and RDFa. Extracting semantic data exposed in the Schema.org vocabulary is supported by a growing number of companies including Google (Search, Gmail), Yahoo!, Bing and Yandex. Data models provided by Schema.org are popular and have proved efficient. They cover a broad spectrum of topics including creative work, e-commerce, event, medicine, social networking, people, postal address, organization, place or review. Schema.org has its roots in many preexisting, well-designed vocabularies and is successfully used by more and more websites and applications.

Standard Content Archive Management

  •    Java

SCAM is a development environment for building metadata stores for RDF and the Semantic Web. SCAM is built upon international technology and metadata standards, such as RDF, Dublin Core, IEEE/LOM and IMS.

Rework.SocialMetadata

Provides an Orchard CMS UI for exposing social metadata (e.g. Facebook Open Graph, Twitter Cards). Values default (via tokens) to the content type and can be overridden per item.

OpenGraph-Net - .NET Open Graph parser written in C#

  •    CSharp

A simple .NET assembly for parsing Open Graph information from either a URL or an HTML snippet.

core - The server component of API Platform: hypermedia and GraphQL APIs in minutes

  •    PHP

API Platform Core is an easy-to-use and powerful system to create hypermedia-driven REST APIs. It is a component of the API Platform framework and it can be integrated with the Symfony framework using the bundle distributed with the library. It natively supports popular open formats including JSON for Linked Data (JSON-LD), Hydra Core Vocabulary, Swagger (OpenAPI), HAL and HTTP Problem.

WGFA - Web Gateway for Fact Assessment

  •    Java

WGFA is a web application to create and manage W3C OWL-based ontologies, index websites, extract XML-RDF or Dublin Core metadata, and provide search and query operations on the websites based on the created semantic webs.

jsonld.js - A JSON-LD Processor and API implementation in JavaScript

  •    JavaScript

This library is an implementation of the JSON-LD specification in JavaScript. JSON, as specified in RFC7159, is a simple language for representing objects on the Web. Linked Data is a way of describing content across different documents or Web sites. Web resources are described using IRIs, and typically are dereferenceable entities that may be used to find more information, creating a "Web of Knowledge". JSON-LD is intended to be a simple publishing method for expressing not only Linked Data in JSON, but for adding semantics to existing JSON.

api-platform - REST and GraphQL framework to build modern API-driven projects (server-side and client-side)

  •    JavaScript

The official project documentation is available on the API Platform website. API Platform embraces open web standards (Swagger, JSON-LD, GraphQL, Hydra, HAL, JWT, OAuth, HTTP...) and the Linked Data movement. Your API will automatically expose structured data in Schema.org/JSON-LD. This means that an API Platform application works out of the box with Semantic Web technologies.

Prism - Prism is a framework for building loosely coupled, maintainable, and testable XAML applications in WPF, Windows 10 UWP, and Xamarin Forms

  •    CSharp

Prism is a framework for building loosely coupled, maintainable, and testable XAML applications in WPF, Windows 10 UWP, and Xamarin Forms. Separate releases are available for each platform, and they will be developed on independent timelines. Prism provides an implementation of a collection of design patterns that are helpful in writing well-structured and maintainable XAML applications, including MVVM, dependency injection, commands, EventAggregator, and others. Prism's core functionality is a shared code base in a Portable Class Library targeting these platforms; the parts that need to be platform-specific are implemented in the respective libraries for each target platform. Prism also integrates these patterns closely with the target platform. For example, Prism for UWP and Xamarin Forms lets you use a unit-testable navigation abstraction that is layered on top of the platform's own navigation concepts and APIs, so you can fully leverage what the platform offers while still following the MVVM pattern.

Prism 6 is a fully open source version of the Prism guidance originally produced by Microsoft patterns & practices. The core team members were all part of the P&P team that developed Prism 1 through 5, and the effort has now been turned over to the open source community to keep it alive and thriving to support the .NET community. Thousands of companies have adopted previous versions of Prism for WPF, Silverlight, and Windows Runtime, and we hope they will continue to move along with us as we evolve and enhance the framework to keep pace with current platform capabilities and requirements.

LegoWeb: Open source Web CMS based on ASP.NET Webparts + MARCXML metadata

LegoWeb is an open source web content management solution based on a combination of ASP.NET 2.0 Webparts and MARCXML metadata. It is very simple and very flexible.

Apache Beehive - Simple object model on J2EE and Struts

  •    Java

Beehive makes J2EE programming easier by building a simple object model on J2EE and Struts.

STINGER - In-memory graph store and dynamic graph analysis platform

  •    C

STINGER is a package designed to support streaming graph analytics, using in-memory parallel computation to accelerate the analysis. STINGER is composed of the core data structure and the STINGER server, algorithms, and an RPC server that can be used to run queries and serve visualizations.

Potnia

  •    PHP

Potnia is subject gateway software developed for scientific directories, including journals, papers, bibliographic databases, research webs and so on. Its database structure is compliant with the Dublin Core Metadata Set.

I-Man

I-Man will be an XML metadata "Information Repository", based on RDF and built in Java. It provides GUIs, schemas and other tools, and is compliant with UK e-GIF, e-GMS and Dublin Core.

Metaphile

  •    Java

Metaphile is a Java library for reading image metadata. It supports JFIF, JFXX, IPTC IIM (V3 and V4), EXIF (2.1 and 2.2) and XMP (Dublin Core, Photoshop, Iptc4XMPCore, Rights Management).

tumblr-boilerplate - A true bare-bones Tumblr theme for a quick jump-start

  •    HTML

A fully functional bare-bones Tumblr theme that works out of the box. Style it to your needs. The goal of the project was to remove unnecessary code, easing the development process. Tumblr will auto-inject code (such as Open Graph Protocol, Twitter Cards and JavaScript) into the final result for your page. This is out of the theme developer's control, so running the page through an HTML validator or Page Speed may produce warnings and errors.

Embed - Get info from any web service or page

  •    PHP

PHP library to get information from any web page (using oEmbed, Open Graph, Twitter Cards, scraping the HTML, etc.). It's compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc.) and has adapters for some sites (archive.org, GitHub, Facebook, etc.). This package is installable and autoloadable via Composer as embed/embed.