extruct - Extract embedded metadata from HTML markup

extruct is a library for extracting embedded metadata from HTML markup. It also has a built-in HTTP server to test its output as JSON.
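To make "embedded metadata" concrete, here is a minimal sketch of pulling JSON-LD blocks out of an HTML page using only the Python standard library. It is not extruct's implementation or API: the names `JsonLdCollector`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical, and the sketch covers only JSON-LD, not the other embedded syntaxes a full extractor would handle.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script element opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # On the closing tag, parse whatever was buffered as JSON.
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD block found in html."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person", "name": "Ada Lovelace"}
</script>
<script type="text/javascript">var ignored = true;</script>
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    print(json.dumps(extract_jsonld(SAMPLE_HTML), indent=2))
```

Malformed blocks are skipped silently here so that one bad script element does not hide the valid ones; a production tool might prefer to report them.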

https://github.com/scrapinghub/extruct

Related Projects

schema-generator - PHP Model Scaffolding from Schema.org vocabulary

  •    PHP

Schema.org is a vocabulary representing common data structures and their relations. Schema.org can be exposed as JSON-LD, microdata and RDFa. Extracting semantic data exposed in the Schema.org vocabulary is supported by a growing number of companies including Google (Search, Gmail), Yahoo!, Bing and Yandex. Data models provided by Schema.org are popular and have proved efficient. They cover a broad spectrum of topics including creative work, e-commerce, event, medicine, social networking, people, postal address, organization, place and review. Schema.org has its roots in many preexisting, well-designed vocabularies and is successfully used by more and more websites and applications.

jsonld.js - A JSON-LD Processor and API implementation in JavaScript

  •    Javascript

This library is an implementation of the JSON-LD specification in JavaScript. JSON, as specified in RFC 7159, is a simple language for representing objects on the Web. Linked Data is a way of describing content across different documents or Web sites. Web resources are described using IRIs, and are typically dereferenceable entities that may be used to find more information, creating a "Web of Knowledge". JSON-LD is intended to be a simple publishing method, not only for expressing Linked Data in JSON but also for adding semantics to existing JSON.
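As a rough illustration of "adding semantics to existing JSON", the sketch below shows how an inline `@context` can map short keys to IRIs. It is a deliberately simplified stand-in, not the JSON-LD expansion algorithm and not the jsonld.js API: the function `expand_keys` and the sample documents are hypothetical, and only a flat context that maps terms directly to IRI strings is handled.

```python
def expand_keys(doc):
    """Replace short keys with the IRIs listed in the document's inline @context.

    Simplified assumption: the context is a flat dict of term -> IRI string.
    Keys without a mapping are kept unchanged.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context itself is not part of the data
        expanded[context.get(key, key)] = value
    return expanded


# Ordinary JSON: the keys have no globally defined meaning.
plain = {"name": "Ada Lovelace", "homepage": "https://example.org/ada"}

# The same data with a context that ties each key to an IRI.
linked = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    **plain,
}

if __name__ == "__main__":
    print(expand_keys(linked))
```

The existing keys and values are untouched; only the added `@context` entry changes how a consumer can interpret them.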

api-platform - REST and GraphQL framework to build modern API-driven projects (server-side and client-side)

  •    Javascript

The official project documentation is available on the API Platform website. API Platform embraces open web standards (Swagger, JSON-LD, GraphQL, Hydra, HAL, JWT, OAuth, HTTP...) and the Linked Data movement. Your API will automatically expose structured data in Schema.org/JSON-LD. It means that your API Platform application is usable out of the box with technologies of the semantic web.

dokieli - dokieli is a clientside editor for decentralised article publishing, annotations and social interactions

  •    Javascript

dokieli is a decentralised article authoring, annotation, and social notification tool which works from Web browsers. It is built with the following principles in mind: freedom of expression, decentralisation, interoperability. See the growing list of examples in the wild. Add the URLs of your articles or interactions to the list.

rdflib - RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information

  •    Python

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information as graphs. The current version of RDFLib is 4.2.2; see the CHANGELOG.md file for what's new.

Web-Karma - Information Integration Tool

  •    Java

A Karma tutorial is available at https://github.com/szeke/karma-tcdl-tutorial. Also check out our DIG web site, where we use Karma extensively to process more than 90M web pages. Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information by modeling it according to an ontology of their choice using a graphical user interface that automates much of the process. Karma learns to recognize the mapping of data to ontology classes and then uses the ontology to propose a model that ties together these classes. Users then interact with the system to adjust the automatically generated model. During this process, users can transform the data as needed to normalize data expressed in different formats and to restructure it. Once the model is complete, users can publish the integrated data as RDF or store it in a database.

create - Midgard Create, a generic web editing interface for any CMS

  •    Javascript

Create, from the Midgard Project, is a comprehensive web editing interface for Content Management Systems. It is designed to provide a modern, fully browser-based HTML5 environment for managing content. Create can be adapted to work on almost any content management backend. Create.js is built on top of VIE, the semantic interaction library powered by Backbone.js. The widgets in Create.js itself are built with the jQuery UI tools.

rdflib.js - Linked Data API for JavaScript

  •    Javascript

JavaScript RDF library for browsers and Node.js. See Tutorial for using rdflib.js for more information.

dbscript

  •    PHP

dbscript is a new PHP framework for composing distributed collaborative Semantic Web applications with Microformats, OpenID, REST Web services and the Qooxdoo Ajax Toolkit. To install: wget dbscript.net/latest.zip, unzip latest.zip, vi db/config.yml

core - The server component of API Platform: hypermedia and GraphQL APIs in minutes

  •    PHP

API Platform Core is an easy to use and powerful system to create hypermedia-driven REST APIs. It is a component of the API Platform framework and it can be integrated with the Symfony framework using the bundle distributed with the library. It natively supports popular open formats including JSON for Linked Data (JSON-LD), Hydra Core Vocabulary, Swagger (OpenAPI), HAL and HTTP Problem.

html5.vim - HTML5 omnicomplete and syntax

  •    Vim

HTML5 + inline SVG omnicomplete function, indent and syntax for Vim. Based on the default htmlcomplete.vim. This plugin contributes to vim-polyglot language pack.

Embed - Get info from any web service or page

  •    PHP

PHP library to get information from any web page (using oEmbed, Open Graph, Twitter Cards, scraping the HTML, etc.). It's compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc.) and has adapters for some sites (archive.org, GitHub, Facebook, etc.). This package is installable and autoloadable via Composer as embed/embed.

Microformat.net, a .net based flexible Microformat Parser

  •    CSharp

The Microformat.net project is a framework to help developers take advantage of the information that is stored as microformats on web pages and XML feeds in their .NET-based applications. The framework is completely flexible. Each microformat that you want to parse on a web...

Oomph - A Microformats Toolkit

Oomph makes Microformats more accessible for developers, designers and users. Oomph is an amalgamation of applications: an Internet Explorer Add-in built in C++ that finds Microformats on a page; a cross-browser HTML overlay built using jQuery that aggregates Microformats; a ...

VIE - Semantic Interaction Framework for JavaScript

  •    Javascript

VIE is a utility library for implementing decoupled Content Management systems. VIE is developed as part of the EU-funded IKS project. VIE development is now targeting a 2.0 release. Read this blog post to find out the changes from VIE 1.0. There is also a good introductory post on VIE on the IKS blog.

SemanticEngine.NET

A library that enables any ASP.NET website to produce and consume semantic markup such as FOAF, APML, SIOC, XFN and microformats.

Microdata Management Toolkit

  •    Java

The Microdata Management Toolkit is a collection of tools for documenting, disseminating and preserving survey and census microdata. The project is sponsored by the International Household Survey Network with financial support from the World Bank.

RDFa Developer

  •    Javascript

Firefox add-on that displays and checks RDFa. The RDFa Developer project has moved to Bitbucket: https://bitbucket.org/fundacionctic/rdfadev