meta-extractor - Super simple and fast HTML page metadata extractor with a low memory footprint

Super simple and fast metadata extractor with a low memory footprint. If no callback is provided, it returns a Promise.
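
The low-memory idea can be illustrated with a streaming parser that reads only as much of the page as it needs. The following is a minimal sketch in Python using the standard library, not meta-extractor's own code or API; the names MetaCollector and extract_meta are hypothetical, and stopping at the end of the head element is an assumption about where the useful tags live.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False
        self.done = False  # set once </head> has been seen

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # ignore anything after the head
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            key = attributes.get("property") or attributes.get("name")
            if key and attributes.get("content") is not None:
                self.meta[key] = attributes["content"]

    def handle_data(self, data):
        if self._in_title:
            # Title text may arrive in several pieces, so append.
            self.meta["title"] = self.meta.get("title", "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "head":
            self.done = True


def extract_meta(chunks):
    """Feed HTML chunk by chunk and stop reading once </head> is reached."""
    parser = MetaCollector()
    for chunk in chunks:
        parser.feed(chunk)
        if parser.done:
            break  # the rest of the page is never consumed
    return parser.meta
```

Because extract_meta accepts any iterable of text chunks, it can be driven by a streamed HTTP response body, so the full page never has to be held in memory.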

https://github.com/velocityzen/meta-extractor#readme

Dependencies:

file-type : ^7.5.0
got : ^8.0.3
htmlparser2 : ^3.9.0

Related Projects

django-meta - Pluggable app to allow Django developers to quickly add meta tags and OpenGraph, Twitter, and Google Plus properties to their HTML responses

  •    Python

This pluggable app allows Django developers to quickly add meta tags and OpenGraph, Twitter, and Google Plus properties to their HTML responses. django-meta is now maintained by Nephila on GitHub. The old Bitbucket repository is no longer updated.

python-goose - HTML Content / Article Extractor, web scraping lib in Python

  •    HTML

Goose was originally an article extractor written in Java that was most recently (Aug 2011) converted to a Scala project. This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.

gofeed - Parse RSS and Atom feeds in Go

  •    Go

The gofeed library is a robust feed parser that supports parsing both RSS and Atom feeds. The universal gofeed.Parser will parse and convert all feed types into a hybrid gofeed.Feed model. You also have the option of parsing them into their respective atom.Feed and rss.Feed models using the feed-specific atom.Parser or rss.Parser. It also provides support for parsing several popular predefined extension modules, including Dublin Core and Apple's iTunes, as well as arbitrary extensions. See the Extensions section for more details.
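
The notion of a hybrid model that covers both feed families can be sketched as follows. This is an illustration in Python with the standard library only, not gofeed's Go API; the Feed dataclass and parse_feed function are hypothetical names, and only RSS 2.0 and Atom 1.0 titles are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace prefix used by Atom 1.0 elements in ElementTree's notation.
ATOM = "{http://www.w3.org/2005/Atom}"


@dataclass
class Feed:
    """Hypothetical hybrid model shared by both feed formats."""
    feed_type: str
    title: str
    item_titles: list = field(default_factory=list)


def parse_feed(xml_text):
    """Detect the feed family from the root element and map it to Feed."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        channel = root.find("channel")
        if channel is None:
            raise ValueError("RSS document has no <channel>")
        titles = [item.findtext("title", "") for item in channel.findall("item")]
        return Feed("rss", channel.findtext("title", ""), titles)
    if root.tag == ATOM + "feed":
        titles = [e.findtext(ATOM + "title", "") for e in root.findall(ATOM + "entry")]
        return Feed("atom", root.findtext(ATOM + "title", ""), titles)
    raise ValueError("unsupported feed type: " + root.tag)
```

Callers work with one Feed shape regardless of the input format, which is the convenience a universal parser offers over format-specific ones.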

Data Extractor

  •    C++

The software tool, Data-Extractor, will be able to extract the content and meta-data of files and present the extracted information in a consolidated report.


FeedParser - Parse RSS and Atom feeds in Python

  •    Python

Universal Feed Parser is a Python module for parsing syndicated feeds. It can handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, Atom 1.0, and CDF feeds. It also parses several popular extension modules, including Dublin Core and Apple's iTunes extensions.

feedhq - FeedHQ is a web-based feed reader

  •    Python

Then deploy the Django app using the recipe that fits your installation. More documentation on the Django deployment guide. The WSGI application is located at feedhq.wsgi.application.

htmlparser2 - forgiving html and xml parser

  •    Javascript

A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface. A live demo of htmlparser2 is available here.

ROME

  •    Java

ROME is a set of Java tools for parsing, generating and publishing RSS and Atom feeds. The core ROME library depends only on the JDOM XML parser and supports parsing, generating and converting all of the popular RSS and Atom formats including RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0. You can parse to an RSS object model, an Atom object model or an abstract SyndFeed model that can model either family of formats.

CDex - A Free Digital Audio CD Extractor

  •    Javascript

CDex is a free Audio CD ripper, extractor and converter for various formats and encoders, e.g. MP3, AAC, WMA, FLAC, OGG, WAV, MP2, Musepack, Ape, VQF and many others. It features advanced jitter correction, ID3v2+v1 tagging, audio normalization, transcoding of compressed audio files, analog input recording and also meta tagging with CDDB and Musicbrainz. CDex has been translated to various languages.

.net HTTP Data Extractor

  •    LINQ

The nWeb Data Extractor Library provides support for extracting data from HTTP response HTML. It lets the user convert the response HTML to XML and then extract the desired data from the generated XML file.

go-pkg-rss - This package reads RSS and Atom feeds and provides a caching mechanism that adheres to the feed specs

  •    Go

This package allows us to fetch RSS and Atom feeds from the internet. They are parsed into an object tree which is a hybrid of both the RSS and Atom standards. The package also handles cache timeout management, which prevents us from querying the servers for feed updates too often and risking IP bans. Apart from setting a cache timeout manually, the package also optionally adheres to the TTL, SkipDays and SkipHours values specified in the feeds themselves.
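
The cache timeout behaviour described above can be sketched roughly like this. It is an illustration in Python, not go-pkg-rss's API: FeedCache, its fetch callable, and the dict-shaped feed with a "ttl" key are all hypothetical, and SkipDays/SkipHours are left out. The only outside fact relied on is that the RSS 2.0 ttl value is expressed in minutes.

```python
import time


class FeedCache:
    """Re-fetches a feed only after its cache timeout has expired."""

    def __init__(self, fetch, default_timeout=300.0, clock=time.monotonic):
        self._fetch = fetch                      # hypothetical: url -> feed dict
        self._default_timeout = default_timeout  # seconds, the manual timeout
        self._clock = clock                      # injectable for testing
        self._entries = {}                       # url -> (expires_at, feed)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: do not hit the server
        feed = self._fetch(url)
        # Prefer the feed's own TTL (minutes) over the manual timeout.
        ttl_minutes = feed.get("ttl")
        timeout = ttl_minutes * 60 if ttl_minutes else self._default_timeout
        self._entries[url] = (now + timeout, feed)
        return feed
```

Passing the clock as a parameter keeps the expiry logic testable without real waiting.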

meta-tags - Search Engine Optimization (SEO) for Ruby on Rails applications.

  •    Ruby

Search Engine Optimization (SEO) plugin for Ruby on Rails applications. MetaTags master branch fully supports Ruby on Rails 4.2+, and is tested against all major Rails releases up to 5.1.

MagpieRSS - XML-based RSS parser in PHP

  •    PHP

MagpieRSS is compatible with RSS 0.9 through RSS 1.0. It also parses RSS 1.0's modules, RSS 2.0, and Atom (with a few exceptions).

feedparser - A Cocoa RSS/Atom parser for Mac OS X and the iPhone

  •    Objective-C

FeedParser is an NSXMLParser-based RSS/Atom feed parser for Cocoa. It is intended to parse well-formed RSS and Atom feeds on both the desktop and the iPhone. The simplest way to use FeedParser is to add the FeedParser directory to your project. FeedParser also includes a static library target if you prefer to include it that way.

jFeed - jQuery RSS/ATOM feed parser plugin

  •    Perl

jQuery RSS/ATOM feed parser plugin

node-feedparser - Robust RSS, Atom, and RDF feed parsing in Node.js

  •    Javascript

Feedparser is for parsing RSS, Atom, and RDF feeds in node.js. This example is just to briefly demonstrate basic concepts.

RSS EXTRACTOR

  •    Java

RSS EXTRACTOR is a Java library for generating RSS newsfeeds from the RSS web feeds of multiple websites. It extracts the best newsfeed entries and produces an RSS file that merges entries from several websites.

RSParser - Parser for RSS, Atom, JSON Feed, RSS-in-JSON, OPML, and HTML.

  •    HTML

This framework was developed for NetNewsWire and is made available here for developers who just need the parsing code. It has no dependencies that aren't provided by the system. It also includes Objective-C wrappers for libXML2's XML SAX and HTML SAX parsers. You can write your own parsers on top of these.