newspaper - 💡 News, full-text, and article metadata extraction in Python 3. Advanced docs:

  •        202

Newspaper has seamless language extraction and detection. If no language is specified, Newspaper attempts to detect it automatically. See the documentation for full, detailed guides to using newspaper.

https://goo.gl/VX41yK
https://github.com/codelucas/newspaper


   




Related Projects

colly - Elegant Scraper and Crawler Framework for Golang

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Spycyroll - Python News Aggregator

  •    Python

A News Aggregator - not a news reader - to collect news from subscribed RSS channels.

news-android - 📱:newspaper: Android client for the Nextcloud news/feed reader app

  •    Java

The Nextcloud News Reader Android App is licensed under the AGPLv3. You can join the translation team and improve one of over 100 languages (it's the android-news app).

RSSOwl

  •    Java

Applications that collect data from RSS-compliant sites are called RSS readers or "aggregators." RSSOwl is such an application. RSSOwl lets you gather, organize, update, and store information from any compliant source in a convenient, easy-to-use interface, save selected information in various formats for offline viewing and sharing, and much more. It's easy to configure and, best of all, it's platform-independent.
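As a rough illustration of what "collecting data from RSS-compliant sites" involves, the sketch below pulls item titles and links out of an RSS 2.0 document using only the Python standard library. It is not RSSOwl's code (RSSOwl is a Java application); the function name `parse_rss_items` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return a list of {"title", "link"} dicts for each <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    items = []
    # RSS 2.0 nests <item> elements under <channel>; iter() finds them at any depth.
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


# Hypothetical sample feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

A real aggregator would also fetch the feeds over HTTP, handle Atom and RDF variants, and store the results; none of that is shown here.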


hacker-news-undocumented - Some of the hidden norms about Hacker News not otherwise covered in the Guidelines and the FAQ

  •    

Hacker News, a simple link aggregator owned and operated by Silicon Valley startup incubator Y Combinator, has had many positive effects on SV startups and engineers as a whole. On Hacker News, users receive Karma whenever another user upvotes a submission or comment they made, which incentivizes positive contributions to the community. However, in maintaining its simplicity, many new features and behaviors added over the years on Hacker News are not fully documented other than the occasional comments from staff. This list details some of the hidden norms about Hacker News not otherwise covered in the Guidelines and the FAQ, along with a few bonus features outside of typical HN usage. If anything in this list is missing or incorrect, feel free to file a GitHub issue/PR.

dotProject news aggregator

  •    PHP

An RSS-based news aggregator module for dotProject (http://dotproject.net). It enables you to rapidly view project-relevant news from several sources.

MyNewsGroups :)

  •    Javascript

MyNewsGroups :) is a Web-based USENET news crawler, news reader and news poster. Using a DB backend, the crawler fetches each newsgroup message ONCE only. Web-based environment, SPAM filters, search engine, subscriptions and much more.
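The "fetch once" idea can be sketched as a small cache keyed by message ID. This is not MyNewsGroups' implementation; it is a minimal Python sketch that assumes an SQLite table named `messages` and a caller-supplied `fetch` callable, both hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message store. The table layout is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def fetch_once(conn, message_id, fetch):
    """Return the message body, calling fetch(message_id) only if it is not stored yet."""
    row = conn.execute(
        "SELECT body FROM messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # already in the database: no network round trip
    body = fetch(message_id)
    conn.execute(
        "INSERT INTO messages (message_id, body) VALUES (?, ?)", (message_id, body)
    )
    conn.commit()
    return body
```

Here `fetch` stands in for whatever retrieves a message from the news server; every later read of the same ID is served from the database.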

news

  •    

The AMH News Syndicate Newspaper Online, Version 1.0. General Information. Description: News, Information, Music, Movies, Fun and Games. Website: amhnews.weebly.com. Blog: amhnewssyndicate.blogspot.com, amhnews.wordpress.com, amhnews.livejourna...

colly - Fast and Elegant Scraping Framework for Gophers

  •    Go

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

headless-chrome-crawler - Distributed crawler powered by Headless Chrome

  •    Javascript

Crawlers based on simple requests to HTML files are generally fast. However, they sometimes end up capturing empty bodies, especially when the websites are built on modern frontend frameworks such as AngularJS, React and Vue.js. Note: headless-chrome-crawler contains Puppeteer. During installation, it automatically downloads a recent version of Chromium. To skip the download, see Environment variables.

lylina rss/atom aggregator

  •    PHP

lylina is a river-of-news style web-based RSS/Atom feed aggregator loosely based on its predecessor, lilina. It features a dynamic multi-user system and a scalable database backend, providing the best way to get all your news in one place.

hacker-menu - Hacker News Delivered to Desktop :dancers:

  •    Javascript

Hacker Menu stays on your menu bar and delivers the top news stories from the Y Combinator news aggregator, built with love by @jingweno & @lokywin. It's powered by Electron and Node.js. Website: https://hackermenu.io.

RSS Owl | RSS / RDF / Atom Feed Reader

  •    Java

RSS Owl is a powerful application to organize, search and read feeds.

HackerNewsAPI - :newspaper: Unofficial Python API for Hacker News

  •    Python

Unofficial Python API for Hacker News. NOTE: Do not make a lot of requests in a short period of time. HN has its own throttling system.
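One way to follow that advice is to space requests out on the client side. The sketch below is not part of HackerNewsAPI; the `Throttle` class is hypothetical, and the minimum interval is an arbitrary assumption, since the source does not state HN's actual limits.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be exercised without real delays
        self._sleep = sleep
        self._last = None      # time at which the previous request was allowed

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before each request keeps requests at least `min_interval` seconds apart; the right value is something to tune conservatively.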

FreshRSS - A free, self-hostable aggregator…

  •    PHP

FreshRSS is a self-hosted RSS feed aggregator like Leed or Kriss Feed. It is lightweight, easy to work with, powerful and customizable.

Liferea

  •    C

Liferea (Linux Feed Reader) is a GTK desktop news aggregator for online news feeds, weblogs and podcasts. The project focus is on simplicity and easy installation.

Storm Crawler - Web crawler SDK based on Apache Storm

  •    Java

StormCrawler is an open source collection of resources for building low-latency, scalable web crawlers on Apache Storm. StormCrawler is a library and collection of resources that developers can leverage to build their own crawlers. The good news is that doing so can be pretty straightforward. Often, all you'll have to do is declare StormCrawler as a Maven dependency, write your own Topology class (tip: you can extend ConfigurableTopology), reuse the components provided by the project and maybe write a couple of custom ones for your own secret sauce.

newsdiffs - Automatic scraper that tracks changes in news articles over time.

  •    Python

A website and framework that tracks changes in online news articles over time. This is free software under the MIT/Expat license; see LICENSE. The project's source code lives at http://github.com/ecprice/newsdiffs.
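Tracking changes between two saved versions of an article can be illustrated with a plain text diff. This is not newsdiffs' own code; it is a minimal sketch using Python's standard `difflib`, and the function name `diff_versions` and the sample texts are hypothetical.

```python
import difflib


def diff_versions(old_text, new_text, old_label="previous", new_label="current"):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # inputs carry no trailing newlines, so emit none either
        )
    )


if __name__ == "__main__":
    before = "Mayor wins vote\nTurnout was low"
    after = "Mayor wins recount\nTurnout was low"
    for line in diff_versions(before, after):
        print(line)
```

Identical versions yield an empty list, so a scraper could store a new revision only when the diff is non-empty.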