Silverlight HtmlExtractor


Extracts HTML text from a webpage

http://htmlextractor.codeplex.com/

Related Projects

Htmlxtractor - statistics-based web page denoising


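Web page denoising based on the linkNum/textNum ratio. 1. Introduction: 1) xpspider: for a non-hub page, the main text is usually surrounded by "noise" such as links or advertisements. Writing regular expressions makes it easy to extract the main text accurately (for example with the tool xpspider), but this requires knowledge of regular expressions. 2) This tool uses a new statistics-based method: > first, build a DOM tree from the fetched HTML code; > then traverse the DOM tree depth-first, counting for each node the number of links it contains…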
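The two steps named above (build a DOM tree, then walk it depth-first and count the links under each node) can be sketched in Python with only the standard library. This is a minimal sketch and not Htmlxtractor's own code: the source description is cut off after the counting step, so the pruning rule, the `max_ratio` threshold, the exemption for `<a>` nodes, and every name below (`Node`, `TreeBuilder`, `count`, `collect`, `extract_main_text`) are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become "current".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}  # their contents are not page text


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and plain strings, in order
        self.link_num = 0   # number of <a> elements in this subtree
        self.text_num = 0   # non-whitespace characters in this subtree


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (hypothetical helper, not a real DOM)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray ends.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def count(node):
    """Depth-first pass that fills in link_num and text_num for each node."""
    node.link_num = 1 if node.tag == "a" else 0
    node.text_num = 0
    for child in node.children:
        if isinstance(child, str):
            if node.tag not in SKIP_TAGS:
                node.text_num += len("".join(child.split()))
        else:
            count(child)
            node.link_num += child.link_num
            node.text_num += child.text_num


def collect(node, max_ratio, out):
    """Assumed pruning rule: drop subtrees with linkNum/textNum > max_ratio."""
    if node.tag in SKIP_TAGS:
        return
    # <a> itself is exempt so that inline links inside kept text survive.
    if node.tag != "a" and node.link_num > max_ratio * node.text_num:
        return
    for child in node.children:
        if isinstance(child, str):
            if child.strip():
                out.append(" ".join(child.split()))
        else:
            collect(child, max_ratio, out)


def extract_main_text(html, max_ratio=0.1):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    count(builder.root)
    out = []
    collect(builder.root, max_ratio, out)
    return " ".join(out)
```

With the assumed threshold of 0.1 (one link per ten characters of text), a navigation list or an advertisement block is dropped because it is almost all links, while a long paragraph with few links is kept. A page whose whole body is link-dense (a hub page) yields an empty string under this rule.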

Html-extractor - main text extraction


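Project introduction: the program analyzes a web page and extracts its main text so that it can be stored and indexed, without requiring a template customized to each page's structure, which makes it more efficient to use. Project background: originally I only needed to write a simple HTML parser (php simple html dom is in fact already very powerful), but all I needed here was a very simple analysis, so I did not use it and implemented a simple one myself. Later, many friends asked about web page main text extraction; there are plenty of write-ups of the principles online, …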

Javascripthtmlextractor - an HTML extractor in JavaScript


An HTML extractor in JavaScript. Usage: jhe_im(extract_conditions...) returns the inner HTML that matches the extract conditions. jhe_om(extract_conditions...) returns the outer HTML that matches the extract conditions. jhe_ma(extract_conditions..., attributeName) returns the attribute value in the specified tag that matches the extract conditions. jhe_mt(extract_conditions...) returns the text in the specified tag that matches the extract conditions. About extract_conditions: the extract conditions are variable-length arg…

HtmlExtractor


Generates code from HTML for use in your SharpKit C# -> JavaScript project to find and use CSS selectors.

html-extractor


Extracts metadata from an HTML string. It collects the body, title, meta tags and first headlines into an object that can be pushed to a search indexer such as Elasticsearch.
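To make the shape of such an object concrete, here is a minimal Python sketch of the same idea using the standard library. It is not the html-extractor project's API: the function name `extract_meta`, the result keys, the choice to treat "first headlines" as the first `h1`, `h2` and `h3`, and the skipping of `script`/`style` content are all assumptions made for illustration.

```python
from html.parser import HTMLParser

HEADLINE_TAGS = ("h1", "h2", "h3")  # assumed meaning of "first headlines"


class MetaExtractor(HTMLParser):
    """Collects title, named meta tags, first headlines and body text."""

    def __init__(self):
        super().__init__()
        self.result = {"title": "", "meta": {}, "headlines": {}, "body": ""}
        self._capture = None   # tag whose text is currently being buffered
        self._buf = []
        self._in_body = False
        self._skip = 0         # depth inside <script>/<style>
        self._body_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.result["meta"][attrs["name"]] = attrs.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "title" or tag in HEADLINE_TAGS:
            self._capture, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == self._capture:
            text = " ".join("".join(self._buf).split())
            if tag == "title":
                self.result["title"] = text
            else:
                # setdefault keeps only the first headline of each level
                self.result["headlines"].setdefault(tag, text)
            self._capture = None

    def handle_data(self, data):
        if self._skip:
            return
        if self._capture:
            self._buf.append(data)
        if self._in_body:
            self._body_parts.append(data)


def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces, then collapse runs of whitespace.
    parser.result["body"] = " ".join(" ".join(parser._body_parts).split())
    return parser.result
```

The returned dictionary is plain data, so it could be serialized to JSON and sent to an indexer as a document. Joining text chunks with a space keeps the sketch short, at the cost of inserting a space where inline markup splits a word.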