
Data Extracting SDK


Data Extracting SDK helps you extract information from web resources in a simple way.

node-readability - Scrape/Crawl article from any site automatically

  •    JavaScript

In my case, the spider processes about 1500k documents per day. The maximum crawling speed is 1.2k/minute (1k/minute on average), each spider kernel uses about 200 MB of memory, and the accuracy is about 90%; the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.

req-fast - Fastest way to fetch web content (HTML stream) from a server; supports: redirects, auto decode(e

  •    JavaScript

When options is a String, it is the URL of the server to request. uri || url: the URL to which the request is sent.
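
The rule above (a bare string is the URL; otherwise the URL comes from uri, falling back to url) can be sketched as a small normalization step. This is only an illustration under stated assumptions: normalizeOptions is a hypothetical helper written for this example, not a function exported by req-fast, and the library's real handling may differ.

```javascript
// Hypothetical helper, NOT part of req-fast. It illustrates the documented
// rule: `options` is either a URL string, or an object whose `uri`
// (or, failing that, `url`) property holds the URL to request.
function normalizeOptions(options) {
  // A primitive string or a String instance is treated as the URL itself.
  if (typeof options === 'string' || options instanceof String) {
    return { url: String(options) };
  }
  if (options === null || typeof options !== 'object') {
    throw new TypeError('options must be a URL string or an object');
  }
  // `uri || url`: prefer `uri`, fall back to `url`.
  const url = options.uri || options.url;
  if (!url) {
    throw new TypeError('options.uri or options.url is required');
  }
  // Copy the caller's options and store the resolved URL under `url`.
  return Object.assign({}, options, { url: url });
}

console.log(normalizeOptions('http://example.com/').url);
// prints "http://example.com/"
console.log(normalizeOptions({ uri: 'http://example.com/a', url: 'http://example.com/b' }).url);
// prints "http://example.com/a"
```

Resolving both call shapes to a single object up front means the rest of a request routine only has to read one field, whichever way the caller passed the URL.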