+
Skip to content

time2die/tScaper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 

Repository files navigation

tScaper

console web scraper

Accepts as command line parameters:

  • (must) web resources URL or path to plain text file containing a list of URLs
  • (must) data command(s)
  • (must) word (or list of words with “,” delimiter)
  • (nice to have) output verbosity flag, if on then the output should contains information about time spend on data scraping and data processing (-v)

Supports the following data processing commands:

  • (must) count number of provided word(s) occurrence on webpage(s). (-w)
  • (must) count number of characters of each web page (-c)
  • (nice to have) extract sentences’ which contain given words (-e)

Data processing results should be printed to output for each web resources separately and for all resources as total.

Command line parameters example for Java implementation:

java –jar scraper.jar http://www.cnn.com Greece,default –v –w –c –e

About

console web scraper

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载