Using Python, I require the following:
Spider
Spider an http/https URL. The spider takes as input a text file containing a 'project name', a 'base URL' and a 'depth' parameter, with the file passed on the command line.
The spider should find all site pages, following links to a depth of 'n'. It should output a text file containing the full URL of each page found, one per line.
The format should be a comma-delimited text file. The file name should be unique (e.g. projectname_urllist_datetime). Each row should contain:
Line number, ‘project name’, datetime, “url”
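A minimal sketch of how the Spider could be implemented is below. It assumes the input text file holds the project name, base URL and depth on a single comma-delimited line, and that the third-party requests and beautifulsoup4 packages are acceptable; the exact datetime formatting is also an assumption, since the brief does not fix it.

import csv
import sys
from collections import deque
from datetime import datetime
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(base_url, depth):
    """Breadth-first crawl of same-site pages down to 'depth' levels."""
    seen = {base_url}
    queue = deque([(base_url, 0)])
    found = []
    while queue:
        url, level = queue.popleft()
        found.append(url)
        if level >= depth:
            continue
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"]).split("#")[0]
            # Follow http/https links on the same host only.
            if (urlparse(link).scheme in ("http", "https")
                    and urlparse(link).netloc == urlparse(base_url).netloc
                    and link not in seen):
                seen.add(link)
                queue.append((link, level + 1))
    return found


def main():
    # The input text file path is the single command-line argument.
    with open(sys.argv[1], encoding="utf-8") as f:
        project, base_url, depth = [p.strip() for p in f.readline().split(",")]

    now = datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    out_name = f"{project}_urllist_{stamp}.txt"

    with open(out_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for line_no, url in enumerate(crawl(base_url, int(depth)), start=1):
            # Line number, project name, datetime, "url"
            writer.writerow([line_no, project, now.isoformat(), url])


if __name__ == "__main__":
    main()

Invocation could look like: python spider.py project_input.txt, where project_input.txt contains a line such as myproject,https://example.com,2 (file name and layout are illustrative only).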
Phase 1 – Sprint 2
Cookie Extract
The second process extracts the cookie information for each URL in the result text file produced by the Spider process. It should be called from the command line, passing in that text file.
The output should be a comma-delimited text file:
Line number, datetime, "url", JSON("cookie name=", "cookie type=", "cookie attributes key=value pairs").
(see [login to view URL] )
The format should be a comma-delimited text file. The file name should be unique (e.g. projectname_cookielist_datetime).
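A minimal sketch of the Cookie Extract step is below, assuming the Spider output format above. The use of the requests package and the reading of "cookie type" as session vs. persistent are assumptions, since the brief does not define them.

import csv
import json
import sys
from datetime import datetime

import requests


def cookies_for(url):
    """Fetch the URL and return a list of JSON-serialisable cookie records."""
    records = []
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return records
    for cookie in response.cookies:
        records.append({
            "cookie name": cookie.name,
            # Assumption: persistent cookies carry an expiry; session cookies do not.
            "cookie type": "persistent" if cookie.expires else "session",
            "cookie attributes": {
                "value": cookie.value,
                "domain": cookie.domain,
                "path": cookie.path,
                "expires": cookie.expires,
                "secure": cookie.secure,
            },
        })
    return records


def main():
    # The Spider's url-list file path is the single command-line argument.
    urllist_path = sys.argv[1]
    now = datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")

    with open(urllist_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    # Recover the project name from the second column of the Spider output.
    project = rows[0][1] if rows else "project"
    out_name = f"{project}_cookielist_{stamp}.txt"

    with open(out_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for line_no, row in enumerate(rows, start=1):
            url = row[3]
            # Line number, datetime, "url", JSON cookie details
            writer.writerow([line_no, now.isoformat(), url,
                             json.dumps(cookies_for(url))])


if __name__ == "__main__":
    main()

Invocation could look like: python cookie_extract.py myproject_urllist_20240101_120000.txt (the script and file names are illustrative only).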