Build a PHP-based crawler

In Progress Posted Jan 17, 2016 Paid on delivery

We need to crawl 10 different sites in total.

Project plan // readme

A PHP-based server-side application that:

Crawls through specified web content (multiple 'data URLs', not just a single one). (Data URL = data source web page.)

Data URLs can be generated in a loop (a for loop over a query string parameter), taken from an array, or crawled from a web page using certain rules to find the right URLs.

Extracts specified data fields from the 'data URLs' and forms a JSON object from them.

Stores this JSON object in MongoDB using the MongoDB PHP Driver.

When executed, is able to send the crawled data from MongoDB as JSON to a remote server via an HTTP POST request.
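The steps above can be sketched roughly as follows. The project itself targets PHP; this Python sketch only illustrates the three ways of obtaining data URLs, the JSON record, and the POST payload. All function names, the record layout, and the `example.test` URLs are assumptions made for illustration, and the MongoDB step is left out.

```python
import json
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def urls_from_query_loop(base_url, param, start, stop):
    """Build data URLs by looping a query-string parameter (e.g. page=1..N)."""
    return [f"{base_url}?{urlencode({param: i})}" for i in range(start, stop + 1)]


def urls_from_list(urls):
    """Use a fixed list (array) of data URLs, dropping duplicates in order."""
    return list(dict.fromkeys(urls))


def urls_from_page(html, page_url, href_pattern):
    """Collect links from an index page whose href matches a rule (a regex)."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [urljoin(page_url, h) for h in hrefs if re.search(href_pattern, h)]


def to_json_record(source, url, fields):
    """Form the JSON object for one crawled data URL (hypothetical layout)."""
    return json.dumps({"source": source, "url": url, "data": fields}, sort_keys=True)


def build_post_request(endpoint, records):
    """Prepare (but do not send) an HTTP POST request carrying the records as JSON."""
    body = json.dumps(records).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(endpoint, data=body, headers=headers, method="POST")
```

Passing the prepared request to `urllib.request.urlopen` would actually send it; keeping the building and sending steps separate makes the payload easy to inspect first.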

Example of a page to be crawled (1):

[url removed, login to view]

Data URLs are found on this page, before the 'h3.job-header' element. -> Example data URL: [url removed, login to view] Example of data to crawl from a data URL: [url removed, login to view]

Example of a page to be crawled (2):

[url removed, login to view]

Data URLs are found on this page as the first link element inside each 'tr.job-item' table row. -> Example data URL: [url removed, login to view] Examples of data to crawl from a data URL: 1. [url removed, login to view] 2. [url removed, login to view] (Note that some of the data pieces will have to be extracted from a text part, so a certain amount of regular expressions // other data sorting methods is required here.)
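As a rough illustration of the rule for example (2), the sketch below takes the first link inside each 'tr.job-item' row and then pulls a structured value out of free text with a regular expression. It is written in Python for illustration only (the project itself is PHP); the sample HTML, the salary field, and all names are hypothetical, and nested tables are not handled.

```python
import re
from html.parser import HTMLParser


class FirstLinkPerRow(HTMLParser):
    """Collect the first <a href> inside each <tr class="job-item"> row."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_row = False
        self._row_done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # Enter a matching row only when 'job-item' is one of its classes.
            classes = (attrs.get("class") or "").split()
            self._in_row = "job-item" in classes
            self._row_done = False
        elif tag == "a" and self._in_row and not self._row_done and attrs.get("href"):
            self.links.append(attrs["href"])
            self._row_done = True  # ignore any further links in this row

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


def first_links(html):
    """Return the first link of every 'tr.job-item' row in document order."""
    parser = FirstLinkPerRow()
    parser.feed(html)
    return parser.links


def extract_salary(text):
    """Extract a range like '1200 - 1500 EUR' from free text (hypothetical field)."""
    match = re.search(r"(\d+)\s*-\s*(\d+)\s*(EUR|USD)", text)
    if not match:
        return None
    return {"min": int(match.group(1)), "max": int(match.group(2)), "currency": match.group(3)}
```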

The application should be structured so that the core crawling & data processing models are separate from the data-source-specific crawler parts.

By 'data source specific', I mean that we'll want a separate file for each web resource to be crawled, for example 'Cvmarket' for [url removed, login to view] specific rules / guides, and 'CVCrawler' for [url removed, login to view] specific rules. These should extend the core crawling & data modelling parts of the application.
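One possible shape for this separation is a shared base class that each data source extends. The sketch is in Python for illustration only (the real project would use PHP classes); `BaseCrawler`, its method names, the record layout, and the `example.test` URLs are assumptions, while 'Cvmarket' is the source name mentioned above.

```python
import re
from abc import ABC, abstractmethod


class BaseCrawler(ABC):
    """Core crawling and data modelling shared by every data source."""

    name = "base"

    def __init__(self, fetch):
        # `fetch` is any callable mapping a URL to an HTML string,
        # for example a thin wrapper around cURL.
        self.fetch = fetch

    @abstractmethod
    def data_urls(self):
        """Return the list of data URLs for this source."""

    @abstractmethod
    def parse(self, html):
        """Return a dict of extracted fields for one data URL."""

    def run(self):
        """Crawl every data URL and return JSON-serialisable records."""
        records = []
        for url in self.data_urls():
            fields = self.parse(self.fetch(url))
            records.append({"source": self.name, "url": url, "data": fields})
        return records


class Cvmarket(BaseCrawler):
    """Source-specific rules only; everything else lives in BaseCrawler."""

    name = "Cvmarket"

    def data_urls(self):
        # Placeholder URLs; a real crawler would loop or scrape these.
        return ["http://example.test/job/1", "http://example.test/job/2"]

    def parse(self, html):
        match = re.search(r"<h1>(.*?)</h1>", html)
        return {"title": match.group(1) if match else None}
```

Injecting `fetch` keeps the source-specific files free of HTTP details and lets each crawler be exercised against canned HTML.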

Example:

-> [url removed, login to view] executes multiple individual .php files, which each crawl a single data source.

-> It simultaneously writes progress output (+ possible errors) about the process to the console and to a log file. (Example below.)

### Storing the results to database.. ###

### Data stored successfully! ###
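A minimal sketch of writing the same progress lines to both the console and a log file, shown in Python for illustration only (the project itself is PHP). The function name `make_logger`, the logger name, and the log path are assumptions; the `### ... ###` format mirrors the example output above.

```python
import logging
import sys


def make_logger(log_path, name="crawler"):
    """Return a logger that writes each message to stdout and to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.propagate = False  # keep output out of the root logger
    formatter = logging.Formatter("### %(message)s ###")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_path, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Errors can go through the same logger with `logger.error(...)`, so the console and the log file stay in step.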

Example application structure from our other crawler project: [url removed, login to view]

(Here's a zipped repo for examining the previous project's code for guidance: [url removed, login to view])

1. [url removed, login to view] <- cURL implementation functions

2. [url removed, login to view] <- Bundles the individual crawler files into one executable, and stores the results in MongoDB

3. [url removed, login to view], [url removed, login to view], [url removed, login to view] + the rest <- individual crawler files, one file per data source.

The MongoDB database should look like this after [url removed, login to view] has been executed a couple of times: [url removed, login to view]

Graphic Design HTML PHP Website Design

Project ID: #9357555

About the project

6 proposals Remote project Active Jan 17, 2016

6 freelancers are bidding an average of $546 for this job

AwaisChaudhry

A proposal has not yet been provided

$309 USD in 10 days
(32 Reviews)
6.1
imran009cse

Good Day, Thank you so much for giving me an opportunity to apply for this wonderful job opportunity. I read your whole job requirements very carefully and understand what you actually wanted for your project. In your ...

$526 USD in 10 days
(77 Reviews)
5.7
umairmalik10

Hi, I've reviewed the details of the project provided by you and I am pleased to declare myself the right candidate for this project as I've completed many successful projects of such kind (urls will be provided on re ...

$666 USD in 10 days
(36 Reviews)
4.9
origamisolution

Thank you for reviewing my qualifications. I am a high-level programmer who prefers to work alone. I do not outsource any projects in order to maintain quality control. I provide free project support from beginning ...

$722 USD in 10 days
(14 Reviews)
4.8
saraca29n0v

A proposal has not yet been provided

$500 USD in 18 days
(14 Reviews)
4.9
ss48nancy

A proposal has not yet been provided

$555 USD in 10 days
(0 Reviews)
0.0