Extract 50,912 rows from website table (table displays 25 rows per page, split into 2,037 pages)

  • Status: Closed
  • Prize: $50
  • Entries Received: 8
  • Winner: Krish1019

Contest Brief

The winner will be the first person to upload a screenshot showing the bottom row of the extracted data, along with the relevant columns (example image attached).

I have made this a contest rather than a project because someone with the technical knowledge may be able to extract the following data very quickly if they have a system that can run through all 2,037 pages. If this contest is unsuccessful, I will make it a project, where someone may have to do the extraction manually.

I have found a website table that has 50,912 results, but the results are split into 2,037 pages because only 25 are shown per page.

The starting page is here - https://tinyurl.com/yy7vksyh

The ending page is here - https://tinyurl.com/y3ugp23q

Navigating between pages simply changes the offset= parameter in the URL.
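
As a rough illustration of the offset-based paging, the sketch below lists every offset and rewrites the offset= value in a URL. It assumes offsets start at 0 and increase by 25 per page, which the brief does not state explicitly. BASE_URL is a made-up placeholder, not the real address behind the short links, and page_offsets and with_offset are hypothetical helper names.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TOTAL_ROWS = 50_912   # from the brief
PAGE_SIZE = 25        # from the brief


def page_offsets(total_rows=TOTAL_ROWS, page_size=PAGE_SIZE):
    """Return the offset of every page, assuming the first page is offset=0."""
    pages = math.ceil(total_rows / page_size)  # 50,912 / 25 rounds up to 2,037
    return [page * page_size for page in range(pages)]


def with_offset(url, offset):
    """Return url with its offset= query parameter set to the given value."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["offset"] = str(offset)
    return urlunparse(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only (assumption, not the real site).
BASE_URL = "https://example.com/results?lang=en&offset=0"
urls = [with_offset(BASE_URL, offset) for offset in page_offsets()]
```

Under these assumptions the last page starts at offset 50,900 and holds the remaining 12 rows.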

You can go through each page either manually or programmatically and extract the content from the table.
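
For the programmatic route, a minimal sketch of the page loop might look like the following. It is not the winner's script. fetch_rows is a hypothetical callable that you would supply yourself (it downloads one page and returns its table rows), and the offsets are assumed to start at 0 and step by 25.

```python
def crawl_all(fetch_rows, total_rows=50_912, page_size=25):
    """Collect rows from every page and check each page has the expected size.

    fetch_rows(offset) is a hypothetical function supplied by the caller;
    it should return the list of table rows shown at that offset.
    """
    rows = []
    for offset in range(0, total_rows, page_size):
        page = fetch_rows(offset)
        # Every page should be full except possibly the last one.
        expected = min(page_size, total_rows - offset)
        if len(page) != expected:
            raise ValueError(
                f"offset {offset}: expected {expected} rows, got {len(page)}"
            )
        rows.extend(page)
    return rows


def fake_fetch(offset, total_rows=50_912, page_size=25):
    """Stand-in for a real downloader, used only to exercise the loop."""
    count = min(page_size, total_rows - offset)
    return [(offset + i,) for i in range(count)]


all_rows = crawl_all(fake_fetch)
```

Checking the row count per page makes a silently truncated or failed page visible before the final file is assembled.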

The 7 columns, along with their 50,912 rows of data, need to be delivered in an Excel file.

The column headers are as follows - CUS, CN code, CAS RN, EC number, UN number, Nomen, Name.

Simply put, I want the whole table in a single sheet, for use in Excel.
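
One possible way to turn a page's HTML into rows for a single sheet is sketched below, using only the Python standard library. It assumes each data row is a <tr> containing exactly seven <td> cells, which has not been checked against the real site. The class and function names are hypothetical, and the sample HTML uses made-up filler values.

```python
import csv
import io
from html.parser import HTMLParser

COLUMNS = ["CUS", "CN code", "CAS RN", "EC number", "UN number", "Nomen", "Name"]


class TableRows(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Keep only rows with all seven columns; header rows use <th>.
            if len(self._row) == len(COLUMNS):
                self.rows.append(self._row)
            self._row = None


def rows_from_html(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, out):
    """Write the header plus all rows; every cell stays a plain string."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Made-up sample page for illustration only.
sample = "<table><tr><th>CUS</th></tr><tr>" + "".join(
    f"<td> v{i} </td>" for i in range(7)
) + "</tr></table>"
buffer = io.StringIO()
write_csv(rows_from_html(sample), buffer)
```

A CSV file opens directly in Excel, though Excel may still reinterpret some cells on import, so the column types are worth checking after opening.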

I can do it myself, but I would have to copy and paste each page's data manually.
I am creating this contest because someone may have expertise in simple data crawler/miner/extraction software that can run through the 2,037 pages very quickly.

If there are no entrants within the 3 days, I will instead make this a project for someone. But I'm hoping someone can compile the data very quickly within this contest.

Along with the description and URL links here, I have also attached some images which show the website, what to query, and what is shown. I have highlighted the table which displays the column names followed by the results.

Be careful with the CAS number data: values such as 1115-69-1 and 5417-52-7 can be automatically converted to dates.
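
One way to catch values that were mangled into dates is to check every CAS RN cell before delivery. The sketch below relies on the standard CAS format (2 to 7 digits, 2 digits, 1 check digit) and the standard check-digit rule. The function name is hypothetical, and it assumes the cells are available as strings.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(value):
    """Return True if value has the CAS layout and a correct check digit."""
    match = CAS_PATTERN.match(value.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Check digit: each other digit times its position counted from the
    # right (1, 2, 3, ...), summed, then taken modulo 10.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check


# The two examples from the brief pass; date-like strings do not.
examples = ["1115-69-1", "5417-52-7", "1969-11-15", "01/05/1969"]
results = [is_valid_cas(value) for value in examples]
```

Blank cells would also fail this check, so rows without a CAS number need to be skipped before applying it.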

Thanks!

Employer Feedback

“Did exactly what was required without me needing to explain for a second time. A technical wizard in regards to data extraction, will hire again in the future for more extraction tasks. Thanks.”

efficienttrade, New Zealand.

Public Clarification Board

  • efficienttrade
    Contest Holder
    • 4 years ago

    Thanks, Krishna was able to extract all the data quickly using Python. I appreciate the people who were trying to extract everything manually, but a Python script will always win in a task like this, so it would be a good skill to learn.

  • rkdesi
    • 4 years ago

    Working... I hope you will wait until the last moment.

  • Krish1019
    • 4 years ago

    Extracted all 50,912 rows as mentioned.

  • Krish1019
    • 4 years ago

    Please check entry #7

  • efficienttrade
    Contest Holder
    • 4 years ago

    #6 is currently winning with approx. 10,000 rows extracted so far; #5 is second with approx. 7,000 rows extracted as of the last screenshot. The race is on to be first to extract all rows and display a screenshot. No bluffing. Thanks.

  • MdTanzil
    • 4 years ago

    Please check entry #6

  • WahajRocky
    • 4 years ago

    Trust me, I will do your work, sir.

    1. WahajRocky
      • 4 years ago

      Please give me this opportunity.

    2. efficienttrade
      Contest Holder
      • 4 years ago

      Hey there, you may still have time. People have extracted between 7,000 and 30,000 rows. I'm not sure how much longer it will be before someone gathers all the rows.

  • rkdesi
    • 4 years ago

    Working...

    1. efficienttrade
      Contest Holder
      • 4 years ago

      You are winning so far. #4 said he needs to go and run some errands, so you might beat him if you continue. I'm also not sure whether he has actually extracted the 30,000 rows, because the screenshot looks a bit strange.

  • efficienttrade
    Contest Holder
    • 4 years ago

    #5 is winning so far. A quality data extract with a screenshot showing where he is up to, excellent. I look forward to awarding once the rest is done. No one appears to be automating the process so far, so manual entry may win!

  • rkdesi
    • 4 years ago

    Please check #5
