Find Jobs
Hire Freelancers

Apache Bean proof of concept Java and Phyton Integrations

$10-30 USD

Κλειστή
Αναρτήθηκε περίπου 5 χρόνια πριν

$10-30 USD

Πληρωμή κατά την παράδοση
Overview Google BigQuery as enterprise-wide analytics platform and currently has a number of projects for creating data pipelines which get data from various sources, perform necessary transformations and output data to BigQuery. The chosen technology for these data pipelines is Google DataFlow which has been open-sourced by Google and re-branded as Apache Beam. There are two Google Apache Beam SDKs: Java SDK and Python SDK. We expect you to be able to work with both. Test The test is a typical de-normalization task that is performed frequently when loading data to BigQuery. The test itself doesn’t require interaction with BigQuery, as we find that final output of transformed data to BigQuery is the easy part. The transformations in Google DataFlow are more complex and this is what we would like you to do. You’ll be given 3 files in gzip-archived JSON format that we receive from Spotify API: streams, tracks and users. Your job is to develop two pipelines in Google DataFlow (one in Java and one in Python) to denormalize these three files into one flat output JSON file. Source Files Schema of each source file is provided below. For simplicity, consider all the fields of type “string”. [login to view URL] One record example: {"message":"APIStreamData","version":"2","user_id":"9302111e473e4b76da615a47a133097a","track_i d":"7718fc9605964e8698bc108bf9008864","length":"201","cached":"","source":"others_playlist","sour ce_uri":"spotify:user:12150809594:playlist:60XVOmMeSorE8ZgLVpqDTN","device_type":"desktop","os" :"Windows","stream_country":"CO","timestamp":"2016-10-19 19:00:00 UTC","report_date":"2016-10- 19 00:00:00 UTC"} [login to view URL] One record example: {"message":"APITrackData","version":"2","track_id":"7ce9b6b3b51e46b09f5698b53ab37ea9","uri":"spo tify:track:3NHR4rVd31G2wZ2m6JNM3D","isrc":"SEUM70500101","album_code":"00602537010196","al bum_artist":"Abba","track_name":"Should I Laugh Or Cry - Intro Version","album_name":"The Visitors","track_artists":"ABBA"} [login to view URL] One record example: {"message":"APIUserData","version":"2","user_id":"47ca39a988516984267a922d586fcccf","zip_code":" 1","region":"","country":"UY","gender":"female","birth_year":"1996","referral":"","partner":"","product ":"","type":"ad","access":"free"} Output File [login to view URL] One record example: {"user_id":"aba7aebaa4972d3139a268444f1ca009","cached":"","timestamp":"2016-10-19 16:45:00 UTC","source_uri":"","track_id":"c7ff4b8cc72345bbaeec1ff6eb9bfee9","source":"other","length":"215", "version":"2","device_type":"tablet”,"message":"APIStreamData","os":"iOS","stream_country":"DO","re port_date":"2016-10-19 00:00:00 UTC","isrc":"USUL10400965","album_code":"00602498827840","product":"","country":"DO","region":" ","zip_code":"12","access":"free","gender":"male","partner":"","referral":"","type":"ad","birth_year":"1 986"} Instructions Preparation Create account on Google Cloud Platform. To get one, go to [login to view URL] and login with your Gmail email account. If you don’t have Gmail email account, register for one. You’ll be given a trial period and $300 in free spending. Create a new bucket on Google Cloud Storage and upload provided source files into it. Create a new bucket on Google Cloud Storage for staging files. You will need it to run DataFlow jobs Create two solutions one for Java SDK and one for Python SDK. It’s up to you what IDE and configuration to use. At UMG we are using Intellij IDEA + Gradle for Java SDK and Intellij PyCharm IDE for Python. Read files from Google Cloud Storage Write resulting denormalized file to Google Cloud Storage Approximate number of code lines for Java solution is 350 lines and for Python solution around 100 lines. There will be some routine tasks such as converting from/to JSON, dealing with key/value pairs for joining data, etc. We leave this to your expertise to figure out how to do it. Good luck!
Ταυτότητα εργασίας: 19353895

Σχετικά με την εργασία

6 προτάσεις
Απομακρυσμένη Εργασία
Ενεργός/ή 5 χρόνια πριν

Ψάχνεις τρόπο για να κερδίσεις μερικά χρήματα;

Πλεονεκτήματα πλειοδοσίας στο Freelancer

Καθόρισε τον προϋπολογισμό σου και το χρονοδιάγραμμα
Πληρώσου για τη δουλειά σου
Περίγραψε την πρόταση σου
Η εγγραφή και η πλειοδοσία σε εργασίες είναι δωρεάν
6 freelancers δίνουν μια μέση προσφορά $144 USD για αυτή τη δουλειά
Avatar Χρήστη
Hi, I have worked in Cloud Dataflow and Apache Beam using both Python and Java SDK for developing a data analytics platform. I used BigQuery, Bigtable & Cloud Spanner as the sinks. I'm bidding a "dummy" amount as I am not sure of the scope and purpose of the project. Let me know if you want to discuss further. Regards, Monir
$30 USD σε 10 ημέρες
5,0 (2 αξιολογήσεις)
2,7
2,7
Avatar Χρήστη
hi i haven't read. just a developer here. please raise price to min 100 for me to read ________________________________________________________________________________________________________________________
$111 USD σε 1 ημέρα
0,0 (0 αξιολογήσεις)
0,0
0,0

Σχετικά με τον πελάτη

Σημαία της COSTA RICA
Heredia, Costa Rica
0,0
0
Μέλος από Απρ 25, 2019

Επαλήθευση Πελάτη

Ευχαριστούμε! Σου έχουμε στείλει ένα email με ένα σύνδεσμο για να διεκδικήσεις τη δωρεάν πίστωση σου.
Κάτι πήγε στραβά κατά την προσπάθεια αποστολής του email σου. Παρακαλούμε δοκίμασε ξανά.
Εγγεγραμμένοι Χρήστες Συνολικές Αναρτημένες Δουλειές
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)
Φόρτωση προεπισκόπησης
Δόθηκε πρόσβαση για Geolocation.
Η σύνδεση σου έχει λήξει και τώρα έχεις αποσυνδεθεί. Παρακαλούμε συνδέσου ξανά.