
Parallel Programming with MPI in C or C++

$30-250 USD

In Progress
Posted about 11 years ago

Paid on delivery
In this project you are going to implement a parallel algorithm in C/C++ using the MPI library.

MapReduce

Every day we create tremendous amounts of data in everything we do. Twitter adds 400 million tweets to its database every day. The sensors in the Large Hadron Collider at CERN record petabytes of data each year. We can find many more examples in astronomy, biology, internet activity, sensor networks, etc. This makes big-data processing one of today's biggest computer science and engineering problems.

Distributed processing is the general approach for handling large volumes of data, but designing an efficient distributed system is a challenging task. There are some general distributed programming frameworks that simplify the implementation. One of them is the MapReduce model. There are many libraries for MapReduce, so the programmer does not need to care about the distribution of the data; instead, he or she supplies the necessary map and reduce functions. However, we are not going to use any library; instead, we borrow the idea from this model and implement our solution using the MPI library in C or C++.

In this project, you are going to demonstrate a small distributed data processing solution using the MapReduce programming model. This model consists of map and reduce steps. In the map step, the master node takes the input, divides it, and distributes it to the worker nodes, and each worker node works on its own data independently. In the reduce step, the master node collects the answers from the workers and combines them to generate the final result. (This programming model can be implemented in a multi-level way, i.e. a worker node can map its input to other idle workers and collect the results, but we are not going to implement this.)

Problem Definition

You are going to extract records and calculate statistics from a large gene expression database. The data set consists of 2467 genes.
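The map step described above needs some rule for dividing the 2467 rows among the workers. The following is a minimal C++ sketch of one such rule, an even block partition. The names `Partition` and `partition_rows` are hypothetical, and block partitioning itself is an assumption; the description does not prescribe how the data is divided. The resulting counts and offsets have the shape that MPI's `MPI_Scatterv` takes, but no MPI call is made here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: how many rows each worker gets and where
// each worker's block starts in the master's row array.
struct Partition {
    std::vector<int> counts;  // rows assigned to each worker
    std::vector<int> displs;  // index of each worker's first row
};

// Block-partition `rows` items across `workers` processes.
// The first (rows % workers) workers receive one extra row, so block
// sizes differ by at most one.
Partition partition_rows(int rows, int workers) {
    assert(rows >= 0 && workers > 0);
    Partition p;
    p.counts.resize(workers);
    p.displs.resize(workers);
    const int base = rows / workers;
    const int extra = rows % workers;
    int offset = 0;
    for (int w = 0; w < workers; ++w) {
        p.counts[w] = base + (w < extra ? 1 : 0);
        p.displs[w] = offset;
        offset += p.counts[w];
    }
    return p;
}
```

With 2467 rows and 4 workers, for instance, the first three workers would receive 617 rows each and the last one 616.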
Each gene can belong to one of the following 6 classes: tricarboxylic acid cycle (TCA), respiration (Resp), cytoplasmic ribosomes (Ribo), proteasome (Proteas), histones (Hist) and helix-turn-helix proteins (HTH). There are also 79 expression values for each gene, corresponding to different measurements. The data is stored in a tab-separated file where the first column is the unique identifier of the gene (ORF = open reading frame), the second column is the name, the next 6 columns are the class labels and the remaining 79 columns are the measurements (Table 1).

When your program starts, the master node should load the data, then divide and distribute it among the worker processors. Then, the master node should wait for the user to input a query.

ORF      NAME          TCA  Resp  Ribo  Proteas  Hist  HTH  alpha 0  alpha 7  ...
YMR056C  AAC1 TRAN...  -1   -1    -1    -1       -1    -1   -0.18    -0.58    ...
YBR085W  AAC3 TRAN...  -1   -1    -1    -1       -1    -1   -0.01    -0.42    ...
YNL141W  AAH1 PURI...  -1   -1    -1    -1       -1    -1   0.46     -0.71    ...
...      ...           ...  ...   ...   ...      ...   ...  ...      ...      ...

Table 1: Gene Expressions. If a gene is labeled with a class, its corresponding value is 1; otherwise it is -1.

There will be 2 types of queries, as listed below. After your program answers a query, it should not terminate; instead, it should wait for the next query. Your program should terminate when the user enters:

quit

1. Finding a record

The user may want to see the data about a single gene. For example, if the user wants to see the gene YMR056C, he or she will enter:

gene YMR056C

Your output should contain all information about the gene. The output format is as follows:

YMR056C Name: AAC1 TRANSPORT MITOCHONDRIAL ADP/ATP TRANSLOCATOR TCA: -1 Resp: -1 Ribo: -1 Proteas: -1 Hist: -1 HTH: -1 alpha 0: -0.18 alpha 7: -0.58 ... ...

2. Calculating Statistics

The user may want to know the mean and the standard deviation of the measurements of genes belonging to a specific class. For example, if the user wants to list the statistics for the TCA class, he or she enters:

class TCA

You have to output the mean and standard deviation of the 79 measurements of the
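The class statistics query can be computed in map/reduce fashion by having each worker return partial sums that the master adds together. The C++ sketch below illustrates that idea without MPI. All names (`Gene`, `Partial`, `Stats`, `map_chunk`, `combine`, `finalize`) are hypothetical. Because the description is cut off, two things are assumptions: that one mean and one standard deviation are wanted per measurement column, and that the population standard deviation (dividing by n) is meant. In an MPI program, the element-wise addition in `combine` is the kind of operation `MPI_Reduce` with `MPI_SUM` performs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical in-memory record: 6 class labels (1 or -1) and the measurements.
struct Gene {
    std::vector<int> labels;
    std::vector<double> values;
};

// Partial result one worker would send back to the master.
struct Partial {
    std::vector<double> sum;     // per-column sum of values
    std::vector<double> sum_sq;  // per-column sum of squared values
    long count = 0;              // genes in this chunk that belong to the class
    explicit Partial(std::size_t cols) : sum(cols, 0.0), sum_sq(cols, 0.0) {}
};

// Map step: a worker scans only its own chunk.
Partial map_chunk(const std::vector<Gene>& chunk, std::size_t class_index,
                  std::size_t cols) {
    Partial p(cols);
    for (const Gene& g : chunk) {
        assert(class_index < g.labels.size() && g.values.size() == cols);
        if (g.labels[class_index] != 1) continue;  // gene is not in the class
        for (std::size_t c = 0; c < cols; ++c) {
            p.sum[c] += g.values[c];
            p.sum_sq[c] += g.values[c] * g.values[c];
        }
        ++p.count;
    }
    return p;
}

// Reduce step: the master adds one worker's partial result into the total.
void combine(Partial& total, const Partial& part) {
    assert(total.sum.size() == part.sum.size());
    for (std::size_t c = 0; c < total.sum.size(); ++c) {
        total.sum[c] += part.sum[c];
        total.sum_sq[c] += part.sum_sq[c];
    }
    total.count += part.count;
}

struct Stats {
    std::vector<double> mean, stddev;  // one entry per measurement column
};

// Turn the combined sums into mean and population standard deviation.
Stats finalize(const Partial& total) {
    Stats s;
    if (total.count == 0) return s;  // no gene in the class: empty result
    const double n = static_cast<double>(total.count);
    for (std::size_t c = 0; c < total.sum.size(); ++c) {
        const double mean = total.sum[c] / n;
        const double var = total.sum_sq[c] / n - mean * mean;
        s.mean.push_back(mean);
        s.stddev.push_back(std::sqrt(var > 0.0 ? var : 0.0));  // clamp rounding noise
    }
    return s;
}
```

Sending sums and sums of squares keeps each worker's message at a fixed size regardless of how many genes match. The sum-of-squares formula can lose precision when the variance is small relative to the mean, so a more careful implementation might combine per-worker means and squared deviations instead.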
Job ID: 4561145

About the job

3 proposals
Remote work
Active 11 years ago

Awarded to:
I have extensive experience with MPI; I have done several projects running on our university's super cluster. Also see my PM for more info.
$150 USD in 3 days
4.3 (2 reviews)
3.2
3 freelancers are bidding an average of $293 USD for this job
Hello, I will implement this program in C++ and MPI. Thanks, Paul
$500 USD in 7 days
4.9 (48 reviews)
5.4
Hi, expert in parallel programming here, I'll use C/C++/MPI for querying your gene database. Thank you, Danny
$250 USD in 5 days
5.0 (11 reviews)
2.4
Hi, please see my PM
$230 USD in 10 days
5.0 (5 reviews)
2.2

About the client

Istanbul, Turkey
5.0
1
Payment method verified
Member since May 27, 2013
