


2016-11-30
>> ANATOMY OF A PARALLEL APPLICATION (MPI BASICS)

It is one thing to just run an application, but quite another to understand how it works internally.

In an earlier blog we demonstrated the execution of the Pokémon GO brute-force hash generator algorithm running as a parallel application on our Raspberry Pi cluster. While we were quick to jump into measuring the performance of our cluster, one thing we overlooked was explaining how parallel applications are written and walking through the code to understand how it works.

Up until a few days ago, the only remnants of parallel programming I had in my head dated back to my university years, when we had one course on the subject and access to a 64-CPU 1 MHz computer offering vector processing (running the same command across multiple nodes). The first step was to study the tutorials over at mpitutorial.com to get acquainted with MPI.

Thankfully, I am a quick learner - here are my findings and how I designed the parallel application.

DECLARATIONS AND CONSTANTS
First of all, we need to declare that we wish to use the MPI library - a simple #include does this.

    #include <mpi.h>
    
    #define NODE_MASTER  0
    #define MIN_NODES    2
    #define MAX_NODES    16
    
    #define TAG_SEED     0x01 // master to slave - are you alive?
    #define TAG_WORK     0x02 // master to slave - here is a job for you
    #define TAG_DIE      0x03 // master to slave - ok, you can shutdown 
    #define TAG_RESULT   0x04 // slave to master - here is the result

We also need to define a few constants to help with the flow of the application. MPI messaging allows messages to be tagged, which is great for logic decisions. We have created a few tags in our case: the first to seed the slaves, the second to hand out work, a third to tell the slaves they can shut down, and one more for the slaves to send results back to the master.

MPI INITIALIZATION
MPI is more than just a library; there is a lot that happens behind the scenes with regard to inter-process communication between the different nodes (utilizing ssh) and the spawning of the various processes. The good news is that initialization is a one-liner; we then need to identify our position in the MPI space and determine what type of node we need to act as.

    int  rank, size;
    char name[MPI_MAX_PROCESSOR_NAME];
    int  name_len;
    
    // initialize MPI
    MPI_Init(&argc, &argv);
    
    // get the environment information from MPI
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    // get the name of the worker
    MPI_Get_processor_name(name, &name_len);
    fprintf(stdout, ":: '%s' - process %d of %d\n", name, rank, size);
    
    // do we have a sensible number of processes? (between MIN_NODES and MAX_NODES)
    if ((size < MIN_NODES) || (size > MAX_NODES)) exit(0);
    

At this point, we have two variables, rank and size. While it is common practice (as we will do) to make the master the node with rank 0, it doesn't have to be this way. You may wish to utilize a specific machine as the master, and you could check the processor name to do this.
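
For example (a hypothetical sketch, not part of the original source), the master could instead be elected by comparing the processor name against a known hostname such as "node01":

    // hypothetical: elect the master by hostname rather than by rank
    // (requires <string.h>); "node01" is an example hostname only
    int is_master = (strcmp(name, "node01") == 0);

The dispatch below could then branch on is_master instead of comparing rank against NODE_MASTER.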

    switch (rank)
    {
      // MASTER
      case NODE_MASTER:
           // put the master logic here
           break;
    
      // SLAVE
      default:
           // put the slave logic here
           break;
    }

Based on the rank variable, we switch the application into acting as a master or a slave. MPI deploys the same application across all the nodes where it spawns processes, so we need to perform this check ourselves and act appropriately; a simple switch or if-then-else statement is sufficient to separate the master logic from the slave logic.

LOGIC: MASTER NODE
The master node is the brains of the application, as it is responsible for issuing jobs to all the slave nodes until a result is found. The master will issue three types of messages to the slaves: the first to see if the slave is alive, the second to issue a job to perform, and the last to shut the slave down.

    MPI_Status status;
    int        node, job, result;
    
    // seed the slaves
    for (node=1; node<size; node++)
      MPI_Send(NULL, 0, MPI_INT, node, TAG_SEED, MPI_COMM_WORLD);
    
    // keep passing out tasks to slaves until a solution is found
    for (;;)
    {
      // let's receive the status of the slave computation
      MPI_Recv(&result, 1, MPI_INT,
               MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
      if (result == 1) break;
    
      // if we didn't get a result, compute the next 'job' parameters and
      // issue a new task to the slave that just responded
      node = status.MPI_SOURCE;
      MPI_Send(&job, 1, MPI_INT, node, TAG_WORK, MPI_COMM_WORLD);
    }
    
    // inform all slaves to shutdown
    for (node=1; node<size; node++)
      MPI_Send(NULL, 0, MPI_INT, node, TAG_DIE, MPI_COMM_WORLD);

Seeding the slaves isn't technically required, but it does make the flow of the main processing loop simpler. The basic idea of the master is to wait until a response is received from a slave and then issue a new task to the slave that responded. This ensures maximum use of all the slaves; the last thing we want is slaves waiting for their turn, as I try to illustrate below with two slaves:

    {node}   {execution time} 
    ----------------------------------------------------------
    slave 1: xxxxxxxxxx
    slave 2: xxxxxx
    slave 2:       xxxxxxxxxx
    slave 1:           xxxxxxxxxxx
    slave 2:                 xxxxxxxxxxxxxxxxxxx
    slave 1:                      xxxxxxxxxxx
    slave 1:                                 xxxxxxxxxxx
    slave 2:                                    xxxxxxxxxxxxxx

In the above example, we have two slave nodes executing in parallel. We issue a new task to whichever one responds first, as not all tasks take the same amount of time. If we were to assume linear (round-robin) assignment instead, we would have a number of dead spots, for example (same data):

    {node}   {execution time} 
    ----------------------------------------------------------
    slave 1: xxxxxxxxxx
    slave 2: xxxxxx....
    slave 1:           xxxxxxxxxxx
    slave 2:           xxxxxxxxx.
    slave 1:                      xxxxxxxxxxx.......
    slave 2:                      xxxxxxxxxxxxxxxxxxx
    slave 1:                                         xxxxxxxxx
    slave 2:                                         xxxxxxxxx

It should be clear that the points where a dot ('.') appears are times when a slave sits idle, waiting for its turn to be issued a new command. Because we know the id of the node that responded, we can avoid this type of dead time in the parallel execution process.

LOGIC: SLAVE NODE
The slave node just needs to execute the job it has been asked to do and send back a result.

    // the slaves run until told to shutdown
    for (;;)
    {
      MPI_Status status;
      int        job, result;
    
      // receive the work request from the master
      MPI_Recv(&job, 1, MPI_INT,
               NODE_MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    
      // attempt to break the code
      switch (status.MPI_TAG)
      {
        case TAG_SEED:
             result = 0; // no result, let the master know we are here
             break;
    
        case TAG_WORK:
             result = 0; /** do the job based on the 'job' params passed; set to 1 when a solution is found **/
             break;
    
        case TAG_DIE:
             goto SLAVE_DONE;
      }
    
      // send the result of the work request back to the master
      MPI_Send(&result, 1, MPI_INT, NODE_MASTER, TAG_RESULT, MPI_COMM_WORLD);
    }
    
    SLAVE_DONE:;

The core logic is to run in an infinite loop, blocking until a message arrives from the master and exiting only when a TAG_DIE message is received. 99.9% of the time the slave will be working away, doing some form of complex processing based on the parameters sent from the master. When it receives the TAG_DIE message, it breaks out of the infinite loop and shuts down nicely.

MPI TERMINATION
The final step of the parallel application is to tell the MPI environment that it can shut itself down.

    // shutdown MPI
    MPI_Finalize();

In turn, this will clean up all the resources that were utilized to run the application in parallel across the various nodes. It will also bring down any slaves that may not have shut down correctly and collate any error messages that may have been generated as a result of the shutdown.
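
For reference, an MPI application like this is typically compiled with the mpicc wrapper and launched across the cluster with mpirun. The sketch below is an assumption on my part - the file name, host names and slot counts are placeholders, and the exact flags vary between MPI implementations (the ones shown follow Open MPI conventions):

    # compile the application with the MPI compiler wrapper
    mpicc -O2 -o mpi_app mpi_app.c
    
    # hostfile listing the cluster nodes, for example:
    #   node01 slots=4
    #   node02 slots=4
    mpirun -np 8 --hostfile hostfile ./mpi_app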

Hopefully, with this knowledge available, you can go back to the previous blog entry, download the source code and get a better understanding of how the single-process program was parallelized using MPI. In the mpi-sources.zip I have included a number of versions so you can do some time comparisons.

    >> max=50/min=12 split=4_4       >> max=50/min=12 split=5_3
    mpi_call: 30958                  mpi_call: 323521
    attempts: 1435441080             attempts: 1428129840
    real      2m24.406s              real      2m59.222s

If we run the numbers, we can approximate the overhead of the MPI messaging framework relative to the time it takes to calculate a hash: roughly 1200 hash calculations can be performed in the time a single MPI request takes. It makes sense to identify this ratio to ensure you actually gain from making the application parallel in the first place.
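
A rough back-of-the-envelope check using the two runs above (my own approximation, heavily rounded) arrives at the same figure:

    extra MPI calls  : 323521 - 30958          ~ 292,000
    extra wall time  : 2m59.222s - 2m24.406s   ~ 35 s
    cost per call    : 35 s / 292,000          ~ 120 microseconds
    hash throughput  : 1,435,441,080 / 144 s   ~ 10 million hashes/s (~0.1 us each)
    hashes per call  : 120 us / 0.1 us         ~ 1200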


 
