Parallel and Distributed Computing
In this assignment you are to write a C or Fortran program that sends a token around the network for an arbitrary number of nodes:
1. Log in to reinhardt via ssh.
2. I recommend starting with your code from the lecture that illustrates the 2-node ring.
3. Write a makefile that will compile your code.
4. Write a PBS script to submit to the supercomputer.
5. Use MPI to send a token around the network for any number of nodes.
6. Compute the network bandwidth using two communication patterns: ring and ping-pong.
7. Verify that your code is working correctly.
8. Create a PDF file with a write-up about this assignment. Upload it to Brightspace.
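For steps 3 and 4 above, the makefile can be a single rule such as `ring: ring.c` with the command `mpicc -O2 -o ring ring.c`, and the PBS script then requests nodes and launches the binary. The sketch below is illustrative only: the job name, queue, resource line, and module name are placeholders — check reinhardt's documentation for the actual values.

```shell
#!/bin/bash
#PBS -N token_ring
#PBS -l nodes=4:ppn=1          # placeholder: 4 nodes, 1 process each
#PBS -l walltime=00:10:00
#PBS -q batch                  # placeholder queue name

cd $PBS_O_WORKDIR
module load mpi                # placeholder module name
mpirun -np 4 ./ring
```

Submit with `qsub submit.pbs` and monitor the job with `qstat -u $USER`.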
Understanding how to send and receive data to/from other nodes is imperative in this class. While you may not always use Send/Recv pairs in all codes that we write, these MPI calls are the basis for the message-passing paradigm. In addition, you will be timing the execution, which is needed in future assignments to examine the performance of your parallel code.
Start by sending a single integer between two nodes, as we did in class, and confirm that your program works correctly. For the ring, you do not need any loops, only an if/else-style structure. Remember that every rank runs the exact same code at the same time; use the if/else to direct each rank to the appropriate branch. The ping-pong style of communication will require a loop, but only for rank 0. Next, try sending a larger structure, such as an array, and time how long it takes to travel between the nodes. Take note of where you place your timers, as putting them in different places will give you different information. The array should be large, roughly 1 GB. Use this information to compute the bandwidth of the network (GB/s and/or Gb/s).
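The two patterns described above might be structured as in the sketch below. This is not a complete solution: the buffer size, tags, timer placement, and bandwidth bookkeeping are illustrative choices you should adapt and justify in your write-up. It must be compiled with `mpicc` and launched with `mpirun` across multiple ranks.

```c
/* Sketch: ring and ping-pong token passing with timing.
   Assumes a ~1 GiB message of doubles (2^27 * 8 bytes). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 27;                   /* 2^27 doubles = 1 GiB */
    double *buf = calloc(N, sizeof(double));
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    /* --- ring: token travels 0 -> 1 -> ... -> size-1 -> 0; no loop,
       just if/else to pick each rank's role --- */
    double t0 = MPI_Wtime();
    if (rank == 0) {
        MPI_Send(buf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(buf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD);
    }
    double ring_s = MPI_Wtime() - t0;

    /* --- ping-pong: only rank 0 loops, bouncing the buffer off each
       partner in turn; every other rank does one recv + one send --- */
    t0 = MPI_Wtime();
    if (rank == 0) {
        for (int p = 1; p < size; p++) {
            MPI_Send(buf, N, MPI_DOUBLE, p, 1, MPI_COMM_WORLD);
            MPI_Recv(buf, N, MPI_DOUBLE, p, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    } else {
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }
    double pp_s = MPI_Wtime() - t0;

    if (rank == 0) {
        double msg = (double)N * sizeof(double);       /* bytes per message */
        /* ring: size sequential hops; ping-pong: 2 messages per partner */
        printf("ring:      %.2f GB/s\n", msg * size / ring_s / 1e9);
        printf("ping-pong: %.2f GB/s\n", msg * 2 * (size - 1) / pp_s / 1e9);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Note that the timers here bracket only the communication on each rank; moving them (e.g., around a single hop, or around the whole program) will measure different things, which is exactly the point raised above.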
Describe the differences between the two algorithms, explain how each creates its communication pattern, and confirm that both report the same network bandwidth (within a small variance due to other network traffic). You will often find HPC applications that have a communication bottleneck, i.e., applications whose performance suffers because of the speed of the network and the amount of data that must be transferred. Think about the actual calculations that you perform on each node (such as incrementing a value in the array) and how this work compares to the amount of communication that occurs. Does your application have a communication bottleneck? Why?
MPI uses objects called communicators and groups to define which collection of processes may communicate with each other. Most MPI routines require you to specify a communicator as an argument. We will cover communicators and groups in more detail later. For now, simply use MPI_COMM_WORLD whenever a communicator is required – it is the predefined communicator that includes all of your MPI processes.
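In practice this means MPI_COMM_WORLD appears as the final (or near-final) argument of most calls, as in this minimal sketch:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id in the group */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```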