A performance prediction tool for irregular parallel programs

1.Introduction

During the last decade, performance prediction has been repeatedly quoted as a key factor to developing parallel systems [1]. Predicting the behaviour of a program performance as a function of the number of processors and of the problem size is essential to users:
Numerous prediction tools have been recently proposed in the literature[2,3]. Most of these works focus mainly on applications with regular and static data structures. In contrast, the present work focusses on irregular applications [4] and especially those exhibiting dynamic data structures and a data-dependent execution scheme. For this type of application, performances does not depend only on the number of processor and on the problem size. Other parameters have been taken into account, the value of which are data-dependent, and therefore unpredictable.

In this doctoral thesis, we present a performance prediction model for irregular (or regular) parallel applications on parallel machines and on nondedicated networks of workstations/PC. We are developing a performance prediction tool to show that our model is intended for automatic use.

2. Modeling irregular applications

In our approach, we began by assuming that the total execution time of an application is the sum of different times spent in computation, communication, synchronization, etc. In order to predict the behaviour of irregularly (or regularly) structured algorithms, it is necessary to find well defined templates or frameworks that can capture their behaviour. We have assumed that the algorithms to be modeled can be decomposed into a sequence of blocks, where each blocks consists of either local computation (performed in parallel by the processors), or of a global communication operation, or a synchronization barrier. There are two main categories of blocks: computation blocks and communication blocks (Table 1).

Computation blocks:
Type of block Symbol
Sequential block
Parallel block
Iterative block

Communication blocks:
Type of block Symbol
Point-to-point
Broadcast
Scatter
Gather
Barrier
Table 1: Model of blocks


For each of the proposed block types, a simple performance model is established as a function of basic parameters. Execution time of the whole program can then be modeled by the sum of the execution times for the consecutive blocks. In order to predict performance of irregular applications we add the possibility to model blocks where the length varies during the execution[4].

3. Results

In order to verify the utilisation of the model in practice we are carrying out prediction experiments on custom-made, irregular, parallel programs, then on real programs. Included among the programs used for these performance validation experiments are: The method developed in this thesis has allowed to model real irregular programs. The extremely satisfying results of the validation experiments confirm the validity of the method [6,7]. For example, with the application of wave diffusion in urban areas, we can automatically predict with very good precision how efficient it is to decompose the application in parallel (Figure 1).


Figure 1: Result of prediction and measurement with an application of wave diffusion in urban areas (Cray T3D).

3. IP3T, Irregular Parallel Performance Prediction Tool

We have invented the tool, planned and directed the efforts of 7 senior students to implement a prediction performance tool named IP3T (Irregular Parallel Performance Prediction Tool) useful on networks of nondedicated workstations/PC and on parallel machines [7]. IP3T can be also used for sequential machines like single processor PCs. The analysed application can be programmed in many languages: C, C++, Fortran and so on. The communication libraries can be: PVM, MPI and so on.

IP3T permits to demonstrate that my model is intended for automatic use. It work in 3 phases (Figure 2):

Figure 2: Phases of IP3T.

In the first phase, IP3T can automatically recognize the block structure of applications and show it in a window (figure 3). This window permits also directly to add or delete new blocks and/or change some parameters of blocks. After changing you can regenerate a new source code. Morever, in this window, the user can select a subpart of the program to be analyzed.


Figure 3: Window with block decomposition of the analysed application.

In order to predict performance of applications with many irregularities, the user of the tool can define some personal global block parameters and can interact on them for performance prediction through a dialog box generated dynamically. For example, we obtain the dialog box of the figure 4 with our defined parameters of the wave diffusion application.


Figure 4: Dialog box with user defined parameters with the wave diffusion application.

IP3T has been successfully tested on the real applications already mentioned in the results section.

3. Conclusions

In this Ph.D. project, we have done successfuly: Now, the the prediction tool can be integrated in a development environment or in a parallel operating system.


References:

Last modified 19 March 1999