Current Projects

The HPCL has many research areas, but typically one student is the primary researcher on any particular project. The following projects are currently active.

In Situ Performance Analysis and Visualization of HPC Applications

This research focuses on performance analysis and visualization of exascale applications. HPC applications often generate so much performance and scientific data that monitoring, reducing, and analyzing the data on the fly is the only practical option. We are exploring configurable options for monitoring and supporting this work across different applications and hardware ecosystems.
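
As a concrete illustration of the kind of on-the-fly reduction in situ analysis relies on, the sketch below maintains one-pass running statistics of a metric stream using Welford's update, so the full stream never has to be stored. The names and the sample data are illustrative; this is not tied to any particular monitoring tool.

```c
/* Minimal sketch: one-pass (streaming) statistics over a metric stream. */
#include <math.h>
#include <stdio.h>

typedef struct {
    long   n;     /* samples seen so far                  */
    double mean;  /* running mean                         */
    double m2;    /* running sum of squared deviations    */
    double max;   /* running maximum                      */
} RunningStats;

void stats_update(RunningStats *s, double x) {
    double delta = x - s->mean;
    s->n    += 1;
    s->mean += delta / (double)s->n;
    s->m2   += delta * (x - s->mean);      /* Welford's update */
    if (s->n == 1 || x > s->max) s->max = x;
}

int main(void) {
    RunningStats s = {0, 0.0, 0.0, 0.0};
    double samples[] = {1.0, 4.0, 2.0, 8.0, 5.0};  /* stand-in for a metric stream */
    for (int i = 0; i < 5; i++)
        stats_update(&s, samples[i]);
    printf("n=%ld mean=%.2f stddev=%.2f max=%.2f\n",
           s.n, s.mean, sqrt(s.m2 / (double)(s.n - 1)), s.max);
    return 0;
}
```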

Primary researcher: Dewi Yokelson

Parallel Dynamic Computation of Strongly-Connected Components

Strongly connected components (SCCs) are an important structural property of directed networks. Because many real-world networks are large and dynamic (i.e., they change over time), it is often more efficient to update only the parts of the network that changed than to recompute a graph property over the entire network. In this project, we present the first parallel algorithm for updating SCCs on dynamic networks. Our algorithm is a hybrid of shared- and distributed-memory approaches: each local component is updated via a shared-memory algorithm, and the partial results are merged across processors using a distributed-memory algorithm.
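
For intuition, here is a minimal sequential sketch of the edge-insertion case. It is not the parallel hybrid algorithm described above; it only illustrates why updating can beat recomputing: an edge inserted inside an existing SCC changes no labels (an O(1) check), and a merging edge can only relabel vertices that are both reachable from the new edge's head and able to reach its tail.

```c
/* Sequential sketch of incremental SCC maintenance under edge insertion.
 * Not the authors' algorithm; labels and data layout are illustrative. */
#include <stdio.h>

#define N 6            /* toy vertex count                      */
static int adj[N][N];  /* adj[u][v] = 1 iff edge u->v exists    */
static int scc[N];     /* current SCC label of each vertex      */

/* Mark all vertices reachable from s; rev != 0 follows edges backwards. */
static void reach(int s, int rev, int *mark) {
    if (mark[s]) return;
    mark[s] = 1;
    for (int w = 0; w < N; w++)
        if (rev ? adj[w][s] : adj[s][w]) reach(w, rev, mark);
}

/* Insert edge u->v and update SCC labels incrementally. */
static void insert_edge(int u, int v) {
    adj[u][v] = 1;
    if (scc[u] == scc[v]) return;          /* inside an SCC: nothing changes */
    int fwd[N] = {0}, bwd[N] = {0};
    reach(v, 0, fwd);                      /* everything v now reaches       */
    if (!fwd[u]) return;                   /* no cycle created => no merge   */
    reach(u, 1, bwd);                      /* everything that reaches u      */
    for (int w = 0; w < N; w++)            /* merge all SCCs on the cycle    */
        if (fwd[w] && bwd[w]) scc[w] = scc[u];
}

int main(void) {
    for (int i = 0; i < N; i++) scc[i] = i;   /* empty graph: N singletons */
    insert_edge(0, 1);
    insert_edge(1, 2);
    insert_edge(2, 0);                        /* closes cycle 0->1->2->0   */
    insert_edge(0, 2);                        /* inside SCC: O(1) early exit */
    for (int i = 0; i < N; i++) printf("scc[%d] = %d\n", i, scc[i]);
    return 0;
}
```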

Primary researcher: Sudharshan Srinivasan

A Library for Sparse Data Computations in Haskell

Historically, scientific algorithms have been built on top of linear algebraic kernels, implemented in languages such as C or Fortran to optimize their performance, at the cost of conciseness. It has been shown that, through the use of techniques such as fusion, concise yet efficient implementations of these algorithms are possible in functional languages such as Haskell. However, these efforts in Haskell have focused on making dense data run efficiently on modern multi-core systems. Consequently, even though extensive research has been done on sparse data representations, each optimized to work well with specific problems, there is a lack of tools that support a variety of sparse data representations. This work is a step in that direction. By leveraging the power of Haskell's type system, we present the design of a library that takes advantage of fusion to achieve good performance without sacrificing the conciseness inherent in scientific computation kernels. We want such kernels to be generic in the choice of data representation, giving the user the power to choose whichever representation works best for their specific use case. We explore the use of such a library by providing concise implementations of kernels including waxpy, atax, and gemv.
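
For readers unfamiliar with the kernel names, the following unoptimized C reference definitions show what waxpy, gemv, and atax compute. The BLAS-style signatures here are illustrative assumptions, not the library's API; the library itself expresses these kernels in Haskell, generically over the data representation.

```c
#include <stddef.h>

/* waxpy: w = a*x + y */
void waxpy(size_t n, double a, const double *x, const double *y, double *w) {
    for (size_t i = 0; i < n; i++)
        w[i] = a * x[i] + y[i];
}

/* gemv: y = alpha*A*x + beta*y, with A stored n-by-m in row-major order */
void gemv(size_t n, size_t m, double alpha, const double *A,
          const double *x, double beta, double *y) {
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < m; j++)
            acc += A[i * m + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* atax: y = A^T * (A * x); A is n-by-m, x and y have m entries,
 * and tmp must hold n entries. */
void atax(size_t n, size_t m, const double *A, const double *x,
          double *tmp, double *y) {
    for (size_t i = 0; i < n; i++) {        /* tmp = A*x     */
        tmp[i] = 0.0;
        for (size_t j = 0; j < m; j++)
            tmp[i] += A[i * m + j] * x[j];
    }
    for (size_t j = 0; j < m; j++)          /* y = A^T * tmp */
        y[j] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[j] += A[i * m + j] * tmp[i];
}
```

Note how atax materializes the intermediate vector tmp: eliminating intermediates like this, without the programmer writing the combined loop by hand, is exactly what fusion aims to do in the Haskell setting.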

Primary researcher: Bosco Ndemeye

Understanding How Scientific Applications Use HPC Systems

This research focuses on performance analysis in high-performance scientific computing. Specifically, the goal is to develop methods that enable developers of scientific software to understand how their applications use computer hardware and how to improve the performance of those applications.
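
One common building block for such methods is reading hardware performance counters around a region of interest. The sketch below uses PAPI's low-level API; it is a generic illustration rather than a description of any tool built in this project, error handling is abbreviated, and event availability varies by machine.

```c
/* Hedged sketch: counting retired instructions and cycles with PAPI. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    int eventset = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return EXIT_FAILURE;
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_INS);   /* instructions retired */
    PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles         */

    PAPI_start(eventset);
    volatile double acc = 0.0;                /* region under study   */
    for (int i = 0; i < 1000000; i++)
        acc += (double)i * 0.5;
    PAPI_stop(eventset, counts);

    printf("instructions=%lld cycles=%lld IPC=%.2f\n",
           counts[0], counts[1], (double)counts[0] / (double)counts[1]);
    return 0;
}
```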

Primary researcher: Brian Gravelle

Guiding Optimizations with Meliora: A Deep Walk down Memory Lane

Performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both. At the same time, performance models can be very useful for understanding the behavior of applications and hence can help guide design and optimization decisions. However, since models are not typically available, compilers and other automated code generation and optimization tools cannot use them to guide optimizations; as a result, the use and impact of models is currently limited mainly to manual analysis and optimization efforts. We believe that streamlining model generation and making it scalable (both in terms of human effort and code size) would enable dramatic improvements in compilation techniques. To that end, we are building a compiler infrastructure for performance model generation of arbitrary codes based on hybrid static/dynamic analysis of intermediate-language representations. We demonstrate good accuracy in matching known codes and show how Meliora can be used to optimize new codes through reusing optimization knowledge, either manually or in conjunction with an autotuner. When autotuning, Meliora eliminates or dramatically reduces the empirical search space, while generally achieving competitive performance.
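
To make "hybrid static/dynamic analysis" concrete, the toy below shows the general shape of such an approach; it is an illustrative assumption, not Meliora's actual implementation. A static pass contributes the per-iteration operation mix of a loop body (something derivable from an IR), a dynamic profile contributes the observed trip count, and their product forms the kind of feature vector a performance model could consume.

```c
/* Illustrative toy: combining static and dynamic features of one loop. */
#include <stdio.h>

enum { OP_LOAD, OP_STORE, OP_FLOP, OP_KINDS };

int main(void) {
    /* Static side: op counts per iteration, e.g. for a loop body like
     * w[i] = a*x[i] + y[i]: two loads, one store, two flops. */
    long per_iter[OP_KINDS] = {2, 1, 2};
    /* Dynamic side: trip count observed when profiling instrumented code. */
    long trips = 1000000;

    long features[OP_KINDS];
    for (int k = 0; k < OP_KINDS; k++)
        features[k] = per_iter[k] * trips;   /* total dynamic op counts */

    /* Derived feature: flops per byte moved, assuming 8-byte doubles. */
    double intensity = (double)features[OP_FLOP]
                     / (8.0 * (double)(features[OP_LOAD] + features[OP_STORE]));
    printf("loads=%ld stores=%ld flops=%ld intensity=%.3f flop/byte\n",
           features[OP_LOAD], features[OP_STORE], features[OP_FLOP], intensity);
    return 0;
}
```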

Primary researcher: Kewen Meng

Reproducibility of MPI Collective Operations

Many critical applications, including drug design and physics simulations, rely on HPC. In these settings the cost of errors is high and reproducibility is necessary. This is at odds with the goal of exploiting the performance of HPC codes through aggressive optimization.

One way this tension manifests itself is in collective operations such as MPI_Reduce. Floating-point addition and multiplication are nonassociative: in general, (a + b) + c ≠ a + (b + c). However, MPI collective operations assume that their reduction operators are associative, so the same input can produce different results on different architectures or with different reduction orders. This work seeks bounds on the worst-case error and analyzes the effects of different reduction algorithms on error and reproducibility.
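
The effect can be reproduced without MPI at all. The short program below first shows the classic nonassociativity example, then sums the same four values in two orders, mimicking a linear (rank-by-rank) reduction versus a balanced-tree reduction; the two orders are an illustration of why different reduction algorithms can disagree, not a model of any particular MPI implementation.

```c
/* Self-contained demo: floating-point addition is order-dependent. */
#include <stdio.h>

int main(void) {
    double a = 1e16, b = -1e16, c = 1.0;
    printf("(a + b) + c = %.1f\n", (a + b) + c);  /* 1.0                 */
    printf("a + (b + c) = %.1f\n", a + (b + c));  /* 0.0: c is absorbed  */

    /* Same inputs, two reduction orders. */
    double v[4] = {1e16, 1.0, 1.0, 1.0};
    double linear = ((v[0] + v[1]) + v[2]) + v[3];  /* each 1.0 absorbed  */
    double tree   = (v[0] + v[1]) + (v[2] + v[3]);  /* 1.0 + 1.0 survives */
    printf("linear order: %.1f\n", linear);         /* 1e16               */
    printf("tree order:   %.1f\n", tree);           /* 1e16 + 2           */
    return 0;
}
```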

Primary researcher: Samuel Pollard