This dissertation presents several approaches to accelerating the training of deep neural networks on high-performance computing resources while balancing learning objectives against system-utilization objectives. Acceleration of machine learning is formulated as a multi-objective optimization problem in which each objective is subject to its own constraints. In machine learning, the objective is a model that has high accuracy, while eliminating false positives and generalizing beyond the training set.
Accelerator-based heterogeneous computing has become the de facto standard in contemporary high-performance machines, including upcoming exascale machines. These heterogeneous platforms have been instrumental to the development of computation-based science over the past several years. However, this specialization of hardware has also led to a specialization of associated heterogeneous programming models that are often intimidating to scientific programmers and that may not be portable or transferable between different platforms.
Performance models are of significant importance for both software and hardware development. They can be used to describe and predict the behavior of an application, giving software developers and researchers insight into its execution so that they can identify potential bottlenecks and further optimize performance. Unfortunately, performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both.
This dissertation is about verifying the correctness of low-level computer programs. This is challenging because low-level programs by definition cannot use many useful abstractions of computer science. Features of high-level languages such as type systems or abstraction over binary representation of data provide rich information about the purpose of a computer program, which verification techniques or programmers can use as evidence of correctness.
Since nearly the beginning of electronic computing, Monte Carlo particle transport has been a fundamental approach for solving computational physics problems. Due to their high computational demands and inherently parallel nature, Monte Carlo transport applications are often run on supercomputers. That said, supercomputers are changing: parallelism within each supercomputer node has increased dramatically, and many-core devices are now regularly included.
Semantic oppositeness is the natural counterpart of the rather more popular natural language processing concept, semantic similarity. Much like how semantic similarity is a measure of the degree to which two concepts are similar, semantic oppositeness yields the degree to which two concepts would oppose each other. This complementary nature has resulted in most applications and studies incorrectly assuming semantic oppositeness to be the inverse of semantic similarity.
Two trends together create a unique dynamic: the significant increase in the scale and complexity of networked systems, from online retail networks to computer networks, and the progress in machine learning techniques, supported by the rapid development of software and hardware components. There is a pull from networked systems for automated and scalable methods to handle the challenges of managing, scheduling, and monitoring such complex systems, while there is a push from the machine learning side to solve such problems.
Exploratory visualization and analysis of time-dependent vector fields, or flow fields, generated by large scientific simulations is increasingly challenging on modern supercomputers. Traditional time-dependent flow visualization is performed using an Eulerian representation of the vector field and requires high spatial and temporal resolution to be accurate.
We consider the problem of efficient particle advection in a distributed-memory parallel setting, focusing on four popular parallelization algorithms. The performance of each of these algorithms varies with the workload. Our research focuses on two important questions: (1) which parallelization techniques perform best for a given workload, and (2) what are the unsolved problems in parallel particle advection?
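For readers unfamiliar with the term, particle advection can be illustrated with a minimal serial sketch. Everything in it is an assumption made for illustration and is not taken from the work described above: the analytic rotational field `velocity`, the fourth-order Runge-Kutta integrator `rk4_step`, and the driver `advect` are hypothetical names, and none of the four parallelization algorithms is shown. The sketch only shows the per-particle integration that such algorithms distribute across nodes.

```python
import math


def velocity(x, y):
    """Hypothetical steady 2D flow: rigid rotation about the origin."""
    return -y, x


def rk4_step(x, y, dt):
    """Advance one particle by dt with classical fourth-order Runge-Kutta."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x_new, y_new


def advect(seed, dt, n_steps):
    """Trace one particle from seed and return the list of visited positions."""
    x, y = seed
    path = [(x, y)]
    for _ in range(n_steps):
        x, y = rk4_step(x, y, dt)
        path.append((x, y))
    return path


if __name__ == "__main__":
    # A quarter turn in the rotational field carries (1, 0) close to (0, 1).
    trajectory = advect((1.0, 0.0), dt=(math.pi / 2) / 100, n_steps=100)
    print(trajectory[-1])
```

Each particle is integrated independently of the others, which is what makes the problem amenable to parallelization. In a distributed-memory setting the vector field is typically too large for one node, so the cost of moving data or particles between nodes depends on the workload.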
In situ visualization is increasingly necessary to address I/O limitations on supercomputers. However, in situ visualization can take multiple forms. In this research we consider two popular forms: in-line and in-transit in situ. With the increasing heterogeneity of supercomputer design, efficient and cost-effective use of resources is extremely difficult for in situ visualization routines. The difficulty is compounded by the multiple forms of in situ processing and by the unknown performance of various visualization algorithms when run in situ at large scale.