Trends in high-performance computing increasingly require visualization to be carried out using in situ processing. This processing most often occurs without a human in the loop, meaning that the in situ software must be able to carry out its tasks without human guidance. This dissertation explores this topic, focusing on automating camera placement for in situ visualization when there is no a priori knowledge of where to place the camera.
For general productivity, it is desirable that high-performance computing applications be portable to new architectures, and that they can be optimized for new workflows and input types, without costly code interventions or algorithmic rewrites. Parallel portability programming models offer the potential for both high performance and productivity; however, they come with a multitude of runtime parameters that can significantly affect execution performance.
From the advent of the message-passing architecture in the early 1980s to the recent dominance of accelerator-based heterogeneous architectures, high-performance computing (HPC) hardware has gone through a series of changes. At the same time, HPC runtime systems have been adapted to harness this growth in computational capabilities.
Detecting and repairing software performance issues requires test cases that demonstrate those problems. The quality and availability of test cases play an instrumental role in application performance testing. Edge cases that trigger worst-case complexity often escape developers' understanding as the size and complexity of the application grow. Research shows that feedback-directed search (mutational fuzzing) can effectively discover pathological inputs that expose performance issues, but blindly mutating byte strings slows the search by producing mostly invalid inputs.
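The feedback-directed search described above can be illustrated with a minimal Python sketch. It is not the dissertation's method: the target program (an insertion sort instrumented to count comparisons), the swap-based mutation, and the names `insertion_sort_cost`, `mutate`, and `search_slow_input` are all assumptions chosen for illustration. The mutation swaps two elements, so every candidate remains a valid input, in contrast to blind byte-level mutation.

```python
import random


def insertion_sort_cost(values):
    """Sort a copy of values with insertion sort and return the comparison count.

    The comparison count is the (hypothetical) performance feedback signal.
    """
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        j = i
        while j > 0:
            comparisons += 1
            if items[j - 1] <= items[j]:
                break
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return comparisons


def mutate(values, rng):
    """Return a copy of values with two randomly chosen positions swapped.

    Swapping keeps the input a permutation of the seed, so it stays valid.
    """
    mutated = list(values)
    a = rng.randrange(len(mutated))
    b = rng.randrange(len(mutated))
    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated


def search_slow_input(seed_input, iterations, rng):
    """Hill-climb toward inputs that make the target program do more work."""
    best = list(seed_input)
    best_cost = insertion_sort_cost(best)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        cost = insertion_sort_cost(candidate)
        # Feedback step: keep a mutant only if it is at least as expensive.
        if cost >= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    rng = random.Random(0)
    seed_input = list(range(8))  # already sorted: the cheapest case
    slow_input, cost = search_slow_input(seed_input, 2000, rng)
    print(slow_input, cost)
```

Starting from a sorted list, the search drifts toward orderings that force more comparisons; the cost it reports can never fall below that of the seed, and for n elements it is bounded by n(n-1)/2.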
This dissertation presents various approaches toward accelerating the training of deep neural networks using high-performance computing resources, while balancing learning and systems utilization objectives. Acceleration of machine learning is formulated as a multi-objective optimization problem that seeks to satisfy multiple objectives, each subject to its respective constraints. In machine learning, the objective is a model that has high accuracy, while eliminating false positives and generalizing beyond the training set.
Accelerator-based heterogeneous computing has become the de facto standard in contemporary high-performance machines, including upcoming exascale machines. These heterogeneous platforms have been instrumental to the development of computation-based science over the past several years. However, this specialization of hardware has also led to a specialization of associated heterogeneous programming models that are often intimidating to scientific programmers and that may not be portable or transferable between different platforms.
Performance models are of significant importance for both software and hardware development. They can be used to describe and predict the behavior of an application, giving software developers and researchers insight into its execution so that they can identify potential bottlenecks and further optimize performance. Unfortunately, performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both.
This dissertation is about verifying the correctness of low-level computer programs. This is challenging because low-level programs by definition cannot use many useful abstractions of computer science. Features of high-level languages, such as type systems or abstraction over the binary representation of data, provide rich information about the purpose of a computer program, which verification techniques or programmers can use as evidence of correctness.
Since near the very beginning of electronic computing, Monte Carlo particle transport has been a fundamental approach for solving computational physics problems. Due to their high computational demands and inherently parallel nature, Monte Carlo transport applications are often run in supercomputing environments. That said, supercomputers are changing, as parallelism has dramatically increased within each supercomputer node, including the regular inclusion of many-core devices.
Semantic oppositeness is the natural counterpart of the rather more popular natural language processing concept, semantic similarity. Much like how semantic similarity is a measure of the degree to which two concepts are similar, semantic oppositeness yields the degree to which two concepts would oppose each other. This complementary nature has resulted in most applications and studies incorrectly assuming semantic oppositeness to be the inverse of semantic similarity.