Directed Research Project

Accelerating Particle Advection via Block Exterior Flow Maps and Lagrangian Interpolation

Flow visualization techniques involving extreme advection workloads are becoming increasingly popular. While these techniques often produce insightful images, the computations required to produce them have lengthy execution times. With this work, we introduce an alternative to traditional advection. Our approach centers on block exterior flow maps (BEFMs). BEFMs can be used to accelerate flow computations by reducing redundant calculations, at the cost of decreased accuracy.
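As background for the advection workload this project targets, the sketch below shows conventional per-particle advection with a fourth-order Runge-Kutta integrator in a steady 2D vector field. The field, step size, and step count are illustrative assumptions, not details from the project; the point is that every particle pays several field evaluations per step, which is the redundant work BEFM-style approaches aim to reduce.

```python
import numpy as np

def velocity(p):
    # Illustrative steady 2D vector field (a simple vortex); any real
    # application would sample a simulation's flow field here instead.
    x, y = p
    return np.array([-y, x])

def rk4_step(p, h):
    # Classic fourth-order Runge-Kutta step: four field evaluations
    # per particle per step.
    k1 = velocity(p)
    k2 = velocity(p + 0.5 * h * k1)
    k3 = velocity(p + 0.5 * h * k2)
    k4 = velocity(p + h * k3)
    return p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def advect(p0, h=0.01, steps=1000):
    # Traditional advection: integrate each particle independently,
    # repeating the same field evaluations for nearby particles.
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = rk4_step(p, h)
    return p
```

For the vortex field above, trajectories are circles, so a particle started at (1, 0) should remain at unit distance from the origin; the cost of this accuracy is four field evaluations per step per particle.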

A Bayesian Semi-Parametric Approach to Cluster Heterogeneous Time Series

The majority of time series clustering research focuses on calculating similarity metrics between individual series, which, in conjunction with a traditional clustering algorithm, partition the data into similar groups (clusters). A major challenge lies in obtaining such partitions when the number of clusters is not known in advance. Another challenge is to exploit known hierarchies and heterogeneity in the data to refine the clustering.
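One standard Bayesian nonparametric device for the "number of clusters unknown" problem is the Dirichlet process, whose Chinese restaurant process (CRP) representation lets the number of clusters grow with the data. The sketch below is only an illustration of that prior, not the model from this project; the concentration parameter and seeding are assumptions.

```python
import random

def crp_partition(n, alpha, seed=0):
    # Chinese restaurant process: item i joins an existing cluster with
    # probability proportional to that cluster's size, or opens a new
    # cluster with probability proportional to alpha. The number of
    # clusters is therefore inferred rather than fixed in advance.
    rng = random.Random(seed)
    sizes = []   # current cluster sizes
    labels = []  # cluster label per item
    for i in range(n):
        # Total unnormalized weight is (items so far) + alpha.
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, w in enumerate(sizes + [alpha]):
            acc += w
            if r <= acc:
                break
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels
```

In a full semi-parametric model this prior over partitions would be combined with a per-cluster likelihood for the time series, and the partition sampled posterior-wise; here it only shows how cluster count can remain open-ended.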

SparkGalaxy: Workflow-based Big Data Processing

We introduce SparkGalaxy, a big data processing toolkit that encodes complex data science experiments as sets of high-level workflows. SparkGalaxy combines the Spark big data processing platform and the Galaxy workflow management system to offer a set of tools for graph processing and machine learning, using a novel interaction model for creating and using complex workflows. SparkGalaxy contributes an easy-to-use interface and scalable algorithms for data science. We demonstrate SparkGalaxy's use in large-scale social network analysis and other case studies.

Scalable Ray-Casted Volume Rendering

Computational power has increased tremendously in recent years, resulting in data of growing size and complexity. Volume rendering is an important method for visualizing such data, as it provides insight over the entire data set. However, traditional volume rendering techniques are not sufficient because these data sets are too large to fit in the memory of a single computer. Using a distributed system to visualize massive data improves performance; that said, while parallelization has its benefits, it also creates challenges.

A Multi-Resolution Approach to Characterize the Connectivity Structure and Evolution of Large Graphs

Graphs are widely used to represent the structure of large networked systems such as Online Social Networks (OSNs). These graphs have a large number of evolving nodes (i.e., users) and edges (i.e., relationships). It is important to have practical methods to capture and characterize the connectivity structure and evolution patterns of such networks to gain insight into the corresponding system. However, existing techniques for graph analysis either do not scale or offer only limited insight into graph structure without addressing its evolution.

Scalable Observation System for Scientific Workflows

Modern clusters for parallel computing are complex environments, and the high-performance computing (HPC) applications that run on them often do so with little insight into their own or the system's behavior. Sophisticated parallel measurement systems can capture performance and power data for characterization, analysis, and tuning purposes, but the infrastructure for observing these systems is not intended for general use and typically does not support online processing.

QoS-Aware Virtual Machine Consolidation in Cloud Datacenters

With the rapid growth of the cloud industry in recent years, the energy consumption of warehouse-scale datacenters has become a major concern. Energy-aware virtual machine (VM) consolidation has proven to be one of the most effective solutions to this problem. Among the subproblems of VM consolidation, VM placement is the most difficult: it can be treated as a bin-packing problem, which is NP-hard, so heuristic approaches are a natural choice. The main challenge of VM consolidation is to strike a balance between energy consumption and quality of service (QoS).

IP Packet Traceback at Autonomous System Level

An IP traceback system determines the path taken by an IP packet from its source to its destination. This is not an easy task due to the fundamentally asymmetric nature of Internet routing: the forwarding path between a given pair of end-hosts is not guaranteed to be the same in both directions. Reliable IP traceback is especially important as part of a defense against modern distributed denial-of-service (DDoS) attacks. For many DDoS defense strategies, quickly finding the forwarding paths taken by attack packets is a critical step in attack mitigation.

An In Situ Approach for Explorative Visualization using Temporal Intervals

We explore a technique for saving full spatio-temporal simulation data for visualization and analysis. While such data is typically prohibitively large to store, we consider an in situ reduction approach that exploits temporal coherence to make storage sizes tractable in some cases. As I/O constraints tighten and increasingly hamper the ability of simulations to write full-resolution data to disk, our work presents an in situ data reduction technique with an accuracy guarantee.
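As an illustration of how temporal coherence can yield a reduction with a pointwise accuracy guarantee, the sketch below keeps a sample only when it drifts more than a tolerance eps from the last stored value; replaying the stored samples then reconstructs the series with error at most eps. This is a minimal hypothetical scheme for intuition, not the method developed in this project.

```python
def reduce_series(values, eps):
    # Temporal-coherence reduction: store (timestep, value) only when the
    # value deviates from the last stored value by more than eps.
    kept = []
    for t, v in enumerate(values):
        if not kept or abs(v - kept[-1][1]) > eps:
            kept.append((t, v))
    return kept

def reconstruct(kept, n):
    # Hold the last stored value between stored timesteps; by the keep
    # rule above, the pointwise reconstruction error is at most eps.
    out, j, last = [], 0, kept[0][1]
    for t in range(n):
        if j < len(kept) and kept[j][0] == t:
            last = kept[j][1]
            j += 1
        out.append(last)
    return out
```

Applied per grid point over many timesteps, slowly varying regions collapse to a handful of stored samples while rapidly changing regions are stored densely, which is the trade-off temporal intervals exploit.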

Ontology-Based Information Extraction on PubMed Abstracts Using the OMIT Ontology to Discover Inconsistencies

Scientific progress, at its core, is about constant change. Progress is brought about by introducing new findings where there were none, altering previously established knowledge, and, in some cases, deconstructing prior knowledge to replace it with new findings. This process generally happens through scientific publications: in journals, at conferences, at workshops, and in articles. Thus it is possible to analyze these documents and see how knowledge changes over time. This study accomplishes that task in the domain of microRNA, a subdomain of medical science.

