The success of Internet Data Science depends on the availability of high-quality labeled data (e.g., the onset of a DDoS attack in a NetFlow log). Equally critical is the ability to share the data with others while respecting the data owners' privacy concerns. Unfortunately, short of applying the data-to-code paradigm (i.e., actual sharing of data), researchers lack a systematic framework for working with or benefiting from data while being mindful of privacy concerns.
Directed Research Project
We present a high-order finite difference method for earthquake cycle simulations that represents the full dynamics across cycles and is provably stable. The method is developed for a two-dimensional anti-plane strain problem on a strike-slip fault in a complex domain. The fault is governed by a rate-state friction law and the volume is governed by the anisotropic wave equation.
Embeddings are generated by different Deep Learning models to represent objects or entire images. They are then used for tasks such as Object Retrieval by matching against a database of other object embeddings. This work focuses on creating LocEm, a single-pass model that generates embeddings for multiple objects in an image, each at the object’s location. We also repurpose the ImageNet video dataset, whose natural augmentation captures variation in the pose and action movement of objects, to create a triplet generator.
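As an illustration of how a triplet generator might draw on video data, the sketch below takes the anchor and positive from two different frames of the same object track and the negative from a track of a different class. This is only one plausible reading of the abstract: the `tracks` layout, the function name `make_triplets`, and the sampling rule are assumptions made for the example, not details of LocEm.

```python
import random


def make_triplets(tracks, num_triplets, seed=0):
    """Sample (anchor, positive, negative) crop identifiers.

    tracks: {track_id: (class_label, [crop_id, ...])} -- hypothetical layout,
    where each crop_id names one object crop in one video frame.
    """
    rng = random.Random(seed)
    # An anchor track needs at least two frames to supply anchor and positive.
    usable = [t for t, (_, crops) in tracks.items() if len(crops) >= 2]
    if not usable:
        raise ValueError("no track has two or more crops")

    triplets = []
    for _ in range(num_triplets):
        anchor_track = rng.choice(usable)
        label, crops = tracks[anchor_track]
        # Same object, different frames: pose and motion act as augmentation.
        anchor, positive = rng.sample(crops, 2)
        # Negative comes from any non-empty track with a different class label.
        candidates = [t for t, (lbl, c) in tracks.items() if lbl != label and c]
        if not candidates:
            raise ValueError("need crops from at least two classes")
        negative = rng.choice(tracks[rng.choice(candidates)][1])
        triplets.append((anchor, positive, negative))
    return triplets
```

Each returned triplet could then feed a standard triplet loss; how hard negatives are mined, if at all, is left open here.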
Strongly connected components (SCC) are an essential property for understanding the structure of directed networks. Given that many real-world networks are large, it is often computationally efficient to partition the network over many distributed systems and solve for SCC simultaneously over the partitioned network. In this paper, we present an algorithm for identifying SCC on distributed systems. Our algorithm comprises three steps. In the first step, we compute SCC locally within each partition.
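The first step described above can be sketched as follows. This is a minimal single-process illustration, assuming each node is assigned to exactly one partition and that edges crossing partitions are ignored at this stage; the names `local_sccs` and `partition_of` are hypothetical, and Kosaraju's algorithm is used here only as a convenient stand-in for whatever local SCC routine the paper employs. The later steps are not described in this abstract and are not shown.

```python
from collections import defaultdict


def local_sccs(edges, partition_of):
    """Return {partition_id: [set_of_nodes, ...]} using intra-partition edges only."""
    nodes = defaultdict(set)
    fwd = defaultdict(lambda: defaultdict(list))
    rev = defaultdict(lambda: defaultdict(list))
    for node, part in partition_of.items():
        nodes[part].add(node)
    for u, v in edges:
        pu, pv = partition_of[u], partition_of[v]
        if pu == pv:  # cross-partition edges are skipped in this step
            fwd[pu][u].append(v)
            rev[pu][v].append(u)
    return {p: _kosaraju(nodes[p], fwd[p], rev[p]) for p in nodes}


def _kosaraju(nodes, fwd, rev):
    # Pass 1: iterative DFS on the forward graph, recording finish order.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(fwd.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd.get(nxt, ()))))
                    break
            else:  # all neighbours explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in rev.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components
```

Note that a cycle spanning two partitions is split into separate local components by this step, which is why further steps are needed to recover the global SCC.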
We present a new algorithm for computing tensor decomposition on streaming data that achieves up to 102× speedup over the state-of-the-art CP-stream algorithm through lower computational complexity and performance optimization. For each streaming time slice, our algorithm partitions the factor matrix rows into those with and without updates and keeps them in Gram matrix form to significantly reduce the required computation.
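One way to see why keeping rows in Gram matrix form saves work: the Gram matrix of a factor matrix is the sum of the Gram matrices of any partition of its rows, so the contribution of rows without updates can be cached and only the updated rows need recomputing. The sketch below shows that identity in plain Python. It is an assumption-laden illustration, not the paper's algorithm; the helper names `gram`, `add_matrices`, and `refresh_gram` are hypothetical.

```python
def gram(rows, rank):
    """Return the rank x rank Gram matrix (A^T A) for a list of factor rows."""
    g = [[0] * rank for _ in range(rank)]
    for row in rows:
        for i in range(rank):
            for j in range(rank):
                g[i][j] += row[i] * row[j]
    return g


def add_matrices(g1, g2):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


def refresh_gram(gram_untouched, updated_rows, rank):
    """Combine the cached Gram of untouched rows with a fresh Gram of updated rows.

    Cost scales with the number of updated rows, not the full row count.
    """
    return add_matrices(gram_untouched, gram(updated_rows, rank))
```

For a time slice that touches only a few rows, `refresh_gram` does work proportional to those rows, while the cached Gram matrix stands in for everything else.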
We present a numerical framework for modeling the temporal evolution of ground deformation caused by a subsurface, pressurized magma reservoir situated within a viscoelastic medium. The host rock surrounding an oblate, ellipsoidal magma reservoir behaves as a Maxwell material. Temporal evolution due to viscous effects is encoded as source terms in the static equilibrium equations; the coupled system is solved via high-order FEM and explicit time-stepping. We derive numerically stable time steps and verify convergence at the theoretical rate.
We present our recent work on developing multilingual Natural Language Processing (NLP) systems for different upstream and downstream tasks in NLP.
Defending against attackers with unknown behavior is an important area of research in security games. A well-established approach is to utilize historical attack data to create a behavioral model of the attacker. However, this presents a vulnerability: a clever attacker may change its own behavior during learning, leading to an inaccurate model and ineffective defender strategies. In this paper, we investigate how a wary defender can defend against such a deceptive attacker. We provide four main contributions.
As the field of Cyber-Physical Systems continues to advance, new and interesting changes regarding its capability, adaptability, scalability, and usability have come about. The most notable change has been the aggressive expansion of the variety of entity types that can be deployed in these systems (i.e., the entity ecosystem).
Parallelized particle advection algorithms are a key visualization tool for domain scientists. They are also very computationally expensive to run. Machine learning techniques have been widely used in regression settings to predict results based on a set of input features. Our work describes an approach to optimizing parallel particle advection that uses a machine learning algorithm at its core. We specifically investigate how our approach operates when applied to a GPU-based parallel particle advection algorithm.