Directed Research Project

On Phylogenetic Uncertainty and Ancestral Sequence Reconstruction

Life has been evolving on Earth for over 3.8 billion years. However, given the extreme paucity of molecular fossils and the relative brevity of a human lifetime, the evolutionary history of genes can be difficult to study. Fortunately, the computational methods of ancestral sequence reconstruction (ASR) can be used to statistically infer the sequences of extinct genes. In this project, I investigate one aspect of ASR algorithms which may impact accuracy: phylogenetic uncertainty. Most ASR algorithms assume the phylogeny is known with certainty; in practice, this assumption is rarely valid.

Characterizing User Interactions in Flickr Social Networks

Online Social Networking (OSN) services have become increasingly popular in recent years. OSNs enable each user to establish friendships with other users, interact with them, post content, and view content posted by other users. The popularity of OSNs has motivated researchers to characterize various aspects of these systems. A majority of prior studies have focused on the empirical characterization of the friendship connectivity among users in different OSNs.

Unbiased Sampling over Online Social Networks

In recent years, Online Social Networks (OSNs) have evolved in many ways and attracted millions of users. The dramatic increase in the popularity of OSNs has encouraged network researchers to examine their connectivity structure. Most empirical studies that characterize OSN connectivity graphs have analyzed snapshots of the system taken at different times. These snapshots are collected by measurements that crawl OSN connectivity graphs. However, OSN owners are often unwilling to expose their user information due to privacy concerns.

Supporting Automatic Performance Tuning

Parallel applications running on high-end computer systems manifest a complex combination of performance parameters, such as number of processors, work distributions, and computational efficiencies. Automatic performance tuning tools strive to find the parameters that yield the highest performance. Most of these tuning tools leverage an empirical performance evaluation of parallel systems and applications which can generate significant amounts of performance data and analysis results from multiple experiments.

An Improved A* Heuristic for Automated Flight Planning

Flight planning is essential for large aircraft. Generating optimum flight plans can be automated using A* search. A* depends on a well-behaved heuristic function: a successful heuristic must be admissible, accurate, and fast. We present a new flight planning heuristic, STRIPS, that promises speed, accuracy, and admissibility. STRIPS achieves this by geographically limiting the amount of weather data considered for each heuristic estimate, calculating the estimate piecewise over a set of disjoint regions, and precomputing as much data off-line as possible.

Observing and Measuring Internet Quakes

Various traumatic events, such as worms, cable cuts, blackouts, and natural disasters, have had a disruptive effect on the Internet. These events have caused changes to the control plane and overall structure of the Internet. We call such changes an "Internet Earthquake" because, like an earthquake, they have a clear impact and alter the underlying structures. This work focuses on developing a systematic approach to quantifying the effect of various disruptive events. First, we observe the difference between normal data and disruptive data.

Personalized Requirements Elicitation Using a Domain Model

My interest is in applying a domain model to help elicit personal requirements for the problem of community travel for people with cognitive impairments. The domain model I take advantage of is the ACT model, which is embedded in the tool I design for defining required prompts for travel. I set up a study to look at the use of the domain model to help travel-planners generate personalized prompts for a traveler. My goal is to better understand the mechanisms of running a human-performance study, and to get a first look at how the domain model can be understood by travel-planners.

Using the Mean Shift Algorithm to Make Post Hoc Improvements to the Accuracy of Eye Tracking Data Based on Probable Fixation Locations

If they choose to look for them, eye tracking researchers will almost always see disparities between the participants' actual gaze locations and the locations recorded by the eye trackers. Sometimes these discrepancies are so great that they dramatically affect the validity of the theoretical and empirical claims made based on the eye tracking data. Much of the disparity is in fact a type of eye tracking error, systematic error, which tends to stay constant over time.

Refining NELL's knowledge base with Markov logic networks

Never Ending Language Learner (NELL) is an AI system that runs 24 hours per day, 7 days per week, forever, repeatedly extracting knowledge from the web. It uses a bootstrap learning algorithm that works with a small volume of labeled data and a large volume of unlabeled data. One of the biggest problems with NELL is that the accuracy of the knowledge it acquires gradually drops over time. In this work, we propose a novel approach dedicated to solving this problem.

WalkAbout, a Random Walk Based Framework to Characterize Online Social Networks

The structure of networked systems such as online social networks (OSNs) or power grids is often represented with a graph. Characterizing the connectivity features of such a graph reveals the structural properties of the corresponding system. In particular, identifying tightly connected regions (i.e., clusters) in a graph provides valuable insight into its connectivity structure. However, existing cluster detection techniques often require the entire graph, only detect clusters with a relatively small diameter, and do not scale gracefully to large graphs.

