Directed Research Project

Automatic Grading for Summaries by Ontology-based Information Extraction

Automatic grading systems for summaries and essays have been studied for years. Most commercial and research implementations are based on statistical methods, which can achieve high accuracy but cannot offer feedback. In the present work, we propose an automatic grading system based on Ontology-based Information Extraction (OBIE). OBIE uses ontologies for formal domain representation, and it can provide rich feedback about concepts and relationships that are present or absent in the summaries.

Semantic Artifacts and Control

Most modern programming languages use many interesting, and sometimes complicated, computational effects. In order to answer many important questions about program behavior, we need a way to reason formally about program semantics. In this talk, we will discuss various forms of formal semantic artifacts, and several, increasingly expressive, control effects.

Convex Adversarial Collective Classification

Many real-world domains, such as web spam, auction fraud, and counter-terrorism, are both relational and adversarial. Existing work on adversarial machine learning assumes that the attributes of each instance can be manipulated independently. Collective classification violates this assumption, since object labels depend on the labels of related objects as well as their own attributes. In this paper, we present a novel method for robustly performing collective classification in the presence of a malicious adversary that can modify up to a fixed number of binary-valued attributes.

Improved Blind Seer System With Constant Communication Rounds

Private queries bring new challenges to database design. The recent Blind Seer system introduces a method that supports efficient sublinear search for arbitrary Boolean queries. It separates an index server from the (main) server so that the client can communicate with the index server to retrieve the encrypted records. During the query, the server learns nothing about the query.

Learning Tractable Markov Networks Using Arithmetic Circuits

Markov networks are an effective way to represent complex probability distributions. However, learning their structure and parameters or using them to answer queries is typically intractable. One approach to making learning and inference tractable is to use approximations, such as pseudo-likelihood or approximate inference. An alternative approach is to use a restricted class of models where exact inference is always efficient. Previous work has explored low-treewidth models, models with tree-structured features, and latent variable models.

Mapping the PoP-Level Connectivity of Large Content Providers

Large content providers (CPs) are responsible for a large fraction of the traffic injected into the Internet. They maintain multiple data centers and connect to different ASes to relay their content and service traffic to the rest of the Internet. In this paper, we propose a novel methodology to measure and characterize large Internet content providers. Basically our contribution is two-fold:

String matching by Dynamic Programming-enhanced Suffix Trees

Aligning DNA reads to a reference sequence is an important step in modern sequence analysis. iMap is an alignment program designed by Dr. Conery at the University of Oregon. iMap utilizes a best-first search through a suffix tree representing the reference to find potential alignments. Inspired by information theory, iMap calculates the information content of these potential alignments and reports those that have the lowest content, i.e.

Improving Dynamic Invariant Saliency with Static Dataflow Analysis

The saliency of invariants reported by dynamic detection techniques tends to be poor. We present a prototype intra-procedural static analysis and invariant filtering system that improves reported invariant saliency by applying a data-flow-based admission criterion. While successful at reducing the number of nonsensical invariants reported, the current prototype is overly aggressive due to the limitations of intra-procedural data flow. Extension to inter-procedural analysis is non-trivial, and some of the challenges are discussed.

Auditory Display of Spatial Information: Analyzing Movement Strategies in Exploration of Thematic Maps

Auditory displays can make geospatial data and geographic information systems (GIS) accessible to people who are blind. In this presentation I will provide an overview of the minimal geographic information system (mGIS) that we have developed to display choropleth maps and describe an analysis of log files collected during behavioral testing with users who are blind. Through the analysis, I investigate how patterns of stylus movement can be characterized using a GOMS-style model, gesture identification, and a kernel density estimate (KDE).
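
As a rough illustration of the kernel density estimate mentioned above, the following is a minimal sketch of a two-dimensional Gaussian KDE in plain Python. The function name `gaussian_kde_2d`, the fixed `bandwidth`, and the `stylus_log` coordinates are all assumptions made up for this example; they are not taken from the mGIS system or its log files.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates density at (x, y) from 2D samples.

    Uses an isotropic Gaussian kernel with one fixed bandwidth
    (an assumption; real analyses often select the bandwidth from the data).
    """
    if not points:
        raise ValueError("need at least one sample point")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    n = len(points)
    # Normalizing constant of a 2D isotropic Gaussian, averaged over n samples.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical stylus positions (invented data, in arbitrary map units).
stylus_log = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
density = gaussian_kde_2d(stylus_log, bandwidth=0.5)

# Regions visited more often by the stylus receive a higher estimate.
print(density(1.0, 1.0) > density(5.0, 5.0))
```

In this sketch, a region where the stylus lingered (near the first three samples) gets a higher density than a region visited once, which is the kind of pattern a KDE over movement logs can expose.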

Quantitative Association Mining From Bottom Up and Heuristic Search Perspectives

Traditional association mining focuses on discovering frequent patterns in categorical data, such as supermarket transaction data. Quantitative association mining (QAM) is a natural extension of traditional association mining. It refers to the task of discovering association rules from quantitative data instead of categorical data. The discrepancies between the two types of data lead to different analytical methods and mining algorithms.
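
To make the categorical case concrete, here is a minimal sketch of the two standard measures used to score an association rule, support and confidence, over a handful of transactions. The helper names and the `baskets` data are invented for illustration, and the sketch does not cover quantitative attributes or the methods of this project.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("need at least one transaction")
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante


# Hypothetical supermarket baskets (made-up data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets -> 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```

With quantitative data, membership tests like `items <= set(t)` no longer apply directly, which is one reason the two settings call for different algorithms.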

