Prof. Daniel Lowd received an award to research how text can be modified to trick automated classifiers for abuse detection, sentiment analysis, and other tasks. In the "Reverse Engineering Adversarially Created Text (REACT)" project, funded by the DARPA Artificial Intelligence Explorations (AIE) program, the goal is to "reverse engineer" modified text to determine which algorithm or technique was used to create it. "There's a growing number of algorithms out there for attacking classifiers," Lowd says, "and if we can determine which algorithm is being used when, then we can learn something about the people or organizations that are launching these attacks."
The project is a collaboration with Prof. Sameer Singh at UC Irvine and will take place over 18 months, broken into two phases. The total budget is $765,000. The first phase ran from October 2020 through July 2021, and the second phase runs from August 2021 through March 2022. Several UO CIS PhD students, including Wencong You, Adam Noack, and Jonathan Brophy, have been working on the project since last year.
In the first phase, Lowd and his team used existing text attack algorithms to build up a large collection of examples. They then used techniques from machine learning (ML) and natural language processing (NLP) to build models that can tell different attacks apart. "Some of the signals are easy to distinguish," Lowd said. "For example, the VIPER attack replaces letters with other characters that are visually similar. If we see a piece of text with many unusual characters, it's more likely to be this attack than any other." Other attacks are harder to distinguish, but can be differentiated by the word replacements they choose and the grammar errors they make. In the second phase, Lowd's team will develop techniques for detecting and analyzing new attacks that are different from any seen before.
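The VIPER signal described above can be illustrated with a simple heuristic. The sketch below is not the project's actual model; it is a minimal, assumed example of the kind of character-level feature that could separate homoglyph-substitution attacks from other attack families: counting how many characters fall outside the ordinary printable ASCII alphabet.

```python
# Illustrative heuristic only (an assumption, not REACT's classifier):
# VIPER-style attacks swap letters for visually similar Unicode characters,
# so a high fraction of non-ASCII characters is a telltale signal.
import string

def unusual_char_fraction(text: str) -> float:
    """Fraction of non-whitespace characters outside printable ASCII."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    unusual = sum(1 for c in chars if c not in string.printable)
    return unusual / len(chars)

def looks_like_viper(text: str, threshold: float = 0.2) -> bool:
    """Flag text as likely homoglyph-substituted if unusual characters
    exceed the (arbitrarily chosen) threshold."""
    return unusual_char_fraction(text) >= threshold

# Clean text vs. text with Cyrillic/Latin homoglyphs swapped in.
print(looks_like_viper("This is a normal sentence."))                        # False
print(looks_like_viper("Th\u0131s \u0455ent\u0435nce u\u0455es h\u043em\u043eglyphs."))  # True
```

A real attribution model would combine many such features (word-substitution patterns, grammar errors, and more, as the article notes) in a learned classifier rather than a single threshold.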
Figure: System description from the grant proposal.