We will tell the story of how Moebius functions may be used to count by inclusion-exclusion topologically. In particular, we will discuss combinatorial-topological tools that have been remarkably effective at doing this, namely lexicographic shellability and more recently also discrete Morse theory. Some of the most compelling applications have been to theoretical computer science, a topic we will highlight along the way. We will not assume familiarity with topology or with the more algebraic and topological sides of discrete mathematics arising in this talk.
Federated learning (FL) is an emerging technique for training models on decentralized data. Compared to learning from data held in central storage, FL offers the benefits of privacy preservation and reduced communication bandwidth. A challenge in FL is that data and model characteristics can vary widely across tasks, and an improperly configured FL task can waste substantial computation and communication resources and may cause the trained model to diverge from the optimal result.
Cybersecurity has long been thought of as a technical challenge for computer science and engineering departments to address. This talk will highlight cybersecurity as a "wicked problem" whose solution requires combined expertise in computer science, law, psychology, business, and other disciplines.
To make sense of network traffic telemetry (NetFlow, IPFIX, sFlow, VPC Flow Logs, eBPF, etc.) in modern, orchestrated, and diverse networks, it is necessary to enrich those telemetry streams with context. Kentik's ingest layer does this live at millions of flows per second. It started by enriching with BGP routing data, but it also has other large-scale, generalized mechanisms for high-speed lookup and enrichment/decoration/coloring of incoming data, each of which can have tens of millions of dynamically changing association rules.
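As a rough illustration of what enriching a flow record with routing context can mean, the following minimal sketch tags a flow with the origin AS of the most specific matching prefix. It is not Kentik's implementation: the routing table, AS numbers, field names (`src`, `src_asn`), and helper functions are all hypothetical, and the linear scan stands in for the trie or similar structure a high-rate system would presumably use.

```python
import ipaddress

# Hypothetical routing table: prefix -> origin AS number (illustrative values only).
ROUTES = {
    "10.0.0.0/8": 64500,
    "10.1.0.0/16": 64501,
    "192.0.2.0/24": 64502,
}

def build_table(routes):
    """Parse prefixes once and sort longest-first, so the first match is the most specific."""
    parsed = [(ipaddress.ip_network(prefix), asn) for prefix, asn in routes.items()]
    return sorted(parsed, key=lambda item: item[0].prefixlen, reverse=True)

def enrich(flow, table):
    """Return a copy of the flow record with a 'src_asn' field (None if no prefix matches)."""
    addr = ipaddress.ip_address(flow["src"])
    asn = next((asn for net, asn in table if addr in net), None)
    return {**flow, "src_asn": asn}

table = build_table(ROUTES)
print(enrich({"src": "10.1.2.3", "bytes": 1200}, table))
```

Here the address 10.1.2.3 falls inside both 10.0.0.0/8 and 10.1.0.0/16, and the longest-first ordering makes the /16 entry win.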
IP address blocklists are a useful source of information about repeat attackers. Such information can be used to prioritize which traffic to divert for deeper inspection (e.g., repeat offender traffic), or which traffic to serve first (e.g., traffic from sources that are not on the blocklist). But blocklists also suffer from overspecialization – each list is geared towards a specific purpose – and they may be inaccurate due to misclassification or stale information.
We outline the “Learning Everywhere” paradigm, a powerful scientific methodology that couples learning methods to traditional HPC simulations. We present several examples of “Learning Everywhere” applications, their scientific impact, and effective performance improvements over traditional HPC simulations. Such applications require a fundamental re-examination of scientific programming and systems software. This talk will highlight middleware advances that enable “Learning Everywhere” algorithms and methods as the “natural” extreme-scale programming paradigm.
Despite intensive research over 10 years, Android malware detection still faces substantial challenges, including frequent changes in the Android framework, noisy labels in large-scale up-to-date datasets, and the continuous evolution of Android malware. The consequences of ignoring these challenges are manifold. One is the rapid decline of malware detection accuracy over time, caused by out-of-date detection feature sets, neglect of new and changed APIs, and failure to capture emerging malware patterns.
While much attention has been recently given to the social, ethical and political implications of fairness in artificial intelligence methods, practices and technologies, not much has been said about the formal (conceptual, mathematical, algorithmic) definitions of fairness and their conceptual adequacy. It is not clear, for example, whether group parity/equity captures equality amongst individuals and/or which one is more desirable in a given algorithmic function.
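The tension between group-level and individual-level criteria can be made concrete with a toy calculation. The sketch below is only one possible reading, under stated assumptions: "group parity" is taken to mean equal positive-decision rates across groups, and "equality amongst individuals" is taken to mean that individuals with identical scores receive identical decisions. The data, field layout, and function names are hypothetical.

```python
from collections import defaultdict

def positive_rates(records):
    """Fraction of positive decisions per group. Each record is (group, score, decision)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, _score, decision in records:
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}

def inconsistent_pairs(records):
    """Pairs of individuals with identical scores but different decisions."""
    return [
        (a, b)
        for i, a in enumerate(records)
        for b in records[i + 1:]
        if a[1] == b[1] and a[2] != b[2]
    ]

# Hypothetical toy data: (group, score, decision)
records = [
    ("A", 0.9, 1), ("A", 0.4, 0),
    ("B", 0.9, 0), ("B", 0.4, 1),
]

print(positive_rates(records))           # both groups receive positive decisions at the same rate
print(len(inconsistent_pairs(records)))  # yet some equally scored individuals are treated differently
```

In this toy data both groups have a positive rate of 0.5, so the group-level criterion is satisfied, while two pairs of equally scored individuals receive opposite decisions.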
In this talk, I introduce three recent and/or ongoing projects that are representative of the work we do in my lab (Learner Corpus Research and Applied Data Science Lab). The first project investigates the features of academic language using multidimensional analysis (Biber, 1988; 2004) and a wide array of linguistic features extracted using an NLP pipeline with a number of post-processing steps.
Our daily lives are becoming increasingly dependent on a variety of smart cyber-physical infrastructures, such as smart cities and buildings, smart energy grids, smart transportation, and smart healthcare. Alongside these, smartphones and sensor-based IoT devices are empowering humans to collect fine-grained information and opinions about events of interest through crowdsensing, resulting in actionable inferences and decisions. This synergy has led to the cyber-physical-social (CPS) convergence with humans in the loop, the goal of which is to improve the “quality” of life.