The last half-decade ushered in a new era of vision research. Computer vision now works on real images, in natural environments, solving hard problems. But the technology is far from ubiquitous, and many researchers are concerned mainly with getting the best performance on a handful of benchmark datasets. This hyper-focus on accuracy has largely turned vision into a numbers game, and research tends toward complex, finely tuned systems that are brittle and impractical in the real world.
I focus on aspects of research that are often neglected in vision: speed, scalability, and usability. In this talk I will describe three research efforts toward making vision more usable in the real world. The YOLO object detection system runs an order of magnitude faster than comparable systems and powers research in robotics, autonomous driving, ecology, and myriad other fields. Binary XNOR networks approximate floating-point neural networks with bitwise operations and allow vision systems to run in real time on cell phones and embedded devices. Finally, techniques for dataset combination and weakly supervised training make it easier to train vision systems that perform fine-grained detection across a large number of object classes.
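To give a flavor of how bitwise operations can stand in for floating-point arithmetic, here is a minimal sketch (illustrative only, not the XNOR-Net implementation): once weights and activations are binarized to +1/-1 and packed into bit strings, a dot product reduces to an XNOR followed by a popcount. The function names `binarize` and `binary_dot` are hypothetical.

```python
def binarize(v):
    """Encode a real vector as bits: 1 for non-negative entries, 0 for negative."""
    return sum(1 << i for i, x in enumerate(v) if x >= 0)

def binary_dot(a_bits, b_bits, n):
    """Dot product of two +/-1 vectors from their packed bit encodings.

    Matching bits contribute +1 and mismatching bits -1, so
    dot = matches - mismatches = 2 * popcount(XNOR) - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n

a = [0.5, -1.2, 0.3, -0.7]   # sign pattern: +, -, +, -
b = [1.0, -0.4, -0.9, -0.2]  # sign pattern: +, -, -, -
# Exact dot product of the sign vectors: (+1)(+1) + (-1)(-1) + (+1)(-1) + (-1)(-1) = 2
print(binary_dot(binarize(a), binarize(b), len(a)))  # → 2
```

The point of the trick is that one machine-word XNOR plus a popcount instruction replaces dozens of floating-point multiply-accumulates, which is what makes real-time inference on embedded hardware plausible.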
Joseph Redmon is a Ph.D. student at the University of Washington, advised by Prof. Ali Farhadi. His research spans a variety of topics in computer vision, including image classification and tagging, object detection, vision for robotics, scalability, speed, and canine vision.