Title: Are Self-Driving Cars Reliable? Evaluation of Radiation-Induced Errors in Automotive Applications
Self-driving systems are the new trend in the automotive market. To be deployed, a self-driving platform must analyze a huge amount of images and signals in real time. Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) are extremely attractive for the automotive market thanks to their low cost, increased energy efficiency, and flexible development platforms. The Tesla self-driving system, for instance, is powered by NVIDIA embedded GPUs. GPUs, however, were originally designed for multimedia applications, for which reliability is not an issue, so their architecture is optimized for performance rather than reliability. In this talk we will discuss the reliability of GPUs and FPGAs and evaluate whether they comply with the strict ISO 26262 standard, which defines the reliability constraints for automotive applications. The talk will focus on the reliability of pedestrian-detection algorithms and convolutional neural networks (including YOLO and Faster R-CNN) as implemented on FPGAs and GPUs. We will see how to identify radiation-induced errors in GPUs and FPGAs and how to distinguish tolerable errors from critical ones.
After a brief description of radiation effects at the physical level, we will show the real impact of neutrons on GPUs and FPGAs by presenting accelerated neutron beam results that correspond to more than 150,000 years of natural exposure. Our data demonstrate that most radiation-induced errors can be tolerated, even in safety-critical applications. We will show how to replicate the causes of critical errors through architectural-level and instruction-level fault injection. By hardening only the sources of critical errors, we can increase the reliability of the application without unnecessary overhead.
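To make the ideas of fault injection and of tolerable versus critical errors concrete, here is a minimal Python sketch. It is not the method used in the talk. It assumes a single-bit-flip fault model applied to one float32 coordinate of a detected bounding box, and it assumes that an error is tolerable when the corrupted box still overlaps the fault-free ("golden") box with an intersection over union of at least 0.5. The names `flip_bit`, `iou`, and `classify`, the threshold, and the example box are all hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped (bit 0 = LSB)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def iou(a, b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if not (iw > 0 and ih > 0):  # no overlap (comparison is also False for NaN)
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # A NaN union fails this comparison, so the overlap is reported as 0.
    return inter / union if union > 0 else 0.0


def classify(golden, faulty, threshold: float = 0.5) -> str:
    """Assumed criterion: compare the corrupted box with the fault-free one."""
    if faulty == golden:
        return "masked"  # the fault did not change the output
    return "tolerable" if iou(golden, faulty) >= threshold else "critical"


if __name__ == "__main__":
    golden = (100.0, 50.0, 140.0, 150.0)  # hypothetical pedestrian box
    for bit in (0, 22, 30):
        faulty = (flip_bit(golden[0], bit),) + golden[1:]
        print(f"bit {bit:2d}: x1 = {faulty[0]!r:>24} -> {classify(golden, faulty)}")
```

In this sketch, a flip in the least significant mantissa bit barely moves the box and is classified as tolerable, while a flip in the most significant exponent bit collapses the coordinate toward zero and is classified as critical. Under these assumptions, only the second kind of fault would justify the cost of hardening.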
Paolo Rech received his master's and Ph.D. degrees from Padova University, Padova, Italy, in 2006 and 2009, respectively. He was a postdoctoral researcher at LIRMM, Montpellier, France, from 2010 to 2012. He is currently an associate professor at the Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil. He actively collaborates with Los Alamos National Laboratory, NM, USA; the Jet Propulsion Laboratory, Pasadena, USA; NVIDIA; and AMD. His main research interests include the evaluation and mitigation of radiation-induced errors in modern computing systems for HPC and safety-critical applications.