
RSS2: Workshop on Robustness and Safe Software 2.0

This workshop is held in conjunction with ASPLOS 2022

Monday, February 28, 2022 | Half Day

Please register for ASPLOS to get Zoom access to this event

Introduction

Welcome to the Workshop on Robustness and Safe Software 2.0 (RSS2).  

Unlike Software 1.0 (conventional programs), which is manually coded with hardened parameters and explicit logic, Software 2.0 programs, usually manifested as and enabled by Deep Neural Networks (DNNs), have learned parameters and implicit logic. Software 2.0 is found in a diverse set of applications in today’s society, ranging from autonomous machines and Augmented/Virtual Reality devices to smart-city infrastructure.

While the systems and architecture communities have focused, rightly so, on the efficiency of DNNs, Software 2.0 exposes a unique set of challenges for robustness, safety, and resiliency, which are major roadblocks to Software 2.0 becoming a pervasive computing paradigm. For instance, small perturbations to inputs can easily “fool” DNNs into producing incorrect results, giving rise to so-called adversarial attacks. Similarly, while DNNs are generally resilient to hardware faults, few have studied the worst-case resiliency of DNNs to hardware faults, which usually dictates the safety of mission-critical systems.
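To make the perturbation point concrete, below is a minimal sketch of the fast gradient sign method (FGSM, Goodfellow et al.), one classic way such adversarial examples are crafted. The model and the epsilon budget are placeholders; this is an illustration, not a specific attack discussed at the workshop.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, epsilon=0.03):
    """One-step adversarial perturbation: nudge every input pixel by
    +/- epsilon in the direction that most increases the loss.
    `model` is any differentiable classifier returning logits."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range

# With epsilon = 0.03 the perturbation is imperceptible to a human, yet it
# is often enough to flip the prediction of an undefended classifier.
```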

Improving the robustness, safety, and resiliency of Software 2.0 is necessarily a cross-layer task, just as the algorithms, programming languages, architecture, and circuits communities came together in the Software 1.0 era. It is also critical not to hyper-optimize individual system components; rather, we must take a whole-of-system approach that understands the requirements and constraints of end-to-end systems, which are usually multi-chip and span client, edge, and cloud.

To that end, the workshop is meant to foster an interactive discussion about computer systems and architecture research’s role in building robust, safe, and resilient Software 2.0. Ultimately, we hope the workshop leads to new discussions and insights on algorithms, architectures, and circuit/device-level design as well as system-level integration and co-design.

Organizer

Agenda (All Times are CET)

Opening and Welcome: 2:00PM CET (8:00AM EST)

Session 1: Algorithms & Applications: What are the new challenges for robustness, safety, and resiliency?

Time | Speaker | Title
2:00PM ~ 2:20PM CET (8:00AM ~ 8:20AM EST) | Qi Zhu | Know the Unknowns: Robust and Safe Machine Learning in Autonomous Systems
2:20PM ~ 2:40PM CET (8:20AM ~ 8:40AM EST) | Yuan Tian | CryptGPU: Fast Privacy-preserving Machine Learning on the GPU
2:40PM ~ 3:00PM CET (8:40AM ~ 9:00AM EST) | Emre Neftci | Meta-training Neuromorphic Hardware for Robust and Safe Learning at the Edge

Session 2: Architecture & Systems: How can we build resilient and safe systems and hardware?

Time | Speaker | Title
3:00PM ~ 3:20PM CET (9:00AM ~ 9:20AM EST) | Deming Chen | AccGuard: Secure and Trusted Computation on Remote FPGA Accelerators
3:20PM ~ 3:40PM CET (9:20AM ~ 9:40AM EST) | Onur Mutlu | The Story of RowHammer
3:40PM ~ 4:00PM CET (9:40AM ~ 10:00AM EST) | Vijaykrishnan Narayanan | Distributed and Multi-Modal Information for Robustness

Session 3: Circuits & Devices: How can we harness emerging devices and their characteristics for extreme robustness?

Time | Speaker | Title
4:10PM ~ 4:30PM CET (10:10AM ~ 10:30AM EST) | Gert Cauwenberghs | Efficiency and Robustness in Large-Scale Neuromorphic Computing
4:30PM ~ 4:50PM CET (10:30AM ~ 10:50AM EST) | Cecilia Metra | Circuit Level Challenges and Solutions for Safe and Reliable Intelligent Systems
4:50PM ~ 5:10PM CET (10:50AM ~ 11:10AM EST) | Yu Cao | Reliable In-Memory Computing with Unreliable RRAM Devices

Session 4: Industry Perspectives: What are some of the urgent issues that the industry is facing and how could academia best help?

Time | Speaker | Title
5:10PM ~ 5:30PM CET (11:10AM ~ 11:30AM EST) | Paul Whatmough | Functional safety for ML systems
5:30PM ~ 5:50PM CET (11:30AM ~ 11:50AM EST) | Vincent Lee | Concretizing machine perception security and privacy challenges
5:50PM ~ 6:10PM CET (11:50AM ~ 12:10PM EST) | Bo Yu | Building Computing Systems for Autonomous Vehicles

Closing: 6:10PM ~ 6:20PM CET (12:10PM ~ 12:20PM EST)

Abstracts

Know the Unknowns: Robust and Safe Machine Learning in Autonomous Systems

Abstract: Future autonomous systems will employ complex sensing, computation, and communication components for their perception, planning, control, and coordination, and could operate in highly dynamic and uncertain environments with safety and security assurance. To realize this vision, we have to better understand and address the challenges from the “unknowns”: the unexpected disturbances from component failures, environmental interference, and malicious attacks; the inherent uncertainties from system inputs and model inaccuracies; and the lack of analyzability of neural network-based machine learning techniques. In this talk, I will discuss these challenges and present an overview of our recent work on developing quantitative and formal methods for ensuring the robust and safe application of neural networks in perception, decision making, and runtime adaptation.
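For readers unfamiliar with such quantitative methods, the sketch below shows interval bound propagation (IBP) through one ReLU-activated affine layer, a common building block for certifying that no input within an epsilon-ball can push a network's output outside computed bounds. It illustrates the general flavor of these analyses only and is not necessarily the speaker's specific method; the weights and epsilon are made up.

```python
import numpy as np

def affine_bounds(W, b, lo, hi):
    # Propagate an input box [lo, hi] through y = W x + b exactly.
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    out_c = W @ center + b
    out_r = np.abs(W) @ radius        # worst-case spread of the box
    return out_c - out_r, out_c + out_r

def relu_bounds(lo, hi):
    # ReLU is monotone, so the box just gets clamped at zero.
    return np.maximum(lo, 0), np.maximum(hi, 0)

W = np.array([[1.0, -2.0], [0.5, 1.0]])   # illustrative layer weights
b = np.zeros(2)
x = np.array([0.3, 0.7])                  # nominal input
eps = 0.1                                 # perturbation budget
lo, hi = relu_bounds(*affine_bounds(W, b, x - eps, x + eps))
# Every input within eps of x (in the infinity norm) provably yields an
# output inside [lo, hi]; if a safety predicate holds on the whole box,
# it holds for all such inputs.
print(lo, hi)
```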

CryptGPU: Fast Privacy-preserving Machine Learning on the GPU

Abstract: Scalability is one of the biggest challenges for privacy-preserving machine learning techniques. For example, it takes around one year to train AlexNet on Tiny ImageNet with state-of-the-art secure multi-party computation techniques. We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU. Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. With CryptGPU, we support private inference and private training on convolutional neural networks with over 60 million parameters, handle large datasets such as ImageNet, and achieve up to a 37× improvement for private training. Our work not only showcases the viability of performing secure multi-party computation (MPC) entirely on the GPU to enable fast privacy-preserving machine learning, but also highlights the importance of designing new MPC primitives that can take full advantage of the GPU’s computing capabilities.
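The MPC underlying systems like CryptGPU starts from secret sharing. Below is a toy 2-out-of-2 additive sharing over a 64-bit ring with fixed-point encoding; CryptGPU itself uses a three-party protocol with GPU-resident kernels, so this sketch only conveys the core idea, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
SCALE = 2 ** 16  # fixed-point precision for encoding real numbers

def encode(x):
    # Reals -> ring elements (two's-complement wrap handles negatives).
    return np.round(np.asarray(x) * SCALE).astype(np.int64).astype(np.uint64)

def decode(x_int):
    return x_int.astype(np.int64).astype(np.float64) / SCALE

def share(x_int):
    # Split x into two uniformly random shares; each alone reveals nothing.
    r = rng.integers(0, 2 ** 64, size=x_int.shape, dtype=np.uint64)
    return r, x_int - r          # uint64 arithmetic wraps mod 2**64

def reconstruct(s0, s1):
    return decode(s0 + s1)

x = np.array([1.5, -2.25])
y = np.array([0.5, 4.0])
x0, x1 = share(encode(x))
y0, y1 = share(encode(y))
# Addition is local: each party adds its own shares; no party sees x or y.
print(reconstruct(x0 + y0, x1 + y1))   # -> [2.0, 1.75]
```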

Meta-training Neuromorphic Hardware for Robust and Safe Learning at the Edge

Abstract: Adaptive life-long learning at the edge and during online task performance is an aspirational goal of AI research. Neuromorphic hardware implementing Spiking Neural Networks (SNNs) is particularly attractive in this regard, as its real-time, event-based, local computing paradigm makes it suitable for edge implementations and fast learning. However, the long, iterative training that characterizes state-of-the-art SNNs is incompatible with the physical nature and real-time operation of neuromorphic hardware. Bi-level learning, such as meta-learning, is increasingly used in deep learning to overcome these limitations. In this work, we demonstrate gradient-based meta-learning in SNNs. We show that meta-trained SNNs match or exceed the performance of conventional ANNs meta-trained with MAML on event-based meta-datasets. Furthermore, we highlight the specific advantages that accrue from meta-learning: algorithmic simplicity, reduced memory requirements, and improved privacy.
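Since the abstract references MAML, here is a minimal sketch of the gradient-based inner/outer loop on a toy sine-regression task (the standard MAML demo). The SNN and neuromorphic-hardware specifics of the talk are not reproduced; the model, task family, and learning rates are stand-ins.

```python
import torch

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

def sample_task():
    # Hypothetical task family: sine waves with random amplitude and phase.
    amp, phase = 1 + 4 * torch.rand(1), 3 * torch.rand(1)
    def draw(n=10):
        x = 10 * torch.rand(n, 1) - 5
        return x, amp * torch.sin(x + phase)
    return draw

params = [(0.1 * torch.randn(1, 40)).requires_grad_(),
          torch.zeros(40, requires_grad=True),
          (0.1 * torch.randn(40, 1)).requires_grad_(),
          torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for step in range(2000):
    draw = sample_task()
    x_s, y_s = draw()            # support set for inner-loop adaptation
    x_q, y_q = draw()            # query set for the outer (meta) update
    loss_s = ((forward(params, x_s) - y_s) ** 2).mean()
    # create_graph=True keeps the adaptation step differentiable so the
    # meta-gradient can flow back through it.
    grads = torch.autograd.grad(loss_s, params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]
    loss_q = ((forward(adapted, x_q) - y_q) ** 2).mean()
    meta_opt.zero_grad()
    loss_q.backward()
    meta_opt.step()
```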

AccGuard: Secure and Trusted Computation on Remote FPGA Accelerators

Abstract: Application-specific acceleration has prevailed in cloud computing and data centers, but the current infrastructure provides little or no support for security in external accelerators. Existing trusted computing solutions such as Intel SGX or ARM TrustZone target CPU-only environments, leaving external accelerators and peripheral devices unprotected. This work proposes AccGuard, a new scheme that extends trusted computation to remote FPGA accelerators. AccGuard consists of a security manager (SM) with a hardware root of trust and remote attestation through standard cryptographic primitives, together forming an enclave framework for FPGA accelerators. It minimizes the performance overhead of the security features compared to a state-of-the-art CPU-based enclave framework, Intel SGX, while enjoying the benefit of improved performance through hardware acceleration.
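As background, the sketch below shows the generic challenge-response remote-attestation pattern that enclave frameworks build on: a key held by the hardware root of trust MACs a measurement of the loaded code together with a fresh nonce. This is a symmetric-key toy version with made-up names (DEVICE_KEY, BITSTREAM); AccGuard's actual protocol and security manager are described in the talk.

```python
import hmac, hashlib, os

DEVICE_KEY = os.urandom(32)  # stands in for a key held by the root of trust
BITSTREAM = b"...accelerator bitstream bytes..."   # placeholder contents

def device_attest(nonce):
    # Device side: measure the loaded bitstream, then MAC the nonce and
    # measurement with the root-of-trust key so the report cannot be
    # forged or replayed.
    measurement = hashlib.sha256(BITSTREAM).digest()
    tag = hmac.new(DEVICE_KEY, nonce + measurement, hashlib.sha256).digest()
    return measurement, tag

def verifier_check(nonce, measurement, tag, expected_measurement):
    # Verifier side: in this toy version the verifier also knows
    # DEVICE_KEY; real deployments typically use asymmetric attestation.
    good_meas = hmac.compare_digest(measurement, expected_measurement)
    good_tag = hmac.compare_digest(
        tag, hmac.new(DEVICE_KEY, nonce + measurement, hashlib.sha256).digest())
    return good_meas and good_tag

nonce = os.urandom(16)       # fresh challenge defeats replay
meas, tag = device_attest(nonce)
assert verifier_check(nonce, meas, tag, hashlib.sha256(BITSTREAM).digest())
```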

The Story of RowHammer

Abstract: We will examine the RowHammer problem, the first example of how a circuit-level failure mechanism in Dynamic Random Access Memory (DRAM) can cause a practical and widespread system security vulnerability. RowHammer is the phenomenon whereby repeatedly accessing a row in a modern DRAM chip predictably causes errors in physically adjacent rows. It is caused by a hardware failure mechanism called read disturb errors, a manifestation of circuit-level cell-to-cell interference in a scaled memory technology. Building on our initial fundamental work that appeared at ISCA 2014, Google Project Zero demonstrated that this hardware phenomenon can be exploited by user-level programs to gain kernel privileges. Many other works have demonstrated further attacks exploiting RowHammer, including the remote takeover of a vulnerable server and the takeover of a mobile device by a malicious user-level application.

Unfortunately, the RowHammer problem still plagues cutting-edge DRAM chips, DDR4 and beyond. Based on our recent characterization studies of more than 1500 DRAM chips from six technology generations, which appeared at ISCA 2020 and MICRO 2021, we will show that RowHammer is getting much worse at the circuit level, that newer DRAM chips are much more vulnerable to RowHammer than older ones, and that existing mitigation techniques do not work well. We will also show that the proprietary mitigation techniques employed in DDR4 DRAM chips, which are advertised to be RowHammer-free, can be bypassed via many-sided hammering (also known as TRRespass) and even more sophisticated methods that generate many bitflips in real state-of-the-art DDR4 DRAM chips.

Throughout the talk, we will discuss various properties of the RowHammer problem, examine circuit/device scaling characteristics, and discuss solution directions. We may also discuss what other problems are lurking in DRAM and other types of memory, e.g., NAND flash and Phase Change Memory, that could threaten the foundations of reliable and secure systems as memory technologies scale to higher densities. We will conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.

Distributed and Multi-Modal Information for Robustness

Abstract: This talk will present techniques that rely on orthogonal modalities of sensing and knowledge representation for extracting visual information. These distributed approaches also enhance system efficiency, making them suitable for resource-constrained deployments.

Efficiency and Robustness in Large-Scale Neuromorphic Computing

Abstract: The mammalian brain offers an existence proof of remarkable general intelligence realized by hierarchical assemblies of massively parallel, yet imprecise and slow compute elements that operate near fundamental limits of noise and energy efficiency. Neuromorphic instantiations approaching such natural intelligence in custom silicon and reconfigurable hardware have evolved from highly specialized, task-specific compute-in-memory neural and synaptic crossbar array architectures that operate near the efficiency of synaptic transmission in the mammalian brain, to large tiles of neurosynaptic cores assembled into hierarchically interconnected networks for general-purpose learning and inference. By combining the extreme efficiency of local interconnects (grey matter) with great flexibility and sparsity in global interconnects (white matter), these assemblies are capable of realizing a wide class of deeply layered and recurrent neural architectures with embedded local plasticity for online learning, at a fraction of the computational and energy cost of implementation on CPU and GPGPU platforms.

Reliable In-Memory Computing with Unreliable RRAM Devices

Abstract: With the ever-increasing demands of AI algorithms and high-definition sensors, contemporary microprocessor design faces tremendous challenges in memory bandwidth (i.e., the von Neumann bottleneck), processing speed, and power consumption. Leveraging advances in device technology and design techniques, in-memory computing (IMC) embeds analog deep-learning operations in the memory array, achieving massively parallel computing with high storage density. On the other hand, its performance is still limited by device non-idealities, circuit precision, on-chip interconnection, and algorithm properties.

In this talk, we will focus on robust RRAM-based IMC design. Based on statistical data from a fully integrated 65nm CMOS/RRAM test chip, we will illustrate the bottlenecks of current IMC systems, including RRAM variations, the stability of machine learning models, and others. These factors interact with each other, limiting on-chip inference accuracy. We will demonstrate two methods to recover the accuracy loss: training for model stability before mapping to the hardware, and a hybrid SRAM/RRAM architecture for post-mapping recovery. These methods are applied to various datasets as well as a 65nm SRAM/RRAM test chip, helping shed light on future IMC research directions.
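To make the "unreliable devices" point concrete, the sketch below injects multiplicative lognormal conductance variation, a common first-order model of RRAM non-ideality, into a matrix-vector product and measures the deviation from the ideal result. The noise parameters are illustrative, not measurements from the 65nm test chip.

```python
import numpy as np

rng = np.random.default_rng(0)

def imc_matvec(W, x, sigma=0.1):
    # Each weight maps to a conductance that deviates from its target by a
    # multiplicative lognormal factor (device-to-device variation).
    noisy_W = W * rng.lognormal(mean=0.0, sigma=sigma, size=W.shape)
    return noisy_W @ x

W = rng.standard_normal((64, 128))   # illustrative weight matrix
x = rng.standard_normal(128)
ideal = W @ x
err = [np.abs(imc_matvec(W, x) - ideal).mean() for _ in range(100)]
print(f"mean |error| vs ideal matvec: {np.mean(err):.3f}")
```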

Functional safety for ML systems

Abstract: The field of hardware functional safety (FuSA) is well established around a few major tenets, such as various forms of redundancy and error-correcting coding techniques. In stark contrast, the software and algorithm aspects of FuSA for emerging ML systems are not well established. This talk will briefly survey the current state of FuSA in ML systems and outline some apparent gaps between layers of the stack.
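As a minimal illustration of the redundancy tenet mentioned above, the sketch below implements triple modular redundancy (TMR) voting, which masks a single faulty replica. The replica outputs are placeholders, not part of the talk.

```python
from collections import Counter

def tmr_vote(outputs):
    """Majority-vote over three redundant results: a single faulty replica
    is outvoted; total disagreement is flagged as uncorrectable."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: uncorrectable fault detected")
    return winner

# One replica suffers a transient fault; the majority masks it.
print(tmr_vote(["pedestrian", "pedestrian", "cyclist"]))  # -> pedestrian
```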

Concretizing machine perception security and privacy challenges

Abstract: AR is a confluence of many different fields that have come together at the right place and the right time. One piece of the larger technology focuses on machine perception, which is responsible for answering questions such as "where am I?" and "what is around me?". One of the key issues with security and privacy for machine perception is that, because AR does not yet exist at scale, the challenges remain abstract. Furthermore, the research challenges often lie at the intersection of other fields such as computer vision, systems architecture, and devices. In this presentation, I will talk about preempting security and privacy challenges by making them concrete, to enable a path for research to solve them.

Building Computing Systems for Autonomous Vehicles

Abstract: In this talk, Dr. Yu will share his experience building the on-vehicle computing system for autonomous driving vehicles. An FPGA design of a localization algorithm will be introduced to demonstrate the energy-efficiency and latency benefits of hardware accelerators. Finally, a simulation tool for autonomous driving development, which can effectively reduce development costs, will be introduced.