-
MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking
Authors:
Yalin E. Sagduyu,
Kemal Davaslioglu,
Tugba Erpek,
Sastry Kompella,
Gustave Anderson,
Jonathan Ashdown
Abstract:
This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using the 5G Positioning Reference Signal (PRS). We consider a distributed architecture in which one gNB transmits a periodic OFDM-PRS waveform while multiple spatially separated receivers exploit the same signal for target detection, parameter estimation, and tracking. A coherent cross-ambiguity function (CAF) is evaluated to form a range-Doppler map from which the bistatic delay and radial velocity are extracted for every target. For a single target, the resulting bistatic delays are fused through nonlinear least-squares trilateration, yielding a geometric position estimate, and a regularized linear inversion of the radial-speed equations yields a two-dimensional velocity vector, from which speed and heading are obtained. The approach is applied to 2D and 3D settings, extended to account for time synchronization bias, and generalized to multiple targets by resolving target association. The sequence of position-velocity estimates is then fed to standard and extended Kalman filters to obtain smoothed tracks. Our results show high-fidelity moving-target detection, positioning, and tracking using 5G PRS signals for multistatic ISAC.
Submitted 3 July, 2025;
originally announced July 2025.
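As an editor's illustration of the delay-fusion step described in the abstract, the sketch below solves the nonlinear least-squares trilateration from bistatic delays in the 2D, single-target case. The geometry, receiver count, and noise level are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical geometry: one gNB transmitter, three receivers, one target.
import numpy as np
from scipy.optimize import least_squares

c = 3e8                                    # propagation speed (m/s)
tx = np.array([0.0, 0.0])                  # transmitter position
rxs = np.array([[500.0, 0.0],              # spatially separated receivers
                [0.0, 500.0],
                [400.0, 400.0]])
target = np.array([120.0, 260.0])          # ground truth, unknown to the solver

def bistatic_delay(p, rx):
    """Delay of the tx -> target -> rx path, as extracted from a CAF peak."""
    return (np.linalg.norm(p - tx) + np.linalg.norm(p - rx)) / c

rng = np.random.default_rng(0)
meas = np.array([bistatic_delay(target, rx) for rx in rxs])
meas += rng.normal(0.0, 1e-9, meas.shape)  # ~1 ns timing jitter

def residuals(p):
    return np.array([bistatic_delay(p, rx) for rx in rxs]) - meas

est = least_squares(residuals, x0=np.array([250.0, 250.0])).x
print("estimated position:", est)          # close to (120, 260)
```

Each measured delay constrains the target to an ellipse with foci at the transmitter and that receiver; with three or more receivers, the least-squares intersection of these ellipses pins down the position.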
-
EHRs Data Harmonization Platform, an easy-to-use shiny app based on recodeflow for harmonizing and deriving clinical features
Authors:
Arian Aminoleslami,
Geoffrey M. Anderson,
Davide Chicco
Abstract:
Electronic health records (EHRs) contain important longitudinal information on individuals who have received medical care. Traditionally, EHRs have been used to support a wide range of administrative activities such as billing and clinical workflow, but, given the depth and breadth of clinical and demographic data they contain, they are increasingly being used to provide real-world data for research. Although EHR data have enormous research potential, the full realization of that potential requires a data management strategy that extracts from large EHR databases, collected from a range of care settings and time periods, well-documented research-relevant data that can be used by different researchers. A common, well-documented data management strategy for EHRs will support reproducible research, and sharing documentation on research variables derived from EHR variables is important to open science. In this short paper, we describe the EHRs Data Harmonization Platform. The platform is based on an easy-to-use web app, publicly available at https://poxotn-arian-aminoleslami.shinyapps.io/Arian/ and as a standalone software package at https://github.com/ArianAminoleslami/EHRs-Data-Harmonization-Platform, that is linked to an existing R library for data harmonization called recodeflow. The platform can be used to extract, document, and harmonize variables from EHRs, and it can also be used to document and share research variables that have been derived from those EHR data.
Submitted 15 November, 2024;
originally announced November 2024.
-
The Llama 3 Herd of Models
Authors:
Aaron Grattafiori,
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Alex Vaughan,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere
, et al. (536 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Submitted 23 November, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Transport of Algebraic Structure to Latent Embeddings
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Machine learning often aims to produce latent embeddings of inputs which lie in a larger, abstract mathematical space. For example, in the field of 3D modeling, subsets of Euclidean space can be embedded as vectors using implicit neural representations. Such subsets also have a natural algebraic structure including operations (e.g., union) and corresponding laws (e.g., associativity). How can we learn to "union" two sets using only their latent embeddings while respecting associativity? We propose a general procedure for parameterizing latent space operations that are provably consistent with the laws on the input space. This is achieved by learning a bijection from the latent space to a carefully designed mirrored algebra which is constructed on Euclidean space in accordance with desired laws. We evaluate these structural transport nets for a range of mirrored algebras against baselines that operate directly on the latent space. Our experiments provide strong evidence that respecting the underlying algebraic structure of the input space is key for learning accurate and self-consistent operations.
Submitted 26 May, 2024;
originally announced May 2024.
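A minimal sketch of the core construction: pick a mirrored algebra on Euclidean space whose operation satisfies the desired laws (elementwise max is associative, commutative, and idempotent), and define the latent operation by conjugating with a bijection. Here a fixed random invertible linear map stands in for the learned bijection; everything below is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) + d * np.eye(d)    # well-conditioned, invertible
W_inv = np.linalg.inv(W)

phi = lambda z: W @ z                          # stands in for the learned bijection
phi_inv = lambda z: W_inv @ z

def latent_union(a, b):
    """Union in latent space: conjugate an elementwise max by the bijection."""
    return phi_inv(np.maximum(phi(a), phi(b)))

a, b, c = rng.normal(size=(3, d))
lhs = latent_union(latent_union(a, b), c)      # (a u b) u c
rhs = latent_union(a, latent_union(b, c))      # a u (b u c)
print("associativity gap:", np.max(np.abs(lhs - rhs)))   # ~1e-15 by construction
```

Because max is associative and phi is invertible, associativity holds exactly for any bijection, which is the sense in which the laws are respected by construction rather than merely encouraged by training.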
-
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
Authors:
Jan Laukemann,
Ahmed E. Helal,
S. Isaac Geronimo Anderson,
Fabio Checconi,
Yongseok Soh,
Jesmin Jahan Tithi,
Teresa Ranadive,
Brian J Gravelle,
Fabrizio Petrini,
Jee Choi
Abstract:
High-dimensional sparse data emerge in many critical application domains such as healthcare and cybersecurity. To extract meaningful insights from massive volumes of these multi-dimensional data, scientists employ unsupervised analysis tools based on tensor decomposition (TD) methods. However, real-world sparse tensors exhibit highly irregular shapes and data distributions, which pose significant challenges for making efficient use of modern parallel processors. This study breaks the prevailing assumption that compressing sparse tensors into coarse-grained structures or along a particular dimension/mode is more efficient than keeping them in a fine-grained, mode-agnostic form. Our novel sparse tensor representation, Adaptive Linearized Tensor Order (ALTO), encodes tensors in a compact format that can be easily streamed from memory and is amenable to both caching and parallel execution. In contrast to existing compressed tensor formats, ALTO constructs one tensor copy that is agnostic to both the mode orientation and the irregular distribution of nonzero elements. To demonstrate the efficacy of ALTO, we propose a set of parallel TD algorithms that exploit the inherent data reuse of tensor computations to substantially reduce synchronization overhead, decrease memory footprint, and improve parallel performance. Additionally, we characterize the major execution bottlenecks of TD methods on the latest Intel Xeon Scalable processors and introduce dynamic adaptation heuristics to automatically select the best algorithm based on the sparse tensor characteristics. Across a diverse set of real-world data sets, ALTO outperforms the state-of-the-art approaches, achieving more than an order-of-magnitude speedup over the best mode-agnostic formats. Compared to the best mode-specific formats, ALTO achieves 5.1X geometric mean speedup at a fraction (25%) of their storage costs.
Submitted 14 March, 2025; v1 submitted 10 March, 2024;
originally announced March 2024.
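A simplified sketch of the linearization idea: each nonzero keeps a single integer key built from all mode indices, so one sorted copy of the tensor serves every mode. ALTO's actual encoding adapts the bit allocation to the tensor's shape; the fixed bit interleaving below only illustrates the mode-agnostic principle.

```python
import numpy as np

def interleave(idx, nbits=10):
    """Pack mode indices (i, j, k, ...) into one integer, bit by bit."""
    key, nmodes = 0, len(idx)
    for b in range(nbits):
        for m, i in enumerate(idx):
            key |= ((i >> b) & 1) << (b * nmodes + m)
    return key

def deinterleave(key, nmodes, nbits=10):
    """Recover all mode indices from the single linearized key."""
    idx = [0] * nmodes
    for b in range(nbits):
        for m in range(nmodes):
            idx[m] |= ((key >> (b * nmodes + m)) & 1) << b
    return tuple(idx)

nonzeros = [((2, 5, 1), 3.0), ((7, 0, 4), -1.5), ((2, 5, 2), 0.5)]
keys = np.array([interleave(i) for i, _ in nonzeros])
vals = np.array([v for _, v in nonzeros])
order = np.argsort(keys)                 # one sorted copy serves every mode
keys, vals = keys[order], vals[order]
print([deinterleave(int(k), nmodes=3) for k in keys])
```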
-
Evolutionary Games on Infinite Strategy Sets: Convergence to Nash Equilibria via Dissipativity
Authors:
Brendon G. Anderson,
Jingqi Li,
Somayeh Sojoudi,
Murat Arcak
Abstract:
We consider evolutionary dynamics for population games in which players have a continuum of strategies at their disposal. Models in this setting amount to infinite-dimensional differential equations evolving on the manifold of probability measures. We generalize dissipativity theory for evolutionary games from finite to infinite strategy sets that are compact metric spaces, and derive sufficient conditions for the stability of Nash equilibria under the infinite-dimensional dynamics. The resulting analysis is applicable to a broad class of evolutionary games, and is modular in the sense that the pertinent conditions on the dynamics and the game's payoff structure can be verified independently. By specializing our theory to the class of monotone games, we recover as special cases existing stability results for the Brown-von Neumann-Nash and impartial pairwise comparison dynamics. We also extend our theory to models with dynamic payoffs, further broadening the applicability of our framework. Throughout our analyses, we identify and elaborate on new technical conditions that are key in extending dissipativity theory from finite to infinite strategy sets, such as compactness of the set of Nash equilibria and evolution of dynamic payoffs within a compact positively invariant set. We illustrate our theory using a variety of case studies, including a novel, continuous variant of the war of attrition game.
Submitted 22 April, 2025; v1 submitted 13 December, 2023;
originally announced December 2023.
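A minimal numerical sketch of the setting: discretize the strategy continuum into bins and integrate the Brown-von Neumann-Nash (excess-payoff) dynamics, one of the dynamics covered by the stability results above. The payoff kernel, step size, and bin count are illustrative choices, not taken from the paper.

```python
import numpy as np

n = 100
s = np.linspace(0.0, 1.0, n)              # discretized strategy set [0, 1]
x = np.full(n, 1.0 / n)                   # population state (probability vector)
K = -np.abs(s[:, None] - s[None, :])      # illustrative payoff kernel

for _ in range(2000):                     # forward-Euler integration
    F = K @ x                             # payoff F_i(x) of each strategy bin
    gamma = np.maximum(F - x @ F, 0.0)    # excess payoffs
    x += 0.05 * (gamma - x * gamma.sum()) # BNN vector field
    x = np.clip(x, 0.0, None)
    x /= x.sum()                          # keep the Euler iterate on the simplex

print("strategies carrying the most mass:", s[np.argsort(x)[-3:]])
```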
-
Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off
Authors:
Yatong Bai,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.
Submitted 3 June, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
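The mixing rule itself is a one-liner; the sketch below shows it with stubbed-in base models. The value of alpha and the stub logits are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixed_predict(x, std_model, rob_model, alpha=0.5):
    """alpha = 0 recovers the standard model, alpha = 1 the robust one."""
    p = (1 - alpha) * softmax(std_model(x)) + alpha * softmax(rob_model(x))
    return p.argmax(), p

# Stand-ins for the two pre-trained base classifiers.
std_model = lambda x: np.array([2.0, 0.1, -1.0])
rob_model = lambda x: np.array([0.5, 0.4, -0.2])
label, probs = mixed_predict(None, std_model, rob_model, alpha=0.7)
print(label, probs)
```

Since both base classifiers are pre-trained and only their output probabilities are combined, the method adds essentially no training cost.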
-
Computing Sparse Tensor Decompositions via Chapel and C++/MPI Interoperability without Intermediate I/O
Authors:
S. Isaac Geronimo Anderson,
Daniel M. Dunlavy
Abstract:
We extend an existing approach for efficient use of shared mapped memory across Chapel and C++ for graph data stored as 1-D arrays to sparse tensor data stored using a combination of 2-D and 1-D arrays. We describe the specific extensions that provide use of shared mapped memory tensor data for a particular C++ tensor decomposition tool called GentenMPI. We then demonstrate our approach on several real-world datasets, providing timing results that illustrate minimal overhead incurred using this approach. Finally, we extend our work to improve memory usage and provide convenient random access to sparse shared mapped memory tensor elements in Chapel, while still being capable of leveraging high performance implementations of tensor algorithms in C++.
Submitted 16 October, 2023;
originally announced October 2023.
-
Tight Certified Robustness via Min-Max Representations of ReLU Neural Networks
Authors:
Brendon G. Anderson,
Samuel Pfrommer,
Somayeh Sojoudi
Abstract:
The reliable deployment of neural networks in control systems requires rigorous robustness guarantees. In this paper, we obtain tight robustness certificates over convex attack sets for min-max representations of ReLU neural networks by developing a convex reformulation of the nonconvex certification problem. This is done by "lifting" the problem to an infinite-dimensional optimization over probability measures, leveraging recent results in distributionally robust optimization to solve for an optimal discrete distribution, and proving that solutions of the original nonconvex problem are generated by the discrete distribution under mild boundedness, nonredundancy, and Slater conditions. As a consequence, optimal (worst-case) attacks against the model may be solved for exactly. This contrasts prior state-of-the-art that either requires expensive branch-and-bound schemes or loose relaxation techniques. Experiments on robust control and MNIST image classification examples highlight the benefits of our approach.
Submitted 7 October, 2023;
originally announced October 2023.
-
Projected Randomized Smoothing for Certified Adversarial Robustness
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Randomized smoothing is the current state-of-the-art method for producing provably robust classifiers. While randomized smoothing typically yields robust $\ell_2$-ball certificates, recent research has generalized provable robustness to different norm balls as well as anisotropic regions. This work considers a classifier architecture that first projects onto a low-dimensional approximation of the data manifold and then applies a standard classifier. By performing randomized smoothing in the low-dimensional projected space, we characterize the certified region of our smoothed composite classifier back in the high-dimensional input space and prove a tractable lower bound on its volume. We show experimentally on CIFAR-10 and SVHN that classifiers without the initial projection are vulnerable to perturbations that are normal to the data manifold and yet are captured by the certified regions of our method. We compare the volume of our certified regions against various baselines and show that our method improves on the state-of-the-art by many orders of magnitude.
Submitted 24 September, 2023;
originally announced September 2023.
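A minimal sketch of the pipeline: a linear projection onto a low-dimensional subspace (PCA here, standing in for the learned manifold approximation), Gaussian smoothing noise added in the projected space, and a majority vote of a stub base classifier. The certification bookkeeping is omitted; data and classifier are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                # stand-in training data
P = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][:4]   # top-4 PCs

def base_classifier(z):                       # stub acting on 4-D projections
    return int(z.sum() > 0)

def smoothed_predict(x, sigma=0.5, n=1000):
    z = P @ x                                 # project onto the subspace
    votes = [base_classifier(z + sigma * rng.normal(size=z.shape))
             for _ in range(n)]
    return int(np.mean(votes) > 0.5)

print(smoothed_predict(rng.normal(size=32)))
```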
-
Analyzing the Performance Portability of Tensor Decomposition
Authors:
S. Isaac Geronimo Anderson,
Keita Teranishi,
Daniel M. Dunlavy,
Jee Choi
Abstract:
We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the SparTen software library. Our analyses reveal that a particular matrix computation, $\Phi^{(n)}$, is the critical performance bottleneck in the SparTen CP-APR MU implementation. Moreover, we find that atomic operations are not a critical bottleneck while higher cache reuse can provide a non-trivial performance improvement. We also utilize grid search on the Kokkos library parallel policy parameters to achieve 2.25x average speedup over the SparTen default for $\Phi^{(n)}$ computation on CPU and 1.70x on GPU. We conclude our investigations by comparing Kokkos implementations of the STREAM benchmark and the matricized tensor times Khatri-Rao product (MTTKRP) benchmark from the Parallel Sparse Tensor Algorithm (PASTA) benchmark suite to implementations using vendor libraries. We show that with a single implementation Kokkos achieves performance comparable to hand-tuned code for fundamental operations that make up tensor decomposition kernels on a wide range of CPU and GPU systems. Overall, we conclude that Kokkos demonstrates good performance portability for simple data-intensive operations but requires tuning for algorithms with more complex dependencies and data access patterns.
Submitted 6 July, 2023;
originally announced July 2023.
-
Optical Transformers
Authors:
Maxwell G. Anderson,
Shi-Yuan Ma,
Tianyu Wang,
Logan G. Wright,
Peter L. McMahon
Abstract:
The rapidly increasing size of deep-learning models has caused renewed and growing interest in alternatives to digital computers to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for optical computing. To test this idea, we performed small-scale optical experiments with a prototype accelerator to demonstrate that Transformer operations can run on optical hardware despite noise and errors. Using simulations, validated by our experiments, we then explored the energy efficiency of optical implementations of Transformers and identified scaling laws for model performance with respect to optical energy usage. We found that the optical energy per multiply-accumulate (MAC) scales as $\frac{1}{d}$ where $d$ is the Transformer width, an asymptotic advantage over digital systems. We conclude that with well-engineered, large-scale optical hardware, it may be possible to achieve a $100 \times$ energy-efficiency advantage for running some of the largest current Transformer models, and that if both the models and the optical hardware are scaled to the quadrillion-parameter regime, optical computers could have a $>8,000\times$ energy-efficiency advantage over state-of-the-art digital-electronic processors that achieve 300 fJ/MAC. We analyzed how these results motivate and inform the construction of future optical accelerators along with optics-amenable deep-learning approaches. With assumptions about future improvements to electronics and Transformer quantization techniques (5$\times$ cheaper memory access, double the digital--analog conversion efficiency, and 4-bit precision), we estimated that optical computers' advantage against current 300-fJ/MAC digital processors could grow to $>100,000\times$.
Submitted 20 February, 2023;
originally announced February 2023.
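The headline scaling admits a quick back-of-the-envelope check, sketched below. The optical energy constant is an assumed illustrative value chosen only to exhibit the 1/d behavior; the 300 fJ/MAC digital figure is the baseline quoted in the abstract.

```python
d_model = 16384                          # Transformer width (illustrative)
macs = d_model * d_model                 # one d x d matrix-vector product

digital_fj_per_mac = 300.0               # baseline quoted in the abstract
optical_fj_per_mac = 1.0e5 / d_model     # assumed constant / width, i.e. ~1/d

digital_j = macs * digital_fj_per_mac * 1e-15
optical_j = macs * optical_fj_per_mac * 1e-15
print(f"digital: {digital_j:.2e} J, optical: {optical_j:.2e} J, "
      f"advantage: {digital_j / optical_j:.0f}x")   # advantage grows with d
```

Because the optical cost per MAC falls with width while the digital cost is fixed, the energy advantage grows linearly with model width, which is why the projected gains are largest for the biggest models.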
-
Asymmetric Certified Robustness via Feature-Convex Neural Networks
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Julien Piet,
Somayeh Sojoudi
Abstract:
Recent works have introduced input-convex neural networks (ICNNs) as learning models with advantageous training, inference, and generalization properties linked to their convex structure. In this paper, we propose a novel feature-convex neural network architecture as the composition of an ICNN with a Lipschitz feature map in order to achieve adversarial robustness. We consider the asymmetric binary classification setting with one "sensitive" class, and for this class we prove deterministic, closed-form, and easily-computable certified robust radii for arbitrary $\ell_p$-norms. We theoretically justify the use of these models by characterizing their decision region geometry, extending the universal approximation theorem for ICNN regression to the classification setting, and proving a lower bound on the probability that such models perfectly fit even unstructured uniformly distributed data in sufficiently high dimensions. Experiments on Malimg malware classification and subsets of MNIST, Fashion-MNIST, and CIFAR-10 datasets show that feature-convex classifiers attain state-of-the-art certified $\ell_1$-radii as well as substantial $\ell_2$- and $\ell_{\infty}$-radii while being far more computationally efficient than any competitive baseline.
Submitted 10 October, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
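A minimal sketch of the input-convex building block: keeping the hidden-to-hidden weights nonnegative and the activations convex and nondecreasing makes the scalar output provably convex in the input. Layer sizes and initialization are illustrative; the Lipschitz feature map and the paper's certified-radius formula are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)          # convex and nondecreasing

class ICNN:
    def __init__(self, dims=(16, 32, 32, 1)):
        self.Wx = [0.1 * rng.normal(size=(d, dims[0])) for d in dims[1:]]
        self.Wz = [0.1 * np.abs(rng.normal(size=(dims[i + 1], dims[i])))
                   for i in range(1, len(dims) - 1)]   # nonnegative weights
        self.b = [np.zeros(d) for d in dims[1:]]

    def __call__(self, x):
        z = relu(self.Wx[0] @ x + self.b[0])
        for Wz, Wx, b in zip(self.Wz, self.Wx[1:], self.b[1:]):
            z = relu(Wz @ z + Wx @ x + b)    # skip terms keep x-affinity
        return z[0]                          # scalar output, convex in x

f = ICNN()
x, y = rng.normal(size=(2, 16))
# Spot-check convexity: f at the midpoint never exceeds the chord.
print(f(0.5 * (x + y)) <= 0.5 * (f(x) + f(y)))   # always True
```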
-
Improving the Accuracy-Robustness Trade-Off of Classifiers via Adaptive Smoothing
Authors:
Yatong Bai,
Brendon G. Anderson,
Aerin Kim,
Somayeh Sojoudi
Abstract:
While prior research has proposed a plethora of methods that build neural classifiers robust against adversarial attacks, practitioners are still reluctant to adopt them due to their unacceptably severe clean accuracy penalties. This paper significantly alleviates this accuracy-robustness trade-off by mixing the output probabilities of a standard classifier and a robust classifier, where the standard network is optimized for clean accuracy and is not robust in general. We show that the robust base classifier's confidence difference for correct and incorrect examples is the key to this improvement. In addition to providing intuitions and empirical evidence, we theoretically certify the robustness of the mixed classifier under realistic assumptions. Furthermore, we adapt an adversarial input detector into a mixing network that adaptively adjusts the mixture of the two base models, further reducing the accuracy penalty of achieving robustness. The proposed flexible method, termed "adaptive smoothing", can work in conjunction with existing or even future methods that improve clean accuracy, robustness, or adversary detection. Our empirical evaluation considers strong attack methods, including AutoAttack and adaptive attacks. On the CIFAR-100 dataset, our method achieves an 85.21% clean accuracy while maintaining a 38.72% $\ell_\infty$-AutoAttacked ($\epsilon = 8/255$) accuracy, becoming the second most robust method on the RobustBench CIFAR-100 benchmark as of submission, while improving the clean accuracy by ten percentage points compared with all listed models. The code that implements our method is available at https://github.com/Bai-YT/AdaptiveSmoothing.
Submitted 21 July, 2024; v1 submitted 29 January, 2023;
originally announced January 2023.
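This extends the fixed mixing rule of the companion paper above with a per-input weight. A minimal sketch, with the detector and both base models stubbed out; the real mixing network is trained, and the norm-threshold detector below is invented purely for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

std_model = lambda x: np.array([3.0, 0.2, -0.5])   # accurate, non-robust stub
rob_model = lambda x: np.array([1.0, 0.8, -0.1])   # robust stub

def detector(x):
    """Stub adversarial-input score in [0, 1]; in practice a trained network."""
    return 1.0 / (1.0 + np.exp(-(np.linalg.norm(x) - 5.0)))

def adaptive_predict(x):
    alpha = detector(x)                             # per-input mixture weight
    p = (1 - alpha) * softmax(std_model(x)) + alpha * softmax(rob_model(x))
    return p.argmax(), round(alpha, 3)

print(adaptive_predict(0.1 * np.ones(10)))          # clean-looking: small alpha
print(adaptive_predict(3.0 * np.ones(10)))          # suspicious: large alpha
```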
-
Certifiably Robust Reinforcement Learning through Model-Based Abstract Interpretation
Authors:
Chenxi Yang,
Greg Anderson,
Swarat Chaudhuri
Abstract:
We present a reinforcement learning (RL) framework in which the learned policy comes with a machine-checkable certificate of provable adversarial robustness. Our approach, called CAROL, learns a model of the environment. In each learning iteration, it uses the current version of this model and an external abstract interpreter to construct a differentiable signal for provable robustness. This signal is used to guide learning, and the abstract interpretation used to construct it directly leads to the robustness certificate returned at convergence. We give a theoretical analysis that bounds the worst-case accumulative reward of CAROL. We also experimentally evaluate CAROL on four MuJoCo environments with continuous state and action spaces. On these tasks, CAROL learns policies that, when contrasted with policies from the state-of-the-art robust RL algorithms, exhibit: (i) markedly enhanced certified performance lower bounds; and (ii) comparable performance under empirical adversarial attacks.
Submitted 26 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Guiding Safe Exploration with Weakest Preconditions
Authors:
Greg Anderson,
Swarat Chaudhuri,
Isil Dillig
Abstract:
In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. We present a novel neurosymbolic approach called SPICE to solve this safe exploration problem. SPICE uses an online shielding layer based on symbolic weakest preconditions to achieve a more precise safety analysis than existing tools without unduly impacting the training process. We evaluate the approach on a suite of continuous control benchmarks and show that it can achieve comparable performance to existing safe learning techniques while incurring fewer safety violations. Additionally, we present theoretical results showing that SPICE converges to the optimal safe policy under reasonable assumptions.
Submitted 27 February, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
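A minimal sketch of a weakest-precondition shield on a toy system: for 1-D dynamics $x' = x + a$ with safety constraint $|x'| \le 1$, the weakest precondition on the action is an interval, and the shield clips proposed actions into it. SPICE computes such preconditions symbolically for far richer dynamics; everything below is illustrative.

```python
def shield(x, proposed_action):
    """Clip the action into WP(|x + a| <= 1), the interval [-1 - x, 1 - x]."""
    lo, hi = -1.0 - x, 1.0 - x
    return min(max(proposed_action, lo), hi)

x = 0.7
for a_prop in [0.5, -2.5, 0.1]:
    a = shield(x, a_prop)
    x = x + a                  # safe by construction: |x| <= 1 at every step
    print(f"proposed {a_prop:+.1f} -> executed {a:+.1f}, state {x:+.2f}")
```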
-
An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models
Authors:
Brendon G. Anderson,
Tanmay Gautam,
Somayeh Sojoudi
Abstract:
In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.
Submitted 27 September, 2022; v1 submitted 15 August, 2022;
originally announced August 2022.
-
Image sensing with multilayer, nonlinear optical neural networks
Authors:
Tianyu Wang,
Mandar M. Sohoni,
Logan G. Wright,
Martin M. Stein,
Shi-Yuan Ma,
Tatsuhiro Onodera,
Maxwell G. Anderson,
Peter L. McMahon
Abstract:
Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but encoding. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have, however, been limited to linear processing, but nonlinearity is a prerequisite for depth, and multilayer NNs significantly outperform shallow NNs on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy across several representative computer-vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in real scenes. In all cases we find that the ONN's nonlinearity and depth allowed it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for incoherent-light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all natively in the optical domain.
Submitted 27 July, 2022;
originally announced July 2022.
-
Node-Variant Graph Filters in Graph Neural Networks
Authors:
Fernando Gama,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Graph neural networks (GNNs) have been successfully employed in a myriad of applications involving graph signals. Theoretical findings establish that GNNs use nonlinear activation functions to create low-eigenvalue frequency content that can be processed in a stable manner by subsequent graph convolutional filters. However, the exact shape of the frequency content created by nonlinear functions is not known and cannot be learned. In this work, we use node-variant graph filters (NVGFs) -- which are linear filters capable of creating frequencies -- as a means of investigating the role that frequency creation plays in GNNs. We show that, by replacing nonlinear activation functions by NVGFs, frequency creation mechanisms can be designed or learned. By doing so, the role of frequency creation is separated from the nonlinear nature of traditional GNNs. Simulations on graph signal processing problems are carried out to pinpoint the role of frequency creation.
Submitted 4 March, 2022; v1 submitted 31 May, 2021;
originally announced June 2021.
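A minimal sketch of a node-variant graph filter: where a classical polynomial filter applies one scalar tap per power of the shift operator, the NVGF applies a per-node tap vector, which is what lets a linear filter create new frequency content. The graph, signal, and taps below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3
upper = np.triu((rng.random((N, N)) < 0.4).astype(float), 1)
S = upper + upper.T                       # symmetric adjacency as graph shift
x = rng.normal(size=N)                    # input graph signal
H = rng.normal(size=(K, N))               # per-node taps h^(0), ..., h^(K-1)

def nvgf(S, x, H):
    y, z = np.zeros_like(x), x.copy()
    for h_k in H:                         # z holds S^k x, built iteratively
        y += h_k * z                      # diag(h^(k)) S^k x
        z = S @ z
    return y

print(nvgf(S, x, H))
```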
-
ELF-VC: Efficient Learned Flexible-Rate Video Coding
Authors:
Oren Rippel,
Alexander G. Anderson,
Kedar Tatwawadi,
Sanjay Nair,
Craig Lytle,
Lubomir Bourdev
Abstract:
While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode (I- and P-frames only) along with a considerable increase in computational efficiency. In this setting, for natural videos our approach compares favorably across the entire R-D curve under metrics PSNR, MS-SSIM and VMAF against all mainstream video standards (H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
Our contributions include a flexible-rate framework allowing a single model to cover a large and dense range of bitrates, at a negligible increase in computation and parameter count; an efficient backbone optimized for ML-based codecs; and a novel in-loop flow prediction scheme which leverages prior information towards more efficient compression.
We benchmark our method, which we call ELF-VC (Efficient, Learned and Flexible Video Coding) on popular video test sets UVG and MCL-JCV under metrics PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the current best ML codec. At the same time, on an NVIDIA Titan V GPU our approach encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18 FPS.
Submitted 29 April, 2021;
originally announced April 2021.
-
Assessing the Acceptability of a Humanoid Robot for Alzheimer's Disease and Related Dementia Care Using an Online Survey
Authors:
Fengpei Yuan,
Joel G. Anderson,
Tami Wyatt,
Ruth Palan Lopez,
Monica Crane,
Austin Montgomery,
Xiaopeng Zhao
Abstract:
In this work, an online survey was used to understand the acceptability of humanoid robots and users' needs in using these robots to assist with care among people with Alzheimer's disease and related dementias (ADRD), their family caregivers, health care professionals, and the general public. From November 12, 2020 to March 13, 2021, a total of 631 complete responses were collected, including 80 responses from people with mild cognitive impairment or ADRD, 245 responses from caregivers and health care professionals, and 306 responses from the general public. Overall, people with ADRD, caregivers, and the general public showed positive attitudes towards using the robot to assist with care for people with ADRD. The top three functions of robots required by the group of people with ADRD were reminders to take medicine, emergency call service, and helping contact medical services. Additional comments, suggestions, and concerns provided by caregivers and the general public are also discussed.
Submitted 26 April, 2021;
originally announced April 2021.
-
Suppressing simulation bias using multi-modal data
Authors:
Bogdan Kustowski,
Jim A. Gaffney,
Brian K. Spears,
Gemma J. Anderson,
Rushil Anirudh,
Peer-Timo Bremer,
Jayaraman J. Thiagarajan,
Michael K. G. Kruse,
Ryan C. Nora
Abstract:
Many problems in science and engineering require making predictions based on few observations. To build a robust predictive model, these sparse data may need to be augmented with simulated data, especially when the design space is multi-dimensional. Simulations, however, often suffer from an inherent bias. Estimation of this bias may be poorly constrained not only because of data sparsity, but also because traditional predictive models fit only one type of observed outputs, such as scalars or images, instead of all available output data modalities, which might have been acquired and simulated at great cost. To break this limitation and open up the path for multi-modal calibration, we propose to combine a novel, transfer learning technique for suppressing the bias with recent developments in deep learning, which allow building predictive models with multi-modal outputs. First, we train an initial neural network model on simulated data to learn important correlations between different output modalities and between simulation inputs and outputs. Then, the model is partially retrained, or transfer learned, to fit the experiments; a method that has never been implemented in this type of architecture. Using fewer than 10 inertial confinement fusion experiments for training, transfer learning systematically improves the simulation predictions while a simple output calibration, which we design as a baseline, makes the predictions worse. We also offer extensive cross-validation with real and carefully designed synthetic data. The method described in this paper can be applied to a wide range of problems that require transferring knowledge from simulations to the domain of experiments.
Submitted 15 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
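A minimal sketch of the freeze-and-retrain idea on a toy surrogate: pre-train on plentiful simulated data, then refit only the readout layer on a handful of biased "experimental" points, regularized toward the simulation weights. The data, the bias, and the random-feature model are all illustrative; the paper's architecture and multi-modal outputs are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Plentiful simulations vs. a handful of "experiments" with a systematic bias.
Xs = rng.uniform(-1, 1, size=(2000, 3))
ys = np.sin(Xs.sum(1))
Xe = rng.uniform(-1, 1, size=(8, 3))      # fewer than 10 experiments
ye = np.sin(Xe.sum(1)) + 0.3              # unknown simulation bias

# "Pre-train" on simulations: frozen random features + least-squares readout.
W1 = rng.normal(size=(3, 64))
feats = lambda X: relu(X @ W1)
w_sim = np.linalg.lstsq(feats(Xs), ys, rcond=None)[0]

# Transfer learn: refit only the readout on the experiments, regularized
# toward the simulation weights so 8 points cannot destroy the model.
F, lam = feats(Xe), 1.0
w_tl = np.linalg.solve(F.T @ F + lam * np.eye(64), F.T @ ye + lam * w_sim)

Xt = rng.uniform(-1, 1, size=(500, 3))
yt = np.sin(Xt.sum(1)) + 0.3              # true (biased) experimental response
for name, w in [("simulation-only", w_sim), ("transfer-learned", w_tl)]:
    rmse = float(np.sqrt(np.mean((feats(Xt) @ w - yt) ** 2)))
    print(name, "RMSE:", round(rmse, 3))
```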
-
Towards Optimal Branching of Linear and Semidefinite Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we study certifying the robustness of ReLU neural networks against adversarial input perturbations. To diminish the relaxation error suffered by the popular linear programming (LP) and semidefinite programming (SDP) certification methods, we take a branch-and-bound approach to propose partitioning the input uncertainty set and solving the relaxations on each part separately. We show that this approach reduces relaxation error, and that the error is eliminated entirely upon performing an LP relaxation with a partition intelligently designed to exploit the nature of the ReLU activations. To scale this approach to large networks, we consider using a coarser partition whereby the number of parts in the partition is reduced. We prove that computing such a coarse partition that directly minimizes the LP relaxation error is NP-hard. By instead minimizing the worst-case LP relaxation error, we develop a closed-form branching scheme in the single-hidden layer case. We extend the analysis to the SDP, where the feasible set geometry is exploited to design a branching scheme that minimizes the worst-case SDP relaxation error. Experiments on MNIST, CIFAR-10, and Wisconsin breast cancer diagnosis classifiers demonstrate significant increases in the percentages of test samples certified. By independently increasing the input size and the number of layers, we empirically illustrate under which regimes the branched LP and branched SDP are best applied. Finally, we extend our LP branching method into a multi-layer branching heuristic, which attains comparable performance to prior state-of-the-art heuristics on large-scale, deep neural network certification benchmarks.
Submitted 10 May, 2025; v1 submitted 22 January, 2021;
originally announced January 2021.
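For readers unfamiliar with the relaxations being branched, the underlying single-ReLU picture is a textbook identity (not reproduced from the paper): for $y = \max(x, 0)$ with pre-activation bounds $l < 0 < u$, the LP (triangle) relaxation is $y \ge 0$, $y \ge x$, $y \le \frac{u(x - l)}{u - l}$, and its gap $\frac{u(x - l)}{u - l} - \max(x, 0)$ is largest at $x = 0$, where it equals $\frac{-ul}{u - l} > 0$. Partitioning the input so that $x \in [l, 0]$ and $x \in [0, u]$ are relaxed separately collapses the relaxation to $y = 0$ and $y = x$ respectively, eliminating the error entirely; the branching schemes above choose such partitions to maximize this effect subject to a budget on the number of parts.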
-
Meaningful uncertainties from deep neural network surrogates of large-scale numerical simulations
Authors:
Gemma J. Anderson,
Jim A. Gaffney,
Brian K. Spears,
Peer-Timo Bremer,
Rushil Anirudh,
Jayaraman J. Thiagarajan
Abstract:
Large-scale numerical simulations are used across many scientific disciplines to facilitate experimental development and provide insights into underlying physical processes, but they come with a significant computational cost. Deep neural networks (DNNs) can serve as highly-accurate surrogate models, with the capacity to handle diverse datatypes, offering tremendous speed-ups for prediction and many other downstream tasks. An important use-case for these surrogates is the comparison between simulations and experiments; prediction uncertainty estimates are crucial for making such comparisons meaningful, yet standard DNNs do not provide them. In this work we define the fundamental requirements for a DNN to be useful for scientific applications, and demonstrate a general variational inference approach to equip predictions of scalar and image data from a DNN surrogate model trained on inertial confinement fusion simulations with calibrated Bayesian uncertainties. Critically, these uncertainties are interpretable, meaningful, and preserve physics correlations in the predicted quantities.
Submitted 26 October, 2020;
originally announced October 2020.
-
Certifying Neural Network Robustness to Random Input Noise from Samples
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial input uncertainty, but researchers have recently shown a need for methods that consider random uncertainty. In this paper, we propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to replace the optimization constraints. The resulting optimization reduces to a linear program with an analytical solution. Furthermore, we develop a sufficient condition on the number of samples needed to make the misclassification bound hold with overwhelming probability. Our case studies on MNIST classifiers show that this method is able to certify a uniform infinity-norm uncertainty region with a radius nearly 50 times larger than what the current state-of-the-art method can certify.
Submitted 25 January, 2023; v1 submitted 15 October, 2020;
originally announced October 2020.
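A minimal sketch of the sampling idea: draw noise realizations, count classification-margin violations, and turn the count into a high-confidence upper bound on the misclassification probability. The bound below uses a simple Hoeffding tail in place of the paper's chance-constrained LP, and the classifier is a stub.

```python
import numpy as np

rng = np.random.default_rng(0)

def margin(x):
    """Stub classifier margin: positive means correctly classified."""
    logits = np.array([x.sum(), 0.5 * x.sum() + 0.2])
    return logits[0] - logits[1]

x0, radius, n = 0.2 * np.ones(16), 0.1, 20000
noise = rng.uniform(-radius, radius, size=(n, 16))  # uniform inf-norm noise
violations = sum(margin(x0 + e) <= 0 for e in noise)

p_hat = violations / n
eps = np.sqrt(np.log(1 / 1e-3) / (2 * n))  # one-sided Hoeffding, 99.9% conf.
print(f"empirical rate {p_hat:.4f}, certified upper bound {p_hat + eps:.4f}")
```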
-
Data-Driven Certification of Neural Networks with Random Input Noise
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial or worst-case inputs, but researchers have recently shown a need for methods that consider random input noise. In this paper, we examine the setting where inputs are subject to random noise coming from an arbitrary probability distribution. We propose a robustness certification method that lower-bounds the probability that network outputs are safe. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to make the optimization constraints tractable. We develop sufficient conditions for the resulting optimization to be convex, as well as on the number of samples needed to make the robustness bound hold with overwhelming probability. We show for a special case that the proposed optimization reduces to an intuitive closed-form solution. Case studies on synthetic, MNIST, and CIFAR-10 networks experimentally demonstrate that this method is able to certify robustness against various input noise regimes over larger uncertainty regions than prior state-of-the-art techniques.
Submitted 25 January, 2023; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Authors:
Greg Anderson,
Abhinav Verma,
Isil Dillig,
Swarat Chaudhuri
Abstract:
We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
Submitted 26 October, 2020; v1 submitted 26 September, 2020;
originally announced September 2020.
-
Designing Accurate Emulators for Scientific Processes using Calibration-Driven Deep Models
Authors:
Jayaraman J. Thiagarajan,
Bindya Venkatesh,
Rushil Anirudh,
Peer-Timo Bremer,
Jim Gaffney,
Gemma Anderson,
Brian Spears
Abstract:
Predictive models that accurately emulate complex scientific processes can achieve exponential speed-ups over numerical simulators or experiments, and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning (ML) methods, such as deep neural networks, to build data-driven emulators. While the majority of existing efforts has focused on tailoring off-the-shelf ML solutions to better suit the scientific problem at hand, we study an often overlooked, yet important, problem of choosing loss functions to measure the discrepancy between observed data and the predictions from a model. Due to lack of better priors on the expected residual structure, in practice, simple choices such as the mean squared error and the mean absolute error are made. However, the inherent symmetric noise assumption made by these loss functions makes them inappropriate in cases where the data is heterogeneous or when the noise distribution is asymmetric. We propose Learn-by-Calibrating (LbC), a novel deep learning approach based on interval calibration for designing emulators in scientific applications, that are effective even with heterogeneous data and are robust to outliers. Using a large suite of use-cases, we show that LbC provides significant improvements in generalization error over widely-adopted loss function choices, achieves high-quality emulators even in small data regimes and more importantly, recovers the inherent noise structure without any explicit priors.
Submitted 5 May, 2020;
originally announced May 2020.
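A minimal sketch in the spirit of interval calibration: fit lower and upper quantile heads with pinball losses on heteroscedastic data, so the learned interval adapts to asymmetric, input-dependent noise. This is generic quantile regression for illustration, not the authors' LbC objective.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4000, 1))
noise = 0.2 * rng.standard_normal(4000) * (1 + X[:, 0] ** 2)  # heteroscedastic
y = np.sin(2 * X[:, 0]) + noise

feats = np.hstack([X, X ** 2, X ** 3, np.ones_like(X)])       # cubic features
w_lo, w_hi = np.zeros(4), np.zeros(4)
for _ in range(3000):                     # subgradient descent on both heads
    for q, w in [(0.05, w_lo), (0.95, w_hi)]:
        res = y - feats @ w
        grad = -feats.T @ np.where(res > 0, q, q - 1.0) / len(y)  # pinball
        w -= 0.05 * grad

covered = np.mean((feats @ w_lo <= y) & (y <= feats @ w_hi))
print(f"interval coverage: {covered:.2f} (target 0.90)")
```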
-
Tightened Convex Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of certifying the robustness of neural networks to perturbed and adversarial input data. Such certification is imperative for the application of neural networks in safety-critical decision-making and control systems. Certification techniques using convex optimization have been proposed, but they often suffer from relaxation errors that void the certificate. Our work exploits the structure of ReLU networks to improve relaxation errors through a novel partition-based certification procedure. The proposed method is proven to tighten existing linear programming relaxations, and asymptotically achieves zero relaxation error as the partition is made finer. We develop a finite partition that attains zero relaxation error and use the result to derive a tractable partitioning scheme that minimizes the worst-case relaxation error. Experiments using real data show that the partitioning procedure is able to issue robustness certificates in cases where prior methods fail. Consequently, partition-based certification procedures are found to provide an intuitive, effective, and theoretically justified method for tightening existing convex relaxation techniques.
Submitted 17 September, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of unsupervised video object segmentation via background subtraction. Specifically, we pose the nonsemantic extraction of a video's moving objects as a nonconvex optimization problem via a sum of sparse and low-rank matrices. The resulting formulation, a nonnegative variant of robust principal component analysis, is more computationally tractable than its commonly employed convex relaxation, although not generally solvable to global optimality. In spite of this limitation, we derive intuitive and interpretable conditions on the video data under which the uniqueness and global optimality of the object segmentation are guaranteed using local search methods. We illustrate these novel optimality criteria through example segmentations using real video data.
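In symbols (our notation, reconstructed from the abstract): with the video frames stacked as columns of a matrix $M$, the segmentation is posed as a nonnegative robust-PCA decomposition
\[ \min_{L,\,S \ge 0} \;\; \operatorname{rank}(L) + \lambda\,\|S\|_0 \quad \text{subject to} \quad M = L + S, \]
where the low-rank $L$ captures the static background, the sparse $S$ the moving objects, and $\lambda$ trades the two terms off; the commonly employed convex relaxation replaces the rank by the nuclear norm and the $\ell_0$ count by the $\ell_1$ norm.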
Submitted 22 February, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Optimization and Abstraction: A Synergistic Approach for Analyzing Neural Network Robustness
Authors:
Greg Anderson,
Shankara Pailoor,
Isil Dillig,
Swarat Chaudhuri
Abstract:
In recent years, the notion of local robustness (or robustness for short) has emerged as a desirable property of deep neural networks. Intuitively, robustness means that small perturbations to an input do not cause the network to misclassify. In this paper, we present a novel algorithm for verifying robustness properties of neural networks. Our method synergistically combines gradient-based optimization methods for counterexample search with abstraction-based proof search to obtain a sound and (δ-)complete decision procedure. Our method also employs a data-driven approach to learn a verification policy that guides abstract interpretation during proof search. We have implemented the proposed approach in a tool called Charon and experimentally evaluated it on hundreds of benchmarks. Our experiments show that the proposed approach significantly outperforms three state-of-the-art tools, namely AI^2, Reluplex, and ReluVal.
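To make the synergy concrete, below is a self-contained toy in the same spirit (not Charon's code): interval abstract interpretation plays the proof-search role, corner/random sampling stands in for the gradient-based counterexample search, and input splitting refines the abstraction when neither side can decide.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-layer ReLU network (weights are illustrative).
    W1 = np.array([[1.0, -1.0], [0.5, 2.0]]); b1 = np.array([0.1, -0.2])
    W2 = np.array([[1.0, 1.0]]);              b2 = np.array([0.0])

    def net(x):
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

    def interval_bounds(lo, hi):
        # Abstraction side: interval arithmetic through affine and ReLU layers.
        def affine(W, b, lo, hi):
            c, r = (lo + hi) / 2.0, (hi - lo) / 2.0
            return W @ c + b - np.abs(W) @ r, W @ c + b + np.abs(W) @ r
        lo, hi = affine(W1, b1, lo, hi)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
        return affine(W2, b2, lo, hi)

    def verify(lo, hi, threshold, depth=0):
        # Optimization side (stand-in): sample corners, midpoint, random points.
        for x in [lo, hi, (lo + hi) / 2.0] + list(rng.uniform(lo, hi, size=(16, lo.size))):
            if net(x)[0] > threshold:
                return "falsified"
        out_lo, out_hi = interval_bounds(lo, hi)
        if out_hi[0] <= threshold:
            return "verified"
        if depth >= 12:
            return "unknown"
        # Refinement: split the widest input dimension, verify both halves.
        d = int(np.argmax(hi - lo))
        mid = (lo[d] + hi[d]) / 2.0
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[d] = mid; lo_right[d] = mid
        results = {verify(lo, hi_left, threshold, depth + 1),
                   verify(lo_right, hi, threshold, depth + 1)}
        if "falsified" in results:
            return "falsified"
        return "verified" if results == {"verified"} else "unknown"

    print(verify(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), threshold=5.0))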
Submitted 1 May, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Learned Video Compression
Authors:
Oren Rippel,
Sanjay Nair,
Carissa Lew,
Steve Branson,
Alexander G. Anderson,
Lubomir Bourdev
Abstract:
We present a new algorithm for video coding, learned end-to-end for the low-latency mode. In this setting, our approach outperforms all existing video codecs across nearly the entire bitrate range. To our knowledge, this is the first ML-based method to do so.
We evaluate our approach on standard video compression test sets of varying resolutions, and benchmark against all mainstream commercial codecs, in the low-latency mode. On standard-definition videos, relative to our algorithm, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger. Furthermore, our approach does not suffer from blocking artifacts and pixelation, and thus produces videos that are more visually pleasing.
We propose two main contributions. The first is a novel architecture for video compression, which (1) generalizes motion estimation to perform any learned compensation beyond simple translations, (2) rather than strictly relying on previously transmitted reference frames, maintains a state of arbitrary information learned by the model, and (3) enables jointly compressing all transmitted signals (such as optical flow and residual).
The second is a framework for ML-based spatial rate control: namely, a mechanism for assigning variable bitrates across space for each frame. This is a critical component for video coding, which to our knowledge had not been developed within a machine learning setting.
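As a rough illustration of what spatial rate control means operationally, the sketch below spreads a frame's bit budget across blocks in proportion to local residual energy (a hand-written heuristic for illustration only; the paper learns the allocation):

    import numpy as np

    def allocate_bits(residual, total_bits, block=16):
        # Tile the residual into block x block patches.
        h, w = residual.shape
        tiles = residual[:h - h % block, :w - w % block].reshape(
            h // block, block, w // block, block)
        energy = (tiles ** 2).sum(axis=(1, 3))   # per-block residual energy
        weights = energy / energy.sum()          # normalized importance
        return weights * total_bits              # bits assigned to each block

    frame_residual = np.random.default_rng(0).standard_normal((64, 64))
    print(allocate_bits(frame_residual, total_bits=8000).round())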
Submitted 16 November, 2018;
originally announced November 2018.
-
Learning Abstractions for Program Synthesis
Authors:
Xinyu Wang,
Greg Anderson,
Isil Dillig,
K. L. McMillan
Abstract:
Many example-guided program synthesis techniques use abstractions to prune the search space. While abstraction-based synthesis has proven to be very powerful, a domain expert needs to provide a suitable abstract domain, together with the abstract transformers of each DSL construct. However, coming up with useful abstractions can be non-trivial, as it requires both domain expertise and knowledge about the synthesizer. In this paper, we propose a new technique for learning abstractions that are useful for instantiating a general synthesis framework in a new domain. Given a DSL and a small set of training problems, our method uses tree interpolation to infer reusable predicate templates that speed up synthesis in a given domain. Our method also learns suitable abstract transformers by solving a certain kind of second-order constraint solving problem in a data-driven way. We have implemented the proposed method in a tool called ATLAS and evaluate it in the context of the BLAZE meta-synthesizer. Our evaluation shows that (a) ATLAS can learn useful abstract domains and transformers from a few training problems, and (b) the abstractions learned by ATLAS allow BLAZE to achieve significantly better results compared to manually crafted abstractions.
Submitted 11 April, 2018;
originally announced April 2018.
-
The High-Dimensional Geometry of Binary Neural Networks
Authors:
Alexander G. Anderson,
Cory P. Berg
Abstract:
Recent research has shown that one can train a neural network with binary weights and activations at train time by augmenting the weights with a high-precision continuous latent variable that accumulates small changes from stochastic gradient descent. However, there is a dearth of theoretical analysis to explain why we can effectively capture the features in our data with binary weights and activations. Our main result is that the neural networks with binary weights and activations trained using the method of Courbariaux, Hubara et al. (2016) work because of the high-dimensional geometry of binary vectors. In particular, the ideal continuous vectors that extract features in the intermediate representations of these BNNs are well-approximated by binary vectors in the sense that dot products are approximately preserved. Compared to previous research that demonstrated the viability of such BNNs, our work explains why these BNNs work in terms of this high-dimensional geometry. Our theory serves as a foundation for understanding not only BNNs but also a variety of methods that seek to compress traditional neural networks. Furthermore, a better understanding of multilayer binary neural networks serves as a starting point for generalizing BNNs to other neural network architectures such as recurrent neural networks.
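The dot-product preservation claim is easy to check numerically: for a random Gaussian vector, the cosine similarity between the vector and its sign-binarization concentrates near $\sqrt{2/\pi} \approx 0.80$ as the dimension grows. A minimal demonstration (ours, not the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (16, 256, 4096):
        w = rng.standard_normal(d)      # "ideal" continuous weight vector
        b = np.sign(w)                  # its binary approximation
        cos = w @ b / (np.linalg.norm(w) * np.linalg.norm(b))
        print(d, round(cos, 3))         # approaches sqrt(2/pi) ~ 0.798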
Submitted 19 May, 2017;
originally announced May 2017.
-
Wide & Deep Learning for Recommender Systems
Authors:
Heng-Tze Cheng,
Levent Koc,
Jeremiah Harmsen,
Tal Shaked,
Tushar Chandra,
Hrishi Aradhye,
Glen Anderson,
Greg Corrado,
Wei Chai,
Mustafa Ispir,
Rohan Anil,
Zakaria Haque,
Lichan Hong,
Vihan Jain,
Xiaobing Liu,
Hemal Shah
Abstract:
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
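A minimal sketch of the joint scoring function described here: a linear model over sparse cross-product features (the memorization path) plus an MLP over dense embeddings (the generalization path), combined under a single sigmoid. Shapes and initializations are illustrative, not the production model:

    import numpy as np

    rng = np.random.default_rng(0)
    n_cross, embed_dim, hidden = 1000, 32, 64

    w_wide = rng.standard_normal(n_cross) * 0.01          # wide (linear) weights
    W1 = rng.standard_normal((hidden, embed_dim)) * 0.1   # deep layer 1
    W2 = rng.standard_normal(hidden) * 0.1                # deep output layer

    def score(cross_idx, embedding):
        wide = w_wide[cross_idx].sum()                 # memorization path
        deep = W2 @ np.maximum(W1 @ embedding, 0.0)    # generalization path
        logit = wide + deep                            # jointly trained in the paper
        return 1.0 / (1.0 + np.exp(-logit))

    print(score(cross_idx=[3, 17, 512], embedding=rng.standard_normal(embed_dim)))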
Submitted 24 June, 2016;
originally announced June 2016.
-
DeepMovie: Using Optical Flow and Deep Neural Networks to Stylize Movies
Authors:
Alexander G. Anderson,
Cory P. Berg,
Daniel P. Mossing,
Bruno A. Olshausen
Abstract:
A recent paper by Gatys et al. describes a method for rendering an image in the style of another image. First, they use convolutional neural network features to build a statistical model for the style of an image. Then they create a new image with the content of one image but the style statistics of another image. Here, we extend this method to render a movie in a given artistic style. The naive solution that independently renders each frame produces poor results because the features of the style move substantially from one frame to the next. The other naive method that initializes the optimization for the next frame using the rendered version of the previous frame also produces poor results because the features of the texture stay fixed relative to the frame of the movie instead of moving with objects in the scene. The main contribution of this paper is to use optical flow to initialize the style transfer optimization so that the texture features move with the objects in the video. Finally, we suggest a method to incorporate optical flow explicitly into the cost function.
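The key step is easy to sketch: warp the previous stylized frame along the optical flow and use the result to initialize the next frame's optimization, so texture moves with scene objects. A minimal version with integer flow and nearest-neighbor lookup (illustrative, not the authors' code):

    import numpy as np

    def warp(image, flow):
        # Backward-warp: each output pixel pulls from its flow-displaced source.
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(ys - flow[..., 1].astype(int), 0, h - 1)
        src_x = np.clip(xs - flow[..., 0].astype(int), 0, w - 1)
        return image[src_y, src_x]

    prev_stylized = np.random.rand(120, 160, 3)
    flow = np.ones((120, 160, 2)) * 2        # stand-in for estimated optical flow
    init_next = warp(prev_stylized, flow)    # initialization; style loss is then optimized
    print(init_next.shape)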
Submitted 26 May, 2016;
originally announced May 2016.
-
An Efficient Algorithm for Unweighted Spectral Graph Sparsification
Authors:
David G. Anderson,
Ming Gu,
Christopher Melgaard
Abstract:
Spectral graph sparsification has emerged as a powerful tool in the analysis of large-scale networks by reducing the overall number of edges, while maintaining a comparable graph Laplacian matrix. In this paper, we present an efficient algorithm for the construction of a new type of spectral sparsifier, the unweighted spectral sparsifier. Given a general undirected and unweighted graph $G = (V, E)$ and an integer $\ell < |E|$ (the number of edges in $E$), we compute an unweighted graph $H = (V, F)$ with $F \subset E$ and $|F| = \ell$ such that for every $x \in \mathbb{R}^{V}$ \[ \frac{x^T L_G x}{\kappa} \leq x^T L_H x \leq x^T L_G x, \] where $L_G$ and $L_H$ are the Laplacian matrices for $G$ and $H$, respectively, and $\kappa \geq 1$ is a slowly varying function of $|V|$, $|E|$, and $\ell$. This work addresses the open question of the existence of unweighted graph sparsifiers for unweighted graphs. Additionally, our algorithm can efficiently compute unweighted graph sparsifiers for weighted graphs, leading to sparsified graphs that retain the weights of the original graphs.
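The guarantee is straightforward to probe numerically: for a toy graph $G$ and an arbitrarily chosen edge subset $F$, the upper inequality holds automatically (dropping unweighted edges can only decrease the quadratic form), and the smallest feasible $\kappa$ can be estimated by sampling. An illustrative check (the paper's algorithm, by contrast, chooses $F$ so that $\kappa$ is small):

    import numpy as np

    def laplacian(n, edges):
        L = np.zeros((n, n))
        for u, v in edges:
            L[u, u] += 1.0; L[v, v] += 1.0
            L[u, v] -= 1.0; L[v, u] -= 1.0
        return L

    E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 4-cycle plus one chord
    F = [(0, 1), (1, 2), (2, 3), (3, 0)]           # H keeps only the cycle
    L_G, L_H = laplacian(4, E), laplacian(4, F)

    rng = np.random.default_rng(1)
    ratios = []
    for _ in range(2000):
        x = rng.standard_normal(4)
        x -= x.mean()                   # project out the all-ones nullspace
        ratios.append((x @ L_G @ x) / (x @ L_H @ x))
    print("empirical kappa is about", max(ratios))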
Submitted 15 December, 2014; v1 submitted 15 October, 2014;
originally announced October 2014.
-
EACOF: A Framework for Providing Energy Transparency to enable Energy-Aware Software Development
Authors:
Hayden Field,
Glen Anderson,
Kerstin Eder
Abstract:
Making energy consumption data accessible to software developers is an essential step towards energy-efficient software engineering. The presence of various bespoke and incompatible methods of instrumentation to obtain energy readings is currently limiting the widespread use of energy data in software development. This paper presents EACOF, a modular Energy-Aware Computing Framework that provides a layer of abstraction between sources of energy data and the applications that exploit them. EACOF replaces platform-specific instrumentation through two APIs: one accepts energy data into the framework, while the other provides application software with access to that data. This allows developers to profile their code for energy consumption in an easy and portable manner using simple API calls. We outline the design of our framework and provide details of the API functionality. In a use case, where we investigate the impact of data bit width on the energy consumption of various sorting algorithms, we demonstrate that the data obtained using EACOF provides interesting, sometimes counter-intuitive, insights. All the code is available online under an open source license at http://github.com/eacof
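Purely as a shape sketch of the two-sided API the abstract describes (a provider side that feeds readings in, a consumer side that reads them out), the names below are invented for illustration and are NOT EACOF's actual interface:

    # Hypothetical sketch only; not EACOF's real API.
    class EnergyFramework:
        def __init__(self):
            self.samples = []

        # Provider API: instrumentation pushes readings into the framework.
        def publish(self, source, joules, timestamp):
            self.samples.append((source, joules, timestamp))

        # Consumer API: application code queries energy between two instants.
        def energy_between(self, t0, t1):
            return sum(j for _, j, t in self.samples if t0 <= t <= t1)

    fw = EnergyFramework()
    fw.publish("cpu", 0.8, 0.0)
    fw.publish("cpu", 1.1, 0.5)
    print(fw.energy_between(0.0, 1.0))   # profile a code region's consumption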
Submitted 31 May, 2014;
originally announced June 2014.
-
Utility-based Decision-making in Distributed Systems Modelling
Authors:
Gabrielle Anderson,
Matthew Collinson,
David Pym
Abstract:
We consider a calculus of resources and processes as a basis for modelling decision-making in multi-agent systems. The calculus represents the regulation of agents' choices using utility functions that take account of context. Associated with the calculus is a (Hennessy-Milner-style) context-sensitive modal logic of state. As an application, we show how a notion of 'trust domain' can be defined for multi-agent systems.
Submitted 23 October, 2013;
originally announced October 2013.
-
Application of Global and One-Dimensional Local Optimization to Operating System Scheduler Tuning
Authors:
George Anderson,
Tshilidzi Marwala,
Fulufhelo Vincent Nelwamondo
Abstract:
This paper describes a comparative study of global and one-dimensional local optimization methods applied to operating system scheduler tuning. The operating system scheduler we use is the Linux 2.6.23 Completely Fair Scheduler (CFS) running in a simulator (LinSched). We have ported the Hackbench scheduler benchmark to this simulator and use it as the workload. The global optimization approach we use is Particle Swarm Optimization (PSO). We make use of Response Surface Methodology (RSM) to specify optimal parameters for our PSO implementation. The one-dimensional local optimization approach we use is the Golden Section method. In order to use this approach, we convert the scheduler tuning problem from one involving the setting of three parameters to one involving the manipulation of a single parameter. Our results show that the global optimization approach yields a better response, but the one-dimensional optimization approach converges to a solution faster than the global optimization approach.
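For reference, the one-dimensional method in its textbook form: golden-section search for the minimum of a unimodal function on an interval. The scheduler-tuning objective is stood in for by a toy quadratic (illustrative only):

    import math

    def golden_section(f, a, b, tol=1e-6):
        # Shrink [a, b] by the golden ratio each step, keeping the minimum inside.
        inv_phi = (math.sqrt(5) - 1) / 2          # ~0.618
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        while b - a > tol:
            if f(c) < f(d):                        # minimum lies in [a, d]
                b, d = d, c
                c = b - inv_phi * (b - a)
            else:                                  # minimum lies in [c, b]
                a, c = c, d
                d = a + inv_phi * (b - a)
        return (a + b) / 2

    # Stand-in objective: completion time as a function of one scheduler parameter.
    print(golden_section(lambda x: (x - 3.2) ** 2 + 1.0, 0.0, 10.0))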
Submitted 17 December, 2010;
originally announced December 2010.
-
Use of Data Mining in Scheduler Optimization
Authors:
George Anderson,
Tshilidzi Marwala,
Fulufhelo V. Nelwamondo
Abstract:
The operating system's role in a computer system is to manage the various resources. One of these resources is the Central Processing Unit. It is managed by a component of the operating system called the CPU scheduler. Schedulers are optimized for typical workloads expected to run on the platform. However, a single scheduler may not be appropriate for all workloads. That is, a scheduler may schedule a workload such that the completion time is minimized, but when another type of workload is run on the platform, scheduling and therefore completion time will not be optimal; a different scheduling algorithm, or a different set of parameters, may work better. Several approaches to solving this problem have been proposed. The objective of this survey is to summarize the approaches based on data mining, which are available in the literature. In addition to solutions that can be directly utilized for solving this problem, we are interested in data mining research in related areas that have potential for use in operating system scheduling. We also explain general technical issues involved in scheduling in modern computers, including parallel scheduling issues related to multi-core CPUs. We propose a taxonomy that classifies the scheduling approaches we discuss into different categories.
Submitted 8 November, 2010;
originally announced November 2010.
-
SASE: Complex Event Processing over Streams
Authors:
Daniel Gyllstrom,
Eugene Wu,
Hee-Jin Chae,
Yanlei Diao,
Patrick Stahlberg,
Gordon Anderson
Abstract:
RFID technology is gaining adoption on an increasing scale for tracking and monitoring purposes. Wide deployments of RFID devices will soon generate an unprecedented volume of data. Emerging applications require the RFID data to be filtered and correlated for complex pattern detection and transformed to events that provide meaningful, actionable information to end applications. In this work, we design and develop SASE, a complex event processing system that performs such data-information transformation over real-time streams. We design a complex event language for specifying application logic for such transformation, devise new query processing techniques to efficiently implement the language, and develop a comprehensive system that collects, cleans, and processes RFID data for delivery of relevant, timely information as well as storing necessary data for future querying. We demonstrate an initial prototype of SASE through a real-world retail management scenario.
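As a toy illustration of the kind of pattern such a system evaluates (plain Python, not SASE's query language): in the retail scenario, a shelf reading followed by an exit reading for the same tag, with no register reading in between, flags a possible shoplifting event.

    def detect(stream):
        pending = {}                                   # tag -> shelf-reading time
        for kind, tag, t in stream:                    # events arrive in time order
            if kind == "shelf":
                pending[tag] = t
            elif kind == "register":
                pending.pop(tag, None)                 # purchase observed: clear state
            elif kind == "exit" and tag in pending:
                yield (tag, pending.pop(tag), t)       # shelf -> exit, no register between

    events = [("shelf", "tag1", 1), ("register", "tag1", 5),
              ("shelf", "tag2", 2), ("exit", "tag2", 9), ("exit", "tag1", 10)]
    print(list(detect(events)))   # -> [('tag2', 2, 9)]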
Submitted 22 December, 2006;
originally announced December 2006.