-
MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking
Authors:
Yalin E. Sagduyu,
Kemal Davaslioglu,
Tugba Erpek,
Sastry Kompella,
Gustave Anderson,
Jonathan Ashdown
Abstract:
This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using the 5G Positioning Reference Signal (PRS). We consider a distributed architecture in which one gNB transmits a periodic OFDM-PRS waveform while multiple spatially separated receivers exploit the same signal for target detection, parameter estimation, and tracking. A coherent cross-ambiguity function (CAF) is evaluated to form a range-Doppler map from which the bistatic delay and radial velocity are extracted for every target. For a single target, the resulting bistatic delays are fused through nonlinear least-squares trilateration, yielding a geometric position estimate, and a regularized linear inversion of the radial-speed equations yields a two-dimensional velocity vector, from which speed and heading are obtained. The approach is applied to 2D and 3D settings, extended to account for time synchronization bias, and generalized to multiple targets by resolving target association. The sequence of position-velocity estimates is then fed to standard and extended Kalman filters to obtain smoothed tracks. Our results show high-fidelity moving-target detection, positioning, and tracking using 5G PRS signals for multistatic ISAC.
Submitted 3 July, 2025;
originally announced July 2025.
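As an editor's illustration of the delay-fusion step described in the abstract, the sketch below solves the nonlinear least-squares trilateration from bistatic delays in the 2D, single-target case. The geometry, receiver count, and noise level are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical geometry: one gNB transmitter, three receivers, one target.
import numpy as np
from scipy.optimize import least_squares

c = 3e8                                    # propagation speed (m/s)
tx = np.array([0.0, 0.0])                  # transmitter position
rxs = np.array([[500.0, 0.0],              # spatially separated receivers
                [0.0, 500.0],
                [400.0, 400.0]])
target = np.array([120.0, 260.0])          # ground truth, unknown to the solver

def bistatic_delay(p, rx):
    """Delay of the tx -> target -> rx path, as extracted from a CAF peak."""
    return (np.linalg.norm(p - tx) + np.linalg.norm(p - rx)) / c

rng = np.random.default_rng(0)
meas = np.array([bistatic_delay(target, rx) for rx in rxs])
meas += rng.normal(0.0, 1e-9, meas.shape)  # ~1 ns timing jitter

def residuals(p):
    return np.array([bistatic_delay(p, rx) for rx in rxs]) - meas

est = least_squares(residuals, x0=np.array([250.0, 250.0])).x
print("estimated position:", est)          # close to (120, 260)
```

Each measured delay constrains the target to an ellipse with foci at the transmitter and that receiver; with three or more receivers, the least-squares intersection of these ellipses pins down the position.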
-
EHRs Data Harmonization Platform, an easy-to-use shiny app based on recodeflow for harmonizing and deriving clinical features
Authors:
Arian Aminoleslami,
Geoffrey M. Anderson,
Davide Chicco
Abstract:
Electronic health records (EHRs) contain important longitudinal information on individuals who have received medical care. Traditionally, EHRs have been used to support a wide range of administrative activities such as billing and clinical workflow, but, given the depth and breadth of clinical and demographic data they contain, they are increasingly being used to provide real-world data for research. Although EHR data have enormous research potential, the full realization of that potential requires a data management strategy that extracts from large EHR databases, collected from a range of care settings and time periods, well-documented research-relevant data that can be used by different researchers. A common, well-documented data management strategy for EHRs will support reproducible research, and sharing documentation on research variables derived from EHR variables is important to open science. In this short paper, we describe the EHRs Data Harmonization Platform. The platform is based on an easy-to-use web app, publicly available at https://poxotn-arian-aminoleslami.shinyapps.io/Arian/ and as a standalone software package at https://github.com/ArianAminoleslami/EHRs-Data-Harmonization-Platform, that is linked to an existing R library for data harmonization called recodeflow. The platform can be used to extract, document, and harmonize variables from EHRs, and it can also be used to document and share research variables that have been derived from those EHR data.
Submitted 15 November, 2024;
originally announced November 2024.
-
The Llama 3 Herd of Models
Authors:
Aaron Grattafiori,
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Alex Vaughan,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere
, et al. (536 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Submitted 23 November, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Transport of Algebraic Structure to Latent Embeddings
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Machine learning often aims to produce latent embeddings of inputs which lie in a larger, abstract mathematical space. For example, in the field of 3D modeling, subsets of Euclidean space can be embedded as vectors using implicit neural representations. Such subsets also have a natural algebraic structure including operations (e.g., union) and corresponding laws (e.g., associativity). How can we learn to "union" two sets using only their latent embeddings while respecting associativity? We propose a general procedure for parameterizing latent space operations that are provably consistent with the laws on the input space. This is achieved by learning a bijection from the latent space to a carefully designed mirrored algebra which is constructed on Euclidean space in accordance with desired laws. We evaluate these structural transport nets for a range of mirrored algebras against baselines that operate directly on the latent space. Our experiments provide strong evidence that respecting the underlying algebraic structure of the input space is key for learning accurate and self-consistent operations.
Submitted 26 May, 2024;
originally announced May 2024.
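A minimal sketch of the core construction: pick a mirrored algebra on Euclidean space whose operation satisfies the desired laws (elementwise max is associative, commutative, and idempotent), and define the latent operation by conjugating with a bijection. Here a fixed random invertible linear map stands in for the learned bijection; everything below is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) + d * np.eye(d)    # well-conditioned, invertible
W_inv = np.linalg.inv(W)

phi = lambda z: W @ z                          # stands in for the learned bijection
phi_inv = lambda z: W_inv @ z

def latent_union(a, b):
    """Union in latent space: conjugate an elementwise max by the bijection."""
    return phi_inv(np.maximum(phi(a), phi(b)))

a, b, c = rng.normal(size=(3, d))
lhs = latent_union(latent_union(a, b), c)      # (a u b) u c
rhs = latent_union(a, latent_union(b, c))      # a u (b u c)
print("associativity gap:", np.max(np.abs(lhs - rhs)))   # ~1e-15 by construction
```

Because max is associative and phi is invertible, associativity holds exactly for any bijection, which is the sense in which the laws are respected by construction rather than merely encouraged by training.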
-
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
Authors:
Jan Laukemann,
Ahmed E. Helal,
S. Isaac Geronimo Anderson,
Fabio Checconi,
Yongseok Soh,
Jesmin Jahan Tithi,
Teresa Ranadive,
Brian J Gravelle,
Fabrizio Petrini,
Jee Choi
Abstract:
High-dimensional sparse data emerge in many critical application domains such as healthcare and cybersecurity. To extract meaningful insights from massive volumes of these multi-dimensional data, scientists employ unsupervised analysis tools based on tensor decomposition (TD) methods. However, real-world sparse tensors exhibit highly irregular shapes and data distributions, which pose significant challenges for making efficient use of modern parallel processors. This study breaks the prevailing assumption that compressing sparse tensors into coarse-grained structures or along a particular dimension/mode is more efficient than keeping them in a fine-grained, mode-agnostic form. Our novel sparse tensor representation, Adaptive Linearized Tensor Order (ALTO), encodes tensors in a compact format that can be easily streamed from memory and is amenable to both caching and parallel execution. In contrast to existing compressed tensor formats, ALTO constructs one tensor copy that is agnostic to both the mode orientation and the irregular distribution of nonzero elements. To demonstrate the efficacy of ALTO, we propose a set of parallel TD algorithms that exploit the inherent data reuse of tensor computations to substantially reduce synchronization overhead, decrease memory footprint, and improve parallel performance. Additionally, we characterize the major execution bottlenecks of TD methods on the latest Intel Xeon Scalable processors and introduce dynamic adaptation heuristics to automatically select the best algorithm based on the sparse tensor characteristics. Across a diverse set of real-world data sets, ALTO outperforms the state-of-the-art approaches, achieving more than an order-of-magnitude speedup over the best mode-agnostic formats. Compared to the best mode-specific formats, ALTO achieves 5.1X geometric mean speedup at a fraction (25%) of their storage costs.
Submitted 14 March, 2025; v1 submitted 10 March, 2024;
originally announced March 2024.
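A simplified sketch of the linearization idea: each nonzero keeps a single integer key built from all mode indices, so one sorted copy of the tensor serves every mode. ALTO's actual encoding adapts the bit allocation to the tensor's shape; the fixed bit interleaving below only illustrates the mode-agnostic principle.

```python
import numpy as np

def interleave(idx, nbits=10):
    """Pack mode indices (i, j, k, ...) into one integer, bit by bit."""
    key, nmodes = 0, len(idx)
    for b in range(nbits):
        for m, i in enumerate(idx):
            key |= ((i >> b) & 1) << (b * nmodes + m)
    return key

def deinterleave(key, nmodes, nbits=10):
    """Recover all mode indices from the single linearized key."""
    idx = [0] * nmodes
    for b in range(nbits):
        for m in range(nmodes):
            idx[m] |= ((key >> (b * nmodes + m)) & 1) << b
    return tuple(idx)

nonzeros = [((2, 5, 1), 3.0), ((7, 0, 4), -1.5), ((2, 5, 2), 0.5)]
keys = np.array([interleave(i) for i, _ in nonzeros])
vals = np.array([v for _, v in nonzeros])
order = np.argsort(keys)                 # one sorted copy serves every mode
keys, vals = keys[order], vals[order]
print([deinterleave(int(k), nmodes=3) for k in keys])
```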
-
Evolutionary Games on Infinite Strategy Sets: Convergence to Nash Equilibria via Dissipativity
Authors:
Brendon G. Anderson,
Jingqi Li,
Somayeh Sojoudi,
Murat Arcak
Abstract:
We consider evolutionary dynamics for population games in which players have a continuum of strategies at their disposal. Models in this setting amount to infinite-dimensional differential equations evolving on the manifold of probability measures. We generalize dissipativity theory for evolutionary games from finite to infinite strategy sets that are compact metric spaces, and derive sufficient conditions for the stability of Nash equilibria under the infinite-dimensional dynamics. The resulting analysis is applicable to a broad class of evolutionary games, and is modular in the sense that the pertinent conditions on the dynamics and the game's payoff structure can be verified independently. By specializing our theory to the class of monotone games, we recover as special cases existing stability results for the Brown-von Neumann-Nash and impartial pairwise comparison dynamics. We also extend our theory to models with dynamic payoffs, further broadening the applicability of our framework. Throughout our analyses, we identify and elaborate on new technical conditions that are key in extending dissipativity theory from finite to infinite strategy sets, such as compactness of the set of Nash equilibria and evolution of dynamic payoffs within a compact positively invariant set. We illustrate our theory using a variety of case studies, including a novel, continuous variant of the war of attrition game.
Submitted 22 April, 2025; v1 submitted 13 December, 2023;
originally announced December 2023.
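A minimal numerical sketch of the setting: discretize the strategy continuum into bins and integrate the Brown-von Neumann-Nash (excess-payoff) dynamics, one of the dynamics covered by the stability results above. The payoff kernel, step size, and bin count are illustrative choices, not taken from the paper.

```python
import numpy as np

n = 100
s = np.linspace(0.0, 1.0, n)              # discretized strategy set [0, 1]
x = np.full(n, 1.0 / n)                   # population state (probability vector)
K = -np.abs(s[:, None] - s[None, :])      # illustrative payoff kernel

for _ in range(2000):                     # forward-Euler integration
    F = K @ x                             # payoff F_i(x) of each strategy bin
    gamma = np.maximum(F - x @ F, 0.0)    # excess payoffs
    x += 0.05 * (gamma - x * gamma.sum()) # BNN vector field
    x = np.clip(x, 0.0, None)
    x /= x.sum()                          # keep the Euler iterate on the simplex

print("strategies carrying the most mass:", s[np.argsort(x)[-3:]])
```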
-
Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off
Authors:
Yatong Bai,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.
Submitted 3 June, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
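The mixing rule itself is a one-liner; the sketch below shows it with stubbed-in base models. The value of alpha and the stub logits are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixed_predict(x, std_model, rob_model, alpha=0.5):
    """alpha = 0 recovers the standard model, alpha = 1 the robust one."""
    p = (1 - alpha) * softmax(std_model(x)) + alpha * softmax(rob_model(x))
    return p.argmax(), p

# Stand-ins for the two pre-trained base classifiers.
std_model = lambda x: np.array([2.0, 0.1, -1.0])
rob_model = lambda x: np.array([0.5, 0.4, -0.2])
label, probs = mixed_predict(None, std_model, rob_model, alpha=0.7)
print(label, probs)
```

Since both base classifiers are pre-trained and only their output probabilities are combined, the method adds essentially no training cost.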
-
Computing Sparse Tensor Decompositions via Chapel and C++/MPI Interoperability without Intermediate I/O
Authors:
S. Isaac Geronimo Anderson,
Daniel M. Dunlavy
Abstract:
We extend an existing approach for efficient use of shared mapped memory across Chapel and C++ for graph data stored as 1-D arrays to sparse tensor data stored using a combination of 2-D and 1-D arrays. We describe the specific extensions that provide use of shared mapped memory tensor data for a particular C++ tensor decomposition tool called GentenMPI. We then demonstrate our approach on several real-world datasets, providing timing results that illustrate minimal overhead incurred using this approach. Finally, we extend our work to improve memory usage and provide convenient random access to sparse shared mapped memory tensor elements in Chapel, while still being capable of leveraging high performance implementations of tensor algorithms in C++.
Submitted 16 October, 2023;
originally announced October 2023.
-
Tight Certified Robustness via Min-Max Representations of ReLU Neural Networks
Authors:
Brendon G. Anderson,
Samuel Pfrommer,
Somayeh Sojoudi
Abstract:
The reliable deployment of neural networks in control systems requires rigorous robustness guarantees. In this paper, we obtain tight robustness certificates over convex attack sets for min-max representations of ReLU neural networks by developing a convex reformulation of the nonconvex certification problem. This is done by "lifting" the problem to an infinite-dimensional optimization over probability measures, leveraging recent results in distributionally robust optimization to solve for an optimal discrete distribution, and proving that solutions of the original nonconvex problem are generated by the discrete distribution under mild boundedness, nonredundancy, and Slater conditions. As a consequence, optimal (worst-case) attacks against the model may be solved for exactly. This contrasts prior state-of-the-art that either requires expensive branch-and-bound schemes or loose relaxation techniques. Experiments on robust control and MNIST image classification examples highlight the benefits of our approach.
Submitted 7 October, 2023;
originally announced October 2023.
-
Projected Randomized Smoothing for Certified Adversarial Robustness
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Randomized smoothing is the current state-of-the-art method for producing provably robust classifiers. While randomized smoothing typically yields robust $\ell_2$-ball certificates, recent research has generalized provable robustness to different norm balls as well as anisotropic regions. This work considers a classifier architecture that first projects onto a low-dimensional approximation of the data manifold and then applies a standard classifier. By performing randomized smoothing in the low-dimensional projected space, we characterize the certified region of our smoothed composite classifier back in the high-dimensional input space and prove a tractable lower bound on its volume. We show experimentally on CIFAR-10 and SVHN that classifiers without the initial projection are vulnerable to perturbations that are normal to the data manifold and yet are captured by the certified regions of our method. We compare the volume of our certified regions against various baselines and show that our method improves on the state-of-the-art by many orders of magnitude.
Submitted 24 September, 2023;
originally announced September 2023.
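A minimal sketch of the pipeline: a linear projection onto a low-dimensional subspace (PCA here, standing in for the learned manifold approximation), Gaussian smoothing noise added in the projected space, and a majority vote of a stub base classifier. The certification bookkeeping is omitted; data and classifier are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                # stand-in training data
P = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][:4]   # top-4 PCs

def base_classifier(z):                       # stub acting on 4-D projections
    return int(z.sum() > 0)

def smoothed_predict(x, sigma=0.5, n=1000):
    z = P @ x                                 # project onto the subspace
    votes = [base_classifier(z + sigma * rng.normal(size=z.shape))
             for _ in range(n)]
    return int(np.mean(votes) > 0.5)

print(smoothed_predict(rng.normal(size=32)))
```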
-
Analyzing the Performance Portability of Tensor Decomposition
Authors:
S. Isaac Geronimo Anderson,
Keita Teranishi,
Daniel M. Dunlavy,
Jee Choi
Abstract:
We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the SparTen software library. Our analyses reveal that a particular matrix computation, $\Phi^{(n)}$, is the critical performance bottleneck in the SparTen CP-APR MU implementation. Moreover, we find that atomic operations are not a critical bottleneck while higher cache reuse can provide a non-trivial performance improvement. We also utilize grid search on the Kokkos library parallel policy parameters to achieve 2.25x average speedup over the SparTen default for $\Phi^{(n)}$ computation on CPU and 1.70x on GPU. We conclude our investigations by comparing Kokkos implementations of the STREAM benchmark and the matricized tensor times Khatri-Rao product (MTTKRP) benchmark from the Parallel Sparse Tensor Algorithm (PASTA) benchmark suite to implementations using vendor libraries. We show that with a single implementation Kokkos achieves performance comparable to hand-tuned code for fundamental operations that make up tensor decomposition kernels on a wide range of CPU and GPU systems. Overall, we conclude that Kokkos demonstrates good performance portability for simple data-intensive operations but requires tuning for algorithms with more complex dependencies and data access patterns.
Submitted 6 July, 2023;
originally announced July 2023.
-
Optical Transformers
Authors:
Maxwell G. Anderson,
Shi-Yuan Ma,
Tianyu Wang,
Logan G. Wright,
Peter L. McMahon
Abstract:
The rapidly increasing size of deep-learning models has caused renewed and growing interest in alternatives to digital computers to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for optical computing. To test this idea, we performed small-scale optical experiments with a prototype accelerator to demonstrate that Transformer operations can run on optical hardware despite noise and errors. Using simulations, validated by our experiments, we then explored the energy efficiency of optical implementations of Transformers and identified scaling laws for model performance with respect to optical energy usage. We found that the optical energy per multiply-accumulate (MAC) scales as $\frac{1}{d}$ where $d$ is the Transformer width, an asymptotic advantage over digital systems. We conclude that with well-engineered, large-scale optical hardware, it may be possible to achieve a $100 \times$ energy-efficiency advantage for running some of the largest current Transformer models, and that if both the models and the optical hardware are scaled to the quadrillion-parameter regime, optical computers could have a $>8,000\times$ energy-efficiency advantage over state-of-the-art digital-electronic processors that achieve 300 fJ/MAC. We analyzed how these results motivate and inform the construction of future optical accelerators along with optics-amenable deep-learning approaches. With assumptions about future improvements to electronics and Transformer quantization techniques (5$\times$ cheaper memory access, double the digital--analog conversion efficiency, and 4-bit precision), we estimated that optical computers' advantage against current 300-fJ/MAC digital processors could grow to $>100,000\times$.
Submitted 20 February, 2023;
originally announced February 2023.
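The headline scaling admits a quick back-of-the-envelope check, sketched below. The optical energy constant is an assumed illustrative value chosen only to exhibit the 1/d behavior; the 300 fJ/MAC digital figure is the baseline quoted in the abstract.

```python
d_model = 16384                          # Transformer width (illustrative)
macs = d_model * d_model                 # one d x d matrix-vector product

digital_fj_per_mac = 300.0               # baseline quoted in the abstract
optical_fj_per_mac = 1.0e5 / d_model     # assumed constant / width, i.e. ~1/d

digital_j = macs * digital_fj_per_mac * 1e-15
optical_j = macs * optical_fj_per_mac * 1e-15
print(f"digital: {digital_j:.2e} J, optical: {optical_j:.2e} J, "
      f"advantage: {digital_j / optical_j:.0f}x")   # advantage grows with d
```

Because the optical cost per MAC falls with width while the digital cost is fixed, the energy advantage grows linearly with model width, which is why the projected gains are largest for the biggest models.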
-
Asymmetric Certified Robustness via Feature-Convex Neural Networks
Authors:
Samuel Pfrommer,
Brendon G. Anderson,
Julien Piet,
Somayeh Sojoudi
Abstract:
Recent works have introduced input-convex neural networks (ICNNs) as learning models with advantageous training, inference, and generalization properties linked to their convex structure. In this paper, we propose a novel feature-convex neural network architecture as the composition of an ICNN with a Lipschitz feature map in order to achieve adversarial robustness. We consider the asymmetric binary classification setting with one "sensitive" class, and for this class we prove deterministic, closed-form, and easily-computable certified robust radii for arbitrary $\ell_p$-norms. We theoretically justify the use of these models by characterizing their decision region geometry, extending the universal approximation theorem for ICNN regression to the classification setting, and proving a lower bound on the probability that such models perfectly fit even unstructured uniformly distributed data in sufficiently high dimensions. Experiments on Malimg malware classification and subsets of MNIST, Fashion-MNIST, and CIFAR-10 datasets show that feature-convex classifiers attain state-of-the-art certified $\ell_1$-radii as well as substantial $\ell_2$- and $\ell_{\infty}$-radii while being far more computationally efficient than any competitive baseline.
Submitted 10 October, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
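A minimal sketch of the input-convex building block: keeping the hidden-to-hidden weights nonnegative and the activations convex and nondecreasing makes the scalar output provably convex in the input. Layer sizes and initialization are illustrative; the Lipschitz feature map and the paper's certified-radius formula are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)          # convex and nondecreasing

class ICNN:
    def __init__(self, dims=(16, 32, 32, 1)):
        self.Wx = [0.1 * rng.normal(size=(d, dims[0])) for d in dims[1:]]
        self.Wz = [0.1 * np.abs(rng.normal(size=(dims[i + 1], dims[i])))
                   for i in range(1, len(dims) - 1)]   # nonnegative weights
        self.b = [np.zeros(d) for d in dims[1:]]

    def __call__(self, x):
        z = relu(self.Wx[0] @ x + self.b[0])
        for Wz, Wx, b in zip(self.Wz, self.Wx[1:], self.b[1:]):
            z = relu(Wz @ z + Wx @ x + b)    # skip terms keep x-affinity
        return z[0]                          # scalar output, convex in x

f = ICNN()
x, y = rng.normal(size=(2, 16))
# Spot-check convexity: f at the midpoint never exceeds the chord.
print(f(0.5 * (x + y)) <= 0.5 * (f(x) + f(y)))   # always True
```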
-
Improving the Accuracy-Robustness Trade-Off of Classifiers via Adaptive Smoothing
Authors:
Yatong Bai,
Brendon G. Anderson,
Aerin Kim,
Somayeh Sojoudi
Abstract:
While prior research has proposed a plethora of methods that build neural classifiers robust against adversarial attacks, practitioners are still reluctant to adopt them due to their unacceptably severe clean accuracy penalties. This paper significantly alleviates this accuracy-robustness trade-off by mixing the output probabilities of a standard classifier and a robust classifier, where the standard network is optimized for clean accuracy and is not robust in general. We show that the robust base classifier's confidence difference for correct and incorrect examples is the key to this improvement. In addition to providing intuitions and empirical evidence, we theoretically certify the robustness of the mixed classifier under realistic assumptions. Furthermore, we adapt an adversarial input detector into a mixing network that adaptively adjusts the mixture of the two base models, further reducing the accuracy penalty of achieving robustness. The proposed flexible method, termed "adaptive smoothing", can work in conjunction with existing or even future methods that improve clean accuracy, robustness, or adversary detection. Our empirical evaluation considers strong attack methods, including AutoAttack and adaptive attacks. On the CIFAR-100 dataset, our method achieves an 85.21% clean accuracy while maintaining a 38.72% $\ell_\infty$-AutoAttacked ($\epsilon = 8/255$) accuracy, becoming the second most robust method on the RobustBench CIFAR-100 benchmark as of submission, while improving the clean accuracy by ten percentage points compared with all listed models. The code that implements our method is available at https://github.com/Bai-YT/AdaptiveSmoothing.
Submitted 21 July, 2024; v1 submitted 29 January, 2023;
originally announced January 2023.
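This extends the fixed mixing rule of the companion paper above with a per-input weight. A minimal sketch, with the detector and both base models stubbed out; the real mixing network is trained, and the norm-threshold detector below is invented purely for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

std_model = lambda x: np.array([3.0, 0.2, -0.5])   # accurate, non-robust stub
rob_model = lambda x: np.array([1.0, 0.8, -0.1])   # robust stub

def detector(x):
    """Stub adversarial-input score in [0, 1]; in practice a trained network."""
    return 1.0 / (1.0 + np.exp(-(np.linalg.norm(x) - 5.0)))

def adaptive_predict(x):
    alpha = detector(x)                             # per-input mixture weight
    p = (1 - alpha) * softmax(std_model(x)) + alpha * softmax(rob_model(x))
    return p.argmax(), round(alpha, 3)

print(adaptive_predict(0.1 * np.ones(10)))          # clean-looking: small alpha
print(adaptive_predict(3.0 * np.ones(10)))          # suspicious: large alpha
```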
-
Certifiably Robust Reinforcement Learning through Model-Based Abstract Interpretation
Authors:
Chenxi Yang,
Greg Anderson,
Swarat Chaudhuri
Abstract:
We present a reinforcement learning (RL) framework in which the learned policy comes with a machine-checkable certificate of provable adversarial robustness. Our approach, called CAROL, learns a model of the environment. In each learning iteration, it uses the current version of this model and an external abstract interpreter to construct a differentiable signal for provable robustness. This signal is used to guide learning, and the abstract interpretation used to construct it directly leads to the robustness certificate returned at convergence. We give a theoretical analysis that bounds the worst-case accumulative reward of CAROL. We also experimentally evaluate CAROL on four MuJoCo environments with continuous state and action spaces. On these tasks, CAROL learns policies that, when contrasted with policies from the state-of-the-art robust RL algorithms, exhibit: (i) markedly enhanced certified performance lower bounds; and (ii) comparable performance under empirical adversarial attacks.
Submitted 26 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Guiding Safe Exploration with Weakest Preconditions
Authors:
Greg Anderson,
Swarat Chaudhuri,
Isil Dillig
Abstract:
In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. We present a novel neurosymbolic approach called SPICE to solve this safe exploration problem. SPICE uses an online shielding layer based on symbolic weakest preconditions to achieve a more precise safety analysis than existing tools without unduly impacting the training process. We evaluate the approach on a suite of continuous control benchmarks and show that it can achieve comparable performance to existing safe learning techniques while incurring fewer safety violations. Additionally, we present theoretical results showing that SPICE converges to the optimal safe policy under reasonable assumptions.
Submitted 27 February, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
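A minimal sketch of a weakest-precondition shield on a toy system: for 1-D dynamics $x' = x + a$ with safety constraint $|x'| \le 1$, the weakest precondition on the action is an interval, and the shield clips proposed actions into it. SPICE computes such preconditions symbolically for far richer dynamics; everything below is illustrative.

```python
def shield(x, proposed_action):
    """Clip the action into WP(|x + a| <= 1), the interval [-1 - x, 1 - x]."""
    lo, hi = -1.0 - x, 1.0 - x
    return min(max(proposed_action, lo), hi)

x = 0.7
for a_prop in [0.5, -2.5, 0.1]:
    a = shield(x, a_prop)
    x = x + a                  # safe by construction: |x| <= 1 at every step
    print(f"proposed {a_prop:+.1f} -> executed {a:+.1f}, state {x:+.2f}")
```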
-
An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models
Authors:
Brendon G. Anderson,
Tanmay Gautam,
Somayeh Sojoudi
Abstract:
In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.
Submitted 27 September, 2022; v1 submitted 15 August, 2022;
originally announced August 2022.
-
Image sensing with multilayer, nonlinear optical neural networks
Authors:
Tianyu Wang,
Mandar M. Sohoni,
Logan G. Wright,
Martin M. Stein,
Shi-Yuan Ma,
Tatsuhiro Onodera,
Maxwell G. Anderson,
Peter L. McMahon
Abstract:
Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but encoding. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have, however, been limited to linear processing, but nonlinearity is a prerequisite for depth, and multilayer NNs significantly outperform shallow NNs on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy across several representative computer-vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in real scenes. In all cases we find that the ONN's nonlinearity and depth allowed it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for incoherent-light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all natively in the optical domain.
Submitted 27 July, 2022;
originally announced July 2022.
-
Node-Variant Graph Filters in Graph Neural Networks
Authors:
Fernando Gama,
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Graph neural networks (GNNs) have been successfully employed in a myriad of applications involving graph signals. Theoretical findings establish that GNNs use nonlinear activation functions to create low-eigenvalue frequency content that can be processed in a stable manner by subsequent graph convolutional filters. However, the exact shape of the frequency content created by nonlinear functions is not known and cannot be learned. In this work, we use node-variant graph filters (NVGFs) -- which are linear filters capable of creating frequencies -- as a means of investigating the role that frequency creation plays in GNNs. We show that, by replacing nonlinear activation functions by NVGFs, frequency creation mechanisms can be designed or learned. By doing so, the role of frequency creation is separated from the nonlinear nature of traditional GNNs. Simulations on graph signal processing problems are carried out to pinpoint the role of frequency creation.
Submitted 4 March, 2022; v1 submitted 31 May, 2021;
originally announced June 2021.
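A minimal sketch of a node-variant graph filter: where a classical polynomial filter applies one scalar tap per power of the shift operator, the NVGF applies a per-node tap vector, which is what lets a linear filter create new frequency content. The graph, signal, and taps below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3
upper = np.triu((rng.random((N, N)) < 0.4).astype(float), 1)
S = upper + upper.T                       # symmetric adjacency as graph shift
x = rng.normal(size=N)                    # input graph signal
H = rng.normal(size=(K, N))               # per-node taps h^(0), ..., h^(K-1)

def nvgf(S, x, H):
    y, z = np.zeros_like(x), x.copy()
    for h_k in H:                         # z holds S^k x, built iteratively
        y += h_k * z                      # diag(h^(k)) S^k x
        z = S @ z
    return y

print(nvgf(S, x, H))
```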
-
ELF-VC: Efficient Learned Flexible-Rate Video Coding
Authors:
Oren Rippel,
Alexander G. Anderson,
Kedar Tatwawadi,
Sanjay Nair,
Craig Lytle,
Lubomir Bourdev
Abstract:
While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode (I- and P-frames only) along with a considerable increase in computational efficiency. In this setting, for natural videos our approach compares favorably across the entire R-D curve under metrics PSNR, MS-SSIM and VMAF against all mainstream video standards (H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
Our contributions include a flexible-rate framework allowing a single model to cover a large and dense range of bitrates, at a negligible increase in computation and parameter count; an efficient backbone optimized for ML-based codecs; and a novel in-loop flow prediction scheme which leverages prior information towards more efficient compression.
We benchmark our method, which we call ELF-VC (Efficient, Learned and Flexible Video Coding) on popular video test sets UVG and MCL-JCV under metrics PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the current best ML codec. At the same time, on an NVIDIA Titan V GPU our approach encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18 FPS.
Submitted 29 April, 2021;
originally announced April 2021.
-
Assessing the Acceptability of a Humanoid Robot for Alzheimer's Disease and Related Dementia Care Using an Online Survey
Authors:
Fengpei Yuan,
Joel G. Anderson,
Tami Wyatt,
Ruth Palan Lopez,
Monica Crane,
Austin Montgomery,
Xiaopeng Zhao
Abstract:
In this work, an online survey was used to understand the acceptability of humanoid robots and users' needs in using these robots to assist with care among people with Alzheimer's disease and related dementias (ADRD), their family caregivers, health care professionals, and the general public. From November 12, 2020 to March 13, 2021, a total of 631 complete responses were collected, including 80 responses from people with mild cognitive impairment or ADRD, 245 responses from caregivers and health care professionals, and 306 responses from the general public. Overall, people with ADRD, caregivers, and the general public showed positive attitudes towards using the robot to assist with care for people with ADRD. The top three functions of robots required by the group of people with ADRD were reminders to take medicine, emergency call service, and helping contact medical services. Additional comments, suggestions, and concerns provided by caregivers and the general public are also discussed.
Submitted 26 April, 2021;
originally announced April 2021.
-
Suppressing simulation bias using multi-modal data
Authors:
Bogdan Kustowski,
Jim A. Gaffney,
Brian K. Spears,
Gemma J. Anderson,
Rushil Anirudh,
Peer-Timo Bremer,
Jayaraman J. Thiagarajan,
Michael K. G. Kruse,
Ryan C. Nora
Abstract:
Many problems in science and engineering require making predictions based on few observations. To build a robust predictive model, these sparse data may need to be augmented with simulated data, especially when the design space is multi-dimensional. Simulations, however, often suffer from an inherent bias. Estimation of this bias may be poorly constrained not only because of data sparsity, but also because traditional predictive models fit only one type of observed outputs, such as scalars or images, instead of all available output data modalities, which might have been acquired and simulated at great cost. To break this limitation and open up the path for multi-modal calibration, we propose to combine a novel, transfer learning technique for suppressing the bias with recent developments in deep learning, which allow building predictive models with multi-modal outputs. First, we train an initial neural network model on simulated data to learn important correlations between different output modalities and between simulation inputs and outputs. Then, the model is partially retrained, or transfer learned, to fit the experiments; a method that has never been implemented in this type of architecture. Using fewer than 10 inertial confinement fusion experiments for training, transfer learning systematically improves the simulation predictions while a simple output calibration, which we design as a baseline, makes the predictions worse. We also offer extensive cross-validation with real and carefully designed synthetic data. The method described in this paper can be applied to a wide range of problems that require transferring knowledge from simulations to the domain of experiments.
Submitted 15 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
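A minimal sketch of the freeze-and-retrain idea on a toy surrogate: pre-train on plentiful simulated data, then refit only the readout layer on a handful of biased "experimental" points, regularized toward the simulation weights. The data, the bias, and the random-feature model are all illustrative; the paper's architecture and multi-modal outputs are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Plentiful simulations vs. a handful of "experiments" with a systematic bias.
Xs = rng.uniform(-1, 1, size=(2000, 3))
ys = np.sin(Xs.sum(1))
Xe = rng.uniform(-1, 1, size=(8, 3))      # fewer than 10 experiments
ye = np.sin(Xe.sum(1)) + 0.3              # unknown simulation bias

# "Pre-train" on simulations: frozen random features + least-squares readout.
W1 = rng.normal(size=(3, 64))
feats = lambda X: relu(X @ W1)
w_sim = np.linalg.lstsq(feats(Xs), ys, rcond=None)[0]

# Transfer learn: refit only the readout on the experiments, regularized
# toward the simulation weights so 8 points cannot destroy the model.
F, lam = feats(Xe), 1.0
w_tl = np.linalg.solve(F.T @ F + lam * np.eye(64), F.T @ ye + lam * w_sim)

Xt = rng.uniform(-1, 1, size=(500, 3))
yt = np.sin(Xt.sum(1)) + 0.3              # true (biased) experimental response
for name, w in [("simulation-only", w_sim), ("transfer-learned", w_tl)]:
    rmse = float(np.sqrt(np.mean((feats(Xt) @ w - yt) ** 2)))
    print(name, "RMSE:", round(rmse, 3))
```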
-
Towards Optimal Branching of Linear and Semidefinite Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we study certifying the robustness of ReLU neural networks against adversarial input perturbations. To diminish the relaxation error suffered by the popular linear programming (LP) and semidefinite programming (SDP) certification methods, we take a branch-and-bound approach to propose partitioning the input uncertainty set and solving the relaxations on each part separately. We show that this approach reduces relaxation error, and that the error is eliminated entirely upon performing an LP relaxation with a partition intelligently designed to exploit the nature of the ReLU activations. To scale this approach to large networks, we consider using a coarser partition whereby the number of parts in the partition is reduced. We prove that computing such a coarse partition that directly minimizes the LP relaxation error is NP-hard. By instead minimizing the worst-case LP relaxation error, we develop a closed-form branching scheme in the single-hidden layer case. We extend the analysis to the SDP, where the feasible set geometry is exploited to design a branching scheme that minimizes the worst-case SDP relaxation error. Experiments on MNIST, CIFAR-10, and Wisconsin breast cancer diagnosis classifiers demonstrate significant increases in the percentages of test samples certified. By independently increasing the input size and the number of layers, we empirically illustrate under which regimes the branched LP and branched SDP are best applied. Finally, we extend our LP branching method into a multi-layer branching heuristic, which attains comparable performance to prior state-of-the-art heuristics on large-scale, deep neural network certification benchmarks.
Submitted 10 May, 2025; v1 submitted 22 January, 2021;
originally announced January 2021.
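For readers unfamiliar with the relaxations being branched, the underlying single-ReLU picture is a textbook identity (not reproduced from the paper): for $y = \max(x, 0)$ with pre-activation bounds $l < 0 < u$, the LP (triangle) relaxation is $y \ge 0$, $y \ge x$, $y \le \frac{u(x - l)}{u - l}$, and its gap $\frac{u(x - l)}{u - l} - \max(x, 0)$ is largest at $x = 0$, where it equals $\frac{-ul}{u - l} > 0$. Partitioning the input so that $x \in [l, 0]$ and $x \in [0, u]$ are relaxed separately collapses the relaxation to $y = 0$ and $y = x$ respectively, eliminating the error entirely; the branching schemes above choose such partitions to maximize this effect subject to a budget on the number of parts.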
-
Meaningful uncertainties from deep neural network surrogates of large-scale numerical simulations
Authors:
Gemma J. Anderson,
Jim A. Gaffney,
Brian K. Spears,
Peer-Timo Bremer,
Rushil Anirudh,
Jayaraman J. Thiagarajan
Abstract:
Large-scale numerical simulations are used across many scientific disciplines to facilitate experimental development and provide insights into underlying physical processes, but they come with a significant computational cost. Deep neural networks (DNNs) can serve as highly-accurate surrogate models, with the capacity to handle diverse datatypes, offering tremendous speed-ups for prediction and many other downstream tasks. An important use-case for these surrogates is the comparison between simulations and experiments; prediction uncertainty estimates are crucial for making such comparisons meaningful, yet standard DNNs do not provide them. In this work we define the fundamental requirements for a DNN to be useful for scientific applications, and demonstrate a general variational inference approach to equip predictions of scalar and image data from a DNN surrogate model trained on inertial confinement fusion simulations with calibrated Bayesian uncertainties. Critically, these uncertainties are interpretable, meaningful, and preserve physics correlations in the predicted quantities.
Submitted 26 October, 2020;
originally announced October 2020.
-
Certifying Neural Network Robustness to Random Input Noise from Samples
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial input uncertainty, but researchers have recently shown a need for methods that consider random uncertainty. In this paper, we propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to replace the optimization constraints. The resulting optimization reduces to a linear program with an analytical solution. Furthermore, we develop a sufficient condition on the number of samples needed to make the misclassification bound hold with overwhelming probability. Our case studies on MNIST classifiers show that this method is able to certify a uniform infinity-norm uncertainty region with a radius nearly 50 times larger than what the current state-of-the-art method can certify.
Submitted 25 January, 2023; v1 submitted 15 October, 2020;
originally announced October 2020.
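A minimal sketch of the sampling idea: draw noise realizations, count classification-margin violations, and turn the count into a high-confidence upper bound on the misclassification probability. The bound below uses a simple Hoeffding tail in place of the paper's chance-constrained LP, and the classifier is a stub.

```python
import numpy as np

rng = np.random.default_rng(0)

def margin(x):
    """Stub classifier margin: positive means correctly classified."""
    logits = np.array([x.sum(), 0.5 * x.sum() + 0.2])
    return logits[0] - logits[1]

x0, radius, n = 0.2 * np.ones(16), 0.1, 20000
noise = rng.uniform(-radius, radius, size=(n, 16))  # uniform inf-norm noise
violations = sum(margin(x0 + e) <= 0 for e in noise)

p_hat = violations / n
eps = np.sqrt(np.log(1 / 1e-3) / (2 * n))  # one-sided Hoeffding, 99.9% conf.
print(f"empirical rate {p_hat:.4f}, certified upper bound {p_hat + eps:.4f}")
```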
-
Data-Driven Certification of Neural Networks with Random Input Noise
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial or worst-case inputs, but researchers have recently shown a need for methods that consider random input noise. In this paper, we examine the setting where inputs are subject to random noise coming from an arbitrary probability distribution. We propose a robustness certification method that lower-bounds the probability that network outputs are safe. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to make the optimization constraints tractable. We develop sufficient conditions for the resulting optimization to be convex, as well as on the number of samples needed to make the robustness bound hold with overwhelming probability. We show for a special case that the proposed optimization reduces to an intuitive closed-form solution. Case studies on synthetic, MNIST, and CIFAR-10 networks experimentally demonstrate that this method is able to certify robustness against various input noise regimes over larger uncertainty regions than prior state-of-the-art techniques.
Submitted 25 January, 2023; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Authors:
Greg Anderson,
Abhinav Verma,
Isil Dillig,
Swarat Chaudhuri
Abstract:
We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
Submitted 26 October, 2020; v1 submitted 26 September, 2020;
originally announced September 2020.
-
Designing Accurate Emulators for Scientific Processes using Calibration-Driven Deep Models
Authors:
Jayaraman J. Thiagarajan,
Bindya Venkatesh,
Rushil Anirudh,
Peer-Timo Bremer,
Jim Gaffney,
Gemma Anderson,
Brian Spears
Abstract:
Predictive models that accurately emulate complex scientific processes can achieve exponential speed-ups over numerical simulators or experiments, and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning (ML) methods, such as deep neural networks, to build data-driven emulators. While the majority of existing efforts has focused on tailoring off-the-shelf ML solutions to better suit the scientific problem at hand, we study an often overlooked, yet important, problem of choosing loss functions to measure the discrepancy between observed data and the predictions from a model. Due to lack of better priors on the expected residual structure, in practice, simple choices such as the mean squared error and the mean absolute error are made. However, the inherent symmetric noise assumption made by these loss functions makes them inappropriate in cases where the data is heterogeneous or when the noise distribution is asymmetric. We propose Learn-by-Calibrating (LbC), a novel deep learning approach based on interval calibration for designing emulators in scientific applications, that are effective even with heterogeneous data and are robust to outliers. Using a large suite of use-cases, we show that LbC provides significant improvements in generalization error over widely-adopted loss function choices, achieves high-quality emulators even in small data regimes and more importantly, recovers the inherent noise structure without any explicit priors.
Submitted 5 May, 2020;
originally announced May 2020.
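A minimal sketch in the spirit of interval calibration: fit lower and upper quantile heads with pinball losses on heteroscedastic data, so the learned interval adapts to asymmetric, input-dependent noise. This is generic quantile regression for illustration, not the authors' LbC objective.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4000, 1))
noise = 0.2 * rng.standard_normal(4000) * (1 + X[:, 0] ** 2)  # heteroscedastic
y = np.sin(2 * X[:, 0]) + noise

feats = np.hstack([X, X ** 2, X ** 3, np.ones_like(X)])       # cubic features
w_lo, w_hi = np.zeros(4), np.zeros(4)
for _ in range(3000):                     # subgradient descent on both heads
    for q, w in [(0.05, w_lo), (0.95, w_hi)]:
        res = y - feats @ w
        grad = -feats.T @ np.where(res > 0, q, q - 1.0) / len(y)  # pinball
        w -= 0.05 * grad

covered = np.mean((feats @ w_lo <= y) & (y <= feats @ w_hi))
print(f"interval coverage: {covered:.2f} (target 0.90)")
```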
-
Tightened Convex Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of certifying the robustness of neural networks to perturbed and adversarial input data. Such certification is imperative for the application of neural networks in safety-critical decision-making and control systems. Certification techniques using convex optimization have been proposed, but they often suffer from relaxation errors that void the certificate. Our work exploits the structure of ReLU networks to improve relaxation errors through a novel partition-based certification procedure. The proposed method is proven to tighten existing linear programming relaxations, and asymptotically achieves zero relaxation error as the partition is made finer. We develop a finite partition that attains zero relaxation error and use the result to derive a tractable partitioning scheme that minimizes the worst-case relaxation error. Experiments using real data show that the partitioning procedure is able to issue robustness certificates in cases where prior methods fail. Consequently, partition-based certification procedures are found to provide an intuitive, effective, and theoretically justified method for tightening existing convex relaxation techniques.
Submitted 17 September, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of unsupervised video object segmentation via background subtraction. Specifically, we pose the nonsemantic extraction of a video's moving objects as a nonconvex optimization problem via a sum of sparse and low-rank matrices. The resulting formulation, a nonnegative variant of robust principal component analysis, is more computationally tractable than its commonly employed convex relaxation, although not generally solvable to global optimality. In spite of this limitation, we derive intuitive and interpretable conditions on the video data under which the uniqueness and global optimality of the object segmentation are guaranteed using local search methods. We illustrate these novel optimality criteria through example segmentations using real video data.
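In symbols (our notation, reconstructed from the abstract): with the video frames stacked as columns of a matrix $M$, the segmentation is posed as a nonnegative robust-PCA decomposition
\[ \min_{L,\,S \ge 0} \;\; \operatorname{rank}(L) + \lambda\,\|S\|_0 \quad \text{subject to} \quad M = L + S, \]
where the low-rank $L$ captures the static background, the sparse $S$ the moving objects, and $\lambda$ trades the two terms off; the commonly employed convex relaxation replaces the rank by the nuclear norm and the $\ell_0$ count by the $\ell_1$ norm.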
Submitted 22 February, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Optimization and Abstraction: A Synergistic Approach for Analyzing Neural Network Robustness
Authors:
Greg Anderson,
Shankara Pailoor,
Isil Dillig,
Swarat Chaudhuri
Abstract:
In recent years, the notion of local robustness (or robustness for short) has emerged as a desirable property of deep neural networks. Intuitively, robustness means that small perturbations to an input do not cause the network to misclassify. In this paper, we present a novel algorithm for verifying robustness properties of neural networks. Our method synergistically combines gradient-based optimization methods for counterexample search with abstraction-based proof search to obtain a sound and (δ-)complete decision procedure. Our method also employs a data-driven approach to learn a verification policy that guides abstract interpretation during proof search. We have implemented the proposed approach in a tool called Charon and experimentally evaluated it on hundreds of benchmarks. Our experiments show that the proposed approach significantly outperforms three state-of-the-art tools, namely AI^2, Reluplex, and ReluVal.
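To make the synergy concrete, below is a self-contained toy in the same spirit (not Charon's code): interval abstract interpretation plays the proof-search role, corner/random sampling stands in for the gradient-based counterexample search, and input splitting refines the abstraction when neither side can decide.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-layer ReLU network (weights are illustrative).
    W1 = np.array([[1.0, -1.0], [0.5, 2.0]]); b1 = np.array([0.1, -0.2])
    W2 = np.array([[1.0, 1.0]]);              b2 = np.array([0.0])

    def net(x):
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

    def interval_bounds(lo, hi):
        # Abstraction side: interval arithmetic through affine and ReLU layers.
        def affine(W, b, lo, hi):
            c, r = (lo + hi) / 2.0, (hi - lo) / 2.0
            return W @ c + b - np.abs(W) @ r, W @ c + b + np.abs(W) @ r
        lo, hi = affine(W1, b1, lo, hi)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
        return affine(W2, b2, lo, hi)

    def verify(lo, hi, threshold, depth=0):
        # Optimization side (stand-in): sample corners, midpoint, random points.
        for x in [lo, hi, (lo + hi) / 2.0] + list(rng.uniform(lo, hi, size=(16, lo.size))):
            if net(x)[0] > threshold:
                return "falsified"
        out_lo, out_hi = interval_bounds(lo, hi)
        if out_hi[0] <= threshold:
            return "verified"
        if depth >= 12:
            return "unknown"
        # Refinement: split the widest input dimension, verify both halves.
        d = int(np.argmax(hi - lo))
        mid = (lo[d] + hi[d]) / 2.0
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[d] = mid; lo_right[d] = mid
        results = {verify(lo, hi_left, threshold, depth + 1),
                   verify(lo_right, hi, threshold, depth + 1)}
        if "falsified" in results:
            return "falsified"
        return "verified" if results == {"verified"} else "unknown"

    print(verify(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), threshold=5.0))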
Submitted 1 May, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Learned Video Compression
Authors:
Oren Rippel,
Sanjay Nair,
Carissa Lew,
Steve Branson,
Alexander G. Anderson,
Lubomir Bourdev
Abstract:
We present a new algorithm for video coding, learned end-to-end for the low-latency mode. In this setting, our approach outperforms all existing video codecs across nearly the entire bitrate range. To our knowledge, this is the first ML-based method to do so.
We evaluate our approach on standard video compression test sets of varying resolutions, and benchmark against all mainstream commercial codecs, in the low-latency mode. On standard-definition videos, relative to our algorithm, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger. Furthermore, our approach does not suffer from blocking artifacts and pixelation, and thus produces videos that are more visually pleasing.
We propose two main contributions. The first is a novel architecture for video compression, which (1) generalizes motion estimation to perform any learned compensation beyond simple translations, (2) rather than strictly relying on previously transmitted reference frames, maintains a state of arbitrary information learned by the model, and (3) enables jointly compressing all transmitted signals (such as optical flow and residual).
The second is a framework for ML-based spatial rate control: namely, a mechanism for assigning variable bitrates across space for each frame. This is a critical component for video coding, which to our knowledge had not been developed within a machine learning setting.
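As a rough illustration of what spatial rate control means operationally, the sketch below spreads a frame's bit budget across blocks in proportion to local residual energy (a hand-written heuristic for illustration only; the paper learns the allocation):

    import numpy as np

    def allocate_bits(residual, total_bits, block=16):
        # Tile the residual into block x block patches.
        h, w = residual.shape
        tiles = residual[:h - h % block, :w - w % block].reshape(
            h // block, block, w // block, block)
        energy = (tiles ** 2).sum(axis=(1, 3))   # per-block residual energy
        weights = energy / energy.sum()          # normalized importance
        return weights * total_bits              # bits assigned to each block

    frame_residual = np.random.default_rng(0).standard_normal((64, 64))
    print(allocate_bits(frame_residual, total_bits=8000).round())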
Submitted 16 November, 2018;
originally announced November 2018.
-
Learning Abstractions for Program Synthesis
Authors:
Xinyu Wang,
Greg Anderson,
Isil Dillig,
K. L. McMillan
Abstract:
Many example-guided program synthesis techniques use abstractions to prune the search space. While abstraction-based synthesis has proven to be very powerful, a domain expert needs to provide a suitable abstract domain, together with the abstract transformers of each DSL construct. However, coming up with useful abstractions can be non-trivial, as it requires both domain expertise and knowledge about the synthesizer. In this paper, we propose a new technique for learning abstractions that are useful for instantiating a general synthesis framework in a new domain. Given a DSL and a small set of training problems, our method uses tree interpolation to infer reusable predicate templates that speed up synthesis in a given domain. Our method also learns suitable abstract transformers by solving a certain kind of second-order constraint solving problem in a data-driven way. We have implemented the proposed method in a tool called ATLAS and evaluate it in the context of the BLAZE meta-synthesizer. Our evaluation shows that (a) ATLAS can learn useful abstract domains and transformers from a few training problems, and (b) the abstractions learned by ATLAS allow BLAZE to achieve significantly better results compared to manually crafted abstractions.
Submitted 11 April, 2018;
originally announced April 2018.
-
The High-Dimensional Geometry of Binary Neural Networks
Authors:
Alexander G. Anderson,
Cory P. Berg
Abstract:
Recent research has shown that one can train a neural network with binary weights and activations at train time by augmenting the weights with a high-precision continuous latent variable that accumulates small changes from stochastic gradient descent. However, there is a dearth of theoretical analysis to explain why we can effectively capture the features in our data with binary weights and activations. Our main result is that the neural networks with binary weights and activations trained using the method of Courbariaux, Hubara et al. (2016) work because of the high-dimensional geometry of binary vectors. In particular, the ideal continuous vectors that extract features in the intermediate representations of these BNNs are well-approximated by binary vectors in the sense that dot products are approximately preserved. Compared to previous research that demonstrated the viability of such BNNs, our work explains why these BNNs work in terms of this high-dimensional geometry. Our theory serves as a foundation for understanding not only BNNs but also a variety of methods that seek to compress traditional neural networks. Furthermore, a better understanding of multilayer binary neural networks serves as a starting point for generalizing BNNs to other neural network architectures such as recurrent neural networks.
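The dot-product preservation claim is easy to check numerically: for a random Gaussian vector, the cosine similarity between the vector and its sign-binarization concentrates near $\sqrt{2/\pi} \approx 0.80$ as the dimension grows. A minimal demonstration (ours, not the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (16, 256, 4096):
        w = rng.standard_normal(d)      # "ideal" continuous weight vector
        b = np.sign(w)                  # its binary approximation
        cos = w @ b / (np.linalg.norm(w) * np.linalg.norm(b))
        print(d, round(cos, 3))         # approaches sqrt(2/pi) ~ 0.798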
Submitted 19 May, 2017;
originally announced May 2017.
-
Wide & Deep Learning for Recommender Systems
Authors:
Heng-Tze Cheng,
Levent Koc,
Jeremiah Harmsen,
Tal Shaked,
Tushar Chandra,
Hrishi Aradhye,
Glen Anderson,
Greg Corrado,
Wei Chai,
Mustafa Ispir,
Rohan Anil,
Zakaria Haque,
Lichan Hong,
Vihan Jain,
Xiaobing Liu,
Hemal Shah
Abstract:
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
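A minimal sketch of the joint scoring function described here: a linear model over sparse cross-product features (the memorization path) plus an MLP over dense embeddings (the generalization path), combined under a single sigmoid. Shapes and initializations are illustrative, not the production model:

    import numpy as np

    rng = np.random.default_rng(0)
    n_cross, embed_dim, hidden = 1000, 32, 64

    w_wide = rng.standard_normal(n_cross) * 0.01          # wide (linear) weights
    W1 = rng.standard_normal((hidden, embed_dim)) * 0.1   # deep layer 1
    W2 = rng.standard_normal(hidden) * 0.1                # deep output layer

    def score(cross_idx, embedding):
        wide = w_wide[cross_idx].sum()                 # memorization path
        deep = W2 @ np.maximum(W1 @ embedding, 0.0)    # generalization path
        logit = wide + deep                            # jointly trained in the paper
        return 1.0 / (1.0 + np.exp(-logit))

    print(score(cross_idx=[3, 17, 512], embedding=rng.standard_normal(embed_dim)))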
Submitted 24 June, 2016;
originally announced June 2016.
-
DeepMovie: Using Optical Flow and Deep Neural Networks to Stylize Movies
Authors:
Alexander G. Anderson,
Cory P. Berg,
Daniel P. Mossing,
Bruno A. Olshausen
Abstract:
A recent paper by Gatys et al. describes a method for rendering an image in the style of another image. First, they use convolutional neural network features to build a statistical model for the style of an image. Then they create a new image with the content of one image but the style statistics of another image. Here, we extend this method to render a movie in a given artistic style. The naive solution that independently renders each frame produces poor results because the features of the style move substantially from one frame to the next. The other naive method that initializes the optimization for the next frame using the rendered version of the previous frame also produces poor results because the features of the texture stay fixed relative to the frame of the movie instead of moving with objects in the scene. The main contribution of this paper is to use optical flow to initialize the style transfer optimization so that the texture features move with the objects in the video. Finally, we suggest a method to incorporate optical flow explicitly into the cost function.
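The key step is easy to sketch: warp the previous stylized frame along the optical flow and use the result to initialize the next frame's optimization, so texture moves with scene objects. A minimal version with integer flow and nearest-neighbor lookup (illustrative, not the authors' code):

    import numpy as np

    def warp(image, flow):
        # Backward-warp: each output pixel pulls from its flow-displaced source.
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(ys - flow[..., 1].astype(int), 0, h - 1)
        src_x = np.clip(xs - flow[..., 0].astype(int), 0, w - 1)
        return image[src_y, src_x]

    prev_stylized = np.random.rand(120, 160, 3)
    flow = np.ones((120, 160, 2)) * 2        # stand-in for estimated optical flow
    init_next = warp(prev_stylized, flow)    # initialization; style loss is then optimized
    print(init_next.shape)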
Submitted 26 May, 2016;
originally announced May 2016.
-
An Efficient Algorithm for Unweighted Spectral Graph Sparsification
Authors:
David G. Anderson,
Ming Gu,
Christopher Melgaard
Abstract:
Spectral graph sparsification has emerged as a powerful tool in the analysis of large-scale networks by reducing the overall number of edges, while maintaining a comparable graph Laplacian matrix. In this paper, we present an efficient algorithm for the construction of a new type of spectral sparsifier, the unweighted spectral sparsifier. Given a general undirected and unweighted graph $G = (V, E)$ and an integer $\ell < |E|$ (the number of edges in $E$), we compute an unweighted graph $H = (V, F)$ with $F \subset E$ and $|F| = \ell$ such that for every $x \in \mathbb{R}^{V}$ \[ \frac{x^T L_G x}{\kappa} \leq x^T L_H x \leq x^T L_G x, \] where $L_G$ and $L_H$ are the Laplacian matrices for $G$ and $H$, respectively, and $\kappa \geq 1$ is a slowly varying function of $|V|$, $|E|$, and $\ell$. This work addresses the open question of the existence of unweighted graph sparsifiers for unweighted graphs. Additionally, our algorithm can efficiently compute unweighted graph sparsifiers for weighted graphs, leading to sparsified graphs that retain the weights of the original graphs.
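The guarantee is straightforward to probe numerically: for a toy graph $G$ and an arbitrarily chosen edge subset $F$, the upper inequality holds automatically (dropping unweighted edges can only decrease the quadratic form), and the smallest feasible $\kappa$ can be estimated by sampling. An illustrative check (the paper's algorithm, by contrast, chooses $F$ so that $\kappa$ is small):

    import numpy as np

    def laplacian(n, edges):
        L = np.zeros((n, n))
        for u, v in edges:
            L[u, u] += 1.0; L[v, v] += 1.0
            L[u, v] -= 1.0; L[v, u] -= 1.0
        return L

    E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 4-cycle plus one chord
    F = [(0, 1), (1, 2), (2, 3), (3, 0)]           # H keeps only the cycle
    L_G, L_H = laplacian(4, E), laplacian(4, F)

    rng = np.random.default_rng(1)
    ratios = []
    for _ in range(2000):
        x = rng.standard_normal(4)
        x -= x.mean()                   # project out the all-ones nullspace
        ratios.append((x @ L_G @ x) / (x @ L_H @ x))
    print("empirical kappa is about", max(ratios))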
Submitted 15 December, 2014; v1 submitted 15 October, 2014;
originally announced October 2014.
-
EACOF: A Framework for Providing Energy Transparency to enable Energy-Aware Software Development
Authors:
Hayden Field,
Glen Anderson,
Kerstin Eder
Abstract:
Making energy consumption data accessible to software developers is an essential step towards energy-efficient software engineering. The presence of various bespoke and incompatible methods of instrumentation to obtain energy readings is currently limiting the widespread use of energy data in software development. This paper presents EACOF, a modular Energy-Aware Computing Framework that provides a layer of abstraction between sources of energy data and the applications that exploit them. EACOF replaces platform-specific instrumentation through two APIs: one accepts energy data into the framework, while the other provides application software with access to that data. This allows developers to profile their code for energy consumption in an easy and portable manner using simple API calls. We outline the design of our framework and provide details of the API functionality. In a use case, where we investigate the impact of data bit width on the energy consumption of various sorting algorithms, we demonstrate that the data obtained using EACOF provides interesting, sometimes counter-intuitive, insights. All the code is available online under an open source license at http://github.com/eacof
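Purely as a shape sketch of the two-sided API the abstract describes (a provider side that feeds readings in, a consumer side that reads them out), the names below are invented for illustration and are NOT EACOF's actual interface:

    # Hypothetical sketch only; not EACOF's real API.
    class EnergyFramework:
        def __init__(self):
            self.samples = []

        # Provider API: instrumentation pushes readings into the framework.
        def publish(self, source, joules, timestamp):
            self.samples.append((source, joules, timestamp))

        # Consumer API: application code queries energy between two instants.
        def energy_between(self, t0, t1):
            return sum(j for _, j, t in self.samples if t0 <= t <= t1)

    fw = EnergyFramework()
    fw.publish("cpu", 0.8, 0.0)
    fw.publish("cpu", 1.1, 0.5)
    print(fw.energy_between(0.0, 1.0))   # profile a code region's consumption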
Submitted 31 May, 2014;
originally announced June 2014.
-
Utility-based Decision-making in Distributed Systems Modelling
Authors:
Gabrielle Anderson,
Matthew Collinson,
David Pym
Abstract:
We consider a calculus of resources and processes as a basis for modelling decision-making in multi-agent systems. The calculus represents the regulation of agents' choices using utility functions that take account of context. Associated with the calculus is a (Hennessy-Milner-style) context-sensitive modal logic of state. As an application, we show how a notion of 'trust domain' can be defined for multi-agent systems.
Submitted 23 October, 2013;
originally announced October 2013.
-
Application of Global and One-Dimensional Local Optimization to Operating System Scheduler Tuning
Authors:
George Anderson,
Tshilidzi Marwala,
Fulufhelo Vincent Nelwamondo
Abstract:
This paper describes a comparative study of global and one-dimensional local optimization methods applied to operating system scheduler tuning. The operating system scheduler we use is the Linux 2.6.23 Completely Fair Scheduler (CFS) running in a simulator (LinSched). We have ported the Hackbench scheduler benchmark to this simulator and use it as the workload. The global optimization approach we use is Particle Swarm Optimization (PSO). We make use of Response Surface Methodology (RSM) to specify optimal parameters for our PSO implementation. The one-dimensional local optimization approach we use is the Golden Section method. In order to use this approach, we convert the scheduler tuning problem from one involving the setting of three parameters to one involving the manipulation of a single parameter. Our results show that the global optimization approach yields a better response, but the one-dimensional optimization approach converges to a solution faster than the global optimization approach.
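For reference, the one-dimensional method in its textbook form: golden-section search for the minimum of a unimodal function on an interval. The scheduler-tuning objective is stood in for by a toy quadratic (illustrative only):

    import math

    def golden_section(f, a, b, tol=1e-6):
        # Shrink [a, b] by the golden ratio each step, keeping the minimum inside.
        inv_phi = (math.sqrt(5) - 1) / 2          # ~0.618
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        while b - a > tol:
            if f(c) < f(d):                        # minimum lies in [a, d]
                b, d = d, c
                c = b - inv_phi * (b - a)
            else:                                  # minimum lies in [c, b]
                a, c = c, d
                d = a + inv_phi * (b - a)
        return (a + b) / 2

    # Stand-in objective: completion time as a function of one scheduler parameter.
    print(golden_section(lambda x: (x - 3.2) ** 2 + 1.0, 0.0, 10.0))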
Submitted 17 December, 2010;
originally announced December 2010.
-
Use of Data Mining in Scheduler Optimization
Authors:
George Anderson,
Tshilidzi Marwala,
Fulufhelo V. Nelwamondo
Abstract:
The operating system's role in a computer system is to manage the various resources. One of these resources is the Central Processing Unit. It is managed by a component of the operating system called the CPU scheduler. Schedulers are optimized for typical workloads expected to run on the platform. However, a single scheduler may not be appropriate for all workloads. That is, a scheduler may schedule a workload such that the completion time is minimized, but when another type of workload is run on the platform, scheduling and therefore completion time will not be optimal; a different scheduling algorithm, or a different set of parameters, may work better. Several approaches to solving this problem have been proposed. The objective of this survey is to summarize the approaches based on data mining, which are available in the literature. In addition to solutions that can be directly utilized for solving this problem, we are interested in data mining research in related areas that have potential for use in operating system scheduling. We also explain general technical issues involved in scheduling in modern computers, including parallel scheduling issues related to multi-core CPUs. We propose a taxonomy that classifies the scheduling approaches we discuss into different categories.
Submitted 8 November, 2010;
originally announced November 2010.
-
SASE: Complex Event Processing over Streams
Authors:
Daniel Gyllstrom,
Eugene Wu,
Hee-Jin Chae,
Yanlei Diao,
Patrick Stahlberg,
Gordon Anderson
Abstract:
RFID technology is gaining adoption on an increasing scale for tracking and monitoring purposes. Wide deployments of RFID devices will soon generate an unprecedented volume of data. Emerging applications require the RFID data to be filtered and correlated for complex pattern detection and transformed to events that provide meaningful, actionable information to end applications. In this work, we design and develop SASE, a complex event processing system that performs such data-information transformation over real-time streams. We design a complex event language for specifying application logic for such transformation, devise new query processing techniques to efficiently implement the language, and develop a comprehensive system that collects, cleans, and processes RFID data for delivery of relevant, timely information as well as storing necessary data for future querying. We demonstrate an initial prototype of SASE through a real-world retail management scenario.
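As a toy illustration of the kind of pattern such a system evaluates (plain Python, not SASE's query language): in the retail scenario, a shelf reading followed by an exit reading for the same tag, with no register reading in between, flags a possible shoplifting event.

    def detect(stream):
        pending = {}                                   # tag -> shelf-reading time
        for kind, tag, t in stream:                    # events arrive in time order
            if kind == "shelf":
                pending[tag] = t
            elif kind == "register":
                pending.pop(tag, None)                 # purchase observed: clear state
            elif kind == "exit" and tag in pending:
                yield (tag, pending.pop(tag), t)       # shelf -> exit, no register between

    events = [("shelf", "tag1", 1), ("register", "tag1", 5),
              ("shelf", "tag2", 2), ("exit", "tag2", 9), ("exit", "tag1", 10)]
    print(list(detect(events)))   # -> [('tag2', 2, 9)]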
Submitted 22 December, 2006;
originally announced December 2006.