-
Accelerating Drug Discovery Through Agentic AI: A Multi-Agent Approach to Laboratory Automation in the DMTA Cycle
Authors:
Yao Fehlis,
Charles Crain,
Aidan Jensen,
Michael Watson,
James Juhasz,
Paul Mandel,
Betty Liu,
Shawn Mahon,
Daren Wilson,
Nick Lynch-Jonely,
Ben Leedom,
David Fuller
Abstract:
The pharmaceutical industry faces unprecedented challenges in drug discovery, with traditional approaches struggling to meet modern therapeutic development demands. This paper introduces a novel AI framework, Tippy, that transforms laboratory automation through specialized AI agents operating within the Design-Make-Test-Analyze (DMTA) cycle. Our multi-agent system employs five specialized agents - Supervisor, Molecule, Lab, Analysis, and Report, with Safety Guardrail oversight - each designed to excel in specific phases of the drug discovery pipeline. Tippy represents the first production-ready implementation of specialized AI agents for automating the DMTA cycle, providing a concrete example of how AI can transform laboratory workflows. By leveraging autonomous AI agents that reason, plan, and collaborate, we demonstrate how Tippy accelerates DMTA cycles while maintaining scientific rigor essential for pharmaceutical research. The system shows significant improvements in workflow efficiency, decision-making speed, and cross-disciplinary coordination, offering a new paradigm for AI-assisted drug discovery.
Submitted 11 July, 2025;
originally announced July 2025.
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu, et al. (3284 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding, and it is now able to process up to 3 hours of video content. Its unique combination of long-context, multimodal, and reasoning capabilities unlocks new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs. cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 22 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
Leveraging Correlation Across Test Platforms for Variance-Reduced Metric Estimation
Authors:
Rachel Luo,
Heng Yang,
Michael Watson,
Apoorva Sharma,
Sushant Veer,
Edward Schmerling,
Marco Pavone
Abstract:
Learning-based robotic systems demand rigorous validation to assure reliable performance, but extensive real-world testing is often prohibitively expensive, and if conducted may still yield insufficient data for high-confidence guarantees. In this work, we introduce a general estimation framework that leverages paired data across test platforms, e.g., paired simulation and real-world observations, to achieve better estimates of real-world metrics via the method of control variates. By incorporating cheap and abundant auxiliary measurements (for example, simulator outputs) as control variates for costly real-world samples, our method provably reduces the variance of Monte Carlo estimates and thus requires significantly fewer real-world samples to attain a specified confidence bound on the mean performance. We provide theoretical analysis characterizing the variance and sample-efficiency improvement, and demonstrate empirically in autonomous driving and quadruped robotics settings that our approach achieves high-probability bounds with markedly improved sample efficiency. Our technique can lower the real-world testing burden for validating the performance of the stack, thereby enabling more efficient and cost-effective experimental evaluation of robotic systems.
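To make the control-variates construction concrete, the following minimal sketch (illustrative only, not the authors' code) estimates a real-world mean metric using correlated simulator outputs as a control variate; the synthetic data, variable names, and known simulator mean are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative paired data: each real-world trial has a correlated simulator score.
n = 200
sim_scores = rng.normal(0.70, 0.10, size=n)              # cheap auxiliary measurements
real_scores = sim_scores + rng.normal(0.05, 0.05, n)      # costly real-world metric, correlated with sim

# Assume the simulator mean is known (or estimated from abundant extra sim-only runs).
sim_mean = 0.70

# Control-variates estimator: subtract the correlated, zero-mean correction term.
beta = np.cov(real_scores, sim_scores)[0, 1] / np.var(sim_scores)
cv_estimate = real_scores.mean() - beta * (sim_scores.mean() - sim_mean)

naive_var = real_scores.var(ddof=1) / n
corr = np.corrcoef(real_scores, sim_scores)[0, 1]
cv_var = naive_var * (1 - corr**2)   # variance shrinks roughly by the squared correlation

print(f"naive mean={real_scores.mean():.4f}  cv mean={cv_estimate:.4f}")
print(f"naive var={naive_var:.2e}  cv var~{cv_var:.2e}")
```

The stronger the correlation between the simulator and real-world scores, the fewer real-world samples are needed for a given confidence bound.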
Submitted 25 June, 2025;
originally announced June 2025.
-
WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs
Authors:
Nguyen Ngoc Dat,
Tom Richardson,
Matthew Watson,
Kilian Meier,
Jenna Kline,
Sid Reid,
Guy Maalouf,
Duncan Hine,
Majid Mirmehdi,
Tilo Burghardt
Abstract:
Live tracking of wildlife via high-resolution video processing directly onboard drones is widely unexplored, and most existing solutions rely on streaming video to ground stations to support navigation. Yet both autonomous animal-reactive flight control beyond visual line of sight and mission-specific individual and behaviour recognition tasks rely to some degree on this capability. In response, we introduce WildLive - a near real-time animal detection and tracking framework for high-resolution imagery running directly onboard uncrewed aerial vehicles (UAVs). The system performs multi-animal detection and tracking at 17.81 fps for HD and 7.53 fps for 4K video streams, suitable for operation during higher-altitude flights to minimise animal disturbance. Our system is optimised for Jetson Orin AGX onboard hardware. It integrates the efficiency of sparse optical flow tracking and mission-specific sampling with device-optimised and proven YOLO-driven object detection and segmentation techniques. Essentially, computational resources are focused on spatio-temporal regions of high uncertainty to significantly improve UAV processing speeds. Alongside the system, we introduce our WildLive dataset, which comprises 200K+ annotated animal instances across 19K+ frames from 4K UAV videos collected at the Ol Pejeta Conservancy in Kenya. All frames contain ground truth bounding boxes, segmentation masks, as well as individual tracklets and tracking point trajectories. We compare our system against current object tracking approaches including OC-SORT, ByteTrack, and SORT. Our multi-animal tracking experiments with onboard hardware confirm that near real-time high-resolution wildlife tracking is possible on UAVs whilst maintaining the high accuracy levels needed for future navigational and mission-specific animal-centric operational autonomy. Our materials are available at: https://dat-nguyenvn.github.io/WildLive/
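The division of labour between detector and tracker can be sketched as follows: run the expensive detector only every N frames and propagate points in between with sparse Lucas-Kanade optical flow. The OpenCV calls, frame source, and detection stub are illustrative assumptions, not the WildLive implementation.

```python
import cv2
import numpy as np

DETECT_EVERY = 10   # run the heavy detector once every N frames (illustrative)

def detect(frame_gray):
    """Stub standing in for a YOLO-style detector: returns trackable points."""
    return cv2.goodFeaturesToTrack(frame_gray, maxCorners=200, qualityLevel=0.01,
                                   minDistance=7)

def track(frames):
    prev_gray, points = None, None
    for i, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if i % DETECT_EVERY == 0 or points is None or len(points) == 0:
            points = detect(gray)                       # expensive step, run sparsely
        else:
            # Cheap step: propagate last frame's points with sparse optical flow.
            points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
            points = points[status.ravel() == 1].reshape(-1, 1, 2)
        prev_gray = gray
        yield i, 0 if points is None else len(points)

# Synthetic frames stand in for the UAV video stream.
frames = [np.random.randint(0, 255, (480, 640, 3), np.uint8) for _ in range(30)]
for idx, n in track(frames):
    pass
print("tracked points in final frame:", n)
```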
Submitted 23 May, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
MMLA: Multi-Environment, Multi-Species, Low-Altitude Drone Dataset
Authors:
Jenna Kline,
Samuel Stevens,
Guy Maalouf,
Camille Rondeau Saint-Jean,
Dat Nguyen Ngoc,
Majid Mirmehdi,
David Guerin,
Tilo Burghardt,
Elzbieta Pastucha,
Blair Costelloe,
Matthew Watson,
Thomas Richardson,
Ulrik Pagh Schultz Lundquist
Abstract:
Real-time wildlife detection in drone imagery supports critical ecological and conservation monitoring. However, standard detection models like YOLO often fail to generalize across locations and struggle with rare species, limiting their use in automated drone deployments. We present MMLA, a novel multi-environment, multi-species, low-altitude drone dataset collected across three sites (Ol Pejeta Conservancy and Mpala Research Centre in Kenya, and The Wilds in Ohio), featuring six species (zebras, giraffes, onagers, and African wild dogs). The dataset contains 811K annotations from 37 high-resolution videos. Baseline YOLO models show performance disparities across locations while fine-tuning YOLOv11m on MMLA improves mAP50 to 82%, a 52-point gain over baseline. Our results underscore the need for diverse training data to enable robust animal detection in autonomous drone systems.
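As an illustration of the fine-tuning setup described, here is a minimal sketch using the Ultralytics Python API; the dataset YAML name ("mmla.yaml") and all hyperparameters are placeholder assumptions, not the paper's configuration.

```python
from ultralytics import YOLO

# Load a pretrained YOLO11-medium checkpoint (the baseline) and fine-tune it on a
# multi-site wildlife dataset described by a YOLO-format data YAML.
# "mmla.yaml" is a placeholder path, not an artifact published with the paper.
model = YOLO("yolo11m.pt")

model.train(
    data="mmla.yaml",   # train/val image paths and class names (zebra, giraffe, ...)
    epochs=100,         # illustrative hyperparameters, not the paper's settings
    imgsz=640,
    batch=16,
)

# Validate; metrics.box.map50 corresponds to the mAP50 figure the abstract reports.
metrics = model.val(data="mmla.yaml")
print(metrics.box.map50)
```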
Submitted 3 June, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin, et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages, and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
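The local/global interleaving can be sketched as follows; the 5:1 ratio, 1024-token window, and layer count are illustrative assumptions rather than the published configuration, but the sketch shows why a short local span bounds KV-cache growth.

```python
# Toy sketch of interleaving local (sliding-window) and global attention layers.
# The 5:1 ratio, 1024-token window, and 30 layers are illustrative assumptions,
# not the published Gemma 3 configuration.
NUM_LAYERS = 30
LOCAL_PER_GLOBAL = 5      # local layers between consecutive global layers
LOCAL_WINDOW = 1024       # tokens kept in the KV cache by a local layer

def layer_pattern(num_layers):
    return ["global" if (i + 1) % (LOCAL_PER_GLOBAL + 1) == 0 else "local"
            for i in range(num_layers)]

def kv_cache_tokens(context_len, pattern):
    """Total cached tokens across layers: local layers cap at the window size."""
    return sum(context_len if kind == "global" else min(context_len, LOCAL_WINDOW)
               for kind in pattern)

pattern = layer_pattern(NUM_LAYERS)
for ctx in (8_192, 131_072):
    print(f"{ctx:>7} tokens of context -> {kv_cache_tokens(ctx, pattern):>9} cached tokens")
```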
Submitted 25 March, 2025;
originally announced March 2025.
-
Sequencing Silicates in the IRS Debris Disk Catalog I: Methodology for Unsupervised Clustering
Authors:
Cicero X. Lu,
Tushar Mittal,
Christine H. Chen,
Alexis Y. Li,
Kadin Worthen,
B. A. Sargent,
Carey M. Lisse,
G. C. Sloan,
Dean C. Hines,
Dan M. Watson,
Isabel Rebollido,
Bin B. Ren,
Joel D. Green
Abstract:
Debris disks, which consist of dust, planetesimals, planets, and gas, offer a unique window into the mineralogical composition of their parent bodies, especially during the critical phase of terrestrial planet formation spanning 10 to a few hundred million years. Observations from the $\textit{Spitzer}$ Space Telescope have unveiled thousands of debris disks, yet systematic studies remain scarce, let alone those with unsupervised clustering techniques. This study introduces $\texttt{CLUES}$ (CLustering UnsupErvised with Sequencer), a novel, non-parametric, fully-interpretable machine-learning spectral analysis tool designed to analyze and classify the spectral data of debris disks. $\texttt{CLUES}$ combines multiple unsupervised clustering methods with multi-scale distance measures to discern new groupings and trends, offering insights into compositional diversity and geophysical processes within these disks. Our analysis allows us to explore a vast parameter space in debris disk mineralogy and also offers broader applications in fields such as protoplanetary disks and solar system objects. This paper details the methodology, implementation, and initial results of $\texttt{CLUES}$, setting the stage for more detailed follow-up studies focusing on debris disk mineralogy and demographics.
Submitted 2 January, 2025;
originally announced January 2025.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman, et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
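For readers unfamiliar with the distillation objective referenced (Hinton et al., 2015), the following minimal PyTorch sketch shows token-level distillation against a teacher's output distribution; the shapes, temperature, and random logits are illustrative assumptions, not the Gemma 2 training setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary, averaged over all token positions.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence per position, then mean over batch and sequence.
    kl = (teacher_probs * (teacher_probs.clamp_min(1e-9).log() - student_log_probs)).sum(-1)
    return (t * t) * kl.mean()

# Illustrative usage with random logits standing in for real model outputs.
batch, seq_len, vocab = 2, 16, 32_000
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
teacher_logits = torch.randn(batch, seq_len, vocab)
loss = distillation_loss(student_logits, teacher_logits, temperature=1.0)
loss.backward()
print(float(loss))
```

Training the small model against the teacher's full distribution, rather than one-hot next-token targets, is what allows it to punch above its parameter count.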
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Length-scale study in deep learning prediction for non-small cell lung cancer brain metastasis
Authors:
Haowen Zhou,
Steven Lin,
Mark Watson,
Cory T. Bernadt,
Oumeng Zhang,
Ramaswamy Govindan,
Richard J. Cote,
Changhuei Yang
Abstract:
Deep learning assisted digital pathology has the potential to impact clinical practice in significant ways. In recent studies, deep neural network (DNN) enabled analysis outperforms human pathologists. Increasing the size and complexity of the DNN architecture generally improves performance at the cost of the DNN's explainability. For pathology, this lack of DNN explainability is particularly problematic, as it hinders the broader clinical interpretation of the pathology features that may provide physiological disease insights. To better assess the features that a DNN uses in developing predictive algorithms to interpret digital microscopic images, we sought to understand the role of resolution and tissue scale, and here describe a novel method for studying the predictive feature length-scale that underpins a DNN's predictive power. We applied the method to study a DNN's predictive capability in the case example of brain metastasis prediction from early-stage non-small-cell lung cancer biopsy slides. The study highlights that the DNN's attention in brain metastasis prediction targets both cellular-scale (resolution) and tissue-scale features on H&E-stained histological whole slide images. At the cellular scale, we see that the DNN's predictive power progressively increases at higher resolution (i.e., lower resolvable feature length) and is largely lost when the resolvable feature length is longer than 5 microns. In addition, the DNN uses more macro-scale features (maximal feature length) associated with tissue organization/architecture and is optimized when assessing visual fields larger than 41 microns. This study demonstrates, for the first time, the length-scale requirements necessary for optimal DNN learning on digital whole slide images.
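The length-scale sweep can be approximated with a simple sketch that degrades the resolvable feature length of an image patch by block-averaging and upsampling before feeding it to a fixed-size DNN input; the microns-per-pixel value and the random patch are illustrative assumptions, not the study's data.

```python
import numpy as np

MICRONS_PER_PIXEL = 0.5   # illustrative scan resolution, not the paper's value

def degrade_resolution(patch, resolvable_um):
    """Block-average then repeat pixels so the smallest resolvable feature is ~resolvable_um."""
    factor = max(1, int(round(resolvable_um / MICRONS_PER_PIXEL)))
    h, w = (patch.shape[0] // factor) * factor, (patch.shape[1] // factor) * factor
    cropped = patch[:h, :w]
    coarse = cropped.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

patch = np.random.rand(256, 256, 3)          # stand-in for an H&E tile
for resolvable_um in (0.5, 1.0, 2.5, 5.0, 10.0):
    degraded = degrade_resolution(patch, resolvable_um)
    print(resolvable_um, degraded.shape)      # feed each version to the DNN and compare AUC
```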
Submitted 1 June, 2024;
originally announced June 2024.
-
KerasCV and KerasNLP: Vision and Language Power-Ups
Authors:
Matthew Watson,
Divyashree Shivakumar Sreepathihalli,
Francois Chollet,
Martin Gorner,
Kiranbir Sodhia,
Ramesh Sampath,
Tirth Patel,
Haifeng Jin,
Neel Kovelamudi,
Gabriel Rasskin,
Samaneh Saadat,
Luke Wood,
Chen Qian,
Jonathan Bischof,
Ian Stenbit,
Abheesht Sharma,
Anshuman Mishra
Abstract:
We present the Keras domain packages KerasCV and KerasNLP, extensions of the Keras API for Computer Vision and Natural Language Processing workflows, capable of running on either JAX, TensorFlow, or PyTorch. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. We adopt a modular, layered design: at the library's lowest level of abstraction, we provide building blocks for creating models and data preprocessing pipelines, and at the library's highest level of abstraction, we provide pretrained "task" models for popular architectures such as Stable Diffusion, YOLOv8, GPT2, BERT, Mistral, CLIP, Gemma, T5, etc. Task models have built-in preprocessing, pretrained weights, and can be fine-tuned on raw inputs. To enable efficient training, we support XLA compilation for all models, and run all preprocessing via a compiled graph of TensorFlow operations using the tf.data API. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.
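A short usage sketch of the task-model workflow follows; the GPT-2 preset name is the commonly documented one and should be treated as an assumption rather than an excerpt from the paper.

```python
# pip install keras-nlp  (runs on a JAX, TensorFlow, or PyTorch backend)
import keras_nlp

# Load a pretrained "task" model: tokenization/preprocessing and weights are bundled,
# so raw strings can be passed straight in. The preset name is assumed from the docs.
gpt2 = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")

# Generate text directly from a raw prompt.
print(gpt2.generate("Keras domain packages make it easy to", max_length=50))

# The same object can be fine-tuned on raw text with gpt2.fit(...), since
# preprocessing is built into the task model.
```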
Submitted 5 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization
Authors:
Junjie Shentu,
Matthew Watson,
Noura Al Moubayed
Abstract:
With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent from the pre-training dataset, termed subject-driven generation. Moreover, extracting several new concepts from a single image enables the model to learn multiple concepts while decreasing the difficulty of training data preparation, making the disentanglement of multiple concepts a new challenge. However, existing models for disentanglement commonly require pre-determined masks or retain background elements. To this end, we propose an attention-guided method, AttenCraft, for multiple concept disentanglement. In particular, our method leverages self-attention and cross-attention maps to create accurate masks for each concept within a single initialization step, eliminating the need for mask preparation by humans or other models. The created masks are then applied to guide the cross-attention activation of each target concept during training and achieve concept disentanglement. Additionally, we introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition from different concepts and improve generation quality. Our method outperforms baseline models in terms of image-alignment and behaves comparably on text-alignment. Finally, we showcase the applicability of AttenCraft to more complicated settings, such as an input image containing three concepts. The project is available at https://github.com/junjie-shentu/AttenCraft.
Submitted 28 May, 2024;
originally announced May 2024.
-
A Systematic Survey of the Gemini Principles for Digital Twin Ontologies
Authors:
James Michael Tooth,
Nilufer Tuptuk,
Jeremy Daniel McKendrick Watson
Abstract:
Ontologies are widely used for achieving interoperable Digital Twins (DTws), yet competing DTw definitions compound interoperability issues. Semantically linking these differing twins is feasible through ontologies and Cognitive Digital Twins (CDTws). However, it is often unclear how ontology use bolsters broader DTw advancements. This article presents a systematic survey following the PRISMA method, to explore the potential of ontologies to support DTws to meet the Centre for Digital Built Britain's Gemini Principles, and aims to link progress in ontologies to this framework. The Gemini Principles focus on common DTw requirements, considering: Purpose for 1) Public Good, 2) Value Creation, and 3) Insight; Trustworthiness with sufficient 4) Security, 5) Openness, and 6) Quality; and appropriate Functionality of 7) Federation, 8) Curation, and 9) Evolution. This systematic literature review examines the role of ontologies in facilitating each principle. Existing research uses ontologies to solve DTw challenges within these principles, particularly by connecting DTws, optimising decision-making, and reasoning over governance policies. Furthermore, analysis of the sectoral distribution of the literature found that research encompassing the crossover of ontologies, DTws and the Gemini Principles is emerging, and that most innovation is predominantly within the manufacturing and built environment sectors. Critical gaps for researchers, industry practitioners, and policymakers are subsequently identified.
Submitted 16 April, 2024;
originally announced April 2024.
-
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation
Authors:
Junjie Shentu,
Matthew Watson,
Noura Al Moubayed
Abstract:
Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent from the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-image model (Textual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates a novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. In comparison to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04% and 8.13% higher, and CLIP-T scores that are 2.22% and 5.85% higher, in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models.
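CLIP-I (image-image) and CLIP-T (image-text) scores of this kind are typically computed as cosine similarities in CLIP embedding space; the sketch below uses a public CLIP checkpoint and is a standard recipe, not the authors' evaluation code.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(generated: Image.Image, reference: Image.Image, prompt: str):
    """Return (CLIP-I, CLIP-T): cosine similarities in CLIP embedding space."""
    with torch.no_grad():
        img_inputs = processor(images=[generated, reference], return_tensors="pt")
        img_feats = model.get_image_features(**img_inputs)
        txt_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
        txt_feats = model.get_text_features(**txt_inputs)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    clip_i = float(img_feats[0] @ img_feats[1])   # generated vs. reference image
    clip_t = float(img_feats[0] @ txt_feats[0])   # generated image vs. text prompt
    return clip_i, clip_t

# Usage: clip_scores(generated_img, reference_img, "a photo of the target concept")
```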
Submitted 15 February, 2024;
originally announced February 2024.
-
Case Study: Securing MMU-less Linux Using CHERI
Authors:
Hesham Almatary,
Alfredo Mazzinghi,
Robert N. M. Watson
Abstract:
The MMU-less Linux variant lacks security because it has no protection or isolation mechanisms. It also does not use MPUs, as they do not fit its software model because of the design drawbacks of MPUs (i.e., coarse-grained protection with a fixed number of protected regions). We secure the existing MMU-less Linux version of the RISC-V port using CHERI. CHERI is a hardware-software capability-based system that extends the ISA, toolchain, programming languages, operating systems, and applications in order to provide complete pointer and memory safety. We believe that CHERI could provide significant security guarantees for high-end dynamic MMU-less embedded systems at lower costs, compared to MMUs and MPUs, by: 1) building the entire software stack in pure-capability CHERI C mode, which provides complete spatial memory safety at the kernel and user level, 2) isolating user programs as separate ELFs, each with its own CHERI-based capability table; this provides spatial memory safety similar to what the MMU offers (i.e., user programs cannot access each other's memory), 3) isolating user programs from the kernel, as the kernel has its own capability table separate from the users' and vice versa, and 4) compartmentalising kernel modules using CompartOS' linkage-based compartmentalisation. This offers a new security front that is not possible using the current MMU-based Linux, where vulnerable/malicious kernel modules (e.g., device drivers) executing in the kernel space would not compromise or take down the entire system. These are the four main contributions of this paper, presenting novel CHERI-based mechanisms to secure MMU-less embedded Linux.
Submitted 18 January, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Socio-Technical Security Modelling: Analysis of State-of-the-Art, Application, and Maturity in Critical Industrial Infrastructure Environments/Domains
Authors:
Uchenna D Ani,
Jeremy M Watson,
Nilufer Tuptuk,
Steve Hailes,
Aslam Jawar
Abstract:
This study explores the state-of-the-art, application, and maturity of socio-technical security models for industries and sectors dependent on CI, and investigates the gap between academic research and industry practices concerning the modelling of both the social and technical aspects of security. Systematic study and critical analysis of the literature show that a steady and growing body of research on socio-technical security M&S approaches is emerging, possibly prompted by the growing recognition that digital systems and workplaces do not only comprise technologies, but also social (human) and sometimes physical elements.
Submitted 8 May, 2023;
originally announced May 2023.
-
Improving the Cybersecurity of Critical National Infrastructure using Modelling and Simulation
Authors:
Uchenna D Ani,
Jeremy D McK Watson,
Nilufer Tuptuk,
Steve Hailes,
Madeline Carr,
Carsten Maple
Abstract:
The UK Critical National Infrastructure is critically dependent on digital technologies that provide communications, monitoring, control, and decision-support functionalities. Digital technologies are progressively enhancing efficiency, reliability, and availability of infrastructure, and enabling new benefits not previously available. These benefits can introduce vulnerabilities through the connectivity enabled by the digital systems, thus, making it easier for would-be attackers, who frequently use socio-technical approaches, exploiting humans-in-the-loop to break in and sabotage an organization. Therefore, policies and strategies that minimize and manage risks must include an understanding of operator and corporate behaviors, as well as technical elements and the interfaces between them and humans. Better security via socio-technical security Modelling and Simulation can be achieved if backed by government effort, including appropriate policy interventions. Government, through its departments and agencies, can contribute by sign-posting and shaping the decision-making environment concerning cybersecurity M&S approaches and tools, showing how they can contribute to enhancing security in Modern Critical Infrastructure Systems.
Submitted 16 August, 2022;
originally announced August 2022.
-
CompartOS: CHERI Compartmentalization for Embedded Systems
Authors:
Hesham Almatary,
Michael Dodson,
Jessica Clarke,
Peter Rugg,
Ivan Gomes,
Michal Podhradsky,
Peter G. Neumann,
Simon W. Moore,
Robert N. M. Watson
Abstract:
Existing high-end embedded systems face frequent security attacks. Software compartmentalization is one technique to limit the attacks' effects to the compromised compartment rather than the entire system. Unfortunately, the existing state-of-the-art embedded hardware-software solutions do not work well to enforce software compartmentalization for high-end embedded systems. MPUs are not fine-grained and suffer from significant scalability limitations, as they can only protect a small and fixed number of memory regions. On the other hand, MMUs suffer from non-determinism and coarse-grained protection. This paper introduces CompartOS as a lightweight linkage-based compartmentalization model for high-end, complex, mainstream embedded systems. CompartOS builds on CHERI, a capability-based hardware architecture, to meet scalability, availability, compatibility, and fine-grained security goals. Microbenchmarks show that CompartOS' protection-domain crossing is 95% faster than MPU-based IPC. We applied the CompartOS model, with low effort, to complex existing systems, including TCP servers and a safety-critical automotive demo. CompartOS not only catches 10 out of 13 FreeRTOS-TCP published vulnerabilities that MPU-based protection (e.g., uVisor) cannot catch but can also recover from them. Further, our TCP throughput evaluations show that our CompartOS prototype is 52% faster than relevant MPU-based compartmentalization models (e.g., ACES), with a 15% overhead compared to an unprotected system. Supporting CHERI adds an FPGA LUT overhead of 10.4% over an unprotected baseline RISC-V processor, compared to 7.6% for MPU support, while CHERI incurs only a 1.3% register-area overhead compared to 2% for an MPU.
Submitted 11 June, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Probability Distributions for Elliptic Curves in the CGL Hash Function
Authors:
Dhruv Bhatia,
Kara Fagerstrom,
Maximillian Watson
Abstract:
Hash functions map data of arbitrary length to data of predetermined length. Good hash functions are hard to predict, making them useful in cryptography. We are interested in the elliptic curve CGL hash function, which maps a bitstring to an elliptic curve by traversing an input-determined path through an isogeny graph. The nodes of an isogeny graph are elliptic curves, and the edges are special maps betwixt elliptic curves called isogenies. Knowing which hash values are most likely informs us of potential security weaknesses in the hash function. We use stochastic matrices to compute the expected probability distributions of the hash values. We generalize our experimental data into a theorem that completely describes all possible probability distributions of the CGL hash function. We use this theorem to evaluate the collision resistance of the CGL hash function and compare this to the collision resistance of an "ideal" hash function.
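The stochastic-matrix computation can be illustrated with a toy example: raising a transition matrix over a small isogeny graph to the power of the walk length gives the distribution over ending curves (hash values), from which a collision probability follows. The 3-node matrix is an illustrative assumption, not data from the paper.

```python
import numpy as np

# Toy stochastic transition matrix over three curves in an isogeny graph:
# entry P[j, i] is the probability of stepping from curve i to curve j.
P = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])

start = np.array([1.0, 0.0, 0.0])   # the hash walk begins at a fixed curve

for bits in (1, 2, 8, 32):
    dist = np.linalg.matrix_power(P, bits) @ start
    print(bits, np.round(dist, 4))   # distribution of hash values after `bits` steps

# Collision probability for two uniformly random inputs of this length.
dist = np.linalg.matrix_power(P, 32) @ start
print("collision prob:", float(dist @ dist))
```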
Submitted 13 August, 2021;
originally announced August 2021.
-
Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations
Authors:
Matthew Watson,
Bashar Awwad Shiekh Hasan,
Noura Al Moubayed
Abstract:
Deep Learning of neural networks has progressively become more prominent in healthcare with models reaching, or even surpassing, expert accuracy levels. However, these success stories are tainted by concerning reports on the lack of model transparency and bias against some medical conditions or patients' sub-groups. Explainable methods are considered the gateway to alleviate many of these concerns. In this study we demonstrate that the generated explanations are volatile to changes in model training that are perpendicular to the classification task and model structure. This raises further questions about trust in deep learning models for healthcare. Mainly, whether the models capture underlying causal links in the data or just rely on spurious correlations that are made visible via explanation methods. We demonstrate that the output of explainability methods on deep neural networks can vary significantly by changes of hyper-parameters, such as the random seed or how the training set is shuffled. We introduce a measure of explanation consistency which we use to highlight the identified problems on the MIMIC-CXR dataset. We find explanations of identical models but with different training setups have a low consistency: $\approx$ 33% on average. On the contrary, kernel methods are robust against any orthogonal changes, with explanation consistency at 94%. We conclude that current trends in model explanation are not sufficient to mitigate the risks of deploying models in real life healthcare applications.
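One simple way to operationalize such a consistency measure is mean pairwise similarity of saliency maps from identically-architected models trained with different seeds; the cosine-based score and the random maps below are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np
from itertools import combinations

def explanation_consistency(saliency_maps):
    """Mean pairwise cosine similarity between flattened saliency maps produced by
    identically-architected models that differ only in training setup (e.g. seed)."""
    flat = [m.ravel() / (np.linalg.norm(m.ravel()) + 1e-12) for m in saliency_maps]
    return float(np.mean([a @ b for a, b in combinations(flat, 2)]))

# Stand-ins for saliency maps from five models trained with different random seeds;
# in practice these come from an explainer applied to the same input image.
rng = np.random.default_rng(0)
maps = [rng.random((224, 224)) for _ in range(5)]
print(explanation_consistency(maps))
```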
Submitted 30 October, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning
Authors:
Matthew Watson,
Noura Al Moubayed
Abstract:
Explainable machine learning has become increasingly prevalent, especially in healthcare where explainable models are vital for ethical and trusted automated decision making. Work on the susceptibility of deep learning models to adversarial attacks has shown the ease of designing samples to mislead a model into making incorrect predictions. In this work, we propose a model agnostic explainability-based method for the accurate detection of adversarial samples on two datasets with different complexity and properties: Electronic Health Record (EHR) and chest X-ray (CXR) data. On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack. On the MIMIC-CXR dataset, we achieve an accuracy of 88%; significantly improving on the state of the art of adversarial detection in both datasets by over 10% in all settings. We propose an anomaly detection based method using explainability techniques to detect adversarial samples which is able to generalise to different attack methods without a need for retraining.
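The shape of such a pipeline can be sketched as follows: fit an anomaly detector on explanation vectors of clean samples and flag test samples whose explanations fall off-distribution; IsolationForest and the synthetic attribution vectors are illustrative choices, not necessarily the paper's exact components.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def fit_explanation_detector(clean_explanations):
    """clean_explanations: (n_samples, n_features) attribution vectors for clean data."""
    det = IsolationForest(contamination=0.05, random_state=0)
    det.fit(clean_explanations)
    return det

def is_adversarial(detector, explanation):
    # IsolationForest returns -1 for anomalies (here: suspected adversarial samples).
    return detector.predict(explanation.reshape(1, -1))[0] == -1

# Stand-in attribution vectors; in practice these come from an explainer (e.g. SHAP)
# applied to the deployed model's inputs.
rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(500, 64))
adv = rng.normal(3, 1, size=(10, 64))   # adversarial explanations drift off-distribution

detector = fit_explanation_detector(clean)
print(sum(is_adversarial(detector, e) for e in adv), "of 10 flagged")
```

Because the detector only needs clean explanations, it generalises to attack methods unseen at training time without retraining the underlying model.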
Submitted 5 May, 2021;
originally announced May 2021.
-
The Internet of Things in Ports: Six Key Security and Governance Challenges for the UK (Policy Brief)
Authors:
Feja Lesniewska,
Uchenna D Ani,
Jeremy M Watson,
Madeline Carr
Abstract:
In January 2019, the UK Government published its Maritime 2050 on Navigating the Future strategy. In the strategy, the government highlighted the importance of digitalization (with well-designed regulatory support) to achieve its goal of ensuring that the UK plays a global leadership role in the maritime sector. Ports, the gateways for 95% of UK trade movements, were identified as key sites for investment in technological innovation. The government identified the potential of the Internet of Things (IoT), in conjunction with other information-sharing technologies, such as shared data platforms, and Artificial Intelligence applications (AI), to synchronize processes within the port ecosystem leading to improved efficiency, safety, and environmental benefits, including improved air quality and lower greenhouse gas emissions.
Submitted 21 January, 2021;
originally announced January 2021.
-
Design Considerations for Building Credible Security Testbeds: A Systematic Study of Industrial Control System Use Cases
Authors:
Uchenna D Ani,
Jeremy M Watson,
Benjamin Green,
Barnaby Craggs,
Jason Nurse
Abstract:
This paper presents a mapping framework for design factors and an implementation process for building credible Industrial Control Systems (ICS) security testbeds. The resilience of ICSs has become a critical concern to operators and governments following widely publicised cyber security events. The inability to apply conventional Information Technology security practice to ICSs further compounds the challenges in adequately securing critical systems. To overcome these challenges, and to do so without impacting live environments, testbeds for the exploration, development and evaluation of security controls are widely used. However, how a testbed is designed, and its attributes, can directly impact not only its viability but also its credibility as a whole. Through a combined systematic and thematic analysis and mapping of ICS security testbed design attributes, this paper suggests that the expertise of human experimenters, design objectives, the implementation approach, architectural coverage, core characteristics, and evaluation methods are considerations that can help establish or enhance confidence, trustworthiness and acceptance, and thus the credibility of ICS security testbeds.
Submitted 4 November, 2019;
originally announced November 2019.
-
Robust Incremental State Estimation through Covariance Adaptation
Authors:
Ryan M. Watson,
Jason N. Gross,
Clark N. Taylor,
Robert C. Leishman
Abstract:
Recent advances in the fields of robotics and automation have spurred significant interest in robust state estimation. To enable robust state estimation, several methodologies have been proposed. One such technique, which has shown promising performance, is the concept of iteratively estimating a Gaussian Mixture Model (GMM), based upon the state estimation residuals, to characterize the measurement uncertainty model. Through this iterative process, the measurement uncertainty model is more accurately characterized, which enables robust state estimation through the appropriate de-weighting of erroneous observations. This approach, however, has traditionally required a batch estimation framework to enable the estimation of the measurement uncertainty model, which is not advantageous to robotic applications. In this paper, we propose an efficient, incremental extension to the measurement uncertainty model estimation paradigm. The incremental covariance estimation (ICE) approach, as detailed within this paper, is evaluated on several collected data sets, where it is shown to provide a significant increase in localization accuracy when compared to other state-of-the-art robust, incremental estimation algorithms.
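The residual-driven reweighting idea can be sketched as follows: fit a Gaussian mixture to current residuals and weight each observation by the inverse variance of its assigned component. The two-component mixture and synthetic residuals are illustrative assumptions, and the paper's ICE method performs this estimation incrementally rather than in batch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 1-D measurement residuals: mostly well-behaved, with some gross outliers.
residuals = np.concatenate([rng.normal(0.0, 0.5, 950), rng.normal(0.0, 8.0, 50)])

# Characterize the measurement uncertainty model with a Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(residuals.reshape(-1, 1))
labels = gmm.predict(residuals.reshape(-1, 1))
variances = gmm.covariances_.ravel()

# Each observation is weighted by the inverse variance of its assigned component,
# so observations explained by the broad component are strongly de-weighted.
weights = 1.0 / variances[labels]
print("component std devs:", np.sqrt(variances))
print("weight ratio (inlier/outlier):", weights.max() / weights.min())
```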
Submitted 11 October, 2019;
originally announced October 2019.
-
Enabling Robust State Estimation through Measurement Error Covariance Adaptation
Authors:
Ryan M. Watson,
Jason N. Gross,
Clark N. Taylor,
Robert C. Leishman
Abstract:
Accurate platform localization is an integral component of most robotic systems. As these robotic systems become more ubiquitous, it is necessary to develop robust state estimation algorithms that are able to withstand novel and non-cooperative environments. When dealing with novel and non-cooperative environments, little is known a priori about the measurement error uncertainty, thus, there is a requirement that the uncertainty models of the localization algorithm be adaptive. Within this paper, we propose the batch covariance estimation technique, which enables robust state estimation through the iterative adaptation of the measurement uncertainty model. The adaptation of the measurement uncertainty model is granted through non-parametric clustering of the residuals, which enables the characterization of the measurement uncertainty via a Gaussian mixture model. The provided Gaussian mixture model can be utilized within any non-linear least squares optimization algorithm by approximately characterizing each observation with the sufficient statistics of the assigned cluster (i.e., each observation's uncertainty model is updated based upon the assignment provided by the non-parametric clustering algorithm). The proposed algorithm is verified on several GNSS collected data sets, where it is shown that the proposed technique exhibits some advantages when compared to other robust estimation techniques when confronted with degraded data quality.
Submitted 13 August, 2019; v1 submitted 10 June, 2019;
originally announced June 2019.
-
Revisiting Visual Grounding
Authors:
Erik Conser,
Kennedy Hahn,
Chandler M. Watson,
Melanie Mitchell
Abstract:
We revisit a particular visual grounding method: the "Image Retrieval Using Scene Graphs" (IRSG) system of Johnson et al. (2015). Our experiments indicate that the system does not effectively use its learned object-relationship models. We also look closely at the IRSG dataset, as well as the widely used Visual Relationship Dataset (VRD) that is adapted from it. We find that these datasets exhibit biases that allow methods that ignore relationships to perform relatively well. We also describe several other problems with the IRSG dataset, and report on experiments using a subset of the dataset in which the biases and other problems are removed. Our studies contribute to a more general effort: that of better understanding what machine learning methods that combine language and vision actually learn and what popular datasets actually test.
Submitted 3 April, 2019;
originally announced April 2019.
-
A Review of Critical Infrastructure Protection Approaches: Improving Security through Responsiveness to the Dynamic Modelling Landscape
Authors:
Uchenna D Ani,
Jeremy D McK. Watson,
Jason R. C. Nurse,
Al Cook,
Carsten Maple
Abstract:
As new technologies such as the Internet of Things (IoT) are integrated into Critical National Infrastructures (CNI), new cybersecurity threats emerge that require specific security solutions. Approaches used for analysis include the modelling and simulation of critical infrastructure systems using attributes, functionalities, operations, and behaviours to support various security analysis viewpoints, recognising and appropriately managing associated security risks. With several critical infrastructure protection approaches available, the question of how to effectively model the complex behaviour of interconnected CNI elements and to configure their protection as a system-of-systems remains a challenge. Using a systematic review approach, existing critical infrastructure protection approaches (tools and techniques) are examined to determine their suitability given trends like IoT, and effective security modelling and analysis issues. It is found that empirical-based, agent-based, system dynamics-based, and network-based modelling are more commonly applied than economic-based and equation-based techniques, and empirical-based modelling is the most widely used. The energy and transportation critical infrastructure sectors are the most responsive, and no single Critical Infrastructure Protection (CIP) approach - tool, technique, methodology or framework - provides a fit-for-all capacity for all-round attribute modelling and simulation of security risks. Typically, deciding factors for the CIP choices adopted are dominated by trade-offs between complexity of use and popularity of approach, as well as between specificity and generality of application in sectors.
Submitted 2 April, 2019;
originally announced April 2019.
-
Robust Navigation In GNSS Degraded Environment Using Graph Optimization
Authors:
Ryan M. Watson,
Jason N. Gross
Abstract:
Robust navigation in urban environments has received a considerable amount of both academic and commercial interest over recent years. This is primarily due to large commercial organizations such as Google and Uber stepping into the autonomous navigation market. Most of this research has shied away from Global Navigation Satellite System (GNSS) based navigation. The aversion to utilizing GNSS data is due to the degraded nature of the data in urban environments (e.g., multipath, poor satellite visibility). The degradation of GNSS data in urban environments means that traditional GNSS positioning methods (e.g., extended Kalman filter, particle filters) perform poorly. However, recent advances in robust graph-theoretic sensor fusion methods, primarily applied to Simultaneous Localization and Mapping (SLAM) based robotic applications, can also be applied to GNSS data processing. This paper utilizes one such method, known as the factor graph, in conjunction with several robust optimization techniques to evaluate their applicability to robust GNSS data processing. The goals of this study are two-fold. First, for GNSS applications, we experimentally evaluate the effectiveness of robust optimization techniques within a graph-theoretic estimation framework. Second, by releasing the software developed and the data sets used for this study, we introduce a new open-source front-end to the Georgia Tech Smoothing and Mapping (GTSAM) library for the purpose of integrating GNSS pseudorange observations.
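The robust-optimization ingredient can be illustrated with a Huber M-estimator solved by iteratively reweighted least squares on a toy scalar positioning problem with outlier-contaminated measurements; this shows only the de-weighting mechanism and is not the paper's GTSAM-based factor-graph front-end.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: estimate a scalar position x from noisy range-like measurements,
# a few of which are corrupted by large multipath-style errors.
x_true = 10.0
z = x_true + rng.normal(0, 0.3, 30)
z[:4] += rng.normal(25, 5, 4)            # gross outliers

def huber_weight(r, k=1.345):
    r = np.abs(r)
    return np.where(r <= k, 1.0, k / r)  # classic Huber M-estimator weights

x = z.mean()                              # ordinary least-squares start
for _ in range(10):                       # iteratively reweighted least squares
    w = huber_weight(z - x)
    x = np.sum(w * z) / np.sum(w)

print("LS estimate:", z.mean())
print("robust estimate:", x)
```

In a factor-graph setting, the same re-weighting is applied per factor, so corrupted pseudoranges contribute little to the optimized trajectory.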
Submitted 22 June, 2018;
originally announced June 2018.
-
Evaluation of Kinematic Precise Point Positioning Convergence with an Incremental Graph Optimizer
Authors:
Ryan M. Watson,
Jason N. Gross
Abstract:
Estimation techniques to precisely localize a kinematic platform with GNSS observables can be broadly partitioned into two categories: differential, or undifferenced. The differential techniques (e.g., real-time kinematic (RTK)) have several attractive properties, such as correlated error mitigation and fast convergence; however, to support a differential processing scheme, an infrastructure of reference stations within a proximity of the platform must be in place to construct observation corrections. This infrastructure requirement makes differential processing techniques infeasible in many locations. To mitigate the need for additional receivers within proximity of the platform, the precise point positioning (PPP) method utilizes accurate orbit and clock models to localize the platform. The autonomy of PPP from local reference stations makes it an attractive processing scheme for several applications; however, a current disadvantage of PPP is the slow positioning convergence when compared to differential techniques. In this paper, we evaluate the convergence properties of PPP with an incremental graph optimization scheme (Incremental Smoothing and Mapping (iSAM2)), which allows for real-time filtering and smoothing. The characterization is first conducted through a Monte Carlo analysis within a simulation environment, which allows for the variation of parameters such as atmospheric conditions, satellite geometry, and intensity of multipath. Then, an example collected data set is utilized to validate the trends presented in the simulation study.
Submitted 11 April, 2018;
originally announced April 2018.
-
Using the Buffer to Avoid Rebuffers: Evidence from a Large Video Streaming Service
Authors:
Te-Yuan Huang,
Ramesh Johari,
Nick McKeown,
Matthew Trunnell,
Mark Watson
Abstract:
To provide a better streaming experience, video clients today select their video rates by observing and estimating the available capacity. Recent work has shown that capacity estimation is fraught with difficulties because of complex interactions between the ABR control loop, HTTP server performance and TCP congestion control. Estimation-based rate selection algorithms can lead to unnecessary rebuffering events and suboptimal video quality. This paper argues that we should do away with estimating network capacity, and instead directly observe and control the playback buffer, which is the state variable we are most interested in controlling. We present a class of "buffer-based" rate selection algorithms that reduce the rebuffering rate while allowing us to control the delivered video quality. We implemented our algorithms inside the Netflix video client and ran a series of experiments spanning millions of Netflix users around the world. Our results show that by doing away with estimating network capacity and instead focusing on buffer occupancy, we can reduce rebuffer rates by 20% while holding video rate constant.
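A minimal sketch of a buffer-based rate map in this spirit follows: the video rate is a piecewise-linear function of buffer occupancy between a reservoir and a cushion, with no throughput estimation. The thresholds and bitrate ladder are illustrative assumptions, not the deployed Netflix configuration.

```python
import bisect

BITRATES_KBPS = [235, 375, 560, 750, 1050, 1750, 2350, 3000]  # illustrative ladder
RESERVOIR_S = 10.0   # below this buffer level, always pick the lowest rate
CUSHION_S = 110.0    # above this buffer level, always pick the highest rate

def buffer_based_rate(buffer_s: float) -> int:
    """Map current buffer occupancy (seconds) to a video rate, ignoring throughput."""
    if buffer_s <= RESERVOIR_S:
        return BITRATES_KBPS[0]
    if buffer_s >= CUSHION_S:
        return BITRATES_KBPS[-1]
    # Linear rate map between the reservoir and the cushion.
    frac = (buffer_s - RESERVOIR_S) / (CUSHION_S - RESERVOIR_S)
    target = BITRATES_KBPS[0] + frac * (BITRATES_KBPS[-1] - BITRATES_KBPS[0])
    # Choose the highest available rate not exceeding the target.
    idx = bisect.bisect_right(BITRATES_KBPS, target) - 1
    return BITRATES_KBPS[max(idx, 0)]

for b in (5, 20, 60, 100, 120):
    print(b, "s buffered ->", buffer_based_rate(b), "kbps")
```

Because the rate only drops to the minimum when the buffer actually nears the reservoir, rebuffers become rare without any capacity estimator in the loop.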
Submitted 9 January, 2014;
originally announced January 2014.