-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu,
et al. (3284 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding, and it can now process up to 3 hours of video content. Its unique combination of long-context, multimodal, and reasoning capabilities unlocks new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs. cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 22 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
On Fairness of Task Arithmetic: The Role of Task Vectors
Authors:
Hiroki Naganuma,
Kotaro Yoshida,
Laura Gomezjurado Gonzalez,
Takafumi Horie,
Yuji Naraki,
Ryotaro Shimizu
Abstract:
Model editing techniques, particularly task arithmetic using task vectors, have shown promise in efficiently modifying pre-trained models through arithmetic operations like task addition and negation. Despite computational advantages, these methods may inadvertently affect model fairness, creating risks in sensitive applications like hate speech detection. However, the fairness implications of task arithmetic remain largely unexplored, presenting a critical gap in the existing literature. We systematically examine how manipulating task vectors affects fairness metrics, including Demographic Parity and Equalized Odds. To rigorously assess these effects, we benchmark task arithmetic against full fine-tuning, a costly but widely used baseline, and Low-Rank Adaptation (LoRA), a prevalent parameter-efficient fine-tuning method. Additionally, we explore merging task vectors from models fine-tuned on demographic subgroups vulnerable to hate speech, investigating whether fairness outcomes can be controlled by adjusting task vector coefficients, potentially enabling tailored model behavior. Our results offer novel insights into the fairness implications of model editing and establish a foundation for fairness-aware and responsible model editing practices.
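The core operations are simple enough to sketch. Below is a minimal illustration of task vectors and one of the fairness metrics studied, assuming PyTorch-style state dicts; the coefficients and binary group encoding are illustrative, not the paper's setup.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector = fine-tuned weights minus pre-trained weights."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, coeffs: list) -> dict:
    """Task addition (positive coefficient) or negation (negative)."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for vec, c in zip(vectors, coeffs):
        for k in edited:
            edited[k] += c * vec[k]
    return edited

def demographic_parity_gap(preds: torch.Tensor, group: torch.Tensor) -> float:
    """|P(y=1 | group=0) - P(y=1 | group=1)| over binary predictions."""
    return abs(preds[group == 0].float().mean()
               - preds[group == 1].float().mean()).item()
```

Adjusting the entries of `coeffs` is exactly the knob the paper investigates for steering fairness outcomes.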
Submitted 30 May, 2025;
originally announced May 2025.
-
Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces
Authors:
Francisco J. Rodríguez Lera,
Raquel Fernández Hernández,
Sonia Lopez González,
Miguel Angel González-Santamarta,
Francisco Jesús Rodríguez Sedano,
Camino Fernandez Llamas
Abstract:
This paper presents a novel framework for accessible and pedagogically-grounded robot explainability, designed to support human-robot interaction (HRI) with users who have diverse cognitive, communicative, or learning needs. We combine principles from Universal Design for Learning (UDL) and Universal Design (UD) with symbolic communication strategies to facilitate the alignment of mental models between humans and robots. Our approach employs Asterics Grid and ARASAAC pictograms as a multimodal, interpretable front-end, integrated with a lightweight HTTP-to-ROS 2 bridge that enables real-time interaction and explanation triggering. We emphasize that explainability is not a one-way function but a bidirectional process, where human understanding and robot transparency must co-evolve. We further argue that in educational or assistive contexts, the role of a human mediator (e.g., a teacher) may be essential to support shared understanding. We validate our framework with examples of multimodal explanation boards and discuss how it can be extended to different scenarios in education, assistive robotics, and inclusive AI.
Submitted 8 April, 2025;
originally announced April 2025.
-
Exploring undercurrents of learning tensions in an LLM-enhanced landscape: A student-centered qualitative perspective on LLM vs Search
Authors:
Rahul R. Divekar,
Sophia Guerra,
Lisette Gonzalez,
Natasha Boos,
Helen Zhou
Abstract:
Large language models (LLMs) are transforming how students learn by providing readily available tools that can quickly augment or complete various learning activities with non-trivial performance. Similar paradigm shifts have occurred in the past with the introduction of search engines and Wikipedia, which replaced or supplemented traditional information sources such as libraries and books. This study investigates the potential for LLMs to represent the next shift in learning, focusing on their role in information discovery and synthesis compared to existing technologies, such as search engines. Using a within-subjects, counterbalanced design, participants learned new topics using a search engine (Google) and an LLM (ChatGPT). Post-task follow-up interviews explored students' reflections, preferences, pain points, and overall perceptions. We present an analysis of their responses that shows nuanced insights into when, why, and how students prefer LLMs over search engines, offering implications for educators, policymakers, and technology developers navigating the evolving educational landscape.
Submitted 3 April, 2025;
originally announced April 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin,
et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
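The KV-cache saving is easy to estimate. The sketch below assumes the report's 5:1 local:global interleaving and 1024-token sliding window; the layer count, KV-head count, and head dimension are illustrative, not Gemma 3's actual configuration.

```python
def kv_cache_bytes(n_layers: int, local_per_global: int, context: int,
                   window: int, n_kv_heads: int, head_dim: int,
                   bytes_per_val: int = 2) -> int:
    """Rough KV-cache size: global layers cache the full context, local
    layers only their sliding window."""
    n_global = n_layers // (local_per_global + 1)
    n_local = n_layers - n_global
    per_token = 2 * n_kv_heads * head_dim * bytes_per_val   # keys + values
    return (n_global * context + n_local * min(window, context)) * per_token

full = kv_cache_bytes(48, 0, 128_000, 128_000, 8, 256)      # all-global
mixed = kv_cache_bytes(48, 5, 128_000, 1024, 8, 256)        # 5:1 local:global
print(f"all-global: {full / 1e9:.1f} GB, interleaved: {mixed / 1e9:.1f} GB")
```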
Submitted 25 March, 2025;
originally announced March 2025.
-
A Digital Twin Simulator of a Pastillation Process with Applications to Automatic Control based on Computer Vision
Authors:
Leonardo D. González,
Joshua L. Pulsipher,
Shengli Jiang,
Tyler Soderstrom,
Victor M. Zavala
Abstract:
We present a digital-twin simulator for a pastillation process. The simulation framework produces realistic thermal image data of the process that is used to train computer vision-based soft sensors based on convolutional neural networks (CNNs); the soft sensors produce output signals for temperature and product flow rate that enable real-time monitoring and feedback control. Pastillation technologies are high-throughput devices that are used in a broad range of industries; these processes face operational challenges such as real-time identification of clog locations (faults) in the rotating shell and the automatic, real-time adjustment of conveyor belt speed and operating conditions to stabilize output. The proposed simulator is able to capture this behavior and generates realistic data that can be used to benchmark different algorithms for image processing and different control architectures. We present a case study to illustrate the capabilities; the study explores behavior over a range of equipment sizes, clog locations, and clog duration. A feedback controller (tuned using Bayesian optimization) is used to adjust the conveyor belt speed based on the CNN output signal to achieve the desired process outputs.
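As a rough illustration of the loop being described, the sketch below wires a stand-in soft sensor into a PI controller on belt speed; the mean-intensity sensor, setpoint, and gains are invented placeholders (the paper uses a trained CNN and tunes its controller with Bayesian optimization).

```python
import numpy as np

def soft_sensor(thermal_image: np.ndarray) -> float:
    """Stand-in for the trained CNN: mean intensity as a flow-rate proxy."""
    return float(thermal_image.mean())

def pi_step(measured: float, setpoint: float, integral: float,
            kp: float = 0.8, ki: float = 0.1, dt: float = 1.0):
    """One proportional-integral update; returns (adjustment, new integral)."""
    error = setpoint - measured
    integral += error * dt
    return kp * error + ki * integral, integral

speed, integral = 1.0, 0.0
for _ in range(5):                               # a few closed-loop steps
    frame = np.random.rand(64, 64)               # stand-in thermal frame
    flow = soft_sensor(frame)
    du, integral = pi_step(flow, setpoint=0.5, integral=integral)
    speed = max(0.0, speed + du)
print(round(speed, 3))
```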
Submitted 18 March, 2025;
originally announced March 2025.
-
Excited-state nonadiabatic dynamics in explicit solvent using machine learned interatomic potentials
Authors:
Maximilian X. Tiefenbacher,
Brigitta Bachmair,
Cheng Giuseppe Chen,
Julia Westermayr,
Philipp Marquetand,
Johannes C. B. Dietschreit,
Leticia González
Abstract:
Excited-state nonadiabatic simulations with quantum mechanics/molecular mechanics (QM/MM) are essential to understand photoinduced processes in explicit environments. However, the high computational cost of the underlying quantum chemical calculations limits its application in combination with trajectory surface hopping methods. Here, we use FieldSchNet, a machine-learned interatomic potential capable of incorporating electric field effects into the electronic states, to replace traditional QM/MM electrostatic embedding with its ML/MM counterpart for nonadiabatic excited state trajectories. The developed method is applied to furan in water, including five coupled singlet states. Our results demonstrate that with sufficiently curated training data, the ML/MM model reproduces the electronic kinetics and structural rearrangements of QM/MM surface hopping reference simulations. Furthermore, we identify performance metrics that provide robust and interpretable validation of model accuracy.
Submitted 28 January, 2025;
originally announced January 2025.
-
On the thinness of trees
Authors:
Flavia Bonomo-Braberman,
Eric Brandwein,
Carolina Lucía González,
Agustín Sansone
Abstract:
The study of structural graph width parameters like tree-width, clique-width and rank-width has been ongoing during the last five decades, and their algorithmic use has also been increasing [Cygan et al., 2015]. New width parameters continue to be defined, for example, MIM-width in 2012, twin-width in 2020, and mixed-thinness, a generalization of thinness, in 2022.
The concept of thinness of a graph was introduced in 2007 by Mannino, Oriolo, Ricci and Chandran, and it can be seen as a generalization of interval graphs, which are exactly the graphs with thinness equal to one. This concept is interesting because if a representation of a graph as a $k$-thin graph is given for a constant value $k$, then several known NP-complete problems can be solved in polynomial time. Some examples are the maximum weighted independent set problem, solved in the seminal paper by Mannino et al., and the capacitated coloring with fixed number of colors [Bonomo, Mattia and Oriolo, 2011].
In this work we present a constructive $O(n\log(n))$-time algorithm to compute the thinness for any given $n$-vertex tree, along with a corresponding thin representation. We use intermediate results of this construction to improve known bounds of the thinness of some special families of trees.
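For readers new to the parameter: a valid k-thin representation is easy to check even though computing the thinness itself is not. A sketch under an adjacency-set graph encoding follows; the example graph is illustrative.

```python
from itertools import combinations

def is_thin_representation(graph: dict, order: list, cls: dict) -> bool:
    """Check the thinness condition: for u < v < w with u, v in the same
    class and uw an edge, vw must also be an edge."""
    pos = {v: i for i, v in enumerate(order)}
    for u, v in combinations(order, 2):          # pairs with u before v
        if cls[u] != cls[v]:
            continue
        for w in order:
            if pos[w] > pos[v] and w in graph[u] and w not in graph[v]:
                return False
    return True

# A path is an interval graph, hence 1-thin: one class suffices.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_thin_representation(g, ["a", "b", "c"], {"a": 1, "b": 1, "c": 1}))
```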
Submitted 22 January, 2025; v1 submitted 19 January, 2025;
originally announced January 2025.
-
Solving nonograms using Neural Networks
Authors:
José María Buades Rubio,
Antoni Jaume-i-Capó,
David López González,
Gabriel Moyà Alcover
Abstract:
Nonograms are logic puzzles in which cells in a grid must be colored or left blank according to the numbers located in its headers. In this study, we analyze different techniques to solve this type of logical problem using a Heuristic Algorithm, a Genetic Algorithm, and a Heuristic Algorithm with a Neural Network. Furthermore, we generate a public dataset to train the neural networks, and we publish this dataset and the code of the algorithms. The combination of the heuristic algorithm with a neural network obtained the best results. From the state-of-the-art review, no previous works used neural networks to solve nonograms, nor combined a network with other algorithms to accelerate the resolution process.
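A typical building block for both heuristic and network-assisted solvers is enumerating the row colorings consistent with a clue; the sketch below shows that step (not the paper's actual algorithm).

```python
def row_fillings(clue: tuple, length: int) -> list:
    """All 0/1 rows of `length` whose runs of 1s match `clue` in order."""
    if not clue:
        return [[0] * length]
    block, rest = clue[0], clue[1:]
    min_rest = sum(rest) + len(rest)          # space the remaining blocks need
    rows = []
    for start in range(length - block - min_rest + 1):
        head = [0] * start + [1] * block
        sep = [0] if rest else []             # mandatory gap between blocks
        for tail in row_fillings(rest, length - len(head) - len(sep)):
            rows.append(head + sep + tail)
    return rows

print(row_fillings((2, 1), 5))   # [[1,1,0,1,0], [1,1,0,0,1], [0,1,1,0,1]]
```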
Submitted 10 January, 2025;
originally announced January 2025.
-
On the Implementation of a Bayesian Optimization Framework for Interconnected Systems
Authors:
Leonardo D. González,
Victor M. Zavala
Abstract:
Bayesian optimization (BO) is an effective paradigm for the optimization of expensive-to-sample systems. Standard BO learns the performance of a system $f(x)$ by using a Gaussian Process (GP) model; this treats the system as a black-box and limits its ability to exploit available structural knowledge (e.g., physics and sparse interconnections in a complex system). Grey-box modeling, wherein the performance function is treated as a composition of known and unknown intermediate functions $f(x, y(x))$ (where $y(x)$ is a GP model) offers a solution to this limitation; however, generating an analytical probability density for $f$ from the Gaussian density of $y(x)$ is often an intractable problem (e.g., when $f$ is nonlinear). Previous work has handled this issue by using sampling techniques or by solving an auxiliary problem over an augmented space where the values of $y(x)$ are constrained by confidence intervals derived from the GP models; such solutions are computationally intensive. In this work, we provide a detailed implementation of a recently proposed grey-box BO paradigm, BOIS, that uses adaptive linearizations of $f$ to obtain analytical expressions for the statistical moments of the composite function. We show that the BOIS approach enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. We benchmark the effectiveness of BOIS against standard BO and existing grey-box BO algorithms using a pair of case studies focused on chemical process optimization and design. Our results indicate that BOIS performs as well as or better than existing grey-box methods, while also being less computationally intensive.
Submitted 1 January, 2025;
originally announced January 2025.
-
Smart Parking with Pixel-Wise ROI Selection for Vehicle Detection Using YOLOv8, YOLOv9, YOLOv10, and YOLOv11
Authors:
Gustavo P. C. P. da Luz,
Gabriel Massuyoshi Sato,
Luis Fernando Gomez Gonzalez,
Juliana Freitag Borin
Abstract:
The increasing urbanization and the growing number of vehicles in cities have underscored the need for efficient parking management systems. Traditional smart parking solutions often rely on sensors or cameras for occupancy detection, each with its limitations. Recent advancements in deep learning have introduced new YOLO models (YOLOv8, YOLOv9, YOLOv10, and YOLOv11), but these models have not been extensively evaluated in the context of smart parking systems, particularly when combined with Region of Interest (ROI) selection for object detection. Existing methods still rely on fixed polygonal ROI selections or simple pixel-based modifications, which limit flexibility and precision. This work introduces a novel approach that integrates Internet of Things, Edge Computing, and Deep Learning concepts by using the latest YOLO models for vehicle detection. By exploring both edge and cloud computing, it was found that inference times on edge devices ranged from 1 to 92 seconds, depending on the hardware and model version. Additionally, a new pixel-wise post-processing ROI selection method is proposed for accurately identifying regions of interest to count vehicles in parking lot images. The proposed system achieved 99.68% balanced accuracy on a custom dataset of 3,484 images, offering a cost-effective smart parking solution that ensures precise vehicle detection while preserving data privacy.
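The pixel-wise ROI step can be sketched as a simple mask test on detector outputs; the box format and mask below are illustrative assumptions, not the paper's dataset.

```python
import numpy as np

def count_vehicles_in_roi(detections: np.ndarray, roi_mask: np.ndarray) -> int:
    """detections: (N, 4) boxes (x1, y1, x2, y2); roi_mask: (H, W) bool."""
    cx = ((detections[:, 0] + detections[:, 2]) / 2).astype(int)
    cy = ((detections[:, 1] + detections[:, 3]) / 2).astype(int)
    h, w = roi_mask.shape
    inside = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
    keep = inside & roi_mask[cy.clip(0, h - 1), cx.clip(0, w - 1)]
    return int(keep.sum())

mask = np.zeros((480, 640), dtype=bool)
mask[200:400, 100:500] = True            # pixel-wise parking-area mask
boxes = np.array([[120, 220, 180, 280], [10, 10, 60, 60]], dtype=float)
print(count_vehicles_in_roi(boxes, mask))   # -> 1 (second box is outside)
```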
Submitted 6 December, 2024; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Choosing Between an LLM versus Search for Learning: A HigherEd Student Perspective
Authors:
Rahul R. Divekar,
Sophia Guerra,
Lisette Gonzalez,
Natasha Boos
Abstract:
Large language models (LLMs) are rapidly changing learning processes, as they are readily available to students and quickly complete or augment several learning-related activities with non-trivial performance. Such major shifts in learning dynamics have previously occurred when search engines and Wikipedia were introduced, augmenting or replacing traditional information consumption sources such as libraries and books for university students. We investigate the possibility of the next shift: the use of LLMs to find and digest information in the context of learning and how they relate to existing technologies such as the search engine. We conducted a study where students were asked to learn new topics using a search engine and an LLM in a within-subjects counterbalanced design. We used that study as contextual grounding for a post-experience follow-up interview where we elicited student reflections, preferences, pain points, and general outlook on an LLM (ChatGPT) versus a search engine (Google).
Submitted 19 September, 2024;
originally announced September 2024.
-
Three-dimensional geometric resolution of the inverse kinematics of a 7 degree of freedom articulated arm
Authors:
Antonio Losada González
Abstract:
This work presents a three-dimensional geometric resolution method to calculate the complete inverse kinematics of a 7-degree-of-freedom articulated arm, including the hand itself. The method is classified as an analytical method with a geometric solution, since it obtains a precise solution in a closed number of steps, converting the inverse kinematic problem into a three-dimensional geometric model. To simplify the problem, the kinematic decoupling method is used: the position of the wrist is first calculated from the orientation of the hand, and the angles of the rest of the arm are then calculated from the wrist.
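The decoupling step itself is one line of vector algebra: back off from the hand target along its approach axis to find the wrist centre. The axis choice and the wrist-to-hand offset `d` below are illustrative assumptions.

```python
import numpy as np

def wrist_position(p_hand: np.ndarray, R_hand: np.ndarray, d: float) -> np.ndarray:
    """Wrist centre = hand target backed off by d along the approach axis."""
    approach = R_hand[:, 2]              # hand z-axis taken as approach axis
    return p_hand - d * approach

p = np.array([0.4, 0.1, 0.3])
R = np.eye(3)                            # hand aligned with the base frame
print(wrist_position(p, R, d=0.08))      # wrist 8 cm behind the hand target
```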
Submitted 3 September, 2024;
originally announced September 2024.
-
High Precision Positioning System
Authors:
Antonio Losada González
Abstract:
SAPPO is a high-precision, low-cost and highly scalable indoor localization system. The system is designed using modified HC-SR04 ultrasound transducers as a base to be used as distance meters between beacons and mobile robots. Additionally, it has a very unusual arrangement of its elements, such that the beacons and the array of transmitters of the mobile robot are located in very close planes, in a horizontal emission arrangement, parallel to the ground, achieving a range per transducer of almost 12 meters. SAPPO represents a significant leap forward in ultrasound localization systems, in terms of reducing the density of beacons while maintaining average precision in the millimeter range.
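Given beacon ranges, the position fix reduces to linearized least-squares trilateration; the beacon layout and noise level below are illustrative, not SAPPO's deployment.

```python
import numpy as np

def trilaterate(beacons: np.ndarray, dists: np.ndarray) -> np.ndarray:
    """Position from >= 3 beacon ranges: subtracting the first beacon's
    equation removes the quadratic term, leaving a linear system."""
    p0, d0 = beacons[0], dists[0]
    A = 2 * (beacons[1:] - p0)
    b = d0**2 - dists[1:]**2 + np.sum(beacons[1:]**2, axis=1) - np.sum(p0**2)
    return np.linalg.lstsq(A, b, rcond=None)[0]

beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 8.0], [10.0, 8.0]])
true_pos = np.array([3.0, 2.0])
rng = np.random.default_rng(0)
dists = np.linalg.norm(beacons - true_pos, axis=1) + rng.normal(0, 0.002, 4)
print(trilaterate(beacons, dists))   # close to [3, 2] with mm-level noise
```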
Submitted 3 September, 2024;
originally announced September 2024.
-
Space module with gyroscope and accelerometer integration
Authors:
Antonio Losada González
Abstract:
MEIGA is a module specially designed for people with tetraplegia or anyone with very limited movement capacity in their upper limbs. MEIGA converts the user's head movements into mouse movements. To simulate keystrokes, it detects blinks by reading the cheek movement that accompanies them. The performance, speed of movement, and precision of the mouse are practically equivalent to their respective measurements using the hand.
Submitted 2 September, 2024;
originally announced September 2024.
-
XULIA -- Comprehensive control system for Windows™ devices designed for people with tetraplegia
Authors:
Antonio Losada Gonzalez
Abstract:
XULIA is a comprehensive control system for Windows computers designed specifically for quadriplegic people or people who cannot move their upper limbs accurately. XULIA allows users to manage all Windows functions using only their voice. As a voice-to-text transcription system, it uses completely free modules, combining the Windows SAPI voice recognition libraries for command recognition with Google's cloud-based voice recognition systems, accessed indirectly through a Google Chrome browser, which allows Google's paid voice-to-text transcription services to be used completely free of charge. XULIA manages multiple grammars simultaneously with automatic activation, ensuring that the set of commands to be recognized is kept to a minimum at all times, which reduces false positives in command recognition.
Submitted 30 August, 2024;
originally announced August 2024.
-
Bipedal locomotion using geometric techniques
Authors:
Antonio Losada Gonzalez,
Manuel Perez Cota
Abstract:
This article describes a bipedal walking algorithm whose inverse kinematics are resolved using geometric methods alone, with all mathematical concepts explained from first principles in order to clarify the rationale for this solution. To do so, it has been necessary to simplify the problem and organize the content didactically. In general, articles on this topic use matrix systems to solve both direct and inverse kinematics, relying on complex techniques such as decoupling or the Jacobian calculation. By simplifying the walking process, its resolution has been formulated in a simple way using only geometric techniques.
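In the same spirit, a planar two-link leg solves in closed form with the law of cosines alone, no Jacobians; the link lengths and target below are illustrative.

```python
import math

def two_link_ik(x: float, y: float, l1: float, l2: float):
    """Hip and knee angles reaching foot position (x, y), law of cosines."""
    d2 = x * x + y * y
    cos_knee = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    knee = math.acos(max(-1.0, min(1.0, cos_knee)))       # clamp for safety
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee),
                                        l1 + l2 * math.cos(knee))
    return hip, knee

print(two_link_ik(0.3, -0.4, l1=0.3, l2=0.3))             # knee-bent pose
```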
Submitted 29 August, 2024;
originally announced August 2024.
-
Multi-modal Transfer Learning between Biological Foundation Models
Authors:
Juan Jose Garau-Luis,
Patrick Bordes,
Liam Gonzalez,
Masa Roller,
Bernardo P. de Almeida,
Lorenz Hexemer,
Christopher Blum,
Stefan Laurent,
Jan Grzegorzewski,
Maren Lang,
Thomas Pierrot,
Guillaume Richard
Abstract:
Biological sequences encode fundamental instructions for the building blocks of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e. same DNA sequence) and map to different transcription expression levels across various human tissues. We show that our model, dubbed IsoFormer, is able to accurately predict differential transcript expression, outperforming existing methods and leveraging the use of multiple modalities. Our framework also achieves efficient knowledge transfer from the encoders' pre-training as well as between modalities. We open-source our model, paving the way for new multi-modal gene expression approaches.
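The aggregation pattern can be sketched as pooled embeddings from frozen modality-specific encoders fused by a small head; the class, dimensions, and tissue count below are invented placeholders, not IsoFormer's actual components.

```python
import torch
import torch.nn as nn

class MultiModalExpressionHead(nn.Module):
    def __init__(self, dna_dim=512, rna_dim=512, prot_dim=640, n_tissues=30):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(dna_dim + rna_dim + prot_dim, 512), nn.GELU(),
            nn.Linear(512, n_tissues),        # one expression value per tissue
        )

    def forward(self, dna_emb, rna_emb, prot_emb):
        # each *_emb: (batch, dim) pooled output of a frozen encoder
        return self.fuse(torch.cat([dna_emb, rna_emb, prot_emb], dim=-1))

head = MultiModalExpressionHead()
out = head(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 640))
print(out.shape)   # torch.Size([2, 30])
```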
Submitted 20 June, 2024;
originally announced June 2024.
-
Learning Point Spread Function Invertibility Assessment for Image Deconvolution
Authors:
Romario Gualdrón-Hurtado,
Roman Jacome,
Sergio Urrea,
Henry Arguello,
Luis Gonzalez
Abstract:
Deep-learning (DL)-based image deconvolution (ID) has exhibited remarkable recovery performance, surpassing traditional linear methods. However, unlike traditional ID approaches that rely on analytical properties of the point spread function (PSF) to achieve high recovery performance - such as specific spectrum properties or small condition numbers in the convolution matrix - DL techniques lack quantifiable metrics for evaluating PSF suitability for DL-assisted recovery. Aiming to enhance deconvolution quality, we propose a metric that employs a non-linear approach to learn the invertibility of an arbitrary PSF using a neural network by mapping it to a unit impulse. A lower discrepancy between the mapped PSF and a unit impulse indicates a higher likelihood of successful inversion by a DL network. Our findings reveal that this metric correlates with high recovery performance in DL and traditional methods, thereby serving as an effective regularizer in deconvolution tasks. This approach reduces the computational complexity over conventional condition number assessments and is a differentiable process. These useful properties allow its application in designing diffractive optical elements through end-to-end (E2E) optimization, achieving invertible PSFs, and outperforming the E2E baseline framework.
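A simplified stand-in for the idea, swapping the paper's neural mapping for a Tikhonov-regularized inverse filter fitted in the Fourier domain: the residual between the recovered impulse and a true unit impulse scores invertibility.

```python
import numpy as np

def invertibility_score(psf: np.ndarray, eps: float = 1e-3) -> float:
    """Fit a regularized least-squares inverse filter in Fourier space and
    return how far g * h lands from a unit impulse (lower = more invertible)."""
    H = np.fft.fft2(psf, s=(64, 64))
    G = np.conj(H) / (np.abs(H) ** 2 + eps)
    delta = np.zeros((64, 64)); delta[0, 0] = 1.0
    return float(np.linalg.norm(np.fft.ifft2(G * H) - delta))

sharp = np.zeros((5, 5)); sharp[2, 2] = 1.0   # impulse-like PSF
blur = np.ones((5, 5)) / 25.0                 # heavy box blur
print(invertibility_score(sharp), invertibility_score(blur))   # low vs. high
```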
Submitted 27 January, 2025; v1 submitted 25 May, 2024;
originally announced May 2024.
-
OntoChat: a Framework for Conversational Ontology Engineering using Language Models
Authors:
Bohui Zhang,
Valentina Anita Carriero,
Katrin Schreiberhuber,
Stefani Tsaneva,
Lucía Sánchez González,
Jongmo Kim,
Jacopo de Berardinis
Abstract:
Ontology engineering (OE) in large projects poses a number of challenges arising from the heterogeneous backgrounds of the various stakeholders, domain experts, and their complex interactions with ontology designers. This multi-party interaction often creates systematic ambiguities and biases from the elicitation of ontology requirements, which directly affect the design and evaluation, and may jeopardise the intended reuse. Meanwhile, current OE methodologies strongly rely on manual activities (e.g., interviews, discussion pages). After collecting evidence on the most crucial OE activities, we introduce OntoChat, a framework for conversational ontology engineering that supports requirement elicitation, analysis, and testing. By interacting with a conversational agent, users can steer the creation of user stories and the extraction of competency questions, while receiving computational support to analyse the overall requirements and test early versions of the resulting ontologies. We evaluate OntoChat by replicating the engineering of the Music Meta Ontology, and collecting preliminary metrics on the effectiveness of each component from users. We release all code at https://github.com/King-s-Knowledge-Graph-Lab/OntoChat.
Submitted 26 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Genie: Generative Interactive Environments
Authors:
Jake Bruce,
Michael Dennis,
Ashley Edwards,
Jack Parker-Holder,
Yuge Shi,
Edward Hughes,
Matthew Lai,
Aditi Mavalankar,
Richie Steigerwald,
Chris Apps,
Yusuf Aytar,
Sarah Bechtle,
Feryal Behbahani,
Stephanie Chan,
Nicolas Heess,
Lucy Gonzalez,
Simon Osindero,
Sherjil Ozair,
Scott Reed,
Jingwei Zhang,
Konrad Zolna,
Jeff Clune,
Nando de Freitas,
Satinder Singh,
Tim Rocktäschel
Abstract:
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
Submitted 23 February, 2024;
originally announced February 2024.
-
Solid Waste Detection, Monitoring and Mapping in Remote Sensing Images: A Survey
Authors:
Piero Fraternali,
Luca Morandini,
Sergio Luis Herrera González
Abstract:
The detection and characterization of illegal solid waste disposal sites are essential for environmental protection, particularly for mitigating pollution and health hazards. Improperly managed landfills contaminate soil and groundwater via rainwater infiltration, posing threats to both animals and humans. Traditional landfill identification approaches, such as on-site inspections, are time-consuming and expensive. Remote sensing is a cost-effective solution for the identification and monitoring of solid waste disposal sites that enables broad coverage and repeated acquisitions over time. Earth Observation (EO) satellites, equipped with an array of sensors and imaging capabilities, have been providing high-resolution data for several decades. Researchers proposed specialized techniques that leverage remote sensing imagery to perform a range of tasks such as waste site detection, dumping site monitoring, and assessment of suitable locations for new landfills. This review aims to provide a detailed illustration of the most relevant proposals for the detection and monitoring of solid waste sites by describing and comparing the approaches, the implemented techniques, and the employed data. Furthermore, since the data sources are of the utmost importance for developing an effective solid waste detection model, a comprehensive overview of the satellites and publicly available data sets is presented. Finally, this paper identifies the open issues in the state-of-the-art and discusses the relevant research directions for reducing the costs and improving the effectiveness of novel solid waste detection methods.
Submitted 13 December, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
BOIS: Bayesian Optimization of Interconnected Systems
Authors:
Leonardo D. González,
Victor M. Zavala
Abstract:
Bayesian optimization (BO) has proven to be an effective paradigm for the global optimization of expensive-to-sample systems. One of the main advantages of BO is its use of Gaussian processes (GPs) to characterize model uncertainty which can be leveraged to guide the learning and search process. However, BO typically treats systems as black-boxes and this limits the ability to exploit structural knowledge (e.g., physics and sparse interconnections). Composite functions of the form $f(x, y(x))$, wherein GP modeling is shifted from the performance function $f$ to an intermediate function $y$, offer an avenue for exploiting structural knowledge. However, the use of composite functions in a BO framework is complicated by the need to generate a probability density for $f$ from the Gaussian density of $y$ calculated by the GP (e.g., when $f$ is nonlinear it is not possible to obtain a closed-form expression). Previous work has handled this issue using sampling techniques; these are easy to implement and flexible but are computationally intensive. In this work, we introduce a new paradigm which allows for the efficient use of composite functions in BO; this uses adaptive linearizations of $f$ to obtain closed-form expressions for the statistical moments of the composite function. We show that this simple approach (which we call BOIS) enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. Using a chemical process optimization case study, we benchmark the effectiveness of BOIS against standard BO and sampling approaches. Our results indicate that BOIS achieves performance gains and accurately captures the statistics of composite functions.
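The linearization at the heart of BOIS can be sketched in a few lines: propagate the GP's Gaussian density for y(x) through the known f via a first-order Taylor expansion at the GP mean. The composite function and GP moments below are illustrative.

```python
import numpy as np

def linearized_moments(f, x, mu_y, cov_y, h=1e-6):
    """Approximate mean/variance of f(x, y) with y ~ N(mu_y, cov_y) via a
    first-order Taylor expansion of f around mu_y."""
    mu_f = f(x, mu_y)
    jac = np.array([(f(x, mu_y + h * e) - mu_f) / h   # finite-difference df/dy
                    for e in np.eye(len(mu_y))])
    return mu_f, jac @ cov_y @ jac

f = lambda x, y: x * y[0] + np.sin(y[1])              # known composite structure
mu, var = linearized_moments(f, x=2.0, mu_y=np.array([1.0, 0.5]),
                             cov_y=np.diag([0.04, 0.01]))
print(mu, var)   # closed-form moments that an acquisition function can use
```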
Submitted 28 November, 2023; v1 submitted 19 November, 2023;
originally announced November 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego,
et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Human-Timescale Adaptation in an Open-Ended Task Space
Authors:
Adaptive Agent Team,
Jakob Bauer,
Kate Baumli,
Satinder Baveja,
Feryal Behbahani,
Avishkar Bhoopchand,
Nathalie Bradley-Schmieg,
Michael Chang,
Natalie Clay,
Adrian Collister,
Vibhavari Dasagi,
Lucy Gonzalez,
Karol Gregor,
Edward Hughes,
Sheleem Kashem,
Maria Loks-Thompson,
Hannah Openshaw,
Jack Parker-Holder,
Shreya Pathak,
Nicolas Perez-Nieves,
Nemanja Rakicevic,
Tim Rocktäschel,
Yannick Schroecker,
Jakub Sygnowski,
Karl Tuyls,
et al. (3 additional authors not shown)
Abstract:
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
Submitted 18 January, 2023;
originally announced January 2023.
-
New Paradigms for Exploiting Parallel Experiments in Bayesian Optimization
Authors:
Leonardo D. González,
Victor M. Zavala
Abstract:
Bayesian optimization (BO) is one of the most effective methods for closed-loop experimental design and black-box optimization. However, a key limitation of BO is that it is an inherently sequential algorithm (one experiment is proposed per round) and thus cannot directly exploit high-throughput (parallel) experiments. Diverse modifications to the BO framework have been proposed in the literature to enable exploitation of parallel experiments, but such approaches are limited in the degree of parallelization that they can achieve and can lead to redundant experiments (thus wasting resources and potentially compromising performance). In this work, we present new parallel BO paradigms that exploit the structure of the system to partition the design space. Specifically, we propose an approach that partitions the design space by following the level sets of the performance function and an approach that exploits partially-separable structures of the performance function. We conduct extensive numerical experiments using a reactor case study to benchmark the effectiveness of these approaches against a variety of state-of-the-art parallel algorithms reported in the literature. Our computational results show that our approaches significantly reduce the required search time and increase the probability of finding a global (rather than local) solution.
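A toy version of the level-set idea: bin candidate designs by quantiles of the surrogate mean and propose one experiment per band. The surrogate, banding, and greedy pick are illustrative simplifications of the paper's approach.

```python
import numpy as np

def level_set_proposals(surrogate_mean, candidates: np.ndarray,
                        n_partitions: int) -> np.ndarray:
    """Split candidates into quantile bands of the surrogate mean and pick
    one greedy proposal per band, to be run in parallel."""
    values = surrogate_mean(candidates)
    edges = np.quantile(values, np.linspace(0, 1, n_partitions + 1))
    proposals = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = candidates[(values >= lo) & (values <= hi)]
        proposals.append(band[np.argmin(surrogate_mean(band))])
    return np.array(proposals)

cands = np.random.uniform(-2, 2, size=(500, 2))
mean = lambda X: (X**2).sum(axis=1)                   # stand-in surrogate mean
print(level_set_proposals(mean, cands, n_partitions=4))   # 4 parallel runs
```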
Submitted 9 December, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Enumeration of max-pooling responses with generalized permutohedra
Authors:
Laura Escobar,
Patricio Gallardo,
Javier González-Anaya,
José L. González,
Guido Montúfar,
Alejandro H. Morales
Abstract:
We investigate the combinatorics of max-pooling layers, which are functions that downsample input arrays by taking the maximum over shifted windows of input coordinates, and which are commonly used in convolutional neural networks. We obtain results on the number of linearity regions of these functions by equivalently counting the number of vertices of certain Minkowski sums of simplices. We characterize the faces of such polytopes and obtain generating functions and closed formulas for the number of vertices and facets in a 1D max-pooling layer depending on the size of the pooling windows and stride, and for the number of vertices in a special case of 2D max-pooling.
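The object being counted can be probed by brute force: each linearity region corresponds to a realizable choice of argmax per window (a vertex of the Minkowski sum), so sampling inputs lower-bounds the count. The sizes below are illustrative.

```python
import numpy as np

def count_regions_1d(n: int, window: int, stride: int,
                     samples: int = 50_000) -> int:
    """Lower-bound the number of linearity regions by collecting the
    argmax pattern (one index per window) realized by random inputs."""
    starts = range(0, n - window + 1, stride)
    patterns = set()
    rng = np.random.default_rng(0)
    for x in rng.standard_normal((samples, n)):
        patterns.add(tuple(s + int(np.argmax(x[s:s + window])) for s in starts))
    return len(patterns)

# 3 overlapping windows of size 2 on 4 inputs: all 2^3 = 8 patterns occur.
print(count_regions_1d(n=4, window=2, stride=1))
```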
Submitted 23 September, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Efficient Dependency Analysis for Rule-Based Ontologies
Authors:
Larry González,
Alex Ivliev,
Markus Krötzsch,
Stephan Mennicke
Abstract:
Several types of dependencies have been proposed for the static analysis of existential rule ontologies, promising insights about computational properties and possible practical uses of a given set of rules, e.g., in ontology-based query answering. Unfortunately, these dependencies are rarely implemented, so their potential is hardly realised in practice. We focus on two kinds of rule dependencies -- positive reliances and restraints -- and design and implement optimised algorithms for their efficient computation. Experiments on real-world ontologies, the largest with more than 100,000 rules, show the scalability of our approach, which lets us realise several previously proposed applications as practical case studies. In particular, we can analyse to what extent rule-based bottom-up approaches of reasoning can be guaranteed to yield redundancy-free "lean" knowledge graphs (so-called cores) on practical ontologies.
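A cheap predicate-level prefilter conveys what a positive reliance is: rule r2 can only positively rely on r1 if some head predicate of r1 occurs in r2's body. The paper's algorithms test actual unifiability; this sketch is only the over-approximation such tools start from, with an invented rule encoding.

```python
def may_rely_on(r1: dict, r2: dict) -> bool:
    """Rules as {'body': set-of-predicate-names, 'head': ...}; r2 can only
    positively rely on r1 if a head predicate of r1 occurs in r2's body."""
    return bool(r1["head"] & r2["body"])

r1 = {"body": {"person"}, "head": {"hasParent"}}
r2 = {"body": {"hasParent"}, "head": {"hasAncestor"}}
print(may_rely_on(r1, r2), may_rely_on(r2, r1))   # True False
```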
Submitted 20 July, 2022;
originally announced July 2022.
-
A methodology to characterize bias and harmful stereotypes in natural language processing in Latin America
Authors:
Laura Alonso Alemany,
Luciana Benotti,
Hernán Maina,
Lucía González,
Mariela Rajngewerc,
Lautaro Martínez,
Jorge Sánchez,
Mauro Schilman,
Guido Ivetta,
Alexia Halvorsen,
Amanda Mata Rojo,
Matías Bordone,
Beatriz Busaniche
Abstract:
Automated decision-making systems, especially those based on natural language processing, are pervasive in our lives. They are not only behind the internet search engines we use daily, but also take more critical roles: selecting candidates for a job, determining suspects of a crime, diagnosing autism and more. Such automated systems make errors, which may be harmful in many ways, be it because of the severity of the consequences (as in health issues) or because of the sheer number of people they affect. When the errors made by an automated system affect some populations more than others, we call the system biased.
Most modern natural language technologies are based on artifacts obtained from enormous volumes of text using machine learning, namely language models and word embeddings. Since they are created by applying subsymbolic machine learning, mostly artificial neural networks, they are opaque and practically uninterpretable by direct inspection, thus making it very difficult to audit them.
In this paper, we present a methodology that spells out how social scientists, domain experts, and machine learning experts can collaboratively explore biases and harmful stereotypes in word embeddings and large language models. Our methodology is based on the following principles:
* focus on the linguistic manifestations of discrimination in word embeddings and language models, not on the mathematical properties of the models
* reduce the technical barrier for discrimination experts
* characterize through a qualitative exploratory process in addition to a metric-based approach
* address mitigation as part of the training process, not as an afterthought
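A metric-based probe of the kind the methodology pairs with qualitative exploration might look like the following WEAT-style association score; the vectors here are random stand-ins, and a real audit would load trained embeddings and community-sourced word lists.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cosine(vec, a) for a in attrs_a])
            - np.mean([cosine(vec, b) for b in attrs_b]))

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50)
       for w in ["nurse", "pleasant1", "pleasant2", "unpleasant1", "unpleasant2"]}
score = association(emb["nurse"],
                    [emb["pleasant1"], emb["pleasant2"]],
                    [emb["unpleasant1"], emb["unpleasant2"]])
print(f"association: {score:+.3f}")  # > 0 leans toward the first attribute set
```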
Submitted 28 March, 2023; v1 submitted 13 July, 2022;
originally announced July 2022.
-
On $d$-stable locally checkable problems parameterized by mim-width
Authors:
Carolina Lucía Gonzalez,
Felix Mann
Abstract:
In this paper we continue the study of locally checkable problems under the framework introduced by Bonomo-Braberman and Gonzalez in 2020, by focusing on graphs of bounded mim-width. We study which restrictions on a locally checkable problem are necessary in order to be able to solve it efficiently on graphs of bounded mim-width. To this end, we introduce the concept of $d$-stability of a check function. The related locally checkable problems contain large classes of problems, among which we can mention, for example, LCVP problems. We give an algorithm showing that these problems are XP when parameterized by the mim-width of a given binary decomposition tree of the input graph, that is, that they can be solved in polynomial time given a binary decomposition tree of bounded mim-width. We explore the relation between $d$-stable locally checkable problems and the recently introduced DN logic (Bergougnoux, Dreier and Jaffke, 2022), and show that both frameworks model the same family of problems. We include a list of concrete examples of $d$-stable locally checkable problems whose complexity on graphs of bounded mim-width was open so far.
Submitted 13 October, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Locally checkable problems parameterized by clique-width
Authors:
Narmina Baghirova,
Carolina Lucía Gonzalez,
Bernard Ries,
David Schindl
Abstract:
We continue the study initiated by Bonomo-Braberman and Gonzalez in 2020 on $r$-locally checkable problems. We propose a dynamic programming algorithm that takes as input a graph with an associated clique-width expression and solves a $1$-locally checkable problem under certain restrictions. We show that it runs in polynomial time in graphs of bounded clique-width, when the number of colors of the locally checkable problem is fixed. Furthermore, we present a first extension of our framework to global properties by taking into account the sizes of the color classes, and consequently enlarge the set of problems solvable in polynomial time with our approach in graphs of bounded clique-width. As examples, we apply this setting to show that, when parameterized by clique-width, the $[k]$-Roman domination problem is FPT, and the $k$-community problem, Max PDS and other variants are XP.
Submitted 28 June, 2022; v1 submitted 6 March, 2022;
originally announced March 2022.
-
A Sorted Datalog Hammer for Supervisor Verification Conditions Modulo Simple Linear Arithmetic
Authors:
Martin Bromberger,
Irina Dragoste,
Rasha Faqeh,
Christof Fetzer,
Larry González,
Markus Krötzsch,
Maximilian Marx,
Harish K Murali,
Christoph Weidenbach
Abstract:
In a previous paper, we have shown that clause sets belonging to the Horn Bernays-Schönfinkel fragment over simple linear real arithmetic (HBS(SLR)) can be translated into HBS clause sets over a finite set of first-order constants. The translation preserves validity and satisfiability, and it is still applicable if we extend our input with positive universally or existentially quantified verification conditions (conjectures). We call this translation a Datalog hammer. The combination of its implementation in SPASS-SPL with the Datalog reasoner VLog establishes an effective way of deciding verification conditions in the Horn fragment. We verify supervisor code for two examples: a lane change assistant in a car and an electronic control unit of a supercharged combustion engine. In this paper, we improve our Datalog hammer in several ways: we generalize it to mixed real-integer arithmetic and finite first-order sorts; we extend the class of acceptable inequalities beyond variable bounds and positively grounded inequalities; and we significantly reduce the size of the hammer output by a soft typing discipline. We call the result the sorted Datalog hammer. It not only allows us to handle more complex supervisor code and to model already considered supervisor code more concisely, but it also improves our performance on real-world benchmark examples. Finally, we replace the previously file-based interface between SPASS-SPL and VLog with a close coupling, resulting in a single executable binary.
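For intuition about the target formalism, here is a deliberately tiny bottom-up evaluation of propositional Datalog-style rules (an illustration only; the actual pipeline translates HBS(SLR) clauses with SPASS-SPL and reasons over them with VLog, and the supervisor rules below are invented placeholders).

```python
def evaluate(facts: set[str], rules: list[tuple[set[str], str]]) -> set[str]:
    """Naively apply rules until no new facts are derived (the fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

# Toy supervisor-style rules: body atoms => head atom.
rules = [
    ({"speed_ok", "lane_clear"}, "change_permitted"),
    ({"change_permitted", "driver_request"}, "initiate_change"),
]
facts = {"speed_ok", "lane_clear", "driver_request"}
print(evaluate(facts, rules))  # both heads become derivable
```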
Submitted 24 January, 2022;
originally announced January 2022.
-
Exploiting Global and Local Attentions for Heavy Rain Removal on Single Images
Authors:
Dac Tung Vu,
Juan Luis Gonzalez,
Munchurl Kim
Abstract:
Heavy rain removal from a single image is the task of simultaneously eliminating rain streaks and fog, which can dramatically degrade the quality of captured images. Most existing rain removal methods do not generalize well to the heavy rain case. In this work, we propose a novel network architecture consisting of three sub-networks to remove heavy rain from a single image without estimating rain streaks and fog separately. The first sub-net, a U-net-based architecture that incorporates our Spatial Channel Attention (SCA) blocks, extracts global features that provide sufficient contextual information for removing atmospheric distortions caused by rain and fog. The second sub-net learns the additive residue information, which is useful for removing rain streak artifacts, via our proposed Residual Inception Modules (RIM). The third sub-net, the multiplicative sub-net, adopts our Channel-attentive Inception Modules (CIM) and learns the essential brighter local features that are not effectively extracted by the SCA and additive sub-nets, by modulating the local pixel intensities in the derained images. The three clean image estimates are then combined via an attentive blending block to generate the final clean image. Our method with SCA, RIM, and CIM significantly outperforms the previous state-of-the-art single-image deraining methods on synthetic datasets and produces considerably cleaner and sharper derained estimates on real image datasets. We present extensive experiments and ablation studies supporting each of our method's contributions on both synthetic and real image datasets.
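A rough sketch of what the final attentive blending step could look like (an assumed form in PyTorch, not the authors' released code): per-pixel softmax weights fuse the three candidate clean images from the global, additive and multiplicative sub-nets.

```python
import torch
import torch.nn as nn

class AttentiveBlend(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        # Predict one attention map per candidate image.
        self.att = nn.Conv2d(3 * channels, 3, kernel_size=3, padding=1)

    def forward(self, y_global, y_additive, y_multiplicative):
        stack = torch.cat([y_global, y_additive, y_multiplicative], dim=1)
        w = torch.softmax(self.att(stack), dim=1)            # B x 3 x H x W
        candidates = torch.stack(
            [y_global, y_additive, y_multiplicative], dim=1)  # B x 3 x C x H x W
        return (w.unsqueeze(2) * candidates).sum(dim=1)       # B x C x H x W

blend = AttentiveBlend()
imgs = [torch.rand(1, 3, 64, 64) for _ in range(3)]
print(blend(*imgs).shape)  # torch.Size([1, 3, 64, 64])
```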
Submitted 16 April, 2021;
originally announced April 2021.
-
Simulating Crowds and Autonomous Vehicles
Authors:
John Charlton,
Luis Rene Montana Gonzalez,
Steve Maddock,
Paul Richmond
Abstract:
Understanding how people view and interact with autonomous vehicles is important to guide future directions of research. One such way of aiding understanding is through simulations of virtual environments involving people and autonomous vehicles. We present a simulation model that incorporates people and autonomous vehicles in a shared urban space. The model is able to simulate many thousands of people and vehicles in real-time. This is achieved by use of GPU hardware, and through a novel linear program solver optimized for large numbers of problems on the GPU. The model is up to 30 times faster than the equivalent multi-core CPU model.
Submitted 25 August, 2020;
originally announced August 2020.
-
Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Authors:
Juan Luis Gonzalez,
Munchurl Kim
Abstract:
Self-supervised depth estimators have recently shown results comparable to supervised methods on the challenging single image depth estimation (SIDE) task, by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis, but not depth estimation, as they cannot effectively neglect occlusions between the target and the reference images. Previous works rely on rigid photometric assumptions or the SIDE network to infer depth and occlusions, resulting in limited performance. On the other hand, we propose a method to "Forget About the LiDAR" (FAL), for the training of depth estimators, with Mirrored Exponential Disparity (MED) probability volumes, from which we obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM). Our MOM does not impose a burden on our FAL-net. Contrary to the previous methods that learn SIDE from stereo pairs by regressing disparity in the linear space, our FAL-net regresses disparity by binning it into the exponential space, which allows for better detection of distant and nearby objects. We define a two-step training strategy for our FAL-net: it is first trained for view synthesis and then fine-tuned for depth estimation with our MOM. Our FAL-net is remarkably lightweight and outperforms the previous state-of-the-art methods with 8x fewer parameters and 3x faster inference speeds on the challenging KITTI dataset. We present extensive experimental results on the KITTI, CityScapes, and Make3D datasets to verify our method's effectiveness. To the best of the authors' knowledge, the presented method performs best among all self-supervised methods to date.
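The exponential-space disparity idea can be sketched as follows (an assumed formulation, not the released FAL-net code; the bin count and disparity range are placeholders): bin centres grow exponentially between d_min and d_max, so more bins cover small disparities (distant objects), and the prediction is the probability-weighted sum over bins.

```python
import torch

def exp_disparity_bins(n_bins: int, d_min: float, d_max: float) -> torch.Tensor:
    """Exponentially spaced disparity bin centres between d_min and d_max."""
    t = torch.linspace(0.0, 1.0, n_bins)
    return d_min * (d_max / d_min) ** t

def expected_disparity(prob_volume: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """prob_volume: B x N x H x W softmax over N bins -> B x 1 x H x W disparity."""
    return (prob_volume * bins.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)

bins = exp_disparity_bins(n_bins=49, d_min=1.0, d_max=100.0)
probs = torch.softmax(torch.rand(1, 49, 8, 8), dim=1)
print(expected_disparity(probs, bins).shape)  # torch.Size([1, 1, 8, 8])
```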
Submitted 26 September, 2020; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Expert system for the diagnosis of diseases and pests in rice, tobacco, tomato, pepper, corn, cucumber and bean crops
Authors:
Ing. Yosvany Medina Carbó,
MSc. Iracely Milagros Santana Ges,
Lic. Saily Leo González
Abstract:
Agricultural production has become a complex business that requires the accumulation and integration of knowledge, in addition to information from many different sources. To remain competitive, the modern farmer often relies on agricultural specialists and advisors who provide them with information for decision making in their crops. But unfortunately, the help of the agricultural specialist is not always available when the farmer needs it. To alleviate this problem, expert systems have become a powerful instrument that has great potential within agriculture. This paper presents an Expert System for the diagnosis of diseases and pests in rice, tobacco, tomato, pepper, corn, cucumber and bean crops. For the development of this Expert System, SWI-Prolog was used to create the knowledge base, so it works with predicates and allows the system to be based on production rules. This system allows a fast and reliable diagnosis of pests and diseases that affect these crops.
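The production-rule style the system encodes in SWI-Prolog can be illustrated in miniature as below (transcribed to Python for this sketch; the crops, symptoms and diagnoses are invented placeholders, not entries from the system's knowledge base).

```python
# Each rule: a set of required observations -> a candidate diagnosis.
RULES = [
    ({"crop": "rice", "leaf_spots": True, "yellow_halo": True},
     "possible fungal leaf spot"),
    ({"crop": "tomato", "wilting": True, "stem_browning": True},
     "possible bacterial wilt"),
]

def diagnose(observations: dict) -> list[str]:
    """Return every diagnosis whose conditions all hold in the observations."""
    return [diag for cond, diag in RULES
            if all(observations.get(k) == v for k, v in cond.items())]

print(diagnose({"crop": "tomato", "wilting": True, "stem_browning": True}))
```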
Submitted 21 July, 2020;
originally announced July 2020.
-
Estimating COVID-19 cases and reproduction number in Mexico
Authors:
Michelle Anzarut,
Luis Felipe González,
Sonia Mendizábal,
María Teresa Ortiz
Abstract:
In this report we fit a semi-mechanistic Bayesian hierarchical model to describe the Mexican COVID-19 epidemic. We obtain two epidemiological measures: the number of infections and the reproduction number. Estimations are based on death data. Hence, we expect our estimates to be more accurate than the attack rates estimated from the reported number of cases.
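Semi-mechanistic models of this kind typically build on a renewal equation; the following sketch (an illustration of the general approach, not the report's exact model; all parameter values are invented) propagates infections with a generation-interval distribution scaled by R_t and maps them to deaths through an infection-to-death delay and an assumed infection fatality rate.

```python
import numpy as np

def simulate(R, gen_interval, seed_infections, ifr, delay, days):
    """Renewal-equation infections, then deaths via a delay distribution."""
    infections = np.zeros(days)
    infections[0] = seed_infections
    for t in range(1, days):
        w = gen_interval[1:t + 1][::-1]   # weight gen[t-s] for infections[s]
        infections[t] = R[t] * (infections[:t] * w).sum()
    deaths = ifr * np.convolve(infections, delay)[:days]
    return infections, deaths

days = 60
gen = np.exp(-0.25 * np.arange(days)); gen /= gen.sum()
delay = np.exp(-0.08 * np.arange(days)); delay /= delay.sum()
R = np.where(np.arange(days) < 30, 2.2, 0.9)   # interventions cut R_t
infections, deaths = simulate(R, gen, 100.0, 0.01, delay, days)
print(f"peak daily infections: {infections.max():.0f}")
```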
Submitted 17 July, 2020;
originally announced July 2020.
-
Thinness of product graphs
Authors:
Flavia Bonomo-Braberman,
Carolina L. Gonzalez,
Fabiano S. Oliveira,
Moysés S. Sampaio Jr.,
Jayme L. Szwarcfiter
Abstract:
The thinness of a graph is a width parameter that generalizes some properties of interval graphs, which are exactly the graphs of thinness one. Many NP-complete problems can be solved in polynomial time for graphs with bounded thinness, given a suitable representation of the graph. In this paper we study the thinness and its variations of graph products. We show that the thinness behaves "well" in general for products, in the sense that for most of the graph products defined in the literature, the thinness of the product of two graphs is bounded by a function (typically product or sum) of their thinness, or of the thinness of one of them and the size of the other. We also show for some cases the non-existence of such a function.
Submitted 16 April, 2021; v1 submitted 30 June, 2020;
originally announced June 2020.
-
A new approach on locally checkable problems
Authors:
Flavia Bonomo-Braberman,
Carolina Lucía Gonzalez
Abstract:
By providing a new framework, we extend previous results on locally checkable problems in bounded treewidth graphs. As a consequence, we show how to solve, in polynomial time for bounded treewidth graphs, double Roman domination and Grundy domination, among other problems for which no such algorithm was previously known. Moreover, by proving that fixed powers of bounded degree and bounded treewidth graphs are also bounded degree and bounded treewidth graphs, we can enlarge the family of problems that can be solved in polynomial time for these graph classes, including distance coloring problems and distance domination problems (for bounded distances).
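For a concrete sense of the framework's basic object: a problem is locally checkable when a colouring is valid exactly if every vertex, looking only at its own colour and its neighbours' colours, passes a local check. The sketch below (an illustration of the notion, not the paper's treewidth machinery) encodes domination this way; minimising the number of 1-colours then gives minimum dominating set.

```python
def dominating_check(own_colour: int, neighbour_colours: list[int]) -> bool:
    """Local check: the vertex is chosen or sees a chosen neighbour."""
    return own_colour == 1 or 1 in neighbour_colours

def is_valid(graph: dict[int, list[int]], colouring: dict[int, int]) -> bool:
    return all(dominating_check(colouring[v], [colouring[u] for u in graph[v]])
               for v in graph)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}    # the path P4
print(is_valid(path, {0: 0, 1: 1, 2: 0, 3: 1}))  # True: {1, 3} dominates P4
```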
Submitted 29 December, 2020; v1 submitted 31 May, 2020;
originally announced June 2020.
-
A Bayesian Approach to Rule Mining
Authors:
Luis Ignacio Lopera González,
Adrian Derungs,
Oliver Amft
Abstract:
In this paper, we introduce the increasing belief criterion in association rule mining. The criterion uses a recursive application of Bayes' theorem to compute a rule's belief. Extracted rules are required to have their belief increase with their last observation. We extend the taxonomy of association rule mining algorithms with a new branch for Bayesian rule mining (BRM), which uses increasing belief as the rule selection criterion. In contrast, the well-established frequent association rule mining (FRM) branch relies on the minimum-support concept to extract rules. We derive properties of the increasing belief criterion, such as the increasing belief boundary, no-prior-worries, and conjunctive premises. Subsequently, we implement a BRM algorithm using the increasing belief criterion, and illustrate its functionality in three experiments: (1) a proof-of-concept to illustrate BRM properties, (2) an analysis relating socioeconomic information and chemical exposure data, and (3) mining behaviour routines in patients undergoing neurological rehabilitation. We illustrate how BRM is capable of extracting rare rules and does not suffer from support dilution. Furthermore, we show that BRM focuses on the individual event generating processes, while FRM focuses on their commonalities. We consider BRM's increasing belief as an alternative criterion to thresholds on rule support, as often applied in FRM, to determine rule usefulness.
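The recursive belief update can be sketched as follows (simplified; the likelihood parameters below are invented for illustration, and the paper derives further properties such as the increasing belief boundary): the belief that a rule holds is updated with Bayes' theorem after each observation, and the rule is kept only if its last observation increased the belief.

```python
def update_belief(prior: float, conclusion_seen: bool,
                  p_obs_if_rule: float = 0.9,
                  p_obs_if_not: float = 0.3) -> float:
    """One recursive Bayes step: P(rule | observation)."""
    like_rule = p_obs_if_rule if conclusion_seen else 1 - p_obs_if_rule
    like_not = p_obs_if_not if conclusion_seen else 1 - p_obs_if_not
    evidence = like_rule * prior + like_not * (1 - prior)
    return like_rule * prior / evidence

belief = 0.5  # uninformative prior over "rule holds"
for seen in [True, True, False, True, True]:
    previous, belief = belief, update_belief(belief, seen)
print(f"final belief {belief:.3f}, increased: {belief > previous}")
```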
Submitted 13 January, 2020; v1 submitted 13 December, 2019;
originally announced December 2019.
-
Characterising circular-arc contact $B_0$-VPG graphs
Authors:
Flavia Bonomo-Braberman,
Esther Galby,
Carolina Lucía Gonzalez
Abstract:
A contact $B_0$-VPG graph is a graph for which there exists a collection of nontrivial pairwise interiorly disjoint horizontal and vertical segments in one-to-one correspondence with its vertex set such that two vertices are adjacent if and only if the corresponding segments touch. It was shown by Deniz et al. that Recognition is $\mathsf{NP}$-complete for contact $B_0$-VPG graphs. In this paper we present a minimal forbidden induced subgraph characterisation of contact $B_0$-VPG graphs within the class of circular-arc graphs and provide a polynomial-time algorithm for recognising these graphs.
Submitted 13 September, 2019;
originally announced September 2019.
-
Fast Simulation of Crowd Collision Avoidance
Authors:
John Charlton,
Luis Rene Montana Gonzalez,
Steve Maddock,
Paul Richmond
Abstract:
Real-time large-scale crowd simulations with realistic behavior are important for many application areas. On CPUs, the ORCA pedestrian steering model is often used for agent-based pedestrian simulations. This paper introduces a technique for running the ORCA pedestrian steering model on the GPU. Performance improvements of up to 30 times greater than a multi-core CPU model are demonstrated. This improvement is achieved through a specialized linear program solver on the GPU and spatial partitioning of information sharing. This allows over 100,000 people to be simulated in real time (60 frames per second).
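The spatial-partitioning half of the speed-up can be illustrated on the CPU (a sketch of the general technique; the paper's implementation runs on the GPU alongside a batched linear-program solver): agents are binned into a uniform grid so each agent inspects only nearby cells instead of all other agents.

```python
from collections import defaultdict

def build_grid(positions, cell):
    """Bin agent indices into uniform grid cells of side `cell`."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def neighbours(i, positions, grid, cell, radius):
    """Agents within `radius` of agent i, scanning only the 3x3 nearby
    cells (sufficient as long as radius <= cell)."""
    px, py = positions[i]
    cx, cy = int(px // cell), int(py // cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), []):
                qx, qy = positions[j]
                if j != i and (px - qx) ** 2 + (py - qy) ** 2 <= radius ** 2:
                    found.append(j)
    return found

positions = [(0.5, 0.5), (0.9, 0.4), (5.0, 5.0)]
grid = build_grid(positions, cell=1.0)
print(neighbours(0, positions, grid, cell=1.0, radius=1.0))  # [1]
```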
Submitted 27 August, 2019;
originally announced August 2019.
-
Covering graphs with convex sets and partitioning graphs into convex sets
Authors:
Lucía M. González,
Luciano N. Grippo,
Martín D. Safe,
Vinícius F. dos Santos
Abstract:
We present some complexity results concerning the problems of covering a graph with $p$ convex sets and of partitioning a graph into $p$ convex sets. The following convexities are considered: digital convexity, monophonic convexity, $P_3$-convexity, and $P_3^*$-convexity.
Submitted 2 July, 2019;
originally announced July 2019.
-
A HVS-inspired Attention to Improve Loss Metrics for CNN-based Perception-Oriented Super-Resolution
Authors:
Taimoor Tariq,
Juan Luis Gonzalez,
Munchurl Kim
Abstract:
Deep Convolutional Neural Network (CNN) features have been demonstrated to be effective perceptual quality features. The perceptual loss, based on feature maps of pre-trained CNNs, has proven to be remarkably effective for CNN-based perceptual image restoration problems. In this work, taking inspiration from the Human Visual System (HVS) and visual perception, we propose a spatial attention mechanism based on the dependency of human contrast sensitivity on spatial frequency. We identify regions in input images, based on the underlying spatial frequency, which are not generally well reconstructed during super-resolution but are most important in terms of visual sensitivity. Based on this prior, we design a spatial attention map that is applied to feature maps in the perceptual loss and its variants, helping them to identify regions that are of more perceptual importance. The results demonstrate that our technique improves the ability of the perceptual loss and contextual loss to deliver more natural images in CNN-based super-resolution.
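One plausible shape for such an attention map, sketched under assumptions (the difference-of-blurs band-pass below is a crude stand-in for a measured contrast sensitivity function, not the paper's exact construction): weight the feature-space loss by band-pass energy so perceptually sensitive frequencies dominate.

```python
import torch
import torch.nn.functional as F

def blur(x, k):
    """Box blur as a cheap Gaussian stand-in."""
    w = torch.ones(x.shape[1], 1, k, k) / (k * k)
    return F.conv2d(x, w, padding=k // 2, groups=x.shape[1])

def csf_attention(img):
    """Band-pass (difference of blurs) energy as a crude sensitivity proxy."""
    band = blur(img, 3) - blur(img, 9)
    att = band.abs().mean(dim=1, keepdim=True)
    return att / (att.amax(dim=(2, 3), keepdim=True) + 1e-8)

def weighted_perceptual_loss(feat_sr, feat_hr, attention):
    a = F.interpolate(attention, size=feat_sr.shape[-2:], mode="bilinear",
                      align_corners=False)
    return (a * (feat_sr - feat_hr) ** 2).mean()

img = torch.rand(1, 3, 64, 64)
att = csf_attention(img)
print(weighted_perceptual_loss(torch.rand(1, 8, 32, 32),
                               torch.rand(1, 8, 32, 32), att))
```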
Submitted 27 July, 2019; v1 submitted 30 March, 2019;
originally announced April 2019.
-
Finding Correspondences for Optical Flow and Disparity Estimations using a Sub-pixel Convolution-based Encoder-Decoder Network
Authors:
Juan Luis Gonzalez,
Muhammad Sarmad,
Hyunjoo J. Lee,
Munchurl Kim
Abstract:
Deep convolutional neural networks (DCNNs) have recently shown promising results in low-level computer vision problems such as optical flow and disparity estimation, but still have much room to further improve their performance. In this paper, we propose a novel sub-pixel convolution-based encoder-decoder network for optical flow and disparity estimations, which extends FlowNetS and DispNet by replacing the deconvolution layers with sub-pixel convolution blocks. By using sub-pixel refinement and estimation on the decoder stages instead of deconvolution, we can significantly improve the estimation accuracy for optical flow and disparity, even with reduced numbers of parameters. We show a supervised end-to-end training of our proposed networks for optical flow and disparity estimations, and an unsupervised end-to-end training for monocular depth and pose estimations. In order to verify the effectiveness of our proposed networks, we perform intensive experiments for (i) optical flow and disparity estimations, and (ii) monocular depth and pose estimations. Throughout the extensive experiments, our proposed networks outperform the baselines such as FlowNetS and DispNet in terms of estimation accuracy and training times.
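The substitution itself is compact; here is a sketch of a decoder block using sub-pixel convolution in PyTorch (layer sizes and the activation are placeholders, not the paper's architecture): a convolution predicts r*r times the channels, and a pixel shuffle rearranges them into an r-times-larger feature map.

```python
import torch
import torch.nn as nn

def subpixel_block(in_ch: int, out_ch: int, r: int = 2) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1),
        nn.PixelShuffle(r),   # rearranges C*r^2 x H x W -> C x rH x rW
        nn.LeakyReLU(0.1, inplace=True),
    )

x = torch.rand(1, 64, 16, 16)
print(subpixel_block(64, 32)(x).shape)  # torch.Size([1, 32, 32, 32])
```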
Submitted 7 October, 2018;
originally announced October 2018.
-
Building Robust Deep Neural Networks for Road Sign Detection
Authors:
Arkar Min Aung,
Yousef Fadila,
Radian Gondokaryono,
Luis Gonzalez
Abstract:
Deep Neural Networks are built with generalization beyond the training set in mind, using techniques such as regularization, early stopping and dropout. But considerations to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-critical and real-time systems, miscreants have started to attack them by intentionally making deep neural networks misclassify an object of one type as another. This can be catastrophic in some scenarios where the classification of a deep neural network can lead to a fatal decision by a machine. In this work, we used the GTSRB dataset to craft adversarial samples with the Fast Gradient Sign Method and the Jacobian Saliency Method, used those crafted adversarial samples to attack another Deep Convolutional Neural Network, and made the attacked network more resilient against adversarial attacks through Defensive Distillation and Adversarial Training.
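The Fast Gradient Sign Method used for crafting the samples fits in a few lines; a generic sketch follows (not the authors' training code; the toy classifier is a placeholder, with 43 output classes matching GTSRB's sign categories): perturb the input in the direction of the sign of the loss gradient.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x loss))."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 43))
x_adv = fgsm(model, torch.rand(4, 3, 32, 32), torch.randint(0, 43, (4,)))
print(x_adv.shape)  # torch.Size([4, 3, 32, 32])
```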
Submitted 26 December, 2017;
originally announced December 2017.
-
The Mean and Median Criterion for Automatic Kernel Bandwidth Selection for Support Vector Data Description
Authors:
Arin Chaudhuri,
Deovrat Kakde,
Carol Sadek,
Laura Gonzalez,
Seunghyun Kong
Abstract:
Support vector data description (SVDD) is a popular technique for detecting anomalies. The SVDD classifier partitions the whole space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, and the Gaussian kernel is a common choice for the kernel function. The Gaussian kernel has a bandwidth parameter, whose value is important for good results. A small bandwidth leads to overfitting, and the resulting SVDD classifier overestimates the number of anomalies. A large bandwidth leads to underfitting, and the classifier fails to detect many anomalies. In this paper we present a new automatic, unsupervised method for selecting the Gaussian kernel bandwidth. The selected value can be computed quickly, and it is competitive with existing bandwidth selection methods.
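A generic distance-based selection in this spirit, as a sketch only (the paper's mean and median criteria have their own closed forms, which are not reproduced here): derive the bandwidth from the scale of pairwise distances between training points, so the choice needs no labels and is cheap to compute.

```python
import numpy as np

def select_bandwidth(X: np.ndarray, criterion: str = "median") -> float:
    """Gaussian-kernel sigma from the scale of pairwise squared distances."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    vals = d2[np.triu_indices_from(d2, k=1)]    # unique pairs only
    scale = np.median(vals) if criterion == "median" else vals.mean()
    return float(np.sqrt(scale / 2.0))  # for exp(-||x-y||^2 / (2 sigma^2))

X = np.random.default_rng(0).normal(size=(200, 5))
print(select_bandwidth(X, "median"), select_bandwidth(X, "mean"))
```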
Submitted 21 August, 2017; v1 submitted 16 August, 2017;
originally announced August 2017.
-
Proposal for improvement in the transfer and execution of multiple instances of a virtual image
Authors:
Tomas Ramirez Picarzo,
Francisco Fernandez de Vega,
Daniel Lombrana Gonzalez
Abstract:
Virtualization technology currently allows any computationally complex and expensive application (scientific applications are a good example) to run on heterogeneous distributed systems that make regular use of Grid and Cloud technologies, enabling significant savings in computing time. This model is particularly interesting for the mass execution of scientific simulations and calculations, allowing parallel execution of applications using the same (unchanged) execution environment the scientist normally uses. However, the use and distribution of large virtual images (up to tens of GBytes) can be a problem, which is aggravated when attempting a mass distribution to a large number of distributed computers. The main objective of this work is to present an analysis of execution and a proposal for the improvement (reduction in size) of virtual images, with the aim of reducing their distribution time in distributed systems. The analysis considers the very specific requirements that a guest operating system (guest OS) has for certain aspects of its execution.
Submitted 2 August, 2011;
originally announced August 2011.