Showing 1–50 of 216 results for author: Ahmed, F

  1. arXiv:2507.17121  [pdf, ps, other]

    cs.CV cs.LG

    Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation

    Authors: Faisal Ahmed, Mohammad Alfrad Nobel Bhuiyan

    Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 9 pages, 1 figure

    ACM Class: F.2.2; I.2.7

  2. arXiv:2507.11574  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators

    Authors: Kazuma Kobayashi, Shailesh Garg, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam

    Abstract: Robust uncertainty quantification (UQ) remains a critical barrier to the safe deployment of deep learning in real-time virtual sensing, particularly in high-stakes domains where sparse, noisy, or non-collocated sensor data are the norm. We introduce the Conformalized Monte Carlo Operator (CMCO), a framework that transforms neural operator-based virtual sensing with calibrated, distribution-free pr…

    Submitted 15 July, 2025; originally announced July 2025.

  3. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  4. arXiv:2507.06133  [pdf, ps, other]

    cs.CE

    Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions

    Authors: Jaewan Park, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Syed Bahauddin Alam, Iwona Jasiuk, Diab Abueidda

    Abstract: Video-diffusion models have recently set the standard in video generation, inpainting, and domain translation thanks to their training stability and high perceptual fidelity. Building on these strengths, we repurpose conditional video diffusion as a physics surrogate for spatio-temporal fields governed by partial differential equations (PDEs). Our two-stage surrogate first applies a Sequential Dee…

    Submitted 8 July, 2025; originally announced July 2025.

  5. arXiv:2507.04317  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning

    Authors: Fatmaelzahraa Ali Ahmed, Muhammad Arsalan, Abdulaziz Al-Ali, Khalid Al-Jalham, Shidin Balakrishnan

    Abstract: Understanding surgical scenes can provide better healthcare quality for patients, especially with the vast amount of video data that is generated during MIS. Processing these videos generates valuable assets for training sophisticated models. In this paper, we introduce CLIP-RL, a novel contrastive language-image pre-training model tailored for semantic segmentation for surgical scenes. CLIP-RL pr…

    Submitted 6 July, 2025; originally announced July 2025.

  6. arXiv:2507.04304  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Surg-SegFormer: A Dual Transformer-Based Model for Holistic Surgical Scene Segmentation

    Authors: Fatimaelzahraa Ahmed, Muraam Abdel-Ghani, Muhammad Arsalan, Mahmoud Ali, Abdulaziz Al-Ali, Shidin Balakrishnan

    Abstract: Holistic surgical scene segmentation in robot-assisted surgery (RAS) enables surgical residents to identify various anatomical tissues, articulated tools, and critical structures, such as veins and vessels. Given the firm intraoperative time constraints, it is challenging for surgeons to provide detailed real-time explanations of the operative field for trainees. This challenge is compounded by th…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted at IEEE CASE 2025

  7. arXiv:2507.03006  [pdf, ps, other]

    cs.CV cs.LG

    Topological Signatures vs. Gradient Histograms: A Comparative Study for Medical Image Classification

    Authors: Faisal Ahmed, Mohammad Alfrad Nobel Bhuiyan

    Abstract: We present the first comparative study of two fundamentally distinct feature extraction techniques: Histogram of Oriented Gradients (HOG) and Topological Data Analysis (TDA), for medical image classification using retinal fundus images. HOG captures local texture and edge patterns through gradient orientation histograms, while TDA, using cubical persistent homology, extracts high-level topological…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 18 pages, 12 figures

  8. arXiv:2506.06522  [pdf, ps, other]

    cs.CL cs.AI

    Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

    Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Syed Zawad, Farhan Ahmed, Heiko Ludwig, Holger Boche

    Abstract: Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency h…

    Submitted 6 June, 2025; originally announced June 2025.

  9. arXiv:2506.00822  [pdf, ps, other]

    cs.NI

    Federated Deep Reinforcement Learning-Driven O-RAN for Automatic Multirobot Reconfiguration

    Authors: Faisal Ahmed, Myungjin Lee, Shao-Yu Lien, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin

    Abstract: The rapid evolution of Industry 4.0 has led to the emergence of smart factories, where multirobot systems operate autonomously to enhance productivity, reduce operational costs, and improve system adaptability. However, maintaining reliable and efficient network operations in these dynamic and complex environments requires advanced automation mechanisms. This study presents a zero-touch network pl…

    Submitted 31 May, 2025; originally announced June 2025.

  10. arXiv:2506.00062  [pdf, other]

    cs.CY cs.CL cs.CR cs.LG

    SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?

    Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche, Walid Saad

    Abstract: Fine-tuning large language models (LLMs) for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In t…

    Submitted 29 May, 2025; originally announced June 2025.

  11. arXiv:2505.24838  [pdf, ps, other]

    cs.CV cs.AI

    VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software

    Authors: Brandon Man, Ghadi Nehme, Md Ferdous Alam, Faez Ahmed

    Abstract: Computer-Aided Design (CAD) is a time-consuming and complex process, requiring precise, long-horizon user interactions with intricate 3D interfaces. While recent advances in AI-driven user interface (UI) agents show promise, most existing datasets and methods focus on short, low-complexity tasks in mobile or web applications, failing to capture the demands of professional engineering tools. In thi…

    Submitted 30 May, 2025; originally announced May 2025.

  12. arXiv:2505.20685  [pdf, ps, other]

    cs.CE

    GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models

    Authors: Rosen Ting-Ying Yu, Cyril Picard, Faez Ahmed

    Abstract: Bayesian optimization (BO) effectively optimizes expensive black-box functions but faces significant challenges in high-dimensional spaces (dimensions exceeding 100) due to the curse of dimensionality. Existing high-dimensional BO methods typically leverage low-dimensional embeddings or structural assumptions to mitigate this challenge, yet these approaches frequently incur considerable computatio…

    Submitted 29 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  13. arXiv:2505.14646  [pdf, ps, other]

    cs.CV cs.AI

    CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation

    Authors: Anna C. Doris, Md Ferdous Alam, Amin Heyrani Nobari, Faez Ahmed

    Abstract: Efficient creation of accurate and editable 3D CAD models is critical in engineering design, significantly impacting cost and time-to-market in product innovation. Current manual workflows remain highly time-consuming and demand extensive user expertise. While recent developments in AI-driven CAD generation show promise, existing models are limited by incomplete representations of CAD operations,…

    Submitted 20 May, 2025; originally announced May 2025.

  14. arXiv:2505.01980  [pdf]

    cs.CL

    LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load

    Authors: Theo Guidroz, Diego Ardila, Jimmy Li, Adam Mansour, Paul Jhun, Nina Gonzalez, Xiang Ji, Mike Sanchez, Sujay Kakarmath, Mathias MJ Bellaiche, Miguel Ángel Garrido, Faruk Ahmed, Divyansh Choudhary, Jay Hartford, Chenwei Xu, Henry Javier Serrano Echeverria, Yifan Wang, Jeff Shaffer, Eric, Cao, Yossi Matias, Avinatan Hassidim, Dale R Webster, Yun Liu, Sho Fujiwara, et al. (2 additional authors not shown)

    Abstract: Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop an LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific arti…

    Submitted 4 May, 2025; originally announced May 2025.

  15. arXiv:2505.01925  [pdf]

    cs.SE

    ImageR: Enhancing Bug Report Clarity by Screenshots

    Authors: Xuchen Tan, Deenu Yadav, Faiz Ahmed, Maleknaz Nayebi

    Abstract: In issue-tracking systems, incorporating screenshots significantly enhances the clarity of bug reports, facilitating more efficient communication and expediting issue resolution. However, determining when and what type of visual content to include remains challenging, as not all attachments effectively contribute to problem-solving; studies indicate that 22.5% of images in issue reports fail to ai…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Paper accepted to EASE 2025

    Journal ref: EASE 2025

  16. arXiv:2504.18912  [pdf]

    cs.SE

    Inferring Questions from Programming Screenshots

    Authors: Faiz Ahmed, Xuchen Tan, Folajinmi Adewole, Suprakash Datta, Maleknaz Nayebi

    Abstract: The integration of generative AI into developer forums like Stack Overflow presents an opportunity to enhance problem-solving by allowing users to post screenshots of code or Integrated Development Environments (IDEs) instead of traditional text-based queries. This study evaluates the effectiveness of various large language models (LLMs), specifically LLAMA, GEMINI, and GPT-4o in interpreting such…

    Submitted 26 April, 2025; originally announced April 2025.

    Journal ref: MSR 2025

  17. arXiv:2504.18113  [pdf, other]

    cs.LG cs.AI

    Learning from Less: SINDy Surrogates in RL

    Authors: Aniket Dixit, Muhammad Ibrahim Khan, Faizan Ahmed, James Brusey

    Abstract: This paper introduces an approach for developing surrogate environments in reinforcement learning (RL) using the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm. We demonstrate the effectiveness of our approach through extensive experiments in OpenAI Gym environments, particularly Mountain Car and Lunar Lander. Our results show that SINDy-based surrogate models can accurately capture…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: World Models @ ICLR 2025

  18. arXiv:2504.12503  [pdf, other]

    cs.LG cs.AI cs.CE

    Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study

    Authors: Kaira M. Samuel, Faez Ahmed

    Abstract: Engineering problems that apply machine learning often involve computationally intensive methods but rely on limited datasets. As engineering data evolves with new designs and constraints, models must incorporate new knowledge over time. However, high computational costs make retraining models from scratch infeasible. Continual learning (CL) offers a promising solution by enabling models to learn…

    Submitted 16 April, 2025; originally announced April 2025.

  19. arXiv:2504.11159  [pdf, other]

    cs.AI

    C-SHAP for time series: An approach to high-level temporal explanations

    Authors: Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen

    Abstract: Time series are ubiquitous in domains such as energy forecasting, healthcare, and industry. Using AI systems, some tasks within these domains can be efficiently handled. Explainable AI (XAI) aims to increase the reliability of AI solutions by explaining model reasoning. For time series, many XAI methods provide point- or sequence-based attribution maps. These methods explain model reasoning in ter…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 10 pages, 6 figures

  20. arXiv:2504.04927  [pdf, other]

    cs.HC cs.CL

    How Is Generative AI Used for Persona Development?: A Systematic Review of 52 Research Articles

    Authors: Danial Amin, Joni Salminen, Farhan Ahmed, Sonja M. H. Tervola, Sankalp Sethi, Bernard J. Jansen

    Abstract: Although Generative AI (GenAI) has the potential for persona development, many challenges must be addressed. This research systematically reviews 52 articles from 2022-2024, with important findings. First, closed commercial models are frequently used in persona development, creating a monoculture. Second, GenAI is used in various stages of persona development (data collection, segmentation, enrichm…

    Submitted 7 April, 2025; originally announced April 2025.

  21. arXiv:2504.00938  [pdf, other]

    cs.AI cs.LG

    AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models

    Authors: Kristen M. Edwards, Farnaz Tehranchi, Scarlett R. Miller, Faez Ahmed

    Abstract: The subjective evaluation of early stage engineering designs, such as conceptual sketches, traditionally relies on human experts. However, expert evaluations are time-consuming, expensive, and sometimes inconsistent. Recent advances in vision-language models (VLMs) offer the potential to automate design assessments, but it is crucial to ensure that these AI "judges" perform on par with human exp…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 21 pages, 8 tables, 6 figures, 8 tables in the appendix

  22. arXiv:2503.23315  [pdf, other]

    cs.AI cs.CE cs.LG

    AI Agents in Engineering Design: A Multi-Agent Framework for Aesthetic and Aerodynamic Car Design

    Authors: Mohamed Elrefaie, Janet Qian, Raina Wu, Qian Chen, Angela Dai, Faez Ahmed

    Abstract: We introduce the concept of "Design Agents" for engineering applications, particularly focusing on the automotive design process, while emphasizing that our approach can be readily extended to other engineering and design domains. Our framework integrates AI-driven design agents into the traditional engineering workflow, demonstrating how these specialized computational agents interact seamlessly…

    Submitted 30 March, 2025; originally announced March 2025.

  23. arXiv:2503.17564  [pdf, other]

    eess.IV cs.CV cs.LG

    ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology

    Authors: Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel

    Abstract: Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, working with these models is challenging, with iss…

    Submitted 21 March, 2025; originally announced March 2025.

  24. arXiv:2503.17400  [pdf, other]

    physics.flu-dyn cs.LG

    TripNet: Learning Large-scale High-fidelity 3D Car Aerodynamics with Triplane Networks

    Authors: Qian Chen, Mohamed Elrefaie, Angela Dai, Faez Ahmed

    Abstract: Surrogate modeling has emerged as a powerful tool to accelerate Computational Fluid Dynamics (CFD) simulations. Existing 3D geometric learning models based on point clouds, voxels, meshes, or graphs depend on explicit geometric representations that are memory-intensive and resolution-limited. For large-scale simulations with millions of nodes and cells, existing models require aggressive downsampl…

    Submitted 23 May, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

  25. arXiv:2503.17239  [pdf, other]

    cs.CL cs.AI

    SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

    Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche

    Abstract: Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe beh…

    Submitted 21 March, 2025; originally announced March 2025.

    Journal ref: ICLR 2025 Workshop on Building Trust in Language Models and Applications

  26. arXiv:2503.09038  [pdf]

    cs.CR

    Image Encryption Using DNA Encoding, Snake Permutation and Chaotic Substitution Techniques

    Authors: Waleed Ahmed Farooqui, Jawad Ahmad, Nadeem Kureshi, Fawad Ahmed, Aizaz Ahmad Khattak, Muhammad Shahbaz Khan

    Abstract: Securing image data in IoT networks and other insecure information channels is a matter of critical concern. This paper presents a new image encryption scheme using DNA encoding, snake permutation and chaotic substitution techniques that ensures robust security of the image data with reduced computational overhead. The DNA encoding and snake permutation modules ensure effective scrambling of the p…

    Submitted 11 March, 2025; originally announced March 2025.

  27. arXiv:2503.01436  [pdf]

    cs.CV

    Fall Detection from Indoor Videos using MediaPipe and Handcrafted Feature

    Authors: Fatima Ahmed, Parag Biswas, Abdur Rashid, Md. Khaliluzzaman

    Abstract: Falls are a common cause of fatal injuries and hospitalization. Having fall detection on the person, in particular for senior citizens, can prove to be critical. Presently, there are handheld, ambient detector and vision-based detection techniques being utilized for fall detection. However, these approaches have issues with accuracy and cost. In this regard, in this research, an approach is propo…

    Submitted 3 March, 2025; originally announced March 2025.

  28. arXiv:2502.14907  [pdf, other]

    cs.CL cs.AI

    GneissWeb: Preparing High Quality Data for LLMs at Scale

    Authors: Hajar Emami Gohari, Swanand Ravindra Kadhe, Syed Yousaf Shah, Constantin Adam, Abdulhamid Adebayo, Praneet Adusumilli, Farhan Ahmed, Nathalie Baracaldo Angel, Santosh Borse, Yuan-Chi Chang, Xuan-Hong Dang, Nirmit Desai, Ravital Eres, Ran Iwamoto, Alexei Karve, Yan Koyfman, Wei-Han Lee, Changchang Liu, Boris Lublinsky, Takuyo Ohko, Pablo Pesce, Maroun Touma, Shiqiang Wang, Shalisha Witherspoon, Herbert Woisetschlager, David Wood, et al. (6 additional authors not shown)

    Abstract: Data quantity and quality play a vital role in determining the performance of Large Language Models (LLMs). High-quality data, in particular, can significantly boost the LLM's ability to generalize on a wide range of downstream tasks. Large pre-training datasets for leading LLMs remain inaccessible to the public, whereas many open datasets are small in size (less than 5 trillion tokens), limiting…

    Submitted 18 February, 2025; originally announced February 2025.

  29. arXiv:2502.10536  [pdf, other]

    cs.CV cs.AI cs.LG

    PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation

    Authors: Faruk Ahmed, Lin Yang, Tiam Jaroensri, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Greg S. Corrado, Dale R. Webster, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Daniel Golden, Ellery Wulczyn, David F. Steiner

    Abstract: The interpretation of histopathology cases underlies many important diagnostic and treatment decisions in medicine. Notably, this process typically requires pathologists to integrate and summarize findings across multiple slides per case. Existing vision-language capabilities in computational pathology have so far been largely limited to small regions of interest, larger regions at low magnificati…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 8 main pages, 21 pages in total

  30. arXiv:2502.02594  [pdf, other]

    cs.CE eess.SY

    Offshore Wind Turbine Tower Design and Optimization: A Review and AI-Driven Future Directions

    Authors: João Alves Ribeiro, Bruno Alves Ribeiro, Francisco Pimenta, Sérgio M. O. Tavares, Jie Zhang, Faez Ahmed

    Abstract: Offshore wind energy leverages the high intensity and consistency of oceanic winds, playing a key role in the transition to renewable energy. As energy demands grow, larger turbines are required to optimize power generation and reduce the Levelized Cost of Energy (LCoE), which represents the average cost of electricity over a project's lifetime. However, upscaling turbines introduces engineering c…

    Submitted 28 December, 2024; originally announced February 2025.

  31. arXiv:2502.02421  [pdf, ps, other]

    cs.CL cs.AI

    Activation-Informed Merging of Large Language Models

    Authors: Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli, Akash Srivastava, Faez Ahmed, Navid Azizan

    Abstract: Model merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhance model performance across various tasks while maintaining computational efficiency. This paper introduces Activation-Informed Merging (AIM), a technique that integrates the information from the activation space of LLMs into the merging proce…

    Submitted 14 June, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  32. arXiv:2502.00347  [pdf, other]

    cs.CR

    IoT-enabled Drowsiness Driver Safety Alert System with Real-Time Monitoring Using Integrated Sensors Technology

    Authors: Bakhtiar Muiz, Abdul Hasib, Md. Faishal Ahmed, Abdullah Al Zubaer, Rakib Hossen, Mst Deloara Khushi, Anichur Rahman

    Abstract: Significant losses in terms of life and property occur from road traffic accidents, which are often caused by drunk and drowsy drivers. Reducing accidents requires effective detection of alcohol impairment and drowsiness as well as real-time driver monitoring. This paper aims to create an Internet of Things (IoT)-enabled Drowsiness Driver Safety Alert System with Real-Time Monitoring Using Integr…

    Submitted 1 February, 2025; originally announced February 2025.

  33. arXiv:2412.13281  [pdf, other]

    cs.CE

    Generative Optimization: A Perspective on AI-Enhanced Problem Solving in Engineering

    Authors: Cyril Picard, Lyle Regenwetter, Amin Heyrani Nobari, Akash Srivastava, Faez Ahmed

    Abstract: The field of engineering is shaped by the tools and methods used to solve problems. Optimization is one such class of powerful, robust, and effective engineering tools proven over decades of use. Within just a few years, generative artificial intelligence (GenAI) has risen as another promising tool for general-purpose problem-solving. While optimization shines at finding high-quality and precise s…

    Submitted 17 December, 2024; originally announced December 2024.

  34. arXiv:2412.05641  [pdf, other]

    cs.LG cs.AI cs.SI

    Hyperedge Anomaly Detection with Hypergraph Neural Network

    Authors: Md. Tanvir Alam, Chowdhury Farhan Ahmed, Carson K. Leung

    Abstract: A hypergraph is a data structure that enables us to model higher-order associations among data entities. Conventional graph-structured data can represent pairwise relationships only, whereas a hypergraph enables us to associate any number of entities, which is essential in many real-life applications. Hypergraph learning algorithms have been well-studied for numerous problem settings, such as node cla…

    Submitted 7 December, 2024; originally announced December 2024.

  35. arXiv:2412.04707  [pdf, other]

    cs.AI cs.CE cs.CV cs.HC

    Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis

    Authors: Rui Zhou, Yanxia Zhang, Chenyang Yuan, Frank Permenter, Nikos Arechiga, Matt Klenk, Faez Ahmed

    Abstract: This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for engineering design synthesis. Our model proposes parametric, image, and text control modalities to enhance design precision and diversity. Firstly, it handles both partial and complete parametric inputs using a diffusion mod…

    Submitted 5 December, 2024; originally announced December 2024.

  36. arXiv:2412.00107  [pdf, other]

    cs.LG cs.AI eess.SP

    Virtual Sensing to Enable Real-Time Monitoring of Inaccessible Locations & Unmeasurable Parameters

    Authors: Kazuma Kobayashi, Farid Ahmed, Syed Bahauddin Alam

    Abstract: Real-time monitoring of critical parameters is essential for energy systems' safe and efficient operation. However, traditional sensors often fail and degrade in harsh environments where physical sensors cannot be placed (inaccessible locations). In addition, there are important parameters that cannot be directly measured by sensors. We need machine learning (ML)-based real-time monitoring in thos…

    Submitted 27 November, 2024; originally announced December 2024.

    Comments: 17 pages, 7 figures

  37. arXiv:2411.15128  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM eess.IV

    Health AI Developer Foundations

    Authors: Atilla P. Kiraly, Sebastien Baur, Kenneth Philbrick, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Nick George, Fayaz Jamil, Jing Tang, Kai Bailey, Faruk Ahmed, Akshay Goel, Abbi Ward, Lin Yang, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Shravya Shetty, Daniel Golden, Shekoofeh Azizi, David F. Steiner, Yun Liu, Tim Thelin, Rory Pilgrim, et al. (1 additional author not shown)

    Abstract: Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer…

    Submitted 26 November, 2024; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 16 pages, 8 figures

  38. arXiv:2411.11527  [pdf]

    cs.CY

    Design and Development of a Localized E-Commerce Solution for Students focussing on Economical Sharing

    Authors: Faiz Ahmed, Nitin Kumar Jha, Md Faizan

    Abstract: The rapid adoption of e-commerce has transformed how students access goods and resources. However, existing platforms often fail to address the specific needs of campus communities, where students face challenges such as financial constraints, lack of access to affordable goods, and inefficient resource circulation. This research proposes ShareSpace, a localized web application designed specifical…

    Submitted 18 November, 2024; originally announced November 2024.

  39. arXiv:2411.04936  [pdf, other]

    cs.LG cs.DC cs.SI

    Fed-LDR: Federated Local Data-infused Graph Creation with Node-centric Model Refinement

    Authors: Jiechao Gao, Yuangang Li, Syeda Faiza Ahmed

    Abstract: The rapid acceleration of global urbanization has introduced novel challenges in enhancing urban infrastructure and services. Spatio-temporal data, integrating spatial and temporal dimensions, has emerged as a critical tool for understanding urban phenomena and promoting sustainability. In this context, Federated Learning (FL) has gained prominence as a distributed learning paradigm aligned with t…

    Submitted 7 November, 2024; originally announced November 2024.

  40. arXiv:2410.13762  [pdf, other]

    cs.LG cs.AI

    Virtual Sensing-Enabled Digital Twin Framework for Real-Time Monitoring of Nuclear Systems Leveraging Deep Neural Operators

    Authors: Raisa Bentay Hossain, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Diab Abueidda, Syed Bahauddin Alam

    Abstract: Effective real-time monitoring is a foundation of digital twin technology, crucial for detecting material degradation and maintaining the structural integrity of nuclear systems to ensure both safety and operational efficiency. Traditional physical sensor systems face limitations such as installation challenges, high costs, and difficulty measuring critical parameters in hard-to-reach or harsh env…

    Submitted 28 November, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  41. arXiv:2410.12076  [pdf, ps, other]

    cs.LG cs.CR

    Taking off the Rose-Tinted Glasses: A Critical Look at Adversarial ML Through the Lens of Evasion Attacks

    Authors: Kevin Eykholt, Farhan Ahmed, Pratik Vaishnavi, Amir Rahmati

    Abstract: The vulnerability of machine learning models in adversarial scenarios has garnered significant interest in the academic community over the past decade, resulting in a myriad of attacks and defenses. However, while the community appears to be overtly successful in devising new attacks across new contexts, the development of defenses has stalled. After a decade of research, we appear no closer to se…

    Submitted 15 October, 2024; originally announced October 2024.

  42. arXiv:2410.10853  [pdf, other]

    cs.CL cs.AI

    Mitigating Hallucinations Using Ensemble of Knowledge Graph and Vector Store in Large Language Models to Enhance Mental Health Support

    Authors: Abdul Muqtadir, Hafiz Syed Muhammad Bilal, Ayesha Yousaf, Hafiz Farooq Ahmed, Jamil Hussain

    Abstract: This research work delves into the manifestation of hallucination within Large Language Models (LLMs) and its consequential impacts on applications within the domain of mental health. The primary objective is to discern effective strategies for curtailing hallucinatory occurrences, thereby bolstering the dependability and security of LLMs in facilitating mental health interventions such as therapy…

    Submitted 6 October, 2024; originally announced October 2024.

  43. arXiv:2410.08207  [pdf, other]

    cs.CV cs.LG

    DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

    Authors: Xiaoxiao He, Ligong Han, Quan Dao, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Felix Juefei-Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas

    Abstract: Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models, including multinomial diffusion and masked generative models. By recording noise sequences and ma…

    Submitted 31 March, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: Project webpage: https://hexiaoxiao-cs.github.io/DICE/. This paper was accepted to CVPR 2025 but later desk-rejected post camera-ready, due to a withdrawal from ICLR made 14 days before reviewer assignment

  44. arXiv:2410.07269  [pdf]

    eess.IV cs.AI cs.CV

    Deep Learning for Surgical Instrument Recognition and Segmentation in Robotic-Assisted Surgeries: A Systematic Review

    Authors: Fatimaelzahraa Ali Ahmed, Mahmoud Yousef, Mariam Ali Ahmed, Hasan Omar Ali, Anns Mahboob, Hazrat Ali, Zubair Shah, Omar Aboumarzouk, Abdulla Al Ansari, Shidin Balakrishnan

    Abstract: Applying deep learning (DL) for annotating surgical instruments in robot-assisted minimally invasive surgeries (MIS) represents a significant advancement in surgical technology. This systematic review examines 48 studies that and advanced DL methods and architectures. These sophisticated DL models have shown notable improvements in the precision and efficiency of detecting and segmenting surgical…

    Submitted 7 November, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 57 pages, 9 figures, Published in Artificial Intelligence Reviews journal <https://link.springer.com/journal/10462>

  45. arXiv:2409.16294  [pdf, other]

    cs.CV cs.GR cs.LG

    GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors

    Authors: Md Ferdous Alam, Faez Ahmed

    Abstract: The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task, hampered by the complex topology of boundary representations of 3D solids and unintuitive design tools. While most work in the 3D shape generation literature focuses on representations like meshes, voxels, or point clouds, practical engineering applications dem…

    Submitted 8 April, 2025; v1 submitted 8 September, 2024; originally announced September 2024.

    Comments: 24 pages, 13 figures

  46. arXiv:2409.06699  [pdf]

    eess.IV cs.CV

    A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection

    Authors: Md Taimur Ahad, Sumaya Mustofa, Faruk Ahmed, Yousuf Rayhan Emon, Aunirudra Dey Anu

    Abstract: In deep learning, transfer learning and ensemble models have shown promise in improving computer-aided disease diagnosis. However, applying the transfer learning and ensemble model is still relatively limited. Moreover, the ensemble model's development is ad-hoc, overlooks redundant layers, and suffers from imbalanced datasets and inadequate augmentation. Lastly, significant Deep Convolutional Neu…

    Submitted 10 September, 2024; originally announced September 2024.

  47. Multi-Class Plant Leaf Disease Detection: A CNN-based Approach with Mobile App Integration

    Authors: Md Aziz Hosen Foysal, Foyez Ahmed, Md Zahurul Haque

    Abstract: Plant diseases significantly impact agricultural productivity, resulting in economic losses and food insecurity. Prompt and accurate detection is crucial for the efficient management and mitigation of plant diseases. This study investigates advanced techniques in plant disease detection, emphasizing the integration of image processing, machine learning, deep learning methods, and mobile technologi…

    Submitted 26 August, 2024; originally announced August 2024.

    Journal ref: International Journal of Computer Applications Volume 186, No.41, September 2024

  48. Prompting for products: Investigating design space exploration strategies for text-to-image generative models

    Authors: Leah Chong, I-Ping Lo, Jude Rayan, Steven Dow, Faez Ahmed, Ioanna Lykourentzou

    Abstract: Text-to-image models are enabling efficient design space exploration, rapidly generating images from text prompts. However, many generative AI tools are imperfect for product design applications as they are not built for the goals and requirements of product design. The unclear link between text input and image output further complicates their application. This work empirically investigates design…

    Submitted 22 July, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

    ACM Class: I.2.1

    Journal ref: Des. Sci. 11 (2025) e2

  49. arXiv:2407.12281  [pdf, other]

    cs.CR cs.AI

    Turning Generative Models Degenerate: The Power of Data Poisoning Attacks

    Authors: Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo

    Abstract: The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and classification tasks, they remain underexplored for natural language generation (NLG) tasks. To addre…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 18 pages, 11 figures

  50. arXiv:2407.08675  [pdf, other]

    cs.AI

    CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs

    Authors: Leah Chong, Jude Rayan, Steven Dow, Ioanna Lykourentzou, Faez Ahmed

    Abstract: Text-to-image generative models have increasingly been used to assist designers during concept generation in various creative domains, such as graphic design, user interface design, and fashion design. However, their applications in engineering design remain limited due to the models' challenges in generating images of feasible design concepts. To address this issue, this paper introduces a metho…

    Submitted 22 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, 4 tables

    ACM Class: I.2.1