Cryptography and Security
See recent articles
Showing new listings for Thursday, 24 July 2025
- [1] arXiv:2507.16840 [pdf, html, other]
-
Title: CASPER: Contrastive Approach for Smart Ponzi Scheme Detecter with More Negative SamplesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
The rapid evolution of digital currency trading, fueled by the integration of blockchain technology, has led to both innovation and the emergence of smart Ponzi schemes. A smart Ponzi scheme is a fraudulent investment operation in smart contract that uses funds from new investors to pay returns to earlier investors. Traditional Ponzi scheme detection methods based on deep learning typically rely on fully supervised models, which require large amounts of labeled data. However, such data is often scarce, hindering effective model training. To address this challenge, we propose a novel contrastive learning framework, CASPER (Contrastive Approach for Smart Ponzi detectER with more negative samples), designed to enhance smart Ponzi scheme detection in blockchain transactions. By leveraging contrastive learning techniques, CASPER can learn more effective representations of smart contract source code using unlabeled datasets, significantly reducing both operational costs and system complexity. We evaluate CASPER on the XBlock dataset, where it outperforms the baseline by 2.3% in F1 score when trained with 100% labeled data. More impressively, with only 25% labeled data, CASPER achieves an F1 score nearly 20% higher than the baseline under identical experimental conditions. These results highlight CASPER's potential for effective and cost-efficient detection of smart Ponzi schemes, paving the way for scalable fraud detection solutions in the future.
- [2] arXiv:2507.16852 [pdf, html, other]
-
Title: SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique MappingÁlvaro Ruiz-Ródenas, Jaime Pujante Sáez, Daniel García-Algora, Mario Rodríguez Béjar, Jorge Blasco, José Luis Hernández-RamosComments: 17 pages, 13 figuresSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cyber Threat Intelligence (CTI) mining involves extracting structured insights from unstructured threat data, enabling organizations to understand and respond to evolving adversarial behavior. A key task in CTI mining is mapping threat descriptions to MITRE ATT\&CK techniques. However, this process is often performed manually, requiring expert knowledge and substantial effort. Automated approaches face two major challenges: the scarcity of high-quality labeled CTI data and class imbalance, where many techniques have very few examples. While domain-specific Large Language Models (LLMs) such as SecureBERT have shown improved performance, most recent work focuses on model architecture rather than addressing the data limitations. In this work, we present SynthCTI, a data augmentation framework designed to generate high-quality synthetic CTI sentences for underrepresented MITRE ATT\&CK techniques. Our method uses a clustering-based strategy to extract semantic context from training data and guide an LLM in producing synthetic CTI sentences that are lexically diverse and semantically faithful. We evaluate SynthCTI on two publicly available CTI datasets, CTI-to-MITRE and TRAM, using LLMs with different capacity. Incorporating synthetic data leads to consistent macro-F1 improvements: for example, ALBERT improves from 0.35 to 0.52 (a relative gain of 48.6\%), and SecureBERT reaches 0.6558 (up from 0.4412). Notably, smaller models augmented with SynthCTI outperform larger models trained without augmentation, demonstrating the value of data generation methods for building efficient and effective CTI classification systems.
- [3] arXiv:2507.16870 [pdf, html, other]
-
Title: Building a robust OAuth token based API Security: A High level OverviewComments: 11 pages, 5 figures, IEEE Transactions on Dependable and Secure ComputingSubjects: Cryptography and Security (cs.CR)
APIs (Application Programming Interfaces) or Web Services are the foundational building blocks that enable interconnected systems. However this proliferation of APIs has also introduced security challenges that require systematic and scalable solutions for secure authentication and authorization. This paper presents the fundamentals necessary for building a such a token-based API security system. It discusses the components necessary, the integration of OAuth 2.0, extensibility of the token architectures, necessary cryptographic foundations, and persistence strategies to ensure secure and resilient operations. In addition to architectural concerns, the paper explores best practices for token lifecycle management, scope definition, expiration policies, and revocation mechanisms, all framed within a real-world scenario. By adhering to these principles, developers can establish a robust baseline while maintaining the flexibility to customize their domain-specific requirements. The approach does not claim to cover all variations necessary for diverse architectures but instead focuses on key principles essential for any standard API token authentication system. Throughout, the paper emphasizes balancing practical considerations with security imperatives and uses key concepts such as the CIA triad, OAuth standards, secure token life cycle, and practices for protecting sensitive user and application data. The intent is to equip developers with the foundational knowledge necessary to build secure, scalable token-based API security systems ready to handle the evolving threat landscape.
- [4] arXiv:2507.16872 [pdf, html, other]
-
Title: CompLeak: Deep Learning Model Compression Exacerbates Privacy LeakageSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Model compression is crucial for minimizing memory storage and accelerating inference in deep learning (DL) models, including recent foundation models like large language models (LLMs). Users can access different compressed model versions according to their resources and budget. However, while existing compression operations primarily focus on optimizing the trade-off between resource efficiency and model performance, the privacy risks introduced by compression remain overlooked and insufficiently understood.
In this work, through the lens of membership inference attack (MIA), we propose CompLeak, the first privacy risk evaluation framework examining three widely used compression configurations that are pruning, quantization, and weight clustering supported by the commercial model compression framework of Google's TensorFlow-Lite (TF-Lite) and Facebook's PyTorch Mobile. CompLeak has three variants, given available access to the number of compressed models and original model. CompLeakNR starts by adopting existing MIA methods to attack a single compressed model, and identifies that different compressed models influence members and non-members differently. When the original model and one compressed model are available, CompLeakSR leverages the compressed model as a reference to the original model and uncovers more privacy by combining meta information (e.g., confidence vector) from both models. When multiple compressed models are available with/without accessing the original model, CompLeakMR innovatively exploits privacy leakage info from multiple compressed versions to substantially signify the overall privacy leakage. We conduct extensive experiments on seven diverse model architectures (from ResNet to foundation models of BERT and GPT-2), and six image and textual benchmark datasets. - [5] arXiv:2507.16887 [pdf, html, other]
-
Title: Revisiting Pre-trained Language Models for Vulnerability DetectionSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. % for the security community. While existing empirical studies evaluate PLMs for vulnerability detection (VD), their inadequate consideration in data preparation, evaluation setups, and experimental settings undermines the accuracy and comprehensiveness of evaluations. This paper introduces RevisitVD, an extensive evaluation of 17 PLMs spanning smaller code-specific PLMs and large-scale PLMs using newly constructed datasets. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness against code normalization, abstraction, and semantic-preserving transformations.
Our findings reveal that, for VD tasks, PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible amount of labeling errors. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications. - [6] arXiv:2507.16952 [pdf, other]
-
Title: Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER DatasetSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset, which contains feature representations of Portable Executable (PE) files. We evaluate eight classification models: LightGBM, XGBoost, CatBoost, Random Forest, Extra Trees, HistGradientBoosting, k-Nearest Neighbors (KNN), and TabNet, under three preprocessing settings: original feature space, Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). The models are assessed on accuracy, precision, recall, F1 score, and AUC to examine both predictive performance and robustness. Ensemble methods, especially LightGBM and XGBoost, show the best overall performance across all configurations, with minimal sensitivity to PCA and consistent generalization. LDA improves KNN performance but significantly reduces accuracy for boosting models. TabNet, while promising in theory, underperformed under feature reduction, likely due to architectural sensitivity to input structure. The analysis is supported by detailed exploratory data analysis (EDA), including mutual information ranking, PCA or t-SNE visualizations, and outlier detection using Isolation Forest and Local Outlier Factor (LOF), which confirm the discriminatory capacity of key features in the EMBER dataset. The results suggest that boosting models remain the most reliable choice for high-dimensional static malware detection, and that dimensionality reduction should be applied selectively based on model type. This work provides a benchmark for comparing classification models and preprocessing strategies in malware detection tasks and contributes insights that can guide future system development and real-world deployment.
- [7] arXiv:2507.16996 [pdf, html, other]
-
Title: From Cracks to Crooks: YouTube as a Vector for Malware DistributionSubjects: Cryptography and Security (cs.CR)
With billions of users and an immense volume of daily uploads, YouTube has become an attractive target for cybercriminals aiming to leverage its vast audience. The platform's openness and trustworthiness provide an ideal environment for deceptive campaigns that can operate under the radar of conventional security tools. This paper explores how cybercriminals exploit YouTube to disseminate malware, focusing on campaigns that promote free software or game cheats. It discusses deceptive video demonstrations and the techniques behind malware delivery. Additionally, the paper presents a new evasion technique that abuses YouTube's multilingual metadata capabilities to circumvent automated detection systems. Findings indicate that this method is increasingly being used in recent malicious videos to avoid detection and removal.
- [8] arXiv:2507.17007 [pdf, html, other]
-
Title: The Postman: A Journey of Ethical Hacking in PosteID/SPID BorderlandSubjects: Cryptography and Security (cs.CR)
This paper presents a vulnerability assessment activity that we carried out on PosteID, the implementation of the Italian Public Digital Identity System (SPID) by Poste Italiane. The activity led to the discovery of a critical privilege escalation vulnerability, which was eventually patched. The overall analysis and disclosure process represents a valuable case study for the community of ethical hackers. In this work, we present both the technical steps and the details of the disclosure process.
- [9] arXiv:2507.17010 [pdf, html, other]
-
Title: Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge ProofsComments: Submitted for peer-review in TrustXR - 2025Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In the era of synthetic media, deepfake manipulations pose a significant threat to information integrity. To address this challenge, we propose TrustDefender, a two-stage framework comprising (i) a lightweight convolutional neural network (CNN) that detects deepfake imagery in real-time extended reality (XR) streams, and (ii) an integrated succinct zero-knowledge proof (ZKP) protocol that validates detection results without disclosing raw user data. Our design addresses both the computational constraints of XR platforms while adhering to the stringent privacy requirements in sensitive settings. Experimental evaluations on multiple benchmark deepfake datasets demonstrate that TrustDefender achieves 95.3% detection accuracy, coupled with efficient proof generation underpinned by rigorous cryptography, ensuring seamless integration with high-performance artificial intelligence (AI) systems. By fusing advanced computer vision models with provable security mechanisms, our work establishes a foundation for reliable AI in immersive and privacy-sensitive applications.
- [10] arXiv:2507.17033 [pdf, other]
-
Title: GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AIJoshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher AjorpazComments: Accepted at MICRO 2025Subjects: Cryptography and Security (cs.CR)
As power consumption from AI training and inference continues to increase, AI accelerators are being integrated directly into the CPU. Intel's Advanced Matrix Extensions (AMX) is one such example, debuting on the 4th generation Intel Xeon Scalable CPU. We discover a timing side and covert channel, GATEBLEED, caused by the aggressive power gating utilized to keep the CPU within operating limits. We show that the GATEBLEED side channel is a threat to AI privacy as many ML models such as transformers and CNNs make critical computationally-heavy decisions based on private values like confidence thresholds and routing logits. Timing delays from selective powering down of AMX components mean that each matrix multiplication is a potential leakage point when executed on the AMX accelerator. Our research identifies over a dozen potential gadgets across popular ML libraries (HuggingFace, PyTorch, TensorFlow, etc.), revealing that they can leak sensitive and private information. GATEBLEED poses a risk for local and remote timing inference, even under previous protective measures. GATEBLEED can be used as a high performance, stealthy remote covert channel and a generic magnifier for timing transmission channels, capable of bypassing traditional cache defenses to leak arbitrary memory addresses and evading state of the art microarchitectural attack detectors under realistic network conditions and system configurations in which previous attacks fail. We implement an end-to-end microarchitectural inference attack on a transformer model optimized with Intel AMX, achieving a membership inference accuracy of 81% and a precision of 0.89. In a CNN-based or transformer-based mixture-of-experts model optimized with Intel AMX, we leak expert choice with 100% accuracy.
- [11] arXiv:2507.17064 [pdf, other]
-
Title: SoK: Securing the Final Frontier for Cybersecurity in Space-Based InfrastructureSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
With the advent of modern technology, critical infrastructure, communications, and national security depend increasingly on space-based assets. These assets, along with associated assets like data relay systems and ground stations, are, therefore, in serious danger of cyberattacks. Strong security defenses are essential to ensure data integrity, maintain secure operations, and protect assets in space and on the ground against various threats. Previous research has found discrete vulnerabilities in space systems and suggested specific solutions to address them. Such research has yielded valuable insights, but lacks a thorough examination of space cyberattack vectors and a rigorous assessment of the efficacy of mitigation techniques. This study tackles this issue by taking a comprehensive approach to analyze the range of possible space cyber-attack vectors, which include ground, space, satellite, and satellite constellations. In order to address the particular threats, the study also assesses the efficacy of mitigation measures that are linked with space infrastructures and proposes a Risk Scoring Framework. Based on the analysis, this paper identifies potential research challenges for developing and testing cutting-edge technology solutions, encouraging robust cybersecurity measures needed in space.
- [12] arXiv:2507.17074 [pdf, html, other]
-
Title: Analysis of Post-Quantum Cryptography in User Equipment in 5G and BeyondComments: Table 5, Figures 7, This paper has been accepted as a regular paper at LCN 2025 and will appear in the conference proceedings. The final version will be published by IEEE and the copyright will belong to IEEESubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
The advent of quantum computing threatens the security of classical public-key cryptographic systems, prompting the transition to post-quantum cryptography (PQC). While PQC has been analyzed in theory, its performance in practical wireless communication environments remains underexplored. This paper presents a detailed implementation and performance evaluation of NIST-selected PQC algorithms in user equipment (UE) to UE communications over 5G networks. Using a full 5G emulation stack (Open5GS and UERANSIM) and PQC-enabled TLS 1.3 via BoringSSL and liboqs, we examine key encapsulation mechanisms and digital signature schemes across realistic network conditions. We evaluate performance based on handshake latency, CPU and memory usage, bandwidth, and retransmission rates, under varying cryptographic configurations and client loads. Our findings show that ML-KEM with ML-DSA offers the best efficiency for latency-sensitive applications, while SPHINCS+ and HQC combinations incur higher computational and transmission overheads, making them unsuitable for security-critical but time-sensitive 5G scenarios.
- [13] arXiv:2507.17180 [pdf, html, other]
-
Title: A Privacy-Preserving Data Collection Method for Diversified Statistical AnalysisSubjects: Cryptography and Security (cs.CR)
Data perturbation-based privacy-preserving methods have been widely adopted in various scenarios due to their efficiency and the elimination of the need for a trusted third party. However, these methods primarily focus on individual statistical indicators, neglecting the overall quality of the collected data from a distributional perspective. Consequently, they often fall short of meeting the diverse statistical analysis requirements encountered in practical data analysis. As a promising sensitive data perturbation method, negative survey methods is able to complete the task of collecting sensitive information distribution while protecting personal privacy. Yet, existing negative survey methods are primarily designed for discrete sensitive information and are inadequate for real-valued data distributions. To bridge this gap, this paper proposes a novel real-value negative survey model, termed RVNS, for the first time in the field of real-value sensitive information collection. The RVNS model exempts users from the necessity of discretizing their data and only requires them to sample a set of data from a range that deviates from their actual sensitive details, thereby preserving the privacy of their genuine information. Moreover, to accurately capture the distribution of sensitive information, an optimization problem is formulated, and a novel approach is employed to solve it. Rigorous theoretical analysis demonstrates that the RVNS model conforms to the differential privacy model, ensuring robust privacy preservation. Comprehensive experiments conducted on both synthetic and real-world datasets further validate the efficacy of the proposed method.
- [14] arXiv:2507.17199 [pdf, html, other]
-
Title: Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAGSubjects: Cryptography and Security (cs.CR)
LLM-powered search services have driven data integration as a significant trend. However, this trend's progress is fundamentally hindered, despite the fact that combining individual knowledge can significantly improve the relevance and quality of responses in specialized queries and make AI more professional at providing services. Two key bottlenecks are private data repositories' locality constraints and the need to maintain compatibility with mainstream search techniques, particularly Hierarchical Navigable Small World (HNSW) indexing for high-dimensional vector spaces. In this work, we develop a secure and privacy-preserving aggregated approximate nearest neighbor search (SP-A$^2$NN) with HNSW compatibility under a threshold-based searchable sharing primitive. A sharable bitgraph structure is constructed and extended to support searches and dynamical insertions over shared data without compromising the underlying graph topology. The approach reduces the complexity of a search from $O(n^2)$ to $O(n)$ compared to naive (undirected) graph-sharing approach when organizing graphs in the identical HNSW manner.
On the theoretical front, we explore a novel security analytical framework that incorporates privacy analysis via reductions. The proposed leakage-guessing proof system is built upon an entirely different interactive game that is independent of existing coin-toss game design. Rather than being purely theoretical, this system is rooted in existing proof systems but goes beyond them to specifically address leakage concerns and standardize leakage analysis -- one of the most critical security challenges with AI's rapid development. - [15] arXiv:2507.17259 [pdf, html, other]
-
Title: Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMsSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Large language models (LLMs) are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information (PII) in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the model and exposed through data extraction or membership inference attacks (MIAs). While existing MIA methods primarily target textual content, their efficacy and threat implications may differ when applied to structured data, due to its limited content, diverse data types, unique value distributions, and column-level semantics. In this paper, we present Tab-MIA, a benchmark dataset for evaluating MIAs on tabular data in LLMs and demonstrate how it can be used. Tab-MIA comprises five data collections, each represented in six different encoding formats. Using our Tab-MIA benchmark, we conduct the first evaluation of state-of-the-art MIA methods on LLMs finetuned with tabular data across multiple encoding formats. In the evaluation, we analyze the memorization behavior of pretrained LLMs on structured data derived from Wikipedia tables. Our findings show that LLMs memorize tabular data in ways that vary across encoding formats, making them susceptible to extraction via MIAs. Even when fine-tuned for as few as three epochs, models exhibit high vulnerability, with AUROC scores approaching 90% in most cases. Tab-MIA enables systematic evaluation of these risks and provides a foundation for developing privacy-preserving methods for tabular data in LLMs.
- [16] arXiv:2507.17324 [pdf, html, other]
-
Title: An Empirical Study on Virtual Reality Software Security WeaknessesSubjects: Cryptography and Security (cs.CR)
Virtual Reality (VR) has emerged as a transformative technology across industries, yet its security weaknesses, including vulnerabilities, are underinvestigated. This study investigates 334 VR projects hosted on GitHub, examining 1,681 software security weaknesses to understand: what types of weaknesses are prevalent in VR software; {\em when} and {\em how} weaknesses are introduced; how long they have survived; and how they have been removed. Due to the limited availability of VR software security weaknesses in public databases (e.g., the National Vulnerability Database or NVD), we prepare the {first systematic} dataset of VR software security weaknesses by introducing a novel framework to collect such weaknesses from GitHub commit data. Our empirical study on the dataset leads to useful insights, including: (i) VR weaknesses are heavily skewed toward user interface weaknesses, followed by resource-related weaknesses; (ii) VR development tools pose higher security risks than VR applications; (iii) VR security weaknesses are often introduced at the VR software birth time.
- [17] arXiv:2507.17385 [pdf, other]
-
Title: A Zero-overhead Flow for Security ClosureSubjects: Cryptography and Security (cs.CR)
In the traditional Application-Specific Integrated Circuit (ASIC) design flow, the concept of timing closure implies to reach convergence during physical synthesis such that, under a given area and power budget, the design works at the targeted frequency. However, security has been largely neglected when evaluating the Quality of Results (QoR) from physical synthesis. In general, commercial place & route tools do not understand security goals. In this work, we propose a modified ASIC design flow that is security-aware and, differently from prior research, does not degrade QoR for the sake of security improvement. Therefore, we propose a first-of-its-kind zero-overhead flow for security closure. Our flow is concerned with two distinct threat models: (i) insertion of Hardware Trojans (HTs) and (ii) physical probing/fault injection. Importantly, the flow is entirely executed within a commercial place & route engine and is scalable. In several metrics, our security-aware flow achieves the best-known results for the ISPD`22 set of benchmark circuits while incurring negligible design overheads due to security-related strategies. Finally, we open source the entire methodology (as a set of scripts) and also share the protected circuits (as design databases) for the benefit of the hardware security community.
- [18] arXiv:2507.17491 [pdf, html, other]
-
Title: Active Attack Resilience in 5G: A New Take on Authentication and Key AgreementComments: Accepted at RAID 2025Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
As 5G networks expand into critical infrastructure, secure and efficient user authentication is more important than ever. The 5G-AKA protocol, standardized by 3GPP in TS 33.501, is central to authentication in current 5G deployments. It provides mutual authentication, user privacy, and key secrecy. However, despite its adoption, 5G-AKA has known limitations in both security and performance. While it focuses on protecting privacy against passive attackers, recent studies show its vulnerabilities to active attacks. It also relies on a sequence number mechanism to prevent replay attacks, requiring perfect synchronization between the device and the core network. This stateful design adds complexity, causes desynchronization, and incurs extra communication overhead. More critically, 5G-AKA lacks Perfect Forward Secrecy (PFS), exposing past communications if long-term keys are compromised-an increasing concern amid sophisticated threats. This paper proposes an enhanced authentication protocol that builds on 5G-AKA's design while addressing its shortcomings. First, we introduce a stateless version that removes sequence number reliance, reducing complexity while staying compatible with existing SIM cards and infrastructure. We then extend this design to add PFS with minimal cryptographic overhead. Both protocols are rigorously analyzed using ProVerif, confirming their compliance with all major security requirements, including resistance to passive and active attacks, as well as those defined by 3GPP and academic studies. We also prototype both protocols and evaluate their performance against 5G-AKA and 5G-AKA' (USENIX'21). Our results show the proposed protocols offer stronger security with only minor computational overhead, making them practical, future-ready solutions for 5G and beyond.
- [19] arXiv:2507.17516 [pdf, html, other]
-
Title: Frequency Estimation of Correlated Multi-attribute Data under Local Differential PrivacySubjects: Cryptography and Security (cs.CR)
Large-scale data collection, from national censuses to IoT-enabled smart homes, routinely gathers dozens of attributes per individual. These multi-attribute datasets are vital for analytics but pose significant privacy risks. Local Differential Privacy (LDP) is a powerful tool to protect user data privacy by allowing users to locally perturb their records before releasing to an untrusted data aggregator. However, existing LDP mechanisms either split the privacy budget across all attributes or treat each attribute independently, ignoring natural inter-attribute correlations. This leads to excessive noise or fragmented budgets, resulting in significant utility loss, particularly in high-dimensional settings.
To overcome these limitations, we propose Correlated Randomized Response (Corr-RR), a novel LDP mechanism that leverages correlations among attributes to substantially improve utility while maintaining rigorous LDP guarantees. Corr-RR allocates the full privacy budget to perturb a single, randomly selected attribute and reconstructs the remaining attributes using estimated interattribute dependencies, without incurring additional privacy cost. To enable this, Corr-RR operates in two phases: (1) a subset of users apply standard LDP mechanisms to estimate correlations, and (2) each remaining user perturbs one attribute and infers the others using the learned correlations. We theoretically prove that Corr-RR satisfies $\epsilon$-LDP, and extensive experiments on synthetic and real-world datasets demonstrate that Corr-RR consistently outperforms state-of-the-art LDP mechanisms, particularly in scenarios with many attributes and strong inter-attribute correlations. - [20] arXiv:2507.17518 [pdf, html, other]
-
Title: Enabling Cyber Security Education through Digital Twins and Generative AISubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Digital Twins (DTs) are gaining prominence in cybersecurity for their ability to replicate complex IT (Information Technology), OT (Operational Technology), and IoT (Internet of Things) infrastructures, allowing for real time monitoring, threat analysis, and system simulation. This study investigates how integrating DTs with penetration testing tools and Large Language Models (LLMs) can enhance cybersecurity education and operational readiness. By simulating realistic cyber environments, this approach offers a practical, interactive framework for exploring vulnerabilities and defensive strategies. At the core of this research is the Red Team Knife (RTK), a custom penetration testing toolkit aligned with the Cyber Kill Chain model. RTK is designed to guide learners through key phases of cyberattacks, including reconnaissance, exploitation, and response within a DT powered ecosystem. The incorporation of Large Language Models (LLMs) further enriches the experience by providing intelligent, real-time feedback, natural language threat explanations, and adaptive learning support during training exercises. This combined DT LLM framework is currently being piloted in academic settings to develop hands on skills in vulnerability assessment, threat detection, and security operations. Initial findings suggest that the integration significantly improves the effectiveness and relevance of cybersecurity training, bridging the gap between theoretical knowledge and real-world application. Ultimately, the research demonstrates how DTs and LLMs together can transform cybersecurity education to meet evolving industry demands.
- [21] arXiv:2507.17628 [pdf, html, other]
-
Title: Quantifying the ROI of Cyber Threat Intelligence: A Data-Driven ApproachComments: 14 pagesSubjects: Cryptography and Security (cs.CR)
The valuation of Cyber Threat Intelligence (CTI) remains a persistent challenge due to the problem of negative evidence: successful threat prevention results in non-events that generate minimal observable financial impact, making CTI expenditures difficult to justify within traditional cost-benefit frameworks. This study introduces a data-driven methodology for quantifying the return on investment (ROI) of CTI, thereby reframing it as a measurable contributor to risk mitigation. The proposed framework extends established models in security economics, including the Gordon-Loeb and FAIR models, to account for CTI's complex influence on both the probability of security breaches and the severity of associated losses. The framework is operationalized through empirically grounded performance indicators, such as reductions in mean time to detect (MTTD), mean time to respond (MTTR), and adversary dwell time, supported by three sector-specific case studies in finance, healthcare, and retail. To address limitations in conventional linear assessment methodologies, the Threat Intelligence Effectiveness Index (TIEI) is introduced as a composite metric based on a weighted geometric mean. TIEI penalizes underperformance across critical dimensions: quality, enrichment, integration, and operational impact; thereby capturing bottleneck effect where the least effective component limits overall performance. By integrating financial quantification, adversarial coverage, and qualitative assessments of business enablement, the proposed hybrid model converts negative evidence into a justifiable ROI explanation. This approach offers a replicable means of repositioning CTI from an expense to a strategic investment, enabling informed decision-making and continuous optimization across diverse organizational contexts.
- [22] arXiv:2507.17655 [pdf, html, other]
-
Title: Rethinking HSM and TPM Security in the Cloud: Real-World Attacks and Next-Gen DefensesComments: 9 pages, 2 Flowcharts, 2 TablesSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Software Engineering (cs.SE)
As organizations rapidly migrate to the cloud, the security of cryptographic key management has become a growing concern. Hardware Security Modules (HSMs) and Trusted Platform Modules (TPMs), traditionally seen as the gold standard for securing encryption keys and digital trust, are increasingly challenged by cloud-native threats. Real-world breaches have exposed weaknesses in cloud deployments, including misconfigurations, API abuse, and privilege escalations, allowing attackers to access sensitive key material and bypass protections. These incidents reveal that while the hardware remains secure, the surrounding cloud ecosystem introduces systemic vulnerabilities. This paper analyzes notable security failures involving HSMs and TPMs, identifies common attack vectors, and questions longstanding assumptions about their effectiveness in distributed environments. We explore alternative approaches such as confidential computing, post-quantum cryptography, and decentralized key management. Our findings highlight that while HSMs and TPMs still play a role, modern cloud security requires more adaptive, layered architectures. By evaluating both current weaknesses and emerging models, this research equips cloud architects and security engineers with strategies to reinforce cryptographic trust in the evolving threat landscape.
New submissions (showing 22 of 22 entries)
- [23] arXiv:2507.17006 (cross-list from quant-ph) [pdf, html, other]
-
Title: Quantitative Quantum Soundness for Bipartite Compiled Bell Games via the Sequential NPA HierarchyIgor Klep, Connor Paddock, Marc-Olivier Renou, Simon Schmidt, Lucas Tendick, Xiangling Xu, Yuming ZhaoComments: 41 pages, 1 figure; comments welcome. We refer to Cui, Falor, Natarajan, and Zhang for an independent parallel work on the same topicSubjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Mathematical Physics (math-ph)
Compiling Bell games under cryptographic assumptions replaces the need for physical separation, allowing nonlocality to be probed with a single untrusted device. While Kalai et al. (STOC'23) showed that this compilation preserves quantum advantages, its quantitative quantum soundness has remained an open problem. We address this gap with two primary contributions. First, we establish the first quantitative quantum soundness bounds for every bipartite compiled Bell game whose optimal quantum strategy is finite-dimensional: any polynomial-time prover's score in the compiled game is negligibly close to the game's ideal quantum value. More generally, for all bipartite games we show that the compiled score cannot significantly exceed the bounds given by a newly formalized sequential Navascués-Pironio-Acín (NPA) hierarchy. Second, we provide a full characterization of this sequential NPA hierarchy, establishing it as a robust numerical tool that is of independent interest. Finally, for games without finite-dimensional optimal strategies, we explore the necessity of NPA approximation error for quantitatively bounding their compiled scores, linking these considerations to the complexity conjecture $\mathrm{MIP}^{\mathrm{co}}=\mathrm{coRE}$ and open challenges such as quantum homomorphic encryption correctness for "weakly commuting" quantum registers.
- [24] arXiv:2507.17017 (cross-list from cs.DS) [pdf, html, other]
-
Title: Optimal Pure Differentially Private Sparse Histograms in Near-Linear Deterministic TimeSubjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)
We introduce an algorithm that releases a pure differentially private sparse histogram over $n$ participants drawn from a domain of size $d \gg n$. Our method attains the optimal $\ell_\infty$-estimation error and runs in strictly $O(n \ln \ln d)$ time in the word-RAM model, thereby improving upon the previous best known deterministic-time bound of $\tilde{O}(n^2)$ and resolving the open problem of breaking this quadratic barrier (Balcer and Vadhan, 2019). Central to our algorithm is a novel private item blanket technique with target-length padding, which transforms the approximate differentially private stability-based histogram algorithm into a pure differentially private one.
- [25] arXiv:2507.17188 (cross-list from cs.NI) [pdf, html, other]
-
Title: LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV NetworksComments: Submitted to IEEE Transactions on Mobile ComputingSubjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
This work tackles the physical layer security (PLS) problem of maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under propulsion energy constraints. Unlike prior studies that assume uniform UAV capabilities or overlook energy-security trade-offs, we consider a realistic scenario where UAVs with diverse payloads and computation resources collaborate to serve ground terminals in the presence of eavesdroppers. To manage the complex coupling between UAV motion and communication, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)-based S2DC algorithm combining penalty functions and difference-of-convex (d.c.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)-guided heuristic multi-agent reinforcement learning approach (LLM-HeMARL) for trajectory optimization. LLM-HeMARL efficiently incorporates expert heuristics policy generated by the LLM, enabling UAVs to learn energy-aware, security-driven trajectories without the inference overhead of real-time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.
- [26] arXiv:2507.17577 (cross-list from cs.CV) [pdf, other]
-
Title: Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based PriorsComments: Published at ICLR 2025 (Spotlight paper)Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
One of the most practical and challenging types of black-box adversarial attacks is the hard-label attack, where only the top-1 predicted label is available. One effective approach is to search for the optimal ray direction from the benign image that minimizes the $\ell_p$-norm distance to the adversarial region. The unique advantage of this approach is that it transforms the hard-label attack into a continuous optimization problem. The objective function value is the ray's radius, which can be obtained via binary search at a high query cost. Existing methods use a "sign trick" in gradient estimation to reduce the number of queries. In this paper, we theoretically analyze the quality of this gradient estimation and propose a novel prior-guided approach to improve ray search efficiency both theoretically and empirically. Specifically, we utilize the transfer-based priors from surrogate models, and our gradient estimators appropriately integrate them by approximating the projection of the true gradient onto the subspace spanned by these priors and random directions, in a query-efficient manner. We theoretically derive the expected cosine similarities between the obtained gradient estimators and the true gradient, and demonstrate the improvement achieved by incorporating priors. Extensive experiments on the ImageNet and CIFAR-10 datasets show that our approach significantly outperforms 11 state-of-the-art methods in terms of query efficiency.
- [27] arXiv:2507.17589 (cross-list from quant-ph) [pdf, html, other]
-
Title: Encrypted-State Quantum Compilation Scheme Based on Quantum Circuit ObfuscationSubjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
With the rapid advancement of quantum computing, quantum compilation has become a crucial layer connecting high-level algorithms with physical hardware. In quantum cloud computing, compilation is performed on the cloud side, which exposes user circuits to potential risks such as structural leakage and output predictability. To address these issues, we propose the encrypted-state quantum compilation scheme based on quantum circuit obfuscation (ECQCO), the first secure compilation framework tailored for the co-location of compilers and quantum hardware. It applies quantum homomorphic encryption to conceal output states and instantiates a structure obfuscation mechanism based on quantum indistinguishability obfuscation, effectively protecting both functionality and topology of the circuit. Additionally, an adaptive decoupling obfuscation algorithm is designed to suppress potential idle errors while inserting pulse operations. The proposed scheme achieves information-theoretic security and guarantees computational indistinguishability under the quantum random oracle model. Experimental results on benchmark datasets show that ECQCO achieves a TVD of up to 0.7 and a normalized GED of 0.88, enhancing compilation-stage security. Moreover, it introduces only a slight increase in circuit depth, while keeping the average fidelity change within 1%, thus achieving a practical balance between security and efficiency.
- [28] arXiv:2507.17691 (cross-list from cs.SE) [pdf, html, other]
-
Title: CASCADE: LLM-Powered JavaScript Deobfuscator at GoogleSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Programming Languages (cs.PL)
Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.
- [29] arXiv:2507.17712 (cross-list from quant-ph) [pdf, html, other]
-
Title: Quantum Software Security Challenges within Shared Quantum Computing EnvironmentsComments: This paper has been accepted for presentation at the 2025 IEEE International Conference on Quantum Computing and Engineering (QCE)Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
The number of qubits in quantum computers keeps growing, but most quantum programs remain relatively small because of the noisy nature of the underlying quantum hardware. This might lead quantum cloud providers to explore increased hardware utilization, and thus profitability through means such as multi-programming, which would allow the execution of multiple programs in parallel. The adoption of such technology would bring entirely new challenges to the field of quantum software security. This article explores and reports the key challenges identified in quantum software security within shared quantum computing environments.
- [30] arXiv:2507.17736 (cross-list from cs.IT) [pdf, html, other]
-
Title: Symmetric Private Information Retrieval (SPIR) on Graph-Based Replicated SystemsSubjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Databases (cs.DB); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
We introduce the problem of symmetric private information retrieval (SPIR) on replicated databases modeled by a simple graph. In this model, each vertex corresponds to a server, and a message is replicated on two servers if and only if there is an edge between them. We consider the setting where the server-side common randomness necessary to accomplish SPIR is also replicated at the servers according to the graph, and we call this as message-specific common randomness. In this setting, we establish a lower bound on the SPIR capacity, i.e., the maximum download rate, for general graphs, by proposing an achievable SPIR scheme. Next, we prove that, for any SPIR scheme to be feasible, the minimum size of message-specific randomness should be equal to the size of a message. Finally, by providing matching upper bounds, we derive the exact SPIR capacity for the class of path and regular graphs.
Cross submissions (showing 8 of 8 entries)
- [31] arXiv:2411.04030 (replaced) [pdf, other]
-
Title: Quantum-Safe Hybrid Key Exchanges with KEM-Based AuthenticationSubjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)
Authenticated Key Exchange (AKE) between any two entities is one of the most important security protocols available for securing our digital networks and infrastructures. In PQCrypto 2023, Bruckner, Ramacher and Striecks proposed a novel hybrid AKE (HAKE) protocol, dubbed Muckle+, that is particularly useful in large quantum-safe networks consisting of a large number of nodes. Their protocol is hybrid in the sense that it allows key material from conventional and post-quantum primitives, as well as from quantum key distribution, to be incorporated into a single end-to-end shared key.
To achieve the desired authentication properties, Muckle+ utilizes post-quantum digital signatures. However, available instantiations of such signatures schemes are not yet efficient enough compared to their post-quantum key-encapsulation mechanism (KEM) counterparts, particularly in large networks with potentially several connections in a short period of time.
To mitigate this gap, we propose Muckle# that pushes the efficiency boundaries of currently known HAKE constructions. Muckle# uses post-quantum key-encapsulating mechanisms for implicit authentication inspired by recent works done in the area of Transport Layer Security (TLS) protocols, particularly, in KEMTLS (CCS'20).
We port those ideas to the HAKE framework and develop novel proof techniques on the way. Due to our novel KEM-based approach, the resulting protocol has a slightly different message flow compared to prior work that we carefully align with the HAKE framework and which makes our changes to the Muckle+ non-trivial. - [32] arXiv:2504.03173 (replaced) [pdf, html, other]
-
Title: PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID DataHongliang Zhang, Jiguo Yu, Fenghua Xu, Chunqiang Hu, Yongzhao Zhang, Xiaofen Wang, Zhongyuan Yu, Xiaosong ZhangSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Privacy-Preserving Federated Learning (PPFL) allows multiple clients to collaboratively train a deep learning model by submitting hidden model updates. Nonetheless, PPFL is vulnerable to data poisoning attacks due to the distributed training nature of clients. Existing solutions have struggled to improve the performance of cross-silo PPFL in poisoned Non-IID data. To address the issues, this paper proposes a privacy-preserving federated prototype learning framework, named PPFPL, which enhances the cross-silo FL performance in poisoned Non-IID data while effectively resisting data poisoning attacks. Specifically, we adopt prototypes as client-submitted model updates to eliminate the impact of tampered data distribution on federated learning. Moreover, we utilize two servers to achieve Byzantine-robust aggregation by secure aggregation protocol, which greatly reduces the impact of malicious clients. Theoretical analyses confirm the convergence of PPFPL, and experimental results on publicly available datasets show that PPFPL is effective for resisting data poisoning attacks with Non-IID conditions.
- [33] arXiv:2507.07901 (replaced) [pdf, html, other]
-
Title: The Trust Fabric: Decentralized Interoperability and Economic Coordination for the Agentic WebSree Bhargavi Balija, Rekha Singal, Ramesh Raskar, Erfan Darzi, Raghu Bala, Thomas Hardjono, Ken HuangSubjects: Cryptography and Security (cs.CR)
The fragmentation of AI agent ecosystems has created urgent demands for interoperability, trust, and economic coordination that current protocols -- including MCP (Hou et al., 2025), A2A (Habler et al., 2025), ACP (Liu et al., 2025), and Cisco's AGP (Edwards, 2025) -- cannot address at scale. We present the Nanda Unified Architecture, a decentralized framework built around three core innovations: fast DID-based agent discovery through distributed registries, semantic agent cards with verifiable credentials and composability profiles, and a dynamic trust layer that integrates behavioral attestations with policy compliance. The system introduces X42/H42 micropayments for economic coordination and MAESTRO, a security framework incorporating Synergetics' patented AgentTalk protocol (US Patent 12,244,584 B1) and secure containerization. Real-world deployments demonstrate 99.9 percent compliance in healthcare applications and substantial monthly transaction volumes with strong privacy guarantees. By unifying MIT's trust research with production deployments from Cisco and Synergetics, we show how cryptographic proofs and policy-as-code transform agents into trust-anchored participants in a decentralized economy (Lakshmanan, 2025; Sha, 2025). The result enables a globally interoperable Internet of Agents where trust becomes the native currency of collaboration across both enterprise and Web3 ecosystems.
- [34] arXiv:2305.08175 (replaced) [pdf, other]
-
Title: ResidualPlanner+: a scalable matrix mechanism for marginals and beyondSubjects: Databases (cs.DB); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Noisy marginals are a common form of confidentiality protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms.
We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets).
ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal, however it is still extremely scalable and outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed. - [35] arXiv:2408.12240 (replaced) [pdf, other]
-
Title: The Bright Side of Timed OpacityComments: This is the author (and extended) version of the manuscript of the same name published in the proceedings of the 25th International Conference on Formal Engineering Methods (ICFEM 2024)Subjects: Logic in Computer Science (cs.LO); Cryptography and Security (cs.CR); Formal Languages and Automata Theory (cs.FL)
Timed automata (TAs) are an extension of finite automata that can measure and react to the passage of time, providing the ability to handle real-time constraints using clocks. In 2009, Franck Cassez showed that the timed opacity problem, where an attacker can observe some actions with their timestamps and attempts to deduce information, is undecidable for TAs. Moreover, he showed that the undecidability holds even for subclasses such as event-recording automata. In this article, we consider the same definition of opacity, by restricting either the system or the attacker. Our first contribution is to prove the inter-reducibility of two variants of opacity: full opacity (for which the observations should be the same regardless of the visit of a private location) and weak opacity (for which it suffices that the attacker cannot deduce whether the private location was visited, but for which it is harmless to deduce that it was not visited); we also prove further results including a connection with timed language inclusion. Our second contribution is to study opacity for several subclasses of TAs: with restrictions on the number of clocks, the number of actions, the nature of time, or a new subclass called observable event-recording automata. We show that opacity is mostly decidable in these cases, except for one-action TAs and for one-clock TAs with $\epsilon$-transitions, for which undecidability remains. Our third (and arguably main) contribution is to propose a new definition of opacity in which the number of observations made by the attacker is limited to the first $N$ observations, or to a set of $N$ timestamps after which the attacker observes the first action that follows immediately. This set can be defined either a priori or at runtime; all three versions yield decidability for the whole TA class.
- [36] arXiv:2501.14644 (replaced) [pdf, html, other]
-
Title: Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated NoiseComments: 6 pages, 5 figures, accepted at IEEE ITW 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Decentralized learning enables distributed agents to collaboratively train a shared machine learning model without a central server, through local computation and peer-to-peer communication. Although each agent retains its dataset locally, sharing local models can still expose private information about the local training datasets to adversaries. To mitigate privacy attacks, a common strategy is to inject random artificial noise at each agent before exchanging local models between neighbors. However, this often leads to utility degradation due to the negative effects of cumulated artificial noise on the learning algorithm. In this work, we introduce CorN-DSGD, a novel covariance-based framework for generating correlated privacy noise across agents, which unifies several state-of-the-art methods as special cases. By leveraging network topology and mixing weights, CorN-DSGD optimizes the noise covariance to achieve network-wide noise cancellation. Experimental results show that CorN-DSGD cancels more noise than existing pairwise correlation schemes, improving model performance under formal privacy guarantees.
- [37] arXiv:2502.20650 (replaced) [pdf, html, other]
-
Title: Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
In recent years, Diffusion Models (DMs) have demonstrated significant advances in the field of image generation. However, according to current research, DMs are vulnerable to backdoor attacks, which allow attackers to control the model's output by inputting data containing covert triggers, such as a specific visual patch or phrase. Existing defense strategies are well equipped to thwart such attacks through backdoor detection and trigger inversion because previous attack methods are constrained by limited input spaces and low-dimensional triggers. For example, visual triggers are easily observed by defenders, text-based or attention-based triggers are more susceptible to neural network detection. To explore more possibilities of backdoor attack in DMs, we propose Gungnir, a novel method that enables attackers to activate the backdoor in DMs through style triggers within input images. Our approach proposes using stylistic features as triggers for the first time and implements backdoor attacks successfully in image-to-image tasks by introducing Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR). Our technique generates trigger-embedded images that are perceptually indistinguishable from clean images, thus bypassing both manual inspection and automated detection neural networks. Experiments demonstrate that Gungnir can easily bypass existing defense methods. Among existing DM defense frameworks, our approach achieves a 0 backdoor detection rate (BDR). Our codes are available at this https URL.
- [38] arXiv:2503.12192 (replaced) [pdf, html, other]
-
Title: Closing the Chain: How to reduce your risk of being SolarWinds, Log4j, or XZ UtilsSubjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Software supply chain frameworks, such as the US NIST Secure Software Development Framework (SSDF), detail what tasks software development organizations are recommended or mandated to adopt to reduce security risk. However, to further reduce the risk of similar attacks occurring, software organizations benefit from knowing what tasks mitigate attack techniques the attackers are currently using to address specific threats, prioritize tasks, and close mitigation gaps. The goal of this study is to aid software organizations in reducing the risk of software supply chain attacks by systematically synthesizing how framework tasks mitigate the attack techniques used in the SolarWinds, Log4j, and XZ Utils attacks. We qualitatively analyzed 106 Cyber Threat Intelligence (CTI) reports of the 3 attacks to gather the attack techniques. We then systematically constructed a mapping between attack techniques and the 73 tasks enumerated in 10 software supply chain frameworks. Afterward, we established and ranked priority tasks that mitigate attack techniques. The three mitigation tasks with the highest scores are role-based access control, system monitoring, and boundary protection. Additionally, three mitigation tasks were missing from all ten frameworks, including sustainable open-source software and environmental scanning tools. Thus, software products would still be vulnerable to software supply chain attacks even if organizations adopted all recommended tasks.
- [39] arXiv:2507.13508 (replaced) [pdf, other]
-
Title: Fake or Real: The Impostor Hunt in Texts for Space OperationsAgata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Krzysztof Kotowski, Ramez Shendy, Evridiki Ntagiou, Jakub Nalepa, Artur Janicki, Przemysław BiecekSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
The "Fake or Real" competition hosted on Kaggle (this https URL ) is the second part of a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (this https URL ). The competition idea is based on two real-life AI security threats identified within the project -- data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output from LLM and the output generated under malicious modification of the LLM. As this problem was not extensively researched, participants are required to develop new techniques to address this issue or adjust already existing ones to this problem's statement.