这是indexloc提供的服务,不要输入任何密码

Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10795 publications
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Preview abstract AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization. View details
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    Preview abstract For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step. View details
    Preview abstract Generative AI is revolutionizing content creation and holds promise for real-time, personalized educational experiences. We investigated the effectiveness of converting textbook chapters into AI-generated podcasts and explored the impact of personalizing these podcasts for individual learner profiles. We conducted a 3x3 user study with 180 college students in the United States, comparing traditional textbook reading with both generalized and personalized AI-generated podcasts across three textbook subjects. The personalized podcasts were tailored to students’ majors, interests, and learning styles. Our findings show that students found the AI-generated podcast format to be more enjoyable than textbooks and that personalized podcasts led to significantly improved learning outcomes, although this was subject-specific. These results highlight that AI-generated podcasts can offer an engaging and effective modality transformation of textbook material, with personalization enhancing content relevance. We conclude with design recommendations for leveraging AI in education, informed by student feedback. View details
    Preview abstract Despite the surge in popularity of virtual reality (VR), mobile phones remain the primary medium for accessing digital content, offering both privacy and portability. This short paper presents Beyond the Phone, a novel framework that enhances mobile phones in VR with context-aware controls and spatial augmentation. We first establish a comprehensive design space through brainstorming and iterative discussions with VR experts. We then develop a proof-of-concept system that analyzes UI layouts to offer context-aware controls and spatial augmentation, targeting six key application areas within our design space. Finally, we demonstrate that our system can effectively adapt to a broad spectrum of applications at runtime, and discuss future directions with reviews with seven experts. View details
    Security Signals: Making Web Security Posture Measurable At Scale
    David Dworken
    Artur Janc
    Santiago (Sal) Díaz
    Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb)
    Preview abstract The area of security measurability is gaining increased attention, with a wide range of organizations calling for the development of scalable approaches for assessing the security of software systems and infrastructure. In this paper, we present our experience developing Security Signals, a comprehensive system providing security measurability for web services, deployed in a complex application ecosystem of thousands of web services handling traffic from billions of users. The system collects security-relevant information from production HTTP traffic at the reverse proxy layer, utilizing novel concepts such as synthetic signals augmented with additional risk information to provide a holistic view of the security posture of individual services and the broader application ecosystem. This approach to measurability has enabled large-scale security improvements to our services, including prioritized rollouts of security enhancements and the implementation of automated regression monitoring. Furthermore, it has proven valuable for security research and prioritization of defensive work. Security Signals addresses shortcomings of prior web measurability proposals by tracking a comprehensive set of security properties relevant to web applications, and by extracting insights from collected data for use by both security experts and non-experts. We believe the lessons learned from the implementation and use of Security Signals offer valuable insights for practitioners responsible for web service security, potentially inspiring new approaches to web security measurability. View details
    Preview abstract Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), have demonstrated significant potential in clinical reasoning skills such as history-taking and differential diagnosis generation—critical aspects of medical education. This work explores how LLMs can augment medical curricula through interactive learning. We conducted a participatory design process with medical students, residents and medical education experts to co-create an AI-powered tutor prototype for clinical reasoning. As part of the co-design process, we conducted a qualitative user study, investigating learning needs and practices via interviews, and conducting concept evaluations through interactions with the prototype. Findings highlight the challenges learners face in transitioning from theoretical knowledge to practical application, and how an AI tutor can provide personalized practice and feedback. We conclude with design considerations, emphasizing the importance of context-specific knowledge and emulating positive preceptor traits, to guide the development of AI tools for medical education. View details
    Preview abstract Despite the advent of legislation such as the General Data Protection Regulation (GDPR) with its associated "Right to be Forgotten" (RTBF), few, if any, studies have measured user reactions to realistic edge cases with public-interest content. Surveying both users covered by and excluded from RTBF, this vignette-based survey experiment sought to better understand how users think of delisting content from search engine results and what factors influence user perceptions. While leaving information accessible in search engine results generally leads to warmer feelings towards those search engines than delisting it, we find that users do prefer different outcomes depending on contextual elements specific to given cases. We also find that whether a country has active RTBF legislation does seem to be associated with both knowledge and attitudes about RTBF, but is unlikely to explain all of it. These results indicate a complex context around removing public-interest content from search engines’ results; it is essential that experts sensitive to local context perform the review in order to ensure that removal requests are handled in a way that meets users’ expectations. View details
    Preview abstract We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size. View details
    Applying multimodal AI to physiological waveforms improves genetic prediction of cardiovascular traits
    Yuchen Zhou
    Mahantesh I. Biradar
    Jacqueline Shreibati
    Dongbing Lai
    Tae-Hwi Schwantes-An
    Robert Luben
    Zachary R. McCaw
    Jorgen Engmann
    Rui Providencia
    Amand Floriaan Schmidt
    Patricia B. Munroe
    Howard Yang
    Andrew Carroll
    Anthony Khawaja
    Babak Behsaz
    American Journal of Human Genetics, 112 (2025), pp. 1562 - 1579
    Preview abstract Electronic health records, biobanks, and wearable biosensors enable the collection of multiple health modalities from many individuals. Access to multimodal health data provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological system (e.g., circulatory system) encode complementary and overlapping information. We propose a multimodal deep learning method, multimodal representation learning for genetic discovery on low-dimensional embeddings (M-REGLE), for discovering genetic associations from a joint representation of complementary electrophysiological waveform modalities. M-REGLE jointly learns a lower representation (i.e., latent factors) of multimodal physiological waveforms using a convolutional variational autoencoder, performs genome-wide association studies (GWASs) on each latent factor, then combines the results to study the genetics of the underlying system. To validate the advantages of M-REGLE and multimodal learning, we apply it to common cardiovascular modalities (photoplethysmogram [PPG] and electrocardiogram [ECG]) and compare its results to unimodal learning methods in which representations are learned from each data modality separately but are statistically combined for downstream genetic comparison. M-REGLE identifies 19.3% more loci on the 12-lead ECG dataset, 13.0% more loci on the ECG lead I + PPG dataset, and its genetic risk score significantly outperforms the unimodal risk score at predicting cardiac phenotypes, such as atrial fibrillation (Afib), in multiple biobanks. View details
    Efficient Single-Step Item ID Generation for Large-ScaleLLM-based Recommendation
    Anushya Subbiah
    Vikram Aggarwal
    James Pine
    Krishna Sayana
    Kun Su
    2025
    Preview abstract Integrating product catalogs and user behavior into LLMs can enhance recommendations with broad world knowledge, but the scale of real-world item catalogs, often containing millions of discrete item identifiers (Item IDs), poses a significant challenge. This contrasts with the smaller, tokenized text vocabularies typically used in LLMs. The predominant view within the LLM-based recommendation literature is that it is infeasible to treat item ids as a first class citizen in the LLM and instead some sort of tokenization of an item into multiple tokens is required. However, this creates a key practical bottleneck in serving these models for real-time low-latency applications. Our paper challenges this predominant practice and integrates item ids as first class citizens into the LLM. We provide simple, yet highly effective, novel training and inference modifications that enable single-token representations of items and single-step decoding. Our method shows improvements in recommendation quality (Recall and NDCG) over existing techniques on the Amazon shopping datasets while significantly improving inference efficiency by 5x-14x. Our work offers an efficiency perspective distinct from that of other popular approaches within LLM-based recommendation, potentially inspiring further research and opening up a new direction for integrating IDs into LLMs. Our code is available here https://drive.google.com/file/d/1cUMj37rV0Z1bCWMdhQ6i4q4eTRQLURtC/edit View details
    DroidCCT: Cryptographic Compliance Test via Trillion-Scale Measurement
    Rémi Audebert
    Pedro Barbosa
    Borbala Benko
    Alex (Mac) Mihai
    László Siroki
    Catherine Vlasov
    Annual Computer Security Applications Conference (ACSAC) (2025) (to appear)
    Preview
    Preview abstract In Julia, JuMP is the go-to modelling package for mathematical optimisation. As of this writing, Google's award-winning solvers have not been accessible through JuMP; which offers Julia's ease of use. ORTools.jl is changing this. Julia users will now have access to Google's Glop, CP-SAT, and PDLP solvers through JuMP as provided by the ORTools.jl package. This talk offers an introduction to the features of the package and an overview of the difficulties we encountered. View details
    Preview abstract Estimating Origin-Destination (OD) travel demand is vital for effective urban planning and traffic management. Developing universally applicable OD estimation methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which are often prerequisite for traditional approaches. Our proposed method directly estimates OD travel demand by systematically leveraging aggregated, anonymized statistics from Google Maps Traffic Trends, obviating the need for conventional census or city-provided OD data. The OD demand is estimated by formulating a single-level, one-dimensional, continuous nonlinear optimization problem with nonlinear equality and bound constraints to replicate highway path travel times. The method achieves efficiency and scalability by employing a differentiable analytical macroscopic network model. This model by design is computationally lightweight, distinguished by its parsimonious parameterization that requires minimal calibration effort and its capacity for instantaneous evaluation. These attributes ensure the method's broad applicability and practical utility across diverse cities globally. Using segment sensor counts from Los Angeles and San Diego highway networks, we validate our proposed approach, demonstrating a two-thirds to three-quarters improvement in the fit to segment count data over a baseline. Beyond validation, we establish the method's scalability and robust performance in replicating path travel times across diverse highway networks, including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded evaluations, our method not only aligns with simulation-based benchmarks but also achieves an average 13% improvement in it's ability to fit travel time data compared to the baseline during afternoon peak hours. View details
    Preview abstract Due to the size and complexity of modern large language models (LLMs), it has proven challenging to uncover the underlying mechanisms that models use to solve reasoning problems. For instance, is their reasoning for a specific problem localized to certain parts of the network? Do they break down the reasoning problem into modular components that are then executed as sequential steps as we go deeper in the model? To better understand the reasoning capability of LLMs, we study a minimal propositional logic problem that requires combining multiple facts to arrive at a solution. By studying this problem on Mistral and Gemma models, up to 27B parameters, we illuminate the core components the models use to solve such logic problems. From a mechanistic interpretability point of view, we use causal mediation analysis to uncover the pathways and components of the LLMs' reasoning processes. Then, we offer fine-grained insights into the functions of attention heads in different layers. We not only find a sparse circuit that computes the answer, but we decompose it into sub-circuits that have four distinct and modular uses. Finally, we reveal that three distinct models -- Mistral-7B, Gemma-2-9B and Gemma-2-27B -- contain analogous but not identical mechanisms. View details
    Preview abstract Suppose Alice has a distribution ${P}$ and Bob has a distribution ${Q}$. Alice wants to draw a sample $a\sim {P}$ and Bob a sample $b \sim {Q}$ such that $a = b$ with as high of probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{\text{tv}}({P},{Q})$, where $D_{\text{tv}}({P},{Q})$ is the total variation distance between ${P}$ and ${Q}$. What if Alice and Bob must solve this same problem \emph{without communicating at all?} Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{\text{tv}}({P},{Q})}{1 + D_{\text{tv}}({P},{Q})} \geq 1-2D_{\text{tv}}({P},{Q})$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by Bavarian, Ghazi, Haramaty, Kamath, Rivest, and Sudan [ToC 2020]. In this work, we revisit the ``communication-free coupling'' problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. Moreover we show that, while the \emph{worst-case} success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions ${P}$ and ${Q}$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash. Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols can be used to contruct \emph{\CSD{}} schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash. Code is available at \url{https://github.com/majid-daliri/DISD}. Finally, we study the coupling problem in the setting where communication is \emph{bounded}, rather than completely eliminated. We describe a protocol that uses just $O(\log(n/\epsilon))$ bits of communication to achieve $\Pr[a = b] = 1 - D_{\text{tv}}({P},{Q}) - \epsilon$, i.e. to essentially match optimal coupling. View details