
Showing 1–50 of 108 results for author: Kallus, N

Searching in archive cs.
  1. arXiv:2511.07280  [pdf, ps, other]

    econ.GN cs.IR cs.LG

    The Value of Personalized Recommendations: Evidence from Netflix

    Authors: Kevin Zielnicki, Guy Aridor, Aurélien Bibaut, Allen Tran, Winston Chou, Nathan Kallus

    Abstract: Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation… ▽ More

    Submitted 10 November, 2025; v1 submitted 10 November, 2025; originally announced November 2025.

  2. arXiv:2510.20150  [pdf, ps, other]

    cs.IR

    Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

    Authors: Yaochen Zhu, Harald Steck, Dawen Liang, Yinhan He, Vito Ostuni, Jundong Li, Nathan Kallus

    Abstract: Large language models (LLMs) are reshaping the recommender system paradigm by enabling users to express preferences and receive recommendations through conversations. Yet, aligning LLMs to the recommendation task remains challenging: pretrained LLMs often generate out-of-catalog items, violate required output formats, and their ranking quality degrades sharply toward the end of the generated list.… ▽ More

    Submitted 23 October, 2025; v1 submitted 22 October, 2025; originally announced October 2025.

  3. arXiv:2510.10440  [pdf, ps, other]

    cs.IR cs.LG stat.ML

    Does Weighting Improve Matrix Factorization for Recommender Systems?

    Authors: Alex Ayoub, Samuel Robertson, Dawen Liang, Harald Steck, Nathan Kallus

    Abstract: Matrix factorization is a widely used approach for top-N recommendation and collaborative filtering. When implemented on implicit feedback data (such as clicks), a common heuristic is to upweight the observed interactions. This strategy has been shown to improve performance for certain algorithms. In this paper, we conduct a systematic study of various weighting schemes and matrix factorization al… ▽ More

    Submitted 12 October, 2025; originally announced October 2025.

    Comments: In the proceedings of the Web Conference (WWW) 2025 (11 pages)
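    A minimal sketch of the weighting heuristic the abstract refers to: in a squared-error matrix-factorization objective for implicit feedback, observed interactions receive an extra weight. The data, the weight value alpha, the factor dimension, and the plain gradient loop are illustrative assumptions, not details from the paper.

    # Hedged sketch: weighted matrix factorization for implicit (click) feedback.
    import numpy as np

    rng = np.random.default_rng(0)
    clicks = rng.integers(0, 2, size=(100, 50))       # binarized implicit feedback
    alpha = 40.0                                       # extra weight on observed interactions (assumed)
    weights = 1.0 + alpha * clicks                     # C_ui = 1 + alpha * r_ui
    n_factors, reg, lr = 16, 0.1, 1e-3

    U = 0.1 * rng.standard_normal((100, n_factors))    # user factors
    V = 0.1 * rng.standard_normal((50, n_factors))     # item factors

    def weighted_mf_loss(U, V):
        """Weighted squared error plus L2 regularization."""
        pred = U @ V.T
        return np.sum(weights * (clicks - pred) ** 2) + reg * (np.sum(U**2) + np.sum(V**2))

    print("loss before:", round(weighted_mf_loss(U, V), 1))
    for _ in range(100):                               # plain gradient steps as a stand-in for ALS/SGD
        err = weights * (clicks - U @ V.T)
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)
    print("loss after: ", round(weighted_mf_loss(U, V), 1))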

  4. arXiv:2510.02212  [pdf, ps, other]

    cs.LG cs.AI

    DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

    Authors: Hanyang Zhao, Dawen Liang, Wenpin Tang, David Yao, Nathan Kallus

    Abstract: We propose DiFFPO, Diffusion Fast and Furious Policy Optimization, a unified framework for training masked diffusion large language models (dLLMs) to reason not only better (furious), but also faster via reinforcement learning (RL). We first unify the existing baseline approach such as d1 by proposing to train surrogate policies via off-policy RL, whose likelihood is much more tractable as an appr… ▽ More

    Submitted 2 October, 2025; originally announced October 2025.

  5. arXiv:2509.26522  [pdf, ps, other]

    cs.LG

    Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

    Authors: Xi Wang, James McInerney, Lequn Wang, Nathan Kallus

    Abstract: Large reasoning models show improved performance with longer chains of thought. However, recent work has highlighted (qualitatively) their tendency to overthink, continuing to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency by tracking Pass@1 for answers averaged over a large number of rollouts and find that the model often begins to always prod… ▽ More

    Submitted 30 September, 2025; originally announced September 2025.

  6. arXiv:2509.21172  [pdf, ps, other]

    cs.LG econ.EM math.OC stat.ML

    Inverse Reinforcement Learning Using Just Classification and a Few Regressions

    Authors: Lars van der Laan, Nathan Kallus, Aurélien Bibaut

    Abstract: Inverse reinforcement learning (IRL) aims to explain observed behavior by uncovering an underlying reward. In the maximum-entropy or Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function and a soft value function that together satisfy the soft Bellman consistency condition and maximize the likelihood of observed actions. While this perspective has had enormous impact in imi… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

  7. arXiv:2506.03324  [pdf, ps, other]

    cs.LG

    Optimization of Epsilon-Greedy Exploration

    Authors: Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus

    Abstract: Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommend… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.
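    A minimal sketch of the uniform exploration policy the abstract mentions: with probability epsilon a request gets a uniformly random item, otherwise the top-scored item from the personalization model. The scores, catalog size, and epsilon value are illustrative assumptions.

    # Hedged sketch: epsilon-greedy item selection with a fixed exploration rate.
    import numpy as np

    rng = np.random.default_rng(1)

    def epsilon_greedy_recommend(scores: np.ndarray, epsilon: float) -> int:
        """Return an item index: explore uniformly w.p. epsilon, else exploit."""
        if rng.random() < epsilon:
            return int(rng.integers(len(scores)))      # uniform exploration
        return int(np.argmax(scores))                  # greedy exploitation

    model_scores = rng.random(1000)                    # per-item scores from an ML model (assumed)
    picks = [epsilon_greedy_recommend(model_scores, epsilon=0.05) for _ in range(10_000)]
    explored = np.mean([p != int(np.argmax(model_scores)) for p in picks])
    print(f"fraction of non-greedy recommendations: {explored:.3f}")   # roughly epsilon * (1 - 1/n_items)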

  8. arXiv:2506.02881  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Simulation-Based Inference for Adaptive Experiments

    Authors: Brian M Cho, Aurélien Bibaut, Nathan Kallus

    Abstract: Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials due to their potential to improve outcomes for study participants, enable faster identification of the best-performing options, and/or enhance the precision of estimating key parameters. Current approaches for inference after adaptive sampling either rely on asymptotic normality under restricted ex… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  9. arXiv:2505.17468  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Efficient Adaptive Experimentation with Noncompliance

    Authors: Miruna Oprescu, Brian M Cho, Nathan Kallus

    Abstract: We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged -- rather than directly assigned -- via a binary instrumental variable. Building on semiparametric efficiency theory, we derive the efficiency bound for ATE estimation under arbitrary, history-dependent instrument-assignment policies, and show it is minimized by a va… ▽ More

    Submitted 28 October, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: 37 pages, 4 figures, 2 tables, NeurIPS 2025

  10. arXiv:2505.17373  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Value-Guided Search for Efficient Chain-of-Thought Reasoning

    Authors: Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun

    Abstract: In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it… ▽ More

    Submitted 30 September, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025

  11. arXiv:2504.15476  [pdf, other]

    cs.IR

    From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

    Authors: Rohan Surana, Junda Wu, Zhouhang Xie, Yu Xia, Harald Steck, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Conversational recommender systems (CRS) typically require extensive domain-specific conversational datasets, yet high costs, privacy concerns, and data-collection challenges severely limit their availability. Although Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities, practical applications often favor smaller, internally managed recommender models due to scala… ▽ More

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures

  12. arXiv:2503.12760  [pdf, other]

    stat.ML cs.LG econ.EM

    SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

    Authors: Brian Cho, Ana-Roxana Pop, Ariel Evnine, Nathan Kallus

    Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail outcomes. To provide credible recommendations, experimenters must not only identify policies that satisfy the desir… ▽ More

    Submitted 21 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  13. arXiv:2502.20548  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    $Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

    Authors: Jin Peng Zhou, Kaiwen Wang, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kilian Q. Weinberger, Kianté Brantley, Wen Sun

    Abstract: Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce $Q\sharp$, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized $Q$ function. We propose to learn the optimal… ▽ More

    Submitted 19 October, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: NeurIPS 2025

  14. arXiv:2502.14137  [pdf, other]

    cs.IR

    Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems

    Authors: Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li

    Abstract: Conversational recommender systems (CRS) aim to provide personalized recommendations via interactive dialogues with users. While large language models (LLMs) enhance CRS with their superior understanding of context-aware user preferences, they typically struggle to leverage behavioral data, which have proven to be important for classical collaborative filtering (CF)-based approaches. For this reas… ▽ More

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW'2025

  15. arXiv:2502.05295  [pdf, ps, other]

    cs.LG stat.ME

    GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding

    Authors: Miruna Oprescu, David K. Park, Xihaier Luo, Shinjae Yoo, Nathan Kallus

    Abstract: Estimating causal effects from spatiotemporal observational data is essential in public health, environmental science, and policy evaluation, where randomized experiments are often infeasible. Existing approaches, however, either rely on strong structural assumptions or fail to handle key challenges such as interference, spatial confounding, temporal carryover, and time-varying confounding -- wher… ▽ More

    Submitted 28 October, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: 29 pages, 6 figures, 6 tables, NeurIPS 2025

  16. arXiv:2501.06926  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

    Authors: Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

    Abstract: Double Reinforcement Learning (DRL) enables efficient inference for policy values in nonparametric Markov decision processes (MDPs), but existing methods face two major obstacles: (1) they require stringent intertemporal overlap conditions on state trajectories, and (2) they rely on estimating high-dimensional occupancy density ratios. Motivated by problems in long-term causal inference, we extend… ▽ More

    Submitted 12 November, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  17. arXiv:2410.15564  [pdf, other]

    cs.LG stat.ME stat.ML

    Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits

    Authors: Brian Cho, Dominik Meier, Kyra Gan, Nathan Kallus

    Abstract: In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickl… ▽ More

    Submitted 20 October, 2024; originally announced October 2024.

  18. arXiv:2409.17466  [pdf, other]

    stat.ML cs.AI cs.LG

    Adjusting Regression Models for Conditional Uncertainty Calibration

    Authors: Ruijiang Gao, Mingzhang Yin, James McInerney, Nathan Kallus

    Abstract: Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Machine Learning Special Issue on Uncertainty Quantification
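    A minimal sketch of the baseline split conformal procedure the abstract builds on (plain marginal coverage, not the paper's conditional-calibration adjustment). The data, model, and miscoverage level alpha are illustrative assumptions.

    # Hedged sketch: split conformal prediction intervals around a regression model.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(2000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=2000)

    # Split: fit on one half, calibrate residuals on the other half.
    X_fit, y_fit, X_cal, y_cal = X[:1000], y[:1000], X[1000:], y[1000:]
    model = LinearRegression().fit(X_fit, y_fit)

    alpha = 0.1                                                # target 90% marginal coverage
    scores = np.abs(y_cal - model.predict(X_cal))              # conformity scores
    q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

    x_new = rng.normal(size=(1, 3))
    pred = model.predict(x_new)[0]
    print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")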

  19. arXiv:2409.12799  [pdf, ps, other]

    stat.ML cs.LG math.ST

    The Central Role of the Loss Function in Reinforcement Learning

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algo… ▽ More

    Submitted 4 April, 2025; v1 submitted 19 September, 2024; originally announced September 2024.

    Comments: Accepted to Statistical Science

  20. arXiv:2408.12004  [pdf, other]

    cs.LG stat.ME stat.ML

    CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

    Authors: Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus

    Abstract: When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  21. arXiv:2406.06452  [pdf, other]

    stat.ME cs.LG stat.ML

    Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

    Authors: Miruna Oprescu, Nathan Kallus

    Abstract: Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since the treatments of interest often cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to leverage instrumental v… ▽ More

    Submitted 1 November, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages, 4 figures, NeurIPS 2024

  22. arXiv:2405.16564  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Contextual Linear Optimization with Partial Feedback

    Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

    Abstract: Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A canonical example is the stochastic shortest path problem with random edge costs (e.g., travel time) and contextual features (e.g., lagged traffic, weather). While existing work on CLO assumes fully observed c… ▽ More

    Submitted 9 November, 2025; v1 submitted 26 May, 2024; originally announced May 2024.
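    A minimal predict-then-optimize sketch of the contextual linear optimization setting described in the abstract: contextual features predict random cost coefficients, and the decision minimizes predicted cost; here the "paths" are just two candidate routes. All data and names are illustrative assumptions, and the sketch ignores the partial-feedback aspect that is the paper's focus.

    # Hedged sketch: predict edge costs from context, then pick the cheaper route.
    import numpy as np

    rng = np.random.default_rng(5)
    n_obs, n_features = 5_000, 3

    X = rng.normal(size=(n_obs, n_features))                   # context (lagged traffic, weather, ...)
    true_W = rng.uniform(0.5, 1.5, size=(n_features, 2))       # each column drives one route's cost
    costs = X @ true_W + 5.0 + rng.normal(scale=0.5, size=(n_obs, 2))

    # Fit a linear cost predictor per route by ordinary least squares.
    X_aug = np.hstack([X, np.ones((n_obs, 1))])
    W_hat, *_ = np.linalg.lstsq(X_aug, costs, rcond=None)

    # Decision rule: given a new context, choose the route with the lower predicted cost.
    x_new = np.array([[1.0, -0.5, 0.2, 1.0]])
    predicted = (x_new @ W_hat).ravel()
    print(f"predicted costs: {predicted.round(2)}, choose route {int(np.argmin(predicted))}")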

  23. arXiv:2405.12119  [pdf, other]

    cs.IR cs.AI cs.CL

    Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation

    Authors: Zhankui He, Zhouhang Xie, Harald Steck, Dawen Liang, Rahul Jha, Nathan Kallus, Julian McAuley

    Abstract: Large language models (LLMs) are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item p… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  24. arXiv:2404.00099  [pdf, other]

    cs.AI stat.ML

    Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang

    Abstract: We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP), using transition observations from the original MDP, whether they are generated under the same or a different policy. This is an important problem when there is the possibility of a shift between historical and future environments, $\textit{e.g.}$ due to unmeasured confounding, distribu… ▽ More

    Submitted 1 November, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: 39 pages, 2 figures, NeurIPS 2024

  25. arXiv:2403.10671  [pdf, other]

    stat.ML cs.LG

    Variation Due to Regularization Tractably Recovers Bayesian Deep Learning

    Authors: James McInerney, Nathan Kallus

    Abstract: Uncertainty quantification in deep learning is crucial for safe and reliable decision-making in downstream tasks. Existing methods quantify uncertainty at the last layer or other approximations of the network which may miss some sources of uncertainty in the model. To address this gap, we propose an uncertainty quantification method for large networks based on variation due to regularization. Esse… ▽ More

    Submitted 24 April, 2025; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 16 pages, 9 figures

  26. arXiv:2403.06323  [pdf, other]

    cs.LG

    A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents

    Authors: Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun

    Abstract: We study risk-sensitive RL where the goal is to learn a history-dependent policy that optimizes some risk measure of cumulative rewards. We consider a family of risks called the optimized certainty equivalents (OCE), which captures important risk measures such as conditional value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. In this setting, we propose two meta-algorithms: one grounde… ▽ More

    Submitted 27 February, 2025; v1 submitted 10 March, 2024; originally announced March 2024.
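    A minimal sketch of one member of the optimized-certainty-equivalent family named in the abstract: conditional value-at-risk (CVaR), computed both directly as the mean of the worst tau-fraction of returns and via its OCE/dual form CVaR_tau(Z) = max_b { b - E[(b - Z)_+] / tau }. The return distribution and tolerance tau are illustrative assumptions.

    # Hedged sketch: empirical CVaR of a return sample, two equivalent computations.
    import numpy as np

    rng = np.random.default_rng(3)
    returns = rng.normal(loc=1.0, scale=2.0, size=100_000)    # cumulative rewards (synthetic)
    tau = 0.05

    # Direct estimate: average of the lowest tau-fraction of returns.
    var_tau = np.quantile(returns, tau)                        # value-at-risk (tau-quantile)
    cvar_direct = returns[returns <= var_tau].mean()

    # OCE / dual form, with the maximizer b taken at the tau-quantile.
    cvar_oce = var_tau - np.mean(np.maximum(var_tau - returns, 0.0)) / tau

    print(f"VaR_tau={var_tau:.3f}  CVaR (direct)={cvar_direct:.3f}  CVaR (OCE form)={cvar_oce:.3f}")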

  27. Is Cosine-Similarity of Embeddings Really About Similarity?

    Authors: Harald Steck, Chaitanya Ekanadham, Nathan Kallus

    Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages

    Journal ref: ACM Web Conference 2024 (WWW 2024 Companion)
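    A minimal sketch of the definition in the abstract: cosine similarity is the dot product of L2-normalized vectors, and it can order item pairs differently from the unnormalized dot product. The embeddings are made up for illustration.

    # Hedged sketch: cosine similarity vs. unnormalized dot product on toy embeddings.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = np.array([1.0, 1.0])
    item_a = np.array([10.0, 0.0])     # large norm, different direction
    item_b = np.array([0.6, 0.5])      # small norm, similar direction

    print("dot:   ", query @ item_a, query @ item_b)                                    # favors item_a
    print("cosine:", cosine_similarity(query, item_a), cosine_similarity(query, item_b))  # favors item_b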

  28. arXiv:2403.05385  [pdf, other]

    cs.LG

    Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning

    Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári

    Abstract: We propose training fitted Q-iteration with log-loss (FQI-log) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-log scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bo… ▽ More

    Submitted 1 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.
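    A minimal sketch of the loss swap the abstract describes: regressing bounded Bellman targets with the log-loss (binary cross-entropy) rather than squared error. This shows one illustrative regression step only, not the paper's full FQI-log algorithm; the targets and predictions are made-up numbers.

    # Hedged sketch: squared loss vs. log-loss on bounded regression targets in [0, 1].
    import numpy as np

    def squared_loss(pred, target):
        return (pred - target) ** 2

    def log_loss(pred, target, eps=1e-12):
        pred = np.clip(pred, eps, 1 - eps)
        return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))

    # Near the boundary, log-loss penalizes confident mistakes far more sharply than
    # squared error, which is the intuition behind cost-adaptive ("small-cost") bounds.
    targets = np.array([0.0, 0.0, 1.0])     # bootstrapped Bellman targets in [0, 1] (assumed)
    preds   = np.array([0.01, 0.30, 0.99])  # candidate Q-value predictions (assumed)
    print("squared:", squared_loss(preds, targets).round(4))
    print("log    :", log_loss(preds, targets).round(4))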

  29. arXiv:2403.02467  [pdf]

    econ.EM cs.LG stat.ME stat.ML

    Applied Causal Inference Powered by ML and AI

    Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

    Abstract: An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

    Submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2402.07198  [pdf, other]

    cs.LG

    More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun

    Abstract: In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general settings with function approximation. Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributio… ▽ More

    Submitted 11 February, 2024; originally announced February 2024.

  31. arXiv:2402.06122  [pdf, other]

    stat.ME cs.LG stat.ML

    Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

    Authors: Brian Cho, Kyra Gan, Nathan Kallus

    Abstract: We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, \emph{peeking with expectation-based averaged capital} (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $\alpha$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoreti… ▽ More

    Submitted 2 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: To appear at the Forty-first International Conference on Machine Learning (ICML 2024)

  32. arXiv:2402.01845  [pdf, other]

    cs.LG stat.ML

    Multi-Armed Bandits with Interference

    Authors: Su Jia, Peter Frazier, Nathan Kallus

    Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of {\em Multi-armed Bandits with Interference} (MABI), where the learner assig… ▽ More

    Submitted 15 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  33. arXiv:2312.15574  [pdf, other]

    math.ST cs.LG

    Clustered Switchback Designs for Experimentation Under Spatio-temporal Interference

    Authors: Su Jia, Nathan Kallus, Christina Lee Yu

    Abstract: We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control. We suppose spatial interference is described by a graph, where a unit's outc… ▽ More

    Submitted 26 March, 2025; v1 submitted 24 December, 2023; originally announced December 2023.

  34. arXiv:2311.03564  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Low-Rank MDPs with Continuous Action Spaces

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu

    Abstract: Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning. However, current methods for low-rank MDPs are limited in that they only consider finite action spaces, and give vacuo… ▽ More

    Submitted 1 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, AISTATS 2024

    Journal ref: PMLR, Volume 238, 2024

  35. arXiv:2310.15433  [pdf, other]

    cs.LG cs.IR

    Off-Policy Evaluation for Large Action Spaces via Policy Convolution

    Authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiase… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Under review. 36 pages, 31 figures
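    A minimal sketch of the vanilla importance-sampling correction the abstract refers to, written as an inverse-propensity-scoring (IPS) estimate on synthetic bandit logs; it is not the paper's policy-convolution estimator. The policies, rewards, and log size are illustrative assumptions.

    # Hedged sketch: IPS off-policy value estimate from logged (action, reward) data.
    import numpy as np

    rng = np.random.default_rng(4)
    n_actions, n_logs = 5, 50_000

    logging_probs = np.full(n_actions, 1.0 / n_actions)        # uniform logging policy (assumed)
    target_probs = np.array([0.6, 0.1, 0.1, 0.1, 0.1])         # policy we want to evaluate (assumed)
    true_rewards = np.array([0.5, 0.2, 0.1, 0.4, 0.3])         # expected reward per action (assumed)

    actions = rng.choice(n_actions, size=n_logs, p=logging_probs)
    rewards = rng.binomial(1, true_rewards[actions])

    ips_weights = target_probs[actions] / logging_probs[actions]
    ips_estimate = np.mean(ips_weights * rewards)
    print(f"IPS estimate: {ips_estimate:.3f}  (true value: {target_probs @ true_rewards:.3f})")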

  36. Large Language Models as Zero-Shot Conversational Recommenders

    Authors: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley

    Abstract: In this paper, we present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting with three primary contributions. (1) Data: To gain insights into model behavior in "in-the-wild" conversational recommendation scenarios, we construct a new dataset of recommendation-related conversations by scraping a popular discussion website. Thi… ▽ More

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted as CIKM 2023 long paper. Longer version is coming soon (e.g., more details about dataset)

  37. arXiv:2307.13793  [pdf, ps, other]

    stat.ME cs.LG econ.EM math.ST stat.ML

    Source Condition Double Robust Inference on Functionals of Inverse Problems

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ens… ▽ More

    Submitted 25 July, 2023; originally announced July 2023.

  38. arXiv:2307.11704  [pdf, other]

    cs.LG

    JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

    Authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

    Abstract: Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost and it is the core NP-hard combinatorial optimization problem of query optimization. In this paper, we present JoinGym, a lightweight and easy-to-use query optimization environment for reinforcement learning (RL) that captures both the left-deep and bushy variants of the JOS problem. Compar… ▽ More

    Submitted 17 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: JoinGym is available at https://github.com/kaiwenw/JoinGym!

  39. arXiv:2305.15703  [pdf, ps, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun

    Abstract: While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than th… ▽ More

    Submitted 22 September, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  40. arXiv:2305.14816  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Provable Offline Preference-Based Reinforcement Learning

    Authors: Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offl… ▽ More

    Submitted 29 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: The first two authors contribute equally

  41. arXiv:2304.10577  [pdf, other]

    cs.LG stat.ML

    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding

    Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit

    Abstract: Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitraril… ▽ More

    Submitted 13 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 20 pages, 4 figures, ICML 2023

    Journal ref: PMLR 202 (2023) 26599-26618

  42. arXiv:2302.05404  [pdf, ps, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. Recently, many flexible machine learning methods have been developed for instrumental variable estimation. However, these methods have at least one of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) only obtaining estimation error rates in terms of pseudometrics (… ▽ More

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Under review

  43. arXiv:2302.03201  [pdf, ps, other]

    cs.LG math.OC math.ST stat.ML

    Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a nov… ▽ More

    Submitted 24 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  44. arXiv:2302.02392  [pdf, ps, other]

    cs.LG stat.ML

    Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

    Authors: Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage… ▽ More

    Submitted 13 November, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: The original title of this paper was "Refined Value-Based Offline RL under Realizability and Partial Coverage," but it was later changed. This paper has been accepted for NeurIPS 2023

  45. arXiv:2301.12366  [pdf, other]

    cs.LG cs.AI math.OC math.ST

    Smooth Non-Stationary Bandits

    Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier

    Abstract: In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time. However, in practice, environments often change {\em smoothly}, so such algorithms may incur higher-tha… ▽ More

    Submitted 17 November, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted by ICML 2023

  46. arXiv:2212.06355  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    A Review of Off-Policy Evaluation in Reinforcement Learning

    Authors: Masatoshi Uehara, Chengchun Shi, Nathan Kallus

    Abstract: Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a… ▽ More

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Still under revision

  47. arXiv:2211.06457  [pdf, other]

    stat.ML cs.LG

    The Implicit Delta Method

    Authors: Nathan Kallus, James McInerney

    Abstract: Epistemic uncertainty quantification is a crucial part of drawing credible conclusions from predictive models, whether concerned about the prediction at a given point or any downstream evaluation that uses the model as input. When the predictive model is simple and its evaluation differentiable, this task is solved by the delta method, where we propagate the asymptotically-normal uncertainty in th… ▽ More

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 18 pages, NeurIPS 2022
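    A minimal numeric sketch of the classical delta method the abstract builds on: propagate the standard error of an asymptotically normal estimator through a differentiable evaluation g via Var[g(theta_hat)] ≈ g'(theta_hat)^2 Var[theta_hat]. The data and the choice g = exp are illustrative assumptions, not the paper's setting.

    # Hedged sketch: delta-method standard error vs. a nonparametric bootstrap check.
    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(loc=0.3, scale=1.0, size=10_000)

    theta_hat = x.mean()                                    # asymptotically normal estimator
    se_theta = x.std(ddof=1) / np.sqrt(len(x))              # its standard error

    g = np.exp                                              # downstream evaluation g(theta)
    g_prime = np.exp(theta_hat)                             # derivative of exp at theta_hat
    se_delta = abs(g_prime) * se_theta                      # delta-method standard error

    # Sanity check against a simple nonparametric bootstrap.
    boot = np.array([g(rng.choice(x, size=len(x)).mean()) for _ in range(500)])
    print(f"delta-method SE: {se_delta:.4f}   bootstrap SE: {boot.std(ddof=1):.4f}")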

  48. arXiv:2210.14492  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Safe Reinforcement Learning with Binary Feedback

    Authors: Andrew Bennett, Dipendra Misra, Nathan Kallus

    Abstract: Safety is a crucial necessity in many applications of reinforcement learning (RL), whether robotic, automotive, or medical. Many existing approaches to safe RL rely on receiving numeric safety feedback, but in many cases this feedback can only take binary values; that is, whether an action in a given state is safe or unsafe. This is particularly true when feedback comes from human experts. We ther… ▽ More

    Submitted 26 October, 2022; originally announced October 2022.

  49. arXiv:2207.13081  [pdf, other]

    cs.LG stat.ML

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

    Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs.… ▽ More

    Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: This paper was accepted in NeurIPS 2023

  50. arXiv:2207.05837  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Learning Bellman Complete Representations for Offline Policy Evaluation

    Authors: Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations… ▽ More

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted for Long Talk at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2938-2971, 2022