
Showing 1–14 of 14 results for author: Yadlowsky, S

Searching in archive cs.
  1. arXiv:2412.16429  [pdf, other]

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed, et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of 'pedagogical instruction following', where training and evaluation examples include system-level ins…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2311.00871  [pdf, other]

    cs.LG cs.CL stat.ML

    Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

    Authors: Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

    Abstract: Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and lea…

    Submitted 1 November, 2023; originally announced November 2023.

  4. arXiv:2309.07893  [pdf, other]

    stat.ME cs.LG stat.ML

    Choosing a Proxy Metric from Past Experiments

    Authors: Nilesh Tripuraneni, Lee Richardson, Alexander D'Amour, Jacopo Soriano, Steve Yadlowsky

    Abstract: In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they…
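The core idea in this abstract can be sketched in a few lines (a toy illustration with hypothetical variable names, not the paper's actual estimator): from past experiments where both the proxies and the long-term metric were measured, learn a combination of proxy treatment effects that predicts the long-term treatment effect, then score new short-horizon experiments with it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Past experiments: estimated treatment effects on k short-term proxies
# and on the long-term metric (all names hypothetical).
n_experiments, k = 50, 3
proxy_effects = rng.normal(size=(n_experiments, k))
true_weights = np.array([0.8, 0.1, -0.3])
long_term_effects = proxy_effects @ true_weights + 0.05 * rng.normal(size=n_experiments)

# Least-squares combination of proxies most predictive of the long-term effect.
weights, *_ = np.linalg.lstsq(proxy_effects, long_term_effects, rcond=None)

# Score a new short-horizon experiment where only the proxies were measured.
new_proxies = np.array([0.5, -0.2, 0.1])
predicted_long_term = float(new_proxies @ weights)
```

The paper works with noisy estimated effects and uncertainty in the learned combination; this sketch assumes the past effects are measured directly.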

    Submitted 15 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: To appear in KDD 2024

  5. arXiv:2303.02011  [pdf, other]

    stat.ML cs.LG

    Diagnosing Model Performance Under Distribution Shift

    Authors: Tiffany Tianhui Cai, Hongseok Namkoong, Steve Yadlowsky

    Abstract: Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but…
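In the spirit of the decomposition described above (a numpy-only toy, not the paper's DISDE estimator, which uses a shared distribution over covariates and importance weighting), the source-to-target performance drop splits exactly into two attributable terms through an intermediate reference distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy covariate shift: source X ~ N(0,1), target X ~ N(1,1).
x_src = rng.normal(0.0, 1.0, size=20000)
x_tgt = rng.normal(1.0, 1.0, size=20000)

def loss(x):
    # Per-example loss of some fixed model; grows with |x|, so the shifted
    # target distribution is genuinely harder.
    return x ** 2 / 10.0

# A shared reference distribution; here simply the 50/50 mixture of the two.
x_shared = np.concatenate([x_src[:10000], x_tgt[:10000]])

perf_src = loss(x_src).mean()
perf_shared = loss(x_shared).mean()
perf_tgt = loss(x_tgt).mean()

# The total drop splits exactly into two attributable pieces.
term_to_shared = perf_shared - perf_src   # source -> shared distribution
term_to_target = perf_tgt - perf_shared   # shared -> target distribution
total_drop = perf_tgt - perf_src
```

The decomposition is exact by construction; the substance of the paper is in choosing the shared distribution and estimating each term from samples.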

    Submitted 10 July, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  6. arXiv:2207.02941  [pdf, other]

    cs.LG cs.AI

    Boosting the interpretability of clinical risk scores with intervention predictions

    Authors: Eric Loreaux, Ke Yu, Jonas Kemp, Martin Seneviratne, Christina Chen, Subhrajit Roy, Ivan Protsyuk, Natalie Harris, Alexander D'Amour, Steve Yadlowsky, Ming-Jun Chen

    Abstract: Machine learning systems show significant promise for forecasting patient adverse events via risk scores. However, these risk scores implicitly encode assumptions about future interventions that the patient is likely to receive, based on the intervention policy present in the training data. Without this important context, predictions from such systems are less interpretable for clinicians. We prop…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at DSHealth at KDD 2022

  7. arXiv:2106.16163  [pdf, other]

    cs.CL

    The MultiBERTs: BERT Reproductions for Robustness Analysis

    Authors: Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick

    Abstract: Experiments with pre-trained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure which includes the architecture, training data, initialization scheme, and loss function. Recent work has shown that r…

    Submitted 21 March, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted at ICLR'22. Checkpoints and example analyses: http://goo.gle/multiberts

  8. arXiv:2106.00545  [pdf, other]

    cs.LG cs.AI stat.ML

    Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests

    Authors: Victor Veitch, Alexander D'Amour, Steve Yadlowsky, Jacob Eisenstein

    Abstract: Informally, a 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter. In machine learning, these have a know-it-when-you-see-it character; e.g., changing the gender of a sentence's subject changes a sentiment predictor's output. To check for spurious correlations, we can 'stress test' models by perturbing irrelevant parts of inp…
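The stress-test idea is easy to illustrate (hypothetical toy models, not the paper's counterfactual framework): perturb only the part of the input the analyst considers irrelevant and measure how much the prediction moves.

```python
import numpy as np

rng = np.random.default_rng(5)

# Column 0 is the relevant feature; column 1 is an attribute the analyst
# thinks should not matter (both models below are hypothetical toys).
X = rng.normal(size=(200, 2))

def model_spurious(X):
    return X[:, 0] + 0.5 * X[:, 1]   # leaks the irrelevant attribute

def model_invariant(X):
    return X[:, 0]                   # depends only on the relevant feature

# Stress test: perturb only the irrelevant column, measure the output change.
X_pert = X.copy()
X_pert[:, 1] = -X_pert[:, 1]

def stress_gap(model):
    return float(np.abs(model(X) - model(X_pert)).max())
```

A nonzero gap flags a spurious dependence; the paper's contribution is formalizing when such invariance is the right desideratum and how to train for it.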

    Submitted 2 November, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021 (spotlight)

  9. arXiv:2103.12725  [pdf, other]

    stat.ML cs.LG math.ST

    SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression

    Authors: Steve Yadlowsky, Taedong Yun, Cory McLean, Alexander D'Amour

    Abstract: Logistic regression remains one of the most widely used tools in applied statistics, machine learning and data science. However, in moderately high-dimensional problems, where the number of features $d$ is a non-negligible fraction of the sample size $n$, the logistic regression maximum likelihood estimator (MLE), and statistical procedures based on the large-sample approximation of its distribution,…
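The phenomenon the abstract refers to can be reproduced in a small simulation (a sketch of the classical bias, not the SLOE method itself): when d/n is a non-negligible fraction, the logistic MLE systematically overestimates coefficient magnitudes, which is what corrected inference procedures must account for.

```python
import numpy as np

rng = np.random.default_rng(2)

# High-dimensional regime: d is a non-negligible fraction of n (d/n = 0.2).
n, d = 2000, 400
X = rng.normal(size=(n, d)) / np.sqrt(n)    # scaled so var(X @ beta) stays O(1)
beta = np.zeros(d)
beta[: d // 4] = 7.0                        # moderate signal strength
p_true = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, p_true).astype(float)

def logistic_mle(X, y, iters=25):
    """Plain Newton iterations for the (unregularized) logistic MLE."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = np.clip(X @ b, -30, 30)       # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        b += np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
    return b

b_hat = logistic_mle(X, y)

# In this regime the MLE is biased away from zero: the projection of the
# fitted coefficients onto the truth exceeds 1.
alpha_hat = float(b_hat @ beta) / float(beta @ beta)
```

In the classical fixed-d regime alpha_hat would concentrate near 1; here it is noticeably inflated, illustrating why large-sample Wald inference miscalibrates.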

    Submitted 25 May, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

  10. arXiv:2101.06536  [pdf, other]

    cs.LG stat.ME stat.ML

    Deep Cox Mixtures for Survival Regression

    Authors: Chirag Nagpal, Steve Yadlowsky, Negar Rostamzadeh, Katherine Heller

    Abstract: Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the mo…
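For context, the classical Cox partial likelihood that such models build on can be fit in a few lines of numpy on censored data (a textbook sketch, not the Deep Cox Mixtures method):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated censored survival data: hazard exp(beta * x), independent censoring.
n, beta_true = 500, 0.7
x = rng.normal(size=n)
t_event = rng.exponential(scale=np.exp(-beta_true * x))  # scale = 1 / hazard
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)    # 1 = event observed, 0 = censored

# Sort by decreasing time so each risk set is a prefix of the arrays.
order = np.argsort(-time)
x_o, e_o = x[order], event[order]

def neg_partial_lik_grad(b):
    w = np.exp(b * x_o)
    cum_w = np.cumsum(w)                     # sum of weights over the risk set
    cum_wx = np.cumsum(w * x_o)
    return -np.sum(e_o * (x_o - cum_wx / cum_w))

# One-dimensional Newton with a numeric second derivative.
b = 0.0
for _ in range(25):
    g = neg_partial_lik_grad(b)
    h = (neg_partial_lik_grad(b + 1e-5) - g) / 1e-5
    b -= g / h
```

Censored observations contribute only through risk sets, never as events; Deep Cox Mixtures replaces the single linear predictor with a learned mixture of Cox models.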

    Submitted 26 June, 2022; v1 submitted 16 January, 2021; originally announced January 2021.

    Comments: Machine Learning for Healthcare Conference, 2021

    Journal ref: Proceedings of the 6th Machine Learning for Healthcare Conference, PMLR 149:674-708, 2021

  11. arXiv:2011.03395  [pdf, other]

    cs.LG stat.ML

    Underspecification Presents Challenges for Credibility in Modern Machine Learning

    Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, et al. (15 additional authors not shown)

    Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predict…

    Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Updates: Updated statistical analysis in Section 6; Additional citations

  12. arXiv:2003.05623  [pdf, other]

    stat.ML cs.LG

    Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

    Authors: Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill

    Abstract: When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders, unrecorded variables that impact both the decisions and their outcomes. We assess robustness of OPE methods under…

    Submitted 12 March, 2020; originally announced March 2020.

  13. arXiv:1912.06977  [pdf, other]

    stat.ML cs.LG stat.ME

    Estimation and Validation of Ratio-based Conditional Average Treatment Effects Using Observational Data

    Authors: Steve Yadlowsky, Fabio Pellegrini, Federica Lionetto, Stefan Braune, Lu Tian

    Abstract: While sample sizes in randomized clinical trials are large enough to estimate the average treatment effect well, they are often insufficient for estimation of treatment-covariate interactions critical to studying data-driven precision medicine. Observational data from real world practice may play an important role in alleviating this problem. One common approach in trials is to predict the outcome…

    Submitted 20 April, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

  14. arXiv:1804.03761  [pdf, other]

    stat.ML cs.LG

    Derivative free optimization via repeated classification

    Authors: Tatsunori B. Hashimoto, Steve Yadlowsky, John C. Duchi

    Abstract: We develop an algorithm for minimizing a function using $n$ batched function value measurements at each of $T$ rounds by using classifiers to identify a function's sublevel set. We show that sufficiently accurate classifiers can achieve linear convergence rates, and show that the convergence rate is tied to the difficulty of active learning sublevel sets. Further, we show that the bootstrap is a c…
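A minimal sketch of the round structure described above (a toy variant using the simplest possible "classifier": keep the below-median half of each batch and re-center; the paper studies general learned classifiers of the sublevel set):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(pts):
    # Black-box objective to minimize (toy quadratic); only values are used.
    return np.sum(pts ** 2, axis=1)

mean, scale = np.array([5.0, -3.0]), 2.0
for _ in range(40):                          # T rounds of n = 64 evaluations each
    pts = mean + scale * rng.normal(size=(64, 2))
    vals = f(pts)
    below = pts[vals <= np.median(vals)]     # "classify" the batch's sublevel set
    mean = below.mean(axis=0)                # re-center on the predicted sublevel set
    scale *= 0.9                             # tighten the search region

best = float(f(mean[None])[0])
```

Only function values are queried, never gradients; the paper's analysis ties how fast this converges to how accurately the classifier identifies the sublevel set.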

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: At AISTATS 2018