Showing 1–50 of 74 results for author: Weng, W

Searching in archive cs.
  1. arXiv:2507.15743  [pdf, ps, other]

    cs.AI cs.CL cs.HC cs.LG

    Towards physician-centered oversight of conversational diagnostic AI

    Authors: Elahe Vedadi, David Barrett, Natalie Harris, Ellery Wulczyn, Shashir Reddy, Roma Ruparel, Mike Schaekermann, Tim Strother, Ryutaro Tanno, Yash Sharma, Jihyeon Lee, Cían Hughes, Dylan Slack, Anil Palepu, Jan Freyberg, Khaled Saab, Valentin Liévin, Wei-Hung Weng, Tao Tu, Yun Liu, Nenad Tomasev, Kavita Kulkarni, S. Sara Mahdavi, Kelvin Guu, Joëlle Barral, et al. (10 additional authors not shown)

    Abstract: Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assi…

    Submitted 21 July, 2025; originally announced July 2025.

  2. arXiv:2506.02609  [pdf]

    cs.AI

    A Time-Enhanced Data Disentanglement Network for Traffic Flow Forecasting

    Authors: Tianfan Jiang, Mei Wu, Wenchao Weng, Dewen Seng, Yiqian Lin

    Abstract: In recent years, traffic flow prediction has become a highlight in the field of intelligent transportation systems. However, due to the temporal variations and dynamic spatial correlations of traffic data, traffic prediction remains highly challenging. Traditional spatiotemporal networks, which rely on end-to-end training, often struggle to handle the diverse data dependencies of multiple traffic f…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2505.21395  [pdf, ps, other]

    cs.LG

    Square$χ$PO: Differentially Private and Robust $χ^2$-Preference Optimization in Offline Direct Alignment

    Authors: Xingyu Zhou, Yulian Wu, Wenqian Weng, Francesco Orabona

    Abstract: In this paper, we theoretically study the offline alignment of language models with human preference feedback, under both preference label corruption and privacy protections. To this end, we propose Square$χ$PO, a simple one-line change to $χ$PO where the standard log-loss is replaced by a new square loss over probability. Thanks to the inherent properties of this new loss, we have advanced the st…

    Submitted 27 May, 2025; originally announced May 2025.

  4. arXiv:2505.21331  [pdf, ps, other]

    cs.DS cs.GT cs.LG cs.PF math.PR

    Scheduling with Uncertain Holding Costs and its Application to Content Moderation

    Authors: Caner Gocmen, Thodoris Lykouris, Deeksha Sinha, Wentao Weng

    Abstract: In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain…

    Submitted 27 May, 2025; originally announced May 2025.

  5. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  6. arXiv:2505.04974  [pdf, other]

    cs.CV

    ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

    Authors: Wanjiang Weng, Xiaofeng Tan, Hongsong Wang, Pan Zhou

    Abstract: Bilingual text-to-motion generation, which synthesizes 3D human motions from bilingual text inputs, holds immense potential for cross-linguistic applications in gaming, film, and robotics. However, this task faces critical challenges: the absence of bilingual motion-language datasets and the misalignment between text and motion distributions in diffusion models, leading to semantically inconsisten…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 17 pages, 9 figures

  7. arXiv:2505.04653  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    Advancing Conversational Diagnostic AI with Multimodal Reasoning

    Authors: Khaled Saab, Jan Freyberg, Chunjong Park, Tim Strother, Yong Cheng, Wei-Hung Weng, David G. T. Barrett, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S. Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias, et al. (11 additional authors not shown)

    Abstract: Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations, but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the abil…

    Submitted 6 May, 2025; originally announced May 2025.

  8. arXiv:2504.13554  [pdf, ps, other]

    cs.AI cs.LG cs.RO

    Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning

    Authors: Xin Tang, Qian Chen, Wenjie Weng, Chao Jin, Zhang Liu, Jiacheng Wang, Geng Sun, Xiaohuan Li, Dusit Niyato

    Abstract: The integration of emerging uncrewed aerial vehicles (UAVs) with artificial intelligence (AI) and ground-embedded robots (GERs) has transformed emergency rescue operations in unknown environments. However, the high computational demands often exceed a single UAV's capacity, making it difficult to continuously provide stable high-level services. To address this, this paper proposes a cooperation fr…

    Submitted 10 July, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  9. arXiv:2503.06074  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Conversational AI for Disease Management

    Authors: Anil Palepu, Valentin Liévin, Wei-Hung Weng, Khaled Saab, David Stutz, Yong Cheng, Kavita Kulkarni, S. Sara Mahdavi, Joëlle Barral, Dale R. Webster, Katherine Chou, Avinatan Hassidim, Yossi Matias, James Manyika, Ryutaro Tanno, Vivek Natarajan, Adam Rodman, Tao Tu, Alan Karthikesalingam, Mike Schaekermann

    Abstract: While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic syste…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 62 pages, 7 figures in main text, 36 figures in appendix

  10. arXiv:2502.18864  [pdf, other]

    cs.AI cs.CL cs.HC cs.LG physics.soc-ph q-bio.OT

    Towards an AI co-scientist

    Authors: Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, Khaled Saab, Dan Popovici, Jacob Blum, Fan Zhang, Katherine Chou, Avinatan Hassidim, Burak Gokturk, Amin Vahdat, Pushmeet Kohli, Yossi Matias, Andrew Carroll, Kavita Kulkarni, Nenad Tomasev, Yuan Guan, Vikram Dhillon, et al. (9 additional authors not shown)

    Abstract: Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 81 pages in total (main 38 pages, appendix 43 pages), 13 main figures, 40 appendix figures, 1 main table, 2 appendix tables, 143 main references, 7 appendix references

  11. arXiv:2501.04060  [pdf, other]

    cs.LG

    SFADNet: Spatio-temporal Fused Graph based on Attention Decoupling Network for Traffic Prediction

    Authors: Mei Wu, Wenchao Weng, Jun Li, Yiqian Lin, Jing Chen, Dewen Seng

    Abstract: In recent years, traffic flow prediction has played a crucial role in the management of intelligent transportation systems. However, traditional prediction methods are often limited by static spatial modeling, making it difficult to accurately capture the dynamic and complex relationships between time and space, thereby affecting prediction accuracy. This paper proposes an innovative traffic flow…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: Accepted by 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

  12. arXiv:2501.03635  [pdf, other]

    cs.LG cs.AI

    MHGNet: Multi-Heterogeneous Graph Neural Network for Traffic Prediction

    Authors: Mei Wu, Yiqian Lin, Tianfan Jiang, Wenchao Weng

    Abstract: In recent years, traffic flow prediction has played a crucial role in the management of intelligent transportation systems. However, traditional forecasting methods often model non-Euclidean low-dimensional traffic data as a simple graph with single-type nodes and edges, failing to capture similar trends among nodes of the same type. To address this limitation, this paper proposes MHGNet, a novel…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: Accepted by 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

  13. arXiv:2412.09220  [pdf, other]

    cs.CV

    USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

    Authors: Wanjiang Weng, Hongsong Wang, Junbo Wang, Lei He, Guosen Xie

    Abstract: Contrastive learning has achieved great success in skeleton-based representation learning recently. However, the prevailing methods are predominantly negative-based, necessitating an additional momentum encoder and memory bank to get negative samples, which increases the difficulty of model training. Furthermore, these methods primarily concentrate on learning a global representation for recognition…

    Submitted 14 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  14. MPBD-LSTM: A Predictive Model for Colorectal Liver Metastases Using Time Series Multi-phase Contrast-Enhanced CT Scans

    Authors: Xueyang Li, Han Xiao, Weixiang Weng, Xiaowei Xu, Yiyu Shi

    Abstract: Colorectal cancer is a prevalent form of cancer, and many patients develop colorectal cancer liver metastasis (CRLM) as a result. Early detection of CRLM is critical for improving survival rates. Radiologists usually rely on a series of multi-phase contrast-enhanced computed tomography (CECT) scans done during follow-up visits to perform early detection of the potential CRLM. These scans form uniq…

    Submitted 2 December, 2024; originally announced December 2024.

    Journal ref: MICCAI 2023; vol 14225; page 379-388

  15. arXiv:2411.16180  [pdf, other]

    cs.CV

    Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction

    Authors: Wenhao Xu, Wenming Weng, Yueyi Zhang, Ruikang Xu, Zhiwei Xiong

    Abstract: Deformable 3D Gaussian Splatting (3D-GS) is limited by missing intermediate motion information due to the low temporal resolution of RGB cameras. To address this, we introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction. We observe that threshold modeling for events plays a crucia…

    Submitted 27 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  16. In Serverless, OS Scheduler Choice Costs Money: A Hybrid Scheduling Approach for Cheaper FaaS

    Authors: Yuxuan Zhao, Weikang Weng, Rob van Nieuwpoort, Alexandru Uta

    Abstract: In Function-as-a-Service (FaaS) serverless, large applications are split into short-lived stateless functions. Deploying functions is mutually profitable: users need not be concerned with resource management, while providers can keep their servers at high utilization rates running thousands of functions concurrently on a single machine. It is exactly this high concurrency that comes at a cost. The…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: Accepted at Middleware 2024, author draft made available for timely dissemination

  17. arXiv:2411.08299  [pdf, other]

    cs.AI

    DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach

    Authors: Xin Tang, Qian Chen, Wenjie Weng, Binhan Liao, Jiacheng Wang, Xianbin Cao, Xiaohuan Li

    Abstract: Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT). The unique capabilities of UAVs give rise to increasingly critical and complex tasks in uncertain and potentially harsh environments. The substantial amount of data generated from these applications nece…

    Submitted 13 December, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

  18. arXiv:2411.03395  [pdf, other]

    cs.HC cs.CL

    Exploring Large Language Models for Specialist-level Oncology Care

    Authors: Anil Palepu, Vikram Dhillon, Polly Niravath, Wei-Hung Weng, Preethi Prasad, Khaled Saab, Ryutaro Tanno, Yong Cheng, Hanh Mai, Ethan Burns, Zainub Ajmal, Kavita Kulkarni, Philip Mansfield, Dale Webster, Joelle Barral, Juraj Gottweis, Mike Schaekermann, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Tao Tu

    Abstract: Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast…

    Submitted 5 November, 2024; originally announced November 2024.

  19. arXiv:2410.05740  [pdf, ps, other]

    cs.RO cs.AI eess.SY

    Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC

    Authors: Guoqiang Wu, Cheng Hu, Wangjia Weng, Zhouheng Li, Yonghao Fu, Lei Xie, Hongye Su

    Abstract: Extreme cornering in racing often leads to large sideslip angles, presenting a significant challenge for vehicle control. Conventional vehicle controllers struggle to manage this scenario, necessitating the use of a drifting controller. However, the large sideslip angle in drift conditions introduces model mismatch, which in turn affects control precision. To address this issue, we propose a model…

    Submitted 1 June, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

  20. arXiv:2410.03741  [pdf, other]

    cs.HC cs.AI

    Towards Democratization of Subspeciality Medical Expertise

    Authors: Jack W. O'Sullivan, Anil Palepu, Khaled Saab, Wei-Hung Weng, Yong Cheng, Emily Chu, Yaanik Desai, Aly Elezaby, Daniel Seung Kim, Roy Lan, Wilson Tang, Natalie Tapaskar, Victoria Parikh, Sneha S. Jain, Kavita Kulkarni, Philip Mansfield, Dale Webster, Juraj Gottweis, Joelle Barral, Mike Schaekermann, Ryutaro Tanno, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Euan Ashley, et al. (1 additional author not shown)

    Abstract: The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI syst…

    Submitted 1 October, 2024; originally announced October 2024.

  21. arXiv:2409.07331  [pdf, other]

    cs.CV cs.LG

    Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering

    Authors: Weixi Weng, Jieming Zhu, Xiaojun Meng, Hao Zhang, Rui Zhang, Chun Yuan

    Abstract: Multimodal large language models (MLLMs) have demonstrated great performance on visual question answering (VQA). When it comes to knowledge-based Visual Question Answering (KB-VQA), MLLMs may lack the specialized domain knowledge needed to answer questions, necessitating the retrieval of necessary information from external knowledge sources. Previous works like Retrieval-Augmented VQA-v2 (RAVQA-v2)…

    Submitted 31 January, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

  22. arXiv:2408.07100  [pdf, other]

    cs.LG cs.AI

    Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

    Authors: Wenchao Weng, Mei Wu, Hanyu Jiang, Wanzeng Kong, Xiangjie Kong, Feng Xia

    Abstract: In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the…

    Submitted 12 August, 2024; originally announced August 2024.

  23. arXiv:2407.06611  [pdf, other]

    cs.CV cs.AI

    CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding

    Authors: Wenhao Xu, Wenming Weng, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present CEIA, an effective framework for open-world event-based understanding. Currently, training a large event-text model still poses a huge challenge due to the shortage of paired event-text data. In response to this challenge, CEIA learns to align event and image data as an alternative instead of directly aligning event and text data. Specifically, we leverage the rich event-image datasets t…

    Submitted 9 July, 2024; originally announced July 2024.

  24. arXiv:2406.06512  [pdf, other]

    cs.CV cs.AI

    Merlin: A Vision Language Foundation Model for 3D Computed Tomography

    Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

    Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures

  25. arXiv:2405.07142  [pdf, other]

    cs.LG cs.AI

    Cross-Domain Continual Learning via CLAMP

    Authors: Weiwei Weng, Mahardhika Pratama, Jie Zhang, Chen Chen, Edward Yapp Kien Yee, Ramasamy Savitha

    Abstract: Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, where the neural networks lose the proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex changing environments. This challenge is even more pronoun…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Under Review in Elsevier Journal

  26. arXiv:2405.03162  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…

    Submitted 6 May, 2024; originally announced May 2024.

  27. arXiv:2405.01563  [pdf, other]

    cs.LG cs.AI cs.CL

    Mitigating LLM Hallucinations via Conformal Abstention

    Authors: Yasin Abbasi Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev

    Abstract: We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a non-sensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-e…

    Submitted 4 April, 2024; originally announced May 2024.

  28. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  29. arXiv:2404.01945  [pdf, other]

    cs.CV

    Event-assisted Low-Light Video Object Segmentation

    Authors: Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun

    Abstract: In the realm of video object segmentation (VOS), the challenge of operating under low-light conditions persists, resulting in notably degraded image quality and compromised accuracy when comparing query and memory frames for similarity computation. Event cameras, characterized by their high dynamic range and ability to capture motion information of objects, offer promise in enhancing object visibi…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  30. arXiv:2403.02522  [pdf, other]

    cs.LG cs.AI

    HeAR -- Health Acoustic Representations

    Authors: Sebastien Baur, Zaid Nabulsi, Wei-Hung Weng, Jake Garrison, Louis Blankemeier, Sam Fishman, Christina Chen, Sujay Kakarmath, Minyoi Maimbolwa, Nsala Sanjase, Brian Shuma, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Shruthi Prabhakara, Monde Muyoyeta, Diego Ardila

    Abstract: Health acoustic sounds such as coughs and breaths are known to contain useful health signals with significant potential for monitoring health and disease, yet are underexplored in the medical machine learning community. The existing deep learning systems for health acoustics are often narrowly trained and evaluated on a single task, which is limited by data and may hinder generalization to other t…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 4 tables, 4 figures, 6 supplementary tables, 3 supplementary figures

  31. arXiv:2402.12237  [pdf, other]

    cs.LG cs.AI cs.GT cs.HC cs.PF

    Learning to Defer in Content Moderation: The Human-AI Interplay

    Authors: Thodoris Lykouris, Wentao Weng

    Abstract: Successful content moderation in online platforms relies on a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of a post and uses fixed thresholds to decide whether to remove it and whether to send it for human review. This disregards the prediction uncertainty, the time-varying element of human review capacity and post arrivals, and the selective sampling in…

    Submitted 2 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  32. arXiv:2402.11274  [pdf, other]

    eess.IV cs.CV cs.LG

    TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method

    Authors: Chenyan Zhang, Yifei Chen, Zhenxiong Fan, Yiyu Huang, Wenchao Weng, Ruiquan Ge, Dong Zeng, Changmiao Wang

    Abstract: Recently, diffusion models have gained significant attention as a novel set of deep learning-based generative methods. These models attempt to sample data from a Gaussian distribution that adheres to a target distribution, and have been successfully adapted to the reconstruction of MRI data. However, as an unconditional generative model, the diffusion model typically disrupts image coordination be…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 5 pages, 2 figures, accepted at ISBI 2024

    Journal ref: ISBI 2024

  33. arXiv:2401.05446  [pdf, other]

    eess.SP cs.AI cs.LG

    Self-supervised Learning for Electroencephalogram: A Systematic Survey

    Authors: Weining Weng, Yang Gu, Shuai Guo, Yuan Ma, Zhaohua Yang, Yuchen Liu, Yiqiang Chen

    Abstract: Electroencephalogram (EEG) is a non-invasive technique to record bioelectrical signals. Integrating supervised deep learning techniques with EEG signals has recently facilitated automatic analysis across diverse EEG-based tasks. However, the label issues of EEG signals have constrained the development of EEG-based deep models. Obtaining EEG annotations is difficult, as it requires domain experts to…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 35 pages, 12 figures

    MSC Class: 68-02 (Primarily); 68T01 (Secondary) ACM Class: I.2; J.3; I.5.4

  34. arXiv:2311.18834  [pdf, other]

    cs.CV

    ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

    Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

  35. arXiv:2311.18829  [pdf, other]

    cs.CV

    MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

    Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

    Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two signific…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

  36. arXiv:2310.15646  [pdf, other]

    cs.CV

    Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

    Authors: Weixi Weng, Chun Yuan

    Abstract: Unsupervised domain adaptation object detection (UDAOD) research on Detection Transformer (DETR) mainly focuses on feature alignment and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment method based on mean teacher comprises a pr…

    Submitted 18 January, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: AAAI 2024

  37. arXiv:2310.03747  [pdf, other]

    eess.SP cs.AI cs.LG

    A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation

    Authors: Weining Weng, Yang Gu, Qihui Zhang, Yingying Huang, Chunyan Miao, Yiqiang Chen

    Abstract: Due to the abundant neurophysiological information in the electroencephalogram (EEG) signal, EEG signals integrated with deep learning methods have gained substantial traction across numerous real-world tasks. However, the development of supervised learning methods based on EEG signals has been hindered by the high cost and significant label discrepancies to manually label large-scale EEG datasets…

    Submitted 21 September, 2023; originally announced October 2023.

    Comments: 14 pages, 7 figures

    MSC Class: 68T30 Knowledge representation ACM Class: I.2.4; I.5.2; J.3.1

  38. arXiv:2309.17239  [pdf, other]

    cs.CV

    EGVD: Event-Guided Video Deraining

    Authors: Yueyi Zhang, Jin Wang, Wenming Weng, Xiaoyan Sun, Zhiwei Xiong

    Abstract: With the rapid development of deep learning, video deraining has experienced significant progress. However, existing video deraining pipelines cannot achieve satisfying performance for scenes with rain layers of complex spatio-temporal distribution. In this paper, we approach video deraining by employing an event camera. As a neuromorphic sensor, the event camera suits scenes of non-uniform motion…

    Submitted 29 September, 2023; originally announced September 2023.

  39. arXiv:2309.16496  [pdf, other]

    cs.CV

    CCEdit: Creative and Controllable Video Editing via Diffusion Models

    Authors: Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

    Abstract: In this paper, we present CCEdit, a versatile generative video editing framework based on diffusion models. Our approach employs a novel trident network structure that separates structure and appearance control, ensuring precise and creative editing capabilities. Utilizing the foundational ControlNet architecture, we maintain the structural integrity of the video during editing. The incorporation…

    Submitted 6 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  40. arXiv:2309.05843  [pdf, other]

    cs.LG cs.SD eess.AS

    Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals

    Authors: Louis Blankemeier, Sebastien Baur, Wei-Hung Weng, Jake Garrison, Yossi Matias, Shruthi Prabhakara, Diego Ardila, Zaid Nabulsi

    Abstract: Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slo…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 7 pages, 2 pages appendix, 2 figures, 5 appendix tables

  41. arXiv:2308.07817  [pdf, other]

    cs.LG cs.DS cs.PF math.PR

    The Transient Cost of Learning in Queueing Systems

    Authors: Daniel Freund, Thodoris Lykouris, Wentao Weng

    Abstract: Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters. This assumption rarely holds in practice where there is parameter uncertainty, thus motivating a recent line of work on bandit lea…

    Submitted 7 April, 2025; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: A condensed preliminary version of this work, titled "Quantifying the Cost of Learning in Queueing Systems", was accepted for presentation at the Conference on Neural Information Processing Systems (NeurIPS 2023)

  42. arXiv:2308.01317  [pdf]

    cs.CV eess.IV

    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

    Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, et al. (3 additional authors not shown)

    Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR ach…

    Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  43. arXiv:2305.05648  [pdf]

    cs.CV cs.AI cs.LG

    Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning

    Authors: Wei-Hung Weng, Sebastien Baur, Mayank Daswani, Christina Chen, Lauren Harrell, Sujay Kakarmath, Mariam Jabara, Babak Behsaz, Cory Y. McLean, Yossi Matias, Greg S. Corrado, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Goodarz Danaei, Diego Ardila

    Abstract: Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. Here we investigated the potential to…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: main: 24 pages (3 tables, 2 figures, 42 references), supplementary: 25 pages (9 tables, 4 figures, 11 references)

  44. arXiv:2302.11989  [pdf, other]

    cs.SD cs.CL eess.AS

    Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

    Authors: Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng

    Abstract: Deep neural network based speech enhancement technique focuses on learning a noisy-to-clean transformation supervised by paired training data. However, the task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be directly constructed in the training criteria. This mismatch between the training objective and evaluation metric likely results in sub-optimal performanc…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  45. arXiv:2301.10642  [pdf, other]

    cs.GT

    Group fairness in dynamic refugee assignment

    Authors: Daniel Freund, Thodoris Lykouris, Elisabeth Paulson, Bradley Sturt, Wentao Weng

    Abstract: Ensuring that refugees and asylum seekers thrive (e.g., find employment) in their host countries is a profound humanitarian goal, and a primary driver of employment is the geographic location within a host country to which the refugee or asylum seeker is assigned. Recent research has proposed and implemented algorithms that assign refugees and asylum seekers to geographic locations in a manner tha…

    Submitted 21 January, 2025; v1 submitted 25 January, 2023; originally announced January 2023.

  46. Autonomous Cross Domain Adaptation under Extreme Label Scarcity

    Authors: Weiwei Weng, Mahardhika Pratama, Choiru Za'in, Marcus De Carvalho, Rakaraddi Appan, Andri Ashfahani, Edward Yapp Kien Yee

    Abstract: A cross domain multistream classification is a challenging problem calling for fast domain adaptations to handle different but related streams in never-ending and rapidly changing environments. Notwithstanding that existing multistream classifiers assume no labelled samples in the target stream, they still incur expensive labelling cost since they require fully labelled samples of the source strea…

    Submitted 4 September, 2022; originally announced September 2022.

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2022

  47. arXiv:2206.03324  [pdf, other]

    cs.LG

    Efficient decentralized multi-agent learning in asymmetric bipartite queueing systems

    Authors: Daniel Freund, Thodoris Lykouris, Wentao Weng

    Abstract: We study decentralized multi-agent learning in bipartite queueing systems, a standard model for service systems. In particular, N agents request service from K servers in a fully decentralized way, i.e., by running the same algorithm without communication. Previous decentralized algorithms are restricted to symmetric systems, have performance that is degrading exponentially in the number of servers…

    Submitted 5 August, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: To appear in Operations Research. A preliminary version of this work was accepted for presentation at the Conference on Learning Theory (COLT) 2022. Compared to the first version of the paper, the current version expands upon the related work and adds intuition on the technical content

  48. arXiv:2112.02625  [pdf, other]

    cs.LG cs.AI

    Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View

    Authors: Di Jin, Elena Sergeeva, Wei-Hung Weng, Geeticka Chauhan, Peter Szolovits

    Abstract: The increasing availability of large collections of electronic health record (EHR) data and unprecedented technical advances in deep learning (DL) have sparked a surge of research interest in developing DL based clinical decision support systems for diagnosis, prognosis, and treatment. Despite the recognition of the value of deep learning in healthcare, impediments to further adoption in real heal…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: The first four authors contributed equally, psz is the corresponding author. To appear as an advanced review in WIREs Mechanisms of Disease Journal

  49. arXiv:2111.09489  [pdf, ps, other]

    cs.LG math.AP nlin.PS nlin.SI

    Data-driven discoveries of Bäcklund transforms and soliton evolution equations via deep neural network learning schemes

    Authors: Zijian Zhou, Li Wang, Weifang Weng, Zhenya Yan

    Abstract: We introduce a deep neural network learning scheme to learn the Bäcklund transforms (BTs) of soliton evolution equations and an enhanced deep learning scheme for data-driven soliton equation discovery based on the known BTs, respectively. The first scheme takes advantage of some solution (or soliton equation) information to study the data-driven BT of sine-Gordon equation, and complex and real Miu…

    Submitted 21 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 25 pages, 12 figures

    Journal ref: Physics Letters A 450 (2022) 128373

  50. arXiv:2109.14156  [pdf, other]

    cs.PF eess.SY math.OC

    Labor-right Protecting Dispatch of Meal Delivery Platforms

    Authors: Wentao Weng, Yang Yu

    Abstract: The boom in the meal delivery industry brings growing concern about the labor rights of riders. Current dispatch policies of meal-delivery platforms focus mainly on satisfying consumers or minimizing the number of riders for cost savings. There are few discussions on improving the working conditions of riders by algorithm design. The lack of concerns on labor rights in mechanism and dispatch desig…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 10 pages, 4 figures