
Showing 1–13 of 13 results for author: Kwon, W

Searching in archive cs.
  1. arXiv:2506.16262  [pdf, ps, other]

    cs.CV

    R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision

    Authors: Weeyoung Kwon, Jeahun Sung, Minkyu Jeon, Chanho Eom, Jihyong Oh

    Abstract: Neural rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved significant progress in photorealistic 3D scene reconstruction and novel view synthesis. However, most existing models assume clean and high-resolution (HR) multi-view inputs, which limits their robustness under real-world degradations such as noise, blur, low-resolution (LR), and weather-…

    Submitted 23 June, 2025; v1 submitted 19 June, 2025; originally announced June 2025.

    Comments: Please visit our project page at https://github.com/CMLab-Korea/Awesome-3D-Low-Level-Vision

  2. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin , et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  3. arXiv:2503.18292  [pdf, other]

    cs.DC

    Jenga: Effective Memory Management for Serving LLM with Heterogeneity

    Authors: Chen Zhang, Kuntai Du, Shu Liu, Woosuk Kwon, Xiangxi Mo, Yufeng Wang, Xiaoxuan Liu, Kaichao You, Zhuohan Li, Mingsheng Long, Jidong Zhai, Joseph Gonzalez, Ion Stoica

    Abstract: Large language models (LLMs) are widely used but expensive to run, especially as inference workloads grow. To lower costs, maximizing the request batch size by managing GPU memory efficiently is crucial. While PagedAttention has recently been proposed to improve the efficiency of memory management, we find that the growing heterogeneity in embedding dimensions, attention, and access patterns…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 16 pages, 19 figures

  4. arXiv:2411.17651  [pdf, other]

    cs.DC

    APEX: An Extensible and Dynamism-Aware Simulator for Automated Parallel Execution in LLM Serving

    Authors: Yi-Chien Lin, Woosuk Kwon, Ronald Pineda, Fanny Nina Paravecino

    Abstract: Efficiently serving Large Language Models (LLMs) requires selecting an optimal parallel execution plan, balancing computation, memory, and communication overhead. However, determining the best strategy is challenging due to varying parallelism techniques (data, pipeline, tensor) and workload characteristics (e.g., compute-intensive tasks with long prompts vs. memory-intensive tasks with long gener…

    Submitted 29 April, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

  5. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (173 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 October, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  6. arXiv:2406.14066  [pdf, other]

    cs.AI cs.PF

    Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

    Authors: Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

    Abstract: Reducing the inference latency of large language models (LLMs) is crucial, and speculative decoding (SD) stands out as one of the most effective techniques. Rather than letting the LLM generate all tokens directly, speculative decoding employs effective proxies to predict potential outputs, which are then verified by the LLM without compromising the generation quality. Yet, deploying SD in real on…

    Submitted 25 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.
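    The draft-then-verify loop described in this abstract can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: `draft_next` and `target_next` are hypothetical stand-ins for a cheap proxy model and the full LLM, and real systems verify an entire draft in one batched forward pass of the target model rather than token by token.

```python
# Toy sketch of speculative decoding with greedy verification.
# `draft_next` / `target_next` are hypothetical single-step "models":
# functions mapping a token context to the next token.

def speculative_decode(prompt, target_next, draft_next, num_tokens, k=4):
    """Generate `num_tokens` tokens; a cheap draft model proposes up to
    `k` tokens per step, which the target model then verifies."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1) Draft phase: the proxy proposes k tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify phase: keep the longest prefix the target agrees
        #    with, then take the target's own token at the first mismatch.
        for t in draft:
            want = target_next(out)
            if t == want:
                out.append(t)       # draft token accepted "for free"
            else:
                out.append(want)    # correction from the target model
                break
            if len(out) - len(prompt) >= num_tokens:
                break
    return out[len(prompt):len(prompt) + num_tokens]
```

    With greedy verification the output is identical to decoding with the target model alone; the speedup comes from accepting several draft tokens per target-model step.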

  7. arXiv:2309.06180  [pdf, other]

    cs.LG cs.DC

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

    Abstract: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: SOSP 2023
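    The core idea of the paper, paging the KV cache the way an OS pages virtual memory, can be sketched minimally. The names below (`BlockAllocator`, `append_token`) are illustrative and not vLLM's actual API; a real system also handles preemption, copy-on-write sharing, and attention over the block table.

```python
# Toy sketch of block-based KV-cache management in the spirit of
# PagedAttention: KV memory is carved into fixed-size blocks, and each
# request holds a block table mapping logical positions to physical
# blocks, so memory grows on demand and fragmentation stays bounded.

BLOCK_SIZE = 16  # tokens stored per physical KV block

class BlockAllocator:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # free physical block ids

    def alloc(self):
        if not self.free:
            raise MemoryError("KV cache exhausted; a request must be preempted")
        return self.free.pop()

    def release(self, blocks):
        self.free.extend(blocks)  # blocks are reusable immediately

class Request:
    def __init__(self):
        self.num_tokens = 0
        self.block_table = []  # logical block index -> physical block id

    def append_token(self, allocator):
        # A new physical block is needed only at block boundaries, so at
        # most BLOCK_SIZE - 1 slots are ever wasted per request.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.alloc())
        self.num_tokens += 1
```

    Because blocks need not be contiguous, requests can grow and shrink independently and freed blocks are immediately reusable by any other request.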

  8. arXiv:2308.07741  [pdf, other]

    cs.RO cs.LG

    Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

    Authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabas Gavin Cangan, Bernhard Schölkopf, Georg Martius

    Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore…

    Submitted 24 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Typo in author list fixed

  9. arXiv:2211.06522  [pdf]

    eess.IV cs.CV q-bio.QM

    Deep Learning Generates Synthetic Cancer Histology for Explainability and Education

    Authors: James M. Dolezal, Rachelle Wolk, Hanna M. Hieromnimon, Frederick M. Howard, Andrew Srisuwananukorn, Dmitry Karpeyev, Siddhi Ramesh, Sara Kochanny, Jung Woo Kwon, Meghana Agni, Richard C. Simon, Chandni Desai, Raghad Kherallah, Tung D. Nguyen, Jefree J. Schulte, Kimberly Cole, Galina Khramtsova, Marina Chiara Garassino, Aliya N. Husain, Huihua Li, Robert Grossman, Nicole A. Cipriani, Alexander T. Pearson

    Abstract: Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic fea…

    Submitted 9 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  10. arXiv:2204.09656  [pdf, other]

    cs.CL cs.LG

    A Fast Post-Training Pruning Framework for Transformers

    Authors: Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami

    Abstract: Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models. This can add high training cost and high complexity to model deployment, making it difficult to use in many practical situations. To address this, we propose a fast post-training pruning framework for Transformers that does not require any…

    Submitted 17 October, 2022; v1 submitted 29 March, 2022; originally announced April 2022.

    Comments: NeurIPS 2022
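    The retraining-free recipe the abstract describes, score structures and drop the least important ones, can be illustrated minimally. The L2-norm score below is a simple stand-in, not this paper's criterion (the framework derives importance from model outputs on a small sample set and searches for a pruning mask); `prune_rows` is a hypothetical helper.

```python
# Minimal sketch of post-training structured pruning: rank structural
# units (here: rows of a weight matrix, standing in for heads/filters)
# by a cheap importance proxy and keep only the top fraction, with no
# retraining step afterward.
import numpy as np

def prune_rows(weight, keep_ratio):
    """Keep the `keep_ratio` fraction of rows with the largest L2 norm."""
    scores = np.linalg.norm(weight, axis=1)         # importance proxy
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(scores)[-k:])         # preserve row order
    return weight[keep], keep
```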

  11. arXiv:2107.00910  [pdf, other]

    cs.CL

    Learned Token Pruning for Transformers

    Authors: Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer

    Abstract: Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method which adaptively removes unimportant tokens as an input sequence passes through transformer layers. In particular, LTP prunes tokens with an attention score below a threshold value which is…

    Submitted 2 June, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: KDD 2022 (Research Track)
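    The threshold rule the abstract states, drop tokens whose attention score falls below a threshold, can be sketched as below. This is illustrative only: in LTP the per-layer thresholds are learned during fine-tuning, whereas here the threshold is a fixed parameter, and `prune_tokens` is a hypothetical helper.

```python
# Toy sketch of threshold-based token pruning in the spirit of LTP:
# a token's importance is the average attention it receives from all
# tokens, and tokens below the threshold are dropped before the next layer.
import numpy as np

def prune_tokens(attn, threshold):
    """attn: (num_tokens, num_tokens) row-stochastic attention matrix.
    Returns indices of tokens whose mean received attention meets the
    threshold; downstream layers then operate on this shorter sequence."""
    importance = attn.mean(axis=0)   # column mean = attention received
    return np.nonzero(importance >= threshold)[0]
```

    Since later layers see only the surviving tokens, the quadratic attention cost shrinks with the sequence as it passes through the network.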

  12. arXiv:2012.02732  [pdf, other]

    cs.LG cs.DC

    Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

    Authors: Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun

    Abstract: Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computation assigned to GPUs. Yet, we observe that in scheduling GPU tasks, existing DL frameworks suffer from inefficiencies such as large scheduling overhead…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: In NeurIPS 2020

  13. Selective Distillation of Weakly Annotated GTD for Vision-based Slab Identification System

    Authors: Sang Jun Lee, Sang Woo Kim, Wookyong Kwon, Gyogwon Koo, Jong Pil Yun

    Abstract: This paper proposes an algorithm for recognizing slab identification numbers in factory scenes. In the development of a deep-learning based system, manual labeling to make ground truth data (GTD) is an important but expensive task. Furthermore, the quality of GTD is closely related to the performance of a supervised learning algorithm. To reduce manual work in the labeling process, we generated we…

    Submitted 13 December, 2018; v1 submitted 9 October, 2018; originally announced October 2018.

    Comments: 10 pages, 12 figures, submitted to a journal

    Journal ref: IEEE Access 7 (2019) 23177-23186