
Showing 1–35 of 35 results for author: Lai, G

Searching in archive cs.
  1. arXiv:2510.26692  [pdf, ps, other]

    cs.CL cs.LG

    Kimi Linear: An Expressive, Efficient Attention Architecture

    Authors: Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, T. Y. Liu, Haiming Wang, Shengjun Fang, et al. (35 additional authors not shown)

    Abstract: We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mech… (see the sketch after this entry)

    Submitted 1 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: Kimi Linear tech report
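
    The delta-rule recurrence behind DeltaNet-style linear attention is compact enough to sketch. The toy step below is sequential and single-head; the per-channel forget gate alpha_t is an assumption standing in for the paper's "finer-grained gating", and the real KDA module runs as a chunkwise-parallel kernel rather than this loop.

        import numpy as np

        def gated_delta_step(S, q_t, k_t, v_t, beta_t, alpha_t):
            # S: (d_v, d_k) matrix-valued state; beta_t: scalar write strength
            # alpha_t: (d_k,) per-channel forget gate (assumed form of the gating)
            k_t = k_t / (np.linalg.norm(k_t) + 1e-6)   # unit-norm key
            S = S * alpha_t                            # channel-wise forgetting
            S = S - beta_t * np.outer(S @ k_t, k_t)    # delta rule: erase the old value at k_t
            S = S + beta_t * np.outer(v_t, k_t)        # write the new value
            return S, S @ q_t                          # updated state and output o_t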

  2. arXiv:2509.22580  [pdf, ps, other]

    cs.LG

    The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?

    Authors: Guannan Lai, Da-Wei Zhou, Xin Yang, Han-Jia Ye

    Abstract: Class Incremental Learning (CIL) requires models to continuously learn new classes without forgetting previously learned ones, while maintaining stable performance across all possible class sequences. In real-world settings, the order in which classes arrive is diverse and unpredictable, and model performance can vary substantially across different sequences. Yet mainstream evaluation protocols ca…

    Submitted 26 September, 2025; originally announced September 2025.

  3. arXiv:2507.20534  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Kimi K2: Open Agentic Intelligence

    Authors: Kimi Team, Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, et al. (144 additional authors not shown)

    Abstract: We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique to address training instability while enjoying the advanced token efficiency of Muon. Based on MuonClip, K2 was pre-trained on 15.5 trillion tokens with zero loss spike.… (see the sketch after this entry)

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: tech report of Kimi K2
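
    The QK-clip idea can be illustrated on a single head: after an update step, if the largest pre-softmax attention logit exceeds a cap tau, the query and key projections are rescaled together so the logits drop back under the cap. The threshold, probe batch X, and function below are hypothetical, a sketch of the stated idea rather than the K2 recipe.

        import numpy as np

        def qk_clip(W_q, W_k, X, tau=100.0):
            # X: (T, d_model) probe activations; W_q, W_k: (d_model, d_head)
            Q, K = X @ W_q, X @ W_k
            max_logit = np.abs(Q @ K.T).max() / np.sqrt(W_q.shape[1])
            if max_logit > tau:
                g = np.sqrt(tau / max_logit)   # logits are bilinear in (W_q, W_k),
                W_q, W_k = W_q * g, W_k * g    # so scaling both by sqrt(.) caps them
            return W_q, W_k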

  4. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  5. arXiv:2504.18425  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.MM cs.SD

    Kimi-Audio Technical Report

    Authors: Kimi Team, Ding Ding, Zeqian Ju, Yichong Leng, Songxiang Liu, Tong Liu, Zeyu Shang, Kai Shen, Wei Song, Xu Tan, Heyi Tang, Zhengtao Wang, Chu Wei, Yifei Xin, Xinran Xu, Jianwei Yu, Yutao Zhang, Xinyu Zhou, Y. Charles, Jun Chen, Yanru Chen, Yulun Du, Weiran He, Zhenxing Hu, Guokun Lai, et al. (15 additional authors not shown)

    Abstract: We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input a…

    Submitted 25 April, 2025; originally announced April 2025.

  6. arXiv:2504.07491  [pdf, ps, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (70 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 23 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Updated Kimi-VL-A3B-Thinking-2506 information

  7. arXiv:2503.20320  [pdf]

    cs.CL cs.AI cs.ET

    Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models

    Authors: Shih-Wen Ke, Guan-Yu Lai, Guo-Lin Fang, Hsi-Yuan Kao

    Abstract: Large language models (LLMs) are designed to align with human values in their responses. This study exploits LLMs with an iterative prompting technique where each prompt is systematically modified and refined across multiple iterations to enhance its effectiveness in jailbreaking attacks progressively. This technique involves analyzing the response patterns of LLMs, including GPT-3.5, GPT-4, LLaMa…

    Submitted 26 March, 2025; originally announced March 2025.

  8. arXiv:2502.20124  [pdf, other]

    cs.LG cs.AI

    Exploring Open-world Continual Learning with Knowns-Unknowns Knowledge Transfer

    Authors: Yujie Li, Guannan Lai, Xin Yang, Yonghao Li, Marcello Bonsangue, Tianrui Li

    Abstract: Open-World Continual Learning (OWCL) is a challenging paradigm where models must incrementally learn new knowledge without forgetting while operating under an open-world assumption. This requires handling incomplete training data and recognizing unknown samples during inference. However, existing OWCL methods often treat open detection and continual learning as separate tasks, limiting their abili…

    Submitted 27 February, 2025; originally announced February 2025.

  9. arXiv:2502.20032  [pdf, other]

    cs.LG cs.AI

    Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping

    Authors: Guannan Lai, Yujie Li, Xiangkun Wang, Junbo Zhang, Tianrui Li, Xin Yang

    Abstract: Class Incremental Learning (CIL) aims to enable models to learn new classes sequentially while retaining knowledge of previous ones. Although current methods have alleviated catastrophic forgetting (CF), recent studies highlight that the performance of CIL models is highly sensitive to the order of class arrival, particularly when sequentially introduced classes exhibit high inter-class similarity…

    Submitted 17 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted to CVPR 2025

    MSC Class: 68T05; 68Q25; 68U05

    ACM Class: I.2.6; I.2.10

  10. arXiv:2502.16982  [pdf, other]

    cs.LG cs.AI cs.CL

    Muon is Scalable for LLM Training

    Authors: Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Weiran He, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yongsheng Kang, Hao Zhang, Xinran Xu, Yutao Zhang, et al. (3 additional authors not shown)

    Abstract: Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to work out-of-the-box on large-scale… (see the sketch after this entry)

    Submitted 24 February, 2025; originally announced February 2025.
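
    Both fixes named in the abstract are visible in a minimal sketch of the Muon update: orthogonalize the momentum buffer with a Newton-Schulz iteration, rescale per matrix, and apply decoupled weight decay. The quintic coefficients and the 0.2 * sqrt(max dim) scale follow the publicly released Muon recipe; treat the exact constants as assumptions rather than the paper's settings.

        import numpy as np

        def newton_schulz(G, steps=5):
            # approximate the orthogonal (polar) factor of G with a quintic iteration
            a, b, c = 3.4445, -4.7750, 2.0315
            X = G / (np.linalg.norm(G) + 1e-7)          # Frobenius-normalize
            transposed = G.shape[0] > G.shape[1]
            if transposed:
                X = X.T                                 # keep the short side first
            for _ in range(steps):
                A = X @ X.T
                X = a * X + (b * A + c * A @ A) @ X
            return X.T if transposed else X

        def muon_step(W, M, grad, lr=0.02, momentum=0.95, wd=0.1):
            M = momentum * M + grad                     # momentum buffer
            O = newton_schulz(M)                        # orthogonalized update
            scale = 0.2 * np.sqrt(max(W.shape))         # per-matrix update scale
            W = W - lr * (scale * O + wd * W)           # decoupled weight decay
            return W, M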

  11. arXiv:2502.13189  [pdf, other]

    cs.LG cs.AI cs.CL

    MoBA: Mixture of Block Attention for Long-Context LLMs

    Authors: Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu

    Abstract: Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or… (see the sketch after this entry)

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 15 pages
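
    Block attention of this kind can be sketched as a gate over mean-pooled key blocks: each query scores the block summaries, keeps its top-k blocks, and runs ordinary softmax attention inside them only. The single-head numpy toy below is non-causal and omits the paper's rule that a query always attends to its own block.

        import numpy as np

        def moba_attention(q, k, v, block=64, topk=2):
            # q: (Tq, d); k, v: (T, d); assumes T is (nearly) divisible by block
            T, d = k.shape
            nb = T // block
            kb = k[:nb * block].reshape(nb, block, d)
            vb = v[:nb * block].reshape(nb, block, d)
            pooled = kb.mean(axis=1)                    # (nb, d) block summaries
            out = np.zeros((q.shape[0], d))
            for i, qi in enumerate(q):
                sel = np.argsort(pooled @ qi)[-topk:]   # top-k blocks for this query
                ks = kb[sel].reshape(-1, d)
                vs = vb[sel].reshape(-1, d)
                att = np.exp(ks @ qi / np.sqrt(d))      # softmax inside chosen blocks
                out[i] = (att / att.sum()) @ vs
            return out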

  12. arXiv:2501.12599  [pdf, ps, other]

    cs.AI cs.LG

    Kimi k1.5: Scaling Reinforcement Learning with LLMs

    Authors: Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (71 additional authors not shown)

    Abstract: Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior pu…

    Submitted 2 June, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 25 pages

  13. arXiv:2501.04940  [pdf, other]

    cs.LG cs.CV

    A New Perspective on Privacy Protection in Federated Learning with Granular-Ball Computing

    Authors: Guannan Lai, Yihui Feng, Xin Yang, Xiaoyu Deng, Hao Yu, Shuyin Xia, Guoyin Wang, Tianrui Li

    Abstract: Federated Learning (FL) facilitates collaborative model training while prioritizing privacy by avoiding direct data sharing. However, most existing articles attempt to address challenges within the model's internal parameters and corresponding outputs, while neglecting to solve them at the input level. To address this gap, we propose a novel framework called Granular-Ball Federated Learning (GrBFL…

    Submitted 8 January, 2025; originally announced January 2025.

  14. arXiv:2411.00168  [pdf]

    cs.HC cs.AI

    Creativity in the Age of AI: Evaluating the Impact of Generative AI on Design Outputs and Designers' Creative Thinking

    Authors: Yue Fu, Han Bin, Tony Zhou, Marx Wang, Yixin Chen, Zelia Gomes Da Costa Lai, Jacob O. Wobbrock, Alexis Hiniker

    Abstract: As generative AI (GenAI) increasingly permeates design workflows, its impact on design outcomes and designers' creative capabilities warrants investigation. We conducted a within-subjects experiment where we asked participants to design advertisements both with and without GenAI support. Our results show that expert evaluators rated GenAI-supported designs as more creative and unconventional ("wei…

    Submitted 31 October, 2024; originally announced November 2024.

  15. arXiv:2410.15311  [pdf, other]

    cs.AI cs.CL cs.CY

    Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

    Authors: Ruiqi Dong, Zhixuan Liao, Guangwei Lai, Yuhan Ma, Danni Ma, Chenyou Fan

    Abstract: Large Language Models (LLMs) are pivotal AI agents in complex tasks but still face challenges in open decision-making problems within complex scenarios. To address this, we use the language logic game "Who is Undercover?" (WIU) as an experimental platform to propose the Multi-Perspective Team Tactic (MPTT) framework. MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimens…

    Submitted 20 October, 2024; originally announced October 2024.

  16. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  17. arXiv:2401.09695  [pdf]

    cs.HC cs.AI

    Should ChatGPT Write Your Breakup Text? Exploring the Role of AI in Relationship Dissolution

    Authors: Yue Fu, Yixin Chen, Zelia Gomes Da Costa Lai, Alexis Hiniker

    Abstract: Relationships are essential to our happiness and wellbeing, yet their dissolution, the final stage of a relationship's lifecycle, is among the most stressful events individuals can experience, often leading to profound and lasting impacts. With the breakup process increasingly facilitated by technology, such as computer-mediated communication, and the likely future influence of generative AI (GenAI)…

    Submitted 31 October, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Report number: 2025 Computer-Supported Cooperative Work & Social Computing (CSCW)

  18. arXiv:2311.10614  [pdf, other]

    cs.CL cs.AI

    A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

    Authors: Ruohong Zhang, Luyu Gao, Chen Zheng, Zhen Fan, Guokun Lai, Zheng Zhang, Fangzhou Ai, Yiming Yang, Hongxia Yang

    Abstract: Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries.…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Work in progress

  19. arXiv:2207.06366  [pdf, other]

    cs.CL cs.LG

    N-Grammer: Augmenting Transformers with latent n-grams

    Authors: Aurko Roy, Rohan Anil, Guangda Lai, Benjamin Lee, Jeffrey Zhao, Shuyuan Zhang, Shibo Wang, Ye Zhang, Shen Wu, Rigel Swavely, Tao Yu, Phuong Dao, Christopher Fifty, Zhifeng Chen, Yonghui Wu

    Abstract: Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we prop… (see the sketch after this entry)

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 8 pages, 2 figures
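
    The latent n-gram augmentation reduces to a table lookup: form discrete bi-gram ids, hash them into a fixed-size embedding table, and concatenate the lookup with the token representation. In the paper the ids come from cluster-quantized latent embeddings and the lookup is layer-normalized; the sketch below hashes raw token ids instead, and all names in it are illustrative.

        import numpy as np

        def ngrammer_augment(ids, emb, ngram_table, n_buckets=2**18):
            # ids: (T,) int array of token ids; emb: (V, d); ngram_table: (n_buckets, d_ng)
            x = emb[ids]                                 # (T, d) uni-gram embeddings
            prev = np.concatenate(([0], ids[:-1]))       # shifted ids (0 = BOS stand-in)
            bigram = (prev * 1000003 + ids) % n_buckets  # hashed bi-gram bucket
            return np.concatenate([x, ngram_table[bigram]], axis=-1)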

  20. arXiv:2204.07705  [pdf, other]

    cs.CL cs.AI

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Authors: Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, et al. (15 additional authors not shown)

    Abstract: How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting,…

    Submitted 24 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: Accepted to EMNLP 2022, 25 pages

  21. arXiv:2204.02604  [pdf, other]

    cs.NE

    Interactive Evolutionary Multi-Objective Optimization via Learning-to-Rank

    Authors: Ke Li, Guiyu Lai, Xin Yao

    Abstract: In practical multi-criterion decision-making, it is cumbersome if a decision maker (DM) is asked to choose among a set of trade-off alternatives covering the whole Pareto-optimal front. This is a paradox in conventional evolutionary multi-objective optimization (EMO), which always aims to achieve a good balance between convergence and diversity. In essence, the ultimate goal of multi-objective optimi…

    Submitted 6 April, 2022; originally announced April 2022.

  22. Two Decades of Game Jams

    Authors: Gorm Lai, Annakaisa Kultima, Foaad Khosmood, Johanna Pirker, Allan Fowler, Ilaria Vecchi, William Latham, Frederic Fol Leymarie

    Abstract: In less than a year's time, March 2022 will mark the twentieth anniversary of the first documented game jam, the Indie Game Jam, which took place in Oakland, California in 2002. Initially, game jams were widely seen as frivolous activities. Since then, they have taken the world by storm. Game jams have not only become part of the day-to-day process of many game developers, but jams are also used f…

    Submitted 26 October, 2021; originally announced October 2021.

    Journal ref: ICGJ 2021: Sixth Annual International Conference on Game Jams, Hackathons, and Game Creation Events

  23. arXiv:2009.08595  [pdf, ps, other]

    cs.CL

    Unsupervised Parallel Corpus Mining on Web Data

    Authors: Guokun Lai, Zihang Dai, Yiming Yang

    Abstract: With a large amount of parallel data, neural machine translation systems are able to deliver human-level performance for sentence-level translation. However, it is costly to label a large amount of parallel data by humans. In contrast, there is a large-scale parallel corpus created by humans on the Internet. The major difficulty in utilizing them is how to filter them out from the noisy website e…

    Submitted 17 September, 2020; originally announced September 2020.

  24. arXiv:2006.03236  [pdf, other]

    cs.LG cs.CL stat.ML

    Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

    Authors: Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

    Abstract: With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level representation, especially for tasks that only require a single-vector representation of the sequence. With… (see the sketch after this entry)

    Submitted 5 June, 2020; originally announced June 2020.
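
    The efficiency gain comes from shortening the hidden sequence between encoder blocks; in the model the pooled states supply only the attention queries while keys and values keep full resolution, and a decoder can re-expand to token level for tagging-style tasks. A strided mean-pooling sketch of the compression step, under those assumptions:

        import numpy as np

        def funnel_pool(h, stride=2):
            # h: (T, d) hidden states -> (T // stride, d) compressed sequence
            T, d = h.shape
            T2 = (T // stride) * stride                  # drop a ragged tail, if any
            return h[:T2].reshape(-1, stride, d).mean(axis=1)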

  25. arXiv:2005.09324  [pdf, other]

    cs.HC

    Towards Friendly Mixed Initiative Procedural Content Generation: Three Pillars of Industry

    Authors: Gorm Lai, William Latham, Frederic Fol Leymarie

    Abstract: While the games industry is moving towards procedural content generation (PCG) with tools available under popular platforms such as Unreal, Unity or Houdini, and video game titles like No Man's Sky and Horizon Zero Dawn taking advantage of PCG, the gap between academia and industry is as wide as it has ever been, in terms of communication and sharing methods. One of the authors has worked on both…

    Submitted 19 May, 2020; originally announced May 2020.

  26. arXiv:2004.11934  [pdf, other]

    cs.LG stat.ML

    Correlation-aware Unsupervised Change-point Detection via Graph Neural Networks

    Authors: Ruohong Zhang, Yu Hao, Donghan Yu, Wei-Cheng Chang, Guokun Lai, Yiming Yang

    Abstract: Change-point detection (CPD) aims to detect abrupt changes over time series data. Intuitively, effective CPD over multivariate time series should require explicit modeling of the dependencies across input variables. However, existing CPD methods either ignore the dependency structures entirely or rely on the (unrealistic) assumption that the correlation structures are static over time. In this pap…

    Submitted 13 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: Accepted for publication at the International Conference on Neural Information Processing (ICONIP) 2020. The original paper is 12 pages; an additional appendix is available on arXiv

    ACM Class: I.2.6

    Journal ref: ICONIP 2020: Neural Information Processing

  27. arXiv:2004.01170  [pdf, other]

    cs.CV

    DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

    Authors: Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

    Abstract: We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird's-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that b…

    Submitted 6 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  28. arXiv:1909.07009  [pdf, other]

    cs.CL

    Bridging the domain gap in cross-lingual document classification

    Authors: Guokun Lai, Barlas Oguz, Yiming Yang, Veselin Stoyanov

    Abstract: The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area, trying to bridge the language barrier using language universal representations. However, even if the language problem was resolved, models trained in one language would not transfer to another la…

    Submitted 20 September, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

  29. Introducing: The Game Jam License

    Authors: Gorm Lai, Kai Erenli, Foaad Khosmood, William Latham

    Abstract: Since their inception at the Indie Game Jam in 2002, a significant part of game jams has been knowledge sharing and showcasing ideas and work to peers. While various licensing mechanisms have been used for game jams throughout the years, there has never been a licence uniquely designed for artifacts created during a game jam. In this paper, we present to the community the Game Jam License (GJL) wh…

    Submitted 29 August, 2019; originally announced August 2019.

  30. arXiv:1902.01388  [pdf, ps, other]

    cs.LG stat.ML

    Re-examination of the Role of Latent Variables in Sequence Modeling

    Authors: Zihang Dai, Guokun Lai, Yiming Yang, Shinjae Yoo

    Abstract: With latent variables, stochastic recurrent models have achieved state-of-the-art performance in modeling sound-wave sequences. However, opposite results are also observed in other domains, where standard recurrent networks often outperform stochastic models. To better understand this discrepancy, we re-examine the roles of latent variables in stochastic recurrent models for speech density estimati…

    Submitted 16 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: Code available at https://github.com/zihangdai/reexamine-srnn, accepted by NeurIPS 2019

  31. arXiv:1806.06116  [pdf, other]

    cs.LG stat.ML

    Stochastic WaveNet: A Generative Latent Variable Model for Sequential Data

    Authors: Guokun Lai, Bohan Li, Guoqing Zheng, Yiming Yang

    Abstract: How to model the distribution of sequential data, including but not limited to speech and human motions, is an important ongoing research problem. It has been demonstrated that model capacity can be significantly enhanced by introducing stochastic latent variables in the hidden states of recurrent neural networks. Simultaneously, WaveNet, equipped with dilated convolutions, achieves astonishing empiri… (see the sketch after this entry)

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: ICML 2018 Workshop
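
    The deterministic backbone that the stochastic latent variables are injected into is WaveNet's dilated causal convolution, whose receptive field grows exponentially with depth. A one-channel sketch of that building block (the latent-variable machinery itself is not shown):

        import numpy as np

        def dilated_causal_conv(x, w, dilation=1):
            # x: (T,) signal; w: (k,) kernel; y[t] depends only on x[t], x[t-d], ...
            T, k = len(x), len(w)
            y = np.zeros(T)
            for t in range(T):
                for i in range(k):
                    j = t - i * dilation
                    if j >= 0:
                        y[t] += w[i] * x[j]   # causal tap at lag i * dilation
            return y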

  32. arXiv:1711.03225  [pdf, other]

    cs.CL cs.AI

    Large-scale Cloze Test Dataset Created by Teachers

    Authors: Qizhe Xie, Guokun Lai, Zihang Dai, Eduard Hovy

    Abstract: Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. With missing blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language under…

    Submitted 27 August, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: EMNLP 2018

  33. arXiv:1710.11577  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Depthwise Separable Graph Convolution from Data Manifold

    Authors: Guokun Lai, Hanxiao Liu, Yiming Yang

    Abstract: The Convolutional Neural Network (CNN) has gained tremendous success in computer vision tasks with its outstanding ability to capture the local latent features. Recently, there has been an increasing interest in extending convolution operations to the non-Euclidean geometry. Although various types of convolution operations have been proposed for graphs or manifolds, their connections with traditional co… (see the sketch after this entry)

    Submitted 8 November, 2018; v1 submitted 31 October, 2017; originally announced October 2017.
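
    By analogy with depthwise separable convolution in vision, a graph version separates neighborhood filtering from channel mixing. The sketch below is a generic reading of that idea, not necessarily the paper's exact operator: a normalized adjacency A aggregates each feature channel over the graph with its own weight, then a pointwise matrix mixes channels.

        import numpy as np

        def depthwise_separable_gconv(X, A, w_depth, W_point):
            # X: (N, C) node features; A: (N, N) normalized adjacency
            # w_depth: (C,) per-channel filter weights; W_point: (C, C_out)
            H = (A @ X) * w_depth   # depthwise: filter each channel over the graph
            return H @ W_point      # pointwise: mix channels with a 1x1 transform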

  34. arXiv:1704.04683  [pdf, other]

    cs.CL cs.AI cs.LG

    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    Authors: Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy

    Abstract: We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students between the ages of 12 and 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluat…

    Submitted 5 December, 2017; v1 submitted 15 April, 2017; originally announced April 2017.

    Comments: EMNLP 2017

  35. arXiv:1703.07015  [pdf, other]

    cs.LG

    Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

    Authors: Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu

    Abstract: Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as Autoregressive models and Gaussian Proce… (see the sketch after this entry)

    Submitted 18 April, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: Accepted by SIGIR 2018
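
    The "mixture of long-term and short-term patterns" is what motivates the model's structure: convolutional and recurrent components capture short-term local dependencies, while a linear autoregressive bypass keeps the output scale tied to recent inputs. A sketch of that bypass, with random stand-ins for the weights the model would learn jointly with the nonlinear part:

        import numpy as np

        def ar_highway_forecast(X, window=24):
            # X: (T, n_series) history; forecast the next step of each series as a
            # linear function of its own last `window` observations
            w = np.random.randn(window) / window   # stand-in for learned AR weights
            return X[-window:].T @ w               # (n_series,) linear component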