Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective

Mao, Yuxin; Qin, Zhen; Zhou, Jinxing; Deng, Hui; Shen, Xuyang; Fan, Bin; Zhang, Jing; Zhong, Yiran; Dai, Yuchao

arXiv:2507.01652 (cs)

[Submitted on 2 Jul 2025]

Abstract: Autoregressive (AR) models have garnered significant attention in image generation for their ability to capture both local and global structures in visual data. However, prevalent AR models rely predominantly on transformer architectures, which suffer from computational complexity that is quadratic in the input sequence length and from substantial memory overhead due to maintaining key-value caches. Although linear attention mechanisms have successfully reduced this burden in language models, our initial experiments reveal that they significantly degrade image generation quality because of their inability to capture critical long-range dependencies in visual data. We propose Linear Attention with Spatial-Aware Decay (LASAD), a novel attention mechanism that explicitly preserves genuine 2D spatial relationships within flattened image sequences by computing position-dependent decay factors from true 2D spatial locations rather than 1D sequence positions. Based on this mechanism, we present LASADGen, an autoregressive image generator that enables selective attention to relevant spatial contexts with linear complexity. Experiments on ImageNet show that LASADGen achieves state-of-the-art image generation performance and computational efficiency, bridging the gap between linear attention's efficiency and the spatial understanding needed for high-quality generation.
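To make the contrast between 1D sequence positions and 2D spatial locations concrete, here is a minimal sketch. It is not the LASAD formulation, which the abstract does not specify. It assumes a raster-scan flattening of a `width`-wide token grid, a scalar decay base `gamma`, and Manhattan distance as the 2D metric. All three are illustrative choices, and the function names are hypothetical.

```python
def to_2d(index, width):
    """Map a raster-scan sequence index to (row, col) on a grid of the given width."""
    return divmod(index, width)


def decay_1d(i, j, gamma):
    """Decay driven only by the gap between 1D sequence positions."""
    return gamma ** abs(i - j)


def decay_2d(i, j, width, gamma):
    """Decay driven by Manhattan distance on the 2D grid (an assumed metric)."""
    (ri, ci), (rj, cj) = to_2d(i, width), to_2d(j, width)
    return gamma ** (abs(ri - rj) + abs(ci - cj))


# Assumed toy setting: a grid 16 tokens wide and a decay base of 0.9.
width, gamma = 16, 0.9
query = 5 * width + 7   # token at row 5, column 7
above = 4 * width + 7   # token directly above it (row 4, column 7)

# In the flattened sequence the two tokens are 16 positions apart,
# so a 1D decay treats the vertical neighbour as distant.
print(decay_1d(query, above, gamma))         # gamma ** 16

# On the grid they are adjacent, so a 2D decay keeps the weight high.
print(decay_2d(query, above, width, gamma))  # gamma ** 1
```

Under these assumptions, a decay computed from 1D positions strongly suppresses the token directly above the query, while a decay computed from 2D locations treats it as an immediate neighbour. This is the kind of spatial relationship the abstract says is preserved by computing decay factors from true 2D locations.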

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:2507.01652 [cs.CV]
	(or arXiv:2507.01652v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.01652

Submission history

From: Yuxin Mao
[v1] Wed, 2 Jul 2025 12:27:06 UTC (482 KB)
