Problem Description
The semantic highlighting feature in the Vim LSP client has a critical issue
when handling multi-byte characters (such as Chinese, Japanese, Korean, and
emoji). The highlighted ranges are shorter than expected for text containing
these characters.
Example
text: > [!Warning]
pos.character: 3
text->strutf16len(true): 12
text->strdisplaywidth: 12
text->charidx(pos.character, true, true): 3
text: > [!Warning]
pos.character: 11
text->strutf16len(true): 12
text->strdisplaywidth: 12
text->charidx(pos.character, true, true): 11
text: - [[《说理与思辨》]]
pos.character: 4
text->strutf16len(true): 13
text->strdisplaywidth: 20
text->charidx(pos.character, true, true): 4
text: - [[《说理与思辨》]]
pos.character: 11
text->strutf16len(true): 13
text->strdisplaywidth: 20
text->charidx(pos.character, true, true): 11
Root Cause Analysis
- Vim's charidx(): Correctly converts UTF-16 positions to character positions
- Vim's prop_add_list(): Expects byte positions for text property ranges
For multi-byte characters:
- 1 character position ≠ 1 byte position
- Chinese characters typically use 3 bytes in UTF-8
- Because the character index returned by charidx() is passed on as if it were
a byte offset, the highlighted ranges come out at about 1/3 of the expected
length for Chinese text
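The mismatch can be reproduced outside Vim. The following is a minimal Python sketch of the arithmetic only, not code from the plugin; `char_to_byte` is a hypothetical helper introduced for illustration. It uses the second sample line from the example and the character range 4 to 11 reported there.

```python
text = "- [[《说理与思辨》]]"

# Character, UTF-16 code unit, and UTF-8 byte counts for the sample line.
n_chars = len(text)
n_utf16 = len(text.encode("utf-16-le")) // 2
n_bytes = len(text.encode("utf-8"))


def char_to_byte(s: str, char_idx: int) -> int:
    """Hypothetical helper: UTF-8 byte offset of the character at char_idx."""
    return len(s[:char_idx].encode("utf-8"))


start_char, end_char = 4, 11
raw = text.encode("utf-8")

# Using the character indices directly as byte offsets: the range stops
# partway through the third CJK character.
wrong = raw[start_char:end_char].decode("utf-8", errors="replace")

# Converting the character indices to byte offsets first: the range covers
# all seven characters between the brackets.
right = raw[char_to_byte(text, start_char):char_to_byte(text, end_char)].decode("utf-8")

print(n_chars, n_utf16, n_bytes)
print(repr(wrong))
print(repr(right))
```

Here the intended span is 21 bytes long (byte offsets 4 to 25), while the span actually applied is 7 bytes long, which matches the 1/3 ratio described above.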
Fix
A demo fix (untested):
diff --git a/autoload/lsp/offset.vim b/autoload/lsp/offset.vim
index 699964e..a0d7b4e 100644
--- a/autoload/lsp/offset.vim
+++ b/autoload/lsp/offset.vim
@@ -69,7 +69,7 @@ export def DecodePosition(lspserver: dict<any>, bnr: number, pos: dict<number>)
pos.character = text->strchars()
else
if lspserver.posEncoding == 16
- pos.character = text->charidx(pos.character, true, true)
+ pos.character = text->byteidx(pos.character, true)
else
pos.character = text->charidx(pos.character, true)
endif
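The conversion the patch relies on, from a UTF-16 offset straight to a byte offset, can be sketched in Python to check the expected numbers. This is only an illustration of the arithmetic under stated assumptions, not Vim's implementation: `utf16_to_byte_offset` is a hypothetical helper, and it assumes the offset falls on a character boundary (not in the middle of a surrogate pair).

```python
def utf16_to_byte_offset(s: str, utf16_idx: int) -> int:
    """Hypothetical helper: map a UTF-16 code unit offset to a UTF-8 byte offset.

    Assumes utf16_idx lies on a character boundary.
    """
    units = s.encode("utf-16-le")
    # Each UTF-16 code unit is two bytes in the UTF-16-LE encoding.
    prefix = units[: 2 * utf16_idx].decode("utf-16-le")
    return len(prefix.encode("utf-8"))


cjk = "- [[《说理与思辨》]]"
emoji = "a😀b"

# CJK characters: one UTF-16 unit each, three UTF-8 bytes each.
print(utf16_to_byte_offset(cjk, 4))
print(utf16_to_byte_offset(cjk, 11))

# Emoji outside the BMP: two UTF-16 units, four UTF-8 bytes.
print(utf16_to_byte_offset(emoji, 3))
```

For the sample line, UTF-16 offsets 4 and 11 map to byte offsets 4 and 25. The emoji case shows why the UTF-16 offset, the character index, and the byte offset can all differ: the position after the emoji is UTF-16 offset 3, character index 2, and byte offset 5.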