
core:os and core:os/os2: read on windows loses/corrupts code points #5901

@Neirokan


Context

    Odin:    dev-2025-11-nightly:1fb60c4
    OS:      Windows 11 Professional (version: 24H2), build 26100.3476
    CPU:     AMD Ryzen 5 5600X 6-Core Processor
    RAM:     32693 MiB
    Backend: LLVM 20.1.0

Testing this properly requires a terminal with UTF-8 encoding support.
Technically these are multiple bugs, but they all relate to a single procedure in the implementation, read_console:

read_console :: proc(handle: win32.HANDLE, b: []byte) -> (n: int, err: Error) {

Expected Behavior

Basically, I expect read_console not to corrupt input when the buffer is too small to fit a full line in one call, by preserving a little state between calls (at least 1 code point). The full list is:

  1. read_console properly copies input into buffer
  2. read_console properly reads UTF-16 surrogate pairs even if a pair is split across 2 win32.ReadConsoleW calls.
  3. If the last code point doesn't fully fit into the buffer, read_console either emits the rest of it on the next call OR holds the whole code point until the next call (provided the current call has written at least something? I'm not sure what the proper behavior should be if len(buffer) < 4). Both ways work for me, but I would prefer the second one.
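
To make points 2 and 3 concrete, here is a minimal sketch of the "hold the whole code point" variant. Decode_State, decode_chunk, is_hi and is_lo are hypothetical names for illustration and are not part of core:os; the only library call is utf8.encode_rune from core:unicode/utf8. The sketch keeps a trailing high surrogate between calls and stops before any code point that does not fit, reporting how many UTF-16 units it consumed so the caller can feed the rest again later. It does not settle what should happen when len(buffer) < 4.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// Hypothetical state preserved between calls; not part of core:os.
Decode_State :: struct {
	pending_hi: u16, // high surrogate left over from the previous chunk, 0 if none
}

REPLACEMENT :: rune(0xFFFD)

is_hi :: proc(u: u16) -> bool { return 0xD800 <= u && u < 0xDC00 }
is_lo :: proc(u: u16) -> bool { return 0xDC00 <= u && u < 0xE000 }

combine :: proc(hi, lo: u16) -> rune {
	return 0x10000 + ((rune(hi) - 0xD800) << 10) + (rune(lo) - 0xDC00)
}

// Decodes UTF-16 units into out, writing only whole code points.
// Returns the bytes written and the units consumed from this chunk.
decode_chunk :: proc(st: ^Decode_State, out: []byte, units: []u16) -> (written, consumed: int) {
	for consumed < len(units) {
		u := units[consumed]
		r := REPLACEMENT // lone surrogates fall through to U+FFFD
		used := 1
		if st.pending_hi != 0 {
			if is_lo(u) {
				r = combine(st.pending_hi, u)
			} else {
				used = 0 // held surrogate was unpaired; reprocess u afterwards
			}
		} else if is_hi(u) {
			if consumed+1 == len(units) {
				// Chunk ends in a high surrogate: hold it for the next call.
				st.pending_hi = u
				consumed += 1
				break
			}
			if is_lo(units[consumed+1]) {
				r = combine(u, units[consumed+1])
				used = 2
			}
		} else if !is_lo(u) {
			r = rune(u)
		}
		enc, w := utf8.encode_rune(r)
		if written+w > len(out) {
			break // code point does not fit; leave it for the next call
		}
		copy(out[written:], enc[:w])
		written += w
		consumed += used
		st.pending_hi = 0
	}
	return
}

main :: proc() {
	st: Decode_State
	out: [12]byte
	n := 0
	// "🙂🙂\r\n" split into two reads of 3 UTF-16 units each.
	first := [?]u16{0xD83D, 0xDE42, 0xD83D}
	second := [?]u16{0xDE42, 0x000D, 0x000A}
	w, _ := decode_chunk(&st, out[n:], first[:])
	n += w
	w, _ = decode_chunk(&st, out[n:], second[:])
	n += w
	fmt.printf("%d\n", out[:n])
}
```

With the split shown in main, the second emoji is reassembled from the held high surrogate instead of turning into two replacement characters.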

Output for provided example

[240, 159, 153, 130, 240, 159, 153, 130, 13, 10]
[240, 159, 153, 130, 13, 10]

Current Behavior

  1. read_console loses part of the input because the copying loop compares n+i < len(b), but both variables are incremented (n at the end of the loop body), so it should compare only n < len(b). (This is actually #5086, "When reading from os2.stdin.stream using io.read_full, input is sometimes skipped or lost on Windows", but because of the other 2 points I decided to create this issue.)
    for i := 0; i < len(src) && n+i < len(b); i += 1 {
  2. read_console replaces a surrogate pair with 2 REPLACEMENT_CHARs if the pair gets split across 2 win32.ReadConsoleW calls.
  3. read_console copies only part of a code point if it doesn't fit into the buffer and loses the rest of it.
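
The loop bound from point 1 can be isolated in a small sketch. copy_buggy and copy_fixed are hypothetical helper names; the first uses the condition quoted above, the second the n < len(b) condition suggested in this issue. Any other handling inside the real loop body is left out.

```odin
package main

import "core:fmt"

// Loop condition as quoted in the issue: n+i grows by 2 per copied byte,
// so the loop stops after roughly half of the remaining space is used.
copy_buggy :: proc(b: []byte, src: []byte, start: int) -> int {
	n := start
	for i := 0; i < len(src) && n+i < len(b); i += 1 {
		b[n] = src[i]
		n += 1
	}
	return n
}

// Suggested condition: only n is compared against len(b).
copy_fixed :: proc(b: []byte, src: []byte, start: int) -> int {
	n := start
	for i := 0; i < len(src) && n < len(b); i += 1 {
		b[n] = src[i]
		n += 1
	}
	return n
}

main :: proc() {
	// UTF-8 for one emoji followed by one replacement character (7 bytes).
	src := [?]byte{240, 159, 153, 130, 239, 191, 189}
	b1, b2: [12]byte
	n1 := copy_buggy(b1[:], src[:], 0) // copies 6 of 7 bytes into a 12-byte buffer
	n2 := copy_fixed(b2[:], src[:], 0) // copies all 7 bytes
	fmt.printf("%d %d\n", n1, b1[:n1])
	fmt.printf("%d %d\n", n2, b2[:n2])
}
```

Even with the fixed condition, bytes of src that do not fit are still dropped, which is point 3.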

Output for provided example

[240, 159, 153, 130, 239, 191, 239, 191, 189, 240, 159, 10]

It actually combines all bugs, since it consists of: [240, 159, 153, 130] (full emoji), [239, 191] (truncated replacement character), [239, 191, 189] (replacement character), [240, 159] (truncated emoji) and [10] ('\n').
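
The breakdown above can be traced with a rough simulation. It rests on several assumptions that are not confirmed here: each ReadConsoleW call returns at most 3 UTF-16 units for a 12-byte buffer, each chunk is decoded on its own, and utf16.decode_to_utf8 from core:unicode/utf16 (assumed signature: destination bytes, source units, returns byte count) turns unpaired surrogates into U+FFFD. simulate_reads is a hypothetical helper that reuses the copy loop quoted in this issue.

```odin
package main

import "core:fmt"
import "core:unicode/utf16"

// Decodes each chunk independently and copies it with the loop from the issue.
// Assumes every chunk decodes to at most 64 bytes.
simulate_reads :: proc(b: []byte, reads: [][]u16) -> int {
	n := 0
	for units in reads {
		buf8: [64]byte
		src := buf8[:utf16.decode_to_utf8(buf8[:], units)]
		for i := 0; i < len(src) && n+i < len(b); i += 1 {
			b[n] = src[i]
			n += 1
		}
	}
	return n
}

main :: proc() {
	// "🙂🙂\r\n" followed by "🙂\r\n", split into reads of at most 3 units (assumed).
	reads := [?][]u16{
		{0xD83D, 0xDE42, 0xD83D},
		{0xDE42, 0x000D, 0x000A},
		{0xD83D, 0xDE42, 0x000D},
		{0x000A},
	}
	b: [12]byte
	n := simulate_reads(b[:], reads[:])
	fmt.printf("%d\n", b[:n])
}
```

If those assumptions hold, the printed bytes should match the 12-byte output reported above: the first chunk loses the last byte of its replacement character, the second loses "\r\n", and the third loses half of the emoji and the "\r".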

Steps to Reproduce

Code:

package main

import "core:fmt"
import "core:os/os2"

main :: proc() {
	buf: [12]u8
	for {
		n, err := os2.read(os2.stdin, buf[:])
		assert(err == nil)
		fmt.printf("%d\n", buf[:n])
	}
}
  1. Open cmd in Windows Terminal
  2. Run chcp 65001
  3. Compile and run the example code
  4. Paste 🙂🙂 into the terminal and press Enter
  5. Paste 🙂 into the terminal and press Enter

Labels: bug, replicated (We were able to replicate the bug.)
