Description
Context
Odin: dev-2025-11-nightly:1fb60c4
OS: Windows 11 Professional (version: 24H2), build 26100.3476
CPU: AMD Ryzen 5 5600X 6-Core Processor
RAM: 32693 MiB
Backend: LLVM 20.1.0
Testing this properly requires a terminal with UTF-8 encoding support.
Technically these are multiple bugs, but they are all related to a single procedure in the implementation, read_console:
Odin/core/os/os2/file_windows.odin
Line 317 in 1fb60c4
read_console :: proc(handle: win32.HANDLE, b: []byte) -> (n: int, err: Error) {
Expected Behavior
Behavior
Basically, I expect read_console not to corrupt input when the buffer is too small to fit a full line in one call, by preserving a little state between calls (at least 1 code point). The full list:
- read_console properly copies input into the buffer.
- read_console properly reads UTF-16 surrogate pairs even if a pair is split across 2 win32.ReadConsoleW calls.
- If the last code point doesn't fully fit into the buffer, read_console emits the rest of it on the next call OR holds the whole code point until the next call (if the current call has written at least something? I'm not sure what the proper behavior should be if len(buffer) < 4). Both ways work for me, but I would prefer the second one.
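A minimal sketch of the second option (holding back a code point that doesn't fit), under stated assumptions: `utf8_width` and `copy_whole_code_points` are hypothetical helpers written for this illustration, not procedures from core:os/os2, and the source bytes are assumed to be valid UTF-8.

```odin
package main

import "core:fmt"

// utf8_width returns the encoded length implied by a UTF-8 lead byte.
// Stray continuation bytes are treated as width 1 so the loop always advances.
utf8_width :: proc(lead: u8) -> int {
	switch {
	case lead >= 0xF0: return 4
	case lead >= 0xE0: return 3
	case lead >= 0xC0: return 2
	}
	return 1
}

// copy_whole_code_points (hypothetical) copies as many complete code points
// from src into dst as fit. It returns the number of bytes written and the
// unconsumed tail, which a caller would keep until the next call.
copy_whole_code_points :: proc(dst: []u8, src: []u8) -> (n: int, rest: []u8) {
	i := 0
	for i < len(src) {
		w := min(utf8_width(src[i]), len(src) - i)
		if n + w > len(dst) {
			break // the next code point does not fit; hold it back whole
		}
		copy(dst[n:], src[i:i+w])
		n += w
		i += w
	}
	return n, src[i:]
}

main :: proc() {
	src := []u8{240, 159, 153, 130, 239, 191, 189} // U+1F642 followed by U+FFFD
	dst: [6]u8
	n, rest := copy_whole_code_points(dst[:], src)
	fmt.printf("%d\n", dst[:n]) // [240, 159, 153, 130]
	fmt.printf("%d\n", rest)    // [239, 191, 189]
}
```

With a destination shorter than 4 bytes and a 4-byte code point pending, this sketch writes nothing, which is exactly the len(buffer) < 4 case left open above.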
Output for provided example
[240, 159, 153, 130, 240, 159, 153, 130, 13, 10]
[240, 159, 153, 130, 13, 10]
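The first expected line can be reproduced with a sketch that remembers a trailing high surrogate between chunks. This is an illustration only: `Surrogate_Carry`, `append_rune` and `decode_chunk` are hypothetical names, and the chunks are hard-coded to the 3-unit reads a 12-byte buffer appears to produce, rather than coming from win32.ReadConsoleW.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// Hypothetical state kept between reads; not part of core:os/os2.
Surrogate_Carry :: struct {
	pending_high: u16, // high surrogate left over from the previous chunk, 0 if none
}

// append_rune appends the UTF-8 encoding of r to out.
append_rune :: proc(out: ^[dynamic]u8, r: rune) {
	bytes, width := utf8.encode_rune(r)
	append(out, ..bytes[:width])
}

// decode_chunk converts UTF-16 units to UTF-8, completing a surrogate pair
// that started in an earlier chunk instead of replacing both halves.
decode_chunk :: proc(carry: ^Surrogate_Carry, chunk: []u16, out: ^[dynamic]u8) {
	for unit in chunk {
		if carry.pending_high != 0 {
			hi := u32(carry.pending_high)
			carry.pending_high = 0
			if unit >= 0xDC00 && unit <= 0xDFFF {
				append_rune(out, rune(0x10000 + ((hi - 0xD800) << 10) + (u32(unit) - 0xDC00)))
				continue
			}
			append_rune(out, utf8.RUNE_ERROR) // high surrogate without a partner
		}
		if unit >= 0xD800 && unit <= 0xDBFF {
			carry.pending_high = unit // wait for the low half, possibly in the next chunk
			continue
		}
		is_lone_low := unit >= 0xDC00 && unit <= 0xDFFF
		append_rune(out, utf8.RUNE_ERROR if is_lone_low else rune(unit))
	}
}

main :: proc() {
	carry: Surrogate_Carry
	out: [dynamic]u8
	defer delete(out)

	// "🙂🙂\r\n" in UTF-16, split after three units
	decode_chunk(&carry, []u16{0xD83D, 0xDE42, 0xD83D}, &out)
	decode_chunk(&carry, []u16{0xDE42, 0x000D, 0x000A}, &out)
	fmt.printf("%d\n", out[:]) // [240, 159, 153, 130, 240, 159, 153, 130, 13, 10]
}
```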
Current Behavior
Behavior
- read_console loses part of the input because the copying loop compares n+i < len(b), but both variables are incremented (n at the end of the loop), so it should compare only n < len(b). (This is actually "When reading from os2.stdin.stream using io.read_full, input is sometimes skipped or lost on Windows" #5086, but because of the other 2 points I decided to create this issue.)
  Odin/core/os/os2/file_windows.odin
  Line 344 in 1fb60c4
  for i := 0; i < len(src) && n+i < len(b); i += 1 {
- read_console replaces a surrogate pair with 2 REPLACEMENT_CHARs if the pair gets split across 2 win32.ReadConsoleW calls.
- read_console copies only part of a code point if it doesn't fit into the buffer and loses the rest of it.
Output for provided example
[240, 159, 153, 130, 239, 191, 239, 191, 189, 240, 159, 10]
It actually combines all bugs, since it consists of: [240, 159, 153, 130] (full emoji), [239, 191] (truncated replacement character), [239, 191, 189] (replacement character), [240, 159] (truncated emoji) and [10] ('\n').
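The truncated [239, 191] can be traced to the loop condition alone. Below is a small sketch that isolates the copying loop: `copy_as_reported` and `copy_proposed` are hypothetical names, the Ctrl-Z handling of the real procedure is left out, and the source bytes are what decoding the first three UTF-16 units of 🙂🙂 is assumed to produce (one emoji plus one replacement character).

```odin
package main

import "core:fmt"

// copy_as_reported uses the loop condition quoted in this issue (n+i < len(b)).
copy_as_reported :: proc(b: []u8, src: []u8, start: int) -> int {
	n := start
	for i := 0; i < len(src) && n+i < len(b); i += 1 {
		b[n] = src[i]
		n += 1
	}
	return n
}

// copy_proposed compares only n against len(b), as suggested in this issue.
copy_proposed :: proc(b: []u8, src: []u8, start: int) -> int {
	n := start
	for i := 0; i < len(src) && n < len(b); i += 1 {
		b[n] = src[i]
		n += 1
	}
	return n
}

main :: proc() {
	src := []u8{240, 159, 153, 130, 239, 191, 189} // 7 bytes, fits easily in 12
	b1, b2: [12]u8
	fmt.println(copy_as_reported(b1[:], src, 0)) // 6: stops when n+i reaches 12
	fmt.println(copy_proposed(b2[:], src, 0))    // 7: the whole chunk is copied
}
```

Since n and i advance together, n+i grows by 2 per byte, so starting from n = 0 the reported loop stops after half the buffer length even though space remains.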
Steps to Reproduce
Code:
	package main

	import "core:fmt"
	import "core:os/os2"

	main :: proc() {
		buf: [12]u8
		for {
			n, err := os2.read(os2.stdin, buf[:])
			assert(err == nil)
			fmt.printf("%d\n", buf[:n])
		}
	}

- Open cmd in Windows Terminal
- Write chcp 65001
- Compile and run example code
- Paste 🙂🙂 into terminal and press Enter
- Paste 🙂 into terminal and press Enter