-
-
Notifications
You must be signed in to change notification settings - Fork 714
fix(yaml_parser): tokens out of order #7168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
WalkthroughThe YAML lexer and parser underwent a significant refactor to simplify token and scope management. Wrapper structs for token scoping were removed, replaced by direct token handling with explicit caching for scope-closing tokens. Several new helper methods were introduced, and method signatures were updated accordingly. Numerous new YAML test cases were added and existing ones updated to reflect these lexer changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Source as YAML Source
participant Lexer as YamlLexer
participant Tokens as LinkedList<LexToken>
participant Parser as Parser
Source->>Lexer: Provide YAML input
loop While not EOF
Lexer->>Lexer: Consume next token (direct, no wrappers)
alt Scope starts or ends
Lexer->>Lexer: Cache or emit scope-closing tokens
end
Lexer->>Tokens: Append LexToken(s)
end
Tokens->>Parser: Provide token stream
Parser->>Parser: Parse tokens (using expect for COLON, etc.)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Suggested reviewers
Poem
Note 🔌 MCP (Model Context Protocol) integration is now available in Early Access!Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
16c2eaf
to
c9fa99c
Compare
CodSpeed Performance ReportMerging #7168 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/sequence_follow_by_map.yaml (1)
1-3
: Good negative test; minor naming nit.Scenario is valid for an error test (sequence item followed by a mapping at same indent). Consider renaming the file to "sequence_followed_by_map.yaml" for grammatical clarity.
crates/biome_yaml_parser/src/parser/block.rs (1)
168-169
: Consider implementing the TODO for improved error handling.The switch from
bump
toexpect
improves error handling by generating diagnostics when the colon is missing. However, the generic error message fromexpect
may not be user-friendly for YAML-specific syntax errors.Would you like me to help implement a custom error message that provides more context about the expected mapping syntax? For example: "Expected ':' after mapping key in block map entry".
crates/biome_yaml_parser/src/lexer/mod.rs (2)
135-176
: Complex but necessary logic for mapping disambiguation.The refactored method correctly handles the various cases for potential mapping starts, including cached token management. The logic properly distinguishes between block maps and flow values.
Consider monitoring the performance impact of the multiple
LinkedList
operations in this hot path, especially for large YAML files. If performance becomes an issue, you might consider using a more efficient data structure or pre-allocating list capacity.
797-798
: Potential confusion with column tracking and tabs.The comment states "number of bytes since the last newline" but the field is named
column
. This could be misleading when tabs are involved, as tabs typically don't correspond to single columns in text display.Consider clarifying whether this tracks byte offsets or visual columns, especially given that YAML allows both spaces and tabs as whitespace.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (19)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_key_contains_multiple_values.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_missing_colon.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/multi_line_flow_collection_broken_out_of_parent_block.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/multi_line_plain_token_broken_out_parent_flow.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_followed_mapping.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_separated_by_comments.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/sequence_follow_by_map.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/mapping/explicit_block_mapping_with_compact_mapping.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/mapping/explicit_block_mapping_with_empty_value_and_mapping_as_key.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/mapping/nested_implicit_block_mapping.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/mapping/simple_block_mapping_separated_by_trivia.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/map_nested_in_sequence.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/nested_block_sequence.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_nested_in_map.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_with_trailing_trivia.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/simple_block_sequence.yaml.snap
is excluded by!**/*.snap
crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml.snap
is excluded by!**/*.snap
📒 Files selected for processing (13)
crates/biome_yaml_parser/src/lexer/mod.rs
(13 hunks)crates/biome_yaml_parser/src/lexer/tests/block.rs
(2 hunks)crates/biome_yaml_parser/src/parser/block.rs
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_key_contains_multiple_values.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_missing_colon.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_followed_mapping.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_separated_by_comments.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/sequence_follow_by_map.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/nested_block_sequence.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_with_trailing_trivia.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
crates/biome_*/**/*
📄 CodeRabbit Inference Engine (CLAUDE.md)
Core crates must be located in
/crates/biome_*/
Files:
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/sequence_follow_by_map.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_missing_colon.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/nested_block_sequence.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_with_trailing_trivia.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_followed_mapping.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_separated_by_comments.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_key_contains_multiple_values.yaml
crates/biome_yaml_parser/src/lexer/mod.rs
**/*.{rs,toml}
📄 CodeRabbit Inference Engine (CLAUDE.md)
Format code (Rust + TOML) using
just format
Format Rust and TOML files using
just f
(alias forjust format
)
Files:
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/src/lexer/mod.rs
**/*.rs
📄 CodeRabbit Inference Engine (CONTRIBUTING.md)
**/*.rs
: Run the linter for the whole project usingjust l
(alias forjust lint
)
Use Rust's doctest feature for testing code in documentation comments
For documentation of rules, assists, and their options, use inline rustdoc documentation
Files:
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/src/lexer/mod.rs
**/*.{rs,snap}
📄 CodeRabbit Inference Engine (CONTRIBUTING.md)
Use the
insta
crate for snapshot testing in Rust and manage snapshots withcargo insta
commands
Files:
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/src/lexer/mod.rs
🧠 Learnings (2)
📚 Learning: 2025-08-09T23:07:50.448Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-09T23:07:50.448Z
Learning: Applies to crates/biome_parser/**/token_source.rs : Wrap your lexer with a `BufferedLexer` and implement `TokenSourceWithBufferedLexer` for your token source.
Applied to files:
crates/biome_yaml_parser/src/lexer/mod.rs
📚 Learning: 2025-08-09T23:07:50.448Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-09T23:07:50.448Z
Learning: Applies to crates/biome_parser/lexer/mod.rs : Implement the `Lexer` trait from the `biome_parser` crate for your lexer struct.
Applied to files:
crates/biome_yaml_parser/src/lexer/mod.rs
🪛 YAMLlint (1.37.1)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/sequence_follow_by_map.yaml
[error] 3-3: syntax error: expected , but found '?'
(syntax)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml
[error] 3-3: syntax error: expected , but found ''
(syntax)
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml
[error] 1-1: syntax error: found character '>' that cannot start any token
(syntax)
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_with_trailing_trivia.yaml
[error] 1-1: trailing spaces
(trailing-spaces)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_separated_by_comments.yaml
[error] 3-3: syntax error: expected , but found ''
(syntax)
crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml
[error] 1-1: trailing spaces
(trailing-spaces)
[error] 3-3: trailing spaces
(trailing-spaces)
[error] 4-4: trailing spaces
(trailing-spaces)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_key_contains_multiple_values.yaml
[error] 2-2: syntax error: expected , but found ''
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Bench (biome_package)
🔇 Additional comments (17)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_missing_colon.yaml (1)
1-2
: LGTM for error coverage.This clearly exercises a missing-colon mapping entry after a valid pair. Fits the err suite well.
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml (1)
1-1
: Ensure diagnostic matches lexer intent for '>' and '|' in flow context.This should produce an error since '>' and '|' are block scalar indicators and invalid as plain scalars in flow. Verify the expected diagnostic (message/code and span) in the snapshot to avoid brittleness.
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_followed_mapping.yaml (1)
1-2
: Clarify expected error in assertions.A stray plain scalar after a mapping key/value at the same indent should error. Ensure the test asserts the error location at Line 2 and the intended diagnostic category (e.g., unexpected plain scalar in block mapping).
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_key_contains_multiple_values.yaml (1)
1-2
: Solid err case for composite key tokenization.Combining multiple scalars into a key should fail; this is a good regression test for key token boundaries after the lexer refactor.
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml (1)
1-3
: Good negative test case for mapping followed by a plain scalar lineCaptures the invalid pattern where a plain scalar line is followed by a mapping entry at the same indent under
a:
. This aligns with the intended lexer/parser error handling.crates/biome_yaml_parser/tests/yaml_test_suite/err/block/plain_separated_by_comments.yaml (1)
1-3
: Solid coverage for comment-separated plain scalar edge caseIntentionally invalid structure; matches the lexer/parser error scenario and the YAMLlint diagnostic.
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/nested_block_sequence.yaml (1)
1-2
: Expanded nested sequence shapes look correctAdds both the “empty item with nested sequence” form and compact
- - 40
. Good coverage for block sequence nesting and indentation handling.Also applies to: 4-4
crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml (1)
1-4
: Intentional trailing whitespace to assert trivia handlingThese trailing spaces and inline comments are necessary to validate the new lexer behavior around trailing trivia and closing tokens. Keep as-is; don’t auto-trim.
crates/biome_yaml_parser/src/lexer/tests/block.rs (2)
88-89
: Updated expectations for indentation tokenization look correctReplacing earlier extra NEWLINEs with explicit
WHITESPACE:2
after eachNEWLINE:1
aligns with the new lexer semantics. This should stabilize assertions around block-indentation.Also applies to: 96-97
128-129
: Indentation after line break is now explicit (WHITESPACE:2
)This mirrors the prior change and ensures the nested
- 20
is preceded by indent trivia tokens. Consistent with the refactor.crates/biome_yaml_parser/src/parser/block.rs (1)
177-177
: Good consistency with error handling improvement.The change from
p.bump(T![:])
top.expect(T![:])
here maintains consistency with the similar change at line 169, ensuring uniform error handling across both flow YAML and flow JSON node parsing paths.crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/sequence/sequence_with_trailing_trivia.yaml (1)
1-1
: Clarify if trailing spaces are intentional for testing.The static analysis tool detected trailing spaces on this line. Given that this test file is specifically for "sequence_with_trailing_trivia", these trailing spaces might be intentional to test the lexer's handling of whitespace trivia.
If the trailing spaces are intentional for testing trivia handling, consider adding a comment to document this intent. Otherwise, they should be removed to comply with YAML best practices.
crates/biome_yaml_parser/src/lexer/mod.rs (5)
28-29
: Well-designed solution for token ordering issues.The addition of
cached_scope_closing_tokens
elegantly solves the problem of zero-width tokens with trailing trivia mentioned in the PR objectives. This caching mechanism ensures FLOW_END tokens are emitted after all trailing trivia, maintaining proper token order.
69-81
: Clean refactoring of token consumption logic.The removal of wrapper structs and direct handling of
LinkedList<LexToken>
simplifies the code significantly. The pattern matching is more readable and the token list appending is efficient.
86-104
: Improved sequence entry handling with explicit scope management.The refactored method correctly manages sequence scope creation and token ordering. The use of
is_none_or
for checking scope conditions is idiomatic, and the explicit push_front/push_back operations ensure proper token sequence.
427-441
: Well-structured flow scope evaluation.The new
evaluate_flow_scope
method cleanly separates the logic for determining flow collection boundaries and properly manages the caching of scope closing tokens.
650-675
: Proper newline handling with coordinate tracking.The new
consume_newline
method correctly handles both Unix (\n
) and Windows (\r\n
) line endings and properly updates the text coordinates.
a: | ||
plain # separated by | ||
aa: bb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Ensure linters don’t break this err-case (YAMLlint warning is expected here)
YAMLlint reports “expected , but found ''” on Line 3 — that’s intentional. Verify your CI excludes tests/yaml_test_suite/err/
from YAMLlint or configures a per-path ignore so these fixtures don’t fail the pipeline.
🏁 Script executed:
#!/bin/bash
# Check for yamllint configs and potential pre-commit hooks that could affect tests
fd -a ".yamllint" -u || true
fd -a ".pre-commit-config.yaml" -u || true
rg -n "yamllint|trailing-spaces|err/block" -S || true
# If you use yamllint, ensure its config excludes crates/biome_yaml_parser/tests/yaml_test_suite/err/**
Length of output: 497
Configure YAMLlint to ignore err/ fixtures
It looks like there isn’t a YAMLlint config in the repo to exclude your negative-test fixtures, so the “expected <block end>
” error in
crates/biome_yaml_parser/tests/yaml_test_suite/err/**
will break CI. Please add or update your YAMLlint config (e.g. /.yamllint
) to exclude that path.
• No .yamllint
or relevant pre-commit config was detected in the repo.
• Add something like this to the top-level .yamllint
:
extends: default
ignore:
- "crates/biome_yaml_parser/tests/yaml_test_suite/err/**"
• Ensure your CI YAMLlint step picks up this config so that err/ fixtures don’t fail the pipeline.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 3-3: syntax error: expected , but found ''
(syntax)
🤖 Prompt for AI Agents
In
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/mapping_followed_plain.yaml
lines 1 to 3, the YAMLlint configuration is missing and causes CI failures on
negative-test fixtures. To fix this, create or update a top-level .yamllint file
in the repo root to include an ignore rule for the err/ test fixtures by adding
an ignore section with the path
"crates/biome_yaml_parser/tests/yaml_test_suite/err/**". Also ensure the CI
pipeline YAMLlint step uses this config file so these error fixtures are
excluded from linting and do not break the build.
- 20 | ||
- "20" # abc | ||
- [50] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Protect trailing spaces from formatters/linters
If you run YAML formatters/linters, ensure this path is excluded or rules like “trailing-spaces” are disabled for this file. Otherwise the critical spaces may be stripped and tests will lose coverage.
🏁 Script executed:
#!/bin/bash
# Look for editor/formatter settings that might trim trailing whitespace
fd -a ".editorconfig" -u || true
rg -n "trim_trailing_whitespace" -S || true
rg -n "yamllint|trailing-spaces|flow_with_trailing_trivia.yaml" -S || true
# If needed, add a repo-level exception for:
# crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml
Length of output: 2589
Preserve trailing whitespace in flow_with_trailing_trivia.yaml
Add an override to your .editorconfig
so that editorconfig-aware formatters won’t strip the critical trailing spaces in this test file:
# Prevent trimming trailing spaces in the YAML trivia test
[crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml]
trim_trailing_whitespace = false
This ensures that any formatter or linter respecting .editorconfig
leaves the trailing spaces intact, preserving test coverage.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 1-1: trailing spaces
(trailing-spaces)
[error] 3-3: trailing spaces
(trailing-spaces)
[error] 4-4: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents
In
crates/biome_yaml_parser/tests/yaml_test_suite/ok/flow/flow_with_trailing_trivia.yaml
lines 1 to 4, trailing whitespace is critical for the test but may be stripped
by formatters. To fix this, add an override section in the project's
.editorconfig file specifically for this YAML file, setting
trim_trailing_whitespace to false. This will prevent editorconfig-aware tools
from removing the trailing spaces and preserve the test's intended formatting.
Summary
This PR fixes multiple bugs with the Yaml lexer:
FLOW_END
tokens are emitted. Now they are only emitted after all trailing trivia of the child flow token have been emitted. This was causing problem with rowan since tree sink isn't able to properly handle zero-width token with trailing trivia. For example, this yaml document will now produce this stream of tokens:P/S: I really should stop pretending that YAML is easy. It's certainly not 🥲
Test Plan
Added new tests for various error cases
Docs
N/A
Summary by CodeRabbit
Refactor
Bug Fixes
Tests