-
-
Notifications
You must be signed in to change notification settings - Fork 720
feat(yaml_parser): parse block scalars #7417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
WalkthroughThe PR implements block scalar support across lexer, parser, syntax kinds and codegen. The lexer recognises Possibly related PRs
Suggested reviewers
Pre-merge checks (3 passed)✅ Passed checks (3 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. ✨ Finishing Touches
🧪 Generate unit tests
Comment |
CodSpeed Performance ReportMerging #7417 will not alter performanceComparing Summary
Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (18)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml (1)
1-1
: Good negative case; consider folded counterpart too.Add a matching folded-scalar example (e.g.,
> bb a
) to ensure both styles report the same diagnostic.crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml (1)
1-5
: Solid coverage; add folded and multi-digit edge cases.To mirror literal tests and stress recovery, add:
- >+1-2
(folded + multiple indicators)- >123
(invalid multi-digit indent)- |0 + +- >+1-2 + +- >123xtask/codegen/src/yaml_kinds_src.rs (1)
13-15
: Confirm PLUS is scoped to block-header lexing.If emitted in general contexts,
a+b
risks fragmenting intoPLAIN_LITERAL("a") PLUS PLAIN_LITERAL("b")
. Please ensure the lexer keeps+
insidePLAIN_LITERAL
outside block headers, and add a test to lock this in.Example test to add (in lexer tests):
#[test] fn lex_plus_in_plain_scalar() { assert_lex!( "a+b: 1", MAPPING_START:0, PLAIN_LITERAL:3, COLON:1, WHITESPACE:1, FLOW_START:0, PLAIN_LITERAL:1, FLOW_END:0, MAPPING_END:0, ); }crates/biome_yaml_parser/src/lexer/tests/block.rs (5)
462-469
: Great sanity check; add folded twin for symmetry.#[test] fn lex_basic_block_scalar() { assert_lex!( r"| hello", PIPE:1, BLOCK_CONTENT_LITERAL:10, ); } + +#[test] +fn lex_basic_folded_block_scalar() { + assert_lex!( + r"> + hello", + R_ANGLE:1, + BLOCK_CONTENT_LITERAL:10, + ); +}
471-486
: Chomping indicators covered for|
; please add>
too.#[test] fn lex_block_scalar_with_chomping_indicator() { assert_lex!( "|+", PIPE:1, PLUS:1, BLOCK_CONTENT_LITERAL:0, ); assert_lex!( "|-", PIPE:1, DASH:1, BLOCK_CONTENT_LITERAL:0, ); + + assert_lex!( + ">-", + R_ANGLE:1, + DASH:1, + BLOCK_CONTENT_LITERAL:0, + ); + + assert_lex!( + ">+", + R_ANGLE:1, + PLUS:1, + BLOCK_CONTENT_LITERAL:0, + ); }
488-512
: Nice combos; add a folded variant and a two-digit indent to document permissiveness.assert_lex!( r"|2 content", PIPE:1, INDENTATION_INDICATOR:1, BLOCK_CONTENT_LITERAL:12, ); + assert_lex!( + r">2 + content", + R_ANGLE:1, + INDENTATION_INDICATOR:1, + BLOCK_CONTENT_LITERAL:12, + ); + // Allowed by lexer; validated later by syntax-lint + assert_lex!( + "|10", + PIPE:1, + INDENTATION_INDICATOR:2, + BLOCK_CONTENT_LITERAL:0, + );
536-566
: Good error coverage; add a folded “junk header” for parity.assert_lex!( "| bc d content", PIPE:1, WHITESPACE:1, ERROR_TOKEN:1, ERROR_TOKEN:1, ERROR_TOKEN:1, ERROR_TOKEN:1, BLOCK_CONTENT_LITERAL:12, ); + + assert_lex!( + "> bb a + content", + R_ANGLE:1, + WHITESPACE:1, + ERROR_TOKEN:1, + ERROR_TOKEN:1, + WHITESPACE:1, + ERROR_TOKEN:1, + BLOCK_CONTENT_LITERAL:12, + );
588-598
: Mixed indentation test is helpful; consider adding an explicit indent-indicator variant.#[test] fn lex_block_scalar_mixed_indentation() { assert_lex!( r"| line1 indented line2", PIPE:1, BLOCK_CONTENT_LITERAL:28, ); + + assert_lex!( + r"|2 + line1 + indented + line2", + PIPE:1, + INDENTATION_INDICATOR:1, + BLOCK_CONTENT_LITERAL:28, + ); }crates/biome_yaml_parser/src/parser/parse_error.rs (1)
38-40
: Clarify diagnostic wording: “block scalar header”.Slightly more precise and matches the feature’s terminology.
-pub(crate) fn expected_header(p: &YamlParser, range: TextRange) -> ParseDiagnostic { - expected_node("block header", range, p) +pub(crate) fn expected_header(p: &YamlParser, range: TextRange) -> ParseDiagnostic { + expected_node("block scalar header", range, p) }crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml (1)
1-7
: Solid coverage; consider adding a header that combines both indicators.You already cover chomping-only (|-), keep with empty content (|+), and indent-only (|1-). To exercise the “multiple indicators permitted” behaviour this PR adopts, add one case that combines both chomping and indent (e.g.
|+2
) with a line of content.crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml (1)
1-7
: Nice spread; add a mixed-order indicators case.Folded covers
>-
with content,>+
empty, and>-4
. Consider one where the indentation indicator precedes the chomping indicator (e.g.>4+
) to lock in permissive ordering.crates/biome_yaml_parser/src/lexer/mod.rs (2)
175-210
: Coalesce header-line garbage into a single diagnostic.Current loop emits a chain of
ERROR_TOKEN
s for every non-break after the header. A single diagnostic spanning the range is clearer and cheaper.Apply this diff:
- // So that the block content can cleanly start at a newline - while let Some(c) = self.current_byte() { - if is_break(c) { - break; - } - tokens.push_back(self.consume_unexpected_token()); - } + // Expect a line break after the header; coalesce any extras into one diagnostic + if self.current_byte().is_some_and(|c| !is_break(c)) { + let start = self.text_position(); + while let Some(c) = self.current_byte() { + if is_break(c) { + break; + } + self.advance(1); + } + self.push_diagnostic(ParseDiagnostic::new( + "Expected a line break after the block scalar header", + start..self.text_position(), + )); + }
234-261
: Tighten the indentation-indicator diagnostic.Small copy tweak and a hint make the error self-explanatory, especially for multi-digit values.
- let err = ParseDiagnostic::new( - "Block scalar indentation indicator must be between '1' and '9'", - start_pos..self.text_position(), - ); + let err = ParseDiagnostic::new( + "Block scalar indentation indicator must be a single digit between '1' and '9'", + start_pos..self.text_position(), + ) + .with_hint("YAML allows at most one indentation indicator; multi-digit values like '10' are invalid.");crates/biome_yaml_parser/src/parser/block.rs (5)
15-15
: Prefer consistent import path for flow helpersYou’re mixing
crate::parser
andsuper::
for flow imports. For consistency with nearby imports, pullparse_any_flow_node
viasuper::flow
.-use crate::parser::{flow::parse_any_flow_node, parse_error::expected_header}; +use super::flow::parse_any_flow_node; +use crate::parser::parse_error::expected_header;Alternatively, move
expected_header
into the existingsuper::parse_error::{ ... }
list. Happy either way.
257-271
: DRY up literal/folded scalar parsingBoth functions are identical except for the start token and node kind. Collapse into a single helper for maintainability.
-fn parse_literal_scalar(p: &mut YamlParser) -> CompletedMarker { - let m = p.start(); - p.bump(T![|]); - BlockHeaderList.parse_list(p); - parse_block_content(p); - m.complete(p, YAML_LITERAL_SCALAR) -} - -fn parse_folded_scalar(p: &mut YamlParser) -> CompletedMarker { - let m = p.start(); - p.bump(T![>]); - BlockHeaderList.parse_list(p); - parse_block_content(p); - m.complete(p, YAML_FOLDED_SCALAR) -} +fn parse_literal_scalar(p: &mut YamlParser) -> CompletedMarker { + parse_block_scalar(p, T![|], YAML_LITERAL_SCALAR) +} + +fn parse_folded_scalar(p: &mut YamlParser) -> CompletedMarker { + parse_block_scalar(p, T![>], YAML_FOLDED_SCALAR) +}Add this helper (placement anywhere nearby is fine):
fn parse_block_scalar( p: &mut YamlParser, start_token: YamlSyntaxKind, node_kind: YamlSyntaxKind, ) -> CompletedMarker { let m = p.start(); p.bump(start_token); BlockHeaderList.parse_list(p); parse_block_content(p); m.complete(p, node_kind) }
308-318
: Tiny duplication in chomping indicator parsers
parse_strip_indicator
andparse_keep_indicator
are identical bar token/kind. Optional: a small helper or macro would remove duplication.
335-335
: Nit: extract a helper for scalar startsFor readability and reuse, consider
is_at_block_scalar_start(p)
and call it here.
291-294
: No parser change needed for folded scalars
The lexer emits BLOCK_CONTENT_LITERAL for both|
and>
, so p.at/expect(BLOCK_CONTENT_LITERAL) handles folded content correctly. Optionally rename BLOCK_CONTENT_LITERAL to BLOCK_CONTENT in src/lexer/mod.rs for clarity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
crates/biome_yaml_factory/src/generated/node_factory.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
crates/biome_yaml_factory/src/generated/syntax_factory.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_syntax/src/generated/kind.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
crates/biome_yaml_syntax/src/generated/macros.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
crates/biome_yaml_syntax/src/generated/nodes.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
crates/biome_yaml_syntax/src/generated/nodes_mut.rs
is excluded by!**/generated/**
,!**/generated/**
and included by**
📒 Files selected for processing (11)
crates/biome_yaml_parser/src/lexer/mod.rs
(11 hunks)crates/biome_yaml_parser/src/lexer/tests/block.rs
(1 hunks)crates/biome_yaml_parser/src/parser/block.rs
(3 hunks)crates/biome_yaml_parser/src/parser/parse_error.rs
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
(1 hunks)crates/biome_yaml_syntax/src/lib.rs
(1 hunks)xtask/codegen/src/yaml_kinds_src.rs
(4 hunks)xtask/codegen/yaml.ungram
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
crates/biome_*_{syntax,parser,formatter,analyze,factory,semantic}/**
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain the per-language crate structure: biome_{lang}_{syntax,parser,formatter,analyze,factory,semantic}
Files:
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs
crates/biome_*/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place core crates under /crates/biome_*/
Files:
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs
**/tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place test files under a tests/ directory in each crate
Files:
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
**/*.{rs,toml}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Format Rust and TOML files before committing (use
just f
/just format
).
Files:
xtask/codegen/src/yaml_kinds_src.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs
xtask/codegen/*.ungram
📄 CodeRabbit inference engine (CLAUDE.md)
Define and modify language grammars in .ungram files; ASTs are generated from these
Files:
xtask/codegen/yaml.ungram
🧠 Learnings (4)
📚 Learning: 2025-08-17T08:57:34.751Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-17T08:57:34.751Z
Learning: Applies to crates/biome_parser/xtask/codegen/*.ungram : Nodes representing enclosing syntax errors must include the word Bogus (e.g., HtmlBogusAttribute)
Applied to files:
xtask/codegen/src/yaml_kinds_src.rs
crates/biome_yaml_syntax/src/lib.rs
📚 Learning: 2025-08-11T11:46:05.836Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_diagnostics/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:46:05.836Z
Learning: Applies to crates/biome_diagnostics/**/*.rs : Use helper advice types from biome_diagnostics::v2 (CodeFrameAdvice, CommandAdvice, DiffAdvice, LogAdvice) when suitable
Applied to files:
crates/biome_yaml_parser/src/parser/parse_error.rs
📚 Learning: 2025-08-11T11:46:05.836Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_diagnostics/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:46:05.836Z
Learning: Applies to crates/biome_diagnostics/**/*.rs : Types implementing Diagnostic must also implement Debug (e.g., use #[derive(Debug, Diagnostic)])
Applied to files:
crates/biome_yaml_parser/src/parser/parse_error.rs
📚 Learning: 2025-08-17T08:57:34.751Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-17T08:57:34.751Z
Learning: Applies to crates/biome_parser/xtask/codegen/*.ungram : Bogus nodes must be included as a variant in their corresponding union
Applied to files:
crates/biome_yaml_syntax/src/lib.rs
🧬 Code graph analysis (1)
crates/biome_yaml_parser/src/parser/block.rs (3)
crates/biome_yaml_parser/src/parser/flow.rs (3)
parse_any_flow_node
(18-26)recover
(155-161)recover
(213-219)crates/biome_yaml_parser/src/parser/parse_error.rs (1)
expected_header
(38-40)crates/biome_yaml_parser/src/parser/document.rs (2)
recover
(36-46)recover
(89-99)
🪛 YAMLlint (1.37.1)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
[error] 1-1: syntax error: expected a comment or a line break, but found 'b'
(syntax)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
[error] 1-1: syntax error: expected chomping or indentation indicators, but found '-'
(syntax)
🔇 Additional comments (25)
xtask/codegen/src/yaml_kinds_src.rs (3)
31-33
: Token split looks right.
INDENTATION_INDICATOR
as a literal token plusBLOCK_CONTENT_LITERAL
aligns with the lexer model described in the PR. No issues spotted.
79-84
: New AST nodes align with grammar changes.Header list, indicators, and content nodes give us nice hooks for recovery/diagnostics. Looks consistent with the “Bogus” strategy.
92-92
: Nice: Bogus node naming matches project conventions.
YAML_BOGUS_BLOCK_HEADER
adheres to the “Bogus*” guideline for enclosing syntax errors.crates/biome_yaml_parser/src/lexer/tests/block.rs (2)
514-534
: Header/comment interplay looks correct.Both literal and folded forms handled; whitespace before
#
retained. All good.
568-586
: Empty-content cases are spot on.Covers both zero-length and whitespace-only content for literal and folded styles. LGTM.
crates/biome_yaml_syntax/src/lib.rs (3)
33-38
: Bogus kind set looks right.Adding
YAML_BOGUS_BLOCK_HEADER
andYAML_BOGUS_FLOW_NODE
tois_bogus
aligns with the grammar changes.
45-46
: to_bogus() mapping for block headers is consistent with the union.
AnyYamlBlockHeader::can_cast
→YAML_BOGUS_BLOCK_HEADER
is the expected route for recovery.
69-71
: ConfirmNEWLINE
not being trivia is intentional.
is_trivia
excludesNEWLINE
, whileTriviaPieceKind
still maps it. If formatter/parser invariants elsewhere assumeNEWLINE
is trivia, consider adding it; otherwise, please confirm this divergence is deliberate.Would you like a repo scan to find call sites that rely on
SyntaxKind::is_trivia
including newlines?crates/biome_yaml_parser/src/lexer/mod.rs (7)
79-81
: Good: direct dispatch for block scalars.Routing
|
/>
to a dedicated path keeps the main loop readable.
159-162
: Trailing-trivia handling before FLOW_END — double-check behaviour.Switching to
consume_trivia(true)
ensures no newline slips between the value andFLOW_END
. Please verify parser expectations/tests for flows with trailing comments still pass under this stricter behaviour.
212-233
: Header lexing matches the PR’s permissive stance.Accepting multiple
+
/-
and digits is fine given a future lint will police it.
263-289
: Content lexing is sensible and lexer-only: +1.Treating content as a single
BLOCK_CONTENT_LITERAL
and deferring indent/chomp semantics to later phases keeps the lexer lean.
539-553
: Flow-scope breach caching makes sense.Caching closing tokens when breaching parent scope avoids interleaving trivia with flow delimiters. Good separation of concerns.
555-572
: Neat:consume_trivia(trailing)
clarifies intent.This simplifies callers and prevents accidental newline consumption where it matters.
574-595
: Confirm single invocation ofis_scalar_continuation()
per break-handling branch. All four call sites occur only whenis_break(current)
is true and are invoked exactly once each; invariants hold.xtask/codegen/yaml.ungram (6)
32-33
: Good: new bogus node adheres to conventions.
YamlBogusBlockHeader
is added and used in the appropriate union.
161-163
: Splitting block scalars into literal/folded nodes is the right move.This aligns AST with the spec and makes downstream processing clearer.
190-209
: Node shapes read well and mirror the spec.
headers: YamlBlockHeaderList
andcontent: YamlBlockContent
give us future wiggle room for linting without changing the tree.
251-258
: Header union covers all cases, including recovery.Ordering-agnostic, with explicit strip/keep/indent and bogus fallback — tidy.
272-273
: Token naming matches the lexer.
indentation_indicator
token aligns withINDENTATION_INDICATOR
.
280-281
: Single content token keeps the grammar simple.
block_content_literal
for both folded and literal is pragmatic.crates/biome_yaml_parser/src/parser/block.rs (4)
32-35
: Block scalar starts correctly recognisedRecognising
|
and>
here is spot on and keeps the dispatcher tidy. Nice one.
273-289
: Header list permits multiple indicators — ensure tests cover permutationsGiven the intentional permissiveness, please ensure tests cover duplicates and varied orders, e.g.
|+
,|2
,|+2
,|2+
,|+-2
, and the same for>
. This will guard AST shape and recovery behaviour.
295-305
: Header recovery strategy looks goodUsing
ParseRecoveryTokenSet
gated on content withexpected_header
is a sensible, low-noise choice for malformed headers.
326-332
: Good defensive checkUsing
expect
for block content is the right call; it yields a precise diagnostic when content is missing.
fn parse_indentation_indicator(p: &mut YamlParser) -> CompletedMarker { | ||
let m = p.start(); | ||
p.bump(INDENTATION_INDICATOR); | ||
m.complete(p, YAML_INDENTATION_INDICATOR) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Indentation indicator value range (1–9) — please verify
YAML 1.2 restricts the indentation indicator to 1–9. Ensure the lexer enforces this and add tests for |0
, >0
, |10
, >10
(expecting diagnostics).
🏁 Script executed:
#!/bin/bash
# Locate the lexer rule and existing tests
rg -nP -C3 'INDENTATION_INDICATOR|indentation\s+indicator' crates
rg -nP -n '\|0|>0|\|10|>10' crates/biome_yaml_parser/tests || true
Length of output: 19705
Add tests for >0
, |10
and >10
in invalid_headers.yaml
yaml_test_suite/err/block/invalid_headers.yaml covers |0
and multi-digit (123
) but lacks these out-of-range indicators—add tests expecting the “must be between '1' and '9'” diagnostic.
🤖 Prompt for AI Agents
In crates/biome_yaml_parser/src/parser/block.rs around lines 320 to 324, the
test suite is missing test cases for out-of-range indentation indicators; add
three entries to yaml_test_suite/err/block/invalid_headers.yaml covering the
indicators ">0", "|10", and ">10" and assert the parser emits the existing
diagnostic that the indentation indicator "must be between '1' and '9'"; ensure
each test includes the offending line, the expected diagnostic message and
location so the suite fails if the parser does not produce the "must be between
'1' and '9'" error for these inputs.
|
||
/// Lex block scalar header indicators, returning tokens and indent size | ||
fn consume_block_header_tokens(&mut self) -> LinkedList<LexToken> { | ||
let mut tokens = LinkedList::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time I see the use of a linked list in the code base. What's the rationale behind it? Why not a VecDeque
? https://doc.rust-lang.org/std/collections/struct.VecDeque.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all stems from how we are lexing the block mapping keys. The lexer first has to lex all tokens in the key up until the closest :
indicator, or a new line. Then, it inserts a MAPPING_START
or FLOW_START
token depending on whether the lexed tokens represent a mapping key or just a normal flow value. It looks something like this
- Entering potential block mapping:
lexed token: ['!!foo', '&anchor'].
- !!foo &anchor [a, b, c]: d ↑ [lexer's current position]
- Trying to disambiguate the construct:
lexed token: ['!!foo', '&anchor'].
- &anchor [a, b, c]: d ↑ [lexer's current position]
potentially mapping key tokens: [[
,a
,,
,b
,,
,c
,]
] - Encountered
:
. This is a mapping key:lexed token: ['!!foo', '&anchor'].- &anchor [a, b, c]: d ↑ [lexer's current position]
mapping key tokens: [MAPPING_START,[
,a
,,
,b
,,
,c
,]
] - Returning the tokens to the parser:
lexed token: ['!!foo', '&anchor', MAPPING_START,[
,a
,,
,b
,,
,c
,]
].
Since we have to move list of tokens around frequently, I think LinkedList is the most suitable here, since it is the container with the most performant list concatenation
That said, if we need to stay consistent with the rest of the code base, I don't think we would lose much performance switching back to VecDeque.
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap
Outdated
Show resolved
Hide resolved
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap
Outdated
Show resolved
Hide resolved
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
Show resolved
Hide resolved
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/biome_yaml_parser/src/lexer/mod.rs (1)
447-447
: Function renaming improves clarity.Renaming
is_literal_continuation
tois_scalar_continuation
andno_longer_flow
tobreach_parent_scope
better reflects their broader usage across different scalar types.Also applies to: 479-479, 518-518, 579-579, 592-592, 650-656
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml.snap
is excluded by!**/*.snap
and included by**
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml.snap
is excluded by!**/*.snap
and included by**
📒 Files selected for processing (3)
crates/biome_yaml_parser/src/lexer/mod.rs
(12 hunks)crates/biome_yaml_parser/src/lexer/tests/block.rs
(1 hunks)crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/biome_yaml_parser/src/lexer/tests/block.rs
- crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
🧰 Additional context used
📓 Path-based instructions (3)
crates/biome_*_{syntax,parser,formatter,analyze,factory,semantic}/**
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain the per-language crate structure: biome_{lang}_{syntax,parser,formatter,analyze,factory,semantic}
Files:
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_*/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place core crates under /crates/biome_*/
Files:
crates/biome_yaml_parser/src/lexer/mod.rs
**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Format all Rust source files before committing (just f)
Files:
crates/biome_yaml_parser/src/lexer/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: Bench (biome_module_graph)
- GitHub Check: Bench (biome_graphql_parser)
- GitHub Check: Bench (biome_package)
- GitHub Check: Bench (biome_configuration)
- GitHub Check: Bench (biome_graphql_formatter)
- GitHub Check: Bench (biome_json_analyze)
- GitHub Check: Bench (biome_html_formatter)
- GitHub Check: Bench (biome_css_formatter)
- GitHub Check: Bench (biome_js_parser)
- GitHub Check: Bench (biome_html_parser)
- GitHub Check: Bench (biome_css_parser)
- GitHub Check: Bench (biome_css_analyze)
- GitHub Check: Bench (biome_json_parser)
- GitHub Check: Bench (biome_js_analyze)
- GitHub Check: Bench (biome_js_formatter)
- GitHub Check: autofix
- GitHub Check: Bench (biome_json_formatter)
- GitHub Check: Documentation
- GitHub Check: Test Node.js API
- GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
- GitHub Check: Parser conformance
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: Check Dependencies
- GitHub Check: Test (depot-windows-2022-16)
🔇 Additional comments (3)
crates/biome_yaml_parser/src/lexer/mod.rs (3)
268-293
: Smart approach to block content lexing.The decision to ignore declared indentation values during lexing (lines 271-275) is pragmatic - it simplifies the lexer whilst allowing syntax analysis rules to catch indentation mismatches later. This provides better diagnostic granularity.
159-161
: Trivia handling refactor improves consistency.The new
trailing
parameter inconsume_trivia
allows fine-grained control over newline handling. The usage pattern (trailing=true for flow ends, trailing=false for block evaluation) makes sense for the respective contexts.Also applies to: 298-298, 549-549, 560-568
641-641
: Minor: Capitalisation adjustment.Good consistency improvement - "Unexpected character" with capital U aligns with standard error message formatting.
Summary
Parse block scalars constructs as specified by Block Scalar Styles
In order to keep the grammar simple, this implementation allows multiple indicators to be specified instead of at most one indentation indicator and at most one chomping indicator. We can have a syntax lint rule to flag this behavior later.
This implementation also ignores the value of indentation indicator, since it's not required for lexing the block content. A syntax lint rule can then be used to flag discrepancies in block content's indentation.
Test Plan
Added new test cases
Docs
N/A