feat(yaml_parser): parse block scalars #7417

l0ngvh · 2025-09-06T05:56:14Z

Summary

Parse block scalars constructs as specified by Block Scalar Styles

- >+ # folded style
  Block content
- |2 # literal style

In order to keep the grammar simple, this implementation allows multiple indicators to be specified instead of at most one indentation indicator and at most one chomping indicator. We can have a syntax lint rule to flag this behavior later.

This implementation also ignores the value of indentation indicator, since it's not required for lexing the block content. A syntax lint rule can then be used to flag discrepancies in block content's indentation.

Test Plan

Added new test cases

Docs

N/A

changeset-bot · 2025-09-06T05:56:18Z

⚠️ No Changeset found

Latest commit: 27018bf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2025-09-06T05:56:20Z

Walkthrough

The PR implements block scalar support across lexer, parser, syntax kinds and codegen. The lexer recognises | and >, parses block headers (chomping and indentation indicators), emits new tokens (INDENTATION_INDICATOR, BLOCK_CONTENT_LITERAL), centralises trivia handling and refactors scope/continuation logic. The parser adds literal/folded scalar paths, a BlockHeaderList with recovery and header parsing helpers, plus parse diagnostics (expected_header). Syntax kinds and codegen are updated for header/content nodes and bogus header variants. New tests and YAML fixtures cover valid and invalid block-scalar cases.

Possibly related PRs

fix(yaml_parser): tokens out of order #7168 — Modifies the YAML lexer’s scope evaluation, cached closing token logic and consume_trivia/continuation behaviour; overlaps strongly with this PR’s lexer refactor.

Suggested reviewers

ematipico

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly summarises the primary change by indicating that the YAML parser now supports block scalar parsing, and the conventional commit prefix keeps it concise and contextual.
Description Check	✅ Passed	The description outlines the implementation of block scalar parsing in line with the YAML spec, explains design decisions around indicators, and notes the addition of tests, all of which directly relate to the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codspeed-hq · 2025-09-06T06:09:36Z

CodSpeed Performance Report

Merging #7417 will not alter performance

_{Comparing l0ngvh:block-header (27018bf) with main (ea585a9)¹}

Summary

✅ 133 untouched benchmarks

No successful run was found on main (5d212c5) during the generation of this report, so ea585a9 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (18)

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml (1)

1-1: Good negative case; consider folded counterpart too.

Add a matching folded-scalar example (e.g., > bb a) to ensure both styles report the same diagnostic.
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml (1)
1-5: Solid coverage; add folded and multi-digit edge cases.

To mirror literal tests and stress recovery, add:

- >+1-2 (folded + multiple indicators)

- >123 (invalid multi-digit indent)
 - |0
+
+- >+1-2
+
+- >123
xtask/codegen/src/yaml_kinds_src.rs (1)
13-15: Confirm PLUS is scoped to block-header lexing.

If emitted in general contexts, a+b risks fragmenting into PLAIN_LITERAL("a") PLUS PLAIN_LITERAL("b"). Please ensure the lexer keeps + inside PLAIN_LITERAL outside block headers, and add a test to lock this in.

Example test to add (in lexer tests):
#[test]
fn lex_plus_in_plain_scalar() {
    assert_lex!(
        "a+b: 1",
        MAPPING_START:0,
        PLAIN_LITERAL:3,
        COLON:1,
        WHITESPACE:1,
        FLOW_START:0,
        PLAIN_LITERAL:1,
        FLOW_END:0,
        MAPPING_END:0,
    );
}
crates/biome_yaml_parser/src/lexer/tests/block.rs (5)
462-469: Great sanity check; add folded twin for symmetry.
 #[test]
 fn lex_basic_block_scalar() {
     assert_lex!(
         r"|
     hello",
         PIPE:1,
         BLOCK_CONTENT_LITERAL:10,
     );
 }
+
+#[test]
+fn lex_basic_folded_block_scalar() {
+    assert_lex!(
+        r">
+    hello",
+        R_ANGLE:1,
+        BLOCK_CONTENT_LITERAL:10,
+    );
+}
471-486: Chomping indicators covered for |; please add > too.
 #[test]
 fn lex_block_scalar_with_chomping_indicator() {
     assert_lex!(
         "|+",
         PIPE:1,
         PLUS:1,
         BLOCK_CONTENT_LITERAL:0,
     );
 
     assert_lex!(
         "|-",
         PIPE:1,
         DASH:1,
         BLOCK_CONTENT_LITERAL:0,
     );
+
+    assert_lex!(
+        ">-",
+        R_ANGLE:1,
+        DASH:1,
+        BLOCK_CONTENT_LITERAL:0,
+    );
+
+    assert_lex!(
+        ">+",
+        R_ANGLE:1,
+        PLUS:1,
+        BLOCK_CONTENT_LITERAL:0,
+    );
 }
488-512: Nice combos; add a folded variant and a two-digit indent to document permissiveness.
     assert_lex!(
         r"|2
     content",
         PIPE:1,
         INDENTATION_INDICATOR:1,
         BLOCK_CONTENT_LITERAL:12,
     );
+    assert_lex!(
+        r">2
+    content",
+        R_ANGLE:1,
+        INDENTATION_INDICATOR:1,
+        BLOCK_CONTENT_LITERAL:12,
+    );
+    // Allowed by lexer; validated later by syntax-lint
+    assert_lex!(
+        "|10",
+        PIPE:1,
+        INDENTATION_INDICATOR:2,
+        BLOCK_CONTENT_LITERAL:0,
+    );
536-566: Good error coverage; add a folded “junk header” for parity.
     assert_lex!(
         "| bc d
     content",
         PIPE:1,
         WHITESPACE:1,
         ERROR_TOKEN:1,
         ERROR_TOKEN:1,
         ERROR_TOKEN:1,
         ERROR_TOKEN:1,
         BLOCK_CONTENT_LITERAL:12,
     );
+
+    assert_lex!(
+        "> bb a
+    content",
+        R_ANGLE:1,
+        WHITESPACE:1,
+        ERROR_TOKEN:1,
+        ERROR_TOKEN:1,
+        WHITESPACE:1,
+        ERROR_TOKEN:1,
+        BLOCK_CONTENT_LITERAL:12,
+    );
588-598: Mixed indentation test is helpful; consider adding an explicit indent-indicator variant.
 #[test]
 fn lex_block_scalar_mixed_indentation() {
     assert_lex!(
         r"|
   line1
     indented
  line2",
         PIPE:1,
         BLOCK_CONTENT_LITERAL:28,
     );
+
+    assert_lex!(
+        r"|2
+  line1
+    indented
+ line2",
+        PIPE:1,
+        INDENTATION_INDICATOR:1,
+        BLOCK_CONTENT_LITERAL:28,
+    );
 }
crates/biome_yaml_parser/src/parser/parse_error.rs (1)
38-40: Clarify diagnostic wording: “block scalar header”.

Slightly more precise and matches the feature’s terminology.
-pub(crate) fn expected_header(p: &YamlParser, range: TextRange) -> ParseDiagnostic {
-    expected_node("block header", range, p)
+pub(crate) fn expected_header(p: &YamlParser, range: TextRange) -> ParseDiagnostic {
+    expected_node("block scalar header", range, p)
 }
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml (1)

1-7: Solid coverage; consider adding a header that combines both indicators.

You already cover chomping-only (|-), keep with empty content (|+), and indent-only (|1-). To exercise the “multiple indicators permitted” behaviour this PR adopts, add one case that combines both chomping and indent (e.g. |+2) with a line of content.

crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml (1)

1-7: Nice spread; add a mixed-order indicators case.

Folded covers >- with content, >+ empty, and >-4. Consider one where the indentation indicator precedes the chomping indicator (e.g. >4+) to lock in permissive ordering.
crates/biome_yaml_parser/src/lexer/mod.rs (2)
175-210: Coalesce header-line garbage into a single diagnostic.

Current loop emits a chain of ERROR_TOKENs for every non-break after the header. A single diagnostic spanning the range is clearer and cheaper.

Apply this diff:
-        // So that the block content can cleanly start at a newline
-        while let Some(c) = self.current_byte() {
-            if is_break(c) {
-                break;
-            }
-            tokens.push_back(self.consume_unexpected_token());
-        }
+        // Expect a line break after the header; coalesce any extras into one diagnostic
+        if self.current_byte().is_some_and(|c| !is_break(c)) {
+            let start = self.text_position();
+            while let Some(c) = self.current_byte() {
+                if is_break(c) {
+                    break;
+                }
+                self.advance(1);
+            }
+            self.push_diagnostic(ParseDiagnostic::new(
+                "Expected a line break after the block scalar header",
+                start..self.text_position(),
+            ));
+        }
234-261: Tighten the indentation-indicator diagnostic.

Small copy tweak and a hint make the error self-explanatory, especially for multi-digit values.
-            let err = ParseDiagnostic::new(
-                "Block scalar indentation indicator must be between '1' and '9'",
-                start_pos..self.text_position(),
-            );
+            let err = ParseDiagnostic::new(
+                "Block scalar indentation indicator must be a single digit between '1' and '9'",
+                start_pos..self.text_position(),
+            )
+            .with_hint("YAML allows at most one indentation indicator; multi-digit values like '10' are invalid.");
crates/biome_yaml_parser/src/parser/block.rs (5)
15-15: Prefer consistent import path for flow helpers

You’re mixing crate::parser and super:: for flow imports. For consistency with nearby imports, pull parse_any_flow_node via super::flow.
-use crate::parser::{flow::parse_any_flow_node, parse_error::expected_header};
+use super::flow::parse_any_flow_node;
+use crate::parser::parse_error::expected_header;
Alternatively, move expected_header into the existing super::parse_error::{ ... } list. Happy either way.

257-271: DRY up literal/folded scalar parsing

Both functions are identical except for the start token and node kind. Collapse into a single helper for maintainability.
-fn parse_literal_scalar(p: &mut YamlParser) -> CompletedMarker {
-    let m = p.start();
-    p.bump(T![|]);
-    BlockHeaderList.parse_list(p);
-    parse_block_content(p);
-    m.complete(p, YAML_LITERAL_SCALAR)
-}
-
-fn parse_folded_scalar(p: &mut YamlParser) -> CompletedMarker {
-    let m = p.start();
-    p.bump(T![>]);
-    BlockHeaderList.parse_list(p);
-    parse_block_content(p);
-    m.complete(p, YAML_FOLDED_SCALAR)
-}
+fn parse_literal_scalar(p: &mut YamlParser) -> CompletedMarker {
+    parse_block_scalar(p, T![|], YAML_LITERAL_SCALAR)
+}
+
+fn parse_folded_scalar(p: &mut YamlParser) -> CompletedMarker {
+    parse_block_scalar(p, T![>], YAML_FOLDED_SCALAR)
+}
Add this helper (placement anywhere nearby is fine):
fn parse_block_scalar(
    p: &mut YamlParser,
    start_token: YamlSyntaxKind,
    node_kind: YamlSyntaxKind,
) -> CompletedMarker {
    let m = p.start();
    p.bump(start_token);
    BlockHeaderList.parse_list(p);
    parse_block_content(p);
    m.complete(p, node_kind)
}
308-318: Tiny duplication in chomping indicator parsers

parse_strip_indicator and parse_keep_indicator are identical bar token/kind. Optional: a small helper or macro would remove duplication.

335-335: Nit: extract a helper for scalar starts

For readability and reuse, consider is_at_block_scalar_start(p) and call it here.

291-294: No parser change needed for folded scalars
The lexer emits BLOCK_CONTENT_LITERAL for both | and >, so p.at/expect(BLOCK_CONTENT_LITERAL) handles folded content correctly. Optionally rename BLOCK_CONTENT_LITERAL to BLOCK_CONTENT in src/lexer/mod.rs for clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53ff5ae and 80aa600.

⛔ Files ignored due to path filters (10)

crates/biome_yaml_factory/src/generated/node_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_yaml_factory/src/generated/syntax_factory.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_syntax/src/generated/kind.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_yaml_syntax/src/generated/macros.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_yaml_syntax/src/generated/nodes.rs is excluded by !**/generated/**, !**/generated/** and included by **
crates/biome_yaml_syntax/src/generated/nodes_mut.rs is excluded by !**/generated/**, !**/generated/** and included by **

📒 Files selected for processing (11)

crates/biome_yaml_parser/src/lexer/mod.rs (11 hunks)
crates/biome_yaml_parser/src/lexer/tests/block.rs (1 hunks)
crates/biome_yaml_parser/src/parser/block.rs (3 hunks)
crates/biome_yaml_parser/src/parser/parse_error.rs (1 hunks)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml (1 hunks)
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml (1 hunks)
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml (1 hunks)
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml (1 hunks)
crates/biome_yaml_syntax/src/lib.rs (1 hunks)
xtask/codegen/src/yaml_kinds_src.rs (4 hunks)
xtask/codegen/yaml.ungram (4 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

crates/biome_*_{syntax,parser,formatter,analyze,factory,semantic}/**

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain the per-language crate structure: biome_{lang}_{syntax,parser,formatter,analyze,factory,semantic}

Files:

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs

crates/biome_*/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place core crates under /crates/biome_*/

Files:

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs

**/tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place test files under a tests/ directory in each crate

Files:

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/literal_scalar.yaml

**/*.{rs,toml}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format Rust and TOML files before committing (use just f/just format).

Files:

xtask/codegen/src/yaml_kinds_src.rs
crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/src/parser/parse_error.rs
crates/biome_yaml_parser/src/lexer/mod.rs
crates/biome_yaml_parser/src/parser/block.rs
crates/biome_yaml_syntax/src/lib.rs

xtask/codegen/*.ungram

📄 CodeRabbit inference engine (CLAUDE.md)

Define and modify language grammars in .ungram files; ASTs are generated from these

Files:

xtask/codegen/yaml.ungram

🧠 Learnings (4)

📚 Learning: 2025-08-17T08:57:34.751Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-17T08:57:34.751Z
Learning: Applies to crates/biome_parser/xtask/codegen/*.ungram : Nodes representing enclosing syntax errors must include the word Bogus (e.g., HtmlBogusAttribute)

Applied to files:

xtask/codegen/src/yaml_kinds_src.rs
crates/biome_yaml_syntax/src/lib.rs

📚 Learning: 2025-08-11T11:46:05.836Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_diagnostics/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:46:05.836Z
Learning: Applies to crates/biome_diagnostics/**/*.rs : Use helper advice types from biome_diagnostics::v2 (CodeFrameAdvice, CommandAdvice, DiffAdvice, LogAdvice) when suitable

Applied to files:

crates/biome_yaml_parser/src/parser/parse_error.rs

📚 Learning: 2025-08-11T11:46:05.836Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_diagnostics/CONTRIBUTING.md:0-0
Timestamp: 2025-08-11T11:46:05.836Z
Learning: Applies to crates/biome_diagnostics/**/*.rs : Types implementing Diagnostic must also implement Debug (e.g., use #[derive(Debug, Diagnostic)])

Applied to files:

crates/biome_yaml_parser/src/parser/parse_error.rs

📚 Learning: 2025-08-17T08:57:34.751Z

Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-08-17T08:57:34.751Z
Learning: Applies to crates/biome_parser/xtask/codegen/*.ungram : Bogus nodes must be included as a variant in their corresponding union

Applied to files:

crates/biome_yaml_syntax/src/lib.rs

🧬 Code graph analysis (1)

crates/biome_yaml_parser/src/parser/block.rs (3)

crates/biome_yaml_parser/src/parser/flow.rs (3)

parse_any_flow_node (18-26)

recover (155-161)

recover (213-219)

crates/biome_yaml_parser/src/parser/parse_error.rs (1)

expected_header (38-40)

crates/biome_yaml_parser/src/parser/document.rs (2)

recover (36-46)

recover (89-99)

🪛 YAMLlint (1.37.1)

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml

[error] 1-1: syntax error: expected a comment or a line break, but found 'b'

(syntax)

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml

[error] 1-1: syntax error: expected chomping or indentation indicators, but found '-'

(syntax)

🔇 Additional comments (25)

xtask/codegen/src/yaml_kinds_src.rs (3)

31-33: Token split looks right.

INDENTATION_INDICATOR as a literal token plus BLOCK_CONTENT_LITERAL aligns with the lexer model described in the PR. No issues spotted.

79-84: New AST nodes align with grammar changes.

Header list, indicators, and content nodes give us nice hooks for recovery/diagnostics. Looks consistent with the “Bogus” strategy.

92-92: Nice: Bogus node naming matches project conventions.

YAML_BOGUS_BLOCK_HEADER adheres to the “Bogus*” guideline for enclosing syntax errors.

crates/biome_yaml_parser/src/lexer/tests/block.rs (2)

514-534: Header/comment interplay looks correct.

Both literal and folded forms handled; whitespace before # retained. All good.

568-586: Empty-content cases are spot on.

Covers both zero-length and whitespace-only content for literal and folded styles. LGTM.

crates/biome_yaml_syntax/src/lib.rs (3)

33-38: Bogus kind set looks right.

Adding YAML_BOGUS_BLOCK_HEADER and YAML_BOGUS_FLOW_NODE to is_bogus aligns with the grammar changes.

45-46: to_bogus() mapping for block headers is consistent with the union.

AnyYamlBlockHeader::can_cast → YAML_BOGUS_BLOCK_HEADER is the expected route for recovery.

69-71: Confirm NEWLINE not being trivia is intentional.

is_trivia excludes NEWLINE, while TriviaPieceKind still maps it. If formatter/parser invariants elsewhere assume NEWLINE is trivia, consider adding it; otherwise, please confirm this divergence is deliberate.

Would you like a repo scan to find call sites that rely on SyntaxKind::is_trivia including newlines?

crates/biome_yaml_parser/src/lexer/mod.rs (7)

79-81: Good: direct dispatch for block scalars.

Routing |/> to a dedicated path keeps the main loop readable.

159-162: Trailing-trivia handling before FLOW_END — double-check behaviour.

Switching to consume_trivia(true) ensures no newline slips between the value and FLOW_END. Please verify parser expectations/tests for flows with trailing comments still pass under this stricter behaviour.

212-233: Header lexing matches the PR’s permissive stance.

Accepting multiple +/- and digits is fine given a future lint will police it.

263-289: Content lexing is sensible and lexer-only: +1.

Treating content as a single BLOCK_CONTENT_LITERAL and deferring indent/chomp semantics to later phases keeps the lexer lean.

539-553: Flow-scope breach caching makes sense.

Caching closing tokens when breaching parent scope avoids interleaving trivia with flow delimiters. Good separation of concerns.

555-572: Neat: consume_trivia(trailing) clarifies intent.

This simplifies callers and prevents accidental newline consumption where it matters.

574-595: Confirm single invocation of is_scalar_continuation() per break-handling branch. All four call sites occur only when is_break(current) is true and are invoked exactly once each; invariants hold.

xtask/codegen/yaml.ungram (6)

32-33: Good: new bogus node adheres to conventions.

YamlBogusBlockHeader is added and used in the appropriate union.

161-163: Splitting block scalars into literal/folded nodes is the right move.

This aligns AST with the spec and makes downstream processing clearer.

190-209: Node shapes read well and mirror the spec.

headers: YamlBlockHeaderList and content: YamlBlockContent give us future wiggle room for linting without changing the tree.

251-258: Header union covers all cases, including recovery.

Ordering-agnostic, with explicit strip/keep/indent and bogus fallback — tidy.

272-273: Token naming matches the lexer.

indentation_indicator token aligns with INDENTATION_INDICATOR.

280-281: Single content token keeps the grammar simple.

block_content_literal for both folded and literal is pragmatic.

crates/biome_yaml_parser/src/parser/block.rs (4)

32-35: Block scalar starts correctly recognised

Recognising | and > here is spot on and keeps the dispatcher tidy. Nice one.

273-289: Header list permits multiple indicators — ensure tests cover permutations

Given the intentional permissiveness, please ensure tests cover duplicates and varied orders, e.g. |+, |2, |+2, |2+, |+-2, and the same for >. This will guard AST shape and recovery behaviour.

295-305: Header recovery strategy looks good

Using ParseRecoveryTokenSet gated on content with expected_header is a sensible, low-noise choice for malformed headers.

326-332: Good defensive check

Using expect for block content is the right call; it yields a precise diagnostic when content is missing.

coderabbitai · 2025-09-06T13:22:36Z

crates/biome_yaml_parser/src/parser/block.rs

+fn parse_indentation_indicator(p: &mut YamlParser) -> CompletedMarker {
+    let m = p.start();
+    p.bump(INDENTATION_INDICATOR);
+    m.complete(p, YAML_INDENTATION_INDICATOR)
+}


💡 Verification agent

🧩 Analysis chain

Indentation indicator value range (1–9) — please verify

YAML 1.2 restricts the indentation indicator to 1–9. Ensure the lexer enforces this and add tests for |0, >0, |10, >10 (expecting diagnostics).

🏁 Script executed:

#!/bin/bash # Locate the lexer rule and existing tests rg -nP -C3 'INDENTATION_INDICATOR|indentation\s+indicator' crates rg -nP -n '\|0|>0|\|10|>10' crates/biome_yaml_parser/tests || true

Length of output: 19705

Add tests for >0, |10 and >10 in invalid_headers.yaml
yaml_test_suite/err/block/invalid_headers.yaml covers |0 and multi-digit (123) but lacks these out-of-range indicators—add tests expecting the “must be between '1' and '9'” diagnostic.

🤖 Prompt for AI Agents

In crates/biome_yaml_parser/src/parser/block.rs around lines 320 to 324, the test suite is missing test cases for out-of-range indentation indicators; add three entries to yaml_test_suite/err/block/invalid_headers.yaml covering the indicators ">0", "|10", and ">10" and assert the parser emits the existing diagnostic that the indentation indicator "must be between '1' and '9'"; ensure each test includes the offending line, the expected diagnostic message and location so the suite fails if the parser does not produce the "must be between '1' and '9'" error for these inputs.

ematipico · 2025-09-07T05:34:38Z

crates/biome_yaml_parser/src/lexer/mod.rs

+
+    /// Lex block scalar header indicators, returning tokens and indent size
+    fn consume_block_header_tokens(&mut self) -> LinkedList<LexToken> {
+        let mut tokens = LinkedList::new();


This is the first time I see the use of a linked list in the code base. What's the rationale behind it? Why not a VecDeque? https://doc.rust-lang.org/std/collections/struct.VecDeque.html

This all stems from how we are lexing the block mapping keys. The lexer first has to lex all tokens in the key up until the closest : indicator, or a new line. Then, it inserts a MAPPING_START or FLOW_START token depending on whether the lexed tokens represent a mapping key or just a normal flow value. It looks something like this

Entering potential block mapping:
- !!foo &anchor [a, b, c]: d ↑ [lexer's current position]
lexed token: ['!!foo', '&anchor'].

Trying to disambiguate the construct:
- &anchor [a, b, c]: d ↑ [lexer's current position]
lexed token: ['!!foo', '&anchor'].
potentially mapping key tokens: [[, a, , , , b, , , , c, ]]

Encountered :. This is a mapping key:
- &anchor [a, b, c]: d ↑ [lexer's current position]
lexed token: ['!!foo', '&anchor'].
mapping key tokens: [MAPPING_START, [, a, , , , b, , , , c, ]]

Returning the tokens to the parser:
lexed token: ['!!foo', '&anchor', MAPPING_START, [, a, , , , b, , , , c, ]].

Since we have to move list of tokens around frequently, I think LinkedList is the most suitable here, since it is the container with the most performant list concatenation

That said, if we need to stay consistent with the rest of the code base, I don't think we would lose much performance switching back to VecDeque.

crates/biome_yaml_parser/src/lexer/mod.rs

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap

crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

crates/biome_yaml_parser/src/lexer/mod.rs (1)

447-447: Function renaming improves clarity.

Renaming is_literal_continuation to is_scalar_continuation and no_longer_flow to breach_parent_scope better reflects their broader usage across different scalar types.

Also applies to: 479-479, 518-518, 579-579, 592-592, 650-656

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80aa600 and 27018bf.

⛔ Files ignored due to path filters (4)

crates/biome_yaml_parser/tests/yaml_test_suite/err/block/extra_characters_in_header.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/err/block/invalid_headers.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/err/flow/plain_start_with_indicator.yaml.snap is excluded by !**/*.snap and included by **
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml.snap is excluded by !**/*.snap and included by **

📒 Files selected for processing (3)

crates/biome_yaml_parser/src/lexer/mod.rs (12 hunks)
crates/biome_yaml_parser/src/lexer/tests/block.rs (1 hunks)
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

crates/biome_yaml_parser/src/lexer/tests/block.rs
crates/biome_yaml_parser/tests/yaml_test_suite/ok/block/scalar/folded_scalar.yaml

🧰 Additional context used

📓 Path-based instructions (3)

crates/biome_*_{syntax,parser,formatter,analyze,factory,semantic}/**

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain the per-language crate structure: biome_{lang}_{syntax,parser,formatter,analyze,factory,semantic}

Files:

crates/biome_yaml_parser/src/lexer/mod.rs

crates/biome_*/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place core crates under /crates/biome_*/

Files:

crates/biome_yaml_parser/src/lexer/mod.rs

**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format all Rust source files before committing (just f)

Files:

crates/biome_yaml_parser/src/lexer/mod.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)

GitHub Check: Bench (biome_module_graph)
GitHub Check: Bench (biome_graphql_parser)
GitHub Check: Bench (biome_package)
GitHub Check: Bench (biome_configuration)
GitHub Check: Bench (biome_graphql_formatter)
GitHub Check: Bench (biome_json_analyze)
GitHub Check: Bench (biome_html_formatter)
GitHub Check: Bench (biome_css_formatter)
GitHub Check: Bench (biome_js_parser)
GitHub Check: Bench (biome_html_parser)
GitHub Check: Bench (biome_css_parser)
GitHub Check: Bench (biome_css_analyze)
GitHub Check: Bench (biome_json_parser)
GitHub Check: Bench (biome_js_analyze)
GitHub Check: Bench (biome_js_formatter)
GitHub Check: autofix
GitHub Check: Bench (biome_json_formatter)
GitHub Check: Documentation
GitHub Check: Test Node.js API
GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
GitHub Check: Parser conformance
GitHub Check: Test (depot-ubuntu-24.04-arm-16)
GitHub Check: Lint project (depot-windows-2022)
GitHub Check: Check Dependencies
GitHub Check: Test (depot-windows-2022-16)

🔇 Additional comments (3)

crates/biome_yaml_parser/src/lexer/mod.rs (3)

268-293: Smart approach to block content lexing.

The decision to ignore declared indentation values during lexing (lines 271-275) is pragmatic - it simplifies the lexer whilst allowing syntax analysis rules to catch indentation mismatches later. This provides better diagnostic granularity.

159-161: Trivia handling refactor improves consistency.

The new trailing parameter in consume_trivia allows fine-grained control over newline handling. The usage pattern (trailing=true for flow ends, trailing=false for block evaluation) makes sense for the respective contexts.

Also applies to: 298-298, 549-549, 560-568

641-641: Minor: Capitalisation adjustment.

Good consistency improvement - "Unexpected character" with capital U aligns with standard error message formatting.

feat(yaml_parser): parse block headers

80aa600

github-actions bot added A-Parser Area: parser A-Tooling Area: internal tools labels Sep 6, 2025

l0ngvh requested review from a team September 6, 2025 13:14

l0ngvh marked this pull request as ready for review September 6, 2025 13:14

coderabbitai bot reviewed Sep 6, 2025

View reviewed changes

ematipico reviewed Sep 7, 2025

View reviewed changes

l0ngvh added 4 commits September 10, 2025 21:43

feat: simplify indentation indicator message

156ba4e

feat(yaml_parser): Add test for comment inside block scalar

6b264ef

fix(yaml_parser): emit error token on malformed indentatio indicator

c7bce4b

fix(yaml_parser): better unexpected character diagnostic message

27018bf

coderabbitai bot reviewed Sep 10, 2025

View reviewed changes

l0ngvh requested a review from ematipico September 10, 2025 13:30

ematipico approved these changes Sep 11, 2025

View reviewed changes

l0ngvh merged commit 34136fd into biomejs:main Sep 11, 2025
30 checks passed

kedevked pushed a commit to kedevked/biome that referenced this pull request Sep 22, 2025

feat(yaml_parser): parse block scalars (biomejs#7417)

783b6ee

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(yaml_parser): parse block scalars #7417

feat(yaml_parser): parse block scalars #7417

Uh oh!

l0ngvh commented Sep 6, 2025

Uh oh!

changeset-bot bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Sep 6, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 6, 2025

Uh oh!

ematipico Sep 7, 2025

Uh oh!

l0ngvh Sep 7, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

feat(yaml_parser): parse block scalars #7417

feat(yaml_parser): parse block scalars #7417

Uh oh!

Conversation

l0ngvh commented Sep 6, 2025

Summary

Test Plan

Docs

Uh oh!

changeset-bot bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Pre-merge checks (3 passed)

Uh oh!

codspeed-hq bot commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #7417 will not alter performance

Summary

Footnotes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

ematipico Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

l0ngvh Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot bot commented Sep 6, 2025 •

edited

Loading

coderabbitai bot commented Sep 6, 2025 •

edited

Loading

codspeed-hq bot commented Sep 6, 2025 •

edited

Loading