+
Skip to content

Semi rewrite #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 56 commits into
base: master
Choose a base branch
from
Draft

Semi rewrite #43

wants to merge 56 commits into from

Conversation

morr0ne
Copy link
Member

@morr0ne morr0ne commented Jun 15, 2025

This PR is a major rewrite of the parser that comes with many improvements and changes. At its core, the main change is the underlying technology.

Instead of using a Pest grammar which is hard to maintain and read, the new parser separates the steps into a proper lexer using Logos and a simple but very efficient parser using LALRPOP. This gets turned into an AST which also allows for more efficient and streamlined implementation of the deserializer.

The parser is largely backwards compatible except where the old parser was wrong. The new parser should be more correct, and some tests have been updated to reflect that. The value enum has also been updated to be more ergonomic and easier to use.

There are a few breaking changes that improve the syntax and were agreed upon beforehand:

  • Stricter parsing of unquoted keys, most notably internal quotes are no longer allowed
  • Less compact syntax, stuff like truefalse won't work anymore
  • String interpolation now uses a syntax similar to the one used in Rust: "$var""${var}"
  • Unicode escapes also use a new syntax that allows for more Unicode planes to be used: "\u{FFFD}"

Enhanced Integer Parsing Support:

  • Added octal integer support: New 0o prefix syntax (e.g., 0o755 → 493, 0o123 → 83)
  • Added binary integer support: New 0b prefix syntax (e.g., 0b1010 → 10, 0b1111 → 15)
  • All integer formats support:
    • Negative values: -0xABC, -0o755, -0b1100
    • Underscore separators for readability: 1_000_000, 0xFA_FA_FA, 0o12_34_56, 0b1010_1100
Format Prefix Example Result
Decimal none 42, 1_000_000 42, 1000000
Hexadecimal 0x 0xFF, 0xFA_FA 255, 64250
Octal 0o 0o755, 0o12_34 493, 668
Binary 0b 0b1010, 0b10_11 10, 11

There might be other changes that are more of a result of the older parser not properly respecting the spec at times.

The changes should address all the concerns highlighted in #42.

There are other changes planned and more extensive error handling which will be implemented in other PRs. The new code should allow much easier extensibility.

Lua is fully working, the CLI has been updated too. WASM support is now fully working and fixed.

Overall, the new code is more readable, maintainable, correct, more efficient and possibly even faster, although that wasn't the goal of this rewrite.

Resolves #42

@JakeStanger
Copy link
Collaborator

Does this superseed #41 btw? If so we can close that

@morr0ne
Copy link
Member Author

morr0ne commented Jun 22, 2025

Does this superseed #41 btw? If so we can close that

Not necessarly, this is a different approach to the rewrite. I still think there could be value in a streaming parser, especially for big configs. I am gonna rebase that one and possibly work on it in the future once this is merged

@morr0ne morr0ne marked this pull request as ready for review June 22, 2025 23:13
@morr0ne morr0ne marked this pull request as draft June 23, 2025 08:38
Copy link
Collaborator

@JakeStanger JakeStanger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall impl looks great, very clean code. A few areas that do need attention, but they are all minor stuff really

@morr0ne morr0ne requested a review from JakeStanger June 24, 2025 14:46
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to update the spec to include the additional features before merge. Obvs gonna rework that at some point, but for now changes can just go on a v0.11 branch.

(Same is true of the interpolation/unicode syntax changes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a dedicated spec repository or include the spec in here? Because the current one lives in the website afaik.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon its own repo. While I want to keep this the reference implementation I think it makes sense to have the spec elsewhere, mainly so it can be managed separately (ie issues/PRs are directly against spec only)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@JakeStanger JakeStanger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code all LGTM as it stands. Will await the final changes and then fully re-review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Issues with the full specification and pest code
2 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载