+
Skip to content

Conversation

arendjr
Copy link
Contributor

@arendjr arendjr commented Jul 23, 2025

Summary

Fixed #6965: Implemented smarter scanner for project rules.

Previously, if project rules were enabled Biome's scanner would scan all dependencies regardless of whether they were used by/reachable from source files or not. While this worked for a first version, it was far from optimal.

The new scanner first scans everything listed under the files.includes setting, and then descends into the dependencies that were discovered there, including transitive dependencies. This has three main advantages:

  • Dependencies that are not reachable from your source files don't get indexed.
  • Dependencies that have multiple type definitions, such as those with separate definitions for CommonJS and ESM imports, only have the relevant definitions indexed.
  • If vcs.useIgnoreFile is enabled, .gitignore gets respected as well. Assuming you have folders such as build/ or dist/ configured there, those will be automatically ignored by the scanner.

In addition, the following refactorings have been made:

  • The scanner has been moved out of the workspace module. The watcher has become a proper submodule of the scanner.
  • Both scanner and watcher have "bridge" traits now. They no longer take a reference to the workspace directly, and they don't share methods with the workspace anymore either. Instead, they take a reference to a bridge implementation which allows their communication with the workspace. The scanner bridge trait is still implemented directly on the WorkspaceServer (and the watcher bridge trait on top of that), so there's no/hardly any actual overhead, but it provides a much cleaner separation of responsibilities. As an added benefit, the traits allow for much easier unit testing of the scanner and watcher in isolation, which I've done for the watcher already.
  • is_opened_by_scanner has been renamed to is_indexed. The scanner and watcher also have their terminology changed so they no longer open/close a file, but they index/unload it. This is to highlight the differences between the scanner behaviour when they load a file and the behaviour when a client opens a file.

Test Plan

Tests added and expanded. It will take me a bit longer to get everything green already, but I thought it was ready for review at least.

Docs

biomejs/website#2853

@arendjr arendjr requested review from a team July 23, 2025 18:34
Copy link

changeset-bot bot commented Jul 23, 2025

🦋 Changeset detected

Latest commit: a0bd572

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
@biomejs/biome Minor
@biomejs/wasm-bundler Minor
@biomejs/wasm-nodejs Minor
@biomejs/wasm-web Minor
@biomejs/js-api Major
@biomejs/cli-win32-x64 Minor
@biomejs/cli-win32-arm64 Minor
@biomejs/cli-darwin-x64 Minor
@biomejs/cli-darwin-arm64 Minor
@biomejs/cli-linux-x64 Minor
@biomejs/cli-linux-arm64 Minor
@biomejs/cli-linux-x64-musl Minor
@biomejs/cli-linux-arm64-musl Minor
@biomejs/backend-jsonrpc Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions bot added A-CLI Area: CLI A-Project Area: project A-LSP Area: language server protocol labels Jul 23, 2025
@dyc3

This comment was marked as resolved.

@arendjr arendjr changed the base branch from main to next July 23, 2025 18:43
@arendjr

This comment was marked as resolved.

Copy link

codspeed-hq bot commented Jul 23, 2025

CodSpeed Performance Report

Merging #6989 will not alter performance

Comparing arendjr:scanner-v2 (a0bd572) with next (d6e68b0)

Summary

✅ 128 untouched benchmarks

@ematipico
Copy link
Member

Can we run the ecosystem CI before merging and make sure we didn't introduce any regressions?

@Conaclos
Copy link
Member

Nice! Have you performed some benchmarks/tests on big project to see the improvements?

imports.insert(path.to_path_buf(), visitor.collect_info());

let module_info = visitor.collect_info();
for import_path in module_info.all_import_paths() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just amazing and mind-blowing!

where
T: WorkspaceScannerBridge,
{
#[inline]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test the implication of all these #[inline] whenever you can? i.e. check if there's a difference when removing them. Thank you

Copy link
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested this PR against a codebase, and it doesn't work anymore. Even without project rules, the CLI doesn't finish.

Without project rules, it's stuck. With debug logs, it only prints

2025-07-24T11:56:01.887489Z  INFO main crates/biome_cli/src/commands/mod.rs: Configuration file loaded: Some("**/biome.jsonc"), diagnostics detected 1
2025-07-24T11:56:01.888581Z DEBUG main insert_project: crates/biome_service/src/projects.rs: Insert workspace folder **

With a project rule enabled, it starts logging few index_file:open_file_internal, but then it stops, it doesn't log anything anymore, the memory of the process starts increasing slowly.

Blocking for now

@arendjr
Copy link
Contributor Author

arendjr commented Jul 24, 2025

Yeah, sorry, the tests aren’t all green yet either, and it hang to cancel the runs yesterday because something was hanging indeed. Definitely needs some fixes still before it’s done.

@Conaclos Also still waiting for the tests to get green, then I’ll do some comparisons.

@ematipico
Copy link
Member

Fixed the failures in 8a919a6 (#6989)

I think we can merge this PR to main because it is just an internal refactor that doesn't introduce any new APIs for the user. Considering that we are still receiving bug reports around leaks, we need this refactor more than ever, ASAP.

@arendjr
Copy link
Contributor Author

arendjr commented Aug 3, 2025

Thanks, nice work!

I’ll have a closer look tomorrow, but just to temper expectations: I don’t think this PR will help much with the reports of leaks, because it does nothing except possibly reduce the amount of .d.ts files that are indexed. I haven’t even done benchmarks yet and before my holiday I wasn’t entirely confident with the current tests yet. I think it will improve performance for those using project rules, possibly significantly, but don’t expect any miracles 🙏

@arendjr
Copy link
Contributor Author

arendjr commented Aug 4, 2025

Oh, I'm afraid the CI status was misleading. Tests are actually still failing here. I'm on it!

@arendjr
Copy link
Contributor Author

arendjr commented Aug 4, 2025

How does this fix impact the LSP behaviour? I know we have a few tests in place, I wonder if a manual test would work.

I've done some manual testing indeed, and it appears the LSP behaviour is good. At least I've not gotten any CPU spikes in repositories where previously I could reproduce it, and I've tried running builds while the LSP is open too.

Regarding the entity of the change, from a user point of view prospective, it's still a fix. We essentially botched how we interpreted globs with folders and how we documented them. I don't have strong opinions, however this PR must have a docs PR, so we update our guides about how users can ignore folders and files (hopefully CLI and LSP behave the same)

Yes, I agree it's a fix. The reason I'm hesitant though is that we can anticipate that some users may require config changes, which makes it technically a breaking change. I don't think it's such a major change however, because it is still a fix. So if we're really strict in one direction, we would require releasing a v3, but if we don't care at all, it can go in a patch release. I think neither extreme is really helpful though, I think it deserves a small shoutout for users, so a minor it is :) We never strictly followed semver anyway...

@arendjr
Copy link
Contributor Author

arendjr commented Aug 4, 2025

Bloody Windows 😅

@Conaclos I tried two open-source projects (unleash and aigne-framework), both with project rules enabled. One became twice as fast, the other almost :)

@arendjr arendjr mentioned this pull request Aug 5, 2025
1 task
Copy link
Contributor

coderabbitai bot commented Aug 5, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

# Conflicts:
#	crates/biome_lsp/src/server.rs
#	crates/biome_service/src/workspace/scanner.tests.rs
#	crates/biome_service/src/workspace/server.rs
#	crates/biome_service/src/workspace/server.tests.rs
#	crates/biome_service/src/workspace/watcher.tests.rs
#	crates/biome_service/tests/spec_tests.rs
#	packages/@biomejs/backend-jsonrpc/src/workspace.ts
Copy link
Member

@siketyan siketyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

@siketyan
Copy link
Member

siketyan commented Aug 5, 2025

Just tested on my private large repository. No issues so far, and got faster ~1s.

v2.1.3

❯ time ./node_modules/.pnpm/@biomejs+cli-darwin-arm64@2.1.3/node_modules/@biomejs/cli-darwin-arm64/biome check
Checked 5251 files in 6s. No fixes applied.

________________________________________________________
Executed in    7.12 secs    fish           external
   usr time   43.87 secs    0.25 millis   43.87 secs
   sys time   13.83 secs    1.92 millis   13.83 secs

78f3460

❯ time ~/Projects/github.com/biomejs/biome/target/release-with-debug/biome check
Checked 5251 files in 5s. No fixes applied.

________________________________________________________
Executed in    5.99 secs    fish           external
   usr time   38.55 secs    0.20 millis   38.55 secs
   sys time   15.87 secs    1.47 millis   15.86 secs

@arendjr
Copy link
Contributor Author

arendjr commented Aug 5, 2025

Thanks for confirming, @siketyan !

PR with website docs is ready too: biomejs/website#2853

@arendjr arendjr merged commit 85b1128 into biomejs:next Aug 5, 2025
32 checks passed
@siketyan siketyan mentioned this pull request Aug 6, 2025
ematipico added a commit that referenced this pull request Aug 12, 2025
Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>
Co-authored-by: Naoki Ikeguchi <me@s6n.jp>
@github-actions github-actions bot mentioned this pull request Aug 13, 2025
Comment on lines +100 to +113
fs.insert(
file_path.into(),
r#"{
"files": { "includes": ["test.js", "!**/folder"] }
}
"#
.as_bytes(),
);

let test = Utf8Path::new("test.js");
fs.insert(test.into(), UNFORMATTED.as_bytes());

let test2 = Utf8Path::new("folder/test2.js");
fs.insert(test2.into(), UNFORMATTED.as_bytes());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test doesn’t actually test what you want it to, see #7268 (comment)

so this whole feature was shipped completely broken and is breaking user code as we speak.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLI Area: CLI A-Core Area: core A-LSP Area: language server protocol A-Parser Area: parser A-Project Area: project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

📎 Implement smarter scanner

6 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载