
[scheduler] Another task-stealing optimization #9984


Open: RobinMorisset wants to merge 3 commits into master

RobinMorisset (Contributor)

The main win here is the third commit.
It is a follow-up to #9594 and was tested in production on multiple kinds of machines and multiple services.
We observed small but significant wins in scheduler utilisation, including at peak load, which was the weak point of the previous batch of optimisations.
cc @tomas-abrahamsson, since you saw large wins from the previous batch of scheduler enhancements.

Summary:
If we steal exactly one task, we execute it directly and it never
ends up in our runqueue. So it is wrong to update the flag telling
other schedulers that we have work they can steal.
If we steal more than one task, we call erts_add_runq_len,
which takes care of calling erts_non_empty_runq for us.
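A minimal sketch of the bookkeeping described above, in a heavily simplified model. `RunQueue`, `mark_non_empty`, `add_runq_len`, `execute_task` and `handle_stolen` are hypothetical names: `mark_non_empty` and `add_runq_len` only loosely stand in for `erts_non_empty_runq` and `erts_add_runq_len`, and the model assumes that when several tasks are stolen they are all enqueued. The real ERTS scheduler code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a runqueue is a task count plus the flag that
 * other schedulers read to decide whether it is worth stealing from. */
typedef struct {
    int len;
    bool non_empty_flag;
} RunQueue;

/* Stand-in for erts_non_empty_runq(): advertise stealable work. */
static void mark_non_empty(RunQueue *rq) { rq->non_empty_flag = true; }

/* Stand-in for erts_add_runq_len(): enqueueing tasks also advertises them. */
static void add_runq_len(RunQueue *rq, int n) {
    rq->len += n;
    if (rq->len > 0)
        mark_non_empty(rq);
}

static int executed_directly = 0;
static void execute_task(void) { executed_directly++; }

/* Called after stealing `stolen` tasks from another scheduler. */
static void handle_stolen(RunQueue *rq, int stolen) {
    assert(stolen >= 0);
    if (stolen == 1) {
        /* Executed immediately: the task never enters rq, so rq must
         * not be advertised as having stealable work. */
        execute_task();
    } else if (stolen > 1) {
        /* Assumption of this model: all stolen tasks are enqueued.
         * add_runq_len() sets the flag, so no separate call is needed. */
        add_runq_len(rq, stolen);
    }
}
```

The point of the design is that the flag is only ever set as a side effect of actually adding work to the runqueue.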

Test Plan: CI + canary

Differential Revision: https://phabricator.intern.facebook.com/D72957511

Summary: Just invert a condition and fix the indentation.

Test Plan: CI + canary

Differential Revision: https://phabricator.intern.facebook.com/D72957513

Summary:
There are two benefits to keeping the contended runqueues in a FIFO
queue rather than a LIFO stack:
- The first runqueue retried on the second pass is the one that was
  found contended earliest, so it is much more likely to have become
  uncontended by then.
- We can now do several passes, rather than defaulting to a blocking
  call to lock on the second pass.
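A minimal sketch of multi-pass stealing with a FIFO retry order, under stated assumptions: `Victim`, `try_lock`, `Fifo` and `steal_one` are hypothetical names, the lock is a plain C11 atomic flag rather than an ERTS runqueue lock, and since the commit message does not say what happens after the last pass, this sketch simply gives up instead of blocking. Contended victims are pushed to the back of the queue, so the one found contended earliest is retried first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VICTIMS 64 /* hypothetical bound on the number of runqueues */

/* Hypothetical victim runqueue guarded by a spinlock-style flag. */
typedef struct {
    atomic_bool locked;
    int tasks;
} Victim;

/* Non-blocking lock attempt: succeeds only if the flag was clear. */
static bool try_lock(Victim *v) { return !atomic_exchange(&v->locked, true); }
static void unlock(Victim *v) { atomic_store(&v->locked, false); }

/* Fixed-capacity FIFO ring buffer of victim indices. */
typedef struct {
    size_t buf[MAX_VICTIMS];
    size_t head, count;
} Fifo;

static void fifo_push(Fifo *q, size_t x) {
    assert(q->count < MAX_VICTIMS);
    q->buf[(q->head + q->count) % MAX_VICTIMS] = x;
    q->count++;
}

static size_t fifo_pop(Fifo *q) {
    size_t x = q->buf[q->head];
    q->head = (q->head + 1) % MAX_VICTIMS;
    q->count--;
    return x;
}

/* Try to steal one task using only trylock, for at most max_passes passes.
 * Returns the index of the victim stolen from, or -1 if none succeeded. */
static long steal_one(Victim *victims, size_t n, int max_passes) {
    Fifo q = {{0}, 0, 0};
    for (size_t i = 0; i < n && i < MAX_VICTIMS; i++)
        fifo_push(&q, i);
    for (int pass = 0; pass < max_passes && q.count > 0; pass++) {
        size_t in_this_pass = q.count;
        while (in_this_pass-- > 0) {
            size_t i = fifo_pop(&q);
            Victim *v = &victims[i];
            if (!try_lock(v)) {
                fifo_push(&q, i); /* contended: retry in a later pass */
                continue;
            }
            bool got = v->tasks > 0;
            if (got)
                v->tasks--;
            unlock(v);
            if (got)
                return (long)i;
            /* locked but empty: drop this victim */
        }
    }
    return -1; /* give up rather than block on a lock */
}
```

With a LIFO stack in place of `Fifo`, the most recently contended victim would be retried first, which is the one least likely to have been released in the meantime.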

There was just one small issue: the equeue implementation did not
support passing queues across function boundaries, since it used a
preprocessor macro to find the default queue (when trying to grow the
queue). I fixed that by adding a field to the queue itself.
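A minimal sketch of the idea behind the fix, assuming a queue whose initial storage is a default buffer declared alongside it by a macro. `Queue`, `DECLARE_QUEUE`, `queue_grow`, `queue_put`, `queue_get` and `queue_destroy` are hypothetical and do not reproduce the actual ERTS equeue macros. Because the queue records a pointer to its own default buffer, any function holding only a `Queue *` can tell whether the current storage is heap-allocated (and must be freed when growing) or is the default buffer (which must not be freed).

```c
#include <assert.h>
#include <stdlib.h>

#define DEF_QUEUE_SIZE 4 /* hypothetical inline capacity */

typedef struct {
    int *start;       /* current storage: default buffer or heap */
    int *default_buf; /* the fix: the queue remembers its default buffer */
    size_t cap, front, count;
} Queue;

/* In a purely macro-based design, growing the queue would have to name
 * name##_default via token pasting, which only works in the declaring
 * scope. Storing the pointer in the struct removes that restriction. */
#define DECLARE_QUEUE(name)                 \
    int name##_default[DEF_QUEUE_SIZE];     \
    Queue name = {name##_default, name##_default, DEF_QUEUE_SIZE, 0, 0}

static void queue_grow(Queue *q) {
    size_t new_cap = q->cap * 2;
    int *p = malloc(new_cap * sizeof *p);
    assert(p != NULL);
    /* Copy elements in FIFO order to the front of the new buffer. */
    for (size_t i = 0; i < q->count; i++)
        p[i] = q->start[(q->front + i) % q->cap];
    if (q->start != q->default_buf) /* never free the default buffer */
        free(q->start);
    q->start = p;
    q->cap = new_cap;
    q->front = 0;
}

/* Works on any Queue *, so a queue can be handed to helper functions. */
static void queue_put(Queue *q, int x) {
    if (q->count == q->cap)
        queue_grow(q);
    q->start[(q->front + q->count) % q->cap] = x;
    q->count++;
}

static int queue_get(Queue *q) {
    assert(q->count > 0);
    int x = q->start[q->front];
    q->front = (q->front + 1) % q->cap;
    q->count--;
    return x;
}

static void queue_destroy(Queue *q) {
    if (q->start != q->default_buf)
        free(q->start);
}
```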

Test Plan: CI + canary

Differential Revision: https://phabricator.intern.facebook.com/D72957512

github-actions bot commented Jun 20, 2025

CT Test Results

    2 files    138 suites   49m 5s ⏱️
1 629 tests 1 571 ✅ 56 💤 2 ❌
2 348 runs  2 270 ✅ 76 💤 2 ❌

For more details on these failures, see this check.

Results for commit 7ebdb0a.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

// Erlang/OTP Github Action Bot

jhogberg added the team:VM (Assigned to OTP team VM) label on Jun 23, 2025
RobinMorisset (Contributor, Author)

I appear to have missed some failed asserts in debug builds; I'm looking into it.

Labels: team:VM (Assigned to OTP team VM)
Projects: none yet
3 participants