
MDEV-34431: Avoid spin loops on page I/O waits #4172


Draft · dr-m wants to merge 2 commits into base: 11.4 from 11.4-MDEV-34431

Conversation

dr-m
Contributor

@dr-m dr-m commented Jul 8, 2025

  • The Jira issue number for this PR is: MDEV-34431

Description

While waiting for I/O completion, let us skip spin loops. Even on fast storage, reading a page into the buffer pool takes so long that wasting CPU in a spin loop does not make sense.

block_lock::s_lock_nospin(): A no-spin variant of acquiring a shared buffer page latch. Regular s_lock() always involves a spin loop.

ssux_lock_impl::rd_lock_spin(), ssux_lock_impl::rd_lock_nospin(): Split from rd_wait().

ssux_lock_impl::rd_lock(): Invoke either rd_lock_nospin(), or rd_lock_try() followed by rd_lock_spin(); see the sketch below.

The above changes are also part of #4168.
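
As a rough illustration of this dispatch, here is a minimal, self-contained sketch; the lock word layout, the spin-round count, and the yield-based blocking are assumptions made for the example, not the actual ssux_lock_impl internals:

```cpp
// Minimal sketch of the spin/no-spin dispatch described above.
// NOT the MariaDB ssux_lock_impl: the lock word layout, SPIN_ROUNDS and the
// yield-based blocking are illustrative assumptions only.
#include <atomic>
#include <cstdint>
#include <thread>

class rw_lock_sketch
{
  std::atomic<uint32_t> word{0};              // bit 31 = writer, low bits = readers
  static constexpr uint32_t WRITER= 1U << 31;
  static constexpr unsigned SPIN_ROUNDS= 30;  // assumed tuning constant

  bool rd_lock_try()
  {
    uint32_t w= word.load(std::memory_order_relaxed);
    return !(w & WRITER) &&
      word.compare_exchange_weak(w, w + 1, std::memory_order_acquire);
  }

  void rd_lock_nospin()
  {
    // Block without spinning; a real implementation would sleep on a futex
    // or WaitOnAddress() instead of yielding.
    while (!rd_lock_try())
      std::this_thread::yield();
  }

  void rd_lock_spin()
  {
    // Spin a bounded number of rounds before blocking, for latches that a
    // CPU-bound holder is expected to release soon.
    for (unsigned i= 0; i < SPIN_ROUNDS; i++)
      if (rd_lock_try())
        return;
    rd_lock_nospin();
  }

public:
  // The dispatch: skip spinning entirely when the caller knows it will wait
  // for page I/O; otherwise try once and fall back to the spinning path.
  void rd_lock(bool expect_io_wait)
  {
    if (expect_io_wait)
      rd_lock_nospin();
    else if (!rd_lock_try())
      rd_lock_spin();
  }

  void rd_unlock() { word.fetch_sub(1, std::memory_order_release); }
};
```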

buf_read_page(): Return a pointer to a buffer-fixed, non-read-fixed page, or nullptr in case of an error.

buf_inc_get(): Remove the parameter.

IORequest::read_complete(): Assert that the page is both read-fixed and buffer-fixed. Sample recv_sys.recovery_on only once. Buffer-unfix the page when the asynchronous read completes.

buf_page_t::read_complete(): Assert that the page is both read-fixed and buffer-fixed.

buf_page_init_for_read(): Return a pointer to a buffer-fixed block descriptor, bitwise-ORed with 1 in case the block already exists in the buffer pool.
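
The tagged-pointer convention can be illustrated with a small stand-alone sketch; block_sketch and the helper names below are hypothetical stand-ins, not the actual InnoDB types:

```cpp
// Sketch of encoding "block already exists in the buffer pool" in the lowest
// bit of an aligned descriptor pointer. Names are illustrative only.
#include <cassert>
#include <cstdint>

struct block_sketch { int dummy; };           // stand-in for the block descriptor

static inline block_sketch *tag_existing(block_sketch *b)
{ return reinterpret_cast<block_sketch*>(reinterpret_cast<uintptr_t>(b) | 1); }

static inline bool is_existing(const block_sketch *b)
{ return reinterpret_cast<uintptr_t>(b) & 1; }

static inline block_sketch *untag(block_sketch *b)
{ return reinterpret_cast<block_sketch*>(reinterpret_cast<uintptr_t>(b) & ~uintptr_t{1}); }

int main()
{
  block_sketch b{};
  block_sketch *ret= tag_existing(&b);        // "already in the buffer pool"
  assert(is_existing(ret));
  assert(untag(ret) == &b);                   // recover the real pointer
  return 0;
}
```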

buf_read_ahead_update(), buf_read_ahead_update_sql(): Common code for updating some statistics counters.

buf_read_page_low(): Replace the parameter sync with err, which will return an error code to a synchronous caller. Add a parameter for thread-local mariadb_stats. Return the pointer to the block, or the special values nullptr (read failure) or -1 or -2 for asynchronous reads. Increment the statistics when a synchronous read was requested. In a synchronous read, if the page has already been allocated in the buffer pool but it is read-fixed, wait for the read to complete.
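
A caller-side sketch of decoding this return convention follows; the exact sentinel encoding below is an assumption made for the example (the description only states the special values nullptr, -1 and -2):

```cpp
// Sketch of handling a block pointer, a nullptr read failure, or the
// "asynchronous read" sentinels. Types and sentinel encoding are illustrative.
#include <cstdint>
#include <cstdio>

struct block_sketch { int dummy; };

static block_sketch *const ASYNC_READ_1=
  reinterpret_cast<block_sketch*>(intptr_t{-1});
static block_sketch *const ASYNC_READ_2=
  reinterpret_cast<block_sketch*>(intptr_t{-2});

static void handle_read_result(block_sketch *b)
{
  if (b == nullptr)
    std::puts("synchronous read failed");
  else if (b == ASYNC_READ_1 || b == ASYNC_READ_2)
    std::puts("asynchronous read submitted; no block to use yet");
  else
    std::puts("block is buffer-fixed and ready to use");
}

int main()
{
  block_sketch b{};
  handle_read_result(&b);
  handle_read_result(nullptr);
  handle_read_result(ASYNC_READ_1);
  return 0;
}
```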

buf_read_page_background(): Update the statistics if supplied.

Release Notes

When a buffer pool lookup needs to wait for page I/O, InnoDB will avoid excessive CPU usage.

How can this PR be tested?

Sysbench or HammerDB on a workload that is larger than the buffer pool. The improvement should be most prominent when there are synchronous waits for pages in the buffer pool.

Measuring the CPU usage while running the test case in MDEV-32067 should show some impact.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

The second change (refactoring buf_read_page()) depends on earlier changes f27e9c8 and d6aed21, and 11.4 is the earliest maintained branch that contains them.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m self-assigned this Jul 8, 2025

Contributor Author

@dr-m dr-m left a comment


I may revise this further to make the I/O wait reporting more consistent. I wanted to publish this as is to enable some performance testing as early as possible.

One thing outside the scope of this PR is waits for page writes, that is, when a user thread requests write access to a buffer page that is currently being written back to the data file. That could be addressed separately later.

@dr-m dr-m force-pushed the 11.4-MDEV-34431 branch from 766c411 to 5ed84da on July 8, 2025 13:08
@dr-m dr-m force-pushed the 11.4-MDEV-34431 branch 2 times, most recently from c32d5d3 to 23f973d on July 10, 2025 14:05
@dr-m dr-m force-pushed the 11.4-MDEV-34431 branch 5 times, most recently from 2b3806e to f760918 on July 16, 2025 12:49
@dr-m dr-m marked this pull request as draft July 21, 2025 06:46
dr-m added 2 commits July 21, 2025 09:47
The Microsoft Windows implementations of SRWLOCK and WaitOnAddress()
include some spin loop logic before entering the operating system kernel.
Let us avoid duplicating some of that spin loop logic.

Thanks to Vladislav Vaintroub for this fix.
While waiting for I/O completion, let us skip spin loops.
Even on fast storage, reading a page into the buffer pool takes
so long that a spin loop would only end up wasting CPU time.

block_lock::s_lock_nospin(): A no-spin variant of acquiring a shared
buffer page latch. Regular s_lock() always involves a spin loop.

ssux_lock_impl::rd_lock_spin(), ssux_lock_impl::rd_lock_nospin():
Split from rd_wait().

ssux_lock_impl::rd_lock(): Invoke either rd_lock_nospin() or
rd_lock_try() and rd_lock_spin().

buf_page_get_low(): After acquiring a page latch on an io-fixed block,
try to optimize operations on the page latch.
@dr-m dr-m force-pushed the 11.4-MDEV-34431 branch from f760918 to 67da6e6 on July 21, 2025 06:50