MDEV-34431: Avoid spin loops on page I/O waits #4172
base: 11.4
I may revise this further to make the I/O wait reporting more consistent. I wanted to publish this as is to enable some performance testing as early as possible.
One thing that is outside the scope of this change is waits for page writes, which occur when a user thread requests write access to a buffer page that is currently being written back to the data file. This could be addressed separately later.
The Microsoft Windows implementations of SRWLOCK and WaitOnAddress() include some spin loop logic before entering the operating system kernel. Let us avoid duplicating some of that spin loop logic. Thanks to Vladislav Vaintroub for this fix.
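As a rough illustration of the idea (not the actual MariaDB change), the following sketch compiles out the user-space spin loop on Windows, where the native waiting primitives already spin before entering the kernel, and keeps it on other platforms. The type toy_event, the constant SPIN_ROUNDS and its value 30 are hypothetical, and std::condition_variable is only a portable stand-in for the platform's blocking wait.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical spin budget before blocking. On Windows the native waiting
// primitives spin on their own before entering the kernel, so no extra
// user-space spin loop is compiled in there.
#ifdef _WIN32
constexpr unsigned SPIN_ROUNDS = 0;
#else
constexpr unsigned SPIN_ROUNDS = 30; // arbitrary illustrative value
#endif

// Hypothetical one-shot event: wait() returns once set() has been called.
struct toy_event
{
  std::atomic<bool> ready{false};
  std::mutex m;
  std::condition_variable cv;
  unsigned spins = 0; // spin rounds that did not observe `ready`

  void wait()
  {
    // Optional user-space spin; the loop body never runs when SPIN_ROUNDS == 0.
    for (unsigned i = 0; i < SPIN_ROUNDS; i++)
    {
      if (ready.load(std::memory_order_acquire))
        return;
      spins++;
      std::this_thread::yield(); // stand-in for a CPU pause instruction
    }
    // Blocking wait (portable stand-in for the platform primitive).
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] { return ready.load(std::memory_order_acquire); });
  }

  void set()
  {
    ready.store(true, std::memory_order_release);
    std::lock_guard<std::mutex> lk(m); // pairs with the predicate check in wait()
    cv.notify_all();
  }
};
```

Keeping the spin budget a compile-time constant means the Windows build contains no spin loop at all, instead of one that merely exits early.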
While waiting for I/O completion, let us skip spin loops. Even on fast storage, reading a page into the buffer pool takes so long that a spin loop would only end up wasting CPU time.

block_lock::s_lock_nospin(): A no-spin variant of acquiring a shared buffer page latch. Regular s_lock() always involves a spin loop.

ssux_lock_impl::rd_lock_spin(), ssux_lock_impl::rd_lock_nospin(): Split from rd_wait().

ssux_lock_impl::rd_lock(): Invoke either rd_lock_nospin() or rd_lock_try() and rd_lock_spin().

buf_page_get_low(): After acquiring a page latch on an io-fixed block, try to optimize operations on the page latch.
Description
While waiting for I/O completion, let us skip spin loops. Even on fast storage, reading a page into the buffer pool takes so long that wasting CPU in a spin loop does not make any sense.
block_lock::s_lock_nospin(): A no-spin variant of acquiring a shared buffer page latch. Regular s_lock() always involves a spin loop.

ssux_lock_impl::rd_lock_spin(), ssux_lock_impl::rd_lock_nospin(): Split from rd_wait().

ssux_lock_impl::rd_lock(): Invoke either rd_lock_nospin() or rd_lock_try() and rd_lock_spin().

The above changes are also part of #4168.
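A minimal sketch of how a shared-latch acquisition can be split into try, spin and no-spin paths. Only the function names rd_lock_try(), rd_lock_spin(), rd_lock_nospin() and rd_lock() come from the description. The class toy_latch, the constant SPIN_ROUNDS, the mutex/condition-variable wait and the template parameter spinloop are assumptions for illustration, and InnoDB's ssux_lock_impl is implemented differently. Here the exclusive holder plays the role of a pending page read: a thread that expects a long wait calls rd_lock<false>() and blocks right away instead of burning CPU.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical, simplified shared/exclusive latch (not ssux_lock_impl).
class toy_latch
{
  static constexpr uint32_t WRITER = 1U << 31; // exclusive holder present
  static constexpr unsigned SPIN_ROUNDS = 30;   // illustrative value only
  std::atomic<uint32_t> word{0};                // WRITER, or number of readers
  std::mutex m;
  std::condition_variable cv;

public:
  // Acquire a shared latch if there is no exclusive holder; never waits.
  bool rd_lock_try()
  {
    uint32_t w = word.load(std::memory_order_relaxed);
    while (!(w & WRITER))
      if (word.compare_exchange_weak(w, w + 1, std::memory_order_acquire,
                                     std::memory_order_relaxed))
        return true;
    return false;
  }

  // Block until the exclusive holder is gone; no busy-wait loop.
  void rd_lock_nospin()
  {
    while (!rd_lock_try())
    {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, [this] {
        return !(word.load(std::memory_order_acquire) & WRITER);
      });
    }
  }

  // Busy-wait for a bounded number of rounds, then fall back to blocking.
  void rd_lock_spin()
  {
    for (unsigned i = 0; i < SPIN_ROUNDS; i++)
    {
      std::this_thread::yield(); // stand-in for a CPU pause instruction
      if (rd_lock_try())
        return;
    }
    rd_lock_nospin();
  }

  // Callers that expect a long wait (such as page I/O) use spinloop=false.
  template<bool spinloop> void rd_lock()
  {
    if constexpr (!spinloop)
      rd_lock_nospin();
    else if (!rd_lock_try())
      rd_lock_spin();
  }

  void rd_unlock()
  {
    uint32_t prev = word.fetch_sub(1, std::memory_order_release);
    assert((prev & ~WRITER) != 0); // a shared latch must have been held
    if (prev == 1) // last reader left: wake a waiting writer
    {
      std::lock_guard<std::mutex> lk(m);
      cv.notify_all();
    }
  }

  // Exclusive latch: wait until there are no readers and no writer.
  void wr_lock()
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] {
      uint32_t expected = 0;
      return word.compare_exchange_strong(expected, WRITER,
                                          std::memory_order_acquire);
    });
  }

  void wr_unlock()
  {
    assert(word.load(std::memory_order_relaxed) == WRITER);
    word.store(0, std::memory_order_release);
    std::lock_guard<std::mutex> lk(m); // pairs with predicate checks in waiters
    cv.notify_all();
  }
};
```

In this sketch, rd_lock<true>() still suits short critical sections where the latch is likely to be released within a few spin rounds, while rd_lock<false>() suits waits on page I/O that take far longer than any reasonable spin loop.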
buf_read_page(): Return a pointer to a buffer-fixed, non-read-fixed page, or nullptr in case of an error.

buf_inc_get(): Remove the parameter.

IORequest::read_complete(): Assert that the page is both read-fixed and buffer-fixed. Sample recv_sys.recovery_on only once. Buffer-unfix the page when the asynchronous read completes.

buf_page_t::read_complete(): Assert that the page is both read-fixed and buffer-fixed.

buf_page_init_for_read(): Return a pointer to a buffer-fixed block descriptor, bitwise-ORed with 1 in case the block already exists in the buffer pool.

buf_read_ahead_update(), buf_read_ahead_update_sql(): Common code for updating some statistics counters.

buf_read_page_low(): Replace the parameter sync with err, which will return an error code to a synchronous caller. Add a parameter for thread-local mariadb_stats. Return the pointer to the block, or the special value nullptr (read failure), or -1 or -2 for asynchronous reads. Increment the statistics when a synchronous read was requested. In a synchronous read, if the page has already been allocated in the buffer pool but it is read-fixed, wait for the read to complete.

buf_read_page_background(): Update the statistics if supplied.

Release Notes
When a buffer pool lookup needs to wait for page I/O, InnoDB will avoid excessive CPU usage.
How can this PR be tested?
Run Sysbench or HammerDB on a workload that is larger than the buffer pool. The effect should be most prominent when there are synchronous waits for pages to be read into the buffer pool.
Measuring the CPU usage while running the test case in MDEV-32067 should show some impact.
Basing the PR against the correct MariaDB version
The second change (refactoring buf_read_page()) depends on earlier changes f27e9c8 and d6aed21, and 11.4 is the earliest maintained branch that contains them.

PR quality check