+
Skip to content

chore: implement TlsSocket::AsyncWriteSome #416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 10, 2025
Merged

Conversation

kostasrim
Copy link
Collaborator

  • implement AsyncWriteSome in TlsSocket

return;
}

// NEED_WRITE state needs to "loop". We have a different action depending on the role
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug IMO. Even for AsyncReadSome because if the engine returned NEED_WRITE we would just return without reading anything. Since this function is common, I added aROLEindicating if this flow was called fromAsyncWriteSome or AsyncReadSome` and act accordingly.

What's more here, is that the flows added here are similar to the first step in AsyncRead/WriteSome without the actual allocation. I should refactor those and extract the common paths for simplicity. I will follow up on this on a separate PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand. Can you add the scenario that reproduces the bug to the unit test?
why if (should_read) { check does not cover the case where we need to read after we finished writing?

Copy link
Collaborator Author

@kostasrim kostasrim Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. So the flow goes like this:

  1. We call AsyncReadSome()
  2. We allocate the async request and call HandleOpAsync
  3. HandleOpAsync has 3 cases: a) when engine NEEDS_READ_AND_MAYBE_WRITE b) when engine NEED_WRITE and c) when EOF_STREAM is reached
  4. For a) we set the should_read variable so when we eventually call CompleteAsyncWrite above we will continue with an upstream read by calling StartUpstreamRead
  5. For b) however, we don't set should_read because the engine state dictates that we only need to write. So when we eventually call CompleteAsyncWrite, we will go out of scope and the operation won't do anything! We read 0 bytes and we did not complete the actual AsyncRead request.

So the bug here is that we don't poll the engine again to decide what to do next. Maybe we need to write more bytes until we actually reach a state when we can start an upstream read, and only then we can return/complete the async operation.

I need to think how I can add a test for this because it requires the AsyncRead to get NEED_WRITE instead of NEEDS_READ_AND_MAYBE_WRITE.

To summarize:

AsyncRead -> NEEDS_WRITE -> CompleteAsyncWrite -> does nothing because should_read is never set -> we read 0 bytes and never completed the async request issued by the caller.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this explanation can be copied as is to lines 656-658 instead of

 // We are done with the write, check if we also need to read because we are
  // in NEED_READ_AND_MAYBE_WRITE state

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree but the most important thing for me is to have a test for this flow as you suggested. I think one the follow up tests I added here https://github.com/romange/helio/pull/424/files#diff-c5bdae6ad50dd372157bccd1053dd768c6dde7468ff153a939c9410ef22c81dbR446 does that but I need to double check.

Either way +1 for the comment but I do want to assert that functionality is correct and we don't break anything in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will get back to you on this for the test. //TBA


const int op_val = push_res->engine_opcode;
auto req = AsyncReq{this, std::move(cb), v, len, op_val, AsyncReq::WRITER};
req.engine_written = push_res->written;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we can be in a partial state with engine_written > 0 && op_val < 0. We cover both

@codecov-commenter
Copy link

codecov-commenter commented Jun 5, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 74.80916% with 33 lines in your changes missing coverage. Please review.

Project coverage is 78.16%. Comparing base (d30de86) to head (c7eeaa7).
Report is 187 commits behind head on master.

Files with missing lines Patch % Lines
util/tls/tls_socket.cc 62.92% 33 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #416      +/-   ##
==========================================
+ Coverage   77.60%   78.16%   +0.55%     
==========================================
  Files         103      112       +9     
  Lines        7824     9846    +2022     
==========================================
+ Hits         6072     7696    +1624     
- Misses       1752     2150     +398     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -660,7 +661,40 @@ void TlsSocket::AsyncReq::CompleteAsyncWrite(io::Result<size_t> write_result) {
if (should_read) {
should_read = false;
StartUpstreamRead();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another corner case I need to take care of

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to add a TODO then, so you will not forget?

@kostasrim
Copy link
Collaborator Author

@romange feel free to write some initial thoughts but I would like to go over this one more time -- need to jump back to df specific work for now but I pushed in case you want to add early comments

@kostasrim kostasrim self-assigned this Jun 5, 2025
@romange romange force-pushed the master branch 2 times, most recently from 9e235d3 to 20abeae Compare June 8, 2025 06:09
@kostasrim kostasrim requested a review from romange June 18, 2025 13:37
Copy link
Owner

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not everything is clear to me - gave some comments.
Also tests are missing

}
// We need another Async operation
op_val = engine_read;
op_val = owner->engine_->Read(reinterpret_cast<uint8_t*>(vec->iov_base), vec->iov_len);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a full blown class now. Consider adding _ to variables that should be private.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to properly refactor it into a proper class. Will follow up on a separate PR if you are OK with this -- I think I commented this somewhere.

return;
}

// NEED_WRITE state needs to "loop". We have a different action depending on the role
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand. Can you add the scenario that reproduces the bug to the unit test?
why if (should_read) { check does not cover the case where we need to read after we finished writing?

@kostasrim
Copy link
Collaborator Author

Not everything is clear to me - gave some comments. Also tests are missing

I already added tests with AsyncWrite but they used the synchronous interface so we already test the flow.

https://github.com/romange/helio/blob/master/util/tls/tls_socket_test.cc#L358

And I also add one for partial writes in a follow up PR:

https://github.com/romange/helio/pull/424/files#diff-c5bdae6ad50dd372157bccd1053dd768c6dde7468ff153a939c9410ef22c81dbR391

@@ -126,8 +126,12 @@ class TlsSocket final : public FiberSocketBase {
uint32_t len;
Engine::OpResult op_val;

iovec scratch_iovec;
enum Role : std::uint8_t { READER, WRITER };
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add _ to all the fields and make them private if possible.

// the internall BIO buffers are not yet flushed so the engine->Read() will return NEED_WRITE
// such that the protocol renegotiation can kick in. Even though this scenario seems easy to
// simulate it does not reproduce NEED_WRITE and for now I use this function to simulate it.
void __DebugForceNeedWriteOnAsyncRead(const iovec* v, uint32_t len, io::AsyncProgressCb cb);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this is not great but I was not able to reproduce the exact scenario by manually controlling all the steps above (even with the help of perplexity/chatgpt). There are other ways to do it according to the AI that require manual changes to the underline BIO's which we don't expose from the engine (and I did not want to take that route by exposing them).

I wanted to have a test such that we at least don't break the functionality of AsyncReadSome when Engine::NEED_WRITE which for now this function satisfies (it kinda acts as MOCK really).

I am planning to remove this in the future and provide an actual test case that reproduces the exact scenario described above.

For now I keep it as is as I don't want to spent more time on this (I am slightly lying here -- I do want to spent time and understand exactly how to reproduce this but there are other priorities atm).


size_t engine_written = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

member naming is still not consistent

romange
romange previously approved these changes Jul 8, 2025
Copy link
Owner

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

member naming is still not consistent

romange
romange previously approved these changes Jul 10, 2025
@kostasrim kostasrim enabled auto-merge (squash) July 10, 2025 07:07
@kostasrim kostasrim disabled auto-merge July 10, 2025 07:21
@kostasrim kostasrim force-pushed the kpr6 branch 2 times, most recently from 8c98de7 to 2daa803 Compare July 10, 2025 08:08
Signed-off-by: kostas <kostas@dragonflydb.io>
@kostasrim kostasrim requested a review from romange July 10, 2025 08:31
@kostasrim kostasrim merged commit 4b92c51 into romange:master Jul 10, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载