-
Notifications
You must be signed in to change notification settings - Fork 62
chore: implement TlsSocket::AsyncWriteSome #416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kostasrim
commented
Jun 5, 2025
- implement AsyncWriteSome in TlsSocket
util/tls/tls_socket.cc
Outdated
return; | ||
} | ||
|
||
// NEED_WRITE state needs to "loop". We have a different action depending on the role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug IMO. Even for AsyncReadSome
because if the engine returned NEED_WRITE we would just return without reading anything. Since this function is common, I added a
ROLEindicating if this flow was called from
AsyncWriteSome or AsyncReadSome` and act accordingly.
What's more here, is that the flows added here are similar to the first step in AsyncRead/WriteSome
without the actual allocation. I should refactor those and extract the common paths for simplicity. I will follow up on this on a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand. Can you add the scenario that reproduces the bug to the unit test?
why if (should_read) {
check does not cover the case where we need to read after we finished writing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. So the flow goes like this:
- We call
AsyncReadSome()
- We allocate the async request and call HandleOpAsync
- HandleOpAsync has 3 cases: a) when engine NEEDS_READ_AND_MAYBE_WRITE b) when engine NEED_WRITE and c) when EOF_STREAM is reached
- For a) we set the
should_read
variable so when we eventually callCompleteAsyncWrite
above we will continue with anupstream read
by callingStartUpstreamRead
- For b) however, we don't set
should_read
because theengine state dictates that we only need to write
. So when we eventually callCompleteAsyncWrite
, we will go out of scope and the operation won't do anything! We read 0 bytes and wedid not complete the actual AsyncRead request
.
So the bug here is that we don't poll the engine again to decide what to do next. Maybe we need to write more bytes until we actually reach a state when we can start an upstream read, and only then we can return/complete the async operation.
I need to think how I can add a test for this because it requires the AsyncRead
to get NEED_WRITE
instead of NEEDS_READ_AND_MAYBE_WRITE
.
To summarize:
AsyncRead -> NEEDS_WRITE -> CompleteAsyncWrite -> does nothing because should_read is never set -> we read 0 bytes and never completed the async request issued by the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this explanation can be copied as is to lines 656-658 instead of
// We are done with the write, check if we also need to read because we are
// in NEED_READ_AND_MAYBE_WRITE state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree but the most important thing for me is to have a test for this flow as you suggested. I think one the follow up tests I added here https://github.com/romange/helio/pull/424/files#diff-c5bdae6ad50dd372157bccd1053dd768c6dde7468ff153a939c9410ef22c81dbR446 does that but I need to double check.
Either way +1 for the comment but I do want to assert that functionality is correct and we don't break anything in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will get back to you on this for the test. //TBA
util/tls/tls_socket.cc
Outdated
|
||
const int op_val = push_res->engine_opcode; | ||
auto req = AsyncReq{this, std::move(cb), v, len, op_val, AsyncReq::WRITER}; | ||
req.engine_written = push_res->written; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently we can be in a partial state with engine_written > 0
&& op_val < 0
. We cover both
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #416 +/- ##
==========================================
+ Coverage 77.60% 78.16% +0.55%
==========================================
Files 103 112 +9
Lines 7824 9846 +2022
==========================================
+ Hits 6072 7696 +1624
- Misses 1752 2150 +398 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -660,7 +661,40 @@ void TlsSocket::AsyncReq::CompleteAsyncWrite(io::Result<size_t> write_result) { | |||
if (should_read) { | |||
should_read = false; | |||
StartUpstreamRead(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another corner case I need to take care of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to add a TODO then, so you will not forget?
@romange feel free to write some initial thoughts but I would like to go over this one more time -- need to jump back to df specific work for now but I pushed in case you want to add early comments |
9e235d3
to
20abeae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not everything is clear to me - gave some comments.
Also tests are missing
util/tls/tls_socket.cc
Outdated
} | ||
// We need another Async operation | ||
op_val = engine_read; | ||
op_val = owner->engine_->Read(reinterpret_cast<uint8_t*>(vec->iov_base), vec->iov_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a full blown class now. Consider adding _
to variables that should be private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to properly refactor it into a proper class. Will follow up on a separate PR if you are OK with this -- I think I commented this somewhere.
util/tls/tls_socket.cc
Outdated
return; | ||
} | ||
|
||
// NEED_WRITE state needs to "loop". We have a different action depending on the role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand. Can you add the scenario that reproduces the bug to the unit test?
why if (should_read) {
check does not cover the case where we need to read after we finished writing?
I already added tests with https://github.com/romange/helio/blob/master/util/tls/tls_socket_test.cc#L358 And I also add one for partial writes in a follow up PR: |
util/tls/tls_socket.h
Outdated
@@ -126,8 +126,12 @@ class TlsSocket final : public FiberSocketBase { | |||
uint32_t len; | |||
Engine::OpResult op_val; | |||
|
|||
iovec scratch_iovec; | |||
enum Role : std::uint8_t { READER, WRITER }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add _ to all the fields and make them private if possible.
// the internall BIO buffers are not yet flushed so the engine->Read() will return NEED_WRITE | ||
// such that the protocol renegotiation can kick in. Even though this scenario seems easy to | ||
// simulate it does not reproduce NEED_WRITE and for now I use this function to simulate it. | ||
void __DebugForceNeedWriteOnAsyncRead(const iovec* v, uint32_t len, io::AsyncProgressCb cb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this is not great but I was not able to reproduce the exact scenario by manually controlling all the steps above (even with the help of perplexity/chatgpt). There are other ways to do it according to the AI that require manual changes to the underline BIO's which we don't expose from the engine (and I did not want to take that route by exposing them).
I wanted to have a test such that we at least don't break the functionality of AsyncReadSome
when Engine::NEED_WRITE
which for now this function satisfies (it kinda acts as MOCK
really).
I am planning to remove this in the future and provide an actual test case that reproduces the exact scenario described above.
For now I keep it as is as I don't want to spent more time on this (I am slightly lying here -- I do want to spent time and understand exactly how to reproduce this but there are other priorities atm).
util/tls/tls_socket.h
Outdated
|
||
size_t engine_written = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
member naming is still not consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
member naming is still not consistent
8c98de7
to
2daa803
Compare
Signed-off-by: kostas <kostas@dragonflydb.io>