这是indexloc提供的服务,不要输入任何密码
Skip to content

feat(parallel random reads): Changes to enable parallel random read handling in random reader #3577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

abhishek10004
Copy link
Collaborator

@abhishek10004 abhishek10004 commented Jul 23, 2025

This pull request introduces changes to enable parallel read handling in the randomReader. The core idea is to use a mutex to serialize access for RangeReader while allowing MultiRangeReader operations to proceed in parallel.

The main approach is to determine read type without taking any locks and then allow parallel processing for MRD. For range reader, we would take a lock to synchronize access. Since reads wait on a lock for range reader, the reader type computation is done again before actually reading via range reader to ensure that the read should not be served by MRD.

@abhishek10004 abhishek10004 added execute-perf-test Execute performance test in PR execute-integration-tests Run only integration tests labels Jul 23, 2025
Copy link

codecov bot commented Jul 23, 2025

Codecov Report

Attention: Patch coverage is 89.01099% with 10 lines in your changes missing coverage. Please review.

Project coverage is 79.72%. Comparing base (2636fa0) to head (000af01).
Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
internal/gcsx/random_reader.go 89.01% 6 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3577      +/-   ##
==========================================
+ Coverage   79.69%   79.72%   +0.03%     
==========================================
  Files         144      144              
  Lines       19139    19241     +102     
==========================================
+ Hits        15252    15340      +88     
- Misses       3366     3371       +5     
- Partials      521      530       +9     
Flag Coverage Δ
unittests 79.72% <89.01%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@abhishek10004
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces changes to enable parallel read handling in the randomReader. The core idea is to use a mutex to serialize access for RangeReader while allowing MultiRangeReader operations to proceed in parallel. The main area for improvement is the locking strategy within the ReadAt function, which uses a double-checked locking pattern that is complex and introduces a potential race condition. I've provided a detailed comment with a suggestion to simplify the locking logic, making it more robust and maintainable.

@abhishek10004
Copy link
Collaborator Author

/gemini summary

Copy link

Summary of Changes

This pull request introduces significant changes to the randomReader component, primarily to enable and optimize parallel read handling from Google Cloud Storage. It refines the logic for identifying read patterns (sequential vs. random) and dynamically selects the most efficient GCS API for data retrieval, while also adding necessary concurrency controls to ensure thread safety. The goal is to enhance the performance of concurrent read operations.

Highlights

  • Parallel Read Handling: The core logic for handling reads in randomReader has been refactored to better support parallel read operations, aiming to improve performance for concurrent access to GCS objects.
  • Dynamic Read Strategy: The system now dynamically determines the optimal read strategy (sequential vs. random) and the appropriate GCS API (RangeReader vs. MultiRangeReader) based on observed read patterns, including the number of seeks and average read size. This is managed by new isSeekNeeded, getReadInfo, and getEndOffset functions.
  • Concurrency Control: A sync.Mutex has been introduced within the randomReader to synchronize access to the internal reader state, ensuring thread safety during critical sections, particularly when determining the read type and managing the existing GCS reader.
  • Input Validation: Added a check in the ReadAt function to explicitly handle and return an error for negative offset values, improving robustness.
  • Test Suite Enhancements: The test suite has been updated with comprehensive new test cases for the refined getReadInfo and getEndOffset logic, ensuring the new read strategy determination works as expected across various scenarios.
Changelog
  • internal/gcsx/random_reader.go
    • Imported sync package for concurrency primitives.
    • Added mu sync.Mutex to randomReader struct for synchronizing access to shared state.
    • Introduced readInfo struct to encapsulate read type and expected offset.
    • Added validation for negative offsets in the ReadAt function.
    • Refactored ReadAt to use the new readInfo struct and mutex for conditional locking, allowing MultiRangeReader calls to proceed without holding the lock.
    • Implemented isSeekNeeded helper function to determine if a read should increment the seek count.
    • Rewrote getReadInfo to dynamically determine the read strategy (sequential or random) based on current offset, expected offset, number of seeks, and average read bytes.
    • Extracted logic for calculating the end offset for GCS queries into a new getEndOffset function.
    • Simplified the readerType function signature by removing start and end parameters, as the decision now relies on the determined readType and bucket type.
  • internal/gcsx/random_reader_stretchr_test.go
    • Removed unused fmt import.
    • Refactored and renamed Test_ReadInfo to Test_GetReadInfo, adding extensive test cases for the new getReadInfo logic, covering various read patterns and state transitions.
    • Removed Test_ReadInfo_Sequential and Test_ReadInfo_Random as their functionality is now covered by Test_GetReadInfo.
    • Updated Test_ReaderType to match the simplified function signature.
    • Added a new test function Test_GetEndOffset to verify the correct calculation of read range end offsets.
  • internal/gcsx/random_reader_test.go
    • Added ExpectCall(t.bucket, "BucketType")().WillOnce(Return(t.bucketType)) to several existing test functions to mock the BucketType method call, which is now part of the read type determination logic.
Activity
  • A github-actions[bot] commented that the PR title does not conform to the Conventional Commits specification.
  • A codecov[bot] reported 88.37% patch coverage, noting 10 lines missing coverage in internal/gcsx/random_reader.go.
  • The author (abhishek10004) requested a review and a summary from gemini.
  • A gemini-code-assist[bot] provided a review comment, highlighting a potential race condition and complexity in the double-checked locking pattern within ReadAt and suggesting a simplified locking approach.

@abhishek10004 abhishek10004 changed the title [[Draft PR]] Changes to enable parallel read handling in random reader code. feat (parallel random reads): Changes to enable parallel random read handling in random reader Jul 24, 2025
@abhishek10004 abhishek10004 changed the title feat (parallel random reads): Changes to enable parallel random read handling in random reader feat(parallel random reads): Changes to enable parallel random read handling in random reader Jul 24, 2025
@abhishek10004 abhishek10004 marked this pull request as ready for review July 24, 2025 11:10
@abhishek10004 abhishek10004 requested a review from a team as a code owner July 24, 2025 11:10
@github-actions github-actions bot added the remind-reviewers Auto remind reviewers in attention set for review post 24hrs of inactivity on PR. label Jul 24, 2025
Copy link

/gemini summary
/gemini review

@kislaykishore kislaykishore requested review from a team and meet2mky and removed request for a team July 24, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
execute-integration-tests Run only integration tests execute-perf-test Execute performance test in PR remind-reviewers Auto remind reviewers in attention set for review post 24hrs of inactivity on PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant