-
-
Notifications
You must be signed in to change notification settings - Fork 266
feat(integration-test): add race condition tests and update API versi… #2575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR bumps version metadata to v1.4.7, adds a race-condition integration-test suite with Docker Compose and test scripts, and refactors PostgreSQL storage to propagate and encode snapshot information with RepeatableRead isolation. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Reader as DataReader
participant DB as PostgreSQL
participant Token
rect rgba(220,240,255,0.35)
Note over Reader,DB: Snapshot-aware read (new)
end
Client->>Reader: Read request
activate Reader
Reader->>DB: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
activate DB
DB-->>Reader: transaction started (xid, snapshot)
Reader->>Token: NewToken(xid, snapshot)
Token-->>Reader: Encoded snapshot token (base64 "xid:snapshot" or legacy)
Reader->>DB: SnapshotQuery(..., finalSnapshot)
DB-->>Reader: Visible rows per snapshot
Reader-->>Client: Response + snapshot token
deactivate Reader
deactivate DB
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Areas needing extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/storage/postgres/data_writer.go (1)
214-224: Fix: Exec error is overwritten by Close result; can return nil, nil on failureOn Exec failure you assign err = batchResult.Close() and then return err. If Close returns nil, the function returns nil, nil. Preserve the Exec error and still close the batch.
Apply these diffs.
Write path:
- batchResult := tx.SendBatch(ctx, batch) - for i := 0; i < batch.Len(); i++ { - _, err = batchResult.Exec() - if err != nil { - err = batchResult.Close() - if err != nil { - return nil, err - } - return nil, err - } - } + batchResult := tx.SendBatch(ctx, batch) + for i := 0; i < batch.Len(); i++ { + if _, execErr := batchResult.Exec(); execErr != nil { + _ = batchResult.Close() // best-effort cleanup + return nil, execErr + } + }RunBundle path:
- batchResult := tx.SendBatch(ctx, batch) - for i := 0; i < batch.Len(); i++ { - _, err = batchResult.Exec() - if err != nil { - err = batchResult.Close() - if err != nil { - return nil, err - } - return nil, err - } - } + batchResult := tx.SendBatch(ctx, batch) + for i := 0; i < batch.Len(); i++ { + if _, execErr := batchResult.Exec(); execErr != nil { + _ = batchResult.Close() // best-effort cleanup + return nil, execErr + } + }Current Close-after-success block (Lines 226-229 and 371-374) can remain to propagate Close errors.
Also applies to: 359-369
♻️ Duplicate comments (1)
integration-test/race_condition/test_race_condition_parallel.sh (1)
74-98: Same output interleaving issue in remaining parallel checks.The same output interleaving problem from lines 62-72 also affects the parallel checks after 1s wait and 5s wait. Apply similar fixes to all three sections to properly capture and display results.
🧹 Nitpick comments (13)
internal/info.go (1)
26-26: Core Version constant bumped to v1.4.7 — verified OK.Verification confirms no leftover v1.4.6 references and public API docs correctly reflect v1.4.7. For future releases, consider driving doc versions from a single source (or CI inject) to prevent drift across documentation.
integration-test/race_condition/docker-compose-test.yml (1)
2-8: Document test-only credentials.The hardcoded credentials (pg/pg) are appropriate for a local test environment. Consider adding a comment to clarify these are test-only and should not be used in production.
Apply this diff to add clarifying comments:
+ # Test environment only - uses simple credentials for local testing postgres: image: postgres:17-alpine environment:integration-test/race_condition/README.md (1)
14-22: Consider adding expected outcomes and prerequisites.The test instructions are clear, but could be enhanced with:
- Expected test results (what success looks like)
- Note about making scripts executable (
chmod +x *.sh)- Prerequisites (Docker, Docker Compose, curl, jq)
Consider adding a section like:
### Prerequisites - Docker and Docker Compose - `curl` and `jq` installed ### Expected Results Both tests should demonstrate that permissions are correctly isolated after concurrent deletions. The `read` permission should remain granted while `comment` and `edit` should be denied after deletion. **Note:** Make scripts executable before running: `chmod +x *.sh`integration-test/race_condition/test_race_condition.sh (3)
1-7: Consider adding error handling and validation.The script correctly uses
set -ebut doesn't validate API responses. Consider capturing and checking HTTP status codes or response content to ensure operations succeed.Example enhancement:
#!/bin/bash set -e # Write schema RESPONSE=$(curl -sS -w '\n%{http_code}' 'http://localhost:3477/v1/tenants/t1/schemas/write' \ -H "Content-Type: application/json" \ -d "$(cat "schema.perm" | jq -Rs '{schema: . }')") HTTP_CODE=$(echo "$RESPONSE" | tail -n1) if [ "$HTTP_CODE" -ne 200 ]; then echo "Schema write failed with HTTP $HTTP_CODE" exit 1 fi
22-33: Clarify the metadata depth parameter.The
depth: 20parameter is used but not documented. Consider adding a comment explaining why this depth is chosen for the race condition test, or confirm if this is necessary for the test scenario.function check_access() { ACTION=$1 - + # depth: 20 ensures deep permission resolution across potential relation chains curl -sS -w '\n' 'http://localhost:3477/v1/tenants/t1/permissions/check' \
50-76: Test lacks programmatic validation.The test outputs results but doesn't validate them programmatically. Consider adding assertions to verify:
- Read permission remains granted throughout
- Comment and edit permissions are denied after deletion
- Results are stable after the wait period
Example validation:
# After DELETE checks READ_RESULT=$(check_access 'read' | jq -r '.can') COMMENT_RESULT=$(check_access 'comment' | jq -r '.can') EDIT_RESULT=$(check_access 'edit' | jq -r '.can') if [ "$READ_RESULT" != "CHECK_RESULT_ALLOWED" ]; then echo "ERROR: Read permission should still be granted" exit 1 fi if [ "$COMMENT_RESULT" = "CHECK_RESULT_ALLOWED" ]; then echo "ERROR: Comment permission should be denied" exit 1 fiintegration-test/race_condition/test_race_condition_parallel.sh (1)
1-46: Consider extracting shared code to reduce duplication.Lines 1-46 are identical to
test_race_condition.sh. Consider extracting the shared functions and setup into a common file (e.g.,test_common.sh) that both scripts can source.Example structure:
test_common.sh:
#!/bin/bash BASE_URL="http://localhost:3477" TENANT="t1" function check_access() { ... } function drop_access() { ... } function setup_schema() { ... } function grant_permissions() { ... }Then in both test scripts:
#!/bin/bash set -e source "$(dirname "$0")/test_common.sh" setup_schema grant_permissions # ... test-specific logicinternal/storage/postgres/data_reader.go (2)
36-38: RepeatableRead + snapshot-aware filters LGTM; reduce repeated type assertions
- Isolation change to RepeatableRead is appropriate for stable reads.
- Passing both xid and snapshot into utils.SnapshotQuery is consistent with the new API.
- Minor: store st as a concrete snapshot.Token once to avoid repeated assertions.
Example:
- var st token.SnapToken - st, err = snapshot.EncodedToken{Value: snap}.Decode() + var st token.SnapToken + st, err = snapshot.EncodedToken{Value: snap}.Decode() if err != nil { /* ... */ } - builder = utils.SnapshotQuery(builder, st.(snapshot.Token).Value.Uint, st.(snapshot.Token).Snapshot) + tok := st.(snapshot.Token) + builder = utils.SnapshotQuery(builder, tok.Value.Uint, tok.Snapshot)Also applies to: 60-60, 135-135, 223-223, 281-281, 369-369, 482-482
559-575: Add supporting index for HeadSnapshot queryCreate the composite index to avoid seq scans and reduce latency under load:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_transactions_tenant_id_id ON transactions(tenant_id, id DESC);Add a reversible migration.
Also applies to: 586-587
internal/storage/postgres/utils/common_test.go (1)
28-40: Good coverage; add case for xid > xmaxPlease add a test where xid > xmax to validate createFinalSnapshot adjusts xmax and xip_list appropriately.
Example:
It("Case 4: New mode with snapshot (xid > xmax)", func() { sl := squirrel.Select("column").From("table") revision := uint64(50) snapshot := "30:40:35" // xid(50) > xmax(40) -> expect adjusted final snapshot query := utils.SnapshotQuery(sl, revision, snapshot) sql, args, err := query.ToSql() Expect(err).ShouldNot(HaveOccurred()) Expect(sql).Should(Equal( "SELECT column FROM table WHERE (pg_visible_in_snapshot(created_tx_id, ?) = true OR created_tx_id = ?::xid8) AND ((pg_visible_in_snapshot(expired_tx_id, ?) = false OR expired_tx_id = ?::xid8) AND expired_tx_id <> ?::xid8)", )) Expect(args[0]).ShouldNot(Equal(snapshot)) // final snapshot should change Expect(args).Should(HaveLen(5)) })Also applies to: 42-55, 56-69
internal/storage/postgres/snapshot/token_test.go (1)
41-54: Extend tests: comparator with snapshots and invalid new-format
- Verify Eg/Gt/Lt ignore Snapshot by adding cases where Snapshot differs but Value equals.
- Add a failing decode case for base64-valid but invalid format (no colon), e.g., base64("123") -> expect error.
Happy to draft these tests if you want.
Also applies to: 116-127, 136-151, 153-165
internal/storage/postgres/snapshot/token.go (1)
37-55: Token format handling looks solid; tighten errors and doc comment
- Return a typed sentinel error for invalid token formats to simplify caller handling, e.g., var ErrInvalidTokenFormat = errors.New("invalid token format"), and use it instead of fmt.Errorf.
- Fix the comment on EncodedToken.String() (it returns the encoded value, doesn’t decode).
- Optional: consider prefixing new formats (e.g., "v2:xid:snapshot") to future‑proof beyond the len==8 heuristic.
Also applies to: 75-119, 121-124
internal/storage/postgres/utils/common.go (1)
121-121: Cast snapshot parameter to pg_snapshot for type safety.Passing a raw text parameter relies on implicit casts. Be explicit to avoid version/driver quirks.
Apply:
- squirrel.Expr("pg_visible_in_snapshot(created_tx_id, ?) = true", finalSnapshot), + squirrel.Expr("pg_visible_in_snapshot(created_tx_id, ?::pg_snapshot) = true", finalSnapshot), @@ - squirrel.Expr("pg_visible_in_snapshot(expired_tx_id, ?) = false", finalSnapshot), + squirrel.Expr("pg_visible_in_snapshot(expired_tx_id, ?::pg_snapshot) = false", finalSnapshot),Also applies to: 128-128
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pkg/pb/base/v1/openapi.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (15)
docs/api-reference/apidocs.swagger.json(1 hunks)docs/api-reference/openapiv2/apidocs.swagger.json(1 hunks)integration-test/race_condition/README.md(1 hunks)integration-test/race_condition/docker-compose-test.yml(1 hunks)integration-test/race_condition/schema.perm(1 hunks)integration-test/race_condition/test_race_condition.sh(1 hunks)integration-test/race_condition/test_race_condition_parallel.sh(1 hunks)internal/info.go(1 hunks)internal/storage/postgres/data_reader.go(9 hunks)internal/storage/postgres/data_writer.go(6 hunks)internal/storage/postgres/snapshot/token.go(4 hunks)internal/storage/postgres/snapshot/token_test.go(7 hunks)internal/storage/postgres/utils/common.go(2 hunks)internal/storage/postgres/utils/common_test.go(1 hunks)proto/base/v1/openapi.proto(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
internal/storage/postgres/utils/common_test.go (1)
internal/storage/postgres/utils/common.go (2)
SnapshotQuery(91-136)ActiveRecordTxnID(32-32)
internal/storage/postgres/data_reader.go (2)
internal/storage/postgres/utils/common.go (1)
SnapshotQuery(91-136)internal/storage/postgres/snapshot/token.go (1)
Token(18-21)
internal/storage/postgres/snapshot/token.go (1)
pkg/database/postgres/xid8.go (1)
XID8(10-10)
internal/storage/postgres/snapshot/token_test.go (2)
internal/storage/postgres/snapshot/token.go (3)
NewToken(29-34)EncodedToken(23-25)Token(18-21)pkg/database/postgres/xid8.go (1)
XID8(10-10)
internal/storage/postgres/data_writer.go (3)
pkg/database/postgres/xid8.go (1)
XID8(10-10)internal/storage/postgres/utils/common.go (1)
TransactionTemplate(25-25)internal/storage/postgres/snapshot/token.go (1)
NewToken(29-34)
🪛 Checkov (3.2.334)
integration-test/race_condition/docker-compose-test.yml
[medium] 14-15: Basic Auth Credentials
(CKV_SECRET_4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test with Coverage
- GitHub Check: Analyze (go)
- GitHub Check: Scan Container for Vulnerabilities
- GitHub Check: Test with Coverage
🔇 Additional comments (9)
docs/api-reference/openapiv2/apidocs.swagger.json (1)
6-6: Version bump looks good.OpenAPI info.version updated to v1.4.7; no other changes in this doc block.
docs/api-reference/apidocs.swagger.json (1)
6-6: Version bump confirmed.Doc version is now v1.4.7; consistent with the PR intent.
proto/base/v1/openapi.proto (1)
12-12: Proto OpenAPI info.version updated correctly.Keeps docs and generated artifacts aligned for v1.4.7.
integration-test/race_condition/docker-compose-test.yml (1)
10-17: LGTM - Test environment configuration is appropriate.The configuration correctly sets up the Permify service with PostgreSQL backend. The static analysis warning about credentials in the URI is a false positive in this context—embedding credentials in connection strings is standard practice for containerized test environments where the database is not exposed externally.
integration-test/race_condition/schema.perm (1)
1-11: LGTM - Clean test schema.The schema is appropriately simple for testing race conditions. The direct mapping between relations and permissions makes it easy to verify the behavior of concurrent deletions.
integration-test/race_condition/test_race_condition.sh (1)
9-19: LGTM - Permission grants are correct.The initial permission setup properly grants all three permissions (read, comment, edit) that will be tested in subsequent operations.
internal/storage/postgres/data_writer.go (1)
182-186: Snapshot propagation LGTMScanning xid+snapshot and returning snapshot-aware tokens looks correct and consistent with utils.TransactionTemplate and snapshot.NewToken signature.
Also applies to: 237-237, 257-261, 313-313, 335-339, 380-380
internal/storage/postgres/utils/common.go (2)
91-115: Review comment is incorrect—all SnapshotQuery call sites already pass the required third argument.All 6 production call sites in
data_reader.goand all 3 test call sites incommon_test.gohave been updated to passsnapshotValue. No compilation failures or missing arguments exist.Likely an incorrect or invalid review comment.
25-25: All TransactionTemplate call sites already correctly handle the two-column return.All three usages in
data_writer.go(lines 185, 260, 338) properly scan into both&xidand&snapshotValue, matching theRETURNING id, snapshotclause. No updates needed.
…on to v1.4.7
Summary by CodeRabbit
Documentation
Chores
Tests