feat(integration-test): add race condition tests and update API versi… #2575

tolgaozen · 2025-10-24T23:33:19Z

…on to v1.4.7

Summary by CodeRabbit

Documentation
- Updated API reference and OpenAPI docs to v1.4.7.
Chores
- Version bumped to v1.4.7.
Tests
- Added a race-condition integration test suite with Docker Compose setup and runnable scripts to exercise concurrent permission scenarios.

…on to v1.4.7

coderabbitai · 2025-10-24T23:33:31Z

Walkthrough

This PR bumps version metadata to v1.4.7, adds a race-condition integration-test suite with Docker Compose and test scripts, and refactors PostgreSQL storage to propagate and encode snapshot information with RepeatableRead isolation.

Changes

Cohort / File(s)	Summary
Version Updates (docs & proto) `docs/api-reference/apidocs.swagger.json`, `docs/api-reference/openapiv2/apidocs.swagger.json`, `proto/base/v1/openapi.proto`	Bumped OpenAPI/Swagger version from v1.4.6 → v1.4.7.
Version Update (code) `internal/info.go`	Bumped `Version` constant from "v1.4.6" → "v1.4.7".
Race Condition Integration Tests `integration-test/race_condition/README.md`, `integration-test/race_condition/docker-compose-test.yml`, `integration-test/race_condition/schema.perm`, `integration-test/race_condition/test_race_condition.sh`, `integration-test/race_condition/test_race_condition_parallel.sh`	Added Docker Compose test setup, schema for `user`/`doc` entities and relations, and bash scripts to exercise concurrent permission deletions and checks.
Snapshot Token & Tests `internal/storage/postgres/snapshot/token.go`, `internal/storage/postgres/snapshot/token_test.go`	Added `Snapshot` field to Token; `NewToken` now accepts snapshot string; Encode/Decode support legacy binary xid and new base64 `"xid:snapshot"` format; tests updated for both formats.
Postgres Data Reader `internal/storage/postgres/data_reader.go`	Switched transaction isolation to REPEATABLE READ; propagate snapshot token through read paths; HeadSnapshot now returns id and snapshot.
Postgres Data Writer `internal/storage/postgres/data_writer.go`	Transaction retrieval now scans `xid` and `snapshotValue`; return paths pass snapshot to `NewToken`.
Postgres Snapshot Utilities & Tests `internal/storage/postgres/utils/common.go`, `internal/storage/postgres/utils/common_test.go`	`TransactionTemplate` returns `id, snapshot`; added `createFinalSnapshot`; `SnapshotQuery` signature gains `snapshotValue` and uses snapshot-aware visibility logic; tests extended for legacy and snapshot modes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Reader as DataReader
    participant DB as PostgreSQL
    participant Token

    rect rgba(220,240,255,0.35)
    Note over Reader,DB: Snapshot-aware read (new)
    end

    Client->>Reader: Read request
    activate Reader
    Reader->>DB: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
    activate DB
    DB-->>Reader: transaction started (xid, snapshot)
    Reader->>Token: NewToken(xid, snapshot)
    Token-->>Reader: Encoded snapshot token (base64 "xid:snapshot" or legacy)
    Reader->>DB: SnapshotQuery(..., finalSnapshot)
    DB-->>Reader: Visible rows per snapshot
    Reader-->>Client: Response + snapshot token
    deactivate Reader
    deactivate DB

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas needing extra attention:

snapshot token Encode/Decode compatibility and base64 edge cases (internal/storage/postgres/snapshot/token.go, tests)
visibility logic in SnapshotQuery and createFinalSnapshot (internal/storage/postgres/utils/common.go)
transaction isolation change impact in data_reader.go (deadlocks/consistency)
end-to-end propagation of snapshot values across reader/writer paths

Possibly related PRs

docs: update API versions to v1.4.0 #2267 — touches the same version metadata files (swagger/openapi and internal version).
docs: update API version to v1.3.8 in swagger files #2226 — similar version-string updates across docs and internal/info.go.
refactor: update JSON marshaling to protojson in Postgres storage functions #2014 — overlaps on version constant changes and touches Postgres storage areas.

Suggested reviewers

tolgaOzen

Poem

🐇 In burrows of code I softly tread,

Tokens stitched with snapshot thread,
Tests that race in Docker light,
Version hopped to v1.4.7 tonight,
May reads be stable, and sleeps be tight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat(integration-test): add race condition tests and update API version to v1.4.7" is partially related to the changeset. It accurately captures two real aspects of the PR: the addition of new race condition integration tests (multiple test files in integration-test/race_condition/) and the version update from v1.4.6 to v1.4.7 across multiple files. However, the title does not emphasize the primary functional changes, which are significant modifications to the PostgreSQL storage layer snapshot handling (in data_reader.go, data_writer.go, snapshot/token.go, and utils/common.go) that represent the core implementation to address race conditions. While the title references a real part of the change rather than being misleading, it focuses on the testing and administrative aspects rather than the main implementation work.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/race-condition

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9021b2d and 31f5528.

📒 Files selected for processing (1)

internal/storage/postgres/utils/common.go (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Scan Container for Vulnerabilities
GitHub Check: Test with Coverage
GitHub Check: Analyze (go)
GitHub Check: Test with Coverage

🔇 Additional comments (2)

internal/storage/postgres/utils/common.go (2)

25-25: LGTM: Snapshot capture added to transaction creation.

The addition of snapshot to the RETURNING clause enables capturing the transaction's snapshot at creation time, which is essential for the race condition handling improvements in this PR.

39-91: Past review issues have been addressed correctly.

The function now properly:

Uses strings.SplitN instead of the non-existent strings.SplitSeq

Uses correct range syntax for _, xip := range

Includes the safety guard for xid <= xmin (lines 70-73)

The snapshot manipulation logic is sound and handles the edge cases appropriately for PostgreSQL MVCC visibility semantics.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/storage/postgres/data_writer.go (1)

214-224: Fix: Exec error is overwritten by Close result; can return nil, nil on failure

On Exec failure you assign err = batchResult.Close() and then return err. If Close returns nil, the function returns nil, nil. Preserve the Exec error and still close the batch.

Apply these diffs.

Write path:

- batchResult := tx.SendBatch(ctx, batch)
- for i := 0; i < batch.Len(); i++ {
-   _, err = batchResult.Exec()
-   if err != nil {
-     err = batchResult.Close()
-     if err != nil {
-       return nil, err
-     }
-     return nil, err
-   }
- }
+ batchResult := tx.SendBatch(ctx, batch)
+ for i := 0; i < batch.Len(); i++ {
+   if _, execErr := batchResult.Exec(); execErr != nil {
+     _ = batchResult.Close() // best-effort cleanup
+     return nil, execErr
+   }
+ }

RunBundle path:

- batchResult := tx.SendBatch(ctx, batch)
- for i := 0; i < batch.Len(); i++ {
-   _, err = batchResult.Exec()
-   if err != nil {
-     err = batchResult.Close()
-     if err != nil {
-       return nil, err
-     }
-     return nil, err
-   }
- }
+ batchResult := tx.SendBatch(ctx, batch)
+ for i := 0; i < batch.Len(); i++ {
+   if _, execErr := batchResult.Exec(); execErr != nil {
+     _ = batchResult.Close() // best-effort cleanup
+     return nil, execErr
+   }
+ }

Current Close-after-success block (Lines 226-229 and 371-374) can remain to propagate Close errors.

Also applies to: 359-369

♻️ Duplicate comments (1)

integration-test/race_condition/test_race_condition_parallel.sh (1)

74-98: Same output interleaving issue in remaining parallel checks.

The same output interleaving problem from lines 62-72 also affects the parallel checks after 1s wait and 5s wait. Apply similar fixes to all three sections to properly capture and display results.

🧹 Nitpick comments (13)

internal/info.go (1)

26-26: Core Version constant bumped to v1.4.7 — verified OK.

Verification confirms no leftover v1.4.6 references and public API docs correctly reflect v1.4.7. For future releases, consider driving doc versions from a single source (or CI inject) to prevent drift across documentation.
integration-test/race_condition/docker-compose-test.yml (1)
2-8: Document test-only credentials.

The hardcoded credentials (pg/pg) are appropriate for a local test environment. Consider adding a comment to clarify these are test-only and should not be used in production.

Apply this diff to add clarifying comments:
+  # Test environment only - uses simple credentials for local testing
   postgres:
     image: postgres:17-alpine
     environment:
integration-test/race_condition/README.md (1)
14-22: Consider adding expected outcomes and prerequisites.

The test instructions are clear, but could be enhanced with:

Expected test results (what success looks like)

Note about making scripts executable (chmod +x *.sh)

Prerequisites (Docker, Docker Compose, curl, jq)

Consider adding a section like:
### Prerequisites
- Docker and Docker Compose
- `curl` and `jq` installed

### Expected Results
Both tests should demonstrate that permissions are correctly isolated after concurrent deletions. The `read` permission should remain granted while `comment` and `edit` should be denied after deletion.

**Note:** Make scripts executable before running: `chmod +x *.sh`
integration-test/race_condition/test_race_condition.sh (3)
1-7: Consider adding error handling and validation.

The script correctly uses set -e but doesn't validate API responses. Consider capturing and checking HTTP status codes or response content to ensure operations succeed.

Example enhancement:
#!/bin/bash
set -e

# Write schema
RESPONSE=$(curl -sS -w '\n%{http_code}' 'http://localhost:3477/v1/tenants/t1/schemas/write' \
     -H "Content-Type: application/json" \
     -d "$(cat "schema.perm" | jq -Rs '{schema: . }')")

HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
if [ "$HTTP_CODE" -ne 200 ]; then
    echo "Schema write failed with HTTP $HTTP_CODE"
    exit 1
fi
22-33: Clarify the metadata depth parameter.

The depth: 20 parameter is used but not documented. Consider adding a comment explaining why this depth is chosen for the race condition test, or confirm if this is necessary for the test scenario.
 function check_access() {
     ACTION=$1
-
+    # depth: 20 ensures deep permission resolution across potential relation chains
     curl -sS -w '\n' 'http://localhost:3477/v1/tenants/t1/permissions/check' \
50-76: Test lacks programmatic validation.

The test outputs results but doesn't validate them programmatically. Consider adding assertions to verify:

Read permission remains granted throughout

Comment and edit permissions are denied after deletion

Results are stable after the wait period

Example validation:
# After DELETE checks
READ_RESULT=$(check_access 'read' | jq -r '.can')
COMMENT_RESULT=$(check_access 'comment' | jq -r '.can')
EDIT_RESULT=$(check_access 'edit' | jq -r '.can')

if [ "$READ_RESULT" != "CHECK_RESULT_ALLOWED" ]; then
    echo "ERROR: Read permission should still be granted"
    exit 1
fi

if [ "$COMMENT_RESULT" = "CHECK_RESULT_ALLOWED" ]; then
    echo "ERROR: Comment permission should be denied"
    exit 1
fi
integration-test/race_condition/test_race_condition_parallel.sh (1)
1-46: Consider extracting shared code to reduce duplication.

Lines 1-46 are identical to test_race_condition.sh. Consider extracting the shared functions and setup into a common file (e.g., test_common.sh) that both scripts can source.

Example structure:

test_common.sh:
#!/bin/bash

BASE_URL="http://localhost:3477"
TENANT="t1"

function check_access() { ... }
function drop_access() { ... }
function setup_schema() { ... }
function grant_permissions() { ... }
Then in both test scripts:
#!/bin/bash
set -e
source "$(dirname "$0")/test_common.sh"

setup_schema
grant_permissions
# ... test-specific logic
internal/storage/postgres/data_reader.go (2)
36-38: RepeatableRead + snapshot-aware filters LGTM; reduce repeated type assertions

Isolation change to RepeatableRead is appropriate for stable reads.

Passing both xid and snapshot into utils.SnapshotQuery is consistent with the new API.

Minor: store st as a concrete snapshot.Token once to avoid repeated assertions.

Example:
- var st token.SnapToken
- st, err = snapshot.EncodedToken{Value: snap}.Decode()
+ var st token.SnapToken
+ st, err = snapshot.EncodedToken{Value: snap}.Decode()
  if err != nil { /* ... */ }
- builder = utils.SnapshotQuery(builder, st.(snapshot.Token).Value.Uint, st.(snapshot.Token).Snapshot)
+ tok := st.(snapshot.Token)
+ builder = utils.SnapshotQuery(builder, tok.Value.Uint, tok.Snapshot)
Also applies to: 60-60, 135-135, 223-223, 281-281, 369-369, 482-482

559-575: Add supporting index for HeadSnapshot query

Create the composite index to avoid seq scans and reduce latency under load:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_transactions_tenant_id_id
ON transactions(tenant_id, id DESC);
Add a reversible migration.

Also applies to: 586-587
internal/storage/postgres/utils/common_test.go (1)
28-40: Good coverage; add case for xid > xmax

Please add a test where xid > xmax to validate createFinalSnapshot adjusts xmax and xip_list appropriately.

Example:
It("Case 4: New mode with snapshot (xid > xmax)", func() {
  sl := squirrel.Select("column").From("table")
  revision := uint64(50)
  snapshot := "30:40:35" // xid(50) > xmax(40) -> expect adjusted final snapshot
  query := utils.SnapshotQuery(sl, revision, snapshot)
  sql, args, err := query.ToSql()
  Expect(err).ShouldNot(HaveOccurred())
  Expect(sql).Should(Equal(
    "SELECT column FROM table WHERE (pg_visible_in_snapshot(created_tx_id, ?) = true OR created_tx_id = ?::xid8) AND ((pg_visible_in_snapshot(expired_tx_id, ?) = false OR expired_tx_id = ?::xid8) AND expired_tx_id <> ?::xid8)",
  ))
  Expect(args[0]).ShouldNot(Equal(snapshot)) // final snapshot should change
  Expect(args).Should(HaveLen(5))
})
Also applies to: 42-55, 56-69
internal/storage/postgres/snapshot/token_test.go (1)

41-54: Extend tests: comparator with snapshots and invalid new-format

Verify Eg/Gt/Lt ignore Snapshot by adding cases where Snapshot differs but Value equals.

Add a failing decode case for base64-valid but invalid format (no colon), e.g., base64("123") -> expect error.

Happy to draft these tests if you want.

Also applies to: 116-127, 136-151, 153-165

internal/storage/postgres/snapshot/token.go (1)

37-55: Token format handling looks solid; tighten errors and doc comment

Return a typed sentinel error for invalid token formats to simplify caller handling, e.g., var ErrInvalidTokenFormat = errors.New("invalid token format"), and use it instead of fmt.Errorf.

Fix the comment on EncodedToken.String() (it returns the encoded value, doesn’t decode).

Optional: consider prefixing new formats (e.g., "v2:xid:snapshot") to future‑proof beyond the len==8 heuristic.

Also applies to: 75-119, 121-124
internal/storage/postgres/utils/common.go (1)
121-121: Cast snapshot parameter to pg_snapshot for type safety.

Passing a raw text parameter relies on implicit casts. Be explicit to avoid version/driver quirks.

Apply:
-    squirrel.Expr("pg_visible_in_snapshot(created_tx_id, ?) = true", finalSnapshot),
+    squirrel.Expr("pg_visible_in_snapshot(created_tx_id, ?::pg_snapshot) = true", finalSnapshot),
@@
-      squirrel.Expr("pg_visible_in_snapshot(expired_tx_id, ?) = false", finalSnapshot),
+      squirrel.Expr("pg_visible_in_snapshot(expired_tx_id, ?::pg_snapshot) = false", finalSnapshot),
Also applies to: 128-128

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00bea4c and 9021b2d.

⛔ Files ignored due to path filters (1)

pkg/pb/base/v1/openapi.pb.go is excluded by !**/*.pb.go

📒 Files selected for processing (15)

docs/api-reference/apidocs.swagger.json (1 hunks)
docs/api-reference/openapiv2/apidocs.swagger.json (1 hunks)
integration-test/race_condition/README.md (1 hunks)
integration-test/race_condition/docker-compose-test.yml (1 hunks)
integration-test/race_condition/schema.perm (1 hunks)
integration-test/race_condition/test_race_condition.sh (1 hunks)
integration-test/race_condition/test_race_condition_parallel.sh (1 hunks)
internal/info.go (1 hunks)
internal/storage/postgres/data_reader.go (9 hunks)
internal/storage/postgres/data_writer.go (6 hunks)
internal/storage/postgres/snapshot/token.go (4 hunks)
internal/storage/postgres/snapshot/token_test.go (7 hunks)
internal/storage/postgres/utils/common.go (2 hunks)
internal/storage/postgres/utils/common_test.go (1 hunks)
proto/base/v1/openapi.proto (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (5)

internal/storage/postgres/utils/common_test.go (1)

internal/storage/postgres/utils/common.go (2)

SnapshotQuery (91-136)

ActiveRecordTxnID (32-32)

internal/storage/postgres/data_reader.go (2)

internal/storage/postgres/utils/common.go (1)

SnapshotQuery (91-136)

internal/storage/postgres/snapshot/token.go (1)

Token (18-21)

internal/storage/postgres/snapshot/token.go (1)

pkg/database/postgres/xid8.go (1)

XID8 (10-10)

internal/storage/postgres/snapshot/token_test.go (2)

internal/storage/postgres/snapshot/token.go (3)

NewToken (29-34)

EncodedToken (23-25)

Token (18-21)

pkg/database/postgres/xid8.go (1)

XID8 (10-10)

internal/storage/postgres/data_writer.go (3)

pkg/database/postgres/xid8.go (1)

XID8 (10-10)

internal/storage/postgres/utils/common.go (1)

TransactionTemplate (25-25)

internal/storage/postgres/snapshot/token.go (1)

NewToken (29-34)

🪛 Checkov (3.2.334)

integration-test/race_condition/docker-compose-test.yml

[medium] 14-15: Basic Auth Credentials

(CKV_SECRET_4)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Test with Coverage
GitHub Check: Analyze (go)
GitHub Check: Scan Container for Vulnerabilities
GitHub Check: Test with Coverage

🔇 Additional comments (9)

docs/api-reference/openapiv2/apidocs.swagger.json (1)

6-6: Version bump looks good.

OpenAPI info.version updated to v1.4.7; no other changes in this doc block.

docs/api-reference/apidocs.swagger.json (1)

6-6: Version bump confirmed.

Doc version is now v1.4.7; consistent with the PR intent.

proto/base/v1/openapi.proto (1)

12-12: Proto OpenAPI info.version updated correctly.

Keeps docs and generated artifacts aligned for v1.4.7.

integration-test/race_condition/docker-compose-test.yml (1)

10-17: LGTM - Test environment configuration is appropriate.

The configuration correctly sets up the Permify service with PostgreSQL backend. The static analysis warning about credentials in the URI is a false positive in this context—embedding credentials in connection strings is standard practice for containerized test environments where the database is not exposed externally.

integration-test/race_condition/schema.perm (1)

1-11: LGTM - Clean test schema.

The schema is appropriately simple for testing race conditions. The direct mapping between relations and permissions makes it easy to verify the behavior of concurrent deletions.

integration-test/race_condition/test_race_condition.sh (1)

9-19: LGTM - Permission grants are correct.

The initial permission setup properly grants all three permissions (read, comment, edit) that will be tested in subsequent operations.

internal/storage/postgres/data_writer.go (1)

182-186: Snapshot propagation LGTM

Scanning xid+snapshot and returning snapshot-aware tokens looks correct and consistent with utils.TransactionTemplate and snapshot.NewToken signature.

Also applies to: 237-237, 257-261, 313-313, 335-339, 380-380

internal/storage/postgres/utils/common.go (2)

91-115: Review comment is incorrect—all SnapshotQuery call sites already pass the required third argument.

All 6 production call sites in data_reader.go and all 3 test call sites in common_test.go have been updated to pass snapshotValue. No compilation failures or missing arguments exist.

Likely an incorrect or invalid review comment.

25-25: All TransactionTemplate call sites already correctly handle the two-column return.

All three usages in data_writer.go (lines 185, 260, 338) properly scan into both &xid and &snapshotValue, matching the RETURNING id, snapshot clause. No updates needed.

integration-test/race_condition/test_race_condition_parallel.sh

internal/storage/postgres/utils/common.go

feat(integration-test): add race condition tests and update API versi…

9021b2d

…on to v1.4.7

vercel bot deployed to Preview October 24, 2025 23:36 View deployment

coderabbitai bot reviewed Oct 24, 2025

View reviewed changes

integration-test/race_condition/test_race_condition_parallel.sh Show resolved Hide resolved

internal/storage/postgres/utils/common.go Show resolved Hide resolved

refactor(storage): improve snapshot parsing and uniqueness check logic

31f5528

vercel bot deployed to Preview October 24, 2025 23:48 View deployment

tolgaozen merged commit 3b88322 into master Oct 25, 2025
13 of 15 checks passed

tolgaozen deleted the feature/race-condition branch October 25, 2025 00:01

coderabbitai bot mentioned this pull request Oct 26, 2025

feat(api): update version to v1.4.8 and enhance snapshot token creation #2577

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(integration-test): add race condition tests and update API versi… #2575

feat(integration-test): add race condition tests and update API versi… #2575

Uh oh!

tolgaozen commented Oct 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 24, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

feat(integration-test): add race condition tests and update API versi… #2575

feat(integration-test): add race condition tests and update API versi… #2575

Uh oh!

Conversation

tolgaozen commented Oct 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tolgaozen commented Oct 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 24, 2025 •

edited

Loading