feat: blacklist redaction option #1054

homanp · 2025-10-16T07:15:24Z

Description

This pull request introduces custom entity redaction to the Superagent CLI and SDKs, allowing users to specify which types of PII to redact using natural language descriptions. The CLI now supports a --entities flag, and both the TypeScript and Python SDKs accept an entities parameter for flexible, context-aware redaction. Documentation and tests have been updated to reflect and validate these new features.

Custom Entity Redaction Feature

Added support for specifying custom entities to redact using natural language in both TypeScript and Python SDKs via an entities parameter in the redact() method. The CLI now accepts a --entities flag for the same purpose. [1] [2] [3] [4] [5] [6]
Updated the request body sent to the redaction API to include the entities list if provided, enabling AI-powered interpretation of custom entity descriptions. [1] [2]
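
As an illustration of the request-body change, here is a minimal Python sketch. The helper name `with_entities` and the `"text"` key in the usage line are hypothetical and not taken from the SDK; only the `entities` key comes from this PR.

```python
from typing import Any, Dict, List, Optional


def with_entities(
    base_body: Dict[str, Any], entities: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Return a copy of base_body, adding "entities" only when the list is non-empty."""
    body = dict(base_body)  # shallow copy so the caller's dict is not mutated
    if entities:  # skips both None and []
        body["entities"] = list(entities)
    return body


# Hypothetical usage: the "text" key is an assumption about the request shape.
body = with_entities({"text": "Contact me at john@example.com"}, ["email addresses"])
```

Omitting the key when no entities are given keeps the request identical to the pre-change request for callers who do not use the feature.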

Documentation Updates

Expanded CLI, TypeScript, and Python SDK documentation with sections and examples for custom entity redaction, including usage of the new flag/parameter and sample entity descriptions. [1] [2] [3] [4] [5] [6]

API and Type Safety Improvements

Introduced a RedactOptions interface to the TypeScript SDK for better type safety and extended the Client interface to support the new options.
Added support for a new reasoning field in API responses for both SDKs, with backward compatibility for the previous reasoning_content field. [1] [2] [3] [4]
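
The backward-compatible read of the reasoning field could look roughly like the following. This is a sketch, not the SDK code: the function name `extract_reasoning` is hypothetical, and because the PR description does not show where the field sits in the response, the containing object is passed in as an argument.

```python
from typing import Any, Mapping, Optional


def extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    """Prefer the new "reasoning" field and fall back to legacy "reasoning_content"."""
    reasoning = message.get("reasoning")
    if reasoning is None:
        # Older API responses only carry the previous field name.
        reasoning = message.get("reasoning_content")
    return reasoning
```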

Testing Enhancements

Added new tests in both Python and TypeScript SDKs to verify that custom entity redaction works as expected, including cases for single and multiple entities. [1] [2]

Version Bumps

Bumped CLI, TypeScript SDK, and Python SDK versions to reflect the new feature releases. [1] [2] [3]

Change references: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21]

Related Issue

Fixes #1053

Checklist

I tested my changes
I reviewed my own code

vercel · 2025-10-16T07:15:31Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
docs	Ready	Preview	Comment	Oct 16, 2025 7:15am

claude · 2025-10-16T07:17:13Z

Pull Request Review: Custom Entity Redaction Feature

Overall Assessment

This is a well-structured PR that adds valuable functionality for custom entity redaction using natural language descriptions. The implementation is consistent across CLI, TypeScript SDK, and Python SDK, with good documentation and test coverage.

Recommendation: ✅ Approve with minor suggestions for improvement

Code Quality & Best Practices

✅ Strengths

Consistent API Design: The entities parameter is implemented consistently across all three surfaces (CLI, TypeScript, Python)
Backward Compatibility: Good handling of both reasoning and reasoning_content fields for API compatibility
Type Safety: Added RedactOptions interface in TypeScript for better type safety
Documentation: Comprehensive documentation updates with clear examples
Version Management: Proper version bumps across all packages

⚠️ Suggestions for Improvement

CLI Argument Parsing (cli/src/commands/redact.ts:21-37)
- The current implementation assumes the --entities value is always at entitiesFlagIndex + 1, but doesn't check that a value is present or that it isn't another flag
- Consider edge case: superagent redact --entities --url-whitelist ... would incorrectly treat --url-whitelist as the entities value
Suggested fix:
```
if (entitiesFlagIndex !== -1) {
  const entitiesValue = args[entitiesFlagIndex + 1];
  if (!entitiesValue || entitiesValue.startsWith('--')) {
    console.error('❌ ERROR: --entities requires a comma-separated list of entity types');
    process.exit(1);
  }
  // ... rest of logic
}
```
Empty Entities Array Handling
- TypeScript SDK checks entities.length > 0 before adding to request (line 299)
- Python SDK doesn't have this check - it will send an empty array if entities=[]
Suggested alignment in Python (client.py:213):
```
if entities:  # falsy for both None and an empty list
    request_body["entities"] = entities
```

Error Handling for Invalid Entities

No validation that entities are non-empty strings
CLI splits on comma but doesn't validate the resulting entities

Consider adding:

```
// In CLI after split
entities = entitiesValue.split(',')
  .map(entity => entity.trim())
  .filter(entity => entity.length > 0);

if (entities.length === 0) {
  console.error('❌ ERROR: --entities must contain at least one valid entity type');
  process.exit(1);
}
```
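
The same validation could be applied on the SDK side. Below is a minimal Python sketch; `normalize_entities` is a hypothetical helper, not part of the SDK. It trims whitespace, drops empty strings, rejects non-string items, and removes duplicates while keeping first-seen order.

```python
from typing import Iterable, List


def normalize_entities(entities: Iterable[str]) -> List[str]:
    """Trim, drop empty entries, and dedupe entity descriptions, preserving order."""
    cleaned: List[str] = []
    seen = set()
    for entity in entities:
        if not isinstance(entity, str):
            raise TypeError(
                f"entity descriptions must be strings, got {type(entity).__name__}"
            )
        value = entity.strip()
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned
```

A caller can then treat an empty result as "no custom entities" or raise an error, depending on which behavior the SDK wants to expose.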

Potential Bugs & Issues

🔴 Medium Priority

Test Assertions May Be Fragile (sdk/python/tests/test_guard.py:212, sdk/typescript/tests/guard.test.ts:180)
- Tests assume SSN 123-45-6789 will NOT be redacted when only requesting email/phone redaction
- This relies on the API's interpretation and may be brittle if the model's behavior changes
- Recommendation: Add a comment explaining this assumption or use mock responses for more deterministic tests
No Validation of API Response Format
- Both SDKs assume the API response has the expected structure
- If the API changes or returns an error in a different format, the SDKs could raise unclear errors
- Suggestion: Add basic validation after result = response.json() to ensure required fields exist
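
One possible shape for that validation step, as a Python sketch. `require_fields` is a hypothetical helper, and the required field names are passed in by the caller because the actual response schema is not shown in this review.

```python
from typing import Any, Dict, Iterable


def require_fields(result: Any, required: Iterable[str]) -> Dict[str, Any]:
    """Raise a descriptive ValueError if the parsed response lacks expected fields."""
    if not isinstance(result, dict):
        raise ValueError(f"Expected a JSON object, got {type(result).__name__}")
    missing = [name for name in required if name not in result]
    if missing:
        raise ValueError(
            f"API response is missing required field(s): {', '.join(missing)}"
        )
    return result
```

Calling this right after `result = response.json()` turns a later `KeyError` or `TypeError` deep in the SDK into one error message that names the missing fields.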

Security Concerns

✅ Good Security Practices

No Direct PII Storage: The feature enables better PII detection, which is security-positive
Defensive Coding: Good use of optional chaining and fallbacks for API response fields
API Key Handling: Proper use of environment variables and headers

⚠️ Minor Considerations

Entity Descriptions as Attack Vector
- Malicious users could potentially craft entity descriptions to manipulate redaction behavior
- Example: entities=["nothing", "everything except secrets"]
- Recommendation: Consider documenting best practices or adding warnings about entity description specificity
URL Whitelist and Entities Interaction
- In Python SDK (client.py:248-254), URL whitelist is applied AFTER getting API response
- If entities include "URLs", the API might redact them before local whitelist is applied
- Recommendation: Document the interaction between entities and url_whitelist, or clarify order of operations

Performance Considerations

✅ Efficient Implementation

Minimal Overhead: Only adds a small array to the request body
No Additional API Calls: Single request handles custom entities
Optional Parameter: No performance impact when not used

💡 Optimization Opportunities

Entity Deduplication
- If user passes duplicate entities, they're sent to API as-is
- Minor optimization: deduplicate before sending
```
# In Python SDK
if entities:
    request_body["entities"] = list(dict.fromkeys(entities))  # Remove duplicates, keep first-seen order
```
Caching Potential
- Identical text + entities combinations could benefit from caching
- Consider documenting this as a future enhancement for high-volume use cases
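
If caching is explored later, a cache key could be derived from the text and entity list. The sketch below is only an illustration under stated assumptions: `redaction_cache_key` is a hypothetical name, and it assumes that entity order and duplicates do not affect the API's result. That assumption may not hold for a model-interpreted list.

```python
import hashlib
import json
from typing import Iterable


def redaction_cache_key(text: str, entities: Iterable[str] = ()) -> str:
    """Build a deterministic key for a (text, entities) pair."""
    # Assumption: entity order and duplicates do not change the redaction result,
    # so the list is deduplicated and sorted before hashing.
    payload = json.dumps(
        {"text": text, "entities": sorted(set(entities))},
        ensure_ascii=False,
        sort_keys=True,
    )
    # Hash the payload so the raw (possibly sensitive) text is not used as the key.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```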

Test Coverage

✅ Good Coverage

Happy Path Tests: Both SDKs test single and multiple entities
Integration Tests: Tests hit actual API endpoints (based on API_BASE_URL usage)
Backward Compatibility: Existing tests still pass (implicitly validates backward compat)

📈 Opportunities for Additional Tests

Edge Cases:
- Empty string in entities array: entities=["", "email"]
- Very long entity descriptions
- Special characters in entity descriptions
- Interaction between entities and url_whitelist
Error Scenarios:
- API returns error when entities are invalid
- Network timeout with entities parameter
- Malformed API response
CLI Tests:
- No test file visible for CLI command
- Recommendation: Add CLI integration tests for --entities flag

Example test cases to add:

```
// TypeScript
it("redact with empty entities array sends no entities", async () => {
  const result = await client.redact(text, { entities: [] });
  // Verify request body doesn't include entities field
});

it("redact with entities and urlWhitelist works together", async () => {
  const result = await client.redact(
    "Email: john@example.com, URL: https://example.com",
    { entities: ["email"], urlWhitelist: ["https://example.com"] }
  );
  // Verify email redacted but URL preserved
});
```
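
For the Python SDK, a deterministic version of the first case could use a mocked transport instead of the live API. The sketch below is self-contained and does not use the real client: `FakeRedactClient`, its injected `post` callable, and the `"text"` and `"redacted"` keys are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List, Optional
from unittest.mock import Mock


class FakeRedactClient:
    """Hypothetical stand-in for the SDK client; `post` is an injected transport."""

    def __init__(self, post: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._post = post

    def redact(self, text: str, entities: Optional[List[str]] = None) -> Dict[str, Any]:
        body: Dict[str, Any] = {"text": text}  # "text" key is an assumption
        if entities:
            body["entities"] = entities
        return self._post(body)


def test_empty_entities_are_omitted() -> None:
    post = Mock(return_value={"redacted": "ok"})
    FakeRedactClient(post).redact("hello", entities=[])
    sent_body = post.call_args.args[0]  # first positional argument of the call
    assert "entities" not in sent_body


def test_entities_are_forwarded() -> None:
    post = Mock(return_value={"redacted": "ok"})
    FakeRedactClient(post).redact("hello", entities=["email addresses"])
    sent_body = post.call_args.args[0]
    assert sent_body["entities"] == ["email addresses"]
```

Because the transport is mocked, these tests check only what the client sends and do not depend on how the model interprets the entity descriptions.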

Documentation Quality

✅ Excellent Documentation

Clear Examples: All three platforms have usage examples
Natural Language Descriptions: Good examples of entity descriptions
Combined Usage: Shows using entities with url_whitelist together

💡 Minor Enhancements

Add Performance Note: Document that more specific entity descriptions may yield better results
Add Limitations: Mention any known limitations or edge cases
Add Examples of What NOT to Do: Show antipatterns (e.g., too vague descriptions like "data")

Changelog Review

The CHANGELOG.md is well-structured and comprehensive. However:

Missing Breaking Changes Section: If the API response format changed (reasoning vs reasoning_content), the changelog should state whether this is a breaking change
Consider Semantic Versioning: From 0.0.11 → 0.0.12 suggests patch, but this is a new feature (minor bump: 0.1.0?)

Summary & Recommendations

Category	Rating	Notes
Code Quality	⭐⭐⭐⭐	Clean, consistent, well-structured
Security	⭐⭐⭐⭐	No major concerns, minor considerations documented
Performance	⭐⭐⭐⭐⭐	Minimal overhead, efficient implementation
Test Coverage	⭐⭐⭐	Good coverage, room for edge case tests
Documentation	⭐⭐⭐⭐⭐	Comprehensive and clear

Action Items (Priority Order)

High: Fix CLI argument parsing to validate flag values
Medium: Align empty array handling between TypeScript and Python
Medium: Add entity description validation/sanitization
Low: Add edge case tests
Low: Document entity + URL whitelist interaction

Conclusion

This is a solid feature addition that follows the project's conventions and maintains consistency across platforms. The implementation is secure and performant. With the minor improvements suggested above, this PR will be excellent.

Great work! 🎉

Review generated on 2025-10-16

homanp added 3 commits October 16, 2025 08:31

add support for setting entities in redaction

dc08433

update changelog

155bb69

update tests

36d64a8

homanp self-assigned this Oct 16, 2025

homanp changed the title ~~Feat/blacklist redaction option~~ feat: blacklist redaction option Oct 16, 2025

homanp merged commit 4f9c0d3 into main Oct 16, 2025
5 checks passed
