
Conversation


@homanp homanp commented Aug 29, 2025

Description

This pull request introduces support for sending telemetry data to an external webhook, enabling enhanced monitoring and analytics. It adds telemetry webhook configuration options for both YAML and multitenant setups, updates the Node.js and Rust proxy servers to send request telemetry asynchronously, and refines the configuration caching logic to include telemetry webhook settings. Additionally, vibekit.yaml is updated to support a broader range of OpenAI models.

Telemetry Webhook Integration

  • Added support for specifying a telemetry_webhook configuration in YAML files, including the webhook URL and custom headers, and updated the documentation with usage examples.
  • Implemented methods in both Node.js (config.js, server.js) and Rust (config.rs, server.rs) to retrieve the telemetry webhook configuration, supporting both single-tenant and multitenant modes.
  • Updated the caching logic for multitenant configurations to store and retrieve both model and telemetry webhook settings from Redis, ensuring webhook settings persist across requests.
  • Added asynchronous telemetry webhook sending for each request (including SSE), with robust error handling and non-blocking execution in both the Node.js and Rust proxy servers.
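The non-blocking send described above can be sketched in Node.js roughly as follows (a minimal sketch with illustrative names, not the PR's exact identifiers; the config shape `{ url, headers }` is assumed from the YAML documentation):

```javascript
// Fire-and-forget telemetry delivery (illustrative sketch).
// Returns true if a send was scheduled, false if telemetry is not configured.
function sendToTelemetryWebhook(webhookConfig, requestData) {
  if (!webhookConfig || !webhookConfig.url) return false; // telemetry is optional

  // setImmediate defers the send so the proxy response is never delayed.
  setImmediate(async () => {
    try {
      await fetch(webhookConfig.url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', ...(webhookConfig.headers || {}) },
        body: JSON.stringify(requestData),
      });
    } catch (err) {
      // Failures are logged and swallowed: telemetry must never break proxying.
      console.error('telemetry webhook failed:', err.message);
    }
  });
  return true;
}
```

The key property is that the caller never awaits the webhook, so a slow or unreachable endpoint cannot add latency to the proxied request.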

Model Configuration Updates

  • Expanded vibekit.yaml to include new OpenAI models, such as the GPT-5 series, o-Series reasoning models, and additional audio models (tts-1, tts-1-hd, whisper-1), improving support for a wider range of tasks and capabilities.
  • Updated example requests and documentation to reference the latest available models.

Codebase Maintenance

  • Removed the outdated rust/config.yaml file, as configuration is now managed elsewhere.
  • Ensured correct cloning and sharing of telemetry webhook configuration in Rust's ProxyServer for safe concurrent access.

Related Issue

Fixes #1021

Checklist

  • I tested my changes
  • I reviewed my own code

@homanp homanp self-assigned this Aug 29, 2025
@homanp homanp merged commit 45bfbac into main Aug 29, 2025
1 check passed

claude bot commented Aug 29, 2025

PR Review: Telemetry Webhook Integration

🎯 Overview

This PR introduces telemetry webhook functionality to send request analytics to external endpoints. The implementation covers both Node.js and Rust proxy servers with support for single-tenant (YAML) and multitenant configurations.

✅ Strengths

Code Quality & Architecture

  • Consistent Implementation: Both Node.js and Rust implementations follow similar patterns, making maintenance easier
  • Non-blocking Design: Telemetry requests are properly fire-and-forget, ensuring they don't impact proxy performance
  • Configuration Flexibility: Supports both YAML-based and API-driven multitenant configurations
  • Error Handling: Appropriate try-catch blocks and error logging for webhook failures

Performance Considerations

  • Async Processing: Both implementations use async patterns (setImmediate in Node.js, tokio::spawn in Rust)
  • Redis Caching: Enhanced cache structure to store complete config including telemetry settings
  • No Performance Impact: Webhook failures won't affect main proxy functionality

🔧 Areas for Improvement

1. Code Duplication & Structure

// Node.js: server.js:848-879
sendToTelemetryWebhook(requestData) {
  setImmediate(() => { ... });
}

// Rust: server.rs:254-296  
fn send_to_telemetry_webhook(&self, request_data: serde_json::Value) {
  tokio::spawn(async move { ... });
}

Suggestion: Consider extracting telemetry logic into separate modules (telemetry.js/telemetry.rs) to improve maintainability.
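A hypothetical shape for such an extracted telemetry.js module might look like this (a sketch; all names are illustrative, not from the PR). Separating payload construction from delivery lets both be unit-tested without a running proxy:

```javascript
// Build the telemetry payload from request/response metadata (illustrative fields).
function buildTelemetryPayload(req, status, durationMs) {
  return {
    method: req.method,
    path: req.path,
    status,
    duration_ms: durationMs,
    timestamp: new Date().toISOString(),
  };
}

// Create a fire-and-forget sender; the transport is injectable for testing.
function createTelemetrySender(webhookConfig, transport = fetch) {
  return (payload) => {
    if (!webhookConfig || !webhookConfig.url) return;
    setImmediate(() => {
      transport(webhookConfig.url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', ...(webhookConfig.headers || {}) },
        body: JSON.stringify(payload),
      }).catch((err) => console.error('telemetry webhook failed:', err.message));
    });
  };
}
```

Injecting the transport also removes the error-logging duplication flagged below, since only one code path ever talks to the network.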

2. Error Handling Inconsistency

  • Node.js logs webhook errors but swallows preparation errors silently
  • Rust uses eprintln! while Node.js uses console.error

Recommendation: Standardize error logging approach across both implementations.

3. Configuration Validation

The telemetry webhook URL isn't validated during configuration loading:

telemetry_webhook:
  url: "invalid-url"  # No validation

Suggestion: Add URL validation in both config managers.
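A minimal validation helper could run at config-load time (a sketch, assuming the config shape shown above; the function name is illustrative):

```javascript
// Validate the telemetry webhook URL when the config is loaded, so a typo
// fails fast instead of silently dropping every telemetry event at runtime.
function validateTelemetryWebhook(config) {
  if (!config) return null; // telemetry is optional
  let parsed;
  try {
    parsed = new URL(config.url); // throws TypeError on malformed input
  } catch {
    throw new Error(`telemetry_webhook.url is not a valid URL: ${config.url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`telemetry_webhook.url must be http(s): ${config.url}`);
  }
  return config;
}
```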

4. Memory Management

Large request bodies are truncated to 10,000 characters, but this happens after JSON serialization:

"body": request_body.chars().take(10000).collect::<String>(),

Consider truncating before serialization to save memory.
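In Node.js terms, the same idea is to cap the raw body string before it is embedded in the payload object, so the full string is never copied into the serialized output (a sketch; the 10,000-character limit matches the PR, the helper name is illustrative):

```javascript
const MAX_BODY_CHARS = 10000;

// Truncate the raw request body BEFORE building the telemetry payload,
// so JSON.stringify never sees (and never copies) the oversized string.
function truncatedBody(rawBody) {
  if (typeof rawBody !== 'string') return rawBody;
  return rawBody.length > MAX_BODY_CHARS ? rawBody.slice(0, MAX_BODY_CHARS) : rawBody;
}
```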

🔐 Security Review

✅ Good Practices

  • Headers are properly forwarded from configuration
  • No sensitive data logged in error messages
  • Fire-and-forget pattern prevents webhook timing attacks

⚠️ Security Considerations

  1. Data Exposure: Request bodies (potentially containing sensitive data) are sent to external webhooks
  2. Configuration Security: Webhook URLs and headers stored in cache without encryption
  3. No Rate Limiting: Webhook requests aren't rate limited

Recommendations:

  • Add configuration option to exclude request bodies from telemetry
  • Consider encrypting cached telemetry configurations
  • Implement webhook request rate limiting
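The rate-limiting recommendation could be as simple as a token bucket consulted before each send, dropping events when the bucket is empty rather than queueing them (a sketch under those assumptions; not part of the PR):

```javascript
// Simple token-bucket limiter for webhook sends (illustrative).
// Dropped events are cheaper than an unbounded queue of pending webhooks.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  // Returns true if a telemetry event may be sent now, false to drop it.
  tryTake() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```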

🧪 Test Coverage

Current Coverage

The existing test.sh covers basic proxy functionality but doesn't test telemetry features.

Missing Tests

  • Telemetry webhook sending (both success/failure scenarios)
  • Configuration caching with telemetry settings
  • Multitenant telemetry configuration
  • Error handling when webhook is unreachable

Suggestion: Add telemetry-specific test cases to test.sh or create a separate telemetry test suite.

🐛 Potential Issues

1. Race Condition in Multitenant Cache

// server.js:105-107
const cachedConfig = JSON.parse(cachedData);
allConfigs = cachedConfig.models;
this.telemetryWebhookConfig = cachedConfig.telemetry_webhook;

The telemetry config assignment could be overwritten by concurrent requests.
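One fix is to stop assigning pieces of the config to shared instance state and instead resolve the whole tenant config as a single value per request (a sketch of the idea; the cache is simplified to a Map and all names are illustrative):

```javascript
// Per-request config resolution instead of a shared mutable field.
const cache = new Map(); // stands in for the Redis-backed cache

function getTenantConfig(tenantId) {
  const cached = cache.get(tenantId);
  if (!cached) return null;
  // Models and telemetry settings are returned together as one value,
  // so a concurrent request for another tenant cannot overwrite half
  // of this request's view mid-flight.
  return {
    models: cached.models,
    telemetryWebhook: cached.telemetry_webhook ?? null,
  };
}
```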

2. Missing Null Checks

// Node.js server.js:143
this.telemetryWebhookConfig = configResponse.telemetry_webhook;

Should handle case where telemetry_webhook is null/undefined.
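A defensive extraction along these lines would cover the missing cases (a sketch; the helper name is illustrative):

```javascript
// Tolerate a missing or null telemetry_webhook field, and even a missing
// config response, normalizing everything to an explicit null.
function extractTelemetryWebhook(configResponse) {
  return configResponse?.telemetry_webhook ?? null;
}
```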

3. Configuration Cache Inconsistency

The Rust implementation validates the cached configuration structure more robustly than the Node.js version does; the two should be brought to parity.

📈 Performance Impact

Positive

  • Async implementation prevents blocking
  • Redis caching reduces API calls
  • Request truncation limits memory usage

Considerations

  • Additional memory for telemetry payload construction
  • Extra network requests (mitigated by async)
  • Redis memory usage increase for cached telemetry configs

🎯 Recommendations

High Priority

  1. Add telemetry webhook tests to prevent regressions
  2. Implement URL validation in configuration loading
  3. Fix race condition in multitenant cache handling
  4. Standardize error logging across implementations

Medium Priority

  1. Extract telemetry logic into separate modules
  2. Add rate limiting for webhook requests
  3. Consider data sensitivity configuration options

Documentation

The README updates clearly explain the telemetry webhook feature. Consider adding:

  • Example webhook payload structure
  • Error handling behavior documentation
  • Performance impact notes

🎉 Conclusion

This is a solid implementation that successfully adds telemetry capabilities without compromising proxy performance. The async design and configuration flexibility are well thought out. With the suggested improvements around testing, error handling, and security considerations, this will be a robust feature.

Overall Assessment: ✅ Approve with minor improvements

The core functionality is sound and the implementation follows good practices. The suggested improvements are mostly about robustness and maintainability rather than blocking issues.


Development

Successfully merging this pull request may close these issues.

[Feature]: Multi-Tenant Proxy with Dynamic User Configuration and Webhook Telemetry

2 participants