This document describes the current implementation of a simple mock LLM server for end-to-end tests. It provides basic request/response mocking for the OpenAI and Anthropic APIs using their official SDK types.
- Support OpenAI Chat Completions API and Anthropic Messages API request/response schemas.
- Simple configuration using Go structs with official SDK types.
- Deterministic responses for testing without network calls.
- Minimal setup for basic testing scenarios.
- ✅ Basic OpenAI Chat Completions API support (non-streaming)
- ✅ Basic Anthropic Messages API support (non-streaming)
- ✅ Simple exact and contains matching
- ✅ In-memory configuration using Go structs
- ✅ Tool/function calls
- ✅ JSON configuration files
- ❌ Streaming responses (not implemented)
- ❌ Complex scenario engine (not implemented)
The current implementation uses a simplified architecture:
- Server: HTTP server with Gorilla mux router that handles provider-specific endpoints
- Provider Handlers: Separate handlers for OpenAI and Anthropic that process requests and return mocked responses
- Simple Matching: Basic matching logic that compares incoming requests against predefined mocks
- Direct SDK Integration: Uses official OpenAI and Anthropic SDK types directly
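A sketch of how that routing could be wired with gorilla/mux (the `Server` method names `handleOpenAI` and `handleAnthropic` are hypothetical):

```go
package mockllm

import (
	"net/http"

	"github.com/gorilla/mux"
)

// newRouter wires each provider-specific endpoint to its handler.
func (s *Server) newRouter() *mux.Router {
	r := mux.NewRouter()
	r.HandleFunc("/v1/chat/completions", s.handleOpenAI).Methods(http.MethodPost)
	r.HandleFunc("/v1/messages", s.handleAnthropic).Methods(http.MethodPost)
	return r
}
```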
The current implementation uses these core types:
- `Config`: Root configuration containing arrays of OpenAI and Anthropic mocks
- `OpenAIMock`: Maps OpenAI requests to responses using official SDK types
- `AnthropicMock`: Maps Anthropic requests to responses using official SDK types
- `MatchType`: Enum for matching strategies (`exact`, `contains`)
- `OpenAIRequestMatch`: Defines how to match OpenAI requests (match type + message)
- `AnthropicRequestMatch`: Defines how to match Anthropic requests (match type + message)
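A sketch of how these types could line up, based on the descriptions above and the keys used in the JSON example later in this document (the struct tags are therefore assumptions):

```go
package mockllm

import (
	"github.com/anthropics/anthropic-sdk-go"
	"github.com/openai/openai-go"
)

// MatchType selects the matching strategy for a mock.
type MatchType string

const (
	MatchTypeExact    MatchType = "exact"
	MatchTypeContains MatchType = "contains"
)

// OpenAIRequestMatch matches against the last message of an OpenAI request.
type OpenAIRequestMatch struct {
	MatchType MatchType                              `json:"match_type"`
	Message   openai.ChatCompletionMessageParamUnion `json:"message"`
}

// OpenAIMock pairs a request matcher with a canned SDK response.
type OpenAIMock struct {
	Name     string                `json:"name"`
	Match    OpenAIRequestMatch    `json:"match"`
	Response openai.ChatCompletion `json:"response"`
}

// AnthropicRequestMatch matches against the last message of an Anthropic request.
type AnthropicRequestMatch struct {
	MatchType MatchType              `json:"match_type"`
	Message   anthropic.MessageParam `json:"message"`
}

// AnthropicMock pairs a request matcher with a canned SDK response.
type AnthropicMock struct {
	Name     string                `json:"name"`
	Match    AnthropicRequestMatch `json:"match"`
	Response anthropic.Message     `json:"response"`
}

// Config is the root configuration holding all mocks.
type Config struct {
	OpenAI    []OpenAIMock    `json:"openai"`
	Anthropic []AnthropicMock `json:"anthropic"`
}
```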
OpenAI Chat Completions:
- Endpoint: `POST /v1/chat/completions`
- Auth: `Authorization: Bearer <token>` (presence check only)
- Request Type: `openai.ChatCompletionNewParams`
- Response Type: `openai.ChatCompletion`
- Matching: Exact or contains matching on the last message in the conversation
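A sketch of the request handling on this endpoint, assuming the SDK's param type unmarshals directly from the request body (function names are hypothetical):

```go
import (
	"encoding/json"
	"net/http"

	"github.com/openai/openai-go"
)

// decodeOpenAIRequest parses the body into the official SDK param type.
func decodeOpenAIRequest(r *http.Request) (openai.ChatCompletionNewParams, error) {
	var params openai.ChatCompletionNewParams
	err := json.NewDecoder(r.Body).Decode(&params)
	return params, err
}

// lastMessage returns the final message of the conversation, which is
// the only message the matcher inspects.
func lastMessage(params openai.ChatCompletionNewParams) (openai.ChatCompletionMessageParamUnion, bool) {
	if len(params.Messages) == 0 {
		var zero openai.ChatCompletionMessageParamUnion
		return zero, false
	}
	return params.Messages[len(params.Messages)-1], true
}
```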
Anthropic Messages:
- Endpoint: `POST /v1/messages`
- Auth: `x-api-key` (presence check only)
- Headers: `anthropic-version` required
- Request Type: `anthropic.MessageNewParams`
- Response Type: `anthropic.Message`
- Matching: Exact matching on the last message in the conversation (contains matching is not implemented)
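On both endpoints, authentication amounts to a presence check; a minimal sketch (helper names are hypothetical):

```go
import (
	"net/http"
	"strings"
)

// hasOpenAIAuth checks only that a Bearer token is present;
// the token value itself is never validated.
func hasOpenAIAuth(r *http.Request) bool {
	return strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ")
}

// hasAnthropicAuth checks that x-api-key is present and that the
// required anthropic-version header is set.
func hasAnthropicAuth(r *http.Request) bool {
	return r.Header.Get("x-api-key") != "" &&
		r.Header.Get("anthropic-version") != ""
}
```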
Mocks are configured in memory with Go structs:

```go
config := mockllm.Config{
	OpenAI: []mockllm.OpenAIMock{
		{
			Name: "simple-response",
			Match: mockllm.OpenAIRequestMatch{
				MatchType: mockllm.MatchTypeExact,
				Message:   /* openai.ChatCompletionMessageParamUnion */,
			},
			Response: /* openai.ChatCompletion */,
		},
	},
	Anthropic: []mockllm.AnthropicMock{
		{
			Name: "simple-response",
			Match: mockllm.AnthropicRequestMatch{
				MatchType: mockllm.MatchTypeExact,
				Message:   /* anthropic.MessageParam */,
			},
			Response: /* anthropic.Message */,
		},
	},
}
```
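The placeholders can be filled with the SDKs' message constructors; for example (the message text is illustrative):

```go
// Illustrative match values built with the official SDK helpers.
openaiMatch := mockllm.OpenAIRequestMatch{
	MatchType: mockllm.MatchTypeExact,
	Message:   openai.UserMessage("List all nodes in the cluster"),
}

anthropicMatch := mockllm.AnthropicRequestMatch{
	MatchType: mockllm.MatchTypeExact,
	Message:   anthropic.NewUserMessage(anthropic.NewTextBlock("List all nodes in the cluster")),
}
```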
The same mocks can be expressed in a JSON configuration file:

```json
{
  "openai": [
    {
      "name": "initial_request",
      "match": {
        "match_type": "exact",
        "message": {
          "content": "List all nodes in the cluster",
          "role": "user"
        }
      },
      "response": {
        "id": "chatcmpl-1",
        "object": "chat.completion",
        "created": 1677652288,
        "model": "gpt-4.1-mini",
        "choices": [
          {
            "index": 0,
            "message": {
              "role": "assistant",
              "content": "",
              "tool_calls": [
                ...
              ]
            },
            "finish_reason": "tool_calls"
          }
        ]
      }
    },
    {
      "name": "k8s_get_resources_response",
      "match": {
        "match_type": "contains",
        "message": {
          "content": "kagent-control-plane",
          "role": "tool",
          "tool_call_id": "call_1"
        }
      },
      "response": {
        "id": "chatcmpl-2",
        "object": "chat.completion",
        "created": 1677652288,
        "model": "gpt-4.1-mini",
        "choices": [
          ...
        ]
      }
    }
  ]
}
```
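Loading such a file reduces to standard JSON unmarshalling into `Config`; a sketch (the `LoadConfig` name is hypothetical):

```go
import (
	"encoding/json"
	"fmt"
	"os"
)

// LoadConfig reads a JSON configuration file into a Config.
func LoadConfig(path string) (Config, error) {
	var cfg Config
	data, err := os.ReadFile(path)
	if err != nil {
		return cfg, fmt.Errorf("read config: %w", err)
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse config %s: %w", path, err)
	}
	return cfg, nil
}
```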
Matching is a simple linear search through the mocks:

1. Parse the incoming request into the appropriate SDK type.
2. Iterate through the provider-specific mocks in order.
3. For each mock, check whether the match criteria are met:
   - Exact: JSON comparison of the last message
   - Contains: string-contains check on the message content (OpenAI only)
4. Return the response from the first matching mock.
5. Return 404 if no mock matches.
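A sketch of that search for the OpenAI side (`messageText`, which would extract a message's plain-text content, is a hypothetical helper):

```go
import (
	"bytes"
	"encoding/json"
	"strings"

	"github.com/openai/openai-go"
)

// findOpenAIMock returns the first mock whose criteria match the last
// message of the incoming conversation.
func findOpenAIMock(mocks []OpenAIMock, last openai.ChatCompletionMessageParamUnion) (*OpenAIMock, bool) {
	lastJSON, _ := json.Marshal(last)
	for i := range mocks {
		m := &mocks[i]
		switch m.Match.MatchType {
		case MatchTypeExact:
			// Exact: compare the JSON encodings of the two messages.
			wantJSON, _ := json.Marshal(m.Match.Message)
			if bytes.Equal(wantJSON, lastJSON) {
				return m, true
			}
		case MatchTypeContains:
			// Contains: substring check on the plain-text content.
			if strings.Contains(messageText(last), messageText(m.Match.Message)) {
				return m, true
			}
		}
	}
	return nil, false // the handler turns this into a 404
}
```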
- All responses are non-streaming JSON
- Uses official SDK response types directly
- No transformation or adaptation layer
- Standard HTTP headers (`Content-Type: application/json`)
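Writing a matched response is then a direct JSON encode of the stored SDK value; a sketch:

```go
import (
	"encoding/json"
	"net/http"
)

// writeJSON serializes the mock's stored SDK response verbatim.
func writeJSON(w http.ResponseWriter, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(v)
}
```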
The current implementation consists of:
- `server.go`: HTTP server setup, routing, and lifecycle management
- `types.go`: Core configuration types using official SDK types
- `openai.go`: OpenAI provider handler and matching logic
- `anthropic.go`: Anthropic provider handler and matching logic
- `server_test.go`: Basic integration tests
Typical usage in a test:

```go
config := mockllm.Config{ /* mocks */ }
server := mockllm.NewServer(config)
baseURL, err := server.Start() // starts on a random port
if err != nil {
	t.Fatalf("start mock server: %v", err)
}
defer server.Stop()
// Use baseURL for API calls in tests.
```
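The official clients can then be pointed at the mock. For example, with the OpenAI Go SDK (a sketch; whether the `/v1` prefix belongs in the base URL depends on the SDK version's path handling):

```go
import "github.com/openai/openai-go/option"

// Any API key passes, since the server only checks for its presence.
client := openai.NewClient(
	option.WithBaseURL(baseURL + "/v1"),
	option.WithAPIKey("test-key"),
)
```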
Dependencies:
- OpenAI Go SDK: `github.com/openai/openai-go`
- Anthropic Go SDK: `github.com/anthropics/anthropic-sdk-go`
- HTTP Router: `github.com/gorilla/mux`
- No Streaming: Only supports non-streaming responses
- Simple Matching: Only last message matching, no complex predicates
- No Multi-turn: No stateful conversation tracking
- Limited Error Handling: Basic error responses only
- No Latency Simulation: No timing controls
The original design document outlined more sophisticated features that could be added:
- Streaming response support
- Complex matching predicates
- Error injection and latency simulation