A concurrent resume processing system built with Go + BAML.
You'll need the following installed:
- Go 1.25+ - download from https://go.dev/dl/ or use `brew install go` (or a package manager for your distro)
- pnpm - for BAML CLI installation (see https://pnpm.io/installation for other options):

```bash
npm install -g pnpm
```
Ensure that pnpm is in your PATH; `pnpm setup` takes care of this:

```
Version 9.3.0
Usage: pnpm setup

Sets up pnpm

Options:
  -f, --force    Override the PNPM_HOME env variable in case it already exists

Visit https://pnpm.io/9.x/cli/setup for documentation about this command.
```
- Docker - For running the BAML server
The .env file is already configured with a valid API key which is set up for this assignment.
```bash
# Generate Go client from BAML definitions
./generate.sh
```

This creates `baml/baml_client/` with type-safe Go code and tests, provided you have all the dependencies installed.
```bash
# Run the simple example
cd baml && docker compose up -d --build  # rebuild the image whenever the BAML definitions change (and rerun ./generate.sh)
cd ..
go run *.go
```

You should see:
- a resume extracted successfully
- the concurrent example failing due to rate limits.
While the root main.go demonstrates the problem of uncontrolled concurrency, the fully working batch processor is provided in the cmd/resumeproc command. This program enforces the API’s 5‑requests‑per‑minute limit and retries each resume until success, without prompting for configuration at runtime. Configuration comes from environment variables with sensible defaults.
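Internally it follows a shared-limiter worker-pool pattern. The sketch below shows the shape of that pattern (simplified, with hypothetical names like `processOne`; it is not the literal cmd/resumeproc code):

```go
import (
	"context"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// processOne stands in for a single BAML extraction call (with retries).
func processOne(ctx context.Context, resume string) error { return nil }

func processAll(ctx context.Context, resumes []string, workers int) {
	// 5 requests per minute, with a burst of 5.
	limiter := rate.NewLimiter(rate.Every(time.Minute/5), 5)
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				// All workers share one limiter, so the pool as a
				// whole never exceeds the request rate. Cancellation
				// handling is simplified in this sketch.
				if err := limiter.Wait(ctx); err != nil {
					return
				}
				_ = processOne(ctx, r)
			}
		}()
	}
	for _, r := range resumes {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
}
```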
To run the robust solution:
```bash
# Make sure the BAML server is running
cd baml && docker compose up -d --build
cd ..

# Optional: override default settings via environment variables
export RATE_PER_MIN=5   # requests per minute (default: 5)
export BURST_SIZE=5     # burst capacity (default: 5)
export MAX_WORKERS=3    # number of concurrent workers (default: 3)

# Execute the batch processor
go run ./cmd/resumeproc
```

The program reads all 25 resumes from data.go, processes them concurrently, and writes one JSON record per resume to stdout. A summary line is printed to stderr when processing completes. You can adjust RATE_PER_MIN, BURST_SIZE, and MAX_WORKERS via environment variables to tune throughput while always respecting the server's rate limits.
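Reading those variables with defaults is straightforward; a minimal sketch (the `envInt` helper is illustrative, not necessarily the exact code in cmd/resumeproc):

```go
import (
	"os"
	"strconv"
)

// envInt returns the value of the named environment variable as an int,
// falling back to def when the variable is unset or malformed.
func envInt(name string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(name)); err == nil {
		return v
	}
	return def
}

var (
	ratePerMin = envInt("RATE_PER_MIN", 5)
	burstSize  = envInt("BURST_SIZE", 5)
	maxWorkers = envInt("MAX_WORKERS", 3)
)
```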
```
.
├── baml/
│   ├── baml_src/                 # BAML definitions
│   │   ├── clients.baml          # LLM clients
│   │   ├── generators.baml       # Code generation config
│   │   └── resume.baml           # Resume extraction functions
│   ├── baml_client/              # Generated code (don't edit)
│   ├── baml_client_util.go       # Helper utilities
│   ├── baml_generate.go          # Code generation logic
│   ├── client.go                 # BAML client wrapper
│   ├── docker-compose.yml        # Docker Compose for BAML server
│   ├── Dockerfile.baml           # BAML server Docker image
│   └── oapi-codegen-config.yaml  # OpenAPI codegen config
├── cmd/
│   └── resumeproc/               # Robust batch processor (see above)
├── tools/                        # Go tools dependencies
├── main.go                       # Example code with single & concurrent extraction
├── data.go                       # Sample resume data (25 resumes with varied lengths)
├── generate.sh                   # BAML generation script
├── .env.template                 # Environment template
├── go.mod                        # Go module
└── README.md                     # This file
```
BAML (Boundary ML) is a DSL for building LLM applications with structured outputs. We use it extensively in all of our products.
```baml
class Resume {
  name string
  skills string[]
  education Education[]
}

function ExtractResume(raw_text: string) -> Resume {
  client GPT4o
  prompt #"
    Parse this resume: {{ raw_text }}

    {{ ctx.output_format }}
  "#
}
```
BAML is compiled to a Go client using `baml-cli generate`, which creates the baml/baml_client/ directory together with an OpenAPI spec.

- Running `go generate ./...` in the baml/ directory generates the Go client code.
- Running `./generate.sh` in the root directory regenerates the Go client code.
- Running `docker compose up -d --build` in the baml/ directory rebuilds the BAML server image and starts the server.
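The `go generate` hook is the standard directive-comment mechanism: a comment in a Go file invokes the CLI. Illustratively (the exact command and flags in this repo may differ):

```go
// In baml_generate.go (illustrative - exact flags may differ):
//go:generate baml-cli generate --from baml_src
```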
```go
bamlClient, err := baml.NewSyncClient("http://localhost:2024", 5*time.Minute)
if err != nil {
	log.Fatal(err) // fail fast if the BAML server is unreachable
}
resume, err := baml.Sync[baml.Resume](bamlClient.ExtractResume(context.Background(), baml.ExtractResumeJSONRequestBody{
	RawText: resumeText,
}))
if err != nil {
	log.Fatal(err)
}
// resume.Name, resume.Skills, etc. are now available!
```

Build a robust concurrent batch processing system that can handle multiple resume extractions efficiently while respecting API rate limits.
The provided API key is intentionally rate-limited to simulate real-world conditions:
- 5 requests per minute maximum
- 6,000 tokens per minute maximum
The key may also be used by other applicants at the same time.
These limits are intentionally tight to surface the rate-limiting issues you'd encounter in production systems.
On average, a single resume extraction uses approximately 1,100-1,200 tokens:
- ~300 tokens for input (the resume text)
- ~800 tokens for output (the structured data)
You can observe actual token usage in the BAML server logs when running docker compose up. The logs show detailed information about each request including token counts.
Token Counting: Accurately counting tokens in Go is notoriously difficult and requires model-specific tokenizer libraries. For this assignment, approximation is perfectly acceptable. Character count and word count correlate linearly with token count - a reasonable approximation is ~4 characters per token or ~0.75 tokens per word. You can also use the average values above and observe patterns in the BAML logs.
Token counting is not a requirement for the solution of this assignment.
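If you choose to approximate anyway, the heuristic above translates to a one-liner (a sketch using the ~4-characters-per-token rule of thumb, not a real tokenizer):

```go
// estimateTokens approximates token usage from text length using the
// rough ~4 characters per token heuristic. Good enough for budgeting.
func estimateTokens(text string) int {
	return len(text)/4 + 1 // +1 keeps the estimate strictly positive
}
```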
The main.go file includes two examples:

- `extractSingleResume()` - works fine, makes one request
- `extractMultipleResumesConcurrently()` - launches all goroutines simultaneously and will fail due to rate limits (sketched below)
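The failing pattern is worth seeing once. Roughly (a simplified sketch, not the literal main.go code; `extract` is a placeholder for the BAML call):

```go
var wg sync.WaitGroup
for _, r := range resumes {
	wg.Add(1)
	go func(text string) {
		defer wg.Done()
		// Every goroutine fires at once: 25 simultaneous requests
		// against a 5-requests-per-minute limit, so most get rejected.
		if _, err := extract(text); err != nil {
			log.Printf("extraction failed: %v", err)
		}
	}(r)
}
wg.Wait()
```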
The data.go file contains 25 sample resumes with varying lengths for testing:
- Some resumes are very short (~200-400 tokens total)
- Some are medium length (~800-1,000 tokens)
- Some are longer (~1,200-1,500 tokens)
This variation is intentional - it mirrors real-world data and makes the optimization problem more interesting. You'll need to consider token usage per resume, not just request counting.
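One way to do that (a sketch, reusing the hypothetical `estimateTokens` helper above): run a second token bucket denominated in tokens rather than requests, and reserve each resume's estimated cost before sending it.

```go
import (
	"context"

	"golang.org/x/time/rate"
)

// 6,000 tokens per minute; burst equal to the full per-minute budget so
// even the longest resume fits in a single reservation.
var tokenLimiter = rate.NewLimiter(rate.Limit(6000.0/60), 6000)

// waitForBudget blocks until the resume's estimated token cost is
// available. Combine with the request limiter so BOTH limits hold.
func waitForBudget(ctx context.Context, resume string) error {
	return tokenLimiter.WaitN(ctx, estimateTokens(resume))
}
```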
Note: Due to occasional burst tolerance in rate limiting, you may not see errors on every run. If the concurrent example succeeds, simply run it again to observe the rate limit failures.
Your task is NOT to simply avoid rate limits by processing slowly. Instead, you must maximize execution speed while intelligently managing the constraints.
Think of this as a real production system processing thousands of resumes where speed matters.
Build a system that:

- Processes all 25 resumes as fast as possible while respecting BOTH rate limits:
  - 5 requests per minute maximum
  - 6,000 tokens per minute maximum
- Handles failures gracefully with appropriate retry logic (a sketch follows this list)
- Uses proper Go concurrency patterns to maximize throughput
- Collects all results in a thread-safe manner
- Accounts for varying resume lengths when optimizing throughput
- Recovers from external failures beyond rate limits:
  - Stochastic LLM behavior may produce malformed JSON (even though BAML handles most cases)
  - Network failures, timeouts, or API unavailability
  - Your system should be resilient to these real-world issues
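For the retry requirement, a minimal sketch of exponential backoff with jitter (one reasonable shape, not the only acceptable approach):

```go
import (
	"context"
	"math/rand"
	"time"
)

// withRetry retries fn with exponential backoff and full jitter. It
// treats every error as retryable for simplicity; a real implementation
// would distinguish rate-limit errors (back off) from permanent ones
// (give up immediately).
func withRetry(ctx context.Context, attempts int, fn func() error) error {
	backoff := time.Second
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Sleep a random duration up to the current backoff, or bail
		// out early if the context is cancelled.
		select {
		case <-time.After(time.Duration(rand.Int63n(int64(backoff)))):
		case <-ctx.Done():
			return ctx.Err()
		}
		backoff *= 2
	}
	return err
}
```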
- DO NOT just add delays between requests to slow things down
- DO NOT use AI/LLM tools (ChatGPT, Claude, Copilot, etc.) to solve this assignment
- You may use standard Go libraries and search Go documentation
- Focus on demonstrating your understanding of concurrent systems design and general software engineering best practices
Functional Requirements:
- All 25 resumes successfully processed
- Execution time is minimized (not just "it works eventually")
- Proper error handling and resource management
Code Quality & Software Engineering:
- Separation of Concerns: Clear boundaries between rate limiting, retry logic, and job processing
- Clean Abstractions: Well-defined interfaces and types that make the system understandable (illustrated after this list)
- Configurability: Easy to adjust parameters (rate limits, retry counts, worker pool sizes, etc.)
- Documentation: Clear comments explaining design decisions and non-obvious logic
- Idiomatic Go: Proper use of Go patterns, naming conventions, and standard library
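As an illustration of the abstraction point (hypothetical names, not a required design):

```go
// Extractor is the single seam between orchestration and the LLM call.
type Extractor interface {
	Extract(ctx context.Context, rawText string) (baml.Resume, error)
}
```

Rate limiting and retries can then be implemented as decorators wrapping an Extractor, keeping each concern testable in isolation.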
We strongly encourage you to experiment with multiple implementations:
- Start with a naive solution - Get something working first, even if slow
- Build a more advanced version - Optimize for throughput and efficiency
- Push further (optional) - Demonstrate sophisticated patterns and edge case handling
Time your solutions! Use Go's time package to measure execution time for each approach (see the snippet after this list). This helps you:
- Compare different strategies objectively
- Identify bottlenecks
- Demonstrate your optimization thinking
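Measuring is a two-liner with the standard library (`runBatch` is a placeholder for whichever implementation you're timing):

```go
start := time.Now()
results := runBatch(ctx, resumes)
log.Printf("processed %d resumes in %s", len(results), time.Since(start))
```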
You can show multiple implementations to illustrate your thought process and understanding of trade-offs.
The BAML client handles LLM calls and parsing - you focus on building an optimized, production-ready concurrent processor.
"BAML CLI not found": Run pnpm add -g @boundaryml/baml
"Module not found": Run go mod download
"API errors": Check your .env file has a valid API key
"Generated code doesn't match": Run ./generate.sh again
We're excited to see your solution!