generate command #79

pelikhan · 2025-07-24T10:43:25Z

Implement PromptPex strategy to generate tests for prompts automatically.

🚀 Automated Prompt Test Generation: PromptPex Integration, Robust CLI, and Enhanced Utilities

This PR introduces advanced automated test generation for prompt files using the PromptPex methodology, empowering users to systematically validate and harden prompt engineering workflows.

Highlights:

🧪 PromptPex Test Generation Pipeline:
- Implements a new generate CLI command that orchestrates intent analysis, input specification, rule extraction, scenario generation, and test case creation for prompts, enabling automated, stepwise test generation.
🛠️ Extensive CLI Enhancements:
- Adds robust session management with context loading, merging, and saving for persistent state and resumable test generation.
- Supports customizable effort levels (low/medium/high) to control test generation depth and complexity.
- Integrates custom system instructions and advanced options for granular control.
🧰 Utility Functions & Helpers:
- Introduces utilities for response normalization, code block handling, and prompt string rendering.
- Adds SHA256 hashing for prompt versioning and integrity checks.
- Enhances output formatting for clear, styled CLI feedback and debugging.
🏗️ Improved Reliability & Error Handling:
- Implements retry logic with exponential backoff for LLM API calls, including spinner-based progress indication and clear error reporting.
- Expands .gitignore and CI tooling for better artifact management and workflow reliability.
🧑‍🔬 Comprehensive Testing:
- Adds unit tests for new utilities, CLI flag parsing, error scenarios, and command behaviors.
- Provides scaffolding for temporary prompt files and LLM client mocks, ensuring repeatable and robust test coverage.
📄 Documentation & Examples:
- Expands documentation for the generate command, PromptPex integration, and advanced usage.
- Adds example documents for custom instruction scenarios.
🔍 Debugging & Transparency:
- Adds HTTP request logging for the Azure client, aiding in request/response debugging.
- Refines prompt file management with improved YAML serialization and dedicated save methods.

These changes deliver a powerful, research-backed framework for automated prompt validation—streamlining prompt engineering, improving reliability, and making the CLI experience more transparent and user-friendly.
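The retry behavior listed under "Improved Reliability & Error Handling" can be pictured with a short Go sketch. This is not the PR's implementation: `retryWithBackoff`, `demo`, the attempt count, and the base delay are hypothetical names and values chosen for illustration, and the spinner is left out. It only shows the general shape of retrying a call while doubling the wait between attempts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff is a hypothetical helper (not taken from the PR). It calls
// fn up to maxAttempts times and doubles the wait after each failed attempt.
func retryWithBackoff(ctx context.Context, maxAttempts int, base time.Duration, fn func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := base
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break // no point sleeping after the final failure
		}
		select {
		case <-time.After(delay):
			delay *= 2 // exponential growth: base, 2*base, 4*base, ...
		case <-ctx.Done():
			return ctx.Err() // stop early if the caller cancels
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// demo simulates a call that fails the first `failures` times and reports
// how many calls were made in total.
func demo(failures int) (int, error) {
	calls := 0
	err := retryWithBackoff(context.Background(), 4, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("transient failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demo(2)
	fmt.Println("calls:", calls, "err:", err)

	calls, err = demo(10)
	fmt.Println("calls:", calls, "err:", err)
}
```

Passing a `context.Context` lets a cancelled command stop waiting between attempts instead of sleeping through the remaining delays.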

AI-generated content by prd may be incorrect.

- Implement tests for Float32Ptr to validate pointer creation for float32 values.
- Create tests for ExtractJSON to ensure correct extraction of JSON from various input formats.
- Add tests for cleanJavaScriptStringConcat to verify string concatenation handling in JavaScript context.
- Introduce tests for StringSliceContains to check for string presence in slices.
- Implement tests for MergeStringMaps to validate merging behavior of multiple string maps, including overwrites and handling of nil/empty maps.
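For readers unfamiliar with helpers of this kind, here is a minimal sketch of what three of them might look like. The function names come from the commit message above, but the signatures and the exact behavior (later maps overwriting earlier ones, nil maps being skipped) are assumptions and may differ from the code in the PR.

```go
package main

import "fmt"

// Float32Ptr returns a pointer to a copy of v (assumed signature).
func Float32Ptr(v float32) *float32 {
	return &v
}

// StringSliceContains reports whether target appears in items (assumed signature).
func StringSliceContains(items []string, target string) bool {
	for _, s := range items {
		if s == target {
			return true
		}
	}
	return false
}

// MergeStringMaps merges maps from left to right, so later maps overwrite
// earlier keys. Ranging over a nil map is a no-op in Go, so nil inputs are
// skipped naturally. The result is always a non-nil map (assumed behavior).
func MergeStringMaps(maps ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	temperature := Float32Ptr(0.7)
	fmt.Println(*temperature)

	fmt.Println(StringSliceContains([]string{"low", "medium", "high"}, "medium"))

	merged := MergeStringMaps(
		map[string]string{"a": "1", "b": "2"},
		nil,
		map[string]string{"b": "3"},
	)
	fmt.Println(merged)
}
```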

… tests in export_test.go
- Changed modelParams from pointer to value in toGitHubModelsPrompt function for better clarity and safety.
- Updated the assignment of ModelParameters to use the value directly instead of dereferencing a pointer.
- Introduced a new test suite in export_test.go to cover various scenarios for GitHub models evaluation generation, including edge cases and expected outputs.
- Ensured that the tests validate the correct creation of files and their contents based on the provided context and options.

- Added NewPromptPex function to create a new PromptPex instance.
- Implemented Run method to execute the PromptPex pipeline with context management.
- Created context from prompt files or loaded existing context from JSON.
- Developed pipeline steps including intent generation, input specification, output rules, and tests.
- Added functionality for generating groundtruth outputs and evaluating test results.
- Implemented test expansion and rating features for improved test coverage.
- Introduced error handling and logging throughout the pipeline execution.

- Implemented TestCreateContext to validate various prompt YAML configurations and their expected context outputs.
- Added TestCreateContextRunIDUniqueness to ensure unique RunIDs are generated for multiple context creations.
- Created TestCreateContextWithNonExistentFile to handle cases where the prompt file does not exist.
- Developed TestCreateContextPromptValidation to check for valid and invalid prompt formats.
- Introduced TestGithubModelsEvalsGenerate to test the generation of GitHub Models eval files with various scenarios.
- Added TestToGitHubModelsPrompt to validate the conversion of prompts to GitHub Models format.
- Implemented TestExtractTemplateVariables and TestExtractVariablesFromText to ensure correct extraction of template variables.
- Created TestGetMapKeys and TestGetTestScenario to validate utility functions related to maps and test scenarios.
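As an illustration of the template-variable extraction these tests exercise, the sketch below pulls placeholder names out of prompt text. It assumes a `{{name}}` placeholder syntax and returns each name once in order of first appearance. The function name `extractVariablesFromText` is borrowed from the test name above, but its signature and behavior here are guesses, not the PR's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholderPattern matches {{ name }} style placeholders. The syntax is an
// assumption made for this sketch.
var placeholderPattern = regexp.MustCompile(`\{\{([^{}]+)\}\}`)

// extractVariablesFromText returns the distinct placeholder names found in
// text, in order of first appearance (hypothetical signature).
func extractVariablesFromText(text string) []string {
	seen := make(map[string]bool)
	var names []string
	for _, match := range placeholderPattern.FindAllStringSubmatch(text, -1) {
		name := strings.TrimSpace(match[1])
		if name == "" || seen[name] {
			continue // skip empty placeholders and repeats
		}
		seen[name] = true
		names = append(names, name)
	}
	return names
}

func main() {
	prompt := "Tell a joke about {{topic}} for {{ audience }}, then rate the {{topic}} joke."
	fmt.Println(extractVariablesFromText(prompt))
}
```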

…tures
- Introduced constants for evaluator rules compliance in constants.go.
- Implemented GenerateRulesEvaluator function in evaluators.go for evaluating compliance with output rules.
- Updated GetDefaultOptions to include evaluation model in options.go.
- Modified pipeline to insert output rule evaluator into the prompt context.
- Refactored render functions to use new color constants.
- Added Eval field to PromptPexOptions in types.go for configuration.

Copilot

Pull Request Overview

This pull request implements the generate command to automatically generate test cases for prompts using the PromptPex methodology. The implementation adds comprehensive test generation capabilities that analyze prompts and create diverse test scenarios to evaluate prompt behavior across different edge cases.

- Adds new generate command with full PromptPex pipeline for automated test generation
- Implements HTTP request logging functionality for debugging API interactions
- Extends prompt file structure to support generated test data and evaluations

Reviewed Changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/prompt/prompt.go | Added SaveToFile method and TestDataItem type, updated YAML tags for omitempty |
| internal/azuremodels/client.go | Added HTTP logging context utilities for request debugging |
| internal/azuremodels/azure_client.go | Implemented HTTP request logging to specified log files |
| examples/test_generate.yml | Example generated prompt file with 40+ test cases and evaluator configuration |
| examples/custom_instructions_example.md | Documentation for custom instruction flags usage |
| cmd/run/run.go | Minor variable extraction refactor |
| cmd/root_test.go | Added test assertion for generate command in help output |
| cmd/root.go | Registered new generate command |
| cmd/generate/* | Complete generate command implementation with pipeline, parsing, utilities, and tests |
| README.md | Added comprehensive documentation for generate command and PromptPex methodology |
| Makefile | Added ci-lint, build, and clean targets |

cmd/generate/parser.go

cmd/generate/generate.go

cmd/generate/README.md

Copilot · 2025-07-25T09:54:16Z

cmd/generate/llm.go

+		defer sp.Stop()
+
+		resp, err := h.client.GetChatCompletionStream(ctx, req, h.org)
+		if err != nil {


The defer statement for sp.Stop() is placed inside a loop and will be executed when the function returns, not when the loop iteration ends. This could lead to multiple spinners running simultaneously. Consider calling sp.Stop() explicitly before continuing to the next iteration or restructuring the code.

Suggested change

-		if err != nil {
+		resp, err := h.client.GetChatCompletionStream(ctx, req, h.org)
+		if err != nil {
+			sp.Stop() // Ensure spinner is stopped before handling errors

Copilot · 2025-07-25T09:54:16Z

cmd/generate/llm.go

+		}
+		reader := resp.Reader
+		//nolint:gocritic,revive // TODO
+		defer reader.Close()


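Similar to the spinner issue, the defer statement for reader.Close() is inside a loop, so each reader stays open until the function returns instead of being closed at the end of its iteration. Consider closing the reader explicitly once the response has been read, or moving the loop body into a helper function so the defer runs once per iteration.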

Suggested change

defer reader.Close()
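Both review comments above describe the same Go behavior: a deferred call runs when the enclosing function returns, not when a loop iteration ends. The sketch below is a standalone illustration, not the code in cmd/generate/llm.go. `tracker`, `trackedReader`, `leakyLoop`, and `scopedLoop` are hypothetical names. It counts how many readers are open at once when the defer sits directly in the loop, compared with a version that wraps each iteration in its own function.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// tracker counts currently open readers and remembers the peak.
type tracker struct{ open, peak int }

// trackedReader stands in for a streaming response body.
type trackedReader struct {
	io.Reader
	t *tracker
}

func (r *trackedReader) Close() error {
	r.t.open--
	return nil
}

// acquire opens a new tracked reader over body.
func (t *tracker) acquire(body string) io.ReadCloser {
	t.open++
	if t.open > t.peak {
		t.peak = t.open
	}
	return &trackedReader{Reader: strings.NewReader(body), t: t}
}

// leakyLoop defers Close inside the loop, so every reader stays open until
// leakyLoop itself returns.
func leakyLoop(t *tracker, bodies []string) {
	for _, body := range bodies {
		r := t.acquire(body)
		defer r.Close() // runs only when leakyLoop returns
		_, _ = io.ReadAll(r)
	}
}

// scopedLoop wraps each iteration in a function literal, so the deferred
// Close runs at the end of that iteration.
func scopedLoop(t *tracker, bodies []string) {
	for _, body := range bodies {
		func() {
			r := t.acquire(body)
			defer r.Close() // runs when this function literal returns
			_, _ = io.ReadAll(r)
		}()
	}
}

// peakOpen runs loop over three bodies and reports the maximum number of
// readers that were open at the same time.
func peakOpen(loop func(*tracker, []string)) int {
	t := &tracker{}
	loop(t, []string{"first", "second", "third"})
	return t.peak
}

func main() {
	fmt.Println("defer in loop, peak open readers:", peakOpen(leakyLoop))
	fmt.Println("per-iteration scope, peak open readers:", peakOpen(scopedLoop))
}
```

The same reasoning applies to a spinner: stopping it explicitly, or scoping each attempt in its own function, keeps at most one active at a time.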

pelikhan added 30 commits July 21, 2025 13:41

plumbing for commands

871788d

bringing promptpex

9e82844

Add comprehensive Copilot instructions for AI coding agents

d8fcb9d

Enhance ApplyEffortConfiguration to handle nil options gracefully

3ea7a6e

Refactor PromptPexContext to use ChatMessage from azuremodels and remove unused ChatMessage type

ef7d089

Implement GitHub Models evaluation file generation and enhance PromptPex context conversion

96f9183

Fix dereferencing of Frontmatter fields in GitHub Models prompt generation

37b761c

clea content

ee90766

refactor: Remove obsolete export_test_new.go file

1c936c0

refactor: Remove obsolete output options and related tests from PromptPex configuration

292917a

feat: Add GenerateSummary function and corresponding tests for prompt summary generation

e9c6668

feat: Implement runPipeline function and refactor GenerateSummary for improved summary reporting

5c5a167

refactor: Rename parseTestsFromLLMResponse to ParseTestsFromLLMResponse and restore its implementation; remove obsolete promptpex.go and summary_test.go files

b4b662f

test: Add comprehensive tests for ParseTestsFromLLMResponse function covering various scenarios and error handling

393020f

feat: Implement generate command with comprehensive options and add sentiment analysis test prompt

6458590

refactor: Consolidate command-line flag definitions into AddCommandLineFlags function and update flag parsing to use consistent naming

cdc38f1

test: Add comprehensive tests for NewGenerateCommand and flag parsing in generate_test.go

bbdd748

test: Enhance TestGenerateCommandWithValidPromptFile with detailed mock responses for sentiment analysis stages

7dc3d7d

move test to common fodler

e812aec

feat: Update generate command description to include evaluations for prompts

341442f

fix: Clarify command description to specify the use of PromptPex methodology for test generation

da294e2

fix: Update build instructions to include 'make build' command

50b853f

refactor: Rename runPipeline to RunTestGenerationPipeline and add RenderMessagesToString for message formatting

5018380

Merge remote-tracking branch 'origin/main' into pelikhan/promptpex

9391f0d

refactor: Update test prompt from sentiment analysis to joke analysis

f3f320b

fix: Disable usage help for pipeline failures in generate command

7ab63bc

pelikhan added 23 commits July 24, 2025 20:31

Refactor groundtruth model handling and update command-line flag description for clarity; remove unused test functions

f57bf34

wire up ci-lint

eba1adc

Update command examples in NewGenerateCommand for consistency and accuracy

2d031ec

Refactor PromptPex model aliases and remove unused TestExpansion field; clean up pipeline comments for clarity

8b12281

Refactor EffortConfiguration and PromptPexOptions by removing TestGenerations field; update related tests for consistency

9eb7803

Refactor PromptPex model handling by changing pointer fields to values; update related tests and documentation for consistency

025e32e

Refactor PromptPexTest struct by changing pointer fields to values; update related parsing and test logic for consistency

7571825

Refactor PromptPexContext by changing RunID and PromptHash fields to values; update related tests for consistency and remove unused test_types.go file

0615664

Refactor PromptPexOptions and related logic by changing pointer fields to values; update ApplyEffortConfiguration and tests for consistency

178d921

Refactor test_generate.yml by nesting temperature under modelParameters for improved structure

59ca252

Fix JSON field names in PromptPexTest and test generation output for consistency

2831dd9

Add model key parsing in callModelWithRetry for improved error handling

a0cda99

Add IntentMaxTokens and InputSpecMaxTokens to PromptPexOptions; update GetDefaultOptions and pipeline logic for usage

d4a8976

Add support for custom instructions in generation phases; update flags, parsing, and tests

2018522

Add test generation feature using PromptPex methodology; include advanced options and customization instructions

e9adb0f

Enhance README.md with detailed explanation of Inverse Output Rules and update Test Generation section; add mermaid diagram for clarity

7defd59

Add Intent node to PromptPex mermaid diagram for clarity in output rules

29074f6

Refactor command-line flags and update test generation examples for clarity and consistency

c058406

Refactor test input handling in ParseTestsFromLLMResponse and update related tests for consistency

5bc1b87

Validate effort level in ParseFlags and add comprehensive tests for valid and invalid effort inputs

36fd696

Refactor effort configuration structure and update related logic for improved clarity and functionality; enhance test generation process with new rules and options

9c13267

Update Makefile to use correct path for Go linter; enhance error handling in generateGroundtruth function and remove obsolete prompt_hash_test file

4b18ed0

pelikhan requested a review from Copilot July 25, 2025 09:53

Copilot AI reviewed Jul 25, 2025

pelikhan and others added 5 commits July 25, 2025 10:12

add pull request description script

45d8915

Update cmd/generate/parser.go

8f7da6c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update cmd/generate/generate.go

376135e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update cmd/generate/README.md

1a6090e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Remove test data from test_generate.yml to streamline example usage

d21cd6c
