Description
This issue consolidates and refines goals from previous discussions (inspired by #106 & #107). It outlines a comprehensive Proof-of-Concept (POC) demonstrating the Evolving Agents Toolkit's (EAT) ability to manage a complete agent lifecycle. The demo will showcase agent evolution through two primary mechanisms:
- Capability-Driven Model Adaptation & Prompt/Description Enhancement: Orchestrated by `SystemAgent` to change the agent's underlying LLM (GPT-4.1 to GPT-4.1-nano) and to improve the agent's prompt and its tools' descriptions for better clarity, discovery, and LLM interaction, based on natural language requirements.
- Reinforcement Fine-Tuning (RFT): Leveraging OpenAI's RFT capabilities (conceptually applied to GPT-4.1-nano) to refine the agent's policy for specific criteria, such as output format adherence or content accuracy.
The demonstration will focus on an agent named "Product Description Manager."
Motivation / Problem:
EAT aims for autonomous and data-driven agent evolution. This requires capabilities to:
- Adapt agents to new LLMs or evolving functional requirements based on high-level goals.
- Refine agent prompts and tool descriptions to optimize LLM interactions and improve discoverability.
- Continuously improve agent performance and policy based on experience and targeted feedback signals, leveraging modern fine-tuning techniques like RFT.
This unified demo will showcase a practical, end-to-end self-improvement loop within the EAT framework.
Proposed Solution & Evolution Stages:
The demo will evolve a "Product Description Manager" agent through the following stages:
Stage 1: Initial Conversational Agent (GPT-4.1 based)
- Goal: Create a foundational agent for managing product descriptions.
- Capability: Basic CRUD-like operations (e.g., Create, Retrieve, Update) for product descriptions.
- Interaction Style: Conversational ReAct (requiring step-by-step guidance via prompts to `SystemAgent` or the agent directly).
- LLM: GPT-4.1.
- EAT Orchestration:
  - `SystemAgent` receives a natural language requirement (e.g., "Create a 'Product Description Manager' agent using GPT-4.1, capable of conversational interaction, to handle product descriptions.").
  - `SystemAgent` uses its internal tools (`CreateComponentTool`) to instantiate this agent and its necessary tools (e.g., `RetrieveProductTool`, `UpdateProductTool`, `SaveProductTool`).
- Entity Store: A simple `examples/evolution_workflow/entity_store.py` will simulate product data persistence (a minimal sketch follows this list).
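To make Stage 1 concrete, here is a minimal sketch of what the simulated entity store could look like. The module path matches the one referenced above, but the class and method names (`EntityStore`, `create`, `retrieve`, `update`) are illustrative assumptions, not an existing EAT API.

```python
# examples/evolution_workflow/entity_store.py (illustrative sketch, not the actual implementation)
import json
from pathlib import Path
from typing import Optional


class EntityStore:
    """Simulates product-description persistence with a simple JSON file."""

    def __init__(self, path: str = "product_store.json"):
        self._path = Path(path)
        self._records: dict[str, dict] = (
            json.loads(self._path.read_text()) if self._path.exists() else {}
        )

    def create(self, product_id: str, data: dict) -> dict:
        self._records[product_id] = data
        self._flush()
        return data

    def retrieve(self, product_id: str) -> Optional[dict]:
        return self._records.get(product_id)

    def update(self, product_id: str, changes: dict) -> Optional[dict]:
        record = self._records.get(product_id)
        if record is None:
            return None
        record.update(changes)
        self._flush()
        return record

    def _flush(self) -> None:
        self._path.write_text(json.dumps(self._records, indent=2))
```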
Stage 2: Model Optimization & Prompt/Description Enhancement (GPT-4.1-nano)
- Goal: Adapt the agent to a more efficient model (GPT-4.1-nano) and improve its self-description (meta-prompt) and its tools' descriptions for better usability, LLM guidance, and discoverability.
- EAT Orchestration:
  - `SystemAgent` receives a new goal (e.g., "Evolve the 'Product Description Manager' to use GPT-4.1-nano. Also, refine its main description and the descriptions of its 'RetrieveProductTool' and 'UpdateProductTool' to be more precise, focusing on JSON output and key product attributes.").
  - `SystemAgent` uses `EvolveComponentTool` to change the base model reference in the agent's configuration/code.
  - `SystemAgent` uses `EvolveComponentTool` (leveraging its LLM) to rewrite the agent's `description` in its `AgentMeta` and the descriptions of its associated tools based on the refinement instructions (an illustrative invocation sketch follows this stage's description).
- Interaction Style: Remains Conversational ReAct.
- LLM: GPT-4.1-nano.
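For illustration only, the Stage 2 hand-off could look roughly like the snippet below. The `system_agent.run()` entry point and the surrounding setup are assumptions made for the sake of the example; EAT's actual `SystemAgent` construction and invocation may differ.

```python
# Illustrative only: the exact SystemAgent entry point in EAT may differ.
import asyncio

STAGE_2_GOAL = (
    "Evolve the 'Product Description Manager' to use GPT-4.1-nano. "
    "Also, refine its main description and the descriptions of its "
    "'RetrieveProductTool' and 'UpdateProductTool' to be more precise, "
    "focusing on JSON output and key product attributes."
)


async def run_stage_2(system_agent) -> str:
    # SystemAgent is expected to select EvolveComponentTool on its own,
    # swap the base model reference, and rewrite the descriptions.
    return await system_agent.run(STAGE_2_GOAL)


# asyncio.run(run_stage_2(system_agent))  # with a previously initialized SystemAgent
```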
Stage 3: Reinforcement Fine-Tuning for Policy Improvement (GPT-4.1-nano RFT)
- Goal: Improve the "Product Description Manager" agent's policy for specific criteria, such as strict adherence to a JSON output schema for product descriptions, completeness of required fields, and conciseness.
- LLM: The target for RFT is GPT-4.1-nano.
- RFT Process (Orchestrated by Demo Script using OpenAI API):
- Define a Grader:
  - Implement a grader configuration (e.g., a multi-grader; a hedged configuration sketch follows this stage's description).
  - Example RFT Goal: The RFT-enhanced agent must output product information as a JSON object strictly adhering to the schema: `{"product_id": "string", "name": "string", "price": "float", "features": ["string"], "category": "string", "short_summary": "string (max 50 words)"}`. It must always include `product_id`, `name`, and `price`.
  - Example sub-graders:
    - A Python grader (or `json_schema` type, if directly supported by OpenAI RFT grader options) to check JSON schema validity and the presence/type of required fields (`product_id`, `name`, `price`).
    - A `string_check` grader to ensure `category` is one of the predefined valid categories.
    - A `score_model` grader (using another LLM like `gpt-4o-mini`) to evaluate the `short_summary` for conciseness (e.g., < 50 words) and relevance to the input features.
- Prepare Dataset (JSONL):
  - Create `training_set.jsonl` and `validation_set.jsonl` (small, illustrative for POC).
  - Each line contains:
    - `messages`: User prompts (e.g., "Create a description for a product with ID 'XYZ', name 'Quantum Widget', price 99.99, features ['self-calibrating', 'eco-friendly'], category 'Electronics'.").
    - Reference fields for grading (e.g., `expected_json_schema: <schema_dict>`, `expected_category_values: ["Electronics", "Software"]`, `max_summary_length: 50`).
- Upload Files: Upload datasets to OpenAI.
- Create Fine-Tune Job (RFT):
  - Use the OpenAI API to start an RFT job with the GPT-4.1-nano base model, the training/validation file IDs, the grader configuration, and the target JSON schema for structured outputs.
- Monitor & Evaluate: (Briefly) Monitor the job via API. For POC, we might not wait for full completion but show job submission.
- Deploy/Use Fine-Tuned Model: Obtain the fine-tuned model ID.
- EAT Integration:
  - The fine-tuned model ID from RFT will be stored in `SmartLibrary` as a new agent version (e.g., "Product Description Manager v1.2-RFT") or as metadata for the existing GPT-4.1-nano version.
  - `SystemAgent` (or direct calls in the demo) will then use this RFT-enhanced agent, demonstrating improved output quality.
- Interaction Style: The fine-tuned agent (GPT-4.1-nano based) will still operate conversationally (ReAct), but its policy (choices, generation quality, and adherence to the RFT goal like strict JSON output) will be demonstrably improved.
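The grader described above could be expressed along the following lines. This is a hedged sketch based on OpenAI's documented grader types (`multi`, `python`, `string_check`, `score_model`); the exact field names, template variables (e.g., `{{sample.output_text}}`, `{{item.expected_category}}`), and the structure passed to the Python grader should be verified against the RFT grader documentation current at implementation time.

```python
# Illustrative multi-grader configuration for the RFT stage.
# Field names follow OpenAI's documented grader types but should be
# double-checked against the fine-tuning grader docs before use.

PYTHON_GRADER_SOURCE = '''
import json

REQUIRED = {"product_id": str, "name": str, "price": float}

def grade(sample, item) -> float:
    """Return 1.0 if the model output is valid JSON with the required typed fields."""
    # Assumes the sample payload exposes the raw completion as "output_text".
    try:
        data = json.loads(sample["output_text"])
    except (ValueError, TypeError):
        return 0.0
    for field, expected_type in REQUIRED.items():
        if not isinstance(data.get(field), expected_type):
            return 0.0
    return 1.0
'''

GRADER_CONFIG = {
    "type": "multi",
    "graders": {
        "schema_check": {
            "type": "python",
            "name": "schema_check",
            "source": PYTHON_GRADER_SOURCE,
        },
        "category_check": {
            "type": "string_check",
            "name": "category_check",
            "operation": "eq",
            # Assumes each dataset item provides a single expected category;
            # membership in a list of valid categories would need the Python grader instead.
            "input": "{{sample.output_json.category}}",
            "reference": "{{item.expected_category}}",
        },
        "summary_quality": {
            "type": "score_model",
            "name": "summary_quality",
            "model": "gpt-4o-mini",
            "input": [
                {
                    "role": "user",
                    "content": (
                        "Rate from 0 to 1 how concise (<50 words) and relevant this "
                        "summary is to the product features.\n"
                        "Features: {{item.features}}\n"
                        "Summary: {{sample.output_json.short_summary}}"
                    ),
                }
            ],
        },
    },
    # Weighted combination of the sub-grader scores.
    "calculate_output": "0.5 * schema_check + 0.2 * category_check + 0.3 * summary_quality",
}
```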
Key EAT Components Involved:
- `SystemAgent`: Orchestrates agent creation (Stage 1) and LLM/description evolution (Stage 2). Uses `CreateComponentTool`, `EvolveComponentTool`, `SearchComponentTool`.
- `SmartLibrary`: Stores different versions of the "Product Description Manager" agent, metadata about LLMs, evolution history, and the RFT-tuned model ID.
- `EvolveComponentTool`: Used by `SystemAgent` to modify agent code/config for LLM changes and to regenerate descriptions via LLM.
- `LLMService`: Provides access to GPT-4.1 and GPT-4.1-nano for agent execution and description generation.
- `AgentBus`: (Conceptually) Logs interactions that inform the RFT dataset creation (the dataset is manually curated for the POC).
Acceptance Criteria:
- A new example script `examples/evolution_workflow/unified_agent_evolution_demo.py` runs to completion.
- Stage 1: `SystemAgent` successfully creates a conversational "Product Description Manager" (v1.0) using GPT-4.1. Basic product management tasks succeed. `SmartLibrary` reflects this.
- Stage 2: `SystemAgent` successfully evolves the agent to "Product Description Manager" (v1.1) using GPT-4.1-nano. The agent's self-description and its tools' descriptions are demonstrably refined (e.g., logged old vs. new). Tasks still succeed conversationally. `SmartLibrary` reflects this new version.
- Stage 3 (RFT):
  - The demo script defines and prints a suitable RFT grader configuration.
  - The script defines and prints sample RFT training/validation data.
  - The script successfully initiates an OpenAI RFT job (API call made) targeting a GPT-4.1-nano model. The job ID is logged.
  - (If feasible within demo runtime/cost, or using a pre-tuned ID) The fine-tuned model ID is retrieved and associated with an agent version (e.g., v1.1-RFT) in `SmartLibrary`.
  - The RFT-enhanced agent, when prompted for product descriptions, demonstrably adheres more closely to the RFT goal (e.g., consistent JSON output matching the schema) compared to its pre-RFT version (v1.1).
- `SmartLibrary` (e.g., `unified_evolution_demo_library.json`) contains the distinct agent versions with metadata reflecting their LLMs, refined descriptions, and RFT status.
- Logs (e.g., `unified_evolution_demo.log`) clearly show `SystemAgent`'s orchestration for Stages 1 & 2, and the demo script's actions for Stage 3 RFT.
- A simple entity store (`examples/evolution_workflow/entity_store.py`) and tools (`examples/evolution_workflow/tools/`) are implemented and used by the agents.
- The demo runs with standard environment setup (`.env` for `OPENAI_API_KEY`). `INTENT_REVIEW_ENABLED` is set to `false`.
Implementation Details & Considerations:
- RFT Focus: The RFT stage will primarily focus on demonstrating the process of setting up and initiating an RFT job for policy improvement, especially output structuring.
- RFT Dataset & Grader: For the POC, the RFT dataset will be small and illustrative. The grader will be designed to be implementable with OpenAI's grader types.
- RFT Job Management: The demo will show job submission. Full job completion and extensive evaluation might be out of scope for a quick POC run, but the path to using the fine-tuned model will be shown.
- Description Evolution: The "description enhancement" in Stage 2 will be explicitly prompted to `SystemAgent`, which will then use its LLM capabilities (likely via `EvolveComponentTool` or a dedicated tool) to rewrite the provided descriptions.
- Model Compatibility for RFT: The demo will aim to use GPT-4.1-nano for RFT conceptually. If the OpenAI API for RFT strictly requires `o4-mini` at the time of implementation, the script will use `o4-mini` for the actual `fine_tuning/jobs` API call, clearly logging this substitution (see the job-submission sketch below).
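As a reference point for the job-submission step, below is a hedged sketch of the file upload and RFT job creation using the OpenAI Python SDK's `client.files.create` and `client.fine_tuning.jobs.create`. The `method` block of type `reinforcement` follows OpenAI's RFT documentation, but its exact payload and the base-model requirement (`o4-mini` vs. GPT-4.1-nano) should be confirmed at implementation time, as noted above.

```python
# Illustrative RFT job submission; verify the `method` payload against the
# current OpenAI reinforcement fine-tuning docs before running.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment / .env

train_file = client.files.create(
    file=open("training_set.jsonl", "rb"), purpose="fine-tune"
)
valid_file = client.files.create(
    file=open("validation_set.jsonl", "rb"), purpose="fine-tune"
)

job = client.fine_tuning.jobs.create(
    model="gpt-4.1-nano",  # substitute "o4-mini" if RFT requires it, and log the substitution
    training_file=train_file.id,
    validation_file=valid_file.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": GRADER_CONFIG,  # the multi-grader sketched earlier
            "hyperparameters": {"n_epochs": 1},  # keep the POC run small
        },
    },
)
print(f"Submitted RFT job: {job.id}")

# Later (or with a pre-tuned ID), retrieve the fine-tuned model name:
# job = client.fine_tuning.jobs.retrieve(job.id)
# fine_tuned_model = job.fine_tuned_model  # e.g., to record in SmartLibrary as v1.1-RFT
```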
Open Questions/Challenges:
- RFT Job Duration/Cost: Actual RFT job completion can be time-consuming and incur costs. The POC should manage expectations, possibly by simulating the retrieval of a fine-tuned model ID after job submission.
- Measuring "Improved Descriptions": For Stage 2, "demonstrably improves/refines its description" can be shown by logging the "before" and "after" descriptions generated by the LLM.
References: