
Agent Evolution Demo: GPT-4.1 -> GPT-4.1-nano (Agent Prompt & Tool Description) -> GPT-4.1-nano (RFT) for Product Description Manager Agent #109

@matiasmolinas

Description


This issue consolidates and refines goals from previous discussions (inspired by #106 & #107). It outlines a comprehensive Proof-of-Concept (POC) demonstrating the Evolving Agents Toolkit's (EAT) ability to manage a complete agent lifecycle. The demo will showcase agent evolution through two primary mechanisms:

  1. Capability-Driven Model Adaptation & Prompt/Description Enhancement: SystemAgent orchestrates switching the agent's underlying LLM (GPT-4.1 to GPT-4.1-nano) and improving the agent's prompt and its tools' descriptions for better clarity, discoverability, and LLM interaction, based on natural-language requirements.
  2. Reinforcement Fine-Tuning (RFT): Leveraging OpenAI's RFT capabilities (conceptually applied to GPT-4.1-nano) to refine the agent's policy for specific criteria, such as output format adherence or content accuracy.

The demonstration will focus on an agent named "Product Description Manager."

Motivation / Problem:

EAT aims for autonomous and data-driven agent evolution. This requires capabilities to:

  • Adapt agents to new LLMs or evolving functional requirements based on high-level goals.
  • Refine agent prompts and tool descriptions to optimize LLM interactions and improve discoverability.
  • Continuously improve agent performance and policy based on experience and targeted feedback signals, leveraging modern fine-tuning techniques like RFT.

This unified demo will showcase a practical, end-to-end self-improvement loop within the EAT framework.

Proposed Solution & Evolution Stages:

The demo will evolve a "Product Description Manager" agent through the following stages:

Stage 1: Initial Conversational Agent (GPT-4.1 based)

  • Goal: Create a foundational agent for managing product descriptions.
  • Capability: Basic CRUD-like operations (e.g., Create, Retrieve, Update) for product descriptions.
  • Interaction Style: Conversational ReAct (requiring step-by-step guidance via prompts to SystemAgent or the agent directly).
  • LLM: GPT-4.1.
  • EAT Orchestration: SystemAgent receives a natural language requirement (e.g., "Create a 'Product Description Manager' agent using GPT-4.1, capable of conversational interaction, to handle product descriptions."). SystemAgent uses its internal tools (CreateComponentTool) to instantiate this agent and its necessary tools (e.g., RetrieveProductTool, UpdateProductTool, SaveProductTool).
  • Entity Store: A simple examples/evolution_workflow/entity_store.py will simulate product data persistence.
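For reference, a minimal sketch of what examples/evolution_workflow/entity_store.py could look like is shown below. It is illustrative only: the class and method names are assumptions rather than a committed interface; the point is simply a tiny JSON-file-backed store the demo tools (RetrieveProductTool, UpdateProductTool, SaveProductTool) can call.

```python
# Minimal sketch of examples/evolution_workflow/entity_store.py.
# Illustrative only; class/method names are assumptions, not the final EAT interface.
import json
from pathlib import Path
from typing import Any, Dict, Optional


class EntityStore:
    """Tiny JSON-file-backed store simulating product data persistence for the demo."""

    def __init__(self, path: str = "product_store.json"):
        self.path = Path(path)
        self._data: Dict[str, Dict[str, Any]] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def _flush(self) -> None:
        self.path.write_text(json.dumps(self._data, indent=2))

    def save_product(self, product_id: str, record: Dict[str, Any]) -> None:
        """Create or overwrite a product record (used by SaveProductTool)."""
        self._data[product_id] = record
        self._flush()

    def get_product(self, product_id: str) -> Optional[Dict[str, Any]]:
        """Return a product record or None (used by RetrieveProductTool)."""
        return self._data.get(product_id)

    def update_product(self, product_id: str, fields: Dict[str, Any]) -> bool:
        """Merge fields into an existing record (used by UpdateProductTool)."""
        if product_id not in self._data:
            return False
        self._data[product_id].update(fields)
        self._flush()
        return True
```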

Stage 2: Model Optimization & Prompt/Description Enhancement (GPT-4.1-nano)

  • Goal: Adapt the agent to a more efficient model (GPT-4.1-nano) and improve its self-description (meta-prompt) and its tools' descriptions for better usability, LLM guidance, and discoverability.
  • EAT Orchestration: SystemAgent receives a new goal (e.g., "Evolve the 'Product Description Manager' to use GPT-4.1-nano. Also, refine its main description and the descriptions of its 'RetrieveProductTool' and 'UpdateProductTool' to be more precise, focusing on JSON output and key product attributes.").
    • SystemAgent uses EvolveComponentTool to change the base model reference in the agent's configuration/code.
    • SystemAgent uses EvolveComponentTool (leveraging its LLM) to rewrite the agent's description in its AgentMeta and the descriptions of its associated tools based on the refinement instructions (a sketch of this rewrite step follows the list below).
  • Interaction Style: Remains Conversational ReAct.
  • LLM: GPT-4.1-nano.
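The description refinement in this stage ultimately reduces to an LLM rewrite of an existing description under natural-language instructions. The sketch below shows that underlying operation as a plain chat-completion call; inside EAT it would be performed by SystemAgent via EvolveComponentTool rather than invoked directly, and the helper name (refine_description), prompt wording, and model choice are assumptions.

```python
# Illustrative only: the LLM-driven description rewrite that EvolveComponentTool
# would perform internally. Prompt wording and model choice are assumptions.
from openai import OpenAI

client = OpenAI()


def refine_description(current: str, instructions: str, model: str = "gpt-4.1-nano") -> str:
    """Ask an LLM to rewrite a component description per natural-language instructions."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You rewrite agent and tool descriptions to be precise and easy to discover."},
            {"role": "user",
             "content": f"Current description:\n{current}\n\n"
                        f"Refinement instructions:\n{instructions}\n\n"
                        "Return only the rewritten description."},
        ],
    )
    return response.choices[0].message.content.strip()


old = "Retrieves a product."
new = refine_description(old, "Be precise: mention JSON output and key product attributes (id, name, price).")
print(f"BEFORE: {old}\nAFTER:  {new}")  # logging old vs. new, per the Stage 2 acceptance criterion
```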

Stage 3: Reinforcement Fine-Tuning for Policy Improvement (GPT-4.1-nano RFT)

  • Goal: Improve the "Product Description Manager" agent's policy for specific criteria, such as strict adherence to a JSON output schema for product descriptions, completeness of required fields, and conciseness.
  • LLM: The target for RFT is GPT-4.1-nano.
  • RFT Process (orchestrated by the demo script using the OpenAI API; sketches of the grader configuration, dataset, and job submission follow this stage's list):
    1. Define a Grader:
      • Implement a grader configuration (e.g., a multi-grader).
      • Example RFT Goal: The RFT-enhanced agent must output product information as a JSON object strictly adhering to the schema: {"product_id": "string", "name": "string", "price": "float", "features": ["string"], "category": "string", "short_summary": "string (max 50 words)"}. It must always include product_id, name, and price.
      • Example sub-graders:
        • A Python grader (or json_schema type if directly supported by OpenAI RFT grader options) to check JSON schema validity and presence/type of required fields (product_id, name, price).
        • A string_check grader to ensure category is one of predefined valid categories.
        • A score_model grader (using another LLM like gpt-4o-mini) to evaluate the short_summary for conciseness (e.g., < 50 words) and relevance to the input features.
    2. Prepare Dataset (JSONL):
      • Create training_set.jsonl and validation_set.jsonl (small, illustrative for POC).
      • Each line:
        • messages: User prompts (e.g., "Create a description for a product with ID 'XYZ', name 'Quantum Widget', price 99.99, features ['self-calibrating', 'eco-friendly'], category 'Electronics'.")
        • Reference fields for grading (e.g., expected_json_schema: <schema_dict>, expected_category_values: ["Electronics", "Software"], max_summary_length: 50).
    3. Upload Files: Upload datasets to OpenAI.
    4. Create Fine-Tune Job (RFT):
      • Use OpenAI API to start an RFT job with the GPT-4.1-nano base model, training/validation file IDs, the grader configuration, and the target JSON schema for structured outputs.
    5. Monitor & Evaluate: Briefly monitor the job via the API. For the POC, we might not wait for full completion but simply show job submission.
    6. Deploy/Use Fine-Tuned Model: Obtain the fine-tuned model ID.
  • EAT Integration:
    • The fine-tuned model ID from RFT will be stored in SmartLibrary as a new agent version (e.g., "Product Description Manager v1.2-RFT") or as metadata for the existing GPT-4.1-nano version.
    • SystemAgent (or direct calls in the demo) will then use this RFT-enhanced agent, demonstrating improved output quality.
  • Interaction Style: The fine-tuned agent (GPT-4.1-nano based) will still operate conversationally (ReAct), but its policy (choices, generation quality, and adherence to the RFT goal like strict JSON output) will be demonstrably improved.
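A sketch of steps 1-2 (grader configuration and dataset preparation) follows. The field names mirror the general shape of OpenAI's RFT grader types (multi, python, string_check, score_model) and the {{item.*}} / {{sample.*}} template variables, but they should be verified against the current RFT documentation at implementation time; the sub-grader weights and the single training example are purely illustrative.

```python
# Steps 1-2: grader configuration and JSONL dataset preparation (illustrative sketch;
# verify grader field names against OpenAI's RFT docs before implementation).
import json

PYTHON_GRADER_SOURCE = '''
import json

def grade(sample, item):
    # Return 1.0 if the model output is valid JSON with the required typed fields, else 0.0.
    try:
        out = json.loads(sample["output_text"])
    except Exception:
        return 0.0
    ok = (
        isinstance(out.get("product_id"), str)
        and isinstance(out.get("name"), str)
        and isinstance(out.get("price"), (int, float))
    )
    return 1.0 if ok else 0.0
'''

grader = {
    "type": "multi",
    "graders": {
        # JSON validity and required fields (product_id, name, price).
        "schema": {"type": "python", "name": "schema", "source": PYTHON_GRADER_SOURCE},
        # Category must match the reference value supplied with the dataset item.
        "category": {
            "type": "string_check",
            "name": "category",
            "operation": "eq",
            "input": "{{sample.output_json.category}}",
            "reference": "{{item.category}}",
        },
        # A cheaper model scores the short_summary for conciseness and relevance.
        "summary": {
            "type": "score_model",
            "name": "summary",
            "model": "gpt-4o-mini",
            "input": [{
                "role": "user",
                "content": "Score 0-1: is this summary under {{item.max_summary_length}} words and "
                           "relevant to the features {{item.features}}?\n{{sample.output_json.short_summary}}",
            }],
        },
    },
    # Weighted combination of the sub-grader scores (weights are illustrative).
    "calculate_output": "0.5 * schema + 0.2 * category + 0.3 * summary",
}

# One illustrative training example: the user prompt plus the reference fields the
# grader reads via {{item.*}}. validation_set.jsonl would be produced the same way.
example = {
    "messages": [{
        "role": "user",
        "content": "Create a description for a product with ID 'XYZ', name 'Quantum Widget', "
                   "price 99.99, features ['self-calibrating', 'eco-friendly'], category 'Electronics'.",
    }],
    "product_id": "XYZ", "name": "Quantum Widget", "price": 99.99,
    "features": ["self-calibrating", "eco-friendly"], "category": "Electronics",
    "max_summary_length": 50,
}

with open("training_set.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # the real POC set would contain more lines
```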
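A companion sketch of steps 3-5 (file upload, job submission, and a brief status check) using the official openai Python client follows. The method={"type": "reinforcement", ...} payload shape follows OpenAI's RFT documentation but should be double-checked against the SDK version in use; the hyperparameters are assumptions chosen to keep the POC cheap.

```python
# Steps 3-5: upload the JSONL files and submit the RFT job (illustrative sketch).
from openai import OpenAI

client = OpenAI()

train_file = client.files.create(file=open("training_set.jsonl", "rb"), purpose="fine-tune")
val_file = client.files.create(file=open("validation_set.jsonl", "rb"), purpose="fine-tune")

BASE_MODEL = "gpt-4.1-nano"  # may need to be "o4-mini" if RFT requires it; log any substitution

job = client.fine_tuning.jobs.create(
    model=BASE_MODEL,
    training_file=train_file.id,
    validation_file=val_file.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,  # the multi-grader defined in the previous sketch
            "hyperparameters": {"n_epochs": 1},  # minimal settings to keep the POC cheap
        },
    },
    # If the API supports attaching the target JSON schema for structured outputs,
    # it would also go in this payload (field name to be verified).
)
print(f"RFT job submitted: {job.id}")  # job ID is logged, per acceptance criterion 4

# Step 5: briefly monitor; the POC does not need to wait for completion.
status = client.fine_tuning.jobs.retrieve(job.id)
print(f"Job status: {status.status}")
```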

Key EAT Components Involved:

  • SystemAgent: Orchestrates agent creation (Stage 1) and LLM/description evolution (Stage 2). Uses CreateComponentTool, EvolveComponentTool, SearchComponentTool.
  • SmartLibrary: Stores different versions of the "Product Description Manager" agent, metadata about LLMs, evolution history, and the RFT-tuned model ID.
  • EvolveComponentTool: Used by SystemAgent to modify agent code/config for LLM changes and to regenerate descriptions via LLM.
  • LLMService: Provides access to GPT-4.1, GPT-4.1-nano for agent execution and description generation.
  • AgentBus: Conceptually logs the interactions that inform RFT dataset creation (the dataset is manually curated for the POC).

Acceptance Criteria:

  1. A new example script examples/evolution_workflow/unified_agent_evolution_demo.py runs to completion.
  2. Stage 1: SystemAgent successfully creates a conversational "Product Description Manager" (v1.0) using GPT-4.1. Basic product management tasks succeed. SmartLibrary reflects this.
  3. Stage 2: SystemAgent successfully evolves the agent to "Product Description Manager" (v1.1) using GPT-4.1-nano. The agent's self-description and its tools' descriptions are demonstrably refined (e.g., logged old vs. new). Tasks still succeed conversationally. SmartLibrary reflects this new version.
  4. Stage 3 (RFT):
    • The demo script defines and prints a suitable RFT grader configuration.
    • The script defines and prints sample RFT training/validation data.
    • The script successfully initiates an OpenAI RFT job (API call made) targeting a GPT-4.1-nano model. The job ID is logged.
    • (If feasible within demo runtime/cost or using a pre-tuned ID) The fine-tuned model ID is retrieved and associated with an agent version (e.g., v1.1-RFT) in SmartLibrary.
    • The RFT-enhanced agent, when prompted for product descriptions, demonstrably adheres more closely to the RFT goal (e.g., consistent JSON output matching the schema) compared to its pre-RFT version (v1.1); see the adherence check sketched after these criteria.
  5. SmartLibrary (e.g., unified_evolution_demo_library.json) contains the distinct agent versions with metadata reflecting their LLMs, refined descriptions, and RFT status.
  6. Logs (e.g., unified_evolution_demo.log) clearly show SystemAgent's orchestration for Stages 1 & 2, and the demo script's actions for Stage 3 RFT.
  7. A simple entity store (examples/evolution_workflow/entity_store.py) and tools (examples/evolution_workflow/tools/) are implemented and used by the agents.
  8. The demo runs with the standard environment setup (.env providing OPENAI_API_KEY) and with INTENT_REVIEW_ENABLED set to false.
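To make the "demonstrably adheres more closely" check in criterion 4 concrete, the demo could score outputs of the pre-RFT (v1.1) and RFT-enhanced agents against the Stage 3 target schema. The sketch below restates that schema as a JSON Schema and assumes the third-party jsonschema package as a dependency (an assumption, not a stated requirement).

```python
# Adherence check for acceptance criterion 4: does an agent's raw output parse as
# JSON and satisfy the Stage 3 target schema? (jsonschema dependency is assumed.)
import json

from jsonschema import ValidationError, validate

PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_id": {"type": "string"},
        "name": {"type": "string"},
        "price": {"type": "number"},
        "features": {"type": "array", "items": {"type": "string"}},
        "category": {"type": "string"},
        "short_summary": {"type": "string"},
    },
    "required": ["product_id", "name", "price"],
}


def adheres(raw_output: str) -> bool:
    """True if the output is valid JSON matching the target schema."""
    try:
        validate(json.loads(raw_output), PRODUCT_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

# Running the same prompts through v1.1 and the RFT-enhanced agent and comparing
# adherence rates gives the before/after evidence the criterion asks for.
```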

Implementation Details & Considerations:

  • RFT Focus: The RFT stage will primarily focus on demonstrating the process of setting up and initiating an RFT job for policy improvement, especially output structuring.
  • RFT Dataset & Grader: For the POC, the RFT dataset will be small and illustrative. The grader will be designed to be implementable with OpenAI's grader types.
  • RFT Job Management: The demo will show job submission. Full job completion and extensive evaluation might be out of scope for a quick POC run, but the path to using the fine-tuned model will be shown.
  • Description Evolution: The "description enhancement" in Stage 2 will be explicitly prompted to SystemAgent, which will then use its LLM capabilities (likely via EvolveComponentTool or a dedicated tool) to rewrite the provided descriptions.
  • Model Compatibility for RFT: The demo will aim to use GPT-4.1-nano for RFT conceptually. If the OpenAI API for RFT strictly requires o4-mini at the time of implementation, the script will use o4-mini for the actual fine_tuning/jobs API call, clearly logging this substitution.
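A small sketch of that model-substitution fallback is shown below, assuming the openai client raises BadRequestError when a base model is not accepted for RFT; the function and logger names are placeholders.

```python
# Try GPT-4.1-nano first; fall back to o4-mini if the RFT endpoint rejects it,
# logging the substitution clearly (names and error handling are illustrative).
import logging

import openai

log = logging.getLogger("unified_evolution_demo")


def create_rft_job(client, grader, train_id, val_id,
                   preferred="gpt-4.1-nano", fallback="o4-mini"):
    for model in (preferred, fallback):
        try:
            job = client.fine_tuning.jobs.create(
                model=model,
                training_file=train_id,
                validation_file=val_id,
                method={"type": "reinforcement", "reinforcement": {"grader": grader}},
            )
            if model != preferred:
                log.warning("RFT base model substituted: %s -> %s", preferred, model)
            return job
        except openai.BadRequestError as exc:
            log.warning("Model %s rejected for RFT (%s); trying fallback.", model, exc)
    raise RuntimeError("No supported base model was accepted for the RFT job.")
```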

Open Questions/Challenges:

  • RFT Job Duration/Cost: Actual RFT job completion can be time-consuming and incur costs. The POC should manage expectations, possibly by simulating the retrieval of a fine-tuned model ID after job submission (a polling-with-fallback sketch follows this list).
  • Measuring "Improved Descriptions": For Stage 2, "demonstrably improves/refines its description" can be shown by logging the "before" and "after" descriptions generated by the LLM.
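For the job-duration concern, the demo script could poll briefly for the fine-tuned model ID and otherwise fall back to a simulated or pre-tuned ID so the walkthrough can proceed, as sketched below; the placeholder ID string is purely illustrative.

```python
# Poll the RFT job for a short, bounded time; fall back to a simulated model ID so
# the POC can continue without waiting (or paying) for the full run.
import time

from openai import OpenAI

client = OpenAI()


def get_tuned_model_id(job_id: str, poll_seconds: int = 60, interval: int = 15) -> str:
    deadline = time.time() + poll_seconds
    while time.time() < deadline:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status == "succeeded" and job.fine_tuned_model:
            return job.fine_tuned_model
        if job.status in ("failed", "cancelled"):
            break
        time.sleep(interval)
    # Simulated/pre-tuned fallback (placeholder string, purely illustrative).
    return "ft:gpt-4.1-nano:demo:simulated-placeholder"
```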
