Optimize text components—AI prompts, code, or instructions—of any system using reflective text evolution.
GEPA (Genetic-Pareto) is a framework for optimizing arbitrary systems composed of text components—like AI prompts, code snippets, or textual specs—against any evaluation metric. It employs LLMs to reflect on system behavior, using feedback from execution and evaluation traces to drive targeted improvements. Through iterative mutation, reflection, and Pareto-aware candidate selection, GEPA evolves robust, high-performing variants with minimal evaluations, co-evolving multiple components in modular systems for domain-specific gains.
This repository provides the official implementation of the GEPA algorithm as proposed in the paper titled "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" (https://arxiv.org/abs/2507.19457). In order to reproduce experiments from the paper, we provide a separate reproduction artifact.
pip install gepa
To install the very latest from `main`:
pip install git+https://github.com/gepa-ai/gepa.git
The easiest and most powerful way to use GEPA for prompt optimization is within DSPy, where the GEPA algorithm is directly available through the `dspy.GEPA` API. Directly executable tutorial notebooks are available at dspy.GEPA Tutorials.
GEPA can be run in just a few lines of code. In this example, we'll use GEPA to optimize a system prompt for math problems from the AIME benchmark (full tutorial). Run the following in an environment with `OPENAI_API_KEY` set:
import gepa

# Load AIME dataset
trainset, valset, _ = gepa.examples.aime.init_dataset()

seed_prompt = {
    "system_prompt": "You are a helpful assistant. You are given a question and you need to answer it. The answer should be given at the end of your response in exactly the format '### <final answer>'"
}

# Let's run GEPA optimization process.
gepa_result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset, valset=valset,
    task_lm="openai/gpt-4.1-mini",  # <-- This is the model being optimized
    max_metric_calls=150,           # <-- Set a budget
    reflection_lm="openai/gpt-5",   # <-- Use a strong model to reflect on mistakes and propose better prompts
)

print("GEPA Optimized Prompt:", gepa_result.best_candidate['system_prompt'])
Here, we can see the optimized prompt that GEPA generates for AIME, which improves GPT-4.1 Mini's performance on AIME 2025 from 46.6% to 56.6%, a 10-percentage-point gain. Note the detail captured in the prompt after just 2 iterations of GEPA. GEPA can be thought of as precomputing some reasoning (during optimization) to come up with a good plan for future task instances.
GEPA is built around a flexible `GEPAAdapter` abstraction that lets it plug into any system and optimize different types of text snippets. The above example used a simple `DefaultAdapter` that plugs into a single-turn LLM environment and evolves system prompts, where tasks are presented as user messages. GEPA can be easily extended to multi-turn and other agentic settings. For example, the `dspy.GEPA` integration uses a `DSPyAdapter`.
Beyond prompt optimization, GEPA can evolve entire programs. The DSPy Full Program Adapter demonstrates this by evolving complete DSPy programs, including custom signatures, modules, and control flow logic. Starting from a basic `dspy.ChainOfThought("question -> answer")` that achieves 67% on the MATH benchmark, GEPA evolves a multi-step reasoning program that reaches 93% accuracy. A fully executable example notebook shows how to use this adapter.
GEPA can be used to optimize any system consisting of textual components. Follow these steps:
- Implement `GEPAAdapter`: To let the GEPA optimizer pair with your system and its environment, implement the `GEPAAdapter` interface defined in src/gepa/core/adapter.py. `GEPAAdapter` requires two methods (a minimal sketch follows this list):
  - Evaluate: Given a candidate consisting of proposed text components and a minibatch of inputs sampled from the train/val sets, run the system and return execution scores, also capturing the system traces.
  - Extract traces for reflection: Given the execution traces obtained from executing a proposed candidate and the name of a component being optimized, return the textual content from the traces relevant to that component.
- Prepare trainset and valset: lists of example inputs and task metadata.
- Call `gepa.optimize` with your adapter, metric, and system configuration.
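For concreteness, here is a minimal adapter sketch for a hypothetical single-prompt question-answering system. The method names, argument names, and return shapes below are assumptions chosen to mirror the two responsibilities above, not the exact protocol; check `src/gepa/core/adapter.py` for the authoritative `GEPAAdapter` interface.

```python
# Illustrative sketch only: method and field names are assumptions, not the exact
# GEPAAdapter protocol; see src/gepa/core/adapter.py for the authoritative interface.
class MyQAAdapter:
    """Pairs GEPA with a hypothetical single-prompt question-answering system."""

    def __init__(self, llm, metric):
        self.llm = llm        # callable: (system_prompt, question) -> answer string
        self.metric = metric  # callable: (example, answer) -> float score

    def evaluate(self, batch: list[dict], candidate: dict[str, str], capture_traces: bool = False) -> dict:
        """Run the system built from `candidate` on a minibatch and score each output."""
        outputs, scores, trajectories = [], [], []
        for example in batch:
            answer = self.llm(candidate["system_prompt"], example["question"])
            outputs.append(answer)
            scores.append(self.metric(example, answer))
            if capture_traces:
                # Keep whatever execution detail the reflection step will need later.
                trajectories.append({"question": example["question"], "answer": answer, "score": scores[-1]})
        return {"outputs": outputs, "scores": scores,
                "trajectories": trajectories if capture_traces else None}

    def make_reflective_dataset(self, candidate: dict[str, str], eval_batch: dict,
                                components_to_update: list[str]) -> dict[str, list[dict]]:
        """For each named component, return the trace text the reflection LM should read."""
        reflective = {}
        for component in components_to_update:  # e.g., ["system_prompt"]
            reflective[component] = [
                {"Inputs": t["question"], "Generated Output": t["answer"], "Score": t["score"]}
                for t in eval_batch["trajectories"]
            ]
        return reflective
```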
We are actively working on adapters that integrate GEPA with many different frameworks. Please open an issue if there is a specific framework you would like to see supported!
Terminal-bench is a benchmark for evaluating the performance of terminal-use agents, and Terminus is a leading terminal-use agent. In this script, we use GEPA to optimize the system prompt/terminal-use instruction for the Terminus agent through a custom `GEPAAdapter` implementation.
Note that the Terminus agent and terminal-bench run in an external environment and are integrated into GEPA via the `TerminusAdapter`.
To run this example:
pip install terminal-bench
python src/gepa/examples/terminal-bench/train_terminus.py --model_name=gpt-5-mini
The Generic RAG Adapter enables GEPA to optimize Retrieval-Augmented Generation (RAG) systems using any vector store (ChromaDB, Weaviate, Qdrant, Pinecone) through a pluggable interface. It optimizes query reformulation, context synthesis, answer generation, and document reranking simultaneously.
See the complete RAG adapter examples and documentation for usage examples, supported vector stores, and step-by-step guides.
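As a rough sketch of what multi-component RAG optimization can look like, the snippet below defines a seed candidate with one named prompt per pipeline stage and hands it to `gepa.optimize` together with an adapter instance. The component names, the `adapter=` argument, and the `rag_adapter`/dataset placeholders are assumptions made for illustration; follow the RAG adapter examples and documentation for the exact setup.

```python
import gepa

# Seed candidate with one named text component per RAG stage.
# Component names here are illustrative; use the names your adapter expects.
seed_candidate = {
    "query_reformulation": "Rewrite the user question into a focused search query.",
    "context_synthesis": "Condense the retrieved passages into facts relevant to the question.",
    "answer_generation": "Answer the question using only the synthesized context, citing passages.",
    "document_reranking": "Score each retrieved passage by how directly it answers the question.",
}

# `rag_adapter`, `trainset`, and `valset` are placeholders: build the adapter around your
# vector store and prepare question/answer examples as shown in the RAG adapter docs.
gepa_result = gepa.optimize(
    seed_candidate=seed_candidate,
    trainset=trainset,
    valset=valset,
    adapter=rag_adapter,
    reflection_lm="openai/gpt-5",
    max_metric_calls=300,
)
print(gepa_result.best_candidate)
```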
GEPA optimizes text components of systems using an evolutionary search algorithm that uses LLM-based reflection for mutating candidates. Most importantly, GEPA leverages task-specific textual feedback (for example, compiler error messages, profiler performance reports, documentation, etc.) to guide the search process. For further details, refer to the paper: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.
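At a high level, the search loop can be pictured with the following conceptual pseudocode (written in Python style for readability; it is a sketch of the ideas in the paper, not the library's actual implementation, and the helper names are invented):

```python
# Conceptual pseudocode of the GEPA loop; helper functions are invented for illustration.
def gepa_search(seed_candidate, trainset, valset, adapter, reflection_lm, budget):
    pool = [seed_candidate]                     # each candidate is a dict of named text components
    per_instance_scores = [score_on(valset, seed_candidate, adapter)]

    while metric_calls_used() < budget:
        # Pareto-aware selection: sample among candidates that are best on at least one
        # validation instance, not just the single best-on-average candidate.
        parent = sample_pareto_front(pool, per_instance_scores)

        # Execute the parent on a small minibatch, capturing traces and textual feedback
        # (e.g., evaluator messages, compiler errors, scores).
        minibatch = sample_minibatch(trainset)
        feedback = run_with_traces(adapter, parent, minibatch)

        # Reflective mutation: a strong LM reads the feedback and proposes an improved
        # version of one targeted text component.
        component = choose_component(parent)
        child = {**parent, component: reflect_and_rewrite(reflection_lm, parent[component], feedback)}

        # Keep the child only if it improves on the minibatch; then score it on the
        # validation set and add it to the candidate pool.
        if improves_on(child, parent, minibatch, adapter):
            pool.append(child)
            per_instance_scores.append(score_on(valset, child, adapter))

    return best_by_aggregate(pool, per_instance_scores)
```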
We encourage the community and users to help us develop adapters so GEPA can be used to optimize all kinds of systems that leverage textual components. Refer to DSPy/GEPAAdapter and src/gepa/adapters/ for example `GEPAAdapter` implementations. Please feel free to flag any problems you face by opening an issue.
- Paper: 📄 GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (arXiv:2507.19457)
- Experiment reproduction artifact: GEPA Artifact Repository
- Talk Slides: GEPA Talk Slides
- Tutorials & Examples:
  - dspy.GEPA Tutorials, with executable notebooks: step-by-step notebooks showing how to use GEPA for practical optimization tasks via DSPy, including math, structured data extraction for enterprise tasks, and a privacy-conscious delegation task.
  - Video tutorial by @weaviate on using dspy.GEPA to optimize a listwise reranker
  - Matei Zaharia - Reflective Optimization of Agents with GEPA and DSPy
  - Building and optimizing a multi-agent system for the healthcare domain using DSPy + GEPA
- Social and Discussion:
- X (formerly Twitter) Announcement Thread (Lakshya A Agrawal)
- GEPA covered by VentureBeat
- GEPA's use by Databricks covered by VentureBeat
- Stay up to date:
- Questions, Discussions?
- GEPA Integrations: want to use GEPA in other frameworks?
  - DSPy Adapter Code (integrates GEPA with DSPy)
  - Contributed Adapters – see our adapter templates and issue tracker to request new integrations.
- DefaultAdapter - System Prompt Optimization for a single-turn task.
- DSPy Full Program Adapter - Evolves entire DSPy programs including signatures, modules, and control flow. Achieves 93% accuracy on MATH benchmark (vs 67% with basic DSPy ChainOfThought).
- Generic RAG Adapter - Vector store-agnostic RAG optimization supporting ChromaDB, Weaviate, Qdrant, Pinecone, and more. Optimizes query reformulation, context synthesis, answer generation, and document reranking prompts.
- TerminalBench Adapter - Easily integrates GEPA into Terminus, a sophisticated external agentic pipeline, and optimizes the agent's system prompt.
- AnyMaths Adapter - Adapter for optimizing mathematical problem-solving and reasoning tasks. Contributed by @egmaminta.
- GEPA uses:
  - Context Compression using GEPA
  - GEPA Integration into SuperOptiX-AI
  - GEPA for Observable JavaScript
  - bandit_dspy
  - GEPA in the Go Programming Language
  - 100% accuracy using GEPA on the clock-hands problem
  - Prompt Optimization for Reliable Backdoor Detection in AI-Generated Code
  - Teaching LLMs to Diagnose Production Incidents with ATLAS+GEPA
  - Databricks: Building State-of-the-Art Enterprise Agents 90x Cheaper with GEPA
  - comet-ml/opik adds support for GEPA
If you use this repository, or the GEPA algorithm, kindly cite:
@misc{agrawal2025gepareflectivepromptevolution,
title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
year={2025},
eprint={2507.19457},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.19457},
}