@anj-s
Collaborator

@anj-s anj-s commented Sep 18, 2025

TLDR

This PR introduces a new write_todos_list tool that allows the agent to create and manage a checklist of tasks for complex user requests. This helps the agent track its progress and organize its work, and it gives the user visibility into the agent's plan.

Dive Deeper

The write_todos_list tool is a declarative tool that enables the agent to manage a list of tasks with the following statuses: pending, in_progress, completed, and cancelled. The agent is guided by an updated system prompt on when and how to use this tool, with a focus on using it for complex, multi-step tasks and avoiding it for simple requests.
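
As a rough illustration of the data the tool works with, the sketch below models a todo item and a validation pass. Only the four status values come from the description above. The `Todo` shape, the `validateTodos` function, and the rule that at most one item may be in_progress are assumptions made for this example, not the PR's actual implementation.

```typescript
// Status values as listed in the PR description.
const TODO_STATUSES = ["pending", "in_progress", "completed", "cancelled"] as const;
type TodoStatus = (typeof TODO_STATUSES)[number];

// Hypothetical item shape, assumed for illustration.
interface Todo {
  description: string;
  status: TodoStatus;
}

function isTodoStatus(value: unknown): value is TodoStatus {
  return (
    typeof value === "string" &&
    (TODO_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Returns an error message, or null when the list is acceptable.
function validateTodos(todos: unknown): string | null {
  if (!Array.isArray(todos)) return "todos must be an array";
  let inProgress = 0;
  for (let i = 0; i < todos.length; i++) {
    const item = todos[i];
    if (typeof item !== "object" || item === null) {
      return `todo ${i} must be an object`;
    }
    const { description, status } = item as Partial<Todo>;
    if (typeof description !== "string" || description.trim() === "") {
      return `todo ${i} needs a non-empty description`;
    }
    if (!isTodoStatus(status)) {
      return `todo ${i} has an invalid status`;
    }
    if (status === "in_progress") inProgress++;
  }
  // Assumed rule (not stated in the PR text): one active task at a time.
  if (inProgress > 1) return "only one todo may be in_progress at a time";
  return null;
}
```

Returning a message instead of throwing lets a caller pass the reason back to the model so it can correct the list on the next call.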

The tool is enabled by a useWriteTodos flag in the configuration. The implementation includes the tool itself, along with comprehensive unit tests to ensure its functionality and validation logic are working correctly.
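
One way to picture the flag gating is the minimal sketch below. Only the `useWriteTodos` flag name and the `write_todos_list` tool name come from this PR. The `AgentConfig` and `ToolDefinition` shapes, the placeholder base tool, and the assumption that a missing flag means "disabled" are all hypothetical.

```typescript
// Hypothetical config shape; only the flag name comes from the PR.
interface AgentConfig {
  useWriteTodos?: boolean;
}

// Hypothetical minimal tool descriptor.
interface ToolDefinition {
  name: string;
}

// Placeholder standing in for whatever tools are always registered.
const BASE_TOOLS: ToolDefinition[] = [{ name: "some_other_tool" }];
const WRITE_TODOS_TOOL: ToolDefinition = { name: "write_todos_list" };

function enabledTools(config: AgentConfig): ToolDefinition[] {
  // Assumption: a missing flag is treated as disabled.
  return config.useWriteTodos === true
    ? [...BASE_TOOLS, WRITE_TODOS_TOOL]
    : [...BASE_TOOLS];
}

console.log(enabledTools({ useWriteTodos: true }).map((t) => t.name));
```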

Reviewer Test Plan

To test this feature, you can enable the useWriteTodos flag in your settings and give the agent a complex task. Here are a few examples:

  1. Create a new feature:

    • Prompt: add a new feature to the CLI that allows users to configure the output format of the response.
    • Expected behavior: The agent should create a todo list with steps like add a new configuration option, implement the logic to format the output, add tests for the new feature, etc.
  2. Build a simple application:

    • Prompt: create a simple web app that uses the Gemini API to answer questions.
    • Expected behavior: The agent should break down the task into smaller sub-tasks and create a todo list to track its progress.
  3. Debug an issue:

    • Prompt: The application is crashing when I try to upload a file. Can you help me debug and fix the issue?
    • Expected behavior: The agent should create a todo list to investigate the issue, such as reproduce the crash, examine the logs, identify the root cause, implement a fix, and verify the fix.
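
For reviewers who want a concrete picture of what to look for, here is a sketch of the debugging example as a todo list partway through the task. The step descriptions are taken from the expected behavior above. The `Todo` shape, the status assigned to each step, and the text markers used by `renderTodos` are assumptions for illustration, and the actual CLI output may look different.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled";

// Hypothetical item shape, assumed for illustration.
interface Todo {
  description: string;
  status: TodoStatus;
}

// Hypothetical text markers, one per status.
const MARKERS: Record<TodoStatus, string> = {
  pending: "[ ]",
  in_progress: "[~]",
  completed: "[x]",
  cancelled: "[-]",
};

// Renders the list as a numbered checklist, one item per line.
function renderTodos(todos: Todo[]): string {
  return todos
    .map((t, i) => `${i + 1}. ${MARKERS[t.status]} ${t.description}`)
    .join("\n");
}

const debugPlan: Todo[] = [
  { description: "Reproduce the crash", status: "completed" },
  { description: "Examine the logs", status: "in_progress" },
  { description: "Identify the root cause", status: "pending" },
  { description: "Implement a fix", status: "pending" },
  { description: "Verify the fix", status: "pending" },
];

console.log(renderTodos(debugPlan));
```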

Fixes #4580

Testing Matrix

|          | 🍏 | 🪟 | 🐧 |
| -------- | -- | -- | -- |
| npm run  |    |    |    |
| npx      |    |    |    |
| Docker   |    |    |    |
| Podman   |    | -  | -  |
| Seatbelt |    | -  | -  |

Linked issues / bugs

@owenofbrien
Collaborator

It's weird that a noop tool would improve performance but I assume you have run evals and shown that this improves things.

I think it's primarily a way to:
a) encourage the model to actually make a plan for complex tasks, and
b) encourage the model to explicitly update the plan as tasks are completed or become obsolete

I wonder if just adding instructions for a) and b) to the system prompt could yield a similar performance impact. @anj-s wdyt?

@anj-s
Collaborator Author

anj-s commented Sep 19, 2025

> It's weird that a noop tool would improve performance but I assume you have run evals and shown that this improves things.

It's not a noop tool, as explained above. It helps the model create a list of items and track it. Yes, this improves evals, and it is a known method for doing so.

@anj-s
Collaborator Author

anj-s commented Sep 19, 2025

> It's weird that a noop tool would improve performance but I assume you have run evals and shown that this improves things.
>
> I think it's primarily a way to:
> a) encourage the model to actually make a plan for complex tasks, and
> b) encourage the model to explicitly update the plan as tasks are completed or become obsolete
>
> I wonder if just adding instructions for a) and b) to the system prompt could yield a similar performance impact. @anj-s wdyt?

We have this in the system prompt, but it's not something the model does consistently, and it does not involve the model updating the plan list at every turn. Ideally, we want the todo list to be the only plan list that the model is tracking.
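
One reading of "the only plan list", combined with the tool being described as declarative, is that every call supplies the complete list and replaces whatever was stored before. The sketch below illustrates that idea only. `TodoStore` and its methods are hypothetical names, and the PR text does not confirm that the tool stores state this way.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "cancelled";

// Hypothetical item shape, assumed for illustration.
interface Todo {
  description: string;
  status: TodoStatus;
}

// Hypothetical holder for the single plan the agent tracks.
class TodoStore {
  private todos: Todo[] = [];

  // Each write replaces the whole list, so there is never a second,
  // partially updated plan to reconcile.
  write(next: Todo[]): void {
    this.todos = next.map((t) => ({ ...t })); // copy to avoid aliasing caller data
  }

  read(): readonly Todo[] {
    return this.todos;
  }
}

const store = new TodoStore();

// Turn 1: the model writes its initial plan.
store.write([
  { description: "Reproduce the crash", status: "in_progress" },
  { description: "Implement a fix", status: "pending" },
]);

// Turn 2: the model resends the full list with updated statuses.
store.write([
  { description: "Reproduce the crash", status: "completed" },
  { description: "Implement a fix", status: "in_progress" },
]);

console.log(store.read().map((t) => `${t.status}: ${t.description}`));
```

Replacing the full list on each call trades a little extra output for simplicity: there are no incremental update operations to get out of sync with what the model believes the plan to be.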

@anj-s anj-s requested a review from a team as a code owner September 19, 2025 20:21
@anj-s anj-s added this pull request to the merge queue Sep 20, 2025
Merged via the queue into main with commit 44691a4 Sep 20, 2025
19 checks passed
@anj-s anj-s deleted the u/anj/write-todos branch September 20, 2025 13:05
nagendrareddy10 pushed a commit to nagendrareddy10/gemini-cli that referenced this pull request Sep 22, 2025
Co-authored-by: joshualitt <joshualitt@google.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
Co-authored-by: gemini-cli-robot <gemini-cli-robot@google.com>
Co-authored-by: Jacob MacDonald <jakemac@google.com>
Co-authored-by: Shreya Keshive <skeshive@gmail.com>
yashv6655 added a commit to yashv6655/gemini-cli that referenced this pull request Sep 22, 2025
thacio added a commit to thacio/auditaria that referenced this pull request Oct 3, 2025
giraffe-tree pushed a commit to giraffe-tree/gemini-cli that referenced this pull request Oct 10, 2025
Development

Successfully merging this pull request may close these issues.

Explore plan + execute pattern