fix: Add fine grained interruptible task restarts in watch mode #11135
Conversation
@johnpyp is attempting to deploy a commit to the Vercel Team on Vercel. A member of the Team first needs to authorize it.
Force-pushed from 0475181 to c091360
Force-pushed from c091360 to de049c1
    .filter(|c| predicate(c))
    .cloned()
    .collect();

lock.children.retain(|c| !predicate(c));
Between the evaluation in the .filter() and the later .retain(), a child's state might change. We should do something more like:
let (matching, remaining): (Vec<_>, Vec<_>) = lock.children
    .drain(..)
    .partition(|c| predicate(c));
lock.children = remaining;
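For illustration, here's a minimal, self-contained sketch of that idea. The ChildState enum and take_matching helper are invented for the example and are not the PR's actual types. Because partition() consumes each element exactly once in a single pass under the lock, the removed set and the kept set can never disagree the way a separate .filter() and .retain() pass could.

use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
enum ChildState {
    Running,
    Exited,
}

// Take all children matching the predicate out of the shared list in one pass.
fn take_matching(
    children: &Mutex<Vec<ChildState>>,
    predicate: impl Fn(&ChildState) -> bool,
) -> Vec<ChildState> {
    let mut lock = children.lock().unwrap();
    // Each child is evaluated exactly once; with no second pass there is no
    // window for its state to change between "select" and "remove".
    let (matching, remaining): (Vec<_>, Vec<_>) =
        lock.drain(..).partition(|c| predicate(c));
    *lock = remaining;
    matching
}

fn main() {
    let children = Mutex::new(vec![ChildState::Running, ChildState::Exited]);
    let exited = take_matching(&children, |c| *c == ChildState::Exited);
    assert_eq!(exited, vec![ChildState::Exited]);
    assert_eq!(*children.lock().unwrap(), vec![ChildState::Running]);
}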
// process manager is closing. If it is closing, it means we are shutting down
// the entire run. If it is not closing, it means we are restarting specific
// tasks.
if self.manager.is_closing() {
This is potentially dangerous due to race conditions and dependent-task behavior. What tests can we write to validate that we're not creating bugs here?
Example race:

Thread 1 (exec.rs)              Thread 2 (manager)
------------------              ------------------
child exits
is_closing() → false
                                close() called
                                is_closing = true
return Restarted ❌

That should've been Shutdown.
If we introduce some atomics, this concern would get addressed.
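For what it's worth, here is a rough sketch of that direction (names invented for the example; this is not the PR's actual ProcessManager). The atomic alone isn't enough; what closes the window is the ordering guarantee that close() publishes is_closing before it stops any child, while the exec side only reads the flag after it has observed the child exit.

use std::sync::atomic::{AtomicBool, Ordering};

// Sketch only: a stand-in for the real ProcessManager.
struct ProcessManager {
    is_closing: AtomicBool,
}

impl ProcessManager {
    fn close(&self) {
        // 1. Publish the shutdown intent first...
        self.is_closing.store(true, Ordering::SeqCst);
        // 2. ...and only then stop children (kill/wait elided in this sketch).
    }

    fn is_closing(&self) -> bool {
        self.is_closing.load(Ordering::SeqCst)
    }
}

fn main() {
    let manager = ProcessManager { is_closing: AtomicBool::new(false) };
    manager.close();
    // Any child exit caused by close() is observed after the store above,
    // so the exec side can no longer see is_closing() == false for it.
    assert!(manager.is_closing());
}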
pub struct ChildCommandChannel(mpsc::Sender<ChildCommand>);

pub struct ChildCommandChannel {
    sender: mpsc::Sender<ChildCommand>,
    task_id: Option<TaskId<'static>>,
This is a little rough, because now the child process has to be aware of task_ids. Ideally, these channels shouldn't be aware of "business logic", to preserve encapsulation.
I'm also worried about the lifetime implications. The TaskId is going to live for the entire program, but line 433 has a .clone() that can end up referencing tasks that are non-'static. That could result in panics.
Another small, small worry I have, but a worry nonetheless, is the memory overhead associated with this strategy. TaskIds aren't dropped until the Child is dropped, so it's possible to see some memory growth in high-concurrency scenarios. (I haven't taken the time to validate this, but based on my other comments, I'd love to see an architectural change where this is no longer a concern.)
Hmm, fair enough. Is there a better identifier, or a way of getting the "set of processes mapped to tasks", that would allow the process code to not care about tasks at all?
That was the primary thing I wasn't confident about going into this change, since I'm not too familiar with the codebase.
After sleeping on this, I'm thinking the best way (still not great) we can go about this is for ProcessManager to keep indices into its Vec<Child>. Some pseudocode:
#[derive(Debug)]
struct ProcessManagerInner {
    is_closing: bool,
    // None = inactive/dead process slot
    children: Vec<Option<Child>>,
    // Maps task_id -> indices in the children vec.
    // We keep dead indices here and filter them on access.
    task_index: HashMap<TaskId<'static>, Vec<usize>>,
    size: Option<PtySize>,
}

impl ProcessManagerInner {
    fn is_index_active(&self, idx: usize) -> bool {
        self.children.get(idx)
            .and_then(|opt| opt.as_ref())
            .is_some()
    }

    fn get_active_children_for_task(&self, task_id: &TaskId) -> Vec<Child> {
        self.task_index
            .get(task_id)
            .map(|indices| {
                indices.iter()
                    .filter_map(|&idx| {
                        self.children.get(idx)
                            .and_then(|opt| opt.as_ref())
                            .cloned()
                    })
                    .collect()
            })
            .unwrap_or_default()
    }
}
Now when a process exits, we don't need to update task_index immediately; dead entries get filtered out on access.
// Mark index as dead - just set to None
lock.children[3] = None;
Stopping a task now looks like:
pub async fn stop_tasks(&self, task_ids: &[TaskId<'static>]) {
    let children_to_stop = {
        let mut lock = self.state.lock().unwrap();
        let mut children = Vec::new();
        for task_id in task_ids {
            // Clone the index list so we can mutate lock.children below
            // without still holding a borrow of lock.task_index.
            if let Some(indices) = lock.task_index.get(task_id).cloned() {
                for idx in indices {
                    // Check if still active
                    if let Some(Some(child)) = lock.children.get(idx) {
                        children.push(child.clone());
                        // Mark as inactive
                        lock.children[idx] = None;
                    }
                }
            }
            // Remove task from index
            lock.task_index.remove(task_id);
        }
        children
    };
    // Stop outside the lock
    for mut child in children_to_stop {
        child.stop().await;
    }
}
What do you think? Is that enough to have another go with, or do you need more detail?
anthonyshew left a comment
Hi, thank you for this PR. I've put some comments into the review, but, as a top-level review comment, I'd love to see an approach that is minimally invasive to the codebase outside of watch.rs. In truth, turbo watch deserves a large refactor, so I'd love to keep the blast radius of fix PRs on the existing code small.
I completely understand that what I'm asking may border on impossible. This code is quite tricky and has no test coverage, but I'd love to keep exploring what's possible here.
Description
Fixes #9421
This PR refactors watch mode task restarts to be "fine-grained", such that when a package changes, only the active tasks that depend on that package's output are killed and restarted.
(Similar to #10846, but taking a slightly different approach which I think is cleaner).
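As a rough, hypothetical illustration of what "fine-grained" means here (the task and package names and the tasks_to_restart helper are invented for the example and are not the PR's actual types), the idea is to select only the tasks whose dependency set contains the changed package:

use std::collections::{HashMap, HashSet};

// Given each task's package dependencies, pick only the tasks affected by a change.
fn tasks_to_restart<'a>(
    deps: &HashMap<&'a str, HashSet<&'a str>>,
    changed_package: &str,
) -> Vec<&'a str> {
    deps.iter()
        .filter(|(_task, packages)| packages.contains(changed_package))
        .map(|(task, _packages)| *task)
        .collect()
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("web#dev", HashSet::from(["web", "ui"]));
    deps.insert("docs#dev", HashSet::from(["docs"]));

    // Only the task that depends on the changed "ui" package is restarted;
    // the unrelated "docs#dev" task keeps running.
    assert_eq!(tasks_to_restart(&deps, "ui"), vec!["web#dev"]);
}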
Testing Instructions
Manual testing confirms the issue is fixed with this change. There aren't any existing automated tests for watch and the like, I assume because it's a long-running process.