+
Skip to content

Conversation

bsowell
Copy link
Contributor

@bsowell bsowell commented May 26, 2025

This PR combines two commits in an attempt to the get the fast notebook tests to pass.

Previously the _infer_prompt method would call asyncio.run to start up an
event loop in the case that an LLM was in async mode. This causes issues in
environments like Jupyter notebooks that already have a running event loop. To
avoid conflicts, we modify this to start a new thread and run the new event
loop there.

I also switched from GPT_3_5_TURBO_INSTRUCT to GPT_4O_MINI to avoid context errors.

bsowell added 3 commits May 26, 2025 14:36
Previously the _infer_prompt method would call asyncio.run to start up an
event loop in the case that an LLM was in async mode. This causes issues in
environments like Jupyter notebooks that already have a running event loop. To
avoid conflicts, we modify this to start a new thread and run the new event
loop there.
I was getting context length errors, and 4O_MINI is cheaper anyway.
@bsowell bsowell requested review from HenryL27 and Copilot May 26, 2025 21:41
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues with fast notebook tests by changing the LLM model and updating the approach for running asynchronous event loops in environments with an already running loop.

  • Change from GPT_3_5_TURBO_INSTRUCT to GPT_4O_MINI in the notebook script
  • Update Python version in the notebook metadata
  • Replace asyncio.run with a new-thread based approach for running an async loop in base_llm.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
notebooks/default-prep-script.ipynb Updated LLM model and Python version to support faster notebook tests
lib/sycamore/sycamore/transforms/base_llm.py Modified asynchronous execution to run a new event loop in a separate thread


fut = asyncio.run_coroutine_threadsafe(_infer_prompts_async([p for _, p in nonempty], llm), new_loop)

responses = fut.result()
Copy link

Copilot AI May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a timeout to fut.result() (e.g., fut.result(timeout=some_seconds)) to prevent potential indefinite blocking if the asynchronous tasks encounter issues.

Suggested change
responses = fut.result()
try:
responses = fut.result(timeout=30) # Replace 30 with an appropriate timeout value
except concurrent.futures.TimeoutError:
new_loop.call_soon_threadsafe(new_loop.stop)
t.join()
new_loop.close()
raise TimeoutError("The asynchronous task timed out.")

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@HenryL27 HenryL27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man, thanks for fixing this. It's been bothering me that I couldn't get something like it to work.

@bsowell
Copy link
Contributor Author

bsowell commented May 27, 2025

Oh man, thanks for fixing this. It's been bothering me that I couldn't get something like it to work.

I found the asyncio docs a little thin, so I asked gemini "Write a short python program to demonstrate the behavior of the asyncio.run_coroutine_threadsafe function." and it worked surprisingly well!

@bsowell bsowell merged commit 0a41cb7 into main May 27, 2025
12 of 15 checks passed
@bsowell bsowell deleted the ben/async_notebook branch May 27, 2025 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载