+
Skip to content

Conversation

austin-aryn-ai
Copy link
Contributor

…etries to Gemini, remove retries from VLMTableStructureExtractor

…etries to Gemini, remove retries from VLMTableStructureExtractor
@austin-aryn-ai austin-aryn-ai requested a review from bsowell June 17, 2025 02:50
@austin-aryn-ai
Copy link
Contributor Author

If you're OK with the approach taken here, I can add some tests.

Copy link
Contributor

@bsowell bsowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the async thing. Otherwise looks fine.

return d

@staticmethod
def extract_table_block(html_str: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a concrete problem with this, but another option might be to use BeautifulSoup, which we should already have as a dependency. It theoretically handles "broken" HTML, and might theoretically be able to recover from other kinds of issues. Not a big priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using BeautifulSoup.

def __init__(
self,
chain: list[LLM],
response_checker: Optional[Callable[[str], bool]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. let's start with this. I'd love to get to the point where we can have a library of "validators" that can be used with LLM calls.

timeout=120.0,
)
async def generate_content_async(self, model, contents, config):
return self._client.models.generate_content(model=model, contents=contents, config=config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need to be self._client.aio.models.generate_content or need an await?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! I'll add 'await'.

Comment on lines +565 to +567
if res.startswith("```html"):
res = res[7:].rstrip("`")
res = res.strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought is that you could have the html checker handle this as well, since it's pretty common for llms to use the backtick thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want the response check to modify the response. Also, it's only used when the LLM is a ChainedLLM so we still need this code for non-chained LLMs.

@austin-aryn-ai austin-aryn-ai merged commit fff2c54 into main Jun 18, 2025
13 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载