Contributor

@bsowell bsowell commented May 16, 2025

This calls an LLM to determine the cells of a table.


@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces the VLMTableStructureExtractor to enhance table structure extraction via an LLM, along with related utility functions and tests.

  • Added a new _crop_bbox helper function and the VLMTableStructureExtractor class for processing table images.
  • Extended unit and integration tests to validate new LLM-based table extraction, and updated deserialization methods in Gemini and Anthropic implementations.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lib/sycamore/sycamore/transforms/table_structure/extract.py | Added _crop_bbox and VLMTableStructureExtractor to extract table structure using an LLM. |
| lib/sycamore/sycamore/tests/unit/llms/test_llms.py | Added a new test case for ensuring proper Gemini pickling. |
| lib/sycamore/sycamore/tests/integration/transforms/test_table_extraction.py | Added integration tests for table extraction using various LLMs. |
| lib/sycamore/sycamore/llms/gemini.py | Updated `__reduce__` to use a separate deserializer function. |
| lib/sycamore/sycamore/llms/anthropic.py | Updated `__reduce__` to use a separate deserializer function. |
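
For readers unfamiliar with the pickling change in the Gemini and Anthropic files, the general pattern can be sketched as follows. This is a minimal illustration, not the Sycamore code: `FakeClient`, `RemoteLLM`, and `_rebuild_llm` are hypothetical names, and the assumption is that the wrapper holds a live client that cannot be pickled, so only the constructor arguments are serialized and a module-level function rebuilds the object.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an SDK client; the lock makes it unpicklable."""

    def __init__(self, api_key):
        self.api_key = api_key
        self._lock = threading.Lock()


def _rebuild_llm(kwargs):
    # Module-level function, so pickle can reference it by its qualified name.
    return RemoteLLM(**kwargs)


class RemoteLLM:
    """Hypothetical LLM wrapper that owns a live client."""

    def __init__(self, model_name, api_key=None):
        self.model_name = model_name
        self.api_key = api_key
        self._client = FakeClient(api_key)  # recreated on every construction

    def __reduce__(self):
        # Serialize only the constructor arguments, never the live client.
        kwargs = {"model_name": self.model_name, "api_key": self.api_key}
        return _rebuild_llm, (kwargs,)


llm = RemoteLLM("some-model", api_key="secret")
clone = pickle.loads(pickle.dumps(llm))
```

Keeping the rebuild logic in a separate function means the unpickled object goes through the normal constructor, so the client is created fresh in the receiving process.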

"""Table structure extractor that uses a VLM model to extract the table structure."""

EXTRACT_TABLE_STRUCTURE_PROMPT = """You are given an image of a table from a document. Please convert this table into HTML. Be sure to include the table header and all rows. Use 'colspan' and 'rowspan' in the output to indicate merged cells. Return the HTML as a string. Do not include any other text in the response.
+"""

Copilot AI May 16, 2025

There appears to be an extra '+' in the closing triple quotes for the prompt string in VLMTableStructureExtractor. Removing the extraneous '+' will prevent potential syntax errors.

Suggested change
+"""
"""

Copilot uses AI. Check for mistakes.
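
The prompt asks the model for HTML with 'colspan' and 'rowspan', which then has to be turned back into cells. As a rough sketch of that step using only the standard library (this is not how Sycamore parses the response; `TableCellParser` and the sample table are made up for illustration), each cell can be assigned a grid position while tracking slots already claimed by spanning cells:

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect one dict per <td>/<th> with its grid position."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._row = -1
        self._occupied = set()  # grid slots already claimed by spanning cells
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
        elif tag in ("td", "th"):
            attr_map = dict(attrs)
            # Skip columns covered by a rowspan/colspan from an earlier cell.
            col = 0
            while (self._row, col) in self._occupied:
                col += 1
            rowspan = int(attr_map.get("rowspan") or 1)
            colspan = int(attr_map.get("colspan") or 1)
            for r in range(self._row, self._row + rowspan):
                for c in range(col, col + colspan):
                    self._occupied.add((r, c))
            self._current = {
                "row": self._row,
                "col": col,
                "rowspan": rowspan,
                "colspan": colspan,
                "is_header": tag == "th",
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.cells.append(self._current)
            self._current = None


sample_html = (
    "<table>"
    "<tr><th colspan='2'>Name</th><th rowspan='2'>Age</th></tr>"
    "<tr><th>First</th><th>Last</th></tr>"
    "<tr><td>Ada</td><td>Lovelace</td><td>36</td></tr>"
    "</table>"
)
parser = TableCellParser()
parser.feed(sample_html)
parser.close()
cells = parser.cells
```

Note that nothing in this output carries pixel coordinates: the HTML gives row and column structure only.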

new_elem = extractor.extract(element=basic_table_element, doc_image=basic_table_image)
assert new_elem.table is not None

print(new_elem.table.to_html())

Copilot AI May 16, 2025

[nitpick] Consider removing or replacing the print statement used for debugging in the test to maintain clean test outputs.

Suggested change
print(new_elem.table.to_html())
logging.debug(new_elem.table.to_html())

Collaborator

@HenryL27 HenryL27 left a comment

1 q but lgtm

Comment on lines 494 to 498
# Convert cell bounding boxes to be relative to the original image.
for cell in table.cells:
    if cell.bbox is None:
        continue
    cell.bbox.translate_self(crop_box[0], crop_box[1]).to_relative_self(width, height)

Collaborator

shouldn't all the cell bboxes be null?

Contributor Author

Lol, yes. I guess I was just on auto-pilot. I'll go ahead and remove this. Part of me just wanted to leave it as a defense mechanism, but I can't think of a way that an LLM could hallucinate bounding boxes in a way we would interpret.
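
For context on what the removed loop would have done, here is a small sketch of the coordinate conversion. The `Box` class is a hypothetical stand-in, not Sycamore's bounding box type; only the method names mirror the snippet under review, and the page size and crop offsets are made-up numbers. A box measured inside the cropped table image is shifted by the crop's top-left corner, then divided by the page size to get relative coordinates.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box; not Sycamore's class."""

    x1: float
    y1: float
    x2: float
    y2: float

    def translate_self(self, dx, dy):
        # Shift the box in place by (dx, dy) and return it for chaining.
        self.x1 += dx
        self.x2 += dx
        self.y1 += dy
        self.y2 += dy
        return self

    def to_relative_self(self, width, height):
        # Convert pixel coordinates to fractions of the page size, in place.
        self.x1 /= width
        self.x2 /= width
        self.y1 /= height
        self.y2 /= height
        return self


# Assumed page size in pixels, and the crop region used for the table image.
width, height = 1000, 800
crop_box = (200, 100, 700, 500)

# A cell located at (50, 20)-(150, 60) inside the cropped image.
cell = Box(50, 20, 150, 60)
cell.translate_self(crop_box[0], crop_box[1]).to_relative_self(width, height)
```

Since the model returns HTML only, every cell's bbox is None and a loop like the one in the snippet never reaches the conversion, which is the reviewer's point.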

@bsowell bsowell merged commit 52128e6 into main May 16, 2025
11 of 15 checks passed
@bsowell bsowell deleted the ben/vlm_table_extract branch May 16, 2025 18:55