Add VLMTableStructureExtractor for table structure extraction. #1304

bsowell · 2025-05-16T00:16:16Z

This calls an LLM to determine the cells of a table.

Copilot

Pull Request Overview

This PR introduces the VLMTableStructureExtractor to enhance table structure extraction via an LLM, along with related utility functions and tests.

Added a new _crop_bbox helper function and the VLMTableStructureExtractor class for processing table images.
Extended unit and integration tests to validate new LLM-based table extraction, and updated deserialization methods in Gemini and Anthropic implementations.
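
As a rough illustration of the step that follows the LLM call, the sketch below turns an HTML table string (the format the prompt asks for) into a flat list of cells with row and column indices, honoring `colspan` and `rowspan`. The `Cell` class and `parse_table_html` function are hypothetical names used only for this example; they are not Sycamore's types or the PR's actual parsing code.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Cell:
    """Hypothetical cell record for illustration; not Sycamore's table cell type."""

    text: str
    rows: list[int]
    cols: list[int]
    is_header: bool = False


class _TableHTMLParser(HTMLParser):
    """Collects <td>/<th> cells and assigns grid positions, respecting spans."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._row = -1
        self._col = 0
        self._occupied = set()  # (row, col) slots already claimed by a cell
        self._current = None  # cell whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            attr = dict(attrs)
            rowspan = int(attr.get("rowspan") or 1)
            colspan = int(attr.get("colspan") or 1)
            # Skip slots taken by a rowspan that started in an earlier row.
            while (self._row, self._col) in self._occupied:
                self._col += 1
            rows = list(range(self._row, self._row + rowspan))
            cols = list(range(self._col, self._col + colspan))
            self._occupied.update((r, c) for r in rows for c in cols)
            self._current = Cell("", rows, cols, is_header=(tag == "th"))
            self._col += colspan

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current is not None:
            self._current.text = self._current.text.strip()
            self.cells.append(self._current)
            self._current = None


def parse_table_html(html: str) -> list[Cell]:
    """Parse an HTML table string into a list of cells in document order."""
    parser = _TableHTMLParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Note that nothing in this flow yields cell bounding boxes: the model returns only markup, so each cell carries text and grid position but no geometry.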

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
lib/sycamore/sycamore/transforms/table_structure/extract.py	Added _crop_bbox and VLMTableStructureExtractor to extract table structure using an LLM.
lib/sycamore/sycamore/tests/unit/llms/test_llms.py	Added a new test case for ensuring proper Gemini pickling.
lib/sycamore/sycamore/tests/integration/transforms/test_table_extraction.py	Added integration tests for table extraction using various LLMs.
lib/sycamore/sycamore/llms/gemini.py	Updated __reduce__ to use a separate deserializer function.
lib/sycamore/sycamore/llms/anthropic.py	Updated __reduce__ to use a separate deserializer function.
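
For readers unfamiliar with the pattern mentioned for the Gemini and Anthropic files, here is a minimal sketch of pickling through `__reduce__` with a separate module-level deserializer. `ExampleLLM`, `FakeClient`, and `rebuild_llm` are hypothetical stand-ins invented for this example; the actual Sycamore classes and function names may differ.

```python
import pickle


class FakeClient:
    """Stand-in for an SDK client whose live state should not be pickled."""

    def __init__(self, api_key):
        self.api_key = api_key


def rebuild_llm(kwargs):
    """Module-level deserializer: pickle records a reference to this function
    plus its arguments, then calls it when loading."""
    return ExampleLLM(**kwargs)


class ExampleLLM:
    def __init__(self, model_name, api_key=None):
        self.model_name = model_name
        self.api_key = api_key
        self._client = FakeClient(api_key)  # rebuilt on load, never serialized

    def __reduce__(self):
        # Serialize only the constructor arguments, not the client object.
        kwargs = {"model_name": self.model_name, "api_key": self.api_key}
        return rebuild_llm, (kwargs,)


if __name__ == "__main__":
    original = ExampleLLM("example-model", api_key="secret")
    restored = pickle.loads(pickle.dumps(original))
    print(restored.model_name)
```

Keeping the deserializer as a plain function means the pickle payload holds only a function reference and a dictionary of constructor arguments, so the client is recreated on the receiving side.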

Copilot · 2025-05-16T00:16:50Z

lib/sycamore/sycamore/transforms/table_structure/extract.py

+    """Table structure extractor that uses a VLM model to extract the table structure."""
+
+    EXTRACT_TABLE_STRUCTURE_PROMPT = """You are given an image of a table from a document. Please convert this table into HTML. Be sure to include the table header and all rows. Use 'colspan' and 'rowspan' in the output to indicate merged cells. Return the HTML as a string. Do not include any other text in the response.
+"""


There appears to be an extra '+' in the closing triple quotes for the prompt string in VLMTableStructureExtractor. Removing the extraneous '+' will prevent potential syntax errors.

Suggested change

+"""

"""

Copilot · 2025-05-16T00:16:51Z

lib/sycamore/sycamore/tests/integration/transforms/test_table_extraction.py

+    new_elem = extractor.extract(element=basic_table_element, doc_image=basic_table_image)
+    assert new_elem.table is not None
+
+    print(new_elem.table.to_html())


[nitpick] Consider removing or replacing the print statement used for debugging in the test to maintain clean test outputs.

Suggested change

print(new_elem.table.to_html())

logging.debug(new_elem.table.to_html())

HenryL27

1 q but lgtm

HenryL27 · 2025-05-16T00:20:27Z

lib/sycamore/sycamore/transforms/table_structure/extract.py

+        # Convert cell bounding boxes to be relative to the original image.
+        for cell in table.cells:
+            if cell.bbox is None:
+                continue
+            cell.bbox.translate_self(crop_box[0], crop_box[1]).to_relative_self(width, height)


shouldn't all the cell bboxes be null?

Lol, yes. I guess I was just on auto-pilot. I'll go ahead and remove this. Part of me just wanted to leave it as a defense mechanism, but I can't think of a way that an LLM could hallucinate bounding boxes in a way we would interpret.
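
For context on what the snippet in question would have done if cells did carry boxes, here is a small sketch of the coordinate conversion: shift a box measured in cropped-image pixels by the crop origin, then divide by the full image size to get relative coordinates. The `Box` class and `crop_to_page_relative` helper are hypothetical and written for this example; they are not Sycamore's `BoundingBox` API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned box (x1, y1) to (x2, y2)."""

    x1: float
    y1: float
    x2: float
    y2: float

    def translate(self, dx: float, dy: float) -> "Box":
        """Return a copy shifted by (dx, dy)."""
        return Box(self.x1 + dx, self.y1 + dy, self.x2 + dx, self.y2 + dy)

    def to_relative(self, width: float, height: float) -> "Box":
        """Return a copy scaled to the 0..1 range for an image of the given size."""
        return Box(self.x1 / width, self.y1 / height, self.x2 / width, self.y2 / height)


def crop_to_page_relative(box: Box, crop_origin: tuple, page_size: tuple) -> Box:
    """Convert a box in cropped-image pixels to page-relative coordinates.

    crop_origin is the (left, top) pixel of the crop within the full image;
    page_size is the (width, height) of the full image in pixels.
    """
    left, top = crop_origin
    width, height = page_size
    return box.translate(left, top).to_relative(width, height)
```

Since the LLM returns only HTML, there is no box to convert, which is why the loop is a no-op in practice.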

Add VLMTableStructureExtractor for table structure extraction.

a1e72c4

This calls an LLM to determine the cells of a table.

bsowell requested review from HenryL27, Copilot and dhruvkaliraman7 May 16, 2025 00:16

Copilot AI reviewed May 16, 2025

HenryL27 approved these changes May 16, 2025

Remove bounding box processing since we don't get bounding boxes.

e86ced8

bsowell merged commit 52128e6 into main May 16, 2025
11 of 15 checks passed

bsowell deleted the ben/vlm_table_extract branch May 16, 2025 18:55
