Conversation

alexaryn
Collaborator

@alexaryn alexaryn commented Jun 9, 2025

No description provided.

@alexaryn alexaryn requested a review from bsowell June 9, 2025 21:42
@alexaryn alexaryn marked this pull request as ready for review June 10, 2025 00:08
@alexaryn alexaryn changed the title Deal with rotated tables. Sycamore: deal with rotated tables. Jun 10, 2025
Contributor

@bsowell bsowell left a comment

Can we have a few tests? I would think it would be feasible to create a PDF with an otherwise straightforward table rotated a few different ways and then check that the extraction does what we expect.
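A minimal sketch of the shape such a test could take, under loud assumptions: the table is modeled as a plain grid of strings rather than a rendered PDF, and `extract_table` is a hypothetical stand-in for the extractor under test, not a Sycamore function. It only illustrates the idea of producing the same table in several rotations and comparing each result against one expected table.

```python
from typing import List

Grid = List[List[str]]

# A small, otherwise straightforward table used as the expected result.
EXPECTED: Grid = [
    ["name", "qty"],
    ["apple", "3"],
    ["pear", "5"],
]


def rotate_grid_cw(grid: Grid, quarter_turns: int) -> Grid:
    """Rotate a grid of cells clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        # Reverse the row order, then transpose: one clockwise quarter turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def extract_table(rotated: Grid, quarter_turns: int) -> Grid:
    """Hypothetical stand-in for the extractor: undo a known rotation."""
    return rotate_grid_cw(rotated, -quarter_turns)


def run_rotation_cases() -> int:
    """Check every quarter-turn rotation; return the number of cases passed."""
    passed = 0
    for turns in (0, 1, 2, 3):
        rotated = rotate_grid_cw(EXPECTED, turns)
        assert extract_table(rotated, turns) == EXPECTED
        passed += 1
    return passed
```

In a real test the rotated input would presumably be a generated PDF page and the comparison would run against the extracted table's cells; the loop over rotations and the single expected table are the parts that carry over.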

@alexaryn alexaryn requested a review from karanataryn June 10, 2025 18:28
@karanataryn karanataryn requested a review from Copilot June 10, 2025 22:07
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces utilities and modifications to better handle rotated tables in image documents. Key changes include:

  • New rotation helper functions in rotation.py to support image and coordinate rotations.
  • Enhancements in table structure extraction and text extraction, including handling of font size and additional vector data.
  • Improved TableElement handling with a new shallow copy method and updated rotation logic in table extraction.
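To make the coordinate side of the first bullet concrete, here is a rough sketch of rotating a bounding box by multiples of 90 degrees. The function names, the normalized `(x1, y1, x2, y2)` layout, and the top-left origin with y pointing down are all assumptions for illustration; the actual contents of rotation.py are not shown in this conversation.

```python
from typing import Tuple

# Assumed layout: (x1, y1, x2, y2), normalized to 0..1, origin at top-left.
BBox = Tuple[float, float, float, float]


def rotate_point_cw(x: float, y: float, quarter_turns: int) -> Tuple[float, float]:
    """Rotate a point in the unit square clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        # With y pointing down, one clockwise quarter turn maps (x, y) to (1 - y, x).
        x, y = 1.0 - y, x
    return x, y


def rotate_bbox_cw(bbox: BBox, quarter_turns: int) -> BBox:
    """Rotate a bounding box and re-normalize so that x1 <= x2 and y1 <= y2."""
    x1, y1, x2, y2 = bbox
    ax, ay = rotate_point_cw(x1, y1, quarter_turns)
    bx, by = rotate_point_cw(x2, y2, quarter_turns)
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))
```

Rotating two opposite corners and then taking the min and max is enough for quarter turns, because an axis-aligned box stays axis-aligned. Arbitrary angles would need all four corners.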

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lib/sycamore/sycamore/utils/rotation.py | Introduces rotation utility functions for images, coordinates, and bounding boxes. |
| lib/sycamore/sycamore/transforms/text_extraction/text_extractor.py | Updates text extraction to incorporate font size and vector properties. |
| lib/sycamore/sycamore/transforms/table_structure/extract.py | Adds rotated_table and modifications to apply table rotation adjustments during extraction. |
| lib/sycamore/sycamore/transforms/detr_partitioner.py | Modifies token processing to include vector data and iterates page elements by index for in-place updates. |
| lib/sycamore/sycamore/data/element.py | Introduces a new shallow copy method for TableElement. |
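As an illustration of what a shallow copy buys in this kind of workflow, the sketch below uses a hypothetical `TableElementSketch` dataclass and a hypothetical `shallow_copy` method name; neither is taken from Sycamore's element.py. It shows that rebinding a field on the copy leaves the original alone, while mutable fields remain shared between the two objects.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TableElementSketch:
    """Hypothetical stand-in for a table element; not Sycamore's TableElement."""

    bbox: Tuple[float, float, float, float]
    cells: List[str] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)

    def shallow_copy(self) -> "TableElementSketch":
        # copy.copy makes a new object whose fields reference the same values.
        return copy.copy(self)


original = TableElementSketch(bbox=(0.1, 0.2, 0.4, 0.3), cells=["a", "b"])
rotated = original.shallow_copy()

# Rebinding a field on the copy does not affect the original.
rotated.bbox = (0.7, 0.1, 0.8, 0.4)

# Mutating a shared container is visible through both objects.
rotated.cells.append("c")
```

The trade-off is the usual one: a shallow copy is cheap and suits code that replaces whole fields on the copy, but any in-place mutation of shared lists or dicts would need a deeper copy of those fields first.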

@alexaryn alexaryn merged commit 06d0271 into main Jun 11, 2025
12 of 15 checks passed
@alexaryn alexaryn deleted the alex_syc_rotated branch June 11, 2025 23:24

