+
Skip to content

Conversation

MarkLindblad
Copy link
Contributor

No description provided.

Comment on lines +169 to +170
elems.sort(key=generate_elem_prefer_left(left_col))
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just return here, making the special case self-contained and not needing the else and indentation.

left_col = []
right_col = []
for elem in elems:
if elem.type == "Page-footer" or elem.type == "Page-header":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably also exclude titles. Not sure about section headers.

Comment on lines +144 to +147
if x1 < 0.6 and x2 < 0.6:
left_col.append(elem)
elif x1 > 0.4 and x2 > 0.4:
right_col.append(elem)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, this gets fooled by a 5-column newspaper-ish layout. It puts the first three columns in left. It also calls something two-column if it's one column on the side, but that may end up OK after processing.

The .4 and .6 constants are kind-of magic. I'm sure there will be false positives and false negatives.

Comment on lines +138 to +139
left_col = []
right_col = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing uses right_col. Might as well nuke it, unless you plan to use it. Perhaps if either array is empty, return false and sort by y.

right_col.append(elem)
else:
return False, [], []
return True, left_col, right_col
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning a list of "lefts", we could use the existing tagging approach used in this file. Then we might not need a fancy closure-based sort key.

is_entirely_two_columns, left_col, _ = check_entirely_two_columns(elems)
if is_entirely_two_columns:
# if the entire page is two columns, sort the left column first
elems.sort(key=generate_elem_prefer_left(left_col))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if we just used elem_left_top here?

@MarkLindblad
Copy link
Contributor Author

While this solution improves robustness, we have chosen an even more robust solution here: #1273.

@MarkLindblad MarkLindblad deleted the mark/one-fix-multicol branch May 27, 2025 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载