Tolerate bad entity extraction. #1078

eric-anderson · 2024-12-17T21:07:23Z

Add tests for tolerating bad extraction.
Substantially refactor llm_filter so we don't have a giant function in docset.py.
Also fix a minor bug where the tokenized filter was re-looking up the text in the element
rather than using the txt variable.
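For readers unfamiliar with the pattern, here is a minimal sketch of what "tolerating bad extraction" in a threshold filter can look like. It is not the code from this PR: `tolerant_threshold_filter` and `score_fn` are hypothetical names, and the assumption is that a bad extraction shows up as a score that cannot be parsed as an integer.

```python
from typing import Callable, Iterable


def tolerant_threshold_filter(
    texts: Iterable[str],
    score_fn: Callable[[str], object],
    threshold: int,
) -> bool:
    """Return True if any text scores at or above threshold.

    Hypothetical helper: scores that cannot be parsed as integers are
    skipped instead of raising, so one bad extraction does not fail the
    whole filter.
    """
    for txt in texts:
        try:
            # Use the txt we already have rather than looking it up again.
            score = int(score_fn(txt))
        except (ValueError, TypeError):
            # Bad extraction (e.g. prose or None instead of a number):
            # treat this text as non-matching and keep going.
            continue
        if score >= threshold:
            return True
    return False


# Stand-in for an LLM call: two unusable answers and one valid score.
fake_scores = {"a": "not a number", "b": None, "c": "4"}
keep = tolerant_threshold_filter(["a", "b", "c"], fake_scores.get, threshold=3)
```

In this sketch an unparseable score counts as "does not match"; whether to drop, keep, or log such elements is a policy choice the real implementation may make differently.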


eric-anderson added 2 commits December 16, 2024 19:00

Tolerate errors during threshold_filter

6e8f2b1

Tolerate bad entity extraction.

0096275

* Add tests for tolerating bad extraction.
* Substantially refactor llm_filter so we don't have a giant function in docset.py
* Also fix a minor bug where the tokenized filter was re-looking up the text in the element rather than using the txt variable.

eric-anderson requested review from baitsguy and dhruvkaliraman7 December 17, 2024 21:07

baitsguy approved these changes Dec 17, 2024


eric-anderson merged commit e4c213e into main Dec 17, 2024
12 of 14 checks passed

eric-anderson deleted the eric-upstream-docset branch December 17, 2024 21:28
