
Conversation

HenryL27
Collaborator

@HenryL27 HenryL27 commented Feb 7, 2025

Also refactors entity extractor to break up the megafunction

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 marked this pull request as ready for review February 7, 2025 19:57
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Contributor

@baitsguy baitsguy left a comment

Some cleanup requests but functionally looks good


def validate(d: Document) -> bool:
    if self._tokenizer is not None:
        return d.properties.get(self._entity_name, "None") != "None"
Contributor

self._entity_name in d.properties?

Contributor

Discussed offline: we need to check for "None" because that's what the LLM returns when there is no value.
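
A minimal sketch of why the membership test is not enough here: a key check passes even when the LLM answered with the literal string "None", while the `.get(..., "None") != "None"` form rejects both a missing key and that sentinel. The plain dicts and the `"title"` entity name are stand-ins for illustration, not Sycamore's `Document` class.

```python
ENTITY_NAME = "title"  # hypothetical entity name, for illustration only


def has_key(properties: dict) -> bool:
    # The suggested alternative: only checks that the key exists.
    return ENTITY_NAME in properties


def has_value(properties: dict) -> bool:
    # The form used in the PR: a missing key and the "None" sentinel both fail.
    return properties.get(ENTITY_NAME, "None") != "None"


missing = {}                             # entity never written
llm_said_none = {"title": "None"}        # LLM found nothing and said so
extracted = {"title": "Annual Report"}   # LLM returned a real value
```

With these inputs, `has_key(llm_said_none)` is true even though nothing was extracted, which is the case the sentinel comparison is meant to catch.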

if self._tokenizer is not None:
    return d.properties.get(self._entity_name, "None") != "None"
else:
    return True
Contributor

can just combine the if else
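
One way the combined form could look, as a sketch only: the two branches collapse into a single boolean expression. `ValidatorSketch` is a hypothetical stand-in that takes a plain properties dict; the PR's actual `validate` is a nested function over a `Document`.

```python
from typing import Any, Optional


class ValidatorSketch:
    """Hypothetical holder for the two attributes the snippet reads."""

    def __init__(self, entity_name: str, tokenizer: Optional[Any] = None):
        self._entity_name = entity_name
        self._tokenizer = tokenizer

    def validate(self, properties: dict) -> bool:
        # Without a tokenizer every document is accepted; with one, the entity
        # must be present and must not be the "None" sentinel.
        return (
            self._tokenizer is None
            or properties.get(self._entity_name, "None") != "None"
        )
```

The behavior matches the if/else version: the `or` short-circuits to `True` when no tokenizer is set.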

return d
batches.append(curr_club)
else:
    batches = [[i for e, i in elements[: self._num_of_elements]]]
Contributor

Mind adding a comment saying that if there's no tokenizer we process each element separately, because we don't know how many we can combine?
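
A rough sketch of the two branches with the requested kind of comment in place. Only the `else` line and the names `batches`, `curr_club`, and the element count come from the diff; the greedy token-budget loop, the `tokenize` callable, and `max_tokens` are assumptions made for illustration. The wording of the comment follows what the visible `else` line does (one batch of the first N elements), which may not be how the rest of the PR treats that batch.

```python
from typing import Callable, List, Optional, Tuple


def make_batches(
    elements: List[Tuple[str, int]],  # (text, index) pairs, as in `for e, i in elements`
    tokenize: Optional[Callable[[str], list]] = None,  # hypothetical tokenizer callable
    max_tokens: int = 100,            # hypothetical token budget per batch
    num_of_elements: int = 5,
) -> List[List[int]]:
    if tokenize is not None:
        # With a tokenizer we can measure each element and pack as many
        # as fit under the budget into one batch.
        batches: List[List[int]] = []
        curr_club: List[int] = []
        curr_tokens = 0
        for text, idx in elements:
            n = len(tokenize(text))
            if curr_club and curr_tokens + n > max_tokens:
                batches.append(curr_club)
                curr_club, curr_tokens = [], 0
            curr_club.append(idx)
            curr_tokens += n
        if curr_club:
            batches.append(curr_club)
    else:
        # No tokenizer: we cannot tell how many elements fit in one prompt,
        # so fall back to a fixed element count instead of a token budget.
        batches = [[i for e, i in elements[:num_of_elements]]]
    return batches


elements = [("a b c", 0), ("d e", 1), ("f g h i", 2)]
packed = make_batches(elements, tokenize=str.split, max_tokens=5)
fallback = make_batches(elements, num_of_elements=2)
```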

from sycamore.data import Element, Document
from sycamore.llms import LLM
from sycamore.llms.prompts.default_prompts import (
    EntityExtractorZeroShotGuidancePrompt,
Contributor

Can we start deleting these if we're not using them? The set of prompts keeps growing.

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 merged commit afd15c7 into main Feb 7, 2025
12 of 15 checks passed
@HenryL27 HenryL27 deleted the hml-jinjaprompting branch February 7, 2025 22:26
2 participants