-
Notifications
You must be signed in to change notification settings - Fork 65
[llm unify 5a/n] Add JinjaPompt and re-convert extract entities #1161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some cleanup requests but functionally looks good
|
||
def validate(d: Document) -> bool: | ||
if self._tokenizer is not None: | ||
return d.properties.get(self._entity_name, "None") != "None" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._entity_name in d.properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, we need to check for "None" because that's what the llm returns if no val
if self._tokenizer is not None: | ||
return d.properties.get(self._entity_name, "None") != "None" | ||
else: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can just combine the if else
return d | ||
batches.append(curr_club) | ||
else: | ||
batches = [[i for e, i in elements[: self._num_of_elements]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind adding a comment saying if no tokenizer we process each element separately because we don't know how many we can combine?
from sycamore.data import Element, Document | ||
from sycamore.llms import LLM | ||
from sycamore.llms.prompts.default_prompts import ( | ||
EntityExtractorZeroShotGuidancePrompt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we start deleting these if we're not using? the set of prompts is growing
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Also refactors entity extractor to break up the megafunction