+
Skip to content

Conversation

HenryL27
Copy link
Collaborator

Creates a SycamorePrompt interface for rendering documents/elements as llm calls and adds a few implementations.

  • StaticPrompt: static with no context
  • ElementPrompt: renders an element
  • ElementListPrompt: renders a document with a (optional) formatted list of elements in it

These are not the only ones to be implemented but these are necessary to move a number of the existing transforms to use llm_map (this prompt interface)

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
…kwargs

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 requested a review from baitsguy January 21, 2025 17:49
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Copy link
Contributor

@baitsguy baitsguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks fine, I'm curious about how the existing transforms/uses are going to map to this. I worry a bit about usability, do you think you can transition one of the existing docset transforms to this stuff in this PR to demonstrate?

"""
format_args = self.kwargs
format_args["doc_text"] = doc.text_representation
format_args.update({"doc_property_" + k: v for k, v in doc.properties.items()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything special we want to do for nested properties? lots of stuff is in properties.entity.* so single level flattening might not do much

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why are we flattening instead of accepting the dot notation? I think it's more intuitive to send a prompt like:

prompt = ElementListPrompt(
                system = "Hello {name}. This is a prompt about {document.properties.path}"
                ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python string.format is rather limited, can't do any of

"hello {document.properties.name}".format(document=doc, document.properties=props, document.properties.name='henry')
  • it doesn't do function calls or dict lookups
  • can't have dotted param names.

which maybe means we shouldn't be using string.format here.

I did at one point play around with doing something like

s = "Hello {name}. this is a prompt about {document.properties.path}"
fstring = f"f\"{s}\""
formatted = eval(fstring, abunchofglobals)

but that seems dangerous bc you could maybe do something like

s = "{os.system('rm -rf /')}"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I don't want to invent a formatting/templating system

def __init__(
self,
*,
system: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're supporting system and user prompts (i.e. the messages api), shouldn't we be supporting a list then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I think we can limit to only one system prompt (I don't remember what the providers do but that seems sensible) though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27
Copy link
Collaborator Author

HenryL27 commented Jan 21, 2025

idk if it makes sense to translate an op until llm_map exists but to sketch it out extract_entity would look like something like

# default_prompts.py

EntityExtractorFewShotGuidancePrompt = ElementListPrompt(
    system="You are a helpful entity extractor",
    user="""You are given a few text elements of a document. The {entity} of the document is in these few text elements. Here are
    some example groups of text elements where the {entity} has been identified.
    {examples}
    Using the context from the document and the provided examples, FIND, COPY, and RETURN the {entity}. Only return the {entity} as part
    of your answer. DO NOT REPHRASE OR MAKE UP AN ANSWER.
    {elements}
    """,
)

(instead of the class construction as in default_prompts)
then

def extract_entity(self, entity_name, examples):
    prompt = default_prompts.EntityExtractorFewShotGuidancePrompt.instead(
        entity=entity_name, examples=examples
    )
    return self.llm_map(prompt, output_field=entity, llm_and_stuff_that_would_also_be_args_to_extract_entity)

Copy link
Contributor

@baitsguy baitsguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename instead to something else like populate or something

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 merged commit ef263e3 into main Jan 22, 2025
12 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载