[llm unify 1/n] Add consolidated prompt classes #1120

HenryL27 · 2025-01-21T17:49:25Z

Creates a SycamorePrompt interface for rendering documents/elements as llm calls and adds a few implementations.

StaticPrompt: static with no context
ElementPrompt: renders an element
ElementListPrompt: renders a document with a (optional) formatted list of elements in it

These are not the only ones to be implemented but these are necessary to move a number of the existing transforms to use llm_map (this prompt interface)

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

…kwargs Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy

Implementation looks fine, I'm curious about how the existing transforms/uses are going to map to this. I worry a bit about usability, do you think you can transition one of the existing docset transforms to this stuff in this PR to demonstrate?

lib/sycamore/sycamore/llms/prompts/prompts.py

baitsguy · 2025-01-21T19:08:06Z

lib/sycamore/sycamore/llms/prompts/prompts.py

+        """
+        format_args = self.kwargs
+        format_args["doc_text"] = doc.text_representation
+        format_args.update({"doc_property_" + k: v for k, v in doc.properties.items()})


anything special we want to do for nested properties? lots of stuff is in properties.entity.* so single level flattening might not do much

Also why are we flattening instead of accepting the dot notation? I think it's more intuitive to send a prompt like:

prompt = ElementListPrompt( system = "Hello {name}. This is a prompt about {document.properties.path}" ...

python string.format is rather limited, can't do any of

"hello {document.properties.name}".format(document=doc, document.properties=props, document.properties.name='henry')

it doesn't do function calls or dict lookups

can't have dotted param names.

which maybe means we shouldn't be using string.format here.

I did at one point play around with doing something like

s = "Hello {name}. this is a prompt about {document.properties.path}" fstring = f"f\"{s}\"" formatted = eval(fstring, abunchofglobals)

but that seems dangerous bc you could maybe do something like

s = "{os.system('rm -rf /')}"

and I don't want to invent a formatting/templating system

baitsguy · 2025-01-21T19:43:06Z

lib/sycamore/sycamore/llms/prompts/prompts.py

+    def __init__(
+        self,
+        *,
+        system: Optional[str] = None,


if we're supporting system and user prompts (i.e. the messages api), shouldn't we be supporting a list then?

Sure. I think we can limit to only one system prompt (I don't remember what the providers do but that seems sensible) though

Yeah that makes sense

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

HenryL27 · 2025-01-21T22:16:17Z

idk if it makes sense to translate an op until llm_map exists but to sketch it out extract_entity would look like something like

# default_prompts.py

EntityExtractorFewShotGuidancePrompt = ElementListPrompt(
    system="You are a helpful entity extractor",
    user="""You are given a few text elements of a document. The {entity} of the document is in these few text elements. Here are
    some example groups of text elements where the {entity} has been identified.
    {examples}
    Using the context from the document and the provided examples, FIND, COPY, and RETURN the {entity}. Only return the {entity} as part
    of your answer. DO NOT REPHRASE OR MAKE UP AN ANSWER.
    {elements}
    """,
)

(instead of the class construction as in default_prompts)
then

def extract_entity(self, entity_name, examples):
    prompt = default_prompts.EntityExtractorFewShotGuidancePrompt.instead(
        entity=entity_name, examples=examples
    )
    return self.llm_map(prompt, output_field=entity, llm_and_stuff_that_would_also_be_args_to_extract_entity)

baitsguy

maybe rename instead to something else like populate or something

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

HenryL27 added 4 commits January 16, 2025 16:40

add prompt base classes and ElementListPrompt

c2a8cfa

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

override .instead in ElementListPrompt to store net-new keys in self.…

21a115a

…kwargs Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

add ElementPrompt and StaticPrompt

f94da80

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

add unit tests for prompts

b73c162

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

HenryL27 requested a review from baitsguy January 21, 2025 17:49

forgot to commit this

17b2163

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy reviewed Jan 21, 2025

View reviewed changes

HenryL27 added 2 commits January 21, 2025 12:37

address pr comments; flatten properties with flatten_data

5d145d5

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

support multiple user prompts

7fa2ff1

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy approved these changes Jan 21, 2025

View reviewed changes

rename instead to set

abf9b0b

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

HenryL27 merged commit ef263e3 into main Jan 22, 2025
12 of 14 checks passed

[llm unify 1/n] Add consolidated prompt classes #1120

[llm unify 1/n] Add consolidated prompt classes #1120

Uh oh!

Conversation

HenryL27 commented Jan 21, 2025

Uh oh!

baitsguy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

baitsguy Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

baitsguy Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

HenryL27 Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

HenryL27 Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

baitsguy Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

HenryL27 Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

baitsguy Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

HenryL27 commented Jan 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

baitsguy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HenryL27 commented Jan 21, 2025 •

edited

Loading