[llm unify 5a/n] Add JinjaPrompt and re-convert extract entities #1161

HenryL27 · 2025-02-07T01:20:40Z

Also refactors the entity extractor to break up the megafunction.

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy

Some cleanup requests, but functionally this looks good.

baitsguy · 2025-02-07T20:14:46Z

lib/sycamore/sycamore/transforms/extract_entity.py

+
+        def validate(d: Document) -> bool:
+            if self._tokenizer is not None:
+                return d.properties.get(self._entity_name, "None") != "None"


Could this just be `self._entity_name in d.properties`?

Discussed offline: we need to check for "None" because that's what the LLM returns when it finds no value.
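A minimal sketch of why a plain membership test is not enough, assuming the extractor stores the LLM's raw answer under the entity name, so an empty result shows up as the literal string "None" rather than as a missing key. `has_entity` is a hypothetical helper for illustration, not a Sycamore function.

```python
def has_entity(properties: dict, entity_name: str) -> bool:
    """Return True only if a real value was extracted for entity_name.

    Assumption: the LLM answers with the literal string "None" when it
    finds no value, and that answer is stored as-is in properties.
    """
    # A missing key falls back to the sentinel, so both "never extracted"
    # and "LLM said None" count as no value.
    return properties.get(entity_name, "None") != "None"


props = {"title": "None"}
print("title" in props)             # True: the key exists...
print(has_entity(props, "title"))   # False: ...but the LLM found nothing
print(has_entity(props, "author"))  # False: the key was never set
```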

baitsguy · 2025-02-07T20:14:58Z

lib/sycamore/sycamore/transforms/extract_entity.py

+            if self._tokenizer is not None:
+                return d.properties.get(self._entity_name, "None") != "None"
+            else:
+                return True


Can just combine the if/else.
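One way the combined condition could read, sketched as a standalone class so it runs on its own. `FakeDocument` and `EntityValidator` are hypothetical stand-ins: in the PR, `validate` is a closure over the extractor's `self`, and only the `_entity_name` and `_tokenizer` attributes from the diff are modeled here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeDocument:
    # Hypothetical stand-in for sycamore.data.Document; only properties is modeled.
    properties: dict = field(default_factory=dict)


class EntityValidator:
    # Hypothetical class holding just the two attributes validate() reads.
    def __init__(self, entity_name: str, tokenizer: Optional[Any] = None):
        self._entity_name = entity_name
        self._tokenizer = tokenizer

    def validate(self, d: FakeDocument) -> bool:
        # No tokenizer: every document passes. With a tokenizer: reject the
        # LLM's "None" answer, and treat a missing key the same way.
        return (
            self._tokenizer is None
            or d.properties.get(self._entity_name, "None") != "None"
        )


print(EntityValidator("title").validate(FakeDocument({"title": "None"})))
print(EntityValidator("title", tokenizer=object()).validate(FakeDocument({"title": "None"})))
```

Short-circuiting on `self._tokenizer is None` preserves the original behavior: when no tokenizer is configured, documents always pass.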

baitsguy · 2025-02-07T20:18:57Z

lib/sycamore/sycamore/transforms/extract_entity.py

-                return d
+                batches.append(curr_club)
+            else:
+                batches = [[i for e, i in elements[: self._num_of_elements]]]


Mind adding a comment explaining that, without a tokenizer, we process each element separately because we don't know how many we can combine?
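For context on the tokenizer branch, here is a minimal sketch of the greedy "clubbing" that the diff's `curr_club` and `batches` names suggest. It assumes a token-counting function and a fixed per-call token budget. `club_by_tokens` and `word_count` are hypothetical names, not Sycamore APIs, and the PR's actual logic may differ.

```python
from typing import Callable, List


def club_by_tokens(
    texts: List[str], count_tokens: Callable[[str], int], max_tokens: int
) -> List[List[int]]:
    """Greedily group element indices so each group stays within max_tokens.

    Hypothetical helper: an element larger than max_tokens still gets a
    batch of its own rather than being dropped.
    """
    batches: List[List[int]] = []
    curr_club: List[int] = []
    curr_tokens = 0
    for i, text in enumerate(texts):
        n = count_tokens(text)
        # Close the current club if adding this element would overflow it.
        if curr_club and curr_tokens + n > max_tokens:
            batches.append(curr_club)
            curr_club, curr_tokens = [], 0
        curr_club.append(i)
        curr_tokens += n
    if curr_club:
        batches.append(curr_club)
    return batches


def word_count(s: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(s.split())


print(club_by_tokens(["a b", "c d e", "f", "g h i j"], word_count, max_tokens=5))
# [[0, 1], [2, 3]]
```

Without a tokenizer there is no way to measure the budget. The requested comment would document exactly that case.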

baitsguy · 2025-02-07T20:20:44Z

lib/sycamore/sycamore/transforms/extract_entity.py

 from sycamore.data import Element, Document
 from sycamore.llms import LLM
 from sycamore.llms.prompts.default_prompts import (
-    EntityExtractorZeroShotGuidancePrompt,


Can we start deleting these if we're not using them? The set of prompts keeps growing.

HenryL27 added 6 commits February 6, 2025 16:36

add jinja prompts and convert extract entity to use it

d5cae60

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

Merge branch 'main' of github.com:aryn-ai/sycamore into hml-jinjaprompting

c56626f

delete commented out / dead code

274ddfe

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

JinjaPrompt docstring

2bb14dd

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

add comments bc otherwise this is very dense

c6ed22f

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

add norender() directive to jinja prompts when they shouldn't render

d9fa8ab

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

HenryL27 marked this pull request as ready for review February 7, 2025 19:57

change FANCY_BATCHED_LIST to BATCHED_LIST_WITH_METADATA

d9d5079

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy reviewed Feb 7, 2025

pr comments

9b23fbd

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

baitsguy approved these changes Feb 7, 2025

HenryL27 merged commit afd15c7 into main Feb 7, 2025
12 of 15 checks passed

HenryL27 deleted the hml-jinjaprompting branch February 7, 2025 22:26
