Releases · aryn-ai/sycamore

@alexaryn

This Sycamore release contains a variety of bug fixes and improvements.

What's Changed

Low-hanging mypy fruit (Sycamore edition). by @alexaryn in #1265
Miscellaneous code-level improvements. by @alexaryn in #1266
Improve OpenSearch reader by batching certain operations by @austin-aryn-ai in #1267
Don't include original_elements in queries by @baitsguy in #1268
Add a new OpenAI model, revert LlmFilterPrompt change by @austin-aryn-ai in #1271
Add metadata loading to materialize by @HenryL27 in #1270
Update httpcore and h11 for dependabot issue. by @bsowell in #1274
Put os_client_args in default namespace to avoid surprise. by @alexaryn in #1275
Add debugging for unexpected gemini model stop reasons. by @eric-anderson in #1276
Summarize the group name for groupby by @bohou-aryn in #1261
Update torch to 2.7.0 and transformers to 4.50.0. by @bsowell in #1279
Adding gpt-4.1 and gpt-4.1-nano by @Soeb-aryn in #1278
Add better view_pdf function by @HenryL27 in #1277
Standardize materialize filenames a little by @HenryL27 in #1283
Add support for non-clustering groupby by @bohou-aryn in #1281
Add instructions to split entities in LLM extract entity. by @akarshgupta7 in #1286
Add function to unroll entities by @bohou-aryn in #1288
fix hybrid table model fallback to not edit tokens in-place by @HenryL27 in #1290
Add heuristics to compute K for kmeans based on docset size. by @akarshgupta7 in #1287
fix the non cluster groupby path error by @bohou-aryn in #1291
Speed up import sycamore by ~10x; add module/tool for timing imports. by @eric-anderson in #1292
Improve docset documentation to explain that it's lazy. by @eric-anderson in #1294
handle non clustering aggregate count by @bohou-aryn in #1293
Minor fixups from testing with managed-service. by @eric-anderson in #1296
load tiktoken tokenizer lazily by @HenryL27 in #1298
refactor planner to add query preprocessor functionality by @HenryL27 in #1297
Add Gemini 2.5 Flash by @karanataryn in #1295
Upgrade gemini by @bsowell in #1299
Adding notebooks that walk through the earnings calls documents by @AbhijitP-009 in #1284
Bump AIOHttp To Fix Package Install by @karanataryn in #1300
OpenSearchReader result filtering by @austin-aryn-ai in #1282
Add support for default llm kwargs for all models. by @bsowell in #1303
Use llm for clustering by @bohou-aryn in #1302
Upgrade tornado dependency. by @bsowell in #1305
Add VLMTableStructureExtractor for table structure extraction. by @bsowell in #1304
Fix collect when groupby has no entity names by @bohou-aryn in #1307
Upgrade setuptools. by @bsowell in #1309
Add type casting logic to extract entity. by @akarshgupta7 in #1310
Remove aryn-sdk from the sycamore repo. by @bsowell in #1306
Initial draft of X-Y Cut reading order for Sycamore (Hackathon) by @alexaryn in #1301
improve model selection error message by @HenryL27 in #1311
add id field param to queryresult.retrieved_docs by @HenryL27 in #1312
Update LLM models by @bsowell in #1313
Fix the example llm_cluster_instruction for the topk operator. by @vikram-ak in #1314
Add a split_pdf function. by @bsowell in #1315
fix case where tokens are None inside of model selection parsing by @HenryL27 in #1317
Fix fast notebook tests by @bsowell in #1316
Handle case where query is a compound query by @austin-aryn-ai in #1319
Add DeepWiki Badge by @karanataryn in #1321
Fix issue with nested_lookup utility. by @bsowell in #1320
Remove index from Jinjaprompt to avoid token blow up. by @akarshgupta7 in #1322
Add flag to skip table extraction on empty tables by @MarkLindblad in #1323
Enable pdfminer vertical text grouping. by @bsowell in #1324
Update Slack invite link in README.md by @sohamkasar19 in #1325
Fix when unroll handles None entity field by @bohou-aryn in #1326
Add some useful planner processors by @HenryL27 in #1327
Update Slack invite link in documentation and README.md by @sohamkasar19 in #1329
Add code to turn materialize off for all nodes after and incl Sort. by @akarshgupta7 in #1328
Add support for a custom supplement_text function in the partitioner. by @bsowell in #1332
allow planner customization via prompt specification by @HenryL27 in #1330
Upgrade to torch 2.7.1. by @bsowell in #1334
Add RECOMPUTE source mode for nodes with materialized disabled. by @akarshgupta7 in #1333
Rename Table Extractor Options to Table Extraction Options by @karanataryn in #1338
Add Ability to Resolve Boundary Overlaps in Tables by @karanataryn in #1337
Fix materialize commit by @bohou-aryn in #1331
Add OpenAI reasoning models. by @bsowell in #1340
Add retries for VLMTableStructureExtractor. by @bsowell in #1341
Fix Resolve Overlaps Plumbing by @karanataryn in #1339
First part of reliable opensearch writing -- handles new items and missing metadata on source or destination. by @eric-anderson in #1335
Add support for document reconstruct for RAG. by @akarshgupta7 in #1343
Sycamore: deal with rotated tables. by @alexaryn in #1336
Add RAG support in Luna. by @akarshgupta7 in #1344
Fix test clustering flaky by @bohou-aryn in #1345
Add empty list in RAG doc reconstructor if doc not found in unique_docs. by @akarshgupta7 in #1346
Fix notebook tests. by @eric-anderson in #1347
Upgrade dependency on requests by @bsowell in #1348
Remove default parameter for Gemini max_output_tokens. by @bsowell in #1349
Introduce a new chained LLM by @austin-aryn-ai in #1342
Fix MRR refresh by @dhruvkaliraman7 in #1350
Add logging on retries. by @eric-anderson in #1318
[tmp!!!] switch off partitioner its by @HenryL27 in #1351
Load models in eval mode, no gradient computation by @dhruvkaliraman7 in #1352
Update protobuf version for Dependabot. by @bsowell in #1353
Add support for new Gemini models. by @bsowell in #1355
Improve ChainedLLM to handle when to move to next llm in chain, add r… by @austin-aryn-ai in #1354
Fix more Dependencies by @karanataryn in #1356
add kwargs to planning processors and plumb through sq client by @HenryL27 in #1360
Refactor Process Batch by @karanataryn in #1357
Add Extract Images Function by @karanataryn in #1361
switch partitioner its back on by @HenryL27 in #1366
Add source fields to RAG ...

@HenryL27

This Sycamore release contains a variety of bug fixes and improvements.

What's Changed

add autoschema param for aryn writer by @HenryL27 in #1232
remove aryn sdk publisher workflow by @HenryL27 in #1234
Add extract_image_format option. by @bsowell in #1230
deserialize a summaryDocument as a SummaryDocument, not a Document by @HenryL27 in #1238
Fix when no sub docs in summary document by @dhruvkaliraman7 in #1239
Fix OS document reconstruct read by @dhruvkaliraman7 in #1240
Favor doc_reconstruct when reconstruct_document is also mentioned by @dhruvkaliraman7 in #1210
Add support for air-gapping the easyocr model. by @eric-anderson in #1231
spread property data, not references by @HenryL27 in #1243
fallback to tatr if deformable fails by @HenryL27 in #1244
Fix var name for MRR which broke on bad merge by @dhruvkaliraman7 in #1245
Upstreaming Customer prompts by @dhruvkaliraman7 in #1246
Add access method for materialize docset by @bohou-aryn in #1241
Bump Lint Dependencies and Relint by @karanataryn in #1247
Add Claude 3.7 Sonnet by @karanataryn in #1248
Rename QueryBookmark to DataLoader by @bohou-aryn in #1249
Embedder is now a context manager and can free resources. by @alexaryn in #1250
add default llm_mode to llms by @HenryL27 in #1253
Add close() to our OpenAI and OpenAIClientWrapper classes. by @alexaryn in #1254
Fix limit transform by @dhruvkaliraman7 in #1255
Fix clustering flaky test by @bohou-aryn in #1251
tweak llm filter prompt to be more better by @HenryL27 in #1258
Allow override of OpenSearch user/password in run_plan(). by @alexaryn in #1256
Delete PITs after we're done reading by @austin-aryn-ai in #1257
Inspect serialization issues only if TypeError is raised by @austin-aryn-ai in #1259
better defaults for aryn writer by @HenryL27 in #1262
bugfix: missing token has 4 slashes by @eric-anderson in #1252
Upgrade torch to 2.6.0. by @bsowell in #1263
Bump version to 0.1.32. by @bsowell in #1264

New Contributors

@austin-aryn-ai made their first contribution in #1257

Full Changelog: v0.1.31...v0.1.32

@eric-anderson

This Sycamore release contains a variety of bug fixes and improvements.

What's Changed

Refactor the caching API in llms so that the get and set APIs are symmetric by @eric-anderson in #1108
Fix OpenSearch tests that require pre-loaded index by @austintlee in #1111
Fix source_directory path in conf.py by @sravan1946 in #953
Make lib/poetry-lock/poetry-lock-all.sh failures more obvious by @MarkLindblad in #1107
aryn-opensearch-bedrock-rag-example.ipynb by @jonfritz in #1117
Bump Dependencies to Fix Security Issues by @karanataryn in #1119
[llm unify 1/n] Add consolidated prompt classes by @HenryL27 in #1120
Add Dependency Review Action by @karanataryn in #1121
Removing Guidance by @Soeb-aryn in #1114
Bump PyPDF by @karanataryn in #1124
Add anthropic api key to testing workflow by @HenryL27 in #1125
Add support for async DocParse calls in aryn-sdk by @MarkLindblad in #1116
Bump aryn-sdk version to 0.1.11 by @MarkLindblad in #1127
Add CodeQL Vulnerability Scan by @karanataryn in #1118
Update fileformattools by @baitsguy in #1133
capturing metadata from LLMs by @Soeb-aryn in #1122
Add jupyter utils (Finra upstream) by @dhruvkaliraman7 in #1135
Change @context_params behavior to only pass explicit arguments. by @bsowell in #1136
Upgrade OpenAI to ^1.60.2. by @bsowell in #1137
Explicitly add tiktoken and relock. by @bsowell in #1138
add HeaderAugmenterMerger to docs by @HenryL27 in #1139
add another --no-root for rtd by @HenryL27 in #1140
Improve use of async DocParse via aryn-sdk by @MarkLindblad in #1134
Bump aryn-sdk version to 0.1.12 by @MarkLindblad in #1141
Fix self-reported aryn-sdk version by @MarkLindblad in #1142
Bump aryn-sdk version to 0.1.12.post0 by @MarkLindblad in #1143
Update Testing Workflows by @karanataryn in #1144
[llm unify 2/n] Implement llm_map(_elements) and move extract_entity to it. by @HenryL27 in #1126
[llm unify 3/n] Reimplement SummarizeImages as an LLMMapElements by @HenryL27 in #1146
Add OpenSearch shard related logging by @baitsguy in #1145
Clean Up Dead Code by @karanataryn in #1132
Fix fetching of parent doc properties during OpenSearch read by @austintlee in #1148
[llm unify 4/n] extract properties by @HenryL27 in #1149
Add a groupby operator by @bohou-aryn in #1123
Change job to task in aryn-sdk async support by @MarkLindblad in #1151
Bump aryn-sdk version to 0.1.13 by @MarkLindblad in #1152
Finish cleaning up PR 1148 by @austintlee in #1150
Aryn connectors for reading and writing docsets by @austintlee in #1147
handle specified prompt and use_elements=True in extract entity by @HenryL27 in #1153
Fix async DocParse task id in aryn-sdk example by @MarkLindblad in #1155
Add Opensearch Writer Reliability by @dhruvkaliraman7 in #1130
Add materialize read reliability by @dhruvkaliraman7 in #1094
add docs -> docs wrapper function for LLMPropertyExtractor by @HenryL27 in #1158
Reliability mocking bug by @dhruvkaliraman7 in #1160
Ensure parent docs are collected during doc reconstruct by @austintlee in #1159
fix extract properties again by @HenryL27 in #1162
Make async DocParse methods in aryn-sdk not operate on non-DocParse async tasks by @MarkLindblad in #1156
[llm unify 5a/n] Add JinjaPrompt and re-convert extract entities by @HenryL27 in #1161
Add list to cast types by @dhruvkaliraman7 in #1163
[llm unify 5b/n] Jinja summarize images by @HenryL27 in #1166
Rename async list endpoints to "action" from "path" by @MarkLindblad in #1170
[llm unify 5c/n] jinjify extract properties by @HenryL27 in #1169
Bump Beautiful Soup by @karanataryn in #1167
Serialize query strings to avoid Ray Dataset column imputation by @austintlee in #1171
Update aryn-sdk's async DocParse interface to raise Exceptions rather than returning error strings by @MarkLindblad in #1164
Add OCR Languages to Aryn SDK by @karanataryn in #1168
Fix None in llm response by @dhruvkaliraman7 in #1173
[llm unify 5/n] llm_filter by @HenryL27 in #1154
Bump aryn-sdk to v0.1.14 by @MarkLindblad in #1165
ASDK: Prevent excessive memory consumption reading file. by @alexaryn in #1174
Close the darn file! by @alexaryn in #1175
Add planner interface by @baitsguy in #1177
Plumbing X-Aryn-Trace-ID through ASDK partition_file. by @alexaryn in #1179
Bump Ray to Fix Security Issue by @karanataryn in #1181
Writer Reliability bug by @dhruvkaliraman7 in #1184
fix anthropic required module by @HenryL27 in #1185
Improve Aryn reader by @austintlee in #1172
Retain element doc_id from source by @austintlee in #1178
Update aryn-opensearch-bedrock-rag-example.ipynb by @jonfritz in #1187
Add Gemini LLM and Summarizer by @karanataryn in #1176
[llm unify 6/n] extract schema and batch schema to llm map by @HenryL27 in #1188
Adding clustering and groupby in luna planner part by @bohou-aryn in #1183
Fix Gemini Bugs by @karanataryn in #1191
Helper script for getting git credentials from the environment by @eric-anderson in #1190
fix image tests - we don't explode on bad bboxes these days by @HenryL27 in #1193
Get random hits when filtering properties in sycamore query by @dhruvkaliraman7 in #1195
Remove OCR Images by @karanataryn in #1194
Add Summarize Images to Aryn SDK by @karanataryn in #1189
Change Gemini Model Names by @karanataryn in #1197
Add Gemini 2 Pro by @karanataryn in #1198
Bump Aryn SDK to 0.1.15 by @karanataryn in #1199
LLM Async mode by @HenryL27 in #1200
LLM Batch inference by @HenryL27 in #1202
updating run-jupyter.sh file by @Soeb-aryn in #1201
Change HeaderAugmenterMerger str concatenation behavior to minimize adding newlines by @MarkLindblad in #1205
Add param to control how model is selected in hybrid table extractor by @HenryL27 in #1203
Add entity name in grouped result, also add materialize in groupbycount operator by @bohou-aryn in #1204
Add VLM OCR to Aryn SDK by @karanataryn in #1207
Fix OpenSearch integ test by @dhruvkaliraman7 in #1209
[SDK] Ability to cancel running partition call. by @alexaryn in #1211
bump aryn sdk version by @HenryL27 in #1212
Bump Dependencies to Fix Security Issues by @karanataryn in #1208
...

@eric-anderson

This Sycamore release contains several bug fixes and improvements.

What's Changed

Add logging of the full exception in base_writer. by @eric-anderson in #1069
Fix create_element to not crash on bad element types by @eric-anderson in #1070
Add docset.take_stream() by @baitsguy in #1071
Make temporary fix to split_elements to avoid exceeding recursion depth due to certain table elements by @MarkLindblad in #1073
add TableMerger to merge elements docs by @HenryL27 in #1074
Increase max recursion depth for split_element's split_one by @MarkLindblad in #1075
Merge-elements-LLM-filter by @dhruvkaliraman7 in #1076
Add support for GPU to similarity. by @austintlee in #999
Tolerate bad entity extraction. by @eric-anderson in #1078
move deformable detr safe loading code by @HenryL27 in #1055
Allow Doc reconstruct via function by @austintlee in #1072
Add-tokenizer-and-reranking-to-LLM-ExtractEntity by @dhruvkaliraman7 in #1081
Schema object + entity extraction support by @baitsguy in #1083
Make ttviz.cpp compile again. by @alexaryn in #1082
Keep newline in OpenAI Embedder by @dhruvkaliraman7 in #1086
Changed the default embedding model to openai. by @akarshgupta7 in #1087
Add Embed at Element Level by @dhruvkaliraman7 in #1084
Get sycamore.query to work with Schema instead of only OpenSearchSchema by @baitsguy in #1088
Add hybrid table extractor by @HenryL27 in #1089
Add map reduce style summarize to handle large texts for summarization. by @austintlee in #1079
fix max(nothing) bug by @HenryL27 in #1091
Delay initializing openai client in embedder by @HenryL27 in #1092
fix materialize on windows by @HenryL27 in #1093
Add Retries for OpenSearch Writer by @karanataryn in #1085
Property extraction type cast by @baitsguy in #1095
Revert overzealous no-rootification by @HenryL27 in #1098
Add support for Anthropic LLMs. by @bsowell in #1096
Fix similarity assert condition for LLM Filter by @dhruvkaliraman7 in #1099
Raise PartitionError with explicit status code. by @alexaryn in #1101
Add PartitionError to aryn_sdk.partition's __init__.py by @MarkLindblad in #1102
Prompt update for property extraction by @baitsguy in #1103
Add support for parallel read in OpenSearchReader by @austintlee in #1100
Fix No Root Repetition in Test File by @karanataryn in #1097
Bump version to 0.1.30. by @bsowell in #1109

Full Changelog: v0.1.29...v0.1.30

@HenryL27

This Sycamore release contains small bug fixes and enhancements.

What's Changed

when there's no table structure, take the token bbox for the cell bbox by @HenryL27 in #1061
Disable use of scroll in OpenSearch reader when running KNN queries. by @austintlee in #1062
Binarize OCR Image to Improve Performance by @karanataryn in #1063
Fix split_elements for table elements with no elem.table attribute by @MarkLindblad in #1064
Fix Extract Schema Empty Return by @karanataryn in #1067
Bump version to v0.1.29. by @bsowell in #1068

Full Changelog: v0.1.28...v0.1.29

@Soeb-aryn

This release updates doc_ids from UUIDs to NanoIds, adds some document title functionality, and improves stability and performance.
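
To illustrate only the change in ID shape, here is a minimal sketch, assuming the common NanoID defaults (21 symbols drawn from a 64-character URL-safe alphabet). The `nanoid` helper below is hypothetical and is not Sycamore's implementation; these notes do not say how Sycamore generates or formats its doc_ids.

```python
import secrets
import string
import uuid

# Assumption: the usual NanoID default alphabet of 64 URL-safe symbols.
ALPHABET = "_-" + string.digits + string.ascii_letters


def nanoid(size: int = 21) -> str:
    """Return a random ID of `size` symbols drawn from ALPHABET (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


old_style = str(uuid.uuid4())  # canonical UUID text form, hyphens included
new_style = nanoid()           # shorter, URL-safe identifier

print(len(old_style), len(new_style))
```

Some stores accept only UUID-shaped keys, which is consistent with the entry below that converts DocIDs back to UUIDs for Qdrant and Weaviate.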

What's Changed

adding one shot prompting along with multimodal request by @Soeb-aryn in #1023
Fix query-ui dependency on boto3 and re-lock. by @mdwelsh in #1028
Updated NTSB queries and ground truth for CIDR-25 paper. by @mdwelsh in #1026
Add streaming support and tests for query-server. by @mdwelsh in #1027
Supply element types in output from MarkedMerger. by @alexaryn in #1031
Fix SummarizeData so that downstream .materialize operations will work. by @mdwelsh in #1030
add nanoid by @HenryL27 in #1034
Removed duplicate code in query execution. by @akarshgupta7 in #1035
Convert docids from UUID to NanoID. by @alexaryn in #1032
Use NanoIDs in file_scan. by @alexaryn in #1036
extract table properties prompt & bug fix by @Soeb-aryn in #1037
Convert DocIDs to UUIDs for Qdrant & Weaviate; unit tests. by @alexaryn in #1038
heuristics to get title from section headers by @Soeb-aryn in #1033
updating function in pdf_miner class by @Soeb-aryn in #1041
Added ragas to compute string metrics for evaluation. by @akarshgupta7 in #1039
Fix sort so that it works with an unspecified or None default_value. by @eric-anderson in #1040
Added correctness score to the metrics. by @akarshgupta7 in #1043
Query planner improvements by @baitsguy in #1046
Fix materialize to tolerate an empty input directory in ray mode by @eric-anderson in #1045
PR fix by @baitsguy in #1047
disable vectorsearch rerank by default in query by @baitsguy in #1048
vectorsearch planner prompt changes by @baitsguy in #1049
Make OpenAIEmbedder serializable after client has been initialized. by @bsowell in #1050
Rename Embedding in ElasticSearch Notebook by @karanataryn in #1051
Add deformable table extractor by @HenryL27 in #1053
Add helper for thread local variables that can be used to add metadata to the output stream by @eric-anderson in #1052
Propagate element level llm_filter output to doc.properties by @baitsguy in #1054
Handle military clock time (0800) in time standardizer. by @alexaryn in #1056
Fix incorrect docstring for promote-certain-elements-to-title feature by @MarkLindblad in #1057
adding parameter for API in sdk and remote_partitioner by @Soeb-aryn in #1042
bump sycamore version to 0.1.28 by @HenryL27 in #1058
bump aryn sdk version to 0.1.10 by @HenryL27 in #1059
don't die if box is None in try_draw_boxes by @HenryL27 in #1060

New Contributors

@akarshgupta7 made their first contribution in #1035

Full Changelog: v0.1.27...v0.1.28

@MarkLindblad

This Sycamore release includes a variety of small bug fixes and improvements.

What's Changed

Bump aryn-sdk version to 0.1.9 from 0.1.8 by @MarkLindblad in #1011
Add plan validation by @baitsguy in #1001
Sort retrieval docs by score properties if they exist by @baitsguy in #1012
Add 120k max chars (default) for summarize_data by @baitsguy in #1013
Queryeval docset write fix by @baitsguy in #1014
Add notebook file for OpenSearch example by @jonfritz in #1015
Fix up NTSB queries for query-eval tool. by @mdwelsh in #1016
Rename from APS to DocParse by @karanataryn in #1017
enable JSONifying tables by @HenryL27 in #1018
Fix aryn-sdk's convert_image_element example by @MarkLindblad in #1019
Fix DocParse chunking example in aryn-sdk by @MarkLindblad in #1021
blacksmith.sh: Migrate workflows to Blacksmith by @blacksmith-sh in #1020
Revert Unit Tests to GitHub Actions by @karanataryn in #1025
Bump version to 0.1.27. by @bsowell in #1024

Full Changelog: v0.1.26...v0.1.27

@HenryL27

This release includes several stability and reliability improvements.

What's Changed

skip flaky test by @HenryL27 in #956
Fix mypy warnings. by @mdwelsh in #947
Work around hang observed during vcrpy recording. by @alexaryn in #950
Postprocessing to modify plans returned by llm planner; minor issues with query-ui by @amolvdeshpande in #882
bump sdk to 0.1.7 by @HenryL27 in #961
Add HeaderAugmenterMerger by @dhruvkaliraman7 in #946
Update docs to reflect OpenAIPropertyExtractor->LLMPropertyextractor by @bsowell in #964
Couple of minor fixes and tweaks to the table merger. by @bsowell in #963
Enable use_elements in query.summarize_data by @baitsguy in #966
Fix typo in syntax in docstring for Summarize Images by @jonfritz in #967
Add missing tokenizer argument in MarkBreakByTokens docstring by @MarkLindblad in #969
Add Lots of Connector Unit Tests by @karanataryn in #957
Add OCR Evaluation Code by @karanataryn in #685
Fixed query tag check by @baitsguy in #968
Fix SDK Threshold Bug by @karanataryn in #970
Add score to each document in OpenSearch query result. by @bsowell in #971
Fix HeaderAugmenterMerger by @MarkLindblad in #973
Refactor mark_bbox_preset to expose function outside DocSet by @MarkLindblad in #972
Fix mark_bbox_preset's MarkDropHeaderFooter parameter by @MarkLindblad in #975
OpenSearch improvements by @baitsguy in #974
Adding a separate installation instructions page by @AbhijitP-009 in #977
Union OCR / PDFMiner Tokens with Table Outputs by @karanataryn in #976
Make Table Code More Robust by @karanataryn in #979
fix divide by zero in align_headers by @HenryL27 in #978
Allow for returning query traces on cached query executions. by @mdwelsh in #959
Add Enhance Table Option to SDK by @karanataryn in #980
Bump SDK Version by @karanataryn in #981
Update Lockfiles by @karanataryn in #920
Add query planning strategy objects by @baitsguy in #982
Move tokenized data to device by @baitsguy in #983
Update vectorsearch query test by @baitsguy in #984
Integration test for Sycamore Query demo. by @mdwelsh in #985
Add Closure of Client Connections for Connectors by @karanataryn in #989
Work around lack of resource module on Windows. by @alexaryn in #962
Update README.md by @karanataryn in #990
Merge in Fixes from Luna Demo Deployment by @karanataryn in #992
Add table-chunker by @dhruvkaliraman7 in #993
chore: Added back to top , contributors section and star history graph by @samarth29jc in #987
Return the list of documents referenced in a Luna query. by @mdwelsh in #995
Sync Locks across all Directories by @karanataryn in #988
Remove unused code (_batchify) by @MarkLindblad in #887
Don't try to put footers in columns by @HenryL27 in #998
Docprep notebook testing by @sohamkasar19 in #996
Add expected documents in query-eval tool by @baitsguy in #997
Move Aryn DocParse Docs Out of Sycamore by @karanataryn in #994
Remove seed from rewrite prompt by @baitsguy in #1000
Fix OpenAI reduce methods to handle Azure deployment names. by @bsowell in #1002
Add support for custom source parameter for remote Aryn Partitioner by @MarkLindblad in #1003
Fix mixed samples for schema extraction. by @mdwelsh in #1004
updating extract table prop by @Soeb-aryn in #1005
Update Opensearch domain in docprep notebook testing (GHA) by @sohamkasar19 in #1006
Improve suggested install command by @HenryL27 in #1007
Fix augment_text docstring by @HenryL27 in #1008
Add support for using Aryn DocParse chunking from aryn-sdk by @MarkLindblad in #1010
Update sycamore to 0.1.26 by @HenryL27 in #1009

New Contributors

@amolvdeshpande made their first contribution in #882
@samarth29jc made their first contribution in #987

Full Changelog: v0.1.25...v0.1.26

@mdwelsh

This Sycamore release includes numerous bug fixes for connectors and other transforms. It also includes support for Anthropic LLMs via Amazon Bedrock.

What's Changed

Sycamore Query evaluation tool. by @mdwelsh in #912
Luna client local schema (take 2) by @dtecuci in #919
Fix small bug in client. by @mdwelsh in #923
Fix DuckDB Spelling Error by @karanataryn in #924
Make OpenSearchSchema a proper Pydantic model. by @mdwelsh in #922
Fix typo by @Yashbhatt786 in #927
Bugfixes: DocumentSource enum serialization and missing element_id in old data by @baitsguy in #928
Bug fixes: remove kwargs in docset.rerank, sycamore query codegen by @baitsguy in #932
Add Table Merger by @dhruvkaliraman7 in #880
Basic Bedrock LLM client. by @mdwelsh in #931
Accept query plan examples in config by @baitsguy in #934
Evaluate query plans in query-eval by @baitsguy in #936
Add local mode support for json scan and json document scan by @bohou-aryn in #925
Handle Drawing Missing Tables and Cells by @karanataryn in #938
Support LLM selection in Sycamore Query Client. by @mdwelsh in #935
Crop To Bbox Error by @karanataryn in #939
Add plan correctness metrics summary + K in TopK optional by @baitsguy in #940
don't embed the empty string with openai by @HenryL27 in #943
Support SummarizeImages with non-OpenAI LLMs. by @bsowell in #941
Add support for tags and notes. by @mdwelsh in #942
Create LLMSchemaExtractor and LLMPropertyExtractor classes. by @bsowell in #945
Don't run embedded weaviate in the unit tests by @HenryL27 in #951
fix empty strings in section headers by @HenryL27 in #948
Select pages by @bsowell in #937
Fixup notebook tests by @eric-anderson in #933
Use pytest-xdist for unit tests. by @mdwelsh in #952
Update standardizer.py by @jonfritz in #944
Fix bugs in Unflattening Data by @karanataryn in #930
fix materialize bug with s3 filesystem by @eric-anderson in #954
Bump version to 0.1.25. by @bsowell in #955

New Contributors

@Yashbhatt786 made their first contribution in #927

Full Changelog: v0.1.24...v0.1.25

@Dnaynu

This Sycamore release includes several bug fixes in the Weaviate and DuckDB connectors and in several of the example notebooks. Thanks to @Dnaynu for contributing to the Sycamore documentation!

What's Changed

fix asdict in the reader too. duh by @HenryL27 in #907
Add text representation for empty tables by @dhruvkaliraman7 in #909
Refactor logical plan serialization. by @mdwelsh in #905
microperformance improvement by @HenryL27 in #906
Bugfix: Handle opensearch reader doc reconstruction when no parent doc in results by @baitsguy in #908
Fix bug in entity extraction. by @eric-anderson in #911
added ability to read schema from file by @dtecuci in #904
Enable copying of the hash context. by @alexaryn in #910
Add option to extract line-based bounding boxes from pdfminer. by @bsowell in #874
Support random sample in local mode. by @bsowell in #913
Opensearch kwargs fix by @baitsguy in #914
Fix Typo in NTSB Demo by @karanataryn in #917
Update using_jupyter.md by @jonfritz in #902
Docs: Typo Fix by @Dnaynu in #918
Update DuckDB Reader to Package Change by @karanataryn in #916
Make metadata-extraction.ipynb work by @eric-anderson in #915
Bump Sycamore version to 0.1.24. by @bsowell in #921

New Contributors

@Dnaynu made their first contribution in #918

Full Changelog: v0.1.23...v0.1.24

Releases: aryn-ai/sycamore

v0.1.33
v0.1.32
v0.1.31
v0.1.30
v0.1.29
v0.1.28
v0.1.27
v0.1.26
v0.1.25
v0.1.24