+
Skip to content

Conversation

karanataryn
Copy link
Contributor

No description provided.

@karanataryn karanataryn requested a review from mdwelsh October 12, 2024 07:03
@karanataryn karanataryn enabled auto-merge (squash) October 12, 2024 07:03
Copy link
Contributor

@mdwelsh mdwelsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@karanataryn karanataryn merged commit 47e8b42 into main Oct 12, 2024
10 of 11 checks passed
@karanataryn karanataryn deleted the fix-demo-typo branch October 12, 2024 17:36
dtecuci pushed a commit that referenced this pull request Oct 14, 2024
dtecuci added a commit that referenced this pull request Oct 14, 2024
* added ability to read schema from file

* small typo

Co-authored-by: Matt Welsh <matt@aryn.ai>

* fixed two funtion refs that were modified

* reformatted file with black

* fixed schema file format (was json), added more exception handling

* Fix anonymous reading in materialize and add rate limited logging. (#898)

* Fix anonymous reading in materialize and add rate limited logging.

* In materialize, try reading using the credentials, but if it doesn't work, fall back to
  reading anonymously if that seems to be working.
* Add rate limited logging to reading via materialize in local mode.
* Check for no root before checking if a source since that makes more sense.
* switch ntsb_loader_materialized.py over to read in local mode, it was working (with the anonymous
  fix), but was very slow hence the logging.

* Bump version to v0.1.23. (#903)

* fix asdict in the reader too. duh (#907)

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

* Add text reprentation for empty tables (#909)

* Refactor logical plan serialization. (#905)

* Working on this.

* Working on refactoring.

* Tests pass - is such a thing even possible?

* Fix tests.

* Fix mypy.

* Cleanup.

* Fix NTSB examples.

* A few tweaks to the query planner prompt, and a workaround in
queryui/util.py.

* Fix mypy.

* seriously small performance improvement that matters when youre processing tens of thousands of tables (from training code) (#906)

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>

* Handle opensearch reader doc resconstruction when no parent doc in results (#908)

* Fix bug in entity extraction. (#911)

* Notebooks like default-prep-script.ipynb would fail because the wrong way of generating the prompt would be used.
* Rename test to match with name of file being tested.
* Fix existing tests to verify parameters on all branches -- the reason the tests were passing
  was that it was taking the default branch in the test cases
* Update all of the tests to directly call run rather than route everything through ray.

* Enable copying of the hash context. (#910)

* Enable copying of the hash context.

* Address comments.

* Add option to extract line-based bounding boxes from pdfminer. (#874)

We have been using pdfminer's layout detection to group text into
boxes. This can cause issues, especially with table extraction, when
the boxes don't line up with cells or what we detect with the DETR
model. This change adds support for an object_type parameter to the
PdfMinerExtractor that can be set to "boxes" (the current behavior),
or "lines", which groups characters into lines, but does not group
them further.

To avoid an explosion of options, we introduce a
"text_extractor_options" dict as a paramter, and refactor the
TextExtractor class hierarchy a bit to support it.

* Support random sample in local mode. (#913)

This transform isn't widely used, but still worth supporting in local
model to bring it to parity.

* Opensearch kwargs fix (#914)

* Fix kwargs in opensearch reader

* simplify test assertion

* lint

* pr comments

* fix typo (#917)

* Update using_jupyter.md (#902)

* Update using_jupyter.md

Update link

* Fixed path

---------

Co-authored-by: dtecuci <168428824+dtecuci@users.noreply.github.com>

* Rebased. Added ability to read schema from file

* rebased. small typo

Co-authored-by: Matt Welsh <matt@aryn.ai>

* rebased. reformatted file with black

* resolved conflicts

* changed schema file format to yaml

* removed unused import

* small typos fixed

* fixed spacing

---------

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Co-authored-by: Matt Welsh <matt@aryn.ai>
Co-authored-by: Eric Anderson <eric@aryn.ai>
Co-authored-by: Ben Sowell <ben@aryn.ai>
Co-authored-by: Henry Lindeman <hmlindeman@yahoo.com>
Co-authored-by: Dhruv Kaliraman <112497058+dhruvkaliraman7@users.noreply.github.com>
Co-authored-by: Vinayak Thapliyal <vinayak@aryn.ai>
Co-authored-by: Alex Meyer <144723289+alexaryn@users.noreply.github.com>
Co-authored-by: Karan Sampath <176953591+karanataryn@users.noreply.github.com>
Co-authored-by: jonfritz <134336691+jonfritz@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载