
dhruvkaliraman7 commented Jan 9, 2025

This PR introduces File Read Reliability (FRR), a new component that complements the Materialize Read Reliability (MRR) system. While MRR handles reliability between materialized directories, FRR focuses on the initial data ingestion phase.

Features:
Implements reliability checks for raw data reading operations.
Currently supports the BinaryScan operation (from the file scan through the first materialization step).

Future Work:
Extend FRR support to additional scan methods. Implementation will be handled in subsequent PRs.

This PR depends on Add Materialize Read Reliability.
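For readers new to the idea, here is a minimal sketch of what a read-reliability check over raw input files could look like. It is not the code in this PR: `read_reliably`, `run_scan`, and `MissingFilesError` are hypothetical names, and the retry policy is an assumption. The only idea taken from the description is that files which never reach the first materialization step should be detected and retried.

```python
from typing import Callable, Iterable, List


class MissingFilesError(Exception):
    """Raised when some input files were never read successfully (hypothetical)."""


def read_reliably(
    input_paths: Iterable[str],
    run_scan: Callable[[List[str]], Iterable[str]],
    max_retries: int = 3,
) -> int:
    """Re-run a scan over the files that have not been read yet.

    run_scan is a stand-in for "scan these files and materialize them";
    it returns the paths that actually made it through. Returns the
    number of scan attempts that were needed.
    """
    pending = set(input_paths)
    attempts = 0
    # One initial attempt plus up to max_retries retries.
    while pending and attempts <= max_retries:
        attempts += 1
        # Only the files still missing are handed to the next attempt.
        pending -= set(run_scan(sorted(pending)))
    if pending:
        raise MissingFilesError(f"{len(pending)} file(s) not read: {sorted(pending)}")
    return attempts
```

Passing only the still-missing paths to each retry keeps the work proportional to the failures rather than to the whole input set.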

with tempfile.TemporaryDirectory() as tmpdir, tempfile.TemporaryDirectory() as tmpdir1:
    docs = make_docs(10)
    docs.pop()
Collaborator


Why not docs = make_docs(9)? Or remove the docs.pop().

Contributor Author


make_docs includes a MetadataDocument; the docs.pop() removes it.
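To illustrate the point for later readers, here is a small sketch with stand-in classes. The `Document`, `MetadataDocument`, and `make_docs` definitions below are simplified placeholders, not the real test helpers, and it is an assumption that the metadata entry is appended last. Under that assumption, `docs.pop()` drops the metadata entry, and an `isinstance` filter is one way to make the same intent explicit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Stand-in for a regular document (hypothetical, simplified)."""
    doc_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class MetadataDocument(Document):
    """Stand-in for a metadata-only document (hypothetical, simplified)."""


def make_docs(n: int) -> List[Document]:
    """Assumed behavior: n regular documents followed by one metadata document."""
    docs: List[Document] = [Document(f"doc_{i}") for i in range(n)]
    docs.append(MetadataDocument("metadata"))
    return docs


docs = make_docs(10)

# Option 1: rely on the metadata document being last.
popped = docs.pop()

# Option 2: filter by type, which does not depend on ordering.
regular = [d for d in make_docs(10) if not isinstance(d, MetadataDocument)]
```

The filter is slightly longer but keeps working if the helper ever changes where it places the metadata document.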

@dhruvkaliraman7 dhruvkaliraman7 merged commit f59a739 into Add-Materialize-Read-Reliability Feb 6, 2025
8 of 11 checks passed
dhruvkaliraman7 added a commit that referenced this pull request Feb 6, 2025
* Add materialize read reliability

* Remove extra var

* Add unit test, fix signatures, add assertion checks

* Lint + typo

* Remove print

* Modify check for _path_to_id

* lint smh

* Add tests, retry logic, exception handling

* Address comments

* Address comments, refactor to make MRR node traverse and pass through context

* lint, better logging

* bug fix

* Allow assertions to be raised in retries

* Address comments

* Add File Read Reliability (#1105)

* Initial dev

* Remove debugging code

* Remove old code which passed reliability object to context

* Add exception handling after all files processed on ray, lint fix

* Add unit tests

* Switch to using Path Partition Filter

* Add logging, fix assertions in tests

* lint

* refactor tests

* mypy fix, make tests efficient, uniform naming convention

* Remove print

* Change func call from merge

* Better docs and logging

* Yet better docs

* nits

* lint smh

* Address comments