+
Skip to content

Conversation

t-kalinowski
Copy link
Member

This PR adds an as_iterator() method for a base python object. This enables usage like this with TensorFlow Datasets:

ds <- ...

loop(for(batch in ds) {
  ...
})

Without this, users need to instead explicitly call reticulate::as_iterator(ds), before passing to coro::loop(), which can be confusing because of the name clash between coro::as_iterator(), and also not necessary for most other iterables that are automatically converted to an iterator.

R/iterator.R Outdated
#' @exportS3Method
as_iterator.python.builtin.iterator <- function(x) {
force(x)
function() reticulate::iter_next(x, completed = quote(.__exhausted__.))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be created using the exhaustion constructor. At the minimum it should be sym(".__exhausted__."). The .__exhausted__. symbol should never appear in code, in case the objects of a namespace are iterated over.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good reason I hadn't considered. I generally try to avoid micro-optimizations like this, but introducing to as.symbol() overhead into tight for loops seemed like an easy win:

plot(print(bench::mark(
  quote(.__exhausted__.), 
  as.symbol(".__exhausted__."))))
#> # A tibble: 2 × 13
#>   expression                       min median `itr/sec` mem_alloc `gc/sec` n_itr
#>   <bch:expr>                   <bch:t> <bch:>     <dbl> <bch:byt>    <dbl> <int>
#> 1 quote(.__exhausted__.)             0    1ns    1.29e9        0B        0 10000
#> 2 as.symbol(".__exhausted__.")    82ns  164ns    5.68e6        0B        0 10000
#> # … with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
#> #   memory <list>, time <list>, gc <list>
#> Loading required namespace: tidyr
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 2678 rows containing missing values (geom_point).

Created on 2022-06-29 by the reprex package (v2.0.1)

Would you like me to change back to as.symbol(".__exhausted__.")? Or maybe create the symbol once in the closure then use that?

as_iterator.python.builtin.iterator <- function(x) {
  force(x)
  exhausted <- as.symbol(".__exhausted__.")
  function() reticulate::iter_next(x, completed = exhausted)
}

Unfortunately, reticulate::iter_next() forces the completed argument on each call.

Copy link
Member

@lionel- lionel- Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe create the symbol once in the closure then use that?

This should also be avoided for the same reason. I'd probably just use coro::exhausted(). In R, generators and iterators are not expected to be used for tight loops.

R/iterator.R Outdated
}))


#' @exportS3Method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not #' @export?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason, happy to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup I'd prefer @export for coding style consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

}
}

on_load(on_package_load("reticulate", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it reticulate doesn't plan to add these methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think {coro} is a better home for these methods. reticulate already provides an as_iterator() function which is not a generic and does not adhere to the coro iterator protocol. Exporting a method for a same-named generic in reticulate invites confusion, and also the complexity of an awkward delayed registration dance that can be completely avoided if these methods live here.

@t-kalinowski
Copy link
Member Author

I think we don't even need a separate method for python.builtin.iterator, the method for python.builtin.object covers all cases. I'm going to remove it.

@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

Now this is just missing a NEWS bullet :)

@t-kalinowski
Copy link
Member Author

Done!

@lionel- lionel- merged commit 8f4eec2 into r-lib:main Jun 29, 2022
@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

Thanks! Do you need that on CRAN? If yes, I'll probably release right away, otherwise this might sit on github for months. Do you have any other planned changes for coro?

@t-kalinowski
Copy link
Member Author

t-kalinowski commented Jun 29, 2022

This came out of working on updates to the Tensorflow website, which is still ongoing work. I don’t have any other coro enhancements planned presently, but other things might still shake out in the next two weeks while we’re updating the TF examples. @dfalbel what do you think?

@t-kalinowski
Copy link
Member Author

@lionel- If it's ok with you, let's wait ~ 2 weeks. I'll ping you when we need this on CRAN.

@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

That sounds good.

@t-kalinowski
Copy link
Member Author

Hi @lionel-, the dev version is working great for us and I don't think we'll have any additional PRs to coro in the near term. When you get a chance, can you please publish to CRAN?

@lionel-
Copy link
Member

lionel- commented Jul 19, 2022

Great, I'll do that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载