GitHub imposes rate limits on their REST API; these limits can be onerous for higher-volume users, since some of our audits (particularly `impostor-commit` and `ref-confusion`) require a lot of individual paginated requests to fetch the needed repository state.
To reduce this, we could switch to (or offer as an option) a model where we `git clone` and collect history locally instead, since GitHub has much higher rate limits on clones.
Pros:
- Fewer rate limit issues.
- Potentially much, much faster overall (since local Git object/history scanning will be a lot faster than our current network roundtrips)
Cons:
- A bit more complicated (but not much more)
- Probably slightly slower on each audit cycle (i.e. per `uses:` clause), since we'd need to `git clone` and pull down more data initially. This will be cached and amortized by the faster filtering (above), but it'll probably make these audits a little less responsive.
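For a rough sense of what the local model could look like, here's a minimal Python sketch. The function names and the reachability heuristic are illustrative assumptions, not a proposed design: it builds a blobless-clone command and then asks the local clone whether a pinned SHA is contained in any branch or tag, which is the kind of question the affected audits currently answer via paginated API requests.

```python
import subprocess

def clone_cmd(repo_url: str, dest: str, blobless: bool = True) -> list[str]:
    """Build the argv for a history-focused clone (hypothetical helper)."""
    cmd = ["git", "clone", "--quiet"]
    if blobless:
        # --filter=blob:none fetches commits and trees but defers blob
        # downloads; refs and history are all these checks need.
        cmd.append("--filter=blob:none")
    cmd += [repo_url, dest]
    return cmd

def sha_reachable_from_refs(clone_dir: str, sha: str) -> bool:
    """True if `sha` is contained in some branch or tag of the clone.

    A pinned SHA that no official ref contains is the impostor-commit
    signal this issue wants to compute locally instead of via the API.
    """
    for subcmd in (["branch", "-a"], ["tag"]):
        res = subprocess.run(
            ["git", "-C", clone_dir, *subcmd, "--contains", sha],
            capture_output=True,
            text=True,
        )
        if res.returncode == 0 and res.stdout.strip():
            return True
    return False
```

One clone per repository then serves every `uses:` clause that points at it, which is where the amortization mentioned above would come from.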
Related thoughts:
- Maybe a blobless clone will be faster? I'm not sure if this induces a tradeoff on GitHub's side or not, versus the hot path for a normal clone.
- This is also somewhat related to Design a static HTTP API for serving pre-computed information #278, which seeks to solve the same problem by deploying our own static API.
(h/t @andrew)