-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[annotator] Avoid constructing and scanning values for tables #18088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the final PR Bugbot will review for you during this billing cycle
Your free Bugbot reviews will reset on December 7
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
2b87add to
7c638c4
Compare
7c638c4 to
7a7b670
Compare
| // Everything else cannot harbor tables. | ||
| (_, _) => Ok(()), | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Closures silently ignore captured data.
The collect_table_info_from_value function doesn't handle closures with captured values. When a MoveValue::Closure is encountered with FatType::Function, it falls through to the catch-all case and returns Ok(()) without recursively checking the captured values for tables. The old code explicitly iterated through closure_value.captured to find tables, but this logic is missing in the new implementation, causing tables in closure-captured values to be silently ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor has a great point about closures (also discussed in 1:1). Say we capture a table handle in a closure, how do we process it? we need type info from that function, in theory, viewer has access to modules, can we just based on function's name get the module + type signature, based on mask get types of the function, then recurse to check if contains_tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cursor bug bot pretty good.
I thought about this, and the result is we did not do it before and we don't need to do now. The entire table scanning business is kind of optional.
AnnotedMoveValue also did not contain any info about tables for captured arguments. That is since info about captured args is build from the layout which does not contain struct names. The question we discussed in 1:1 is whether this is only on the top-level or transitively, but it is the later.
We could start looking up function info to also scan captured values, but that seems beyond this perf related PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair
73c3223 to
9dde4fb
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
storage/indexer/src/db_v2.rs
Outdated
| AnnotatedMoveValue::Bytes(_) => {}, | ||
| fn process_table_infos(&mut self, infos: Vec<(StructTag, MoveStruct)>) -> Result<()> { | ||
| for (tag, val) in infos { | ||
| if tag.type_args.len() == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we assert that this is a table tag? The infos carry no table info so it might be better to enforce the invariant on how they are populated here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems overkill to me. The job of the collect table info stuff is to collect table types only. We cannot check all such invariants for perf reasons.
| for (tag, val) in infos { | ||
| if tag.type_args.len() == 2 | ||
| && let MoveStruct::Runtime(vals) = &val | ||
| && let Some(MoveValue::Address(handle)) = vals.first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need value in the first place? seems like we use types only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handle is used in save_table_info, so we need it.
| let layout = (&fat_ty).try_into().map_err(into_vm_status)?; | ||
| let move_value = MoveValue::simple_deserialize(blob, &layout)?; | ||
| let mut limit = Limiter::default(); | ||
| self.collect_table_info_from_value(&fat_ty, move_value, &mut limit, infos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asked in indexer file as well, but why pass structs out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed to me cleaner than just a pair of types, though I agree it has some overhead. But in the current flamegraph, youu don't see the annotator at all anymore, so I guess its OK.
| Ok(()) | ||
| } | ||
| }, | ||
| (FatType::Runtime(tys), MoveValue::Struct(MoveStruct::Runtime(vals))) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this case? we fall into struct/struct above, and then if table, add, if contains, recurse, if not -> return early, this seems unreachable even?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FatType::Runtime can be theoretically constructed to represent a struct in a captured argument. It is not captured above, though I'm not sure it can appear in reality here. But for completeness of covering variants here better to include.
This adds a new function to the annotator whether a given type is or transitively contains tables. Based on that, the code in `db_v2.rs` avoids creating an annotated value if its known to contain no tables. Consider `vector<T>`, before this change, indexer would visit any value in the vector scanning for table handles even if its known that `T` doesn't has tables. The information whether tables are in a type is computed during FatType construction and shared between cached structs, as well as cached in the annotator itself. One step further then this PR goes, would be move table handle scanning into the annotator and avoid to construct annotated values in the first place, but this here might already have significant effects.
67d5aa0 to
416e680
Compare
| let move_value = MoveValue::simple_deserialize(blob, &layout)?; | ||
| let mut limit = Limiter::default(); | ||
| self.collect_table_info_from_value(&fat_ty, move_value, &mut limit, infos) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Split Limiter Bypasses Resource Control
The collect_table_info function creates two separate Limiter::default() instances at lines 695 and 700, resetting resource tracking between type resolution and value collection. This allows the function to potentially exceed intended resource limits since each limiter starts fresh instead of accumulating charges across both operations.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
Description
This adds a new function to the annotator which directly extracts table information from a value.
Also fixes an issue in type substitution, ensuring that sharing of struct info is preserved over substitution.
This remembers whether a FatType contains table, and only if so recurses into
MoveValueto find the table infos. It avoids the construction of annotated value altogether.Consider
vector<T>: before this change, indexer would visit any value in the vector scanning for table handles even if its known thatTdoesn't has tables.The information whether tables are in a type is computed during FatType construction and shared between cached structs, as well as cached in the annotator itself.
Note
Introduces a fast path to extract table info without full annotation, tracks/caches whether types contain tables, fixes generic struct substitution, and updates the indexer to use the new API.
collect_table_info(ty_tag, blob, infos)to extract table info without constructingAnnotatedMoveValue.TypeTag/FatTypecontains tables and skips traversal when not needed.FatStructTypewithcontains_tablesandis_table(); propagates throughsubstand layouts.StructNameand updates substitution APIs to accept a substitutor closure.TABLE_MODULE_IDandTABLE_STRUCT_NAMEconstants.storage/indexer/src/db_v2.rs):collect_table_info.collect_table_info_from_*andprocess_table_infosto persistTableInfo.parse_move_value-based table discovery and related checks.Written by Cursor Bugbot for commit 416e680. This will update automatically on new commits. Configure here.