[move-vm] Refactor Loader #9320

runtian-zhou · 2023-07-26T20:58:52Z

Description

This is a preview PR for MoveVM Loader refactor that sketches out the individual steps of making Module to be its own loadable unit instead of the global indexing schema that we are using now.

Test Plan

We will need to better look into the security implication there.

ziaptos

Just some low/mid level comments. I'm doing more passes and will add more.

In any case - great job!

ziaptos · 2023-07-28T18:22:35Z

aptos-move/framework/src/natives/string_utils.rs


                // We know that the type is a list, so we can safely unwrap
-                let ty_args = if let Type::StructInstantiation(_, ty_args) = list_ty {
+                let ty_args = if let Type::StructInstantiation { ty_args, .. } = list_ty {


nit:
Is there a macro that will do this for you? I actually don't know the answer :)

I don't think so? The only macro on top of my head is matches! but didn't seem quite helpful here.

@ziaptos (really name please in your github?): macros is a bad idea for code readability unless it is one of the predefined common ones (like matches!, write!, etc.)

@wrwg I actually didn't realize my name was not there. But yes - I was thinking of the pre-defined macros. Also seemed like the crustaceans like more their macros (unlike C/C++ macros that are, indeed - evil)

ziaptos · 2023-07-28T18:25:39Z

third_party/move/move-binary-format/src/file_format.rs

    pub const MAIN_INDEX: FunctionDefinitionIndex = FunctionDefinitionIndex(0);
+
+    /// Returns the code key of `module_handle`
+    pub fn module_id_for_handle(&self, module_handle: &ModuleHandle) -> ModuleId {


key or id? should be consistent.

dumb question: "What is the purpose of the id/key?" seems like module handle already has all the info.

I think it's there for a historical reason. The module id is used as the key to the storage. That's why the terminolgy get confused a bit.

ziaptos · 2023-07-28T18:26:53Z

third_party/move/move-vm/runtime/src/data_cache.rs


 pub struct AccountDataCache {
-    data_map: BTreeMap<Type, (MoveTypeLayout, GlobalValue)>,
+    data_map: HashMap<Type, (MoveTypeLayout, GlobalValue)>,


+1 for using HashMap (did we make sure it is faster tho, or just assume there are many more reads than writes?)

There's a funny reason behind this: BTreeMap requires key to have Ord whereas HashMap requires Hash. After we add ability information to Type, those ability information need to be erased when querying from the map here.

Based on some of the experiments we did a long time ago, I'd actually not assume HashMap to be faster than BTreeMap. The reason is that for regular inputs, BTreeMap almost never has to perform full comparisons while HashMap requires the key to be fully hashed.

For tree-structured data, we may want to do something to avoid recursive Hash computation. I was just investigating using something like https://docs.rs/by_address/latest/by_address/# for SignatureToken recursive links to avoid having to recompute hashes for subtrees.

But my experiments showed that for sizes < 1000 BTreeMap was better than HashMap for random integer data. Above that HashMap starts to win but still <2x.

HashMap is known to be only good if hashing is cheap, like if it is an integer or such.

@runtian-zhou I do not quite understand what you mean with erasing abillity. Are you saying you are defining a custom hash on Type which ignores ability? This would be kind of an anti-pattern.

@wrwg That's useful to know. In my experience (C++ tho). There was never a good way to predict whether to use Tree or Hash based map - it just depends on so many things.

ziaptos · 2023-07-28T18:33:57Z

third_party/move/move-vm/runtime/src/loader/mod.rs

+        let mut field_tys = vec![];
+        for field in fields {
+            let ty = self.make_type_while_loading(module, &field.signature.0)?;
+            debug_assert!(field_tys.len() < usize::max_value());


This is unlikely to ever happen - there's no way that a vec can contain 2^64 elements in any foreseeable future.

also, nit: use usize::MAX;
https://doc.rust-lang.org/nightly/std/primitive.usize.html#method.max_value

ziaptos · 2023-07-28T18:36:31Z

third_party/move/move-vm/runtime/src/loader/mod.rs

-        Self::make_type_internal(module, tok, &|struct_name, module_id| {
-            Ok(self.resolve_struct_by_name(struct_name, module_id)?.0)
-        })
+        make_type_internal(module, tok)


Seems like there's no need anymore to have an internal version. Just move the implementation here.

edit: I see that make_type_internal is a free function. I guess it can stay in that case - maybe have a better naming (make_type_impl/make_type_implementation). But that's from C++ like conventions I've seen (I'm ok with a different one as long as it is consistent)

ziaptos · 2023-07-28T18:39:54Z

third_party/move/move-vm/runtime/src/loader/mod.rs

-            },
-        };
-        Ok(res)
+        make_type_internal(BinaryIndexedView::Module(module), tok)


this function make_type_while_loading has the same functionality as make_type. If we need to preserve the APIs for some time - I'd just call make_type here - and add a comment that this is deprecated, or something similar.

ziaptos · 2023-07-28T18:43:06Z

third_party/move/move-vm/runtime/src/loader/mod.rs

-    ) -> PartialVMResult<(CachedStructIndex, Arc<StructType>)> {
-        match self
-            .modules
+    ) -> PartialVMResult<Arc<StructType>> {


Checkout the comment above :)
Please fix the comment, or even better, just remove it, as the function name/arguments/return type convey the exact same information.

ziaptos · 2023-07-28T18:50:35Z

third_party/move/move-vm/runtime/src/loader/mod.rs

-            })
-            .collect::<PartialVMResult<Vec<_>>>()
-            .map_err(|err| err.finish(Location::Undefined))?;
+        let return_ = func.return_types.clone();


nit: just call it ret, avoid inconsistent trailing underscores.

ziaptos

Just some low/mid level comments. I'm doing more passes and will add more.

In any case - great job!

vgao1996

1/n Changes to move-vm/types look good

vgao1996 · 2023-08-04T22:05:43Z

third_party/move/move-vm/types/src/loaded_data/runtime_types.rs

+        #[derivative(PartialEq = "ignore", Hash = "ignore")]
+        base_ability_set: AbilitySet,
+        #[derivative(PartialEq = "ignore", Hash = "ignore")]
+        phantom_ty_args_mask: Vec<bool>,


Nice trick! Do you also need to tell it to ignore these fields for Ord?

That's why I changed the BTreeMap to HashMap! So it seems that this derivative crate can't handle Ord very well.. https://mcarton.github.io/rust-derivative/latest/cmp.html

Then shall we also remove #[derive(Ord, PartialOrd)]?

True. Lemme take a look.

BTW the way I implement the struct type here might still make us vulnerable to long identifier attack. I'll use Arc instead of cloning to avoid memory consumption.

vgao1996

Methodology-wise this looks good to me. Here is some early feedback.

Still need to take a closer look at loader/mod.rs.

vgao1996 · 2023-08-08T17:03:19Z

third_party/move/move-vm/runtime/src/data_cache.rs


 pub struct AccountDataCache {
-    data_map: BTreeMap<Type, (MoveTypeLayout, GlobalValue)>,
+    data_map: HashMap<Type, (MoveTypeLayout, GlobalValue)>,


Based on some of the experiments we did a long time ago, I'd actually not assume HashMap to be faster than BTreeMap. The reason is that for regular inputs, BTreeMap almost never has to perform full comparisons while HashMap requires the key to be fully hashed.

vgao1996 · 2023-08-08T17:07:21Z

third_party/move/move-vm/runtime/src/loader/type_loader.rs

+            }
+        },
+        SignatureToken::StructInstantiation(sh_idx, tys) => {
+            let type_parameters: Vec<_> = tys


Nit: these should be called args, not parameters.

or "actual parameters". Perhaps "actual_type_parameters". But "type_args" is better.

vgao1996 · 2023-08-08T17:11:30Z

third_party/move/move-vm/runtime/src/loader/type_loader.rs

+                name: Arc::new(StructName {
+                    name: struct_name.to_owned(),
+                    module: module_id,
+                }),


Is there a particular reason why we're moving away from the index based type representation here? The creation of the struct name seems pretty expensive, as you need to make multiple allocations.

NVM I understand it now. It's because we no longer have a global struct table.

vgao1996

Overall this looks good to me. Just need to be careful with cloning identifiers.

runtian-zhou · 2023-08-25T21:08:08Z

Update: Added a randomized testing strategy to make sure the loader is handling function invokations properly.

The idea is to use proptest to generate a DAG that will represent the dependncy relationship between modules. Each module will invoke the designated function in its succesor module and make sure that the execution result is as expected. With this we can make sure the loader is resolving dependencies correctly.

zekun000 · 2023-08-31T23:30:40Z

third_party/move/move-vm/runtime/src/interpreter.rs

-                .as_ref()
-                .ok_or_else(|| { PartialVMError::new(StatusCode::VM_MAX_VALUE_DEPTH_REACHED) })?
-                .solve(&ty_arg_depths))
+            let formula = resolver.loader().calculate_depth_of_struct(*si)?;


what's the motivation of moving the cache from inline to type cache?

The motivation is to get ready for module invalidation. Depth formula is a property that could only be calculated by fetching all transitive deps. With this PR I'd like to make module resolution to be not dependent on the deps so that's why I moved it.

third_party/move/move-vm/runtime/src/loader/mod.rs

zekun000 · 2023-09-01T20:30:44Z

aptos-move/framework/src/module_metadata.rs

    }
 }

+thread_local!(static METADATA_CACHE: RefCell<HashMap<Vec<Metadata>, Option<RuntimeModuleMetadataV1>>> = RefCell::new(HashMap::new()));


this is awful

Is it possible we could have a concurrent map of some sort and share across threads? What are the constraints that causes you to use thread_local? This does seem awkward to use, though I guess Rust isn't Java with a simple ConcurrentHashMap.

agreed, hope we can avoid it in this PR

The problem is not so much the thread local (those are a beautiful concept) but that we hash over the entire content of the metadata. So the key to this map here can be kilobytes in size!

This cache really belongs into the loader, where metadata should be treated as a first class citizen. Or did we fully removed the possibility to cache derived information from the loaded module data? This would be bad. We will sooner or later need to be able to derive info not only for metadata but also other things. I thought the design which was discussed last fall has the option to deal with derived data.

I think the proper way to do this is to attach the derived data to the Module/Executable lifetime but probably require extra work

we all agreed to get this out :)

This is a bit of disappointing, because the major ask we had already last fall to @gelash and to Dario at this point was to come up with an approach to cache derived information for loaded modules. Now we are saying this is postponed to some later work. In fact this PR REMOVES mostly previous caching (the type and function indices), so I'm suspicious you don't actually have any design for this in the first place.

What do you mean by this? The metadata cache here is more of just for performance optimization that might be affected by this PR so it's an orthogonal issue. We definitely want to be able to cache the metadata alongside the resolved module but that's some future work. I tried a bit previously but it became really ugly.

third_party/move/move-vm/runtime/src/loader/modules.rs

github-actions · 2023-09-26T19:07:29Z

✅ Forge suite `compat` success on `aptos-node-v1.6.2` ==> `d600382f14e4a29ddb76731e67c7dd54d373b08d`

Compatibility test results for aptos-node-v1.6.2 ==> d600382f14e4a29ddb76731e67c7dd54d373b08d (PR)
1. Check liveness of validators at old version: aptos-node-v1.6.2
compatibility::simple-validator-upgrade::liveness-check : committed: 3957 txn/s, latency: 7597 ms, (p50: 6000 ms, p90: 10600 ms, p99: 30800 ms), latency samples: 178100
2. Upgrading first Validator to new version: d600382f14e4a29ddb76731e67c7dd54d373b08d
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1735 txn/s, latency: 15958 ms, (p50: 19200 ms, p90: 22200 ms, p99: 23200 ms), latency samples: 92000
3. Upgrading rest of first batch to new version: d600382f14e4a29ddb76731e67c7dd54d373b08d
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1038 txn/s, latency: 15779 ms, (p50: 18800 ms, p90: 22200 ms, p99: 22900 ms), latency samples: 91376
4. upgrading second batch to new version: d600382f14e4a29ddb76731e67c7dd54d373b08d
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3678 txn/s, latency: 8737 ms, (p50: 9700 ms, p90: 12400 ms, p99: 12800 ms), latency samples: 143460
5. check swarm health
Compatibility test for aptos-node-v1.6.2 ==> d600382f14e4a29ddb76731e67c7dd54d373b08d passed
Test Ok

Grafana dashboard
Humio Logs
Test runner output
Test run is land-blocking

github-actions · 2023-09-26T19:10:46Z

✅ Forge suite `realistic_env_max_load` success on `d600382f14e4a29ddb76731e67c7dd54d373b08d`

two traffics test: inner traffic : committed: 6208 txn/s, latency: 6307 ms, (p50: 6000 ms, p90: 8000 ms, p99: 15300 ms), latency samples: 2694460
two traffics test : committed: 100 txn/s, latency: 2894 ms, (p50: 2500 ms, p90: 3300 ms, p99: 12700 ms), latency samples: 1820
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.227, avg: 0.212", "QsPosToProposal: max: 0.332, avg: 0.173", "ConsensusProposalToOrdered: max: 0.664, avg: 0.618", "ConsensusOrderedToCommit: max: 0.466, avg: 0.449", "ConsensusProposalToCommit: max: 1.111, avg: 1.067"]
Max round gap was 1 [limit 4] at version 1172354. Max no progress secs was 3.95191 [limit 10] at version 1172354.
Test Ok

Grafana dashboard
Humio Logs
Test runner output
Test run is land-blocking

github-actions · 2023-09-26T19:20:11Z

✅ Forge suite `framework_upgrade` success on `aptos-node-v1.5.1` ==> `d600382f14e4a29ddb76731e67c7dd54d373b08d`

Compatibility test results for aptos-node-v1.5.1 ==> d600382f14e4a29ddb76731e67c7dd54d373b08d (PR)
Upgrade the nodes to version: d600382f14e4a29ddb76731e67c7dd54d373b08d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 4267 txn/s, latency: 4850 ms, (p50: 4800 ms, p90: 7800 ms, p99: 8700 ms), latency samples: 238960
5. check swarm health
Compatibility test for aptos-node-v1.5.1 ==> d600382f14e4a29ddb76731e67c7dd54d373b08d passed
Test Ok

Grafana dashboard
Humio Logs
Test runner output
Test run is land-blocking

* Rename variant * [move-vm] Store ability in runtime type * fixup! [move-vm] Store ability in runtime type * fixup! fixup! [move-vm] Store ability in runtime type * fixup! fixup! fixup! [move-vm] Store ability in runtime type * [move_vm] Split loader into multiple files * [move_vm] Cache struct type for modules * [move_vm] Move depth cache to type cache * [move-vm] Use name to replace type index * [move-vm] Remove function index * [move-vm] Inline functions * [move-vm] Remove cached index from definition * [move-vm] Check cross module linking before loading * [move-vm] Remove global struct cache * [move-vm] Inline struct name to avoid excessive memory allocation * [move-vm] Split function in loader * [move-vm] Split out module cache * [e2e-test] Add randomized test for loader * Fix Lint * [move-vm] Cache signature resolution * [move-vm] Removed unneeded signature token * [move-vm] Arc-ed type argument * Fix Zekun's comments * fixup! [e2e-test] Add randomized test for loader * add comments for the test strategy * Rename struct name * More renaming * Addressing more comments

@main

* Update system-integrators-guide.md (aptos-labs#10087) The old url file doesn't exist anymore. must be changed to the new one. * [dashboards] sync grafana dashboards * [typo*] (aptos-labs#9912) * [typo*]Update delegation-pool-operations.md * [typo*]Update run-a-fullnode-on-gcp.md * [typo*]Update run-a-fullnode-on-gcp.md * [typo*]Update glossary.md * [Spec] Ensures for stake.move (aptos-labs#9700) * 1 * cannot finish hp * remove some wrong statements * hp1-3 * rewrite hp2 * rewrite hp2 again * hp1-3 * init * fix * fix ensures * fix ensure in the update_sat * fix comment * fix md * fix new comment * update comment * fix indent * fix timeout * fix timeout * fix timeout --------- Co-authored-by: chan-bing <zzywullr@gmail.com> * Update fullnode-source-code-or-docker.md (aptos-labs#9914) * [typo*]Update index.md (aptos-labs#9913) * [typo*]Update index.md * Update index.md --------- Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> * [link*Update fullnode-source-code-or-docker.md (aptos-labs#9978) * Update aptos-bitvec/src/lib.rs (aptos-labs#10045) There is a mistake in the comment * Update delegation-pool-operations.md (aptos-labs#10171) remove deprecated function * trivial: fix rocksdb property reporter * forge: ability to override resource ask in Rust * [CLI] Add txn stream to local testnet (aptos-labs#10101) * [dashboards] sync grafana dashboards * [TS SDK V2] Add `General` api class (aptos-labs#10185) * add general api queries * add general api queries * [move] make vm clonable * Pass down resolver to MoveVmExt::new() In preparation for caching a warm VM * Move VM warm up into adapter * WarmVmCache * Fix circular dependency * [object] refactor burn/unburn (aptos-labs#9785) * Fix Issue 9717 by breaking critical edges in CFG in move-compiler (v1) (aptos-labs#10064) Fix issue aptos-labs#9717 by breaking critical edges in the CFG in move-compiler (v1). While issue 9717 mentions "inline", this is a red herring as inlining a vector loop just makes it more likely to have a reference parameter on the stack, which leads to a failure to correctly drop dead references in move-compiler/src/cfgir/liveness/mod.rs. I've left the initial test case bug_9717.move and added bug_9717_looponly.move that just has a loop illustrating the problem, along with several variants on the loop (one of them, break2, also exhibiting the problem). Anyway, the failure to properly place the reference drop happens when the constructed CFG has critical edges (edges from a node with multiple outgoing edges to a node with multiple ingoing edges). Such a situation is well-known in compiler literature to make it difficult to place instructions precisely based on certain analyses. Fortunately, this seems to be easily fixed by adding a small pass to add a node on each such critical edge, so that the drop can be placed properly even in the case of a break. Unfortunately, later passes can't deal with the resulting deep expression trees and chains of direct jumps that result in some cases, so I had to fix hlir/translate.rs to reduce stack depth for a given expression, and cfgir/optimize/inline_blocks.rs to properly remove the unneeded jumps. New tests also reveal misplaced warnings/errors about unused variables in the presence of inlining, so that was also fixed. * [Doc] Update build e2e dapp docs (aptos-labs#10165) * Update build e2e dapp docs * update js -> jsx * update * Update developer-docs-site/docs/tutorials/build-e2e-dapp/4-fetch-data-from-chain.md * f * f --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> * implement memoize high order function (aptos-labs#10110) * [Executor] Metadata and exists support (in Block-STM/executor) (aptos-labs#10170) * [move-vm] Refactor Loader (aptos-labs#9320) * Rename variant * [move-vm] Store ability in runtime type * fixup! [move-vm] Store ability in runtime type * fixup! fixup! [move-vm] Store ability in runtime type * fixup! fixup! fixup! [move-vm] Store ability in runtime type * [move_vm] Split loader into multiple files * [move_vm] Cache struct type for modules * [move_vm] Move depth cache to type cache * [move-vm] Use name to replace type index * [move-vm] Remove function index * [move-vm] Inline functions * [move-vm] Remove cached index from definition * [move-vm] Check cross module linking before loading * [move-vm] Remove global struct cache * [move-vm] Inline struct name to avoid excessive memory allocation * [move-vm] Split function in loader * [move-vm] Split out module cache * [e2e-test] Add randomized test for loader * Fix Lint * [move-vm] Cache signature resolution * [move-vm] Removed unneeded signature token * [move-vm] Arc-ed type argument * Fix Zekun's comments * fixup! [e2e-test] Add randomized test for loader * add comments for the test strategy * Rename struct name * More renaming * Addressing more comments * [consensus] Dedicated channel for proposal buffering and batch process * revert 9ed1da8 (aptos-labs#10256) * update move run to include more params (aptos-labs#9932) * [CLI] Update CLI binary doc site to include openssl3 instruction (aptos-labs#9964) * Update CLI binary doc site to include openssl3 instruction * Update developer-docs-site/docs/tools/aptos-cli/install-cli/download-cli-binaries.md Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> * Update developer-docs-site/docs/tools/aptos-cli/install-cli/download-cli-binaries.md Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> --------- Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> * [dag] split notifier into Order and Proof Notifier [dag] additional ledger info verification checks [dag] separate out highest committed round provider [dag] introduce a ledger info provider trait * [CLI] Fix faucet component of local testnet without --force-restart (aptos-labs#10214) * Update Docker images (aptos-labs#10208) Co-authored-by: gedigi <gedigi@users.noreply.github.com> * [ts-sdk-v2] Add .npmrc to ensure pnpm lockfile is consistent (aptos-labs#10251) * Update dependabot.yml to disable version update PRs (aptos-labs#10249) * CLI version bump to 2.1.1 (aptos-labs#10272) * Get Drand Move example to compile (aptos-labs#10196) * change names of drand and veiledcoin * drand compiles * drand tests compile * remove diff.txt * add retrieve lottery winner function * Update CLI dry run text (aptos-labs#10273) * add partial transaction queries (aptos-labs#10275) * [TSSDKv2][1/n] add Ed25519 classes (aptos-labs#10157) * add PrivateKey, PublicKey, and Signature classes * Add helper.ts and fixes based on comments * add asymmetric_crypto file which includes all crypto base classes as abstract. And have all concrete crypto classes to extend from based abstract classes * Update crypto classes name to include Ed25519 prefix * [TS SDK V2] Add Indexer account queries (aptos-labs#10216) * implement account indexer queries * add indexer account api queries * address feedback * [TSSDKv2][3/n] Account and AuthenticationKey Classes (aptos-labs#10210) * Add helper.ts and fixes based on comments * add asymmetric_crypto file which includes all crypto base classes as abstract. And have all concrete crypto classes to extend from based abstract classes * Update crypto classes name to include Ed25519 prefix * fixes based on comments * Add account and auth key classes * update Account class to accept abstract PublicKey and PrivateKey * update methods' comment * [ts-sdk-v2] Add getTransactions to the sdk-v2 transaction API (aptos-labs#10288) * Add getTransactions to the sdkv2 transaction API * Update ecosystem/typescript/sdk_v2/src/internal/transaction.ts Co-authored-by: Maayan <maayan@aptoslabs.com> * update after merge --------- Co-authored-by: Greg Nazario <greg@gnazar.io> Co-authored-by: Maayan <maayan@aptoslabs.com> * [aptos-stdlib] Cleanup error codes for divide by 0 * [marketplace-example] Fix royalties edge conditions There were two bugs with royalties and listings. One was that in v1, royalties could have a denominator 0, or be greater than 100% in legacy NFTs. This means that it's possible that payouts don't work correctly, or take more than expected. Now, royalties are bounded to 0-100%. The second issue was that commission was taken after royalties, but didn't consider that royalties could be 100%. Now, royalties are taken first, and commission is taken out of the remainder. This does mean the marketplace may not have any commission if the royalties are set to 100%. * [Spec] Fix spec (aptos-labs#10215) * fix spec * staking_contract spec * [Rust] Upgrade to Rust version 1.72.1 * [Forge][Chaos] remove jitter, make inter-region BW 300 Mbps (aptos-labs#10277) ### Description The changes are based on observations while measuring network performance and reading more into the dataset used. * Jitter: My understanding is that re-ordering of packets should be pretty rare in the real world, while the previous jitter configs would introduce re-ordering quite frequently. Unless we have a strong belief that jitter is present in our networks, we shouldn't mess with this. * Inter-region bitrate: The numbers this was based on are iperf with a single TCP stream. The results correlate strongly with RTT, which suggests that RTT is the limiting factor, so there's no real reason to constrain BW itself. For now, 300 Mbps is as fast as our network stack will go for 100+ ms RTT. * [Forge][PFN] remove epoch changes from some tests (aptos-labs#10278) ### Description This is in an effort to reduce noise in the PFN tests. Epoch changes are an obvious source of noise, although it's unclear how much it contributes to noise difference between runs. To still have coverage of epoch changes, the tests without chaos will still do 5 minute epoch changes. ### Test Plan Run ad-hoc forge run, observe no epoch changes. * [Forge][netbench] split into large and small messages for two region test (aptos-labs#10283) ### Description We found that the throughput for large and small messages is very different with large latencies. We update the test to run with large messages and then split out a small messages test. The large messages are useful for sanity checking the network setup, the small messages is something that we can hopefully improve upon. * [crypto] add secp256k1 support to aptos-crypto Pretty straight forward, but I think our crypto apis are really not great. The amount of traits that should really be compressed down into a single trait for PublicKey, PrivateKey, and Signature. This would make maintaining and adding new libraries so much easier. * [types] Add support for secp256k1 authenticator with this we can now send secp256k1 signed transactions to the blockchain... I'm going to do some code refactoring in authenticators and transactions before resuming the end-to-end testing and the feature gating of this feature. * [openapi] update openapi with secp256k1 * [types] remove AuthenticationKeyPreimage This was adding extra code and adding complexity to what is already a complex space. If we aren't going to use the preimage, we have no need to write the code. We need to be more dilligent about removing this type of unnecessary code from the codebase, because it really impairs our ability to move fast. * [types] remove prefix and rename derived_address on AuthenticationKey * AuthenticationKey and Address are 1:1, so all this code is legacy based upon some weird goal of trying to compress the account address into an insecure size back in Libra. Even the authors of this code have since moved to 32-bytes in their own blockchain. * prefix is never used and removed. * derived_address -> account_address because there's no derivation it is literally 1:1 * [api] end to end test for secp256k1 ecdsa * [features] add secp256k1 ecdsa * [Aptos Data Poller] Improve peer polling logic. * [Aptos Data Client] Update tests for new poller. * [exp] make it work for any upstream repo (aptos-labs#10016) * [indexer-grpc] k6 loadtest (aptos-labs#9493) * [Aptos Data Client] Improve selection for optimistic fetch and subscriptions. * [Aptos Data Client] Add new tests for peer selection logic. * Make InMemoryStateCalculatorV2 work with state sync chunks. (aptos-labs#10263) * [release-builder] Increase lockup before executing proposals Increase the lockup before executing proposals Currently the release flow started consistently failing because we need to increase the lockup This ensures that the lockup is sufficient before executing transactions Test Plan: not sure how to test other than by running against testnet??? Please advise * [Network] Replace RwLock with ArcSwap for trusted peers. * [Network] Reduce lock contention for peer metadata using cache. * replay-verify.yaml not reference workflows by @main * replay-verify: not cancel other sub-jobs on first failure * recalibrate single node benchmark for perf regression aptos-labs#10298 * [move-model] fix internal assertion violation in definition analysis (aptos-labs#10292) Co-authored-by: Aalok Thakkar <aalok@Aaloks-MacBook-Pro.local> * [ts-sdk-v2] Rename endpoint to path in client * [ts-sdk-v2] Rename originMethod to name for API requests * [ts-sdk-v2] Simplify fullnode get requests * [ts-sdk-v2] Make Post requests simpler * [ts-sdk-v2] Update format and lint for SDK This fixes most of the SDK lints except for the pieces around the static functions for deserialize. * [ts-sdk-v2] Fix broken tests * [ts-sdk-v2] Replace "Generic error" * [ts-sdk-v2] Add derivation path invalid test * [ts-sdk-v2] Cleanup Account and crypto for code reuse 1. Allows deriving public key from private key 2. Detaches accounts from Single Ed25519 3. Adds some consistency to input naming * [ts-sdk-v2] Unify authentication key creation * [ts-sdk-v2] Add Authentication Key scheme enum * [ts-sdk-v2] Add ability to derive authkeys from non-key schemes * [ts-sdk-v2] Add docs to AuthenticationKey * [ts-sdk-v2] Cleanup documentation on Ed25519 * [ts-sdk-v2] Add documentation to asymmetric crypto * [ts-sdk-v2] Add docs, cleanup multi-ed25519 * [ts-sdk-v2] Move paginate with cursor to client * [ts-sdk-v2] Cleanup unused lint ignores * [ts-sdk-v2] Authentication key testing and message improvements * [ts-sdk-v2] Rename some client inputs and add documentation * [dag] epoch manager integration; dag is here * [dashboards] sync grafana dashboards * fix WarmVmCache Natives are stateful so must be covered by the WarmVmId. 1. add TimedFeaturesBuilder and convert TimedFeatures to an array of booleans. 2. add SafeNativeBulder::id_bytes() to fix the bug 3. rebuild vm (and hence all the natives) only when the builder id_bytes() change. * use s5cmd for downloading files in replay-verify official aswcli experiencing random crashes * [e2e-tests] Fix compilation error (aptos-labs#10318) * Drop frozen root after make checkpoint. (aptos-labs#10327) * redistribute mainnet replay sub-job ranges * [TS-SDK v2] Updating the `Deserializable` interface and making `Serializable` an abstract class (aptos-labs#10307) * Removing export from Deserializable to facilitate using a static `deserialize` method, fixing error messages in unit tests for multi_ed25519, and changing Serializable to an abstract class that implements `bcsToBytes()`. Removed abstract deserialize from public/private key classes. * Re-adding doc comments * Adding `serialize` and `deserialize` and corresponding unit tests to the AccountAddress class * Fixing multi_ed25519 error messages * Updating doc comments for Deserializable to clarify what its purpose is. * [move unit tests] Run extended checker as part of unit tests (aptos-labs#10309) * [move unit tests] Run extended checker as part of unit tests Closes aptos-labs#9251 This runs the extended checker as part of Aptos unit tests (either our own Rust integrated tests or from the CLI). It uses the same technique as we already used for native extensions specific to Aptos: a hook is defined where additional, move-model based validations can be run. This is hook is then connected to the extended checker when running Aptos tests. The implementation also optimizes the construction of the move model: if that one is already needed by abi generation (which is the default), it is not constructed a 2nd time for the extended checker -- both for the existing build step and the new test step. This should avoid one full additional compilation (source -> bytecode -> model run). * Extended checks until now excluded test code, leading to wrong usage of entry functions and attributes marked as test-only. Because fixing this is a breaking change, this commit adds the behavior to check test code via a new CLI option `--check-test-code`. This flag should eventually become default behavior. Also fixes some reviewer comments. * implement forge links to axiom (aptos-labs#10330) * [gas-calibration][simple] ignore terms with 0-coefficients (aptos-labs#9742) * [indexer][api] update the metrcis for api gateway consumption. (aptos-labs#10322) * [dag] add various counters * [TS-SDK v2] Adding `serializeVector` and `deserializeVector` to SDK v2 (aptos-labs#10347) * Adding `serializeVector` and `deserializeVector` to serializer.ts and deserializer.ts as well as a unit test for each * Adding documentation for each function * Removing redundant line in doc comment * Removing redundant line in deserializer doc comment too * Allow using skip_index_and_usage on state sync code path. (aptos-labs#10303) * [dag] add a few structured logging * jin_fix_ed25519_derive_publickey (aptos-labs#10357) * [ts-sdk-v2] Add MIME types, and convert input types from string to Enum (aptos-labs#10308) * Remove expensive counters (aptos-labs#10188) * [TS SDK V2] Port over transaction types (aptos-labs#10364) * transaction types * address comments * [CLI] Restructure local testnet code (aptos-labs#10252) * [release-builder] Fix increase lockup The previous PR was converting the private key arg to string incorrectly... ED25519 key type has a very silly to_string that returns debug output I suppose this is to prevent dumping the key, but really i think it should just not have a to_string and instead only have a debug method that does the same thing, and a very explicit to hex method so that you dont accidentally dump these. Test Plan: framework upgrade test succeeds, or at least doesnt fail on this error (there is still some underlying flakiness in the forge test itself) * [Sharded-Execution] Fix a race condition while fetching the state values on a shard from a remote stateview (aptos-labs#10320) [Sharded-Execution] Fix a race condition while fetching the state values on a shard from a remote stateview * [SDKv2] derive publickey unit test (aptos-labs#10358) * jin_fix_ed25519_derive_publickey * Add unit test for publicKey() derivate method * [framework] Turn on test-only checking and fix errors (aptos-labs#10368) This turns on running extended checks also on test-only code in the framework unit tests. A few errors discovered this way are fixed. Changes to function declarations in this PR do not effect compatibility because only test-only functions are effected which are stripped before deployment. Related to aptos-labs#10335, but more needs to be done to make this behavior the default. This cannot happen before the next framework release. * Add CI to check for banned CLI deps (aptos-labs#10338) * refactor: replace multiply_then_divide using math64::mul_div (aptos-labs#10047) Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com> * [dag] hardening message verification * [TS SDK V2] Port over account and transaction authenticator, signed transaction type (aptos-labs#10367) * transaction types * address comments * authenticators * previewnet flow in the benchmark (aptos-labs#10305) * [ts sdk-v2] Mini get/post request refactor (aptos-labs#10380) * [Sharded-Execution-GRPC] Add GRPC communication for sharded execution. (aptos-labs#10274) [Sharded-Execution-GRPC] Add GRPC communication for sharded execution In this commit we replace the existing socket based communication (that is message send and message recv) with GRPC. Here we get the basic GRPC reliability. More reliability and better performance to come in subsequent commits * [forge] Fix fullnode override in forge (aptos-labs#10382) ### Description Fixing a mistake made in a previous PR, that made fullnode configs not apply (and overwrite validator configs) ### Test Plan Ran a fullnode test and manually check the config that is logged. * Update delegation-pool-operations.md (aptos-labs#10299) reorganize page * [indexer grpc] update the metrics for usage analysis. (aptos-labs#10383) * Update staking-pool-operations.md (aptos-labs#10300) * Update staking-pool-operations.md reorganize page * Update staking-pool-operations.md add step for owner account * Update staking-pool-operations.md * Update staking-pool-operations.md remove heading --------- Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> * [Storage][Pruner] Set min_readable_version and metrics. (aptos-labs#10381) * rename enum variants and transaction arguments (aptos-labs#10384) * [indexer grpc] More fields to short connection metric (aptos-labs#10385) * [Tutorials] Replacing the old "Your First NFT" tutorial entry function calls to use aptos_token.move and new indexer queries (aptos-labs#9424) * Overwriting the old your-first-nft tutorial in the docs to replace the 0x3 contract calls with the aptos_token contract calls. Also added corresponding documentation tags to referenced indexer queries and the new token client calls. * Changing simple-aptos-token.py to simple_aptos_token.py and updating the Makefile to include it as an example * Changing `create_token` to `mint_token` in python tutorial and `createToken` in typescript to `mint` * Updating typescript example to work correctly if indexer/fullnode chainId aren't in sync * [Python SDK] Use AccountAddress.from_str_relaxed when parsing addresses from the node API --------- Co-authored-by: rtmtree <36580295+rtmtree@users.noreply.github.com> Co-authored-by: rustielin <rustielin@users.noreply.github.com> Co-authored-by: Vladislav ~ cryptomolot <88001005+cryptomolot@users.noreply.github.com> Co-authored-by: Zorrot Chen <UI_Zorrot@163.com> Co-authored-by: chan-bing <zzywullr@gmail.com> Co-authored-by: Freesson ~ cryptomolot <semenikhinas@gmail.com> Co-authored-by: Christian Sahar <125399153+saharct@users.noreply.github.com> Co-authored-by: Jiege <stevekeol.x@gmail.com> Co-authored-by: michelle-aptos <120680608+michelle-aptos@users.noreply.github.com> Co-authored-by: aldenhu <msmouse@gmail.com> Co-authored-by: Daniel Porteous (dport) <daniel@dport.me> Co-authored-by: Maayan <maayan@aptoslabs.com> Co-authored-by: Zekun Li <li.zekun@gmail.com> Co-authored-by: Aaron <lightmark@gmail.com> Co-authored-by: Brian R. Murphy <132495859+brmataptos@users.noreply.github.com> Co-authored-by: Oliver He <heliuchuan@gmail.com> Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com> Co-authored-by: Rati Gelashvili <gelash@users.noreply.github.com> Co-authored-by: runtianz <runtian@aptoslabs.com> Co-authored-by: Jin <128556004+0xjinn@users.noreply.github.com> Co-authored-by: Balaji Arun <balajia@vt.edu> Co-authored-by: Gerardo Di Giacomo <gerardo@aptoslabs.com> Co-authored-by: gedigi <gedigi@users.noreply.github.com> Co-authored-by: Greg Nazario <greg@gnazar.io> Co-authored-by: Michael Straka <mstraka100@gmail.com> Co-authored-by: Teng Zhang <rahxephon89@163.com> Co-authored-by: Josh Lind <josh.lind@hotmail.com> Co-authored-by: Brian (Sunghoon) Cho <brian@aptoslabs.com> Co-authored-by: Rustie Lin <rustie117@gmail.com> Co-authored-by: Guoteng Rao <3603304+grao1991@users.noreply.github.com> Co-authored-by: Perry Randall <perryjrandall@gmail.com> Co-authored-by: igor-aptos <110557261+igor-aptos@users.noreply.github.com> Co-authored-by: aalok-t <140445856+aalok-t@users.noreply.github.com> Co-authored-by: Aalok Thakkar <aalok@Aaloks-MacBook-Pro.local> Co-authored-by: Matt <90358481+xbtmatt@users.noreply.github.com> Co-authored-by: Wolfgang Grieskamp <wg@aptoslabs.com> Co-authored-by: Christian Theilemann <christian@aptoslabs.com> Co-authored-by: Victor Gao <10379359+vgao1996@users.noreply.github.com> Co-authored-by: larry-aptos <112209412+larry-aptos@users.noreply.github.com> Co-authored-by: Sital Kedia <sitalkedia@users.noreply.github.com> Co-authored-by: Manu Dhundi <manudhundi@gmail.com> Co-authored-by: 0xbe1 <0xbetrue@gmail.com> Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>

* Rename variant * [move-vm] Store ability in runtime type * fixup! [move-vm] Store ability in runtime type * fixup! fixup! [move-vm] Store ability in runtime type * fixup! fixup! fixup! [move-vm] Store ability in runtime type * [move_vm] Split loader into multiple files * [move_vm] Cache struct type for modules * [move_vm] Move depth cache to type cache * [move-vm] Use name to replace type index * [move-vm] Remove function index * [move-vm] Inline functions * [move-vm] Remove cached index from definition * [move-vm] Check cross module linking before loading * [move-vm] Remove global struct cache * [move-vm] Inline struct name to avoid excessive memory allocation * [move-vm] Split function in loader * [move-vm] Split out module cache * [e2e-test] Add randomized test for loader * Fix Lint * [move-vm] Cache signature resolution * [move-vm] Removed unneeded signature token * [move-vm] Arc-ed type argument * Fix Zekun's comments * fixup! [e2e-test] Add randomized test for loader * add comments for the test strategy * Rename struct name * More renaming * Addressing more comments

runtian-zhou requested review from gelash, gerben-stavenga and vgao1996 July 26, 2023 20:59

runtian-zhou force-pushed the loader_refactor branch 3 times, most recently from 2a6182f to f2c9ee7 Compare July 27, 2023 21:32

ziaptos reviewed Jul 28, 2023

View reviewed changes

runtian-zhou force-pushed the loader_refactor branch from f2c9ee7 to 343a32a Compare August 4, 2023 21:43

vgao1996 reviewed Aug 4, 2023

View reviewed changes

runtian-zhou mentioned this pull request Aug 8, 2023

[move-vm] Cache ability in runtime types #9184

Closed

vgao1996 reviewed Aug 8, 2023

View reviewed changes

vgao1996 reviewed Aug 18, 2023

View reviewed changes

vgao1996 approved these changes Aug 22, 2023

View reviewed changes

runtian-zhou force-pushed the loader_refactor branch 2 times, most recently from f16e865 to b1dc12a Compare August 28, 2023 20:34

runtian-zhou marked this pull request as ready for review August 28, 2023 20:42

runtian-zhou requested review from davidiw, movekevin, wrwg and zekun000 as code owners August 28, 2023 20:42

runtian-zhou changed the title ~~[Preview][move-vm] Refactor Loader~~ [move-vm] Refactor Loader Aug 28, 2023

runtian-zhou force-pushed the loader_refactor branch 2 times, most recently from 508f5cc to 7d15c1e Compare August 31, 2023 02:21

zekun000 reviewed Sep 1, 2023

View reviewed changes

third_party/move/move-vm/runtime/src/loader/modules.rs Outdated Show resolved Hide resolved

runtian-zhou force-pushed the loader_refactor branch 3 times, most recently from 660f0ac to e5fbafb Compare September 6, 2023 22:21

runtian-zhou added 17 commits September 26, 2023 11:18

[move-vm] Remove cached index from definition

29cc8a2

[move-vm] Check cross module linking before loading

5c70319

[move-vm] Remove global struct cache

5e24b81

[move-vm] Inline struct name to avoid excessive memory allocation

60ef061

[move-vm] Split function in loader

c3711bb

[move-vm] Split out module cache

96ae1ee

[e2e-test] Add randomized test for loader

3b3dcf4

Fix Lint

58cb639

[move-vm] Cache signature resolution

4a26b73

[move-vm] Removed unneeded signature token

4fbb260

[move-vm] Arc-ed type argument

807e3ab

Fix Zekun's comments

846cf3d

fixup! [e2e-test] Add randomized test for loader

b567724

add comments for the test strategy

aa7a383

Rename struct name

8e51a0a

More renaming

84bfb66

Addressing more comments

d600382

runtian-zhou force-pushed the loader_refactor branch from aebbd9e to d600382 Compare September 26, 2023 18:27

runtian-zhou enabled auto-merge (squash) September 26, 2023 18:37

This comment has been minimized.

Sign in to view

runtian-zhou merged commit 13eaf34 into main Sep 26, 2023

runtian-zhou deleted the loader_refactor branch September 26, 2023 19:10

[move-vm] Refactor Loader #9320

[move-vm] Refactor Loader #9320

Uh oh!

Conversation

runtian-zhou commented Jul 26, 2023

Description

Test Plan

Uh oh!

ziaptos left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgao1996 Aug 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ziaptos left a comment

Choose a reason for hiding this comment

Uh oh!

vgao1996 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgao1996 left a comment

Choose a reason for hiding this comment

Uh oh!

vgao1996 Aug 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgao1996 Aug 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vgao1996 Aug 8, 2023 •

edited

Loading

vgao1996 Aug 8, 2023 •

edited

Loading

vgao1996 Aug 8, 2023 •

edited

Loading