这是indexloc提供的服务,不要输入任何密码
Skip to content

scripts(termux_download_deb_pac): optimization and refactoring #24621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

twaik
Copy link
Member

@twaik twaik commented May 4, 2025

Moved duplicated code to a separate function to make code more readable and editable.
Removed invocations of external commands like sed and seq to avoid binary execution overhead, using bash builtins instead.
Replaced all test and [ with [[ to improve speed.
Grouped some variable declarations under single expressions to avoid parsing overhead.
Replaced ${var} with $var in some places to reduce parsing overhead.

@twaik twaik requested a review from Grimler91 as a code owner May 4, 2025 11:46
@twaik twaik requested a review from Maxython May 4, 2025 11:50
@twaik twaik force-pushed the optimize-termux_download_deb_pac branch from 5068f94 to eb6f7dc Compare May 4, 2025 11:53
@TomJo2000
Copy link
Member

Moved duplicated code to a separate function to make code more readable and editable. Removed invocations of external commands like sed and seq to avoid binary execution overhead, using bash builtins instead. Replaced all test and [ with [[ to improve speed.

Makes sense.

Grouped some variable declarations under single expressions to avoid parsing overhead.

That's probably not worth worrying about but go ahead.

Replaced ${var} with $var in some places to reduce parsing overhead.

Since when does that cause additional overhead?
I'm gonna need to see some benchmarks for that one in particular.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

Since when does that cause additional overhead?
I'm gonna need to see some benchmarks for that one in particular.

$ time for ((i=0; i<10000000; i++)); do [[ "$PKG_HASH" != "null" ]]; done
real    0m13.239s
user    0m13.238s
sys     0m0.001s

$ time for ((i=0; i<10000000; i++)); do [[ "${PKG_HASH}" != "null" ]]; done
real    0m14.320s
user    0m14.318s
sys     0m0.001s

@TomJo2000
Copy link
Member

Since when does that cause additional overhead?
I'm gonna need to see some benchmarks for that one in particular.

$ time for ((i=0; i<10000000; i++)); do [[ "$PKG_HASH" != "null" ]]; done
real    0m13.239s
user    0m13.238s
sys     0m0.001s

$ time for ((i=0; i<10000000; i++)); do [[ "${PKG_HASH}" != "null" ]]; done
real    0m14.320s
user    0m14.318s
sys     0m0.001s

10 million iterations.
So we are literally talking about microseconds here.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

Yes. Currently it does not matter but I am planning to parralelize packages downloading and after that this optimization will be more effective.

@TomJo2000
Copy link
Member

TomJo2000 commented May 4, 2025

Yes. Currently it does not matter but I am planning to parralelize packages downloading and after that this optimization will be more effective.

My local testing puts the per-instance difference of $var vs ${var} at about 6 microseconds.
Or 0.000006 seconds, or 0.006 milliseconds.

This codepath is nowhere near hot enough to apply that level of pedantry.
Even after you parallelize it.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

So we are literally talking about microseconds here.

I am planning to write a cron script for package building validation.
We have ~>2500 packages, which are being built for 4 archs.
So if I optimize our buildsystem scripts to save at least 5 seconds I will save at least one calendar day during each build validation attempt. From this point of view it becomes more valuable, right?

@TomJo2000
Copy link
Member

TomJo2000 commented May 4, 2025

I am planning to write a cron script for package building validation. We have ~>2500 packages, which are being built for 4 archs. So if I optimize our buildsystem scripts to save at least 5 seconds I will save at least one calendar day during each build validation attempt. From this point of view it becomes more valuable, right?

4 arches. Those run in parallel.
2500-ish iterations each.
0.006ms saved on each.

We're saving 15ms (Milliseconds, not seconds), for each ${var} you turn into a $var.
There's not enough ${var}'s in the repo to make that worth while.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

Ok, I'll revert this one after getting home.

@Maxython
Copy link
Member

Maxython commented May 4, 2025

I did a quick test of these changes on pacman package loading, and I can say that the current changes break pacman package loading completely.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

I tried to build pacman packages in termux-packages repo and everything was fine, but it seems like it downloaded packages from debian repos. Does pacman-packages-related building require pacman-packages's version of repo.json?

@Maxython
Copy link
Member

Maxython commented May 4, 2025

I tried to build pacman packages in termux-packages repo and everything was fine, but it seems like it downloaded packages from debian repos. Does pacman-packages-related building require pacman-packages's version of repo.json?

For testing you need to reconfigure repo.json to the termux-pacman repositories (service) - https://github.com/termux-pacman/termux-packages/blob/master/repo.json

@twaik twaik force-pushed the optimize-termux_download_deb_pac branch 4 times, most recently from 02da454 to 1d62bb0 Compare May 4, 2025 16:33
@agnostic-apollo
Copy link
Member

Note that this has been rewritten in my repo.json changes pull including separate functions and any changes will be replaced in coming weeks. This also applies to most of the other dependencies/repo code. So you might want to hold off till after.

@twaik
Copy link
Member Author

twaik commented May 4, 2025

When exactly do you want to merge your PRs? And maybe you have links to your PRs?

@agnostic-apollo
Copy link
Member

Those changes are not final yet, so haven't pushed. I will working on them after current Termux app and keychain releases, when I work on the new bootstraps. There would be one for docker, another for repo.json (pull is at #21181 but changes there are outdated and different from my local changes) and another for other refactoring of the build system (some changes are at agnostic-apollo@8ab564a).

Instead of doing major refactoring for structuring, if there are any high latency parts of the code taking significant seconds (not micro or mill seconds), feel free to make changes for that, I will handle any conflicts against those.

Moved duplicated code to a separate function to make code more readable and editable.
Removed invocations of external commands like `sed` and `seq` to avoid binary execution overhead, using bash builtins instead.
Replaced all `test` and `[` with `[[` to improve speed.
Grouped some variable declarations under single expressions to avoid parsing overhead.
@twaik twaik force-pushed the optimize-termux_download_deb_pac branch from 1d62bb0 to 752d705 Compare May 4, 2025 17:14
@twaik
Copy link
Member Author

twaik commented May 4, 2025

Instead of doing major refactoring for structuring, if there are any high latency parts of the code taking significant seconds (not micro or mill seconds), feel free to make changes for that, I will handle any conflicts against those.

These changes are preparation for making parallel downloading of dependencies, but refactoring old code at least to get rid of duplicates is good too.

taking significant seconds (not micro or mill seconds)

Parallel downloads can save minutes of CI time during building big packages or packages with big amount of dependencies, i.e. firefox, thunderbird or kdenlive.

BTW do our servers have any kind of ratelimiting, preventing parallel downloads of packages or anything similar?

@twaik
Copy link
Member Author

twaik commented May 4, 2025

@Maxython now pacman-packages should be fine too.

return 1
fi

termux_download "${TERMUX_REPO_URL[${idx}-1]}/${PKG_PATH}" \
termux_download "${TERMUX_REPO_URL[${idx}]}/${PKG_PATH}" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always wondered about this, but, there is a set of very similar code in the file generate-bootstraps.sh. Do you think it would be appropriate or possible to use some of these optimizations to also optimize that, by making that script import and use the same code that this script is using?

curl --fail --location --output "$package_tmpdir/package.deb" "$package_url"

That script already sources properties.sh, so as I understand it, that means that it can already only work inside the Docker container, so there would be no meaningful downside to making it also source other files, especially if it would result in optimization and DRY.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new bootstrap will use new repo.json format and will share code with build-package.sh. I also also removed the duplicate code changed above. There are lot of other issues to handle in current code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, I didn't know that you already changed generate-bootstraps.sh in the future version you mentioned.

I have been using code that I heavily copied from generate-bootstraps.sh somewhere else as well, with a comment about where it came from, so I'll want to pause that and wait for the new generate-bootstraps.sh to come so I can keep my copied implementation mostly up-to-date with the reference implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${TERMUX_REPO_URL[${idx}-1]}

It is related to using seq instead of iterating over array keys.

Comment on lines +64 to +66
read -d "\n" PKG_PATH PKG_HASH <<< "$(jq -r '.["'"$PACKAGE"'"] |
select((.VERSION == "'"$VERSION_PACMAN"'" or "'"$TERMUX_WITHOUT_DEPVERSION_BINDING"'" == "true") and .SHA256SUM and .FILENAME) |
"'"$arch"'/\(.FILENAME) \(.SHA256SUM)"' "$pkg_file_path")" || :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the following changes to improve the readability of jq algorithms:

Suggested change
read -d "\n" PKG_PATH PKG_HASH <<< "$(jq -r '.["'"$PACKAGE"'"] |
select((.VERSION == "'"$VERSION_PACMAN"'" or "'"$TERMUX_WITHOUT_DEPVERSION_BINDING"'" == "true") and .SHA256SUM and .FILENAME) |
"'"$arch"'/\(.FILENAME) \(.SHA256SUM)"' "$pkg_file_path")" || :
read -d "\n" PKG_PATH PKG_HASH <<< "$(jq -r ".[\"$PACKAGE\"] |
select((.VERSION == \"$VERSION_PACMAN\" or \"$TERMUX_WITHOUT_DEPVERSION_BINDING\" == \"true\") and .SHA256SUM and .FILENAME) |
\"${arch}/\"+.FILENAME, .SHA256SUM" "$pkg_file_path")" || :

}

termux_download_deb_pac_get_pkg_hash_and_version() {
local arch="$1" pkg_file_path="$TERMUX_COMMON_CACHEDIR-$1/$PACKAGE_FILE_PATH" PKG_PATH="" PKG_HASH=""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arch variable in the termux_download_deb_pac_get_pkg_hash_and_version() function is used only once, although it could be used in the pkg_file_path variable, but because they are in the same line, it is not possible to do so. Perhaps you should move the arch variable before the pkg_file_path variable or remove the arch variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants