-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
scripts(termux_step_get_dependencies): parallelizing deps downloading #24698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
6bb1b18 to
a95ff5b
Compare
robertkirkman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this a bit more and I have not found any other problems with this so far. I have also tested on-device building using this, and it seems to work successfully there too.
There is a slight concern that spawning too many processes could make devices affected by Phantom Process Killers (both Android 12+ AOSP-based ones, and proprietary ones varying by vendor) more likely to kill Termux, but: they already would do that a lot during build-package.sh for many packages on affected devices long before now, so it can be assumed that people using build-package.sh on-device have probably already set up any mitigations available for the problem on their device ("disable monitor phantom processes", "disable battery optimizations", "disable child process restrictions", "lawnchair nightly app", etc.) so I think it is best not too worry about that too much, it affects too many other situations already to worry about it.
thunder-coding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, seems to work perfectly. Tested it with a few packages and the reduction in build times are impressive. I have pointed out some nitpicks
| if ! wait -n && (( $? == 121 )); then | ||
| echo some job returned 121, exiting | ||
| # One of jobs exited with fatal error, we should return error too and kill all background jobs | ||
| kill $(jobs -p) 2>/dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to let them finish, why waste bandwidth on files that are already partially downloaded, we could simply wait for them to finish. Once they finish we can display the fatal error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is pretty much pointless to let them finish in CI since they will not be used. Should I add [[ "${CI-false}" == "true" ]] condition instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense. Go ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, can you check if it is fine? Testing is fine, but I want to know if you are fine with my implementation.
| done < <(./scripts/buildorder.py $([[ "${TERMUX_INSTALL_DEPS}" == "true" ]] && echo "-i") "$TERMUX_PKG_BUILDER_DIR" $TERMUX_PACKAGES_DIRECTORIES || echo "ERROR") | ||
|
|
||
| while [[ -n "$(jobs -p)" ]]; do | ||
| if ! wait -n && (( $? == 121 )); then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't hardcode the 121 error code, store it as TERMUX_PARALLEL_DEPENDENCIES_FATAL_EXIT_CODE or some similar variable name (if you can come up with something better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a workaround suggested and used by @agnostic-apollo.
#24168 (comment)
| return 69 # EX__UNAVAILABLE |
Not the same code though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just asking you to store it as some local variable, I saw the code and got it why it's hardcoded, just store it as maybe $fatal_error_code or similar so that there is no confusion for why a mysterious 69 or 121 is happenning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference.
- https://cs.android.com/android/platform/superproject/+/android-11.0.0_r40:prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.17-4.8/sysroot/usr/include/sysexits.h
- https://tldp.org/LDP/abs/html/exitcodes.html
- https://unix.stackexchange.com/a/242372
##
# Set `sysexits` library default variables.
#
# **See Also:**
# - https://cs.android.com/android/platform/superproject/+/android-11.0.0_r40:prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.17-4.8/sysroot/usr/include/sysexits.h
# - https://tldp.org/LDP/abs/html/exitcodes.html
# - https://unix.stackexchange.com/a/242372
#
#
# `sysexits__set_default_variables`
##
sysexits__set_default_variables() {
### Set Default Variables Start
# The following variables must not be modified unless you know what you are doing.
EX__SUCCESS=0 # Successful termination.
#EX__BASE=64 # Base value for error messages.
EX__USAGE=64 # Command line usage error.
EX__DATAERR=65 # Data format error.
#EX__NOINPUT=66 # Cannot open input.
#EX__NOUSER=67 # Addressee unknown.
#EX__NOHOST=68 # Host name unknown.
#EX__UNAVAILABLE=69 # Service unavailable.
#EX__SOFTWARE=70 # Internal software error.
#EX__OSERR=71 # System error (e.g., can't fork).
#EX__OSFILE=72 # Critical OS file missing.
#EX__CANTCREAT=73 # Can't create (user) output file.
#EX__IOERR=74 # Input/output error.
#EX__TEMPFAIL=75 # Temp failure; user is invited to retry.
#EX__PROTOCOL=76 # Remote error in protocol.
#EX__NOPERM=77 # Permission denied.
#EX__CONFIG=78 # Configuration error.
#EX__MAX=78 # Maximum listed value.
# System errors.
S_EX__FAILED=1 # General failure.
#S_EX__SH=2 # Misuse of shell builtins (according to Bash documentation).
#S_EX__EXEC=126 # Command invoked cannot execute. Permission problem or command is not an executable.
S_EX__NOENT=127 # Command not found.
#S_EX__INVAL=128 # Invalid argument to exit not in `0-255` range.
#128+n # Fatal error signal "n".
#255* # Exit status out of range, exit `-1` exit takes only integer args in the range `0-255`.
#S_EX__HUP=129
#S_EX__INT=130
# Custom errors.
C_EX__UNSUPPORTED=80 # Unsupported action.
C_EX__NOT_FOUND=81 # Data not found.
C_EX__IS_INVALID=82 # Data is invalid.
C_EX__IS_EMPTY=83 # Data is empty.
C_EX__IS_NULL=84 # Data is null.
C_EX__NOOP=85 # No operation.
### Set Default Variables End
SYSEXITS__VARIABLES_SET="true"
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, can you check if it is fine? Testing is fine, but I want to know if you are fine with my implementation.
|
Also, I'd suggest you to allow to limit the download parallelism as some mirrors may rate-limit you. Might be helpful for people using different mirrors or hosting custom repository: Here's the patch I would suggest you to apply (click to expand the message)You can apply the patch by first saving the below diff as If you want, I can also push this commit myself if you want me to (will also rebase against master) From 0bbac3711ed9c2c487d1ce7716091a2ead9b0737 Mon Sep 17 00:00:00 2001
From: Yaksh Bariya <yakshbari4@gmail.com>
Date: Fri, 16 May 2025 14:42:19 +0530
Subject: [PATCH] scripts(termux_step_get_dependencies): allow limiting
parallelism for downloading dependencies
---
scripts/build/termux_step_get_dependencies.sh | 13 +++++++++++++
scripts/build/termux_step_setup_variables.sh | 1 +
2 files changed, 14 insertions(+)
diff --git a/scripts/build/termux_step_get_dependencies.sh b/scripts/build/termux_step_get_dependencies.sh
index d4d26c6ab3..e5c9d6f6a4 100644
--- a/scripts/build/termux_step_get_dependencies.sh
+++ b/scripts/build/termux_step_get_dependencies.sh
@@ -46,9 +46,22 @@ termux_step_get_dependencies() {
fi
fi
+ # Store the current sub-shell PID to check for background jobs later on
+ local ppid="$BASHPID"
if [[ "${dep[build]}" != "true" ]]; then
dep[download]="true"
(
+ # $(jobs -p | wc -l) does not work as we are in a subshell
+ # $TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM is multiplied by 2 as termux_buffered_output will create a new process along with this current code inside the subshell. Thus effectively creating 2 processes
+ while [[ "$(ps --ppid $ppid -o pid --no-headers | wc -l)" -gt $((2 * "$TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM")) ]]; do
+ echo "Waiting for background jobs to finish before downloading ${dep[versioned]}..."
+ # if ! wait -n; then
+ # # Exit with the same code as the last job if anything failed
+ # exit $?
+ # fi
+ # We don't have any other alternative than just sleeping for a second or two as the below won't work as 'wait -n' won't work in a subshell
+ sleep 1
+ done
if ! TERMUX_WITHOUT_DEPVERSION_BINDING="$([[ "${dep[cyclic]}" == "true" ]] && echo "true" || echo "${TERMUX_WITHOUT_DEPVERSION_BINDING}")" termux_download_deb_pac "${dep[name]}" "${dep[arch]}" "${dep[version]}" "${dep[version_pac]}"; then
if [[ "${dep[cyclic]}" == "true" || ( "$TERMUX_FORCE_BUILD_DEPENDENCIES" == "true" && "$TERMUX_ON_DEVICE_BUILD" == "true" ) ]]; then
echo "Download of ${dep[name]}$([[ "${TERMUX_WITHOUT_DEPVERSION_BINDING}" == "false" && "${dep[cyclic]}" == "false" ]] && echo "@${dep[version]}") from $TERMUX_REPO_URL failed" >&2
diff --git a/scripts/build/termux_step_setup_variables.sh b/scripts/build/termux_step_setup_variables.sh
index 0ac9dbffa6..6c3811e606 100644
--- a/scripts/build/termux_step_setup_variables.sh
+++ b/scripts/build/termux_step_setup_variables.sh
@@ -6,6 +6,7 @@ termux_step_setup_variables() {
: "${TERMUX_FORCE_BUILD_DEPENDENCIES:="false"}"
: "${TERMUX_INSTALL_DEPS:="false"}"
: "${TERMUX_PKG_MAKE_PROCESSES:="$(nproc)"}"
+ : "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM:="20"}"
: "${TERMUX_PKGS__BUILD__RM_ALL_PKGS_BUILT_MARKER_AND_INSTALL_FILES:="true"}"
: "${TERMUX_PKGS__BUILD__RM_ALL_PKG_BUILD_DEPENDENT_DIRS:="false"}"
: "${TERMUX_PKG_API_LEVEL:="24"}"
--
2.49.0
|
Typically |
Yeah makes sense to have a flag in build-package.sh |
Main loop of the function will be splitted in more pieces so we should make sure all data will be fetched only once and accessible outside the first loop.
6eea804 to
63caf1a
Compare
Done. |
thunder-coding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also look into the patch I sent for limiting download parallelism. I'd prefer if you use that instead of disabling it entirely. If the user wants to limit download parallelism, they can simply set the download parallelism to 1
build-package.sh
Outdated
| echo " -o Specify directory where to put built packages. Default: output/." | ||
| echo " --format Specify package output format (debian, pacman)." | ||
| echo " --library Specify library of package (bionic, glibc)." | ||
| echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your version of code it will run AFTER all packages are downloaded since you put it in the loop that is being invoked after all wait -n invocations.
Also the point of --disable-dependency-download-parallelizing is to fallback to old behaviour without explicitly add exit handler twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if parallelizing causes problems we should not use it at all.
63caf1a to
2e6d5fe
Compare
2e6d5fe to
08319ef
Compare
thunder-coding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, not going to be too picky I guess, we all have different preferences about smaller things!
| echo " -o Specify directory where to put built packages. Default: output/." | ||
| echo " --format Specify package output format (debian, pacman)." | ||
| echo " --library Specify library of package (bionic, glibc)." | ||
| echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That option name is wayyyyy too long.
How about:
| echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing" | |
| echo " --no-parallel disable parallel dependency downloads" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case does not specify where exactly parallelizing is to be disabled.
| termux_error_exit "./build-package.sh: option '--library' requires an argument" | ||
| fi | ||
| ;; | ||
| --disable-dependency-download-parallelizing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be nice to be able to specify a maximum amount of parallel downloads.
e.g. --parallel 8.
But since that's out of scope for this PR I'll just leave it as a potential improvement for later.
That option name is still way too long though.
| --disable-dependency-download-parallelizing) | |
| --no-parallel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#24698 (comment)
Also, same as above, it does not specify where exactly parallelizing is to be disabled.
Our servers support parallel downloading so it should not be a problem, so it will be used more with 3rd party servers.
| $(test "${TERMUX_WITHOUT_DEPVERSION_BINDING:-}" = "true" && echo "-w") \ | ||
| $(test -n "${TERMUX_PACKAGE_FORMAT:-}" && echo "--format $TERMUX_PACKAGE_FORMAT") \ | ||
| $(test -n "${TERMUX_PACKAGE_LIBRARY:-}" && echo "--library $TERMUX_PACKAGE_LIBRARY") \ | ||
| $(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| $(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \ | |
| $(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--no-parallel") \ |
Not for this PR but for later.
A potential improvement could be making $TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING an int value instead of a bool to set the maximum concurrency.
To that end a better name might be $TERMUX_MAX_PARALLEL_DOWNLOADS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
| @@ -0,0 +1,24 @@ | |||
| termux_buffered_output() { | |||
| ( # Do everything in a subshell to avoid any kind of mess. | |||
| TAG="$1" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LINE_TAG or LINE_PREFIX might be clearer.
| # read with 0 timeout does not read any data so giving minimal timeout | ||
| IFS='' read -t 0.001 -r line; rc=$? | ||
| # append job name to the start for tracking multiple jobs | ||
| [[ $rc == 0 ]] && buffer+=( "[$TAG]: $line" ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're gonna either need to quote $rc, or use arithmetic evaluation.
As since we set set +e above I'd prefer arithmetic.
| [[ $rc == 0 ]] && buffer+=( "[$TAG]: $line" ) | |
| (( rc )) || buffer+=( "[$TAG]: $line" ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable can not be anything other than int so it should be safe to use it as is.
Also [[ does not trigger set -e to fail inside function, while (( triggers it.
| termux_buffered_output() { | ||
| ( # Do everything in a subshell to avoid any kind of mess. | ||
| TAG="$1" | ||
| set +e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way the set +e is gonna persist after this function.
Which is probably not what you intended.
So add a local -1 above it.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is being done in subshell (effectively child process) so nothing persists there. It should be fine to use it as is.
But I will probably change func() { ( combo to func() (.
| done | ||
|
|
||
| # prevent output garbling by using scripts directory for locking | ||
| [[ "${#buffer[@]}" -ge 1 ]] && flock --no-fork . printf "%s\n" "${buffer[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| [[ "${#buffer[@]}" -ge 1 ]] && flock --no-fork . printf "%s\n" "${buffer[@]}" | |
| (( ${#buffer[@]} )) && flock --no-fork . printf "%s\n" "${buffer[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, (( will trigger failing`
|
|
||
| for name in "${deps[@]}"; do | ||
| declare -n dep="$name" | ||
| if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" = "true" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" = "true" ]]; then | |
| if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" == "true" ]]; then |
| $([[ "${TERMUX_PKGS__BUILD__RM_ALL_PKG_BUILD_DEPENDENT_DIRS}" == "true" ]] && echo "-r") \ | ||
| $([[ "${TERMUX_WITHOUT_DEPVERSION_BINDING}" = "true" ]] && echo "-w") \ | ||
| --format $TERMUX_PACKAGE_FORMAT --library $set_library "${PKG_DIR}" | ||
| $([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| $([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \ | |
| $([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" == "false" ]] && echo "--no-parallel") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, this option name does not specify which parallelizing is to be disabled.
|
Hey, I'd prefer if we can fastrack this. I'd appreciate the improvements this PR brings in |
Shortly: it is saving dependencies data into some double-dimensional array and splitting current dependency downloading/build process into two parts: 1: downloading debs and 2: extracting them/building everything that was not downloaded.
I splitted changes into 3 parts for easier changes tracking.
I checked it only with kf6-breeze-icons and downloading dependencies was mind-blowingly fast (compared to regular downloading).
I am not intending to merge it until it will be fully tested.