这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@twaik
Copy link
Member

@twaik twaik commented May 12, 2025

Shortly: it is saving dependencies data into some double-dimensional array and splitting current dependency downloading/build process into two parts: 1: downloading debs and 2: extracting them/building everything that was not downloaded.

I splitted changes into 3 parts for easier changes tracking.

I checked it only with kf6-breeze-icons and downloading dependencies was mind-blowingly fast (compared to regular downloading).

I am not intending to merge it until it will be fully tested.

@twaik twaik force-pushed the parallelize-dependency-downloads branch 2 times, most recently from 6bb1b18 to a95ff5b Compare May 13, 2025 05:13
Copy link
Member

@robertkirkman robertkirkman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested this a bit more and I have not found any other problems with this so far. I have also tested on-device building using this, and it seems to work successfully there too.

There is a slight concern that spawning too many processes could make devices affected by Phantom Process Killers (both Android 12+ AOSP-based ones, and proprietary ones varying by vendor) more likely to kill Termux, but: they already would do that a lot during build-package.sh for many packages on affected devices long before now, so it can be assumed that people using build-package.sh on-device have probably already set up any mitigations available for the problem on their device ("disable monitor phantom processes", "disable battery optimizations", "disable child process restrictions", "lawnchair nightly app", etc.) so I think it is best not too worry about that too much, it affects too many other situations already to worry about it.

Copy link
Member

@thunder-coding thunder-coding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, seems to work perfectly. Tested it with a few packages and the reduction in build times are impressive. I have pointed out some nitpicks

if ! wait -n && (( $? == 121 )); then
echo some job returned 121, exiting
# One of jobs exited with fatal error, we should return error too and kill all background jobs
kill $(jobs -p) 2>/dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to let them finish, why waste bandwidth on files that are already partially downloaded, we could simply wait for them to finish. Once they finish we can display the fatal error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty much pointless to let them finish in CI since they will not be used. Should I add [[ "${CI-false}" == "true" ]] condition instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. Go ahead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, can you check if it is fine? Testing is fine, but I want to know if you are fine with my implementation.

done < <(./scripts/buildorder.py $([[ "${TERMUX_INSTALL_DEPS}" == "true" ]] && echo "-i") "$TERMUX_PKG_BUILDER_DIR" $TERMUX_PACKAGES_DIRECTORIES || echo "ERROR")

while [[ -n "$(jobs -p)" ]]; do
if ! wait -n && (( $? == 121 )); then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hardcode the 121 error code, store it as TERMUX_PARALLEL_DEPENDENCIES_FATAL_EXIT_CODE or some similar variable name (if you can come up with something better)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a workaround suggested and used by @​agnostic-apollo.
#24168 (comment)

return 69 # EX__UNAVAILABLE

Not the same code though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just asking you to store it as some local variable, I saw the code and got it why it's hardcoded, just store it as maybe $fatal_error_code or similar so that there is no confusion for why a mysterious 69 or 121 is happenning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem.

Copy link
Member

@agnostic-apollo agnostic-apollo May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference.

##
# Set `sysexits` library default variables.
#
# **See Also:**
# - https://cs.android.com/android/platform/superproject/+/android-11.0.0_r40:prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.17-4.8/sysroot/usr/include/sysexits.h
# - https://tldp.org/LDP/abs/html/exitcodes.html
# - https://unix.stackexchange.com/a/242372
#
#
# `sysexits__set_default_variables`
##
sysexits__set_default_variables() {

### Set Default Variables Start
# The following variables must not be modified unless you know what you are doing.

EX__SUCCESS=0          # Successful termination.
#EX__BASE=64           # Base value for error messages.
EX__USAGE=64           # Command line usage error.
EX__DATAERR=65         # Data format error.
#EX__NOINPUT=66        # Cannot open input.
#EX__NOUSER=67         # Addressee unknown.
#EX__NOHOST=68         # Host name unknown.
#EX__UNAVAILABLE=69    # Service unavailable.
#EX__SOFTWARE=70       # Internal software error.
#EX__OSERR=71          # System error (e.g., can't fork).
#EX__OSFILE=72         # Critical OS file missing.
#EX__CANTCREAT=73      # Can't create (user) output file.
#EX__IOERR=74          # Input/output error.
#EX__TEMPFAIL=75       # Temp failure; user is invited to retry.
#EX__PROTOCOL=76       # Remote error in protocol.
#EX__NOPERM=77         # Permission denied.
#EX__CONFIG=78         # Configuration error.
#EX__MAX=78            # Maximum listed value.


# System errors.
S_EX__FAILED=1         # General failure.
#S_EX__SH=2            # Misuse of shell builtins (according to Bash documentation).
#S_EX__EXEC=126        # Command invoked cannot execute. Permission problem or command is not an executable.
S_EX__NOENT=127        # Command not found.
#S_EX__INVAL=128       # Invalid argument to exit not in `0-255` range.
#128+n                 # Fatal error signal "n".
#255*                  # Exit status out of range, exit `-1` exit takes only integer args in the range `0-255`.
#S_EX__HUP=129
#S_EX__INT=130

# Custom errors.
C_EX__UNSUPPORTED=80     # Unsupported action.
C_EX__NOT_FOUND=81       # Data not found.
C_EX__IS_INVALID=82      # Data is invalid.
C_EX__IS_EMPTY=83        # Data is empty.
C_EX__IS_NULL=84         # Data is null.
C_EX__NOOP=85            # No operation.

### Set Default Variables End

SYSEXITS__VARIABLES_SET="true"

}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, can you check if it is fine? Testing is fine, but I want to know if you are fine with my implementation.

@thunder-coding
Copy link
Member

Also, I'd suggest you to allow to limit the download parallelism as some mirrors may rate-limit you. Might be helpful for people using different mirrors or hosting custom repository:

Here's the patch I would suggest you to apply (click to expand the message)

You can apply the patch by first saving the below diff as xyz-whatever.patch and git am xyz-whatever.patch

If you want, I can also push this commit myself if you want me to (will also rebase against master)

From 0bbac3711ed9c2c487d1ce7716091a2ead9b0737 Mon Sep 17 00:00:00 2001
From: Yaksh Bariya <yakshbari4@gmail.com>
Date: Fri, 16 May 2025 14:42:19 +0530
Subject: [PATCH] scripts(termux_step_get_dependencies): allow limiting
 parallelism for downloading dependencies

---
 scripts/build/termux_step_get_dependencies.sh | 13 +++++++++++++
 scripts/build/termux_step_setup_variables.sh  |  1 +
 2 files changed, 14 insertions(+)

diff --git a/scripts/build/termux_step_get_dependencies.sh b/scripts/build/termux_step_get_dependencies.sh
index d4d26c6ab3..e5c9d6f6a4 100644
--- a/scripts/build/termux_step_get_dependencies.sh
+++ b/scripts/build/termux_step_get_dependencies.sh
@@ -46,9 +46,22 @@ termux_step_get_dependencies() {
 				fi
 			fi
 
+			# Store the current sub-shell PID to check for background jobs later on
+			local ppid="$BASHPID"
 			if [[ "${dep[build]}" != "true" ]]; then
 				dep[download]="true"
 				(
+					# $(jobs -p | wc -l) does not work as we are in a subshell
+					# $TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM is multiplied by 2 as termux_buffered_output will create a new process along with this current code inside the subshell. Thus effectively creating 2 processes
+					while [[ "$(ps --ppid $ppid -o pid --no-headers | wc -l)" -gt $((2 * "$TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM")) ]]; do
+						echo "Waiting for background jobs to finish before downloading ${dep[versioned]}..."
+						# if ! wait -n; then
+						# 	# Exit with the same code as the last job if anything failed
+						# 	exit $?
+						# fi
+						# We don't have any other alternative than just sleeping for a second or two as the below won't work as 'wait -n' won't work in a subshell
+						sleep 1
+					done
 					if ! TERMUX_WITHOUT_DEPVERSION_BINDING="$([[ "${dep[cyclic]}" == "true" ]] && echo "true" || echo "${TERMUX_WITHOUT_DEPVERSION_BINDING}")" termux_download_deb_pac "${dep[name]}" "${dep[arch]}" "${dep[version]}" "${dep[version_pac]}"; then
 						if [[ "${dep[cyclic]}" == "true" || ( "$TERMUX_FORCE_BUILD_DEPENDENCIES" == "true" && "$TERMUX_ON_DEVICE_BUILD" == "true" ) ]]; then
 							echo "Download of ${dep[name]}$([[ "${TERMUX_WITHOUT_DEPVERSION_BINDING}" == "false" && "${dep[cyclic]}" == "false" ]] && echo "@${dep[version]}") from $TERMUX_REPO_URL failed" >&2
diff --git a/scripts/build/termux_step_setup_variables.sh b/scripts/build/termux_step_setup_variables.sh
index 0ac9dbffa6..6c3811e606 100644
--- a/scripts/build/termux_step_setup_variables.sh
+++ b/scripts/build/termux_step_setup_variables.sh
@@ -6,6 +6,7 @@ termux_step_setup_variables() {
 	: "${TERMUX_FORCE_BUILD_DEPENDENCIES:="false"}"
 	: "${TERMUX_INSTALL_DEPS:="false"}"
 	: "${TERMUX_PKG_MAKE_PROCESSES:="$(nproc)"}"
+	: "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELISM:="20"}"
 	: "${TERMUX_PKGS__BUILD__RM_ALL_PKGS_BUILT_MARKER_AND_INSTALL_FILES:="true"}"
 	: "${TERMUX_PKGS__BUILD__RM_ALL_PKG_BUILD_DEPENDENT_DIRS:="false"}"
 	: "${TERMUX_PKG_API_LEVEL:="24"}"
-- 
2.49.0

@twaik
Copy link
Member Author

twaik commented May 16, 2025

Also, I'd suggest you to allow to limit the download parallelism as some mirrors may rate-limit you. Might be helpful for people using different mirrors or hosting custom repository:

Typically apt does not perform parallel downloads so if some mirror/hosting disallow/ratelimit parallel downloads completely it makes sense to allow user/CI to disable parallelizing completely as well. What do you think?

@thunder-coding
Copy link
Member

Also, I'd suggest you to allow to limit the download parallelism as some mirrors may rate-limit you. Might be helpful for people using different mirrors or hosting custom repository:

Typically apt does not perform parallel downloads so if some mirror/hosting disallow/ratelimit parallel downloads completely it makes sense to allow user/CI to disable parallelizing completely as well. What do you think?

Yeah makes sense to have a flag in build-package.sh

twaik added 2 commits May 18, 2025 08:29
Main loop of the function will be splitted in more pieces so we should make sure all data will be fetched only once and accessible outside the first loop.
@twaik twaik force-pushed the parallelize-dependency-downloads branch 3 times, most recently from 6eea804 to 63caf1a Compare May 18, 2025 05:38
@twaik
Copy link
Member Author

twaik commented May 18, 2025

Yeah makes sense to have a flag in build-package.sh

Done.

Copy link
Member

@thunder-coding thunder-coding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also look into the patch I sent for limiting download parallelism. I'd prefer if you use that instead of disabling it entirely. If the user wants to limit download parallelism, they can simply set the download parallelism to 1

build-package.sh Outdated
echo " -o Specify directory where to put built packages. Default: output/."
echo " --format Specify package output format (debian, pacman)."
echo " --library Specify library of package (bionic, glibc)."
echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your version of code it will run AFTER all packages are downloaded since you put it in the loop that is being invoked after all wait -n invocations.
Also the point of --disable-dependency-download-parallelizing is to fallback to old behaviour without explicitly add exit handler twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if parallelizing causes problems we should not use it at all.

@twaik twaik force-pushed the parallelize-dependency-downloads branch from 63caf1a to 2e6d5fe Compare May 18, 2025 05:45
@twaik twaik force-pushed the parallelize-dependency-downloads branch from 2e6d5fe to 08319ef Compare May 18, 2025 05:52
@twaik twaik requested a review from thunder-coding May 18, 2025 12:26
@termux termux deleted a comment from kovakai5 Jun 1, 2025
Copy link
Member

@thunder-coding thunder-coding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, not going to be too picky I guess, we all have different preferences about smaller things!

echo " -o Specify directory where to put built packages. Default: output/."
echo " --format Specify package output format (debian, pacman)."
echo " --library Specify library of package (bionic, glibc)."
echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That option name is wayyyyy too long.
How about:

Suggested change
echo " --disable-dependency-download-parallelizing disable dependency downloading parallelizing"
echo " --no-parallel disable parallel dependency downloads"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case does not specify where exactly parallelizing is to be disabled.

termux_error_exit "./build-package.sh: option '--library' requires an argument"
fi
;;
--disable-dependency-download-parallelizing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be nice to be able to specify a maximum amount of parallel downloads.
e.g. --parallel 8.

But since that's out of scope for this PR I'll just leave it as a potential improvement for later.
That option name is still way too long though.

Suggested change
--disable-dependency-download-parallelizing)
--no-parallel)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#24698 (comment)
Also, same as above, it does not specify where exactly parallelizing is to be disabled.
Our servers support parallel downloading so it should not be a problem, so it will be used more with 3rd party servers.

$(test "${TERMUX_WITHOUT_DEPVERSION_BINDING:-}" = "true" && echo "-w") \
$(test -n "${TERMUX_PACKAGE_FORMAT:-}" && echo "--format $TERMUX_PACKAGE_FORMAT") \
$(test -n "${TERMUX_PACKAGE_LIBRARY:-}" && echo "--library $TERMUX_PACKAGE_LIBRARY") \
$(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \
$(test "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--no-parallel") \

Not for this PR but for later.
A potential improvement could be making $TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING an int value instead of a bool to set the maximum concurrency.

To that end a better name might be $TERMUX_MAX_PARALLEL_DOWNLOADS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -0,0 +1,24 @@
termux_buffered_output() {
( # Do everything in a subshell to avoid any kind of mess.
TAG="$1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LINE_TAG or LINE_PREFIX might be clearer.

# read with 0 timeout does not read any data so giving minimal timeout
IFS='' read -t 0.001 -r line; rc=$?
# append job name to the start for tracking multiple jobs
[[ $rc == 0 ]] && buffer+=( "[$TAG]: $line" )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're gonna either need to quote $rc, or use arithmetic evaluation.
As since we set set +e above I'd prefer arithmetic.

Suggested change
[[ $rc == 0 ]] && buffer+=( "[$TAG]: $line" )
(( rc )) || buffer+=( "[$TAG]: $line" )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable can not be anything other than int so it should be safe to use it as is.
Also [[ does not trigger set -e to fail inside function, while (( triggers it.

termux_buffered_output() {
( # Do everything in a subshell to avoid any kind of mess.
TAG="$1"
set +e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way the set +e is gonna persist after this function.
Which is probably not what you intended.
So add a local -1 above it.

Footnotes

  1. https://man.archlinux.org/man/bash.1#local

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is being done in subshell (effectively child process) so nothing persists there. It should be fine to use it as is.
But I will probably change func() { ( combo to func() (.

done

# prevent output garbling by using scripts directory for locking
[[ "${#buffer[@]}" -ge 1 ]] && flock --no-fork . printf "%s\n" "${buffer[@]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[[ "${#buffer[@]}" -ge 1 ]] && flock --no-fork . printf "%s\n" "${buffer[@]}"
(( ${#buffer[@]} )) && flock --no-fork . printf "%s\n" "${buffer[@]}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, (( will trigger failing`


for name in "${deps[@]}"; do
declare -n dep="$name"
if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" = "true" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" = "true" ]]; then
if [[ "$TERMUX_INSTALL_DEPS" == "true" || "${dep[cyclic]}" == "true" ]]; then

$([[ "${TERMUX_PKGS__BUILD__RM_ALL_PKG_BUILD_DEPENDENT_DIRS}" == "true" ]] && echo "-r") \
$([[ "${TERMUX_WITHOUT_DEPVERSION_BINDING}" = "true" ]] && echo "-w") \
--format $TERMUX_PACKAGE_FORMAT --library $set_library "${PKG_DIR}"
$([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" = "false" ]] && echo "--disable-dependency-download-parallelizing") \
$([[ "${TERMUX_DEPENDENCY_DOWNLOAD_PARALLELIZING-true}" == "false" ]] && echo "--no-parallel") \

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this option name does not specify which parallelizing is to be disabled.

@thunder-coding
Copy link
Member

Hey, I'd prefer if we can fastrack this. I'd appreciate the improvements this PR brings in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Tracking

Development

Successfully merging this pull request may close these issues.

5 participants