-
Notifications
You must be signed in to change notification settings - Fork 145
Introduce git-last-modified(1) command #1942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There are issues in commit aa39f20: |
Let's try CI here, because I'm seeing unreproducable errors on GitLab CI. |
Similar to git-blame(1), introduce a new subcommand git-last-modified(1). This command shows the most recent modification to paths in a tree. It does so by expanding the tree at a given commit, taking note of the current state of each path, and then walking backwards through history looking for commits where each path changed into its final commit ID. Based-on-patch-by: Jeff King <peff@peff.net> Improved-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Toon Claes <toon@iotcl.com>
This just runs some simple last-modified commands. We already test correctness in the regular suite, so this is just about finding performance regressions from one version to another. Based-on-patch-by: Jeff King <peff@peff.net> Signed-off-by: Toon Claes <toon@iotcl.com>
Our 'git last-modified' performs a revision walk, and computes a diff at each point in the walk to figure out whether a given revision changed any of the paths it considers interesting. When changed-path Bloom filters are available, we can avoid computing many such diffs. Before computing a diff, we first check if any of the remaining paths of interest were possibly changed at a given commit by consulting its Bloom filter. If any of them are, we are resigned to compute the diff. If none of those queries returned "maybe", we know that the given commit doesn't contain any changed paths which are interesting to us. So, we can avoid computing it in this case. Comparing the perf test results on git.git: Test HEAD~ HEAD ------------------------------------------------------------------------------------ 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6% 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4% 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4% Based-on-patch-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Toon Claes <toon@iotcl.com>
aa39f20
to
c739a7d
Compare
@To1ne seems that I can reproduce the test failure here: $ (export GIT_TEST_COMMIT_GRAPH=1; export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1; make -j15 && cd t && ./t8020-*.sh -ivx)
[...]
ok 10 - last-modified from subdir uses relative pathspecs
expecting success of 8020.11 'limit last-modified traversal by count': check_last_modified -1 <<-\EOF
3 a
^2 file
EOF
+ check_last_modified -1
+ local indir=
+ test 1 != 0 + break
+ cat
+ test_when_finished rm -f tmp.*
+ test 0 = 0
+ test_cleanup={ rm -f tmp.*
} && (exit "$eval_ret"); eval_ret=$?; :
+ git last-modified -1
+ git name-rev --annotate-stdin --name-only --tags
+ tr \t
+ sort tmp.3
+ test_cmp expect actual
+ test 2 -ne 2
+ eval diff -u "$@"
+ diff -u expect actual
--- expect 2025-07-10 07:32:55.146852469 +0000
+++ actual 2025-07-10 07:32:55.158851905 +0000
@@ -1,2 +1 @@
3 a
-^2 file
error: last command exited with $?=1
not ok 11 - limit last-modified traversal by count
#
# check_last_modified -1 <<-\EOF
# 3 a
# ^2 file
# EOF
#
1..11 Maybe you found a bug in the |
@dscho Thank you for your input, but I'm not sure there is. If I regenerate commit-graphs in the test repo with $ git last-modified -1
664d1218d7e3d3f48a5b61104b0532c7635dc5e5 a TBH, I've got no idea where to look now. |
This series adds the git-last-modified(1) subcommand. In the past the
subcommand was proposed1 to be named git-blame-tree(1). This version
is based on the patches shared by the kind people at GitHub2.
What is different from the series shared by GitHub:
Renamed the subcommand from
blame-tree
tolast-modified
. There wassome consensus5 this name works better, so let's give it a try and
see how this name feels.
Patches for --max-depth are excluded. I think it's a separate topic to
discuss and I'm not sure it needs to be part of series anyway. The
main patch was submitted in the previous attempt3 and if people
consider it valuable, I'm happy to discuss that in a separate patch
series.
The patches in 'tb/blame-tree' at Taylor's fork4 implements a
caching layer. This feature reads/writes cached results in
.git/blame-tree/<hash>.btc
. To keep this series to a reviewablesize, that feature is excluded from this series. I think it's better
to submit this as a separate series.
Squashed various commits together. Like they introduced a flag
--go-faster
, which later became the default and only implementation.That story was wrapped up in a single commit.
Dropped the patches that attempt to increase performance for tree
entries that have not been updated in a long time. In my testing I've
seen both performance improvements and degradation with these
changes:
Test HEAD~ HEAD
8020.1: top-level last-modified 4.52(4.38+0.11) 2.03(1.93+0.08) -55.1%
8020.2: top-level recursive last-modified 5.79(5.64+0.11) 8.34(8.17+0.11) +44.0%
8020.3: subdir last-modified 0.15(0.09+0.06) 0.19(0.14+0.06) +26.7%
Before we include these patches, I want to make sure these changes
have positive impact in all/most scenarios. This can happen in a
separate series.
The last-modified command isn't recursive by default. If you want
recurse into subtrees, you need to pass
-r
.Fixed all memory leaks, and removed the use of
USE_THE_REPOSITORY_VARIABLE.
I've set myself as the author and added Based-on-patch-by trailers to
credit the original authors. Let me know if you disagree.
Again thanks to Taylor and the people at GitHub for sharing these
patches. I hope we can work together to get this upstreamed.
--
Cheers,
Toon
Signed-off-by: Toon Claes toon@iotcl.com
Changes in v4:
last-modified.[ch]
and moved code tobuiltin/last-modified.c
. Historically we've had libary code (also because itwas used in testtool), but we no longer need that separation. I'm sorry this
makes the range-diff hard to read.
https://lore.kernel.org/git/xmqqh5zvk5h0.fsf@gitster.g/
Changes in v3:
entries that have not been updated in a long time. (see above)
last_modified_init()
to the caller.commit
instruct last_modified_entry
.easier to understand.
Changes in v2:
blame-tree
tolast-modified