-
Notifications
You must be signed in to change notification settings - Fork 145
VMAWIP: rebase merge commits #1491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
Make fsmonitor more robust to avoid the flakiness seen in t7527. * jh/t7527-unflake-by-forcing-cookie: fsmonitor: fix race seen in t7527
Test fix. * ab/t5314-avoid-losing-exit-status: t5314: check exit code of "git"
Test fix. * ab/t7600-avoid-losing-exit-status-of-git: t7600: don't ignore "rev-parse" exit code in helper
Test fix. * ab/t4023-avoid-losing-exit-status-of-diff: t4023: fix ignored exit codes of git
Stop using deprecated macOS API in fsmonitor. * jh/fsmonitor-darwin-modernize: fsmonitor: eliminate call to deprecated FSEventStream function
Test fix. * js/t3920-shell-and-or-fix: t3920: don't ignore errors of more than one command with `|| true`
Test fix. * rs/t3920-crlf-eating-grep-fix: t3920: support CR-eating grep
Redefining system functions for a few functions did not follow our usual "implement git_foo() and #define foo(args) git_foo(args)" pattern, which has broken build for some folks. * jk/avoid-redef-system-functions-2.30: git-compat-util: undefine system names before redeclaring them git-compat-util: avoid redefining system function names
The jk/avoid-redef-system-functions-2.30 topic pre-merged for more recent codebase. * jk/avoid-redef-system-functions:
The way the diff machinery prepares the options array for the parse_options API has been refactored to avoid resource leaks. * rs/diff-parseopts: diff: remove parseopts member from struct diff_options diff: use add_diff_options() in diff_opt_parse() diff: factor out add_diff_options()
Correct pthread API usage. * sx/pthread-error-check-fix: maintenance: compare output of pthread functions for inequality with 0
Introduce a case insensitive mode to the Bash completion helpers. * aw/complete-case-insensitive: completion: add case-insensitive match of pseudorefs completion: add optional ignore-case when matching refs
The advice message given by "git status" when it takes long time to enumerate untracked paths has been updated. * rr/status-untracked-advice: status: modernize git-status "slow untracked files" advice
Fix a pair of bugs in 'git branch'. * rj/branch-copy-and-rename: branch: force-copy a branch to itself via @{-1} is a no-op
The code to auto-correct a misspelt subcommand unnecessarily called into git_default_config() from the early config codepath, which was a no-no. This has bee corrected. * sg/help-autocorrect-config-fix: help.c: fix autocorrect in work tree for bare repository
"git http-fetch" (which is rarely used) forgot to identify itself in the trace2 output. * jt/http-fetch-trace2-report-name: http-fetch: invoke trace2_cmd_name()
Code clean-up around unused function parameters. * jk/unused-post-2.39: userdiff: mark unused parameter in internal callback list-objects-filter: mark unused parameters in virtual functions diff: mark unused parameters in callbacks xdiff: mark unused parameter in xdl_call_hunk_func() xdiff: drop unused parameter in def_ff() ws: drop unused parameter from ws_blank_line() list-objects: drop process_gitlink() function blob: drop unused parts of parse_blob_buffer() ls-refs: use repository parameter to iterate refs
Code clean-up. * jk/server-supports-v2-cleanup: server_supports_v2(): use a separate function for die_on_error
Code clean-up. * rs/am-parse-options-cleanup: am: don't pass strvec to apply_parse_options()
Code clean-up. * rs/clear-commit-marks-cleanup: commit: skip already cleared parents in clear_commit_marks_1()
Code clean-up. * rs/reflog-expiry-cleanup: reflog: clear leftovers in reflog_expiry_cleanup()
Code clean-up. * rs/clarify-error-in-write-loose-object: object-file: inline write_buffer()
Clean-ups in error messages produced by "git for-each-ref" and friends. * jk/ref-filter-error-reporting-fix: ref-filter: convert email atom parser to use err_bad_arg() ref-filter: truncate atom names in error messages ref-filter: factor out "unrecognized %(foo) arg" errors ref-filter: factor out "%(foo) does not take arguments" errors ref-filter: reject arguments to %(HEAD)
The output from "git diff --stat" on an unmerged path lost the terminating LF in Git 2.39, which has been corrected. * pg/diff-stat-unmerged-regression-fix: diff: fix regression with --stat and unmerged file
Code clean-up. * sk/remove-duplicate-includes: git: remove duplicate includes
Use the SHA1DC implementation on macOS, just like other platforms, by default. * ab/darwin-default-to-sha1dc: Makefile: use sha1collisiondetection by default on OSX and Darwin
"git pull -v --recurse-submodules" attempted to pass "-v" down to underlying "git submodule update", which did not understand the request and barfed, which has been corrected. * ss/pull-v-recurse-fix: submodule: accept -v for the update command
Just like "git var GIT_EDITOR" abstracts the complex logic to choose which editor gets used behind it, "git var" now give support to GIT_SEQUENCE_EDITOR. * sa/git-var-sequence-editor: var: add GIT_SEQUENCE_EDITOR variable
When given a pattern that matches an empty string at the end of a line, the code to parse the "git diff" line-ranges fell into an infinite loop, which has been corrected. * lk/line-range-parsing-fix: line-range: fix infinite loop bug with '$' regex
After "git pull" that is configured with pull.rebase=false merge.ff=only fails due to our end having our own development, give advice messages to get out of the "Not possible to fast-forward" state. * fc/advice-diverged-history: advice: add diverging advice for novices
…o next Once we start running, we assumed that the list of alternate object databases would never change. Hook into the machinery used to update the list of packfiles during runtime to update this list as well. * ds/reprepare-alternates-when-repreparing-packfiles: object-file: reprepare alternates when necessary
* ew/fetch-no-write-fetch-head-fix: fetch: pass --no-write-fetch-head to subprocesses
"git add -p" while the index is unmerged sometimes failed to parse the diff output it internally produces and died, which has been corrected. * jk/add-p-unmerged-fix: add-patch: handle "* Unmerged path" lines
"git format-patch" honors the src/dst prefixes set to nonstandard values with configuration variables like "diff.noprefix", causing receiving end of the patch that expects the standard -p1 format to break. Teach "format-patch" to ignore end-user configuration and always use the standard prefixes. This is a backward compatibility breaking change. * jk/format-patch-ignore-noprefix: format-patch: add format.noprefix option format-patch: do not respect diff.noprefix diff: add --default-prefix option t4013: add tests for diff prefix options diff: factor out src/dst prefix setup
"git format-patch" honors the src/dst prefixes set to nonstandard values with configuration variables like "diff.noprefix", causing receiving end of the patch that expects the standard -p1 format to break. Teach "format-patch" to ignore end-user configuration and always use the standard prefixes. This is a backward compatibility breaking change. * jk/format-patch-ignore-noprefix: rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
Fix for a "ls-files --format="%(path)" that produced nonsense output, which was a bug in 2.38. * aj/ls-files-format-fix: ls-files: fix "--format" output of relative paths
"git receive-pack" that responds to "git push" requests failed to clean a stale lockfile when killed in the middle, which has been corrected. * ps/receive-pack-unlock-before-die: receive-pack: fix stale packfile locks when dying
Code clean-up for test framework. * fc/test-aggregation-clean-up: test: don't print aggregate-results command test: simplify counts aggregation
Previously, we punted on the question how to carry over amendments to merge commits. Instead, we always performed new merges from scratch. Such amendments to merge commits may very well be necessary, though, e.g. if one side of the history changed a function signature and the other side added a caller. In a discussion inspired by Johannes Sixt's "cherry-pick -m 1" idea, Phillip Wood came up with the fundamental idea behind this commit. The premise is that both "rebase" and "merge" try to reconcile diverging changes. Both "rebase" and "merge" should result in identical trees (after merge conflicts are resolved, if there were any). A "merge" typically boils down to a "3-way" merge, with a "merge base" and two "merge heads" that diverged from said merge base. Think e.g. last week's master as the merge base, from where a developer branched off a topic branch (the first merge head), and the current master as second merge head. A "rebase" is more complicated than a single "3-way merge": When rebasing commits, they are "cherry-picked" one by one. That way, the changes introduced by that commit are reconciled with the changes introduced by the "rebase" so far: a "3-way merge" between HEAD and the cherry-picked commit, with the latter's parent commit as merge base (HEAD would be the rebased parent commit in the common case). In the context of a cherry-pick, it is important to keep in mind that the "diverging changes" are not reflected by commit history. For the purpose of a 3-way merge, they don't have to be. Now let's look again at the problem of "rebasing merge commits", i.e. how to "cherry-pick a merge commit"? It is not as simple as a "3-way merge" (a single one, that is) because we do not have a single merge base from where sprung exactly two diverging changesets. Just like when we cherry-pick a regular commit, in the case of a merge commit we have diverging changes for *every* one of its parents. And just like before, the original parent commit is the merge base, and the rebased parent commit is a merge head. The other merge head is the original merge commit itself. By starting with the original merge commit and then performing these 3-way merges sequentially, one for every parent, we reconcile all diverging changes. So we "rebased" the merge commit, including all of its amendments. This is the most logical generalization of the cherry-pick concept: if we only have one parent, the strategy outlined above is identical to Git's cherry-pick operation. Side-note: While this description only talked about merge commits with exactly two parent commits above, the principle still holds for merge commits with more than two parent commits ("octopus merges"). That strategy requires an original merge commit, with a matching number of parents, though, and therefore it would not make sense in general to do this for every `merge` command in the todo list: what if there was no original merge commit, or if the specified merge commit had a different number of parents? It therefore cannot be the default mode for the `merge` command, thus we introduce a new flag `-R` for that. However, for existing merge commits, this strategy *is* valid and *does* lead to the most intuitive results. Therefore, let's use it by default when generating todo lists in `git rebase --rebase-merges`. Original-idea-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Igor Djordjevic <igor.d.djordjevic@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
TODO: implement a test Just like with `pick` commands, `merge` commands should also get a chance to resolve already-recorded conflicts. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit adds a lengthy test case to t3430 that reflects some challenging use cases for the --rebase-merges option. In particular, it sets up a scenario which demonstrates that "evil merges" happen in practice, and they are necessarily introducing those extra changes that constitute an "evil merge". It then sets up three "upstream" branches with competing changes that are designed to conflict with the changes to rebase. The purpose of this added test case is two-fold: 1. to document what we expect --rebase-merges to accomplish, and even more to document what we do *not* expect it to be able to do. 2. to explore what kinds of merge conflicts --rebase-merges can produce (spoiler alert: we can end up with some bad ones, with unintuitively-nested merge conflicts). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When rebasing a regular merge between two parent commits, we have that problem that we have to perform *two* 3-way merges, because we want to merge in the changes (like amendments, merge conflict resolutions, etc) from the original merge commit, too. When the first of these 3-way merges already had conflicts, then we run the chance of ending up with nested conflicts in the second 3-way merge. So let's see in that case whether we gain something by merging the original merge commit with the other parent first, and if that resulted in a clean merge, proceed to merge the first parent (in this case, we cannot end up with nested merge conflicts). This simplifies the realistic example of a nested merge conflict to a non-nested merge conflict. Before: int hi(void) { printf("Hello, world!\n"); } <<<<<<< intermediate merge <<<<<<<< HEAD /* main event loop */ void event_loop(void) { /* TODO: place holder for now */ ======== ======= } >>>>>>> <HASH>... merge head #1 /* caller */ void caller(void) { hi(); >>>>>>>> <HASH>... original merge } With this patch, this becomes much simpler: int hi(void) { printf("Hello, world!\n"); } /* main event loop */ void event_loop(void) { /* TODO: place holder for now */ } <<<<<<<< HEAD ======== /* caller */ void caller(void) { hi(); } >>>>>>>> <HASH>... intermediate merge Note: this needs to be refactored and stuff and things. It may even be necessary to dive deeper into the code and implement a "W merge" that avoids the problem where (one part of) one file would benefit from merging the second parent before the first, while another (part of the same) file would benefit from the reverse order. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
FWIW as the test failures prove, it is probably a bad idea to introduce the A much better idea is probably to introduce a "fake" merge strategy option, say, |
So, the W-shaped merge idea may work. It's certainly a lot more promising than the fundamentally broken strategy of involving a 3-way merge between A3, M, and A3' (and a follow-up with B3' and B3). I think it'd be a huge amount of effort to implement and get right, though. (How long did it take us to get 3-way merges right in the face of renames, D/F conflicts, mode changes, etc. 15 years? Tackling 5-way merges looks interesting to say the least.) But since this patch is doing the 3-way merge of A3, M, and A3', let me take a minute to explain why it's a really bad idea and suggest another simpler alternative (I sent the below to you privately already, but since you're linking this PR to the mailing list..). There are two ways to see why this is a bad idea: === Reason #1 this is a bad idea === Summary: This solution does NOT match the problem statement. Details: The problem we need to solve is:
Now, porting the manual user changes is "risky"; it might not be portable to the newer merge. We might even get nested conflicts. So it needs to be handled with care. But the {A3,M,A3'} 3-way merge doesn't even attempt to compute the manual user changes. It instead uses an approximation that involves the manual user changes as a very small subset, and in so doing, expands the risk zone to that huge set of changes. === Reason #2 this is a bad idea === Summary: Conflicts galore (as seen by diff|apply analogy to merging) Details: Fundamentally, a 3-way merge between A3, M, and A3' is roughly equivalent (yeah, I know I'm glossing over details but it's good enough as an explanation) to either:
OR
Let's look at the latter: git diff A3 M basically computes a squashed set of changes from B1 + B2 + ... + BN + {manual user changes}. You then attempt to apply all these changes, all at once, and "hope" that the B1 + B2 + ... + BN changes disappear due to already being detected as having applied upstream in B3'. But odds are that B1, B2,..., BN are as big or bigger than {manual user changes}, and are just as likely to conflict with upstream changes as anything in {manual user changes}. So, most of your conflicts are going to be repeated ones you already resolved when you rebased B1, B2, ..., BN. But it gets worse, because the "repeated" conflicts are tweaked a bit so they do not look the same and even rerere won't help you. And it gets even worse, because it's the union of all those conflicts all at once that you get to re-resolve. And it can get even worse; things get really hilarious if one of the B's were intentionally dropped during the rebase, as you still get to resolve any conflicts between it and upstream in this funny merge (and good luck correctly resolving it to remove the changes introduced by that particular B). A more detailed of these explanations for this being a bad idea can be found at https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/rebasing-merges-differences.txt === Good idea === Summary: Solve the actual problem statement Details: High level overview:
=== Better idea === Summary: See URL below Details: While the above is the right basic idea, and is probably leaps and bounds better than what you're currently doing, it's a bit too simplistic:
But those things can be handled by extending this idea, as per: |
@newren valid points! And your better idea even has a better name: a "RON merge" ;-) |
Two other points that I thought I'd bring up:
|
15d354c
to
377b9f9
Compare
f48423a
to
7e679a4
Compare
4ca4552
to
0c40f5c
Compare
One of the repair items I privately added to my ever-growing TODO list when doing the v2.39.2
maint-*
merge cascade (which was quite painful due to the -- expected! -- merge conflicts inRelNotes
and inGIT-VERSION-GEN
) was to take a look again at my past work on replaying merge commits.So I rebased the commits that I could still find, which was challenging because Git changes quite a lot in 4 years.
It also made some things substantially easier because we now have merge-ort which does not require a work tree with the first merge parent checked out.
In my preliminary tests, it manages to (force-)rebase the v2.39.2 merge cascade without requiring the user to re-resolve already-resolved conflicts, which is exactly what I wanted.
However, the "realistic" test case added in this PR demonstrates that the strategy still is not good enough (and that test case is neither, admittedly, as it is extremely hard to read, even for the person who wrote it, at least four years later).
The thing is, I am more convinced than before that we essentially need to extend the idea of a 3-way merge, but not iteratively (by breaking up the overall merge into a sequence of 3-way merges) because that will always run the risk of really, really difficult-to-resolve nested merge conflicts.
Instead, we should look at extending the concept of a 3-way merge to a "W-merge":
For octopus merges, it could be extended to an "inverted spider merge" by adding more pre-rebase tip/tip "legs" to the center.
The big trick will not so much be identifying the unambiguous parts of the merge vs the conflicting parts (because we have the original merge as a reference, and we can interpret all of the line ranges of all of the hunks in all of the spider legs relative to that reference).
The big trick will be the representation of these conflicts, as the spider legs do not have the same knees (i.e. they do not have one common merge base).
I understand that this is all a bit hand-waving at this stage, and nothing of that is implemented yet, that's why this PR is still in draft mode.