+
Skip to content

VMAWIP: rebase merge commits #1491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 252 commits into
base: next
Choose a base branch
from

Conversation

dscho
Copy link
Member

@dscho dscho commented Mar 14, 2023

One of the repair items I privately added to my ever-growing TODO list when doing the v2.39.2 maint-* merge cascade (which was quite painful due to the -- expected! -- merge conflicts in RelNotes and in GIT-VERSION-GEN) was to take a look again at my past work on replaying merge commits.

So I rebased the commits that I could still find, which was challenging because Git changes quite a lot in 4 years.

It also made some things substantially easier because we now have merge-ort which does not require a work tree with the first merge parent checked out.

In my preliminary tests, it manages to (force-)rebase the v2.39.2 merge cascade without requiring the user to re-resolve already-resolved conflicts, which is exactly what I wanted.

However, the "realistic" test case added in this PR demonstrates that the strategy still is not good enough (and that test case is neither, admittedly, as it is extremely hard to read, even for the person who wrote it, at least four years later).

The thing is, I am more convinced than before that we essentially need to extend the idea of a 3-way merge, but not iteratively (by breaking up the overall merge into a sequence of 3-way merges) because that will always run the risk of really, really difficult-to-resolve nested merge conflicts.

Instead, we should look at extending the concept of a 3-way merge to a "W-merge":

w-merge

For octopus merges, it could be extended to an "inverted spider merge" by adding more pre-rebase tip/tip "legs" to the center.

The big trick will not so much be identifying the unambiguous parts of the merge vs the conflicting parts (because we have the original merge as a reference, and we can interpret all of the line ranges of all of the hunks in all of the spider legs relative to that reference).

The big trick will be the representation of these conflicts, as the spider legs do not have the same knees (i.e. they do not have one common merge base).

I understand that this is all a bit hand-waving at this stage, and nothing of that is implemented yet, that's why this PR is still in draft mode.

Make fsmonitor more robust to avoid the flakiness seen in t7527.

* jh/t7527-unflake-by-forcing-cookie:
  fsmonitor: fix race seen in t7527
Test fix.

* ab/t5314-avoid-losing-exit-status:
  t5314: check exit code of "git"
Test fix.

* ab/t7600-avoid-losing-exit-status-of-git:
  t7600: don't ignore "rev-parse" exit code in helper
Test fix.

* ab/t4023-avoid-losing-exit-status-of-diff:
  t4023: fix ignored exit codes of git
Stop using deprecated macOS API in fsmonitor.

* jh/fsmonitor-darwin-modernize:
  fsmonitor: eliminate call to deprecated FSEventStream function
Test fix.

* js/t3920-shell-and-or-fix:
  t3920: don't ignore errors of more than one command with `|| true`
Test fix.

* rs/t3920-crlf-eating-grep-fix:
  t3920: support CR-eating grep
Redefining system functions for a few functions did not follow our
usual "implement git_foo() and #define foo(args) git_foo(args)"
pattern, which has broken build for some folks.

* jk/avoid-redef-system-functions-2.30:
  git-compat-util: undefine system names before redeclaring them
  git-compat-util: avoid redefining system function names
The jk/avoid-redef-system-functions-2.30 topic pre-merged for more
recent codebase.

* jk/avoid-redef-system-functions:
The way the diff machinery prepares the options array for the
parse_options API has been refactored to avoid resource leaks.

* rs/diff-parseopts:
  diff: remove parseopts member from struct diff_options
  diff: use add_diff_options() in diff_opt_parse()
  diff: factor out add_diff_options()
Correct pthread API usage.

* sx/pthread-error-check-fix:
  maintenance: compare output of pthread functions for inequality with 0
Introduce a case insensitive mode to the Bash completion helpers.

* aw/complete-case-insensitive:
  completion: add case-insensitive match of pseudorefs
  completion: add optional ignore-case when matching refs
The advice message given by "git status" when it takes long time to
enumerate untracked paths has been updated.

* rr/status-untracked-advice:
  status: modernize git-status "slow untracked files" advice
Fix a pair of bugs in 'git branch'.

* rj/branch-copy-and-rename:
  branch: force-copy a branch to itself via @{-1} is a no-op
The code to auto-correct a misspelt subcommand unnecessarily called
into git_default_config() from the early config codepath, which was
a no-no.  This has bee corrected.

* sg/help-autocorrect-config-fix:
  help.c: fix autocorrect in work tree for bare repository
"git http-fetch" (which is rarely used) forgot to identify itself
in the trace2 output.

* jt/http-fetch-trace2-report-name:
  http-fetch: invoke trace2_cmd_name()
Code clean-up around unused function parameters.

* jk/unused-post-2.39:
  userdiff: mark unused parameter in internal callback
  list-objects-filter: mark unused parameters in virtual functions
  diff: mark unused parameters in callbacks
  xdiff: mark unused parameter in xdl_call_hunk_func()
  xdiff: drop unused parameter in def_ff()
  ws: drop unused parameter from ws_blank_line()
  list-objects: drop process_gitlink() function
  blob: drop unused parts of parse_blob_buffer()
  ls-refs: use repository parameter to iterate refs
Code clean-up.

* jk/server-supports-v2-cleanup:
  server_supports_v2(): use a separate function for die_on_error
Code clean-up.

* rs/am-parse-options-cleanup:
  am: don't pass strvec to apply_parse_options()
Code clean-up.

* rs/clear-commit-marks-cleanup:
  commit: skip already cleared parents in clear_commit_marks_1()
Code clean-up.

* rs/reflog-expiry-cleanup:
  reflog: clear leftovers in reflog_expiry_cleanup()
Code clean-up.

* rs/clarify-error-in-write-loose-object:
  object-file: inline write_buffer()
Clean-ups in error messages produced by "git for-each-ref" and friends.

* jk/ref-filter-error-reporting-fix:
  ref-filter: convert email atom parser to use err_bad_arg()
  ref-filter: truncate atom names in error messages
  ref-filter: factor out "unrecognized %(foo) arg" errors
  ref-filter: factor out "%(foo) does not take arguments" errors
  ref-filter: reject arguments to %(HEAD)
The output from "git diff --stat" on an unmerged path lost the
terminating LF in Git 2.39, which has been corrected.

* pg/diff-stat-unmerged-regression-fix:
  diff: fix regression with --stat and unmerged file
Code clean-up.

* sk/remove-duplicate-includes:
  git: remove duplicate includes
Use the SHA1DC implementation on macOS, just like other platforms,
by default.

* ab/darwin-default-to-sha1dc:
  Makefile: use sha1collisiondetection by default on OSX and Darwin
"git pull -v --recurse-submodules" attempted to pass "-v" down to
underlying "git submodule update", which did not understand the
request and barfed, which has been corrected.

* ss/pull-v-recurse-fix:
  submodule: accept -v for the update command
Just like "git var GIT_EDITOR" abstracts the complex logic to
choose which editor gets used behind it, "git var" now give support
to GIT_SEQUENCE_EDITOR.

* sa/git-var-sequence-editor:
  var: add GIT_SEQUENCE_EDITOR variable
When given a pattern that matches an empty string at the end of a
line, the code to parse the "git diff" line-ranges fell into an
infinite loop, which has been corrected.

* lk/line-range-parsing-fix:
  line-range: fix infinite loop bug with '$' regex
gitster and others added 15 commits March 8, 2023 15:18
After "git pull" that is configured with pull.rebase=false
merge.ff=only fails due to our end having our own development, give
advice messages to get out of the "Not possible to fast-forward"
state.

* fc/advice-diverged-history:
  advice: add diverging advice for novices
…les' into next"

This reverts commit 390cd4f, reversing
changes made to 1c421f4.
…o next

Once we start running, we assumed that the list of alternate object
databases would never change.  Hook into the machinery used to
update the list of packfiles during runtime to update this list as
well.

* ds/reprepare-alternates-when-repreparing-packfiles:
  object-file: reprepare alternates when necessary
* ew/fetch-no-write-fetch-head-fix:
  fetch: pass --no-write-fetch-head to subprocesses
"git add -p" while the index is unmerged sometimes failed to parse
the diff output it internally produces and died, which has been
corrected.

* jk/add-p-unmerged-fix:
  add-patch: handle "* Unmerged path" lines
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
receiving end of the patch that expects the standard -p1 format to
break.  Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.

This is a backward compatibility breaking change.

* jk/format-patch-ignore-noprefix:
  format-patch: add format.noprefix option
  format-patch: do not respect diff.noprefix
  diff: add --default-prefix option
  t4013: add tests for diff prefix options
  diff: factor out src/dst prefix setup
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
receiving end of the patch that expects the standard -p1 format to
break.  Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.

This is a backward compatibility breaking change.

* jk/format-patch-ignore-noprefix:
  rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
Fix for a "ls-files --format="%(path)" that produced nonsense
output, which was a bug in 2.38.

* aj/ls-files-format-fix:
  ls-files: fix "--format" output of relative paths
"git receive-pack" that responds to "git push" requests failed to
clean a stale lockfile when killed in the middle, which has been
corrected.

* ps/receive-pack-unlock-before-die:
  receive-pack: fix stale packfile locks when dying
Code clean-up for test framework.

* fc/test-aggregation-clean-up:
  test: don't print aggregate-results command
  test: simplify counts aggregation
Previously, we punted on the question how to carry over amendments to
merge commits. Instead, we always performed new merges from scratch.

Such amendments to merge commits may very well be necessary, though,
e.g.  if one side of the history changed a function signature and the
other side added a caller.

In a discussion inspired by Johannes Sixt's "cherry-pick -m 1" idea,
Phillip Wood came up with the fundamental idea behind this commit.

The premise is that both "rebase" and "merge" try to reconcile diverging
changes. Both "rebase" and "merge" should result in identical trees
(after merge conflicts are resolved, if there were any).

A "merge" typically boils down to a "3-way" merge, with a "merge base"
and two "merge heads" that diverged from said merge base. Think e.g.
last week's master as the merge base, from where a developer branched
off a topic branch (the first merge head), and the current master as
second merge head.

A "rebase" is more complicated than a single "3-way merge": When
rebasing commits, they are "cherry-picked" one by one. That way, the
changes introduced by that commit are reconciled with the changes
introduced by the "rebase" so far: a "3-way merge" between HEAD and the
cherry-picked commit, with the latter's parent commit as merge base
(HEAD would be the rebased parent commit in the common case).

In the context of a cherry-pick, it is important to keep in mind that
the "diverging changes" are not reflected by commit history. For the
purpose of a 3-way merge, they don't have to be.

Now let's look again at the problem of "rebasing merge commits", i.e.
how to "cherry-pick a merge commit"? It is not as simple as a "3-way
merge" (a single one, that is) because we do not have a single merge
base from where sprung exactly two diverging changesets.

Just like when we cherry-pick a regular commit, in the case of a merge
commit we have diverging changes for *every* one of its parents. And
just like before, the original parent commit is the merge base, and the
rebased parent commit is a merge head. The other merge head is the
original merge commit itself.

By starting with the original merge commit and then performing these
3-way merges sequentially, one for every parent, we reconcile all
diverging changes. So we "rebased" the merge commit, including all of
its amendments.

This is the most logical generalization of the cherry-pick concept: if
we only have one parent, the strategy outlined above is identical to
Git's cherry-pick operation.

Side-note: While this description only talked about merge commits with
exactly two parent commits above, the principle still holds for merge
commits with more than two parent commits ("octopus merges").

That strategy requires an original merge commit, with a matching number
of parents, though, and therefore it would not make sense in general to
do this for every `merge` command in the todo list: what if there was no
original merge commit, or if the specified merge commit had a different
number of parents? It therefore cannot be the default mode for the
`merge` command, thus we introduce a new flag `-R` for that.

However, for existing merge commits, this strategy *is* valid and *does*
lead to the most intuitive results. Therefore, let's use it by default
when generating todo lists in `git rebase --rebase-merges`.

Original-idea-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Igor Djordjevic <igor.d.djordjevic@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
TODO: implement a test

Just like with `pick` commands, `merge` commands should also get a chance
to resolve already-recorded conflicts.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit adds a lengthy test case to t3430 that reflects some
challenging use cases for the --rebase-merges option.

In particular, it sets up a scenario which demonstrates that "evil merges"
happen in practice, and they are necessarily introducing those extra
changes that constitute an "evil merge".

It then sets up three "upstream" branches with competing changes that
are designed to conflict with the changes to rebase.

The purpose of this added test case is two-fold:

1. to document what we expect --rebase-merges to accomplish, and even more
   to document what we do *not* expect it to be able to do.

2. to explore what kinds of merge conflicts --rebase-merges can produce
   (spoiler alert: we can end up with some bad ones, with
   unintuitively-nested merge conflicts).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When rebasing a regular merge between two parent commits, we have that
problem that we have to perform *two* 3-way merges, because we want to
merge in the changes (like amendments, merge conflict resolutions, etc)
from the original merge commit, too.

When the first of these 3-way merges already had conflicts, then we run
the chance of ending up with nested conflicts in the second 3-way merge.

So let's see in that case whether we gain something by merging the
original merge commit with the other parent first, and if that resulted in
a clean merge, proceed to merge the first parent (in this case, we cannot
end up with nested merge conflicts).

This simplifies the realistic example of a nested merge conflict to a
non-nested merge conflict. Before:

	int hi(void) {
		printf("Hello, world!\n");
	}
	<<<<<<< intermediate merge
	<<<<<<<< HEAD
	/* main event loop */
	void event_loop(void) {
		/* TODO: place holder for now */
	========
	=======
	}
	>>>>>>> <HASH>... merge head #1
	/* caller */
	void caller(void) {
		hi();
	>>>>>>>> <HASH>... original merge
	}

With this patch, this becomes much simpler:

	int hi(void) {
		printf("Hello, world!\n");
	}
	/* main event loop */
	void event_loop(void) {
		/* TODO: place holder for now */
	}
	<<<<<<<< HEAD
	========
	/* caller */
	void caller(void) {
		hi();
	}
	>>>>>>>> <HASH>... intermediate merge

Note: this needs to be refactored and stuff and things. It may even be
necessary to dive deeper into the code and implement a "W merge" that
avoids the problem where (one part of) one file would benefit from merging
the second parent before the first, while another (part of the same) file
would benefit from the reverse order.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho requested a review from newren March 14, 2023 23:14
@dscho dscho self-assigned this Mar 14, 2023
@dscho
Copy link
Member Author

dscho commented Mar 24, 2023

FWIW as the test failures prove, it is probably a bad idea to introduce the -R option to the merge rebase command as it is done here (this PR would let git rebase -r always generate rebase scripts containing lines like this one: merge -R -C <commit> <label>, with no way to go back to git rebase -r's current behavior).

A much better idea is probably to introduce a "fake" merge strategy option, say, -Xreplay that is accepted only if the default merge strategy is chosen for the rebase, and that then generates merge -Xreplay -C <commit> <label> lines (I do think that it is a good idea to keep the current behavior of merge -C <commit> <label> without such an extra option).

@newren
Copy link

newren commented Mar 24, 2023

So, the W-shaped merge idea may work. It's certainly a lot more promising than the fundamentally broken strategy of involving a 3-way merge between A3, M, and A3' (and a follow-up with B3' and B3). I think it'd be a huge amount of effort to implement and get right, though. (How long did it take us to get 3-way merges right in the face of renames, D/F conflicts, mode changes, etc. 15 years? Tackling 5-way merges looks interesting to say the least.)

But since this patch is doing the 3-way merge of A3, M, and A3', let me take a minute to explain why it's a really bad idea and suggest another simpler alternative (I sent the below to you privately already, but since you're linking this PR to the mailing list..). There are two ways to see why this is a bad idea:

=== Reason #1 this is a bad idea ===

Summary: This solution does NOT match the problem statement.

Details:

The problem we need to solve is:

  • Reflect the topology -- i.e. create a new merge commit -- AND
  • Port the manual user changes (conflict resolutions, semantic
    tweaks, etc.) from the old merge commit to the new merge commit.

Now, porting the manual user changes is "risky"; it might not be portable to the newer merge. We might even get nested conflicts. So it needs to be handled with care.

But the {A3,M,A3'} 3-way merge doesn't even attempt to compute the manual user changes. It instead uses an approximation that involves the manual user changes as a very small subset, and in so doing, expands the risk zone to that huge set of changes.

=== Reason #2 this is a bad idea ===

Summary: Conflicts galore (as seen by diff|apply analogy to merging)

Details:

Fundamentally, a 3-way merge between A3, M, and A3' is roughly equivalent (yeah, I know I'm glossing over details but it's good enough as an explanation) to either:

git checkout M
git diff A3 A3' | git apply

OR

git checkout A3'
git diff A3 M | git apply

Let's look at the latter:

git diff A3 M basically computes a squashed set of changes from B1 + B2 + ... + BN + {manual user changes}.

You then attempt to apply all these changes, all at once, and "hope" that the B1 + B2 + ... + BN changes disappear due to already being detected as having applied upstream in B3'. But odds are that B1, B2,..., BN are as big or bigger than {manual user changes}, and are just as likely to conflict with upstream changes as anything in {manual user changes}. So, most of your conflicts are going to be repeated ones you already resolved when you rebased B1, B2, ..., BN.

But it gets worse, because the "repeated" conflicts are tweaked a bit so they do not look the same and even rerere won't help you.

And it gets even worse, because it's the union of all those conflicts all at once that you get to re-resolve.

And it can get even worse; things get really hilarious if one of the B's were intentionally dropped during the rebase, as you still get to resolve any conflicts between it and upstream in this funny merge (and good luck correctly resolving it to remove the changes introduced by that particular B).

A more detailed of these explanations for this being a bad idea can be found at https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/rebasing-merges-differences.txt

=== Good idea ===

Summary: Solve the actual problem statement

Details:

High level overview:

  • Merge the new commits, like today's --rebase-merges:
    git checkout $NEW_PARENT_1
    git merge $NEW_PARENT_2
  • Compute the manual user changes in old merge commit and port:
    git show --remerge-diff $OLD_MERGE_COMMIT | git apply

=== Better idea ===

Summary: See URL below

Details:

While the above is the right basic idea, and is probably leaps and bounds better than what you're currently doing, it's a bit too simplistic:

  • The conflict markers in --remerge-diff won't match (they are annotated with the commit summary of the commits being merged, and the old commit hashes were obviously different), which prevents this method from letting you "apply the previous conflict resolution".
  • It doesn't consider binary files, symlinks, and other non-textual conflicts.
  • It doesn't try to minimize nested conflicts in normal two-headed merges, and thus risks even more complex issues when conflictstyle is diff3.

But those things can be handled by extending this idea, as per:

https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/replay-design-notes.txt#L264-L341

@dscho
Copy link
Member Author

dscho commented Mar 27, 2023

@newren valid points! And your better idea even has a better name: a "RON merge" ;-)

@newren
Copy link

newren commented Apr 4, 2023

Two other points that I thought I'd bring up:

  • Technically, you could view my alternative solution(s) as subsets of Dscho's proposed solution. They are a process involving all five commits Dscho highlights, and since he doesn't detail how the five commits are used exactly, my solution is one possibility. Perhaps not the one hinted at, but a specific more-detailed solution all the same.
  • There's a third high-level reason to see the A3-M-A3' merge as a really bad idea. We are talking about rebasing. The idea of rebasing is that if you have a series of commits B1, B2, B3, etc., that you want to give them a different base. While the parent of B2' is related to the parent of B2 (since B1' is a rebase of B1), the parent of B1' is fundamentally totally different than the parent of B1. It has a new base, so to speak. Now, we want to apply this idea of getting a new base not just to regular commits but to merge commits as well. So, we need to be allowed to change the parent(s) of the merge commit while still porting the semantic fixups it included. The idea of the A3-M-A3' and B3-M-B3' merges is fundamentally tied to the original relative topology, and doesn't allow such rebasing of merges. The alternative idea I proposed, however, naturally handles the idea of changing the parents of the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载