+
Skip to content

cat-file: add a --stdin-cmd mode #1191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 157 commits into
base: next
Choose a base branch
from

Conversation

john-cai
Copy link
Contributor

@john-cai john-cai commented Jan 13, 2022

This WIP patch is mostly stealing code from builtin/update-ref.c and
implementing the same sort of prefixed command-mode that it
supports. I.e. in addition to --batch now supporting:

<object> LF

It'll support with --stdin-cmd, with and without -z, respectively:

object <object> NL
object <object> NUL

The plus being that we can now implement additional commands:

fflush NL
fflush NUL

That command simply calls fflush(stdout), which could be done as an
emergent effect before by feeding the input a "NL".

I think this will be useful for other things, e.g. a not-trivial part of "cat-file --batch" time is spent
on parsing its argument and seeing if it's a revision, ref etc.

So we could e.g. add a command that only accepts a full-length 40
character SHA-1, or switch the --format output mid-request etc.

  1. https://lore.kernel.org/git/pull.1124.git.git.1636149400.gitgitgadget@gmail.com/

requires ee4d430 ab/cat-file

cc: Taylor Blau me@ttaylorr.com

Doc update.

* cw/protocol-v2-doc-fix:
  protocol-v2.txt: align delim-pkt spec with usage
Ensure that the sparseness of the in-core index matches the
index.sparse configuration specified by the repository immediately
after the on-disk index file is read.

* vd/sparse-sparsity-fix-on-read:
  sparse-index: update do_read_index to ensure correct sparsity
  sparse-index: add ensure_correct_sparsity function
  sparse-index: avoid unnecessary cache tree clearing
  test-read-cache.c: prepare_repo_settings after config init
Code clean-up to eventually allow information on remotes defined
for an arbitrary repository to be read.

* gc/remote-with-fewer-static-global-variables:
  remote: die if branch is not found in repository
  remote: remove the_repository->remote_state from static methods
  remote: use remote_state parameter internally
  remote: move static variables into per-repository struct
  t5516: add test case for pushing remote refspecs
Doc update.

* ja/doc-cleanup:
  init doc: --shared=0xxx does not give umask but perm bits
  doc: git-init: clarify file modes in octal.
  doc: git-http-push: describe the refs as pattern pairs
  doc: uniformize <URL> placeholders' case
  doc: use three dots for indicating repetition instead of star
  doc: git-ls-files: express options as optional alternatives
  doc: use only hyphens as word separators in placeholders
  doc: express grammar placeholders between angle brackets
  doc: split placeholders as individual tokens
  doc: fix git credential synopsis
Redact the path part of packfile URI that appears in the trace output.

* if/redact-packfile-uri:
  http-fetch: redact url on die() message
  fetch-pack: redact packfile urls in traces
Doc update.

* jc/fix-first-object-walk:
  docs: add headers in MyFirstObjectWalk
  docs: fix places that break compilation in MyFirstObjectWalk
CI has been taught to catch some Unicode directional formatting
sequence that can be used in certain mischief.

* js/ci-no-directional-formatting:
  ci: disallow directional formatting
The "--date=format:<strftime>" gained a workaround for the lack of
system support for a non-local timezone to handle "%s" placeholder.

* jk/strbuf-addftime-seconds-since-epoch:
  strbuf_addftime(): handle "%s" manually
"git var GIT_DEFAULT_BRANCH" is a way to see what name is used for
the newly created branch if "git init" is run.

* tw/var-default-branch:
  var: add GIT_DEFAULT_BRANCH variable
Build optimization.

* ab/generate-command-list:
  generate-cmdlist.sh: don't parse command-list.txt thrice
  generate-cmdlist.sh: replace "grep' invocation with a shell version
  generate-cmdlist.sh: do not shell out to "sed"
  generate-cmdlist.sh: stop sorting category lines
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: trivial whitespace change
  command-list.txt: sort with "LC_ALL=C sort"
Tighten code for testing pack-bitmap.

* jk/test-bitmap-fix:
  test_bitmap_hashes(): handle repository without bitmaps
The "merge" subcommand of "git jump" (in contrib/) silently ignored
pathspec and other parameters.

* jk/jump-merge-with-pathspec:
  git-jump: pass "merge" arguments to ls-files
The code to decode the length of packed object size has been
corrected.

* jt/pack-header-lshift-overflow:
  packfile: avoid overflowing shift during decode
"diff --histogram" optimization.

* pw/xdiff-classify-record-in-histogram:
  xdiff: simplify comparison
  xdiff: avoid unnecessary memory allocations
  diff histogram: intern strings
trace2 error code path fix.

* js/trace2-avoid-recursive-errors:
  trace2: disable tr2_dst before warning on write errors
Test fix.

* jk/t5319-midx-corruption-test-deflake:
  t5319: corrupt more bytes of the midx checksum
Leakfix.

* ab/checkout-branch-info-leakfix:
  checkout: fix "branch info" memory leaks
…ge' into next

The advice message given by "git pull" when the user hasn't made a
choice between merge and rebase still said that the merge is the
default, which no longer is the case.  This has been corrected.

* ah/advice-pull-has-no-preference-between-rebase-and-merge:
  pull: don't say that merge is "the default strategy"
Bitop fix for platforms whose "long" is 32-bit.

* rs/mergesort:
  mergesort: avoid left shift overflow
* po/size-t-for-vs:
  object-file.c: LLP64 compatibility, upcast unity for left shift
  diffcore-delta.c: LLP64 compatibility, upcast unity for left shift
  repack.c: LLP64 compatibility, upcast unity for left shift
Doc mark-up fix.

* tl/midx-docfix:
  midx: fix a formatting issue in "multi-pack-index.txt"
Various operating modes of "git reset" have been made to work
better with the sparse index.

* vd/sparse-reset:
  unpack-trees: improve performance of next_cache_entry
  reset: make --mixed sparse-aware
  reset: make sparse-aware (except --mixed)
  reset: integrate with sparse index
  reset: expand test coverage for sparse checkouts
  sparse-index: update command for expand/collapse test
  reset: preserve skip-worktree bit in mixed reset
  reset: rename is_missing to !is_in_reset_tree
Comment fix.

* hk/ci-checkwhitespace-commentfix:
  ci(check-whitespace): update stale file top comments
"git fetch", when received a bad packfile, can fail with SIGPIPE.
This wasn't wrong per-se, but we now detect the situation and fail
in a more predictable way.

* jk/fetch-pack-avoid-sigpipe-to-index-pack:
  fetch-pack: ignore SIGPIPE when writing to index-pack
Workaround for a false-alarm by gcc-11

* jk/refs-g11-workaround:
  refs: work around gcc-11 warning with REF_HAVE_NEW
The function to cull a child process and determine the exit status
had two separate code paths for normal callers and callers in a
signal handler, and the latter did not yield correct value when the
child has caught a signal.  The handling of the exit status has
been unified for these two code paths.  An existing test with
flakiness has also been corrected.

* jk/t7006-sigpipe-tests-fix:
  t7006: simplify exit-code checks for sigpipe tests
  t7006: clean up SIGPIPE handling in trace2 tests
  run-command: unify signal and regular logic for wait_or_whine()
Docfix.

* jt/midx-doc-fix:
  Doc: no midx and partial clone relation
A small simplification of API.

* hn/create-reflog-simplify:
  refs: drop force_create argument of create_reflog API
Weather balloon to break people with compilers that do not support
C99.

* bc/require-c99:
  git-compat-util: add a test balloon for C99 support
The "reftable" backend for the refs API, without integrating into
the refs subsystem, has been added.

* hn/reftable:
  Add "test-tool dump-reftable" command.
  reftable: add dump utility
  reftable: implement stack, a mutable database of reftable files.
  reftable: implement refname validation
  reftable: add merged table view
  reftable: add a heap-based priority queue for reftable records
  reftable: reftable file level tests
  reftable: read reftable files
  reftable: generic interface to tables
  reftable: write reftable files
  reftable: a generic binary tree implementation
  reftable: reading/writing blocks
  Provide zlib's uncompress2 from compat/zlib-compat.c
  reftable: (de)serialization for the polymorphic record type.
  reftable: add blocksource, an abstraction for random access reads
  reftable: utility functions
  reftable: add error related functionality
  reftable: add LICENSE
  hash.h: provide constants for the hash IDs
@john-cai john-cai changed the base branch from master to next January 13, 2022 20:41
@john-cai john-cai force-pushed the jc-cat-file-stdin-cmd-mode branch 6 times, most recently from 6982dc9 to cd48dc2 Compare January 19, 2022 15:37
@john-cai
Copy link
Contributor Author

/preview

@gitgitgadget-git
Copy link

Preview email sent as pull.1191.git.git.1642606748.gitgitgadget@gmail.com

This is just string_list_split() copy/pasted with the "push" part
adjusted. The next commit will call this function to parse a line of
command arguments.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc-cat-file-stdin-cmd-mode branch from cd48dc2 to 90c36de Compare January 19, 2022 17:05
@john-cai
Copy link
Contributor Author

/preview

@gitgitgadget-git
Copy link

Preview email sent as pull.1191.git.git.1642612351.gitgitgadget@gmail.com

Teach cat-file to accept a --stdin-cmd which causes cat-file to enter a
command mode to receive commands and arguments on stdin..

batch_objects_stdin_cmd() handles a line from stdin, which it expects a
command name and arguments. -z will tell --stdin-cmd to look for NUL
as the line termination.

This commit adds two commands, fflush and object. fflush immediately
flushes the buffer. object takes a sha as an argument and passes it to
batch_one_object().

The code allows for additional commands to be added as parse_cmd structs.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc-cat-file-stdin-cmd-mode branch from 90c36de to 63fb5d1 Compare January 19, 2022 17:49
@john-cai
Copy link
Contributor Author

/submit

@gitgitgadget-git
Copy link

Submitted as pull.1191.git.git.1642615122.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-git-1191/john-cai/jc-cat-file-stdin-cmd-mode-v1

To fetch this version to local tag pr-git-1191/john-cai/jc-cat-file-stdin-cmd-mode-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-git-1191/john-cai/jc-cat-file-stdin-cmd-mode-v1

@gitgitgadget-git
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Jan 19, 2022 at 05:58:40PM +0000, John Cai via GitGitGadget wrote:
> I think this will be useful for other things, e.g. a not-trivial part of
> "cat-file --batch" time is spent on parsing its argument and seeing if it's
> a revision, ref etc.
>
> So we could e.g. add a command that only accepts a full-length 40 character
> SHA-1, or switch the --format output mid-request etc.

I would like to see a more concrete proposal or need for this sort of
thing before we go too far down adding a new mode to a command so
fundamental as cat-file is.

Between your two proposals for other commands that you could add, I am
not convinced that either of them needs to be in cat-file itself. IOW,
you could easily inject another process in between which verifies that
the provided objects are 40 character SHA-1s.

The latter, changing the output format in-process, seems dubious to me.
Is the start-up time of cat-file so slow (and you need to change formats
so often) that the two together are unbearable? I'd be surprised if they
were (and if so, we should focus our efforts on improving Git's start-up
time).

In the meantime, this is quite an invasive way to provide callers a way
to control the output stream. If there really is a need to just force
cat-file to flush more often, perhaps consider designating a special
signal that we could treat as a request to flush the output stream?

Thanks,
Taylor

@gitgitgadget-git
Copy link

User Taylor Blau <me@ttaylorr.com> has been added to the cc: list.

@gitgitgadget-git
Copy link

On the Git mailing list, John Cai wrote (reply to this):



On 19 Jan 2022, at 14:38, Taylor Blau wrote:

> On Wed, Jan 19, 2022 at 05:58:40PM +0000, John Cai via GitGitGadget wrote:
>> I think this will be useful for other things, e.g. a not-trivial part of
>> "cat-file --batch" time is spent on parsing its argument and seeing if it's
>> a revision, ref etc.
>>
>> So we could e.g. add a command that only accepts a full-length 40 character
>> SHA-1, or switch the --format output mid-request etc.
>
> I would like to see a more concrete proposal or need for this sort of
> thing before we go too far down adding a new mode to a command so
> fundamental as cat-file is.

Thanks for the feedback! I realized I should have made this an RFC first.
I’ll submit an RFC separately.

>
> Between your two proposals for other commands that you could add, I am
> not convinced that either of them needs to be in cat-file itself. IOW,
> you could easily inject another process in between which verifies that
> the provided objects are 40 character SHA-1s.
>
> The latter, changing the output format in-process, seems dubious to me.
> Is the start-up time of cat-file so slow (and you need to change formats
> so often) that the two together are unbearable? I'd be surprised if they
> were (and if so, we should focus our efforts on improving Git's start-up
> time).
>
> In the meantime, this is quite an invasive way to provide callers a way
> to control the output stream. If there really is a need to just force
> cat-file to flush more often, perhaps consider designating a special
> signal that we could treat as a request to flush the output stream?
>
> Thanks,
> Taylor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载