+
Skip to content

config-parse: create config parsing library #1551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o
LIB_OBJS += compat/terminal.o

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Begin this process by splitting the config parsing code out of
> config.[c|h] and into config-parse.[c|h].

I think we need to be more careful in how we split. It would be easier
if there is a concrete use case, but some preliminary questions:

 - "respect_includes" is in the library, but callers might want to opt
   out of it or provide an alternative way to resolve includes.
 - There is a lot of error reporting capability with respect to the
   source of config, and not all sources are applicable to library
   users. How should we proceed? E.g. should we expect that all library
   users conform to the list here (e.g. even if the source is something
   like but not exactly STDIN, they should pick it), or allow users to
   customize sources?

In the absence of more information, the split I would envision is
either something that can only parse a buffer, its error messages being
very generic (the caller should turn them into something more specific
before showing them to the user) (but one problem here is that we must
postprocess includes, which might be a problem if the output of parsing
is a flat hashtable, since we wouldn't know which keys are overridden
by the includes and which are not); or something that can take in a
callback that is invoked whenever something is included and maybe also
a callback for access to the object database and that has full knowledge
of all sources for error reporting (or allows the caller to customize
the sources).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Jonathan Tan <jonathantanmy@google.com> writes:

>> Begin this process by splitting the config parsing code out of
>> config.[c|h] and into config-parse.[c|h].
>
> I think we need to be more careful in how we split. It would be easier
> if there is a concrete use case, but some preliminary questions:
>
>  - "respect_includes" is in the library, but callers might want to opt
>    out of it or provide an alternative way to resolve includes.

Makes sense. Or alternatively, we could choose not to support
"respect_includes" initially, and exclude it to avoid confusion.

>  - There is a lot of error reporting capability with respect to the
>    source of config, and not all sources are applicable to library
>    users. How should we proceed? E.g. should we expect that all library
>    users conform to the list here (e.g. even if the source is something
>    like but not exactly STDIN, they should pick it), or allow users to
>    customize sources?

Good point. I would also prefer to have the list of sources constrained
to the list of sources available via the library. Some possibilities I
can see are:

1. Move the Git-program-specific error message reporting to a level above
   the library (i.e. config.c).

2. Proceed as-is (with the additional sources in the library) and leave a
   FIXME to address this when we find a Git library-idiomatic way to
   handle errors. This won't be the last time we'll have to untangle
   Git-program-specific error reporting from the library - it might be
   useful to try to figure out all of that in one fell swoop.

3. Figure out library-idiomatic error handling mentioned in 2. right
   now.

I think 1. is the best option, but if that fails, 2. is also reasonable.
3. is too difficult to do with a sample size of 1.

> In the absence of more information, the split I would envision is
> either something that can only parse a buffer, its error messages being
> very generic (the caller should turn them into something more specific
> before showing them to the user) (but one problem here is that we must
> postprocess includes, which might be a problem if the output of parsing
> is a flat hashtable, since we wouldn't know which keys are overridden
> by the includes and which are not);

Hm, how does the include mechanism here this differ from what's in this
patch? This also only parses a single file and ignores includes. I'm not
sure why this requires us to postprocess includes - in config.c,
includes are handled by 'pausing' parsing of the current source,
evaluating the included files, then 'resuming' parsing.

> or something that can take in a
> callback that is invoked whenever something is included and maybe also
> a callback for access to the object database and that has full knowledge
> of all sources for error reporting (or allows the caller to customize
> the sources).

Ah, I like this callback idea quite a lot. This lets config-parse.c
easily support unconditional includes and provides entry points for
program-specific behavior (like checking the odb). I will try this.

LIB_OBJS += compat/zlib-uncompress2.o
LIB_OBJS += config.o
LIB_OBJS += config-parse.o
LIB_OBJS += connect.o
LIB_OBJS += connected.o
LIB_OBJS += convert.o
Expand Down
7 changes: 4 additions & 3 deletions builtin/config.c
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,9 @@ static int actions, type;
static char *default_value;
static int end_nul;
static int respect_includes_opt = -1;
static struct config_options config_options;
static struct config_options config_options = {
.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE)
};
static int show_origin;
static int show_scope;
static int fixed_value;
Expand Down Expand Up @@ -362,8 +364,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
goto free_strings;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
> 
> a) There's no good reason why those callers couldn't negate the value
>    themselves.
> 
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).
> 
> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

I'm not sure if we ever concluded that functions returning errors should
return positive integers, but in this case I think it makes sense. We
can document what's returned as being the same as what's documented in
the config manpage.

The negative return was as early as when the function was first
introduced in b09c53a3e3 (Sanity-check config variable names, 2011-01-
30), but there's no indication there as to why the author chose negative
values.

> Change git_config_parse_key() to return positive values instead, and
> adjust callers accordingly. Callers that sanitize the negative sign for
> exit(3) now pass the return value opaquely, fixing a bug where "git
> config <key with no section or name>" results in a different exit code
> depending on whether we are setting or getting config.

Can you be more precise as to which bug is being fixed? (I think
somewhere, a 1 is returned when it should be a 2.)

> Callers that
> wanted to pass along a negative value now negate the return value
> themselves.

OK.

> diff --git a/builtin/config.c b/builtin/config.c
> index 1c75cbc43df..8a2840f0a8c 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
>  			goto free_strings;
>  		}
>  	} else {
> -		if (git_config_parse_key(key_, &key, NULL)) {
> -			ret = CONFIG_INVALID_KEY;
> +		if ((ret = git_config_parse_key(key_, &key, NULL))) {
>  			goto free_strings;
>  		}
>  	}

Ah, here, the return value was sanitized in such a way that it lost
information. The change makes sense.

Besides the callers modified in this patch, there is another caller
config_parse_pair() in config.c, but that just checks whether the return
value is 0, so it remaining unmodified is fine.

> diff --git a/config.h b/config.h
> index 6332d749047..40966cb6828 100644
> --- a/config.h
> +++ b/config.h
> @@ -23,7 +23,7 @@
>  
>  struct object_id;
>  
> -/* git_config_parse_key() returns these negated: */
> +/* git_config_parse_key() returns these: */
>  #define CONFIG_INVALID_KEY 1
>  #define CONFIG_NO_SECTION_OR_NAME 2

Should these be turned into an enum? Also, it might be worth adding that
these match the return values as documented in the manpage.

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 387d336c91f..3202b0f8843 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
>  	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
>  '
>  
> +# Exit codes
> +test_expect_success '--get with bad key' '

Rather than put an "exit codes" title, maybe embed that in the test
description.

> +	# Also exits with 1 if the value is not found

I don't understand this comment - what's the difference between a bad
key and a value not being found? And if there's a difference, could we
test both?

> +	test_expect_code 1 git config --get "bad.name\n" 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config --get "bad." 2>err &&
> +	grep "error: key does not contain variable name" err
> +'
> +
> +test_expect_success 'set with bad key' '
> +	test_expect_code 1 git config "bad.name\n" var 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config "bad." var 2>err &&
> +	grep "error: key does not contain variable name" err
> +'

Makes sense.

From a libification perspective, I'm not sure that using positive values
to indicate error is an advantage over negative values, but it makes
sense in this particular context to have the return values match the
manpage exactly, since that is part of the benefit of this function. So
I think this patch is worth getting in by itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
>
> a) There's no good reason why those callers couldn't negate the value
>    themselves.

That is not a good reason to break from a widely adopted convention
in UNIXy library functions to signal success with 0 and failure with
negative values.  The callers if they want to have a positive values
can flip the polarity themselves, by the way.

>
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).

There is no reason to think that each and every minute difference
the direct callers of a library function may want to notice by
different error return values needs to be propagated to the calling
process via its exit value.  It is perfectly fine and expected for
the status values of the entire process is more coarse grained than
individual library calls, the latter may convey not just "I failed"
but "I failed why" to their callers, while the former may not want
to say "I made a call to some library functions and got this error
code", let alone "I called library function X and got error code Y".

In other words, if your program does

	err = library_call_about_some_filesystem_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_database_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_parsing();
	if (err)
		exit(err); /* or exit(-err) */

there is something wrong.  The error codes from these different
library functions share the same "integer" namespace without being
segregated, and expecting the calling process to be able to tell
what error we discovered by relaying the literal translation of low
level library error code would not work.  The exit() codes would
need to be wider (i.e. not limited only to the possible errors from
a single library function) and would be coarser (i.e. a filesystem
operation may say "open failed due to permission error" or "open
failed because there was no such file"; at the end-user level, it
may be more appropriate to say "configuration file could not be
read", regardless of the reason why the filesystem operation
failed).

	err = library_call_about_some_filesystem_operation();
	if (err) {
		error("cannot open the filesystem entity");
		exit(ERR_FILESYSTEM);
	err = library_call_about_some_database_operation();
	if (err)
		exit(ERR_DATABASE);
	err = library_call_about_some_parsing();
	if (err)
		exit(ERR_PARSING);

So, this is not a good reason, either.

> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

Quite the contrary, if it were a purely internal convention, we may
not care too much, as we have only ourselves to confuse by picking
an unusual convention, but if we are making more parts of our code
available in a library-ish interface, I would expect they follow
some convention, and that convention would be "0 for success,
negative for failure".

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

>> From: Glen Choo <chooglen@google.com>
>>
>> git_config_parse_key() returns #define-d error codes, but negated. This
>> negation is merely a convenience to other parts of config.c that don't
>> bother inspecting the return value before passing it along. But:
>>
>> a) There's no good reason why those callers couldn't negate the value
>>    themselves.
>
> That is not a good reason to break from a widely adopted convention
> in UNIXy library functions to signal success with 0 and failure with
> negative values.  The callers if they want to have a positive values
> can flip the polarity themselves, by the way.

Oh, interesting. I was trying to follow the conventions of the
surrounding config.c code and many other parts of the codebase, which
returns positive values. Why do we choose to return postive values
throughout the codebase, by the way? Is it because they were really
intended for exit(3), and not to be used as a library.

>>
>> b) In other callers, this value eventually gets fed to exit(3), and
>>    those callers need to sanitize the negative value (and they sometimes
>>    do so lossily, by overriding the return value with
>>    CONFIG_INVALID_KEY).
>
> There is no reason to think that each and every minute difference
> the direct callers of a library function may want to notice by
> different error return values needs to be propagated to the calling
> process via its exit value.

Yes, I fully agree. I didn't intend to be a statement on how things
should be, but rather how they already are. The oddities in this case
are:

- No callers actually care about the sign of git_config_parse_key()
  since it can only return values of one sign. Only
  configset_find_element() benefits from the sign being negative (since
  it returns negative on config key errors), but instead of putting the
  burden on the function it depends on, it could just return the
  negative value itself. And this "burden" is real, in that other
  callers have to worry about the negative value.

- For better or worse, we've already wired git_config_parse_key()
  directly to the exit values, e.g. if one peeks into
  builtin/config.c:cmd_config(), we'll see "return
  git_config_set_multivar_in_file_gently()", which in turn may return
  the value from git_config_parse_key(). (And as a result, we also try
  hard to separate the error values returnable by git_config_parse_key()
  vs the others returnable by git_config_set_multivar_in_file_gently().)

  I would strongly prefer if builtin/config.c took more responsibility
  for noticing config.c errors and sanitizing them accordingly, but it
  seemed like too much churn for this series. If you think it is the
  right time for it, though (which I think it might be), I could try to
  make that change.

>> c) We want to move that into a separate library, and returning only
>>    negative values no longer makes as much sense.
>
> Quite the contrary, if it were a purely internal convention, we may
> not care too much, as we have only ourselves to confuse by picking
> an unusual convention, but if we are making more parts of our code
> available in a library-ish interface, I would expect they follow
> some convention, and that convention would be "0 for success,
> negative for failure".

Right. I do not care what the convention is, only that we pick one. I
chose the one that I saw in the surrounding code (positive), but I'm
happy to go with UNIXy (negative) if others think it is worth the churn.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

> Oh, interesting. I was trying to follow the conventions of the
> surrounding config.c code and many other parts of the codebase, which
> returns positive values. Why do we choose to return postive values
> throughout the codebase, by the way? Is it because they were really
> intended for exit(3), and not to be used as a library.

If config.c does that, I'd say that was poorly designed oddball.
Looking at read-cache.c (which is older parts of the codebase
written back when the developer base was smaller) may give you a
better examples to follow.  After all, error() returns negative
exactly because we want to follow the usual "negative is an error"
convention.

}
} else {
if (git_config_parse_key(key_, &key, NULL)) {
ret = CONFIG_INVALID_KEY;
if ((ret = git_config_parse_key(key_, &key, NULL))) {
goto free_strings;
}
}
Expand Down
4 changes: 1 addition & 3 deletions bundle-uri.c
Original file line number Diff line number Diff line change
Expand Up @@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri,
struct bundle_list *list)
{
int result;
struct config_options opts = {
.error_action = CONFIG_ERROR_ERROR,
};
struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);

if (!list->baseURI) {
struct strbuf baseURI = STRBUF_INIT;
Expand Down
Loading
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载