config-parse: create config parsing library #1551

gitgitgadget-git · 2023-07-21T00:34:03Z

On the Git mailing list, Jonathan Tan wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > Begin this process by splitting the config parsing code out of > config.[c|h] and into config-parse.[c|h]. I think we need to be more careful in how we split. It would be easier if there is a concrete use case, but some preliminary questions: - "respect_includes" is in the library, but callers might want to opt out of it or provide an alternative way to resolve includes. - There is a lot of error reporting capability with respect to the source of config, and not all sources are applicable to library users. How should we proceed? E.g. should we expect that all library users conform to the list here (e.g. even if the source is something like but not exactly STDIN, they should pick it), or allow users to customize sources? In the absence of more information, the split I would envision is either something that can only parse a buffer, its error messages being very generic (the caller should turn them into something more specific before showing them to the user) (but one problem here is that we must postprocess includes, which might be a problem if the output of parsing is a flat hashtable, since we wouldn't know which keys are overridden by the includes and which are not); or something that can take in a callback that is invoked whenever something is included and maybe also a callback for access to the object database and that has full knowledge of all sources for error reporting (or allows the caller to customize the sources).

On the Git mailing list, Glen Choo wrote (reply to this):

Jonathan Tan <jonathantanmy@google.com> writes: >> Begin this process by splitting the config parsing code out of >> config.[c|h] and into config-parse.[c|h]. > > I think we need to be more careful in how we split. It would be easier > if there is a concrete use case, but some preliminary questions: > > - "respect_includes" is in the library, but callers might want to opt > out of it or provide an alternative way to resolve includes. Makes sense. Or alternatively, we could choose not to support "respect_includes" initially, and exclude it to avoid confusion. > - There is a lot of error reporting capability with respect to the > source of config, and not all sources are applicable to library > users. How should we proceed? E.g. should we expect that all library > users conform to the list here (e.g. even if the source is something > like but not exactly STDIN, they should pick it), or allow users to > customize sources? Good point. I would also prefer to have the list of sources constrained to the list of sources available via the library. Some possibilities I can see are: 1. Move the Git-program-specific error message reporting to a level above the library (i.e. config.c). 2. Proceed as-is (with the additional sources in the library) and leave a FIXME to address this when we find a Git library-idiomatic way to handle errors. This won't be the last time we'll have to untangle Git-program-specific error reporting from the library - it might be useful to try to figure out all of that in one fell swoop. 3. Figure out library-idiomatic error handling mentioned in 2. right now. I think 1. is the best option, but if that fails, 2. is also reasonable. 3. is too difficult to do with a sample size of 1. > In the absence of more information, the split I would envision is > either something that can only parse a buffer, its error messages being > very generic (the caller should turn them into something more specific > before showing them to the user) (but one problem here is that we must > postprocess includes, which might be a problem if the output of parsing > is a flat hashtable, since we wouldn't know which keys are overridden > by the includes and which are not); Hm, how does the include mechanism here this differ from what's in this patch? This also only parses a single file and ignores includes. I'm not sure why this requires us to postprocess includes - in config.c, includes are handled by 'pausing' parsing of the current source, evaluating the included files, then 'resuming' parsing. > or something that can take in a > callback that is invoked whenever something is included and maybe also > a callback for access to the object database and that has full knowledge > of all sources for error reporting (or allows the caller to customize > the sources). Ah, I like this callback idea quite a lot. This lets config-parse.c easily support unconditional includes and provides entry points for program-specific behavior (like checking the odb). I will try this.

gitgitgadget-git · 2023-07-20T23:48:26Z

On the Git mailing list, Jonathan Tan wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Glen Choo <chooglen@google.com> > > git_config_parse_key() returns #define-d error codes, but negated. This > negation is merely a convenience to other parts of config.c that don't > bother inspecting the return value before passing it along. But: > > a) There's no good reason why those callers couldn't negate the value > themselves. > > b) In other callers, this value eventually gets fed to exit(3), and > those callers need to sanitize the negative value (and they sometimes > do so lossily, by overriding the return value with > CONFIG_INVALID_KEY). > > c) We want to move that into a separate library, and returning only > negative values no longer makes as much sense. I'm not sure if we ever concluded that functions returning errors should return positive integers, but in this case I think it makes sense. We can document what's returned as being the same as what's documented in the config manpage. The negative return was as early as when the function was first introduced in b09c53a3e3 (Sanity-check config variable names, 2011-01- 30), but there's no indication there as to why the author chose negative values. > Change git_config_parse_key() to return positive values instead, and > adjust callers accordingly. Callers that sanitize the negative sign for > exit(3) now pass the return value opaquely, fixing a bug where "git > config <key with no section or name>" results in a different exit code > depending on whether we are setting or getting config. Can you be more precise as to which bug is being fixed? (I think somewhere, a 1 is returned when it should be a 2.) > Callers that > wanted to pass along a negative value now negate the return value > themselves. OK. > diff --git a/builtin/config.c b/builtin/config.c > index 1c75cbc43df..8a2840f0a8c 100644 > --- a/builtin/config.c > +++ b/builtin/config.c > @@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags) > goto free_strings; > } > } else { > - if (git_config_parse_key(key_, &key, NULL)) { > - ret = CONFIG_INVALID_KEY; > + if ((ret = git_config_parse_key(key_, &key, NULL))) { > goto free_strings; > } > } Ah, here, the return value was sanitized in such a way that it lost information. The change makes sense. Besides the callers modified in this patch, there is another caller config_parse_pair() in config.c, but that just checks whether the return value is 0, so it remaining unmodified is fine. > diff --git a/config.h b/config.h > index 6332d749047..40966cb6828 100644 > --- a/config.h > +++ b/config.h > @@ -23,7 +23,7 @@ > > struct object_id; > > -/* git_config_parse_key() returns these negated: */ > +/* git_config_parse_key() returns these: */ > #define CONFIG_INVALID_KEY 1 > #define CONFIG_NO_SECTION_OR_NAME 2 Should these be turned into an enum? Also, it might be worth adding that these match the return values as documented in the manpage. > diff --git a/t/t1300-config.sh b/t/t1300-config.sh > index 387d336c91f..3202b0f8843 100755 > --- a/t/t1300-config.sh > +++ b/t/t1300-config.sh > @@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such > grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err > ' > > +# Exit codes > +test_expect_success '--get with bad key' ' Rather than put an "exit codes" title, maybe embed that in the test description. > + # Also exits with 1 if the value is not found I don't understand this comment - what's the difference between a bad key and a value not being found? And if there's a difference, could we test both? > + test_expect_code 1 git config --get "bad.name\n" 2>err && > + grep "error: invalid key" err && > + test_expect_code 2 git config --get "bad." 2>err && > + grep "error: key does not contain variable name" err > +' > + > +test_expect_success 'set with bad key' ' > + test_expect_code 1 git config "bad.name\n" var 2>err && > + grep "error: invalid key" err && > + test_expect_code 2 git config "bad." var 2>err && > + grep "error: key does not contain variable name" err > +' Makes sense. From a libification perspective, I'm not sure that using positive values to indicate error is an advantage over negative values, but it makes sense in this particular context to have the return values match the manpage exactly, since that is part of the benefit of this function. So I think this patch is worth getting in by itself.

gitgitgadget-git · 2023-07-21T04:36:35Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Glen Choo <chooglen@google.com> > > git_config_parse_key() returns #define-d error codes, but negated. This > negation is merely a convenience to other parts of config.c that don't > bother inspecting the return value before passing it along. But: > > a) There's no good reason why those callers couldn't negate the value > themselves. That is not a good reason to break from a widely adopted convention in UNIXy library functions to signal success with 0 and failure with negative values. The callers if they want to have a positive values can flip the polarity themselves, by the way. > > b) In other callers, this value eventually gets fed to exit(3), and > those callers need to sanitize the negative value (and they sometimes > do so lossily, by overriding the return value with > CONFIG_INVALID_KEY). There is no reason to think that each and every minute difference the direct callers of a library function may want to notice by different error return values needs to be propagated to the calling process via its exit value. It is perfectly fine and expected for the status values of the entire process is more coarse grained than individual library calls, the latter may convey not just "I failed" but "I failed why" to their callers, while the former may not want to say "I made a call to some library functions and got this error code", let alone "I called library function X and got error code Y". In other words, if your program does err = library_call_about_some_filesystem_operation(); if (err) exit(err); /* or exit(-err) */ err = library_call_about_some_database_operation(); if (err) exit(err); /* or exit(-err) */ err = library_call_about_some_parsing(); if (err) exit(err); /* or exit(-err) */ there is something wrong. The error codes from these different library functions share the same "integer" namespace without being segregated, and expecting the calling process to be able to tell what error we discovered by relaying the literal translation of low level library error code would not work. The exit() codes would need to be wider (i.e. not limited only to the possible errors from a single library function) and would be coarser (i.e. a filesystem operation may say "open failed due to permission error" or "open failed because there was no such file"; at the end-user level, it may be more appropriate to say "configuration file could not be read", regardless of the reason why the filesystem operation failed). err = library_call_about_some_filesystem_operation(); if (err) { error("cannot open the filesystem entity"); exit(ERR_FILESYSTEM); err = library_call_about_some_database_operation(); if (err) exit(ERR_DATABASE); err = library_call_about_some_parsing(); if (err) exit(ERR_PARSING); So, this is not a good reason, either. > c) We want to move that into a separate library, and returning only > negative values no longer makes as much sense. Quite the contrary, if it were a purely internal convention, we may not care too much, as we have only ourselves to confuse by picking an unusual convention, but if we are making more parts of our code available in a library-ish interface, I would expect they follow some convention, and that convention would be "0 for success, negative for failure". Thanks.

On the Git mailing list, Glen Choo wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes: >> From: Glen Choo <chooglen@google.com> >> >> git_config_parse_key() returns #define-d error codes, but negated. This >> negation is merely a convenience to other parts of config.c that don't >> bother inspecting the return value before passing it along. But: >> >> a) There's no good reason why those callers couldn't negate the value >> themselves. > > That is not a good reason to break from a widely adopted convention > in UNIXy library functions to signal success with 0 and failure with > negative values. The callers if they want to have a positive values > can flip the polarity themselves, by the way. Oh, interesting. I was trying to follow the conventions of the surrounding config.c code and many other parts of the codebase, which returns positive values. Why do we choose to return postive values throughout the codebase, by the way? Is it because they were really intended for exit(3), and not to be used as a library. >> >> b) In other callers, this value eventually gets fed to exit(3), and >> those callers need to sanitize the negative value (and they sometimes >> do so lossily, by overriding the return value with >> CONFIG_INVALID_KEY). > > There is no reason to think that each and every minute difference > the direct callers of a library function may want to notice by > different error return values needs to be propagated to the calling > process via its exit value. Yes, I fully agree. I didn't intend to be a statement on how things should be, but rather how they already are. The oddities in this case are: - No callers actually care about the sign of git_config_parse_key() since it can only return values of one sign. Only configset_find_element() benefits from the sign being negative (since it returns negative on config key errors), but instead of putting the burden on the function it depends on, it could just return the negative value itself. And this "burden" is real, in that other callers have to worry about the negative value. - For better or worse, we've already wired git_config_parse_key() directly to the exit values, e.g. if one peeks into builtin/config.c:cmd_config(), we'll see "return git_config_set_multivar_in_file_gently()", which in turn may return the value from git_config_parse_key(). (And as a result, we also try hard to separate the error values returnable by git_config_parse_key() vs the others returnable by git_config_set_multivar_in_file_gently().) I would strongly prefer if builtin/config.c took more responsibility for noticing config.c errors and sanitizing them accordingly, but it seemed like too much churn for this series. If you think it is the right time for it, though (which I think it might be), I could try to make that change. >> c) We want to move that into a separate library, and returning only >> negative values no longer makes as much sense. > > Quite the contrary, if it were a purely internal convention, we may > not care too much, as we have only ourselves to confuse by picking > an unusual convention, but if we are making more parts of our code > available in a library-ish interface, I would expect they follow > some convention, and that convention would be "0 for success, > negative for failure". Right. I do not care what the convention is, only that we pick one. I chose the one that I saw in the surrounding code (positive), but I'm happy to go with UNIXy (negative) if others think it is worth the churn.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes: > Oh, interesting. I was trying to follow the conventions of the > surrounding config.c code and many other parts of the codebase, which > returns positive values. Why do we choose to return postive values > throughout the codebase, by the way? Is it because they were really > intended for exit(3), and not to be used as a library. If config.c does that, I'd say that was poorly designed oddball. Looking at read-cache.c (which is older parts of the codebase written back when the developer base was smaller) may give you a better examples to follow. After all, error() returns negative exactly because we want to follow the usual "negative is an error" convention.

-Original file line number
+Diff line change
@@ Expand Up / @@ -992,6 +992,7 @@ LIB_OBJS += compat/obstack.o @@
     LIB_OBJS += compat/terminal.o
     LIB_OBJS += compat/zlib-uncompress2.o
     LIB_OBJS += config.o
+    LIB_OBJS += config-parse.o
     LIB_OBJS += connect.o
     LIB_OBJS += connected.o
     LIB_OBJS += convert.o
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -42,7 +42,9 @@ static int actions, type; @@
     static char *default_value;
     static int end_nul;
     static int respect_includes_opt = -1;
-    static struct config_options config_options;
+    static struct config_options config_options = {
+    	.parse_options = CP_OPTS_INIT(CONFIG_ERROR_DIE)
+    };
     static int show_origin;
     static int show_scope;
     static int fixed_value;
@@ Expand Down Expand Up @@
     			goto free_strings;
     		}
     	} else {
-    		if (git_config_parse_key(key_, &key, NULL)) {
-    			ret = CONFIG_INVALID_KEY;
+    		if ((ret = git_config_parse_key(key_, &key, NULL))) {
     			goto free_strings;
     		}
     	}
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -237,9 +237,7 @@ int bundle_uri_parse_config_format(const char *uri, @@
     				   struct bundle_list *list)
     {
     	int result;
-    	struct config_options opts = {
-    		.error_action = CONFIG_ERROR_ERROR,
-    	};
+    	struct config_parse_options opts = CP_OPTS_INIT(CONFIG_ERROR_ERROR);
     	if (!list->baseURI) {
     		struct strbuf baseURI = STRBUF_INIT;
@@ Expand Down @@

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

config-parse: create config parsing library #1551

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget-git bot Jul 21, 2023

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Uh oh!

gitgitgadget-git bot Jul 20, 2023

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Uh oh!

Uh oh!

config-parse: create config parsing library #1551

Are you sure you want to change the base?

Uh oh!

config-parse: create config parsing library #1551

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

gitgitgadget-git bot Jul 21, 2023

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Jul 20, 2023

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Jul 21, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!