这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@abooij
Copy link
Contributor

@abooij abooij commented Jun 16, 2020

Description

#5092 reported a loss of precision when reading floats from the database. This was caused by odd default behavior by old versions of PostgreSQL.

The origin is that in PostgreSQL 11 and older, by default, floating point values are returned with a fixed decimal precision - in the case of float4, this is 6 digits. Using the option extra_float_digits (important: compare with the v12 docs, which seems to claim some changed behavior alongside a change in the default value of extra_float_digits!), we can output the float with full precision. However, by default it is set to 0, which in this case means loss of precision.

This PR sets extra_float_digits to 3 globally to avoid this problem. But this introduces some platform dependent behavior when printing floats, so we make sure to modify our CI test suite to be resilient against this, by picking smartly chosen values that test the precision of floats, while at the same time being independent of the PostgreSQL version.

Additionally, we ask pg_dump to output floats with adequate precision using extra_float_digits.

Changelog

  • CHANGELOG.md is updated with user-facing content relevant to this PR. If no changelog is required, then add the no-changelog-required label.

Affected components

  • Server
  • Tests

Related Issues

We've had a variety of number precision issues before, such as #4429 and #4435.

Solution and Design

Steps to test and verify

Limitations, known bugs & workarounds

Server checklist

Catalog upgrade

Does this PR change Hasura Catalog version?

  • No

Metadata

Does this PR add a new Metadata feature?

  • No

GraphQL

  • No new GraphQL schema is generated

Breaking changes

  • No Breaking changes

@abooij abooij added the c/server Related to server label Jun 16, 2020
@hasura-bot
Copy link
Contributor

Review app for commit 175fec3 deployed to Heroku: https://hge-ci-pull-5104.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5104-175fec3c

@hasura-bot
Copy link
Contributor

Review app for commit b81fa8e deployed to Heroku: https://hge-ci-pull-5104.herokuapp.com
Docker image for server: hasura/graphql-engine:pull5104-b81fa8ef

@abooij abooij marked this pull request as ready for review June 16, 2020 14:20
@abooij abooij requested a review from a team as a code owner June 16, 2020 14:20
@abooij abooij requested a review from rakeshkky June 16, 2020 14:21
Copy link
Contributor

@tirumaraiselvan tirumaraiselvan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changelog approved

@abooij
Copy link
Contributor Author

abooij commented Jun 16, 2020

To the reviewer: note that this switches pg-client-hs to one hosted by me until the PR for pg-client-hs gets merged. So that should probably happen first.

type: git
location: https://github.com/hasura/pg-client-hs.git
tag: 70a849d09bea9461e72c5a5bbde06df65aab61c0
location: https://github.com/abooij/pg-client-hs.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us merge your changes to pg-client-hs master and update the reference here to hasura. Please link to the PR in pg-client-hs repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Rakesh! I've linked to this PR above, and I have now also requested you as a reviewer for that PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abooij , I merged the PR in pg-client-hs repo. Please refer the new commit.
Thank you.

type: git
location: https://github.com/hasura/pg-client-hs.git
tag: 70a849d09bea9461e72c5a5bbde06df65aab61c0
location: https://github.com/abooij/pg-client-hs.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abooij , I merged the PR in pg-client-hs repo. Please refer the new commit.
Thank you.

@tirumaraiselvan
Copy link
Contributor

After reading the description and seeing that this is postgres version dependent, I tried this on PG 9.5:


create table test(f float4);
insert into test values (12311.1), (12311.12);
select * from test;
    f    
---------
 12311.1
 12311.1


set extra_float_digits=3;
select * from test;

     f      
------------
 12311.0996
 12311.1201
(2 rows)

I wonder if extra_float_digits should be a server flag rather than a fixed default as it could be breaking change for existing users.

@abooij
Copy link
Contributor Author

abooij commented Jun 19, 2020

@tirumaraiselvan (also re: your response on the other part of this PR)

I agree that this changes behavior. This is desirable: currently we output floats with inadequate precision, and this PR fixes that. So, I would rather refer to it as a fixing change than a breaking change.

Arguably, anyone currently relying on equality of floats is Doing It Wrong (tm) for the usual concerns about floats, but I can imagine there are use cases of that out there. For me, returning floats with adequate accuracy (i.e. not losing any data) weighs more heavily than avoiding breaking changes. And since Postgres 11 and older do not seem to have a mechanism to return exactly the right number of decimal digits (which is variable even within a single float datatype), set extra_float_digits = 3 seems unavoidable to me. (*)

Rather than making this individual setting into a flag, more generally we might have a kind of postgres_initialization field that allows running several such settings, after we run our defaults. The disadvantage is that we'd also have to plumb this through pg-client-hs, requiring another PR, and it introduces additional configuration complexity.

An alternative to this scheme is to simply recommend everyone who has so far depended on getting a specific number of digits, to simply round to that number of digits client-side from now on. We can add javascript instructions to the changelog, for instance. Would that be adequate?

(*) I'd like to add that it is beyond me how any widely used piece of software (and I'm referring to Postgres here) can have bugs, because that's what this is, in its default handling of floats. Floats are dangerous data types that are difficult to get right, and this is discussed whenever you encounter any literature on them. There is no excuse for loss of data by databases.

@tirumaraiselvan
Copy link
Contributor

tirumaraiselvan commented Jun 19, 2020

It would have been okay if we gave extra precision results. But as you can see from the results, it actually gives imprecise answers (PG fault, no doubt). That's why I am a bit concerned about making this change. Suppose someone uses our API to show something in the UI and it shows these extra digits (many of them supposedly wrong), then they have no way to change that behaviour.

If we have a setting which controls this, then we can put it on PG's behaviour. And user gets what PG gives.

@abooij
Copy link
Contributor Author

abooij commented Jun 19, 2020

It would have been okay if we gave extra precision results. But as you can see from the results, it actually gives imprecise answers (PG fault, no doubt). That's why I am a bit concerned about making this change. Suppose someone uses our API to show something in the UI and it shows these extra digits (many of them supposedly wrong), then they have no way to change that behaviour.

Sorry, I must have misunderstood your concern in my previous message. 12311.1 cannot be represented exactly in IEEE 754 floats, see e.g. this float4 calculator. The results you posted are correct. Saving 12311.1 as a float and then retrieving the saved value should result in what you obtained. Now, of course, if we then go and round the values you obtained to the original 1 digit (respectively 2 digits) after the decimal period, we get 12311.1 (respectively 12311.12), as desired.

So, yes, saving 12311.1 as a float4 in a database and then retrieving it will no longer result in 12311.1. This is avoided by using a "smart number of digits", which is available in Postgres 12. But the idea that you should "get back what you put in" is really a myth, since floats are not able to do that in general. (Only for specific values.)

@tirumaraiselvan
Copy link
Contributor

tirumaraiselvan commented Jun 19, 2020

I see what you mean. But wouldn't a non-opinionated solution (so we don't have to worry about what we are sending) be to just let this set by user?

E.g. there might be many PG systems in the wild with some runtime configuration. And if we hardcode a configuration, then there is really no way around it.

@abooij
Copy link
Contributor Author

abooij commented Jun 19, 2020

@tirumaraiselvan I agree that it would be beneficial for some users if we can support custom PG runtime configuration. Would you see that as a blocker for getting this merged?

@tirumaraiselvan
Copy link
Contributor

@abooij Hardcoding a PG runtime configuration is something we should never do considering the number of legacy and existing databases that are running with Hasura. Having sane defaults is something we cannot avoid. We can choose to change the default to extra_float_digits=3 for sure but we cannot have it non-configurable. So, yes I would say we should park this till we solve it more generally (and revert this: hasura/pg-client-hs#17) but I am happy to hear more voices on this matter.

@abooij
Copy link
Contributor Author

abooij commented Jun 22, 2020

@tirumaraiselvan I started thinking about how we may combine a safe default with user configurability. One issue here is that extra_float_digits can be set through the dabatase connection URL, as I documented here. This means that any parameters we set after connecting would overwrite user configuration, which I think is undesirable. But I'd also be quite hesitant to modify the user's connection URL to insert this parameter in the right place, so that the user can overwrite it.

So this is leading me to consider two possibilities:

  • If the user provides a database connection URL, we trust the user to do the right thing. If no extra_float_digits is set by the user, we don't set any value for this either. We add instructions to the manual saying that if a URL is used, the user should probably set extra_float_digits, especially if Postgres 11 or older is used.
  • If the user connects by setting host, port, user, pasword, etc, then we automatically set extra_float_digits to 3. If this is not what the user wants, then the user should connect using a database URL instead, so that this parameter can be set manually.

This makes things very configurable. But it also invites users to make mistakes when configuring database URLs. And we would still be setting extra_float_digits to 3 globally in some cases, which is not necessary for Postgres 12.

Any better configuration designs are very much welcome.

@tirumaraiselvan
Copy link
Contributor

Voila. I would say using the PG connection string to get this behaviour is an adequate solution to solve #5092. Also, we only support connecting to PG using connection strings (and not individual parameters).

If the user provides a database connection URL, we trust the user to do the right thing. If no extra_float_digits is set by the user, we don't set any value for this either. We add instructions to the manual saying that if a URL is used, the user should probably set extra_float_digits, especially if Postgres 11 or older is used.

Sounds about right to me!

@abooij
Copy link
Contributor Author

abooij commented Jun 22, 2020

Also, we only support connecting to PG using connection strings (and not individual parameters).

Is that right? The manual specifies how to connect using the individual host/user/password/db flags.

@tirumaraiselvan
Copy link
Contributor

Is that right? The manual specifies how to connect using the individual host/user/password/db flags.

Ah ok, my miss. I assumed this because ppl will use ENVs than override the CMD directive of our docker packages (because that will involve knowing the binary name, etc).

@abooij abooij closed this Jun 23, 2020
@hasura-bot
Copy link
Contributor

Review app https://hge-ci-pull-5104.herokuapp.com is deleted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c/server Related to server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants