这是indexloc提供的服务,不要输入任何密码
Skip to content

add uuid scalar type #148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hallettj
Copy link
Collaborator

UUIDs are stored in BSON using the binData type which we already support. But the binData type is not especially user friendly to work with in its generalized form, and UUIDs are a common use case. It is helpful to surface UUIDs as a first-class scalar type that serializes to a string.

The binData type has a numeric subType field to give a hint as to how the stored binary data should be interpreted. There are two subtypes for UUIDs, 3 & 4, but subtype 4 is indicated as the "current" representation for UUIDs. The UUID() constructor in mongosh produces binData values with subtype 4.

This change:

  • adds a scalar type called UUID
  • sets the NDC type representation for UUID to String
  • converts UUID values to BSON as binData values with subtype 4 - inputs are parsed from string representations using functions provided by the BSON library that is bundled with the MongoDB Rust driver
  • serializes UUID values to JSON as strings using provided BSON library functions - for example, 4ca4b7e7-6f6a-445b-b142-9d6d252d92bc
  • updates introspection to infer the type UUID when encountering fields with binData values with subtype 4
  • introspection infers the type BinData instead for fields where subtype 4 occurs alongside binData values with other subtypes

This allows for more user-friendly queries involving UUIDs. For example filtering by UUID changed from this:

id: { _eq: { base64: "TKS3529qRFuxQp1tJS2SvA==" subType: "04" } }

and now looks like this:

id: { _eq: "4ca4b7e7-6f6a-445b-b142-9d6d252d92bc" }

Comment on lines -205 to -207
pub fn is_supertype(a: &BsonScalarType, b: &BsonScalarType) -> bool {
matches!((a, b), (Double, Int))
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to a method on BsonScalarType

S::Date => convert_date(&from_string(expected_type, value)?)?,
S::Timestamp => deserialize::<json_formats::Timestamp>(expected_type, value)?.into(),
S::BinData => deserialize::<json_formats::BinData>(expected_type, value)?.into(),
S::UUID => convert_uuid(&from_string(expected_type, value)?)?,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry about the churn in this function. This is the only added line. The logic in other lines is unchanged.

JsonSchema,
)]
#[serde(try_from = "BsonType", rename_all = "camelCase")]
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Sequence, JsonSchema)]
pub enum BsonScalarType {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We serialize BSON scalar types using the names in bson_name(). But so for this has been implicit - the serde serialization and the bson_name function didn't reference each other. The camelCase renaming just happened to match the strings in bson_name.

I changed this now because serde camel-cases UUID to uUID which I think is wrong. I replaced the derived Serialize and Deserialize implementations with hand-written implementations that call bson_name and from_bson_name.

I also removed the try_from = "BsonType" part. That has a hack for parsing collection validators. In the schema::Property::Scalar variant we declare the bson_type field as BsonScalarType, but actually it can be an array type. So we need to parse that field as a BsonType, and attempt to convert it to a scalar type. Instead of doing that for all BsonScalarType deserializations I moved that logic into the schema module.

Comment on lines +299 to +317
/// If there is a BSON scalar type that encompasses both a and b, return it. This does not
/// require a and to overlap. The returned type may be equal to a or b if one is a supertype of
/// the other.
pub fn common_supertype(a: BsonScalarType, b: BsonScalarType) -> Option<BsonScalarType> {
fn helper(a: BsonScalarType, b: BsonScalarType) -> Option<BsonScalarType> {
if a == b {
Some(a)
} else if a.is_binary() && b.is_binary() {
Some(S::BinData)
} else {
match (a, b) {
(S::Double, S::Int) => Some(S::Double),
_ => None,
}
}
}
helper(a, b).or_else(|| helper(b, a))
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common_supertype is more general than is_supertype: it can provide a common ancestor type given two types that don't overlap, or don't fully overlap with each other. This is a bit of future-proofing in case we add more binary subtype scalar types. For example if a field is found to have UUID and also Vector values we want to unify those to BinData.

}
}

/// True iff we consider a to be a supertype of b.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit typo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a typo - "iff" is short for "if and only if"

@hallettj hallettj merged commit 8dfdc47 into v1 Mar 1, 2025
1 check passed
@hallettj hallettj deleted the jessehallett/eng-1604-mongodb-comparisons-for-uuids-serialized-as-strings branch March 1, 2025 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants