这是indexloc提供的服务,不要输入任何密码
Skip to content

feat(spanner/spansql): Add support for tokenlist and create search index #11522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rasviitanen
Copy link
Contributor

Adds support for parsing tokenlist and create search index in spanner/spansql.

Fixes #11466

@rasviitanen rasviitanen requested review from a team as code owners January 29, 2025 09:52
Copy link

google-cla bot commented Jan 29, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Jan 29, 2025
@rahul2393 rahul2393 requested a review from a team as a code owner January 29, 2025 17:05
@rasviitanen rasviitanen requested a review from jhsdickey January 30, 2025 12:28
@rasviitanen
Copy link
Contributor Author

Thanks for the review, I have now fixed your comments.
I expanded the tests a bit and also fixed some issues with ORDER BY and OPTIONS.

@neophenix
Copy link

Awesome thanks! I did not do an exhaustive review, these were just things I ran into when I pulled your branch and was running some local tests, but the changes all seem to work for me, so fingers crossed :)

@rasviitanen rasviitanen changed the title feat(spanner/spansql): Add support for parsing tokenlist and create search index feat(spanner/spansql): Add support for tokenlist and create search index Feb 12, 2025
@karunacybozu
Copy link

karunacybozu commented Mar 3, 2025

@rasviitanen it's not working if we combine it with other function.
For example I need NORMALIZE_AND_CASEFOLD() function to normalize Katakana for searching: https://cloud.google.com/spanner/docs/reference/standard-sql/string_functions#normalize_and_casefold

Example would be:

CREATE TABLE `Posts` (
    `TenantId` STRING(8) NOT NULL,
    `UserId` STRING(26) NOT NULL,
    `Id` STRING(26) NOT NULL,
    `Title` STRING(256) NOT NULL,
    `Body` STRING(1024) NOT NULL,
    `Optional` STRING(256),
    `Title_Tokens` TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
    `Body_Tokens` TOKENLIST AS (TOKENIZE_FULLTEXT(Body)) HIDDEN,
    `Combined_Tokens` TOKENLIST AS (TOKENLIST_CONCAT([Title_Tokens, Body_Tokens])) HIDDEN,
    `Title_SubStrTokens` TOKENLIST AS (TOKENIZE_SUBSTRING(Title)) HIDDEN,
    `Body_SubStrTokens` TOKENLIST AS (TOKENIZE_SUBSTRING(Body)) HIDDEN,
    `Combined_SubStrTokens` TOKENLIST AS (TOKENLIST_CONCAT([Title_SubStrTokens, Body_SubStrTokens])) HIDDEN,
    `Title_NFKC_SubStrTokens` TOKENLIST AS (TOKENIZE_SUBSTRING(NORMALIZE_AND_CASEFOLD(Title, NFKC))) HIDDEN,
    `Body_NFKC_SubStrTokens` TOKENLIST AS (TOKENIZE_SUBSTRING(NORMALIZE_AND_CASEFOLD(Body, NFKC))) HIDDEN,
    `Combined_NFKC_SubStrTokens` TOKENLIST AS (TOKENLIST_CONCAT([Title_NFKC_SubStrTokens, Body_NFKC_SubStrTokens])) HIDDEN,
) PRIMARY KEY (`TenantId`, `UserId`, `Id`)

It's working just fine in spanner studio.

But in Go code:

panic: 01-Posts.sql:14: got "(", want ")" or ","

@rasviitanen
Copy link
Contributor Author

rasviitanen commented Mar 5, 2025

@karunacybozu Thanks for your comments!
The issue here is not the parsing of the TOKENIZE_* function arugments in this PR. The parsing I added here actually supports nested functions and is correct (you can try with UPPER for example). The problem is rather that NORMALIZE_AND_CASEFOLD isn't recognized as a known string function. I would prefer to not add support for it in this PR, as it isn't required by TOKENLIST/TOKENIZE_/SEARCH INDEX.

I would suggest to submit a PR of your own to add additional string functions such as NORMALIZE and NORMALIZE_AND_CASEFOLD to spanner/spansql/keywords.go. That should be all that's necessary.

One could argue that the error message from the parser is a bit cryptic though, maybe we could improve it in a follow-up.
It looks like the lib has the same problem for all the other argument parsers, but I don't want to add any more to this already quite big PR. It's probably better to solve error handling for everything at once and in that case we are better off keeping the scope of this PR limited.
Hope that works!🙏

@rasviitanen
Copy link
Contributor Author

@rahul2393 It would be greatly appreciated if this could be reviewed, not sure who to ping 🙏

@rahul2393 rahul2393 added kokoro:force-run Add this label to force Kokoro to re-run the tests. automerge Merge the pull request once unit tests and other checks pass. labels Mar 6, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 6, 2025
@rasviitanen
Copy link
Contributor Author

rasviitanen commented Mar 6, 2025

@rahul2393 looks like my branch was outdated, had to merge main again. Could you readd kokoro:force-run? Sorry for the inconvenience.

@harshachinta harshachinta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 6, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 6, 2025
@gcf-merge-on-green gcf-merge-on-green bot merged commit cd894f8 into googleapis:main Mar 6, 2025
7 of 8 checks passed
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 6, 2025
codyoss added a commit to codyoss/google-cloud-go that referenced this pull request Mar 6, 2025
rahul2393 pushed a commit that referenced this pull request Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spanner: spansql.ParseDLL error by parsing text search features in cloudspanner
7 participants