-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Hello,
I’ve been trying out Mango and Nouveau together, orienting myself along what’s in the docs and the tests, and have run into several problems. I’ve provided a full set of cURL commands to reproduce the issues, namely:
- Queries against
string
index fields only occasionally work $text
queries work the same as regular queries againststring
index fields: they’re keyword searches, not Lucene full-text searches"type": "text"
index fields never seem to work at all
This didn’t become obvious in testing because the tests seem to:
- Use the one of the two
string
index fields that actually works by sheer coincidence - Use a
$text
query that also worked by coincidence, not because the underlying mechanism works properly
I’ve basically done something similar to the Mango-related tests in this file and linked the relevant lines from the test file below.
Using CouchDB 3.4.2
with nouveau-1.0-SNAPSHOT.jar
. For all of these, _explain
showed that the correct index was used, unless I’ve stated otherwise.
Setup
Add this line to all requests in case you need auth:
--user COUCHDB_USERNAME:COUCHDB_PASSWORD \
First, make a fresh DB:
curl -X PUT \
'http://127.0.0.1:5984/mouveau' \
--header 'Accept: */*' \
Insert some docs:
curl -X POST \
'http://127.0.0.1:5984/mouveau/_bulk_docs' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '
{
"docs": [
{
"_id": "FishStew",
"servings": 4,
"subtitle": "Delicious with freshly baked bread",
"title": "Fish Stew",
"type": "fish"
},
{
"_id": "LambStew",
"servings": 6,
"subtitle": "Serve with a whole meal scone topping",
"title": "Lamb Stew",
"type": "meat"
},
{
"_id": "Dumplings",
"servings": 8,
"subtitle": "Hand-made dumplings make a great accompaniment",
"title": "Dumplings",
"type": "meat"
}
]
}'
Now make a Nouveau index. This is basically the same index as used in the tests but adapted to our documents' field names:
curl -X POST \
'http://127.0.0.1:5984/mouveau/_index' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '{
"type": "nouveau",
"index": {
"fields": [
{"name": "title", "type": "string"},
{"name": "servings", "type": "number"},
{"name": "type", "type": "string"}
],
"default_analyzer": "keyword"
}
}'
Let’s do some queries. First, querying for strings, analogous to the Mango search by string
test:
curl -X POST \
'http://127.0.0.1:5984/mouveau/_find' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '{
"selector": {
"type": "meat"
}
}'
That works.
(Only showing selector
lines for brevity from now on, the cURL command is always the same)
"selector": {
"title": "Lamb Stew"
}
This doesn’t work at all:
{
"error": "nouveau_search_error",
"reason": "bad_request: field \"title_3astring\" was indexed without position data; cannot run PhraseQuery (phrase=title_3astring:\"lamb stew\")",
"ref": 2009526893
}
Hm, maybe if we just search for a single word?
"selector": {
"title": "Dumplings"
}
Different nope, but also nope:
{
"docs": [],
"bookmark": "W10="
}
What about numbers? This is analogous to the Mango search by number
test:
"selector": {
"servings": {"$gte": 6}
}
That works.
The last test is a $text
query, same as the mango search by text
test. Now, interestingly, the doc example for this query type selector looks different than the one in the test:
// test: just a keyword
{"$text": "hello"}
// docs: this is Lucene full-text search syntax
{
"_id": { "$gt": null },
"$text": "director:George"
}
Also, remember: the test runs this $text
query against an index that does not have any "type":"text"
fields! And surprisigly, this works!
"selector": {
"$text": "lamb"
}
But this seems to be only a keyword search, and not a fully-fledged text
search, which becomes apparent because these all fail:
"selector": {
"$text": "lamb s"
}
"selector": {
"$text": "title:lamb"
}
"selector": {
"$text": "title:lam*"
}
"selector": {
"$text": "lam*"
}
And all permutations thereof.
Summary:
number
queries, as in the tests, work. Yay.string
queries, as in the tests, sometimes work. Couldn’t find a pattern yet.$text
queries against the index as used in the tests do not actually perform Lucene queries, but the same keyword-style searches as thestring
queries, but against all (string
?) index fields.
Workaround attempts:
- Change the order of the
string
queries to see what happens: no change, querying fortitle
still fails,type
still works. - Try different values for
default_analyzer
andanalyzer.default
, bothstandard
andenglish
do not change the results for$text
queries.
What about the text
index type?
Now, the test mango index and the example index in the docs are the same, and interestingly, the docs refer to this as a Text index
. However, this section also describes a text
index field type, just like in Nouveau, to go alongside string
and number
and boolean
index field types. This isn’t documented or tested anywhere, but I’d assume this to work like so (delete the old ddoc first):
curl -X POST \
'http://127.0.0.1:5984/mouveau/_index' \
--header 'Accept: */*' \
--header 'Content-Type: application/json' \
--data-raw '{
"type": "nouveau",
"index": {
"fields": [
{"name": "title", "type": "text"}
],
"default_analyzer": "english"
}
}'
But no query against this returns anything, no permutation of $text
works, and using something like "title": "Dumplings"
falls back to not using any index.
Expected results:
- Queries against
string
index fields always work. $text
queries use Lucene syntax? Unsure what the intention is here, the docs do one thing, the tests another.text
type indexes are queryable with a selector like"$text": "fieldname:querystring"
.