这是indexloc提供的服务,不要输入任何密码
Skip to content

Mango+Nouveau queries and Indexes don’t behave as expected #5483

@espy

Description

@espy

Hello,

I’ve been trying out Mango and Nouveau together, orienting myself along what’s in the docs and the tests, and have run into several problems. I’ve provided a full set of cURL commands to reproduce the issues, namely:

  • Queries against string index fields only occasionally work
  • $text queries work the same as regular queries against string index fields: they’re keyword searches, not Lucene full-text searches
  • "type": "text" index fields never seem to work at all

This didn’t become obvious in testing because the tests seem to:

  • Use the one of the two string index fields that actually works by sheer coincidence
  • Use a $text query that also worked by coincidence, not because the underlying mechanism works properly

I’ve basically done something similar to the Mango-related tests in this file and linked the relevant lines from the test file below.

Using CouchDB 3.4.2 with nouveau-1.0-SNAPSHOT.jar. For all of these, _explain showed that the correct index was used, unless I’ve stated otherwise.

Setup

Add this line to all requests in case you need auth:

--user COUCHDB_USERNAME:COUCHDB_PASSWORD \

First, make a fresh DB:

curl  -X PUT \
  'http://127.0.0.1:5984/mouveau' \
  --header 'Accept: */*' \

Insert some docs:

curl  -X POST \
  'http://127.0.0.1:5984/mouveau/_bulk_docs' \
  --header 'Accept: */*' \
  --header 'Content-Type: application/json' \
  --data-raw '
{
    "docs": [
        {
            "_id": "FishStew",
            "servings": 4,
            "subtitle": "Delicious with freshly baked bread",
            "title": "Fish Stew",
            "type": "fish"
        },
        {
            "_id": "LambStew",
            "servings": 6,
            "subtitle": "Serve with a whole meal scone topping",
            "title": "Lamb Stew",
            "type": "meat"
        },
        {
            "_id": "Dumplings",
            "servings": 8,
            "subtitle": "Hand-made dumplings make a great accompaniment",
            "title": "Dumplings",
            "type": "meat"
        }
    ]
}'

Now make a Nouveau index. This is basically the same index as used in the tests but adapted to our documents' field names:

curl  -X POST \
  'http://127.0.0.1:5984/mouveau/_index' \
  --header 'Accept: */*' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "type": "nouveau",
    "index": {
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "servings", "type": "number"},
            {"name": "type", "type": "string"}
        ],
        "default_analyzer": "keyword"
    }
}'

Let’s do some queries. First, querying for strings, analogous to the Mango search by string test:

curl  -X POST \
  'http://127.0.0.1:5984/mouveau/_find' \
  --header 'Accept: */*' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "selector": {
      "type": "meat"
    }
}'

That works.

(Only showing selector lines for brevity from now on, the cURL command is always the same)

"selector": {
  "title": "Lamb Stew"
}

This doesn’t work at all:

{
  "error": "nouveau_search_error",
  "reason": "bad_request: field \"title_3astring\" was indexed without position data; cannot run PhraseQuery (phrase=title_3astring:\"lamb stew\")",
  "ref": 2009526893
}

Hm, maybe if we just search for a single word?

"selector": {
  "title": "Dumplings"
}

Different nope, but also nope:

{
  "docs": [],
  "bookmark": "W10="
}

What about numbers? This is analogous to the Mango search by number test:

"selector": {
  "servings": {"$gte": 6}
}

That works.

The last test is a $text query, same as the mango search by text test. Now, interestingly, the doc example for this query type selector looks different than the one in the test:

// test: just a keyword
{"$text": "hello"}

// docs: this is Lucene full-text search syntax
{
  "_id": { "$gt": null },
  "$text": "director:George"
}

Also, remember: the test runs this $text query against an index that does not have any "type":"text" fields! And surprisigly, this works!

"selector": {
  "$text": "lamb"
}

But this seems to be only a keyword search, and not a fully-fledged text search, which becomes apparent because these all fail:

"selector": {
  "$text": "lamb s"
}
"selector": {
  "$text": "title:lamb"
}
"selector": {
  "$text": "title:lam*"
}
"selector": {
  "$text": "lam*"
}

And all permutations thereof.

Summary:

  • number queries, as in the tests, work. Yay.
  • string queries, as in the tests, sometimes work. Couldn’t find a pattern yet.
  • $text queries against the index as used in the tests do not actually perform Lucene queries, but the same keyword-style searches as the string queries, but against all (string?) index fields.

Workaround attempts:

  • Change the order of the string queries to see what happens: no change, querying for title still fails, type still works.
  • Try different values for default_analyzer and analyzer.default, both standard and english do not change the results for $text queries.

What about the text index type?

Now, the test mango index and the example index in the docs are the same, and interestingly, the docs refer to this as a Text index. However, this section also describes a text index field type, just like in Nouveau, to go alongside string and number and boolean index field types. This isn’t documented or tested anywhere, but I’d assume this to work like so (delete the old ddoc first):

curl  -X POST \
  'http://127.0.0.1:5984/mouveau/_index' \
  --header 'Accept: */*' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "type": "nouveau",
    "index": {
        "fields": [
            {"name": "title", "type": "text"}
        ],
        "default_analyzer": "english"
    }
}'

But no query against this returns anything, no permutation of $text works, and using something like "title": "Dumplings" falls back to not using any index.

Expected results:

  • Queries against string index fields always work.
  • $text queries use Lucene syntax? Unsure what the intention is here, the docs do one thing, the tests another.
  • text type indexes are queryable with a selector like "$text": "fieldname:querystring".

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions