Strange results in CONTAINS() CMIS search with wildcards

Active Member

Hi, everyone

I have noticed some strange behaviour in CMIS queries that use CONTAINS() with wildcards.

For example, I have a folder named "Someco" and I want to find it by part of its name using wildcards:

SELECT * from cmis:folder WHERE CONTAINS('cmis:name:"*Some*"')

This query finds the folder, as expected. But if I add an 'o' or an 'a' to the search term, for example:

SELECT * from cmis:folder WHERE CONTAINS('cmis:name:"*Someo*"')

The modified query finds my folder too, which is incorrect, as my folder's name does not contain "Someo".

Is there any way to correct this behaviour through the query syntax?

Thanks.

3 Replies
Senior Member

Re: Strange results in CONTAINS() CMIS search with wildcards

Hi

I have tried to reproduce this and failed. Are you sure it is finding the same folder?

Andy

Active Member

Re: Strange results in CONTAINS() CMIS search with wildcards

Hi, Andy

Yes, I'm sure. It's an issue related to FTS word stemming, but I don't know whether I can deactivate it.

Thanks

Senior Member

Re: Strange results in CONTAINS() CMIS search with wildcards

Hi

It seems you are falling foul of localised stemming. The name of a document can be treated as an identifier by prefixing the field with =:

SELECT * from cmis:folder WHERE CONTAINS('=cmis:name:"*Some*"')

Or you could just use LIKE:

SELECT * from cmis:folder WHERE cmis:name LIKE '%Some%'

Name is indexed in three ways:

  • Localised with stemming
  • Split on white space and then into token parts (using WordDelimiter factory)
  • As a single token (an identifier)

The first two options are used together. You cannot split on white space and do a wildcard match on the tokens alone. You will always get recall from the locale bit (the first way).
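To make the third way (a single token) concrete, here is a toy model in Python. It is only an illustration of what a wildcard match against the whole name as one token looks like; it is not Alfresco's indexing code, and the function name `identifier_matches` is made up for this sketch. It uses shell-style matching from the standard library and is case-sensitive by choice, which may not reflect how the repository compares names.

```python
from fnmatch import fnmatchcase


def identifier_matches(name: str, pattern: str) -> bool:
    """Toy model: match a wildcard pattern against the whole name as one token.

    No stemming and no splitting into words is applied, so the pattern must
    match the literal characters of the name. '*' stands for any run of
    characters (note that fnmatch also treats '?' and '[...]' as special).
    """
    return fnmatchcase(name, pattern)


# With the folder from the question, only the first pattern matches.
print(identifier_matches("Someco", "*Some*"))   # True
print(identifier_matches("Someco", "*Someo*"))  # False
```

Under this model "*Someo*" cannot match "Someco", which is the result the original poster expected. The extra hit in the CONTAINS() query comes from the stemmed, localised form of the name rather than from the identifier form.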

It has been suggested before that we support better control here, and it is on the list.

From your example, I think you should be OK with either LIKE or the = prefix.
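If the name fragment comes from user input, the two query forms can be built with a small helper. The following is a minimal sketch in Python, not part of any CMIS client library: the function names are hypothetical, and the escaping assumes the backslash rules of the CMIS 1.1 query language as I understand them (\\, \', \% and \_ inside a LIKE pattern), so check it against your repository. The CONTAINS() variant deliberately accepts only alphanumeric fragments, because escaping inside the nested full-text expression is not covered in this thread.

```python
def escape_like_fragment(fragment: str) -> str:
    """Escape a literal fragment for use inside a CMIS LIKE pattern.

    Assumption: backslash escaping as described for CMIS 1.1.
    """
    # Escape the backslash first so later escapes are not doubled.
    escaped = fragment.replace("\\", "\\\\")
    escaped = escaped.replace("'", "\\'")
    # '%' and '_' are LIKE wildcards, so literal ones need a backslash.
    return escaped.replace("%", "\\%").replace("_", "\\_")


def folder_name_like_query(fragment: str) -> str:
    """Build the LIKE form: folder name contains the fragment."""
    pattern = "%" + escape_like_fragment(fragment) + "%"
    return "SELECT * from cmis:folder WHERE cmis:name LIKE '" + pattern + "'"


def folder_name_identifier_query(fragment: str) -> str:
    """Build the CONTAINS() form with the '=' prefix on cmis:name."""
    if not fragment.isalnum():
        # Hypothetical restriction for this sketch: avoid nested escaping.
        raise ValueError("only alphanumeric fragments are supported here")
    return (
        "SELECT * from cmis:folder WHERE "
        "CONTAINS('=cmis:name:\"*" + fragment + "*\"')"
    )


print(folder_name_like_query("Some"))
print(folder_name_identifier_query("Some"))
```

For the fragment "Some", the two calls produce the same statements as the LIKE and = queries quoted earlier in this reply.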

Andy