CMIS AND (conjunction) incorrectly finds no results in SOLR

cancel
Showing results for 
Search instead for 
Did you mean: 
jecbm
Active Member

CMIS AND (conjunction) incorrectly finds no results in SOLR

I'm doing a cmis query in SOLR in Search Services 1.2 for the conjunction of two predicates.

Property A is a text property and must match a regex with a fixed prefix followed by a wildcard.

Property B is a datetime property and must have a null value.

When I search for each of these in isolation, I can verify that there's nodes that match both.

IE.

SELECT * FROM xx:myNodeType WHERE xx.A LIKE 'ABC%'

and

SELECT * FROM xx:myNodeType WHERE xx.B IS NULL

intersects in found results.

Lets assume that the intersection contains a single node where property A is 'ABCD'.
If I replace the wildcard with an exact match instead of just the prefix and check that property B is null, I am also able to find the node.

I.e.

SELECT * FROM xx:myNodeType WHERE xx.A LIKE 'ABCD' and xx.B IS NULL

works.

However, when I use the same conjunction with the wildcard query, no results are found.

I.e.

SELECT * FROM xx:myNodeType WHERE xx.A LIKE 'ABC%' and xx.B IS NULL

There's no errors or anything in the log. The search simply returns an empty list.

Our model has evolved over time and we have around 10000 nodes where some were created before property B was added to the model. My best guess is that the problem is somehow related to this. I have deleted and recreated the SOLR indexes with no effect.

Any suggestions?

1 Reply
jecbm
Active Member

Re: CMIS AND (conjunction) incorrectly finds no results in SOLR

After more investigation it turns out that the LIKE is behaving weirdly. 

When I run

SELECT * FROM xx:myNodeType 

 returns 3875 results (note that I have adjusted the search query limits to allow for this).

When I manually count the number of nodes in the result where property A starts with the prefix 'ABC' I find that 3672 of those do.

When I then run

SELECT * FROM xx:myNodeType WHERE xx.A LIKE 'ABC%'

I only find 998 nodes (suspiciously close to the 1000 limit, no?). Where's the rest of the 3800 nodes that should have been included?

 

When I search for the negation

 

SELECT * FROM xx:myNodeType WHERE xx.A NOT LIKE 'ABC%'

I find nodes where property A begins with something else than ABC, which is to be expected. But I also find nodes that DO start with ABC, which is obviously not expected. I find 2674 nodes with this property. 

If I sum the number of nodes that match the LIKE for 'ABC%' and the negation of the like, I get the exact number of nodes that I expected to find with this property after manual counting of all the nodes.

2674 + 998 =  3672

So why does the prefix sometimes match the regex and sometimes don't? Could this be some encoding issue?