Exact Term Queries in Search Services 2.0

tpage
Alfresco Employee

One of the smaller changes in Search Services 2.0 is the way the = operator behaves. Previously this operator resulted in an exact field match, but it has always been documented as an exact term match. For example, this is from the docs page for ACS 5.2:

To search for an exact term, prefix the term with "=".

In Search Services 2.0 we have fixed the behaviour to match the documentation.  This will not have an impact when querying for fields unless they are tokenised.

For example, here are some queries against the (tokenised) cm:name field, using a corpus of two documents:

[Image: Examples of using exact term queries against a corpus of two documents.]

When we query for "=Driver", we now return the document called "Taxi Driver"; previously, in Search Services 1.4.0, we would not get any results. When we query for "=Taxi =Driver", both documents are returned, since both have the term "Taxi" in the name (and the default operator is OR). Previously we would only get results where the name was exactly "Taxi" or exactly "Driver".
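The difference between the two behaviours can be sketched as a toy model in Java. This is not the Search Services implementation: it assumes a tokenised field is simply split on whitespace with no case folding or stemming, and the second document name "Taxi" is a hypothetical stand-in, since the post only names "Taxi Driver". The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the two "=" semantics; not the Search Services code.
class ExactMatchModel {

    // Assumption: a tokenised field is split on whitespace, with no case folding or stemming.
    static List<String> tokenise(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Search Services 1.4 behaviour: the whole field value must equal the term.
    static boolean exactFieldMatch(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    // Search Services 2.0 behaviour: the term must equal one of the field's tokens.
    static boolean exactTermMatch(String fieldValue, String term) {
        return tokenise(fieldValue).contains(term);
    }

    // "=A =B" with the default OR operator: a document matches if any clause matches.
    static List<String> search(List<String> names, List<String> terms, boolean termSemantics) {
        return names.stream()
                .filter(name -> terms.stream().anyMatch(t ->
                        termSemantics ? exactTermMatch(name, t) : exactFieldMatch(name, t)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical corpus: "Taxi Driver" is from the post, "Taxi" is an assumed second name.
        List<String> corpus = List.of("Taxi Driver", "Taxi");
        System.out.println(search(corpus, List.of("Driver"), false));         // []
        System.out.println(search(corpus, List.of("Driver"), true));          // [Taxi Driver]
        System.out.println(search(corpus, List.of("Taxi", "Driver"), false)); // [Taxi]
        System.out.println(search(corpus, List.of("Taxi", "Driver"), true));  // [Taxi Driver, Taxi]
    }
}
```

Under these assumptions, only tokenised values with more than one token behave differently; a single-token value such as "Taxi" matches "=Taxi" either way.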

Having discussed this at length within the team we realise that it can be quite hard to envisage the impact of this change in all situations, and we felt this was particularly complex when combined with phrase queries.  Here's a table showing various queries and document names with highlighting where the behaviour has changed in Search Services 2.0.

[Image: Changes in behaviour for various phrase and exact term queries.]

Full details of this fix can be found in SEARCH-2228.

As mentioned, this is one of the smaller changes that has gone into Search Services 2.0.0. A more detailed list of features is published here, and for a more in-depth tour you can register for Tech Talk Live #123: Discovering the "2" in Search Services 2.0.

5 Comments
afaust
Master

Hmm... I wonder whether this change causes more inconsistencies between TMQ (transactional metadata queries) and SOLR search. Can you elaborate on how this change would affect queries such as

=cm:name:Driver
=cm:name:"Taxi Driver"
=cm:name:Taxi =cm:name:Driver

when explicitly executed against SOLR? The behaviour of TMQ for those queries is exactly what 1.4 (and earlier versions) yielded, so if 2.0 now behaves differently, people may see sudden changes in search results when they use the default TRANSACTIONAL_IF_POSSIBLE and some other part of the query changes, transparently switching execution between the DB and SOLR.
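The concern can be illustrated with a heavily simplified toy model of TRANSACTIONAL_IF_POSSIBLE routing: the same exact clause is answered with whole-value semantics when the DB can run the query, and with token semantics when something else in the query forces SOLR. The routing rule used here (a PATH clause or a wildcard forces SOLR) is an assumption drawn from examples elsewhere in this thread; the real rules cover many more constructs. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified model of TRANSACTIONAL_IF_POSSIBLE routing; not Alfresco code.
class ConsistencyModel {

    // Assumption: only a PATH clause or a wildcard forces SOLR execution.
    static boolean runsOnSolr(String afts) {
        return afts.contains("PATH:") || afts.contains("*");
    }

    // Whether one exact clause (=field:term) matches one field value, depending on where the query runs.
    static boolean exactClauseMatches(String afts, String fieldValue, String term) {
        if (runsOnSolr(afts)) {
            // SOLR 2.0 on a tokenised field: the term must equal one token (whitespace split assumed).
            return Arrays.asList(fieldValue.trim().split("\\s+")).contains(term);
        }
        // DB (TMQ): the whole value must equal the term, as in 1.4 and earlier.
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String name = "Taxi Driver";
        // Same exact clause; adding an unrelated PATH clause changes the engine and the answer.
        System.out.println(exactClauseMatches("=cm:name:Driver", name, "Driver"));                  // false
        System.out.println(exactClauseMatches("PATH:\"//*\" AND =cm:name:Driver", name, "Driver")); // true
    }
}
```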

tpage
Alfresco Employee

This is a great point @afaust - thanks for raising it.

Using the corpus from the table above, when we call the v1 REST API then we see:

{
  "query": {
    "query": "=cm:name:Driver"
  }
}

is sent to the DB and returns 0 results. However, if we send:

{
  "query": {
    "query": "=cm:name:Driver AND cm:name:*"
  }
}

then we get 3 results (the same as sending the first query to Solr directly).
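For anyone reproducing these two requests, here is a minimal sketch of building the request bodies in Java. It only produces the JSON shape shown above ({"query": {"query": ...}}), which would then be POSTed to the v1 search endpoint (/alfresco/api/-default-/public/search/versions/1/search on a default installation). The helper names are hypothetical, the escaping is deliberately minimal (backslashes and double quotes only), and wrapping the original query in parentheses before appending the wildcard clause is an assumption made so that OR queries keep their grouping.

```java
// Hypothetical helper that builds the v1 Search API bodies used in the reply above.
class SearchBodyBuilder {

    // Minimal JSON string escaping: backslashes, then double quotes (assumes no control characters).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces {"query":{"query":"<afts>"}}, the shape shown in the reply.
    static String body(String afts) {
        return "{\"query\":{\"query\":" + jsonString(afts) + "}}";
    }

    // Appends the wildcard clause from the reply, which sent the query to SOLR in that test.
    // Parentheses keep the original query's own operators grouped.
    static String viaSolr(String afts) {
        return "(" + afts + ") AND cm:name:*";
    }

    public static void main(String[] args) {
        System.out.println(body("=cm:name:Driver"));
        System.out.println(body(viaSolr("=cm:name:Driver")));
        System.out.println(body("=cm:name:\"Taxi Driver\""));
    }
}
```

The third call shows why escaping matters: phrase queries such as =cm:name:"Taxi Driver" contain double quotes that must be escaped inside the JSON string.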

I've raised a JIRA ticket here to investigate this further: https://issues.alfresco.com/jira/browse/SEARCH-2461

Tim2
Partner

I am running into a problem with this change. When performing a search like

PATH:"//*" AND =cm:title:"test"

it returns matches as described in the post above: a document whose title is "some test title" is returned as a match. In my opinion this is faulty behaviour, since I am asking for results whose title is exactly "test". That is what = stands for in my opinion, and it is the behaviour you get with db-afts (without the PATH part). If I did not want an exact title match, I would have used this query:

PATH:"//*" AND cm:title:"test"

This change created differences in search behaviour between SOLR and the database, and it has made it impossible to do exact searches over SOLR. I think @afaust also raised this issue.

Is there some way I can turn this search behaviour off?


By the way, the links to the stories are broken. They might need to be updated?

Environment:

- Docker
- Alfresco Search Services 2.0.5
- Alfresco and Share 7.3.1
- PostgreSQL 14.8

amnas
Partner

Hi.

That's the case for me too. I am facing this problem with ASS 2.0.6. I also tried ASS 2.0.8.2 and the problem remains the same.

Is Hyland planning to publish a hotfix or a procedure to remedy this issue? It's a very big issue for many of my customers.

Regards.

Tim2
Partner

I found a fix that seems to correct this behaviour, on behalf of OPEN.satisfaction. All my test queries return the results I expect.

The fix is based on tag 2.0.5. In this file https://github.com/Alfresco/SearchServices/blob/becd3cf621841f388b6240b26cde40542a1f0790/search-serv... I changed the IDENTIFIER case from this:

    case IDENTIFIER:
        setLowercaseExpandedTerms(false);
        if (isExactTermSearch(analysisMode)) {
            // with exact search we favour tokenization, specifically cross locale tokenization
            addLocaleSpecificMLOrTextAttribute(pDef, queryText, subQueryBuilder, analysisMode, luceneFunction,
                    booleanQuery, locale, expandedFieldName, tokenisationMode, IndexTokenisationMode.TRUE);
        } else {
            addLocaleSpecificMLOrTextAttribute(pDef, queryText, subQueryBuilder, analysisMode, luceneFunction,
                    booleanQuery, locale, expandedFieldName, tokenisationMode, IndexTokenisationMode.FALSE);
        }
        break;

to this

    case IDENTIFIER:
        setLowercaseExpandedTerms(false);
        // always pass IndexTokenisationMode.FALSE, whether or not this is an exact term search
        addLocaleSpecificMLOrTextAttribute(pDef, queryText, subQueryBuilder, analysisMode, luceneFunction,
                booleanQuery, locale, expandedFieldName, tokenisationMode, IndexTokenisationMode.FALSE);
        break;

This seems to restore exact matching for =, while leaving the behaviour of queries like cm:name:"test" untouched.
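The effect of the patch can be reduced to which tokenisation mode the IDENTIFIER branch passes on. The sketch below is a simplified model, not the parser code: the enum is a local stand-in holding only the two values used in the snippets above, and the rule "TRUE compares the term with single tokens, FALSE compares it with the whole value" is an assumption read from the behaviour described in this thread. All names are hypothetical.

```java
// Simplified model of the IDENTIFIER branch before and after the patch; not the real parser.
class IdentifierPatchModel {

    // Local stand-in holding only the two values used in the snippets above.
    enum IndexTokenisationMode { TRUE, FALSE }

    // 2.0.5 as quoted: exact term searches use TRUE, everything else FALSE.
    static IndexTokenisationMode original(boolean exactTermSearch) {
        return exactTermSearch ? IndexTokenisationMode.TRUE : IndexTokenisationMode.FALSE;
    }

    // Patched: FALSE regardless of the analysis mode (parameter kept for symmetry).
    static IndexTokenisationMode patched(boolean exactTermSearch) {
        return IndexTokenisationMode.FALSE;
    }

    // Assumption for illustration: TRUE compares the term with single whitespace-separated tokens,
    // FALSE compares it with the whole (untokenised) value.
    static boolean exactMatch(IndexTokenisationMode mode, String value, String term) {
        if (mode == IndexTokenisationMode.TRUE) {
            return java.util.Arrays.asList(value.trim().split("\\s+")).contains(term);
        }
        return value.equals(term);
    }

    public static void main(String[] args) {
        // =cm:title:"test" against a document titled "some test title"
        System.out.println(exactMatch(original(true), "some test title", "test")); // true
        System.out.println(exactMatch(patched(true), "some test title", "test"));  // false
        // non-exact searches select the same mode before and after the patch
        System.out.println(original(false) == patched(false));                     // true
    }
}
```

In this model only the exact-term path changes, which matches the observation that queries like cm:name:"test" are unaffected.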

I created some automated tests to try queries over SOLR with the following types of parameters. Most had PATH clauses in them to make sure the query was executed by SOLR:

  • =property with an exact value
  • =property with a non-exact value (expecting no results)
  • Date ranges
  • Exact dates
  • Searches with wildcards
  • Searches with TYPE
  • Searches with ID
  • Searches with PARENT
  • Searches with AND and OR
  • Searches with ASPECT
  • Searches with EXISTS
  • Searches with ISNOTNULL
  • Searches with ISUNSET

All of these seem to return the expected results from SOLR.

After making the changes, I was able to build the library and replace alfresco-search-services/solr/server/solr-webapp/webapp/WEB-INF/lib/alfresco-search-2.0.5.jar with the new jar.