I am looking for the right documentation or steps to deal with a request we have.
We want to disable tokenization on special characters.
I tried searching this forum and the documentation but found no pointers on how to proceed. If anyone has done this or knows how to proceed, please guide me.
Hi @venur, I have not dealt with such scenarios, so I will have to check. @angelborroy may be able to provide some guidance. As mentioned by Angel, please share what exactly you want to achieve so we can try the scenario.
I found a couple of links on the internet, but I am not sure if they fit your requirement:
Thank you for responding @angelborroy
We are importing image and video files from a third-party DAM into the Alfresco repository. Several images and files have special characters in their names, and these are there on purpose for some business use cases.
Some examples of the special characters are below:
Solr tokenizes the names by default whenever a name contains these special characters, treating them as whitespace. I read in some documentation that this is the default behavior. But in our case we get a lot of search results if a user searches for one file name that shares a prefix/postfix with others.
For testing, I tried this to show you the results I am getting:
As you can see above, I get all the files that I don't need in the results.
I also tried with "" but the result remains the same.
Can you please guide us on how to change this default behavior?
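To illustrate the behaviour described in this post, here is a minimal Python sketch. It is not Solr's actual analyzer: it assumes a simplified tokenizer that treats every non-alphanumeric character as whitespace, and the file names and the `tokenize`/`matches` helpers are hypothetical, made up for illustration only.

```python
import re

def tokenize(name):
    # Simplified stand-in for an analyzer: split on anything that is not
    # a letter or digit, and lowercase the resulting tokens.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", name) if t]

def matches(query, name):
    # Simplification: a name matches when every query token is one of its tokens.
    return set(tokenize(query)) <= set(tokenize(name))

# Hypothetical file names sharing the same prefix.
names = ["ABC_123-front.jpg", "ABC_123-back.jpg", "ABC_999-front.jpg"]
hits = [n for n in names if matches("ABC_123", n)]
print(hits)
```

Under these assumptions, both `ABC_123-front.jpg` and `ABC_123-back.jpg` match the query `ABC_123`, because the special characters no longer distinguish the names once they are split into tokens.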
I guess you can't change that behaviour, since those are special Solr characters.
You may try escaping those characters in your search string:
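As a rough sketch of that escaping (not an Alfresco or Solr API), the helper below prefixes each character that the standard Lucene/Solr query parser treats as syntax with a backslash. The function name `escape_query_term` and the sample file name are hypothetical.

```python
# Characters with special meaning in the standard Lucene/Solr query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_term(term):
    # Put a backslash in front of every special character; leave the rest as-is.
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)

# Hypothetical file name used only for illustration.
escaped = escape_query_term("IMG-001(a).jpg")
print(escaped)
```

Note that escaping only stops the query parser from interpreting these characters as operators; whether the escaped term then matches still depends on how the field was analyzed at index time.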
Apart from that, I don't see any other alternative.
Hi @angelborroy, thanks for the response.
We also thought of this option, but we can't escape the characters now, right? The indexes have already been created by Solr, which bypassed the special characters and treated them all as whitespace. Based on what I have read so far, there won't be an index entry at all for a word that includes those special characters, e.g.:
Do you mean Solr would still have one index entry for the whole name, including the special characters I mentioned? Or am I misunderstanding something?
Thanks @angelborroy for the response. Yes, we know it's not possible by default, and that is what we are looking to extend.
We are aware of the default behavior and are looking for steps to change it, either from Solr or from Alfresco.
Your inputs or directions will be helpful