Questions about search and database index

jeffreyman
Active Member II


We are migrating from an old system to ACS (Community 6.2). We have a custom content model with 20 data types, all of which are indexed (index enabled="true" in the content model). We found that the database is very large. Here is a summary.

- 5.1 million files (TIFF and PDF format) in content store (target to import >30 million files)

- 15 million nodes in ALF_NODE table

- 24 GB on all tables in DB

- 139 GB on all indexes in DB

Here are the questions.

1. As far as I know, the default search mechanism looks up the database first and Solr later. Is it possible to use Solr only?

2. We use CMIS QL to search documents and only do metadata searches; we do not need full-text search. Can Solr do metadata search using CMIS QL?

3. Is it possible to stop building indexes in the database during bulk import?

4. We know how to rebuild the indexes in Solr. How do we rebuild the indexes in the database?

5. In the content model, can we change some data types from indexed to non-indexed (i.e. make them non-searchable) once the content model is activated and documents have already been imported into ACS?

6. Why is the number of nodes in the database different from the number of files in the content store?

7. Why are the indexes in the database so big?

8. Is there a formula for sizing the database for more than 30 million documents?

Thanks.

 

 


6 Replies
angelborroy
Alfresco Employee

Re: Questions about search and database index


To answer some of your questions, take a look at:

https://hub.alfresco.com/t5/alfresco-content-services-blog/transactional-metadata-query-tmdq/bc-p/28...

You will also find valuable information in the comments on this blog post.

Software Engineer in Alfresco Search Team.
sufo
Established Member II

Re: Questions about search and database index


1. You have to use EVENTUAL consistency for the search:

In alfresco-global.properties:
solr.query.cmis.queryConsistency=EVENTUAL
solr.query.fts.queryConsistency=EVENTUAL

In code:
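import org.alfresco.service.cmr.search.QueryConsistency;
import org.alfresco.service.cmr.search.SearchParameters;
final SearchParameters sp = new SearchParameters();
sp.setQueryConsistency(QueryConsistency.EVENTUAL);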

 

2. Yes, it works. We use CMIS Workbench for testing CMIS queries: https://chemistry.apache.org/java/download.html

3. We dropped the following indexes from the database to save space and do not use them (this means you have to use eventual consistency in all searches).

-- alf_node:
drop INDEX idx_alf_node_mdq on alf_node;
drop INDEX idx_alf_node_cor on alf_node;
drop INDEX idx_alf_node_crd on alf_node;
drop INDEX idx_alf_node_mor on alf_node;
drop INDEX idx_alf_node_mod on alf_node;
 
-- alf_node_properties:
drop INDEX idx_alf_nprop_b on alf_node_properties;
drop INDEX idx_alf_nprop_d on alf_node_properties;
drop INDEX idx_alf_nprop_f on alf_node_properties;
drop INDEX idx_alf_nprop_l on alf_node_properties;
drop INDEX idx_alf_nprop_s on alf_node_properties;

-- alf_content_url:
drop INDEX idx_alf_conturl_sz on alf_content_url;

It should not be a problem to recreate the indexes after the import if you want to use them. Be aware that in the DB all properties are indexed, regardless of what you set in your custom model.

 

4. That depends on the database you are using.

5. I have not tried it. In theory, you could change it to

<index enabled="false">

and do a full rebuild of the Solr indexes (be sure to also delete the contents of alfrescoModels).

6. The database also has nodes that represent folders and other objects, whereas the contentStore directory stores only content.

7. In the DB all properties are indexed, regardless of what you set in your custom model.

8. It depends heavily on your custom model: how many properties are defined and what types they are. All the values are indexed.

For the number of documents you plan to migrate, I would do it in two steps: run a test migration first and see what you want to change for the production migration. The test run also lets you estimate the space needed for the DB and Solr indexes. Using https://github.com/pmonks/alfresco-bulk-import/wiki or the fork https://github.com/deas/alfresco-bulk-import/tree/alfresco-6-jar you can easily migrate about 3 million documents per day (this depends heavily on the old system and your hardware).
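
As a rough starting point for that estimate, one could scale the measured sizes linearly with the document count. The sketch below does this with the figures from the question (5.1 million files, 24 GB of tables, 139 GB of indexes, a target of 30 million files). Linear growth is an assumption, not a formula from Alfresco, and the class and method names are illustrative only.

```java
import java.util.Locale;

final class DbSizeEstimate {

    /** Scales a measured size linearly from the sample document count to the target count. */
    static double extrapolateGb(double measuredGb, long sampleDocs, long targetDocs) {
        if (sampleDocs <= 0) {
            throw new IllegalArgumentException("sampleDocs must be positive");
        }
        return measuredGb * ((double) targetDocs / sampleDocs);
    }

    public static void main(String[] args) {
        // Figures taken from the original question; replace them with your own test migration.
        long sampleDocs = 5_100_000L;
        long targetDocs = 30_000_000L;

        double tablesGb = extrapolateGb(24.0, sampleDocs, targetDocs);
        double indexesGb = extrapolateGb(139.0, sampleDocs, targetDocs);

        System.out.printf(Locale.ROOT, "tables ~%.1f GB, indexes ~%.1f GB%n", tablesGb, indexesGb);
    }
}
```

Treat the result as a lower bound for planning: the ratio of nodes to files and the mix of property types in the custom model can shift it considerably, which is why a test migration with representative data matters.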

You should also consider switching to Search Services 2.0 (https://www.alfresco.com/events/webinars/tech-talk-live-123-discovering-search-services-2 and https://hub.alfresco.com/t5/alfresco-content-services-blog/search-services-2-0-0-release/ba-p/301308 ) because of one great change: solr.content.dir has been removed in Search Services 2.0.
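
On point 2, a metadata-only CMIS QL search is just a query string handed to the CMIS client. The sketch below builds such a statement in plain Java. The type acme:invoice and the property acme:invoiceNumber are hypothetical stand-ins for your custom model, and the class and method names are made up for illustration. With Apache Chemistry OpenCMIS the resulting string would typically be passed to Session.query(statement, false); whether it is then answered by the database or by Solr depends on the queryConsistency settings from point 1.

```java
final class CmisMetadataQuery {

    /** Escapes a value for use inside a single-quoted CMIS QL string literal. */
    static String escapeLiteral(String value) {
        // CMIS QL escapes a backslash and a single quote with a leading backslash.
        return value.replace("\\", "\\\\").replace("'", "\\'");
    }

    /** Builds an equality query on one property of a (hypothetical) custom type. */
    static String byProperty(String typeId, String propertyId, String value) {
        return "SELECT cmis:objectId, cmis:name FROM " + typeId
                + " WHERE " + propertyId + " = '" + escapeLiteral(value) + "'";
    }

    public static void main(String[] args) {
        // Hypothetical type and property names; substitute the ones from your content model.
        System.out.println(byProperty("acme:invoice", "acme:invoiceNumber", "INV-2020-0001"));
    }
}
```

Pasting the printed statement into CMIS Workbench is a quick way to check it before wiring it into the web app.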

 

jeffreyman
Active Member II

Re: Questions about search and database index


Hi sufo,

Thank you for your help. We still have a few questions to clarify.

1. Does the parameter "solr.query.cmis.queryConsistency=EVENTUAL" take effect in the Share app only, or system-wide?

We are developing a custom web app that uses the CMIS Java library (Apache Chemistry) to connect to ACS, so there is no way to get a SearchService object. How can we set it using the CMIS Java library?

5. As far as I know, "alfrescoModels" is downloaded from ACS; it is a temporary copy that is updated when the content model is updated. Is there any way to set the content model manually in ACS? I tried to set it using the admin tool, but it failed.

8. Is "solr.content.dir" a snapshot of the DB indexes for Solr search? If the folder is deleted, will it be downloaded again from the database automatically?

 

sufo
Established Member II

Re: Questions about search and database index


1. It's system-wide. You can set logging to DEBUG on the class org.alfresco.repo.search.impl.solr.DbOrIndexSwitchingQueryLanguage to see where the search is executed.

5. Yes, models are updated when they are changed or added, but you can make only "additive" changes. Do not try to change something that is already defined in the model. There are many ways to introduce a custom model, so I don't know what you have tried.

8. Yes, there are zip files containing the metadata to be indexed by Solr. I don't know whether text content is also stored there for indexing. If you want to delete the folder, follow the full-reindex steps and do it as part of that.
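
For point 1, DEBUG logging in ACS 6.x is normally switched on through log4j properties. A minimal sketch, assuming a log4j 1.x override file such as custom-log4j.properties on the extension classpath (the file name and its location are assumptions about your installation):

```properties
# Log whether each query is routed to the database or to Solr
log4j.logger.org.alfresco.repo.search.impl.solr.DbOrIndexSwitchingQueryLanguage=debug
```

Remove the line again after testing, since DEBUG output on every query adds noise to the log.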


jeffreyman
Active Member II

Re: Questions about search and database index


thanks a lot

EddieMay
Community Manager

Re: Questions about search and database index


Hi @jeffreyman 

Great that you got a solution, and thanks for accepting it; that is helpful to other users.

Cheers,

Digital Community Manager, Alfresco Software.