Optimization of Alfresco DB indexes size

yantom · ‎2 Jun 2021

Hello,
During filling of DB for the purpose of performance tests I have realized that DB size is quite bigger then I would expect, then I have found that its the indexes which takes cca. 3x more spaces than the data itself.

By default, for every property, alfresco indexes all its type columns. Lets say I have a property of type string - its value is stored in string_value column and these: "boolean_value, double_value, float_value, long_value" are all set to NULL, but they take space in index anyway. I have optimized the indexes using the query bellow, indexing only those columns related to type of the property.

Is there any point of indexing those columns unrelated to the property type? Is my optimization OK or is there any risk? Thank you!

drop index idx_alf_nprop_d;
drop index idx_alf_nprop_s;
drop index idx_alf_nprop_f;
drop index idx_alf_nprop_b;
drop index idx_alf_nprop_l;

CREATE INDEX idx_alf_nprop_b_custom ON alf_node_properties (boolean_value) WHERE persisted_type_n = 1;
CREATE INDEX idx_alf_nprop_d_custom ON alf_node_properties (double_value) WHERE persisted_type_n = 5;
CREATE INDEX idx_alf_nprop_f_custom ON alf_node_properties (float_value) WHERE persisted_type_n = 4;
CREATE INDEX idx_alf_nprop_l_custom ON alf_node_properties (long_value) WHERE persisted_type_n = 3;
CREATE INDEX idx_alf_nprop_s_custom ON alf_node_properties (string_value) WHERE persisted_type_n = 6;

afaust · ‎2 Jun 2021

Your optimisation is likely to break performance for transactional metadata queries (TMQ) feature, and your optimised index variants will likely never be used, as very few queries select based on the persisted type. Since Alfresco technically allows property values to be NULL (which I hate and actively work against in custom code), having NULL values in the index may also be required for some specific queries, e.g. CMIS QL queries where explicit checks against NULL are possible via TMQ.

I can not recommend dropping those indexes at all. What I can recommend though is to create additional indices which filter on particular qname_id values to have separate sub-indices for specific, often used query properties, in order to reduce the size of the indices Alfresco regularly uses to execute queries (and thus improve the performance of these queries on average), while still having the full index available as a fallback when a query uses a freak property.

yantom · ‎9 Jun 2021

Thank you very much for a quick reply! I understand that checks against null values are needed, I just don't understand how can Alfresco determine the column on which the check for null (or any other comparison) should be done without using persisted_type column. So if I query `cm:creator:alan` it should somehow know that it should look for the value only in the string_value column, right?

afaust · ‎9 Jun 2021

By using Alfresco's internal data model which defines what value types specific properties have, the query can already be generated to query the appropriate column without first having to check the persisted type column. In effect, Alfresco uses the same logic here that was used to fill the value of the persisted type column in the first place. The main purpose of the persisted type column is to make the loading of properties resilient against potential type changes in the data model and support properties that have not been defined in the data model or for which the data model has actually been removed / dropped ("residual properties" / legacy data).