Recently, my data on Alfresco has grown (I'm using. alfresco 5 with solr 6 and MySQL 5.6 - using CMIS with java).I've plan to improve system in 2 phase:
In the first phase, i've try partition alf_node by range as below:
CREATE TABLE `alf_node` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`store_id` bigint(20) NOT NULL,
`uuid` varchar(36) NOT NULL,
`transaction_id` bigint(20) NOT NULL,
`type_qname_id` bigint(20) NOT NULL,
`locale_id` bigint(20) NOT NULL,
`acl_id` bigint(20) DEFAULT NULL,
`audit_creator` varchar(255) DEFAULT NULL,
`audit_created` varchar(30) DEFAULT NULL,
`audit_modifier` varchar(255) DEFAULT NULL,
`audit_modified` varchar(30) DEFAULT NULL,
`audit_accessed` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `store_id` (`store_id`,`uuid`),
KEY `idx_alf_node_txn_type` (`transaction_id`,`type_qname_id`),
KEY `fk_alf_node_acl` (`acl_id`),
KEY `fk_alf_node_store` (`store_id`),
KEY `idx_alf_node_tqn` (`type_qname_id`,`store_id`,`id`),
KEY `fk_alf_node_loc` (`locale_id`),
KEY `idx_alf_node_mdq` (`store_id`,`type_qname_id`),
KEY `idx_alf_node_cor` (`audit_creator`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_crd` (`audit_created`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_mor` (`audit_modifier`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_mod` (`audit_modified`,`store_id`,`type_qname_id`)
PARTITION BY RANGE ( YEAR(audit_created) ) (
PARTITION empty VALUES LESS THAN ( 0 ),
PARTITION p2018 VALUES LESS THAN ( YEAR('2019-01-01 00:00:00') ),
PARTITION p2019 VALUES LESS THAN ( YEAR('2020-01-01 00:00:00') ),
PARTITION other VALUES LESS THAN maxvalue
But when i try to execute statement, the system show error: 1486. Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed
I've try to check data on audit_created I've found that some row this colum (audit_created) is null, others contain Time zone in data (ex: 2018-10-01T11:55:56.312Z).
So I have to question:
1- why some row have audit_created is null
2- Could you give some advise about how to partition alf_node on audit_created?
Some tpes of nodes in Alfresco are simply not marked with the cm:auditable aspect (by default) and as such they do not get any timestamp data in those colums. Person and group nodes are some examples of such nodes.
Unfortunately you did not tell us to what dimensions your data has grown,or what type of performance issues you are facing, that made you think partitioning is the best approach to take for improvements. Partitioning of Alfresco database should be done with a thorough understanding of the consequences (i.e. likely limitations/issues in future upgrades) and how the data is both distributed and used by Alfresco (generally speaking and in the context of your specific use case).
Apart from splitting the data file for the alf_node table, I can find no benefit in partitioning by audit_created. On alf_node in particular - apart from the id column itself (i.e. using a modulo) - store_id and/or type_qname_id are the best candidates for partitioning (i.e. one partition for all dummy "deleted" nodes, one for documents, one for folders etc., or one partition for workspace and one for archive store). Generally speaking though, the alf_node_properties table is the single most impactful table to be looked at when partitioning is considered. Due to how CMIS / FTS database queries use this table, it can be highly effective to partition this table based on the qname_id and/or value patterns (i.e. have all NodeRef/uuid-like property values in one partition to avoid those from "poisoning the indizes/statistics for that column).
Dear Axel Faust,
Thanks for helpful anwser. After your help, I plan partition
But I've 2 question:
Once again, thanks for your helpful anwser.
Again, I just would like to nudge you into telling us what you mean by "big tables". Unless you have tens to hundreds of millions of documents, those tables should not be "big", and thus not be relevant for partitioning.
Anyway... Never drop foreign keys. If MySQL does not support partitions on foreign keys, then that's quite shitty. I have only ever partitioned Alfresco tables with PostgreSQL and can't remember any such limited support. I was not aware of that limitation in MySQL / MariaDB, and would advise not to partition in this case. Various operations in Alfresco rely on foreign keys and cascading deletions.
With regards to alf_child_assoc could be partitioned by parent_node_id + type_qname_id, alf_node_aspects on node_id (some modulo / hash), alf_node_assoc by source_node_id + type_qname_id, alf_content_data + _url by simple ID (modulo / hash), and alf_transaction on commit_time (potentially adding new partitions every so often.
Dear Axel Faust,
after about 6 month using, my data as:
Because i plan using alfresco in long term, so I think partition table is necessary. But limit about foreign key in MySQL make me really neverious .
Once again, thanks for your support.
Hmm... those numbers seem odd to me in multiple ways:
Generally speaking though, the numbers are still relatively low. With Enterprise customers I have seen systems 10-times that size (with correctly working cleanup / data reduction processes) that still run perfectly fine in a non-partitioned setup.
As I have mentioned (implicitly) before, partitioning is an invasive process that can leave you with a lot of technical issues in future upgrades, as Alfresco does not include any ugprade scripts capable of handling (paritally/custom) partitioned databases. Most (95+%) of customers / users never bother with it, and for all databases there are typically other optimisation methods that should be applied before partitioning becomes relevant. For MySQL / MariaDB this typically involves switching to the Barracuda file format and using table/index compression.
Ok - that clears up the question about the alf_activity_XX tables. They are of a size that seems reasonable, even quite low. alf_node + alf_node_aspects / alf_node_assoc also are in a clearly more healthy state than initially assumed (still, 1.4 million alf_node_assoc for 7 million nodes is a bit much from my experience, but your business model may just be designed that way).
I am surprised that there are only 5.6 million alf_child_assoc for 7 million nodes - that means that you likely have 1.4 (or even more) deleted nodes in the alf_node table that have not yet been cleaned up (node cleanup job runs each day at 9pm by default and cleans up nodes deleted more than 30 days ago).