Need to help about partition MySQL

cancel
Showing results for 
Search instead for 
Did you mean: 
phong_van
Active Member II

Need to help about partition MySQL

Hi everyone,

Recently, my data on Alfresco has grown (I'm using. alfresco 5 with solr 6 and MySQL 5.6 - using CMIS with java).I've plan to improve system in 2 phase:

  • The first: partition for alf_node and alf_node_properties
  • The second for others big data (I found some table: alf_activity_feed, alf_activity_post, alf_child_assoc, alf_content_data, alf_content_url, alf_node_aspects, alf_node_assoc, alf_transaction have very large data).

In the first phase, i've try partition alf_node by range as below:

CREATE TABLE `alf_node` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`store_id` bigint(20) NOT NULL,
`uuid` varchar(36) NOT NULL,
`transaction_id` bigint(20) NOT NULL,
`type_qname_id` bigint(20) NOT NULL,
`locale_id` bigint(20) NOT NULL,
`acl_id` bigint(20) DEFAULT NULL,
`audit_creator` varchar(255) DEFAULT NULL,
`audit_created` varchar(30) DEFAULT NULL,
`audit_modifier` varchar(255) DEFAULT NULL,
`audit_modified` varchar(30) DEFAULT NULL,
`audit_accessed` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `store_id` (`store_id`,`uuid`),
KEY `idx_alf_node_txn_type` (`transaction_id`,`type_qname_id`),
KEY `fk_alf_node_acl` (`acl_id`),
KEY `fk_alf_node_store` (`store_id`),
KEY `idx_alf_node_tqn` (`type_qname_id`,`store_id`,`id`),
KEY `fk_alf_node_loc` (`locale_id`),
KEY `idx_alf_node_mdq` (`store_id`,`type_qname_id`),
KEY `idx_alf_node_cor` (`audit_creator`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_crd` (`audit_created`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_mor` (`audit_modifier`,`store_id`,`type_qname_id`),
KEY `idx_alf_node_mod` (`audit_modified`,`store_id`,`type_qname_id`)
)

PARTITION BY RANGE ( YEAR(audit_created) ) (

PARTITION empty VALUES LESS THAN ( 0 ),
PARTITION p2018 VALUES LESS THAN ( YEAR('2019-01-01 00:00:00') ),
PARTITION p2019 VALUES LESS THAN ( YEAR('2020-01-01 00:00:00') ),
PARTITION other VALUES LESS THAN maxvalue

)

But when i try to execute statement, the system show error: 1486. Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed

I've try to check data on audit_created I've found that some row this colum (audit_created) is null, others contain Time zone in data (ex: 2018-10-01T11:55:56.312Z). 

So I have to question:

1- why some row have audit_created is null

2- Could you give some advise about how to partition alf_node on audit_created?

Many thanks!

8 Replies
afaust
Master

Re: Need to help about partition MySQL

Some tpes of nodes in Alfresco are simply not marked with the cm:auditable aspect (by default) and as such they do not get any timestamp data in those colums. Person and group nodes are some examples of such nodes.

Unfortunately you did not tell us to what dimensions your data has grown,or what type of performance issues you are facing, that made you think partitioning is the best approach to take for improvements. Partitioning of Alfresco database should be done with a thorough understanding of the consequences (i.e. likely limitations/issues in future upgrades) and how the data is both distributed and used by Alfresco (generally speaking and in the context of your specific use case).

Apart from splitting the data file for the alf_node table, I can find no benefit in partitioning by audit_created. On alf_node in particular - apart from the id column itself (i.e. using a modulo) - store_id and/or type_qname_id are the best candidates for partitioning (i.e. one partition for all dummy "deleted" nodes, one for documents, one for folders etc., or one partition for workspace and one for archive store). Generally speaking though, the alf_node_properties table is the single most impactful table to be looked at when partitioning is considered. Due to how CMIS / FTS database queries use this table, it can be highly effective to partition this table based on the qname_id and/or value patterns (i.e. have all NodeRef/uuid-like property values in one partition to avoid those from "poisoning the indizes/statistics for that column).

phong_van
Active Member II

Re: Need to help about partition MySQL

Dear Axel Faust,

Thanks for helpful anwser.  After your help, I plan partition

  • alf_node based on store_id and type_qname_id
  • alf_node_properties based on qname_id. 

But I've 2 question:

  • The first: Because mysql (5.6) not support forein key with partition => so i have to drop all forein key in alf_node and alf_node_properties => So, if I do so, I don't know there risk in future or not?
  • The second: Other table alf_child_assoc, alf_content_data, alf_content_url, alf_node_aspects, alf_node_assoc, alf_transaction in my data base is big too, so please give me some advise.

Once again, thanks for your helpful anwser.

afaust
Master

Re: Need to help about partition MySQL

Again, I just would like to nudge you into telling us what you mean by "big tables". Unless you have tens to hundreds of millions of documents, those tables should not be "big", and thus not be relevant for partitioning.

Anyway... Never drop foreign keys. If MySQL does not support partitions on foreign keys, then that's quite shitty. I have only ever partitioned Alfresco tables with PostgreSQL and can't remember any such limited support. I was not aware of that limitation in MySQL / MariaDB, and would advise not to partition in this case. Various operations in Alfresco rely on foreign keys and cascading deletions.

With regards to alf_child_assoc could be partitioned by parent_node_id + type_qname_id, alf_node_aspects on node_id (some modulo / hash), alf_node_assoc by source_node_id + type_qname_id, alf_content_data + _url by simple ID (modulo / hash), and alf_transaction on commit_time (potentially adding new partitions every so often.

phong_van
Active Member II

Re: Need to help about partition MySQL

Dear Axel Faust,

after about 6 month using, my data as:

  • alf_activity_feed: about 13 M (Milion) rows
  • alf_activity_post: about 5M rows
  • alf_child_assoc: 7M rows
  • alf_content_data: 4.5 M rows
  • alf_content_url: 4.5M rows
  • alf_node: 10M rows
  • alf_node_aspects: 10M rows
  • alf_node_assoc: 13-14 M rows
  • alf_node_properties: 100 M rows
  • alf_transaction: 7M  rows

Because i plan using alfresco in long term, so I think partition table is necessary. But limit about foreign key in MySQL make me really neverious Smiley Sad.

Once again, thanks for your support.

afaust
Master

Re: Need to help about partition MySQL

Hmm... those numbers seem odd to me in multiple ways:

  • alf_activity_post entries should be processed (and deleted) regularly (default: processing every 30 seconds, cleaning every 10 minutes), so how did you manage to accumulate 5 million rows? Maybe your activity processing is just not working as it should (check the logs)
  • Having 10 million alf_node_assoc entries for only 10 million alf_node entries means that on average every node has at least one (source) association (and two assocs overall) - this is an unusually high number using the default content model or any custom model built using typical tutorials / references.
  • Having 10 million alf_node_aspects entries for 10 million alf_node entries is an unusually low number, as most nodes will typically have two or more aspects using the default content model (cm:auditable, cm:titled/cm:versionable)
  • Having 13 million rows in alf_activity_feed seem a bit to me, but since I do not know how many (active) users you have and what your site-to-user ratio is, it could be perfectly fine or just as weird as the activity post table. By default, the size of a feed (personal / site) should be limited to only 100 entries / entries with age < 31 days), and clearing occurs regularly (every 10 minutes)

Generally speaking though, the numbers are still relatively low. With Enterprise customers I have seen systems 10-times that size (with correctly working cleanup / data reduction processes) that still run perfectly fine in a non-partitioned setup.

As I have mentioned (implicitly) before, partitioning is an invasive process that can leave you with a lot of technical issues in future upgrades, as Alfresco does not include any ugprade scripts capable of handling (paritally/custom) partitioned databases. Most (95+%) of customers / users never bother with it, and for all databases there are typically other optimisation methods that should be applied before partitioning becomes relevant. For MySQL / MariaDB this typically involves switching to the Barracuda file format and using table/index compression.

afaust
Master

Re: Need to help about partition MySQL

If you really think you have to use partitioning, switch to a better database engine (PostgreSQL).

phong_van
Active Member II

Re: Need to help about partition MySQL

Dear Axel Faust,

Because I'm afraid performace issuse, in the previous i check max Id to get number row, so I've correct as below:

  • alf_activity_feed: about 1000 rows
  • alf_activity_post: about 1000 rows
  • alf_child_assoc: 5.6 M rows
  • alf_content_data: 3 M rows
  • alf_content_url: 2 M rows
  • alf_node: 7 M rows
  • alf_node_aspects: 17 M rows
  • alf_node_assoc: 1.4 M rows
  • alf_node_properties: 100 M rows
  • alf_transaction: 5M  rows

So once again, I'm very sory about my mistake. 

afaust
Master

Re: Need to help about partition MySQL

Ok - that clears up the question about the alf_activity_XX tables. They are of a size that seems reasonable, even quite low. alf_node + alf_node_aspects / alf_node_assoc also are in a clearly more healthy state than initially assumed (still, 1.4 million alf_node_assoc for 7 million nodes is a bit much from my experience, but your business model may just be designed that way).

I am surprised that there are only 5.6 million alf_child_assoc for 7 million nodes - that means that you likely have 1.4 (or even more) deleted nodes in the alf_node table that have not yet been cleaned up (node cleanup job runs each day at 9pm by default and cleans up nodes deleted more than 30 days ago).