Apart from the child association localname, all QNames in Alfresco are static. Duplicated QNames are:
Persistence of Static QName
For each node, then, there are approximately 2710 characters stored that are duplicated between nodes. This significantly bloats the database as well as the size of the L1 and L2 caches.
A solution was written for the BRANCHES/SCHEMA_1.3 but were rolled out before the merge. It involves having a StaticQNameEntity and associated static_qname table that is referenced by foreign key relationships from the above tables. Only the unique ID of the static_qname will be stored in the node tables.
Because the static_qname values are immutable within the server, a non-clustered, static cache can be used to map between the ID and QName - and vice versa. Hibernate could manage the ID to QName mapping, but it would hydrate entities from either the database or the L2 cache each time the QName was accessed.
Most static QName usage currently uses QNameUserType in Hibernate. This can be replaced by a StaticQNameUserType for all static QName mappings. The child_assoc.qname column will keep use of the QNameUserType as it is not a static QName.
Auditable Aspect Properties
The cm:auditable aspect is applied to everything. This wasn't the case when the system was first written. Subsequently, with the AVM tables, the aspect's properties have been put directly onto the AVM node table equivalent. This can be done with the normal repository's alf_node table as well.
Put the auditable properties such as cm:modified directly on the alf_node table for speedier access.
The NodeStatus entity serves two functions. The first is to keep track of the transaction ID of of existing and deleted nodes so that the continuous index updating components can ensure that deleted nodes are removed from the indexes. The second is to act as a map from the protocol-identifier-guid' key (NodeRef) to the unique node ID. The use of this entity to access nodes adds overhead to all node operations by:
A relatively high proportion of cache misses and therefore extra database queries when accessing nodes
Keeping data related to deleted nodes for long periods of time - and there isn't currently a clean-up process
Duplicated NodeRef information
Weak foreign key relationships between NodeStatus, Node and Store.
The index recovery components need to keep transaction IDs, which are time ordered, in order to be able to rebuild partial indexes in original transaction order without double checking the existence of all nodes. The change_txn_id column must therefore be preserved for both existing and deleted nodes.
The NodeStatus will cease to exist. Instead, the change transaction ID will be put directly onto the Node entity for creations and updates. Deletions will be recorded in the DeletedNode entity. The NodeRef key will be enforced by the database for both existing and deleted nodes. For both tables, a unique key will be enforced for the NodeRef.
The NodeStatus currently provides a cache of NodeRef->Node. This can be replaced with a much simpler internal cache that maps NodeRef->List