idx_alf_conturl_cr violated on document creation

Member II


Hi!

On creation, and even on deletion, of any document I get a duplicate key exception in the PostgreSQL database:

### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: duplicate key violated

Unique constraint "idx_alf_conturl_cr" violated.

The statement is executed twice (apparently a retry):

insert into alf_content_url (id, content_url, content_url_short, content_url_crc, content_size, orphan_time)
          values (?, ?, ?, ?, ?, ?)

The documents are created without errors in Alfresco (thanks to the retrying transaction helper), but where do these exceptions come from?
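The "created without errors despite the logged exception" behaviour can be illustrated with a minimal, self-contained sketch. The `DuplicateKeyException` class and `retry` helper below are simplified stand-ins for Alfresco's `RetryingTransactionHelper`, not its real API:

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Simplified stand-in for a database duplicate-key error.
    static class DuplicateKeyException extends RuntimeException {}

    // Minimal retry loop: re-runs the work when a retryable error occurs,
    // which is roughly what Alfresco's RetryingTransactionHelper does.
    static <T> T retry(Callable<T> work, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (DuplicateKeyException e) {
                if (attempt >= maxAttempts) throw e;
                // The failed attempt was still reported by the database,
                // which is where the ERROR lines in the PostgreSQL log come from.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // First attempt collides on the unique index, second succeeds.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] == 1) throw new DuplicateKeyException();
            return "content URL created on attempt " + calls[0];
        }, 3);
        System.out.println(result);
    }
}
```

So the document creation ultimately succeeds, while each failed first attempt leaves an ERROR entry in the database log.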

Thank you very much in advance for some hints.

6 Replies
Alfresco Employee

Re: idx_alf_conturl_cr violated on document creation

This behaviour was introduced in Alfresco 5.1. The issue has been raised, but I don't know if it's going to be fixed.

I looked for a public issue, but it only exists at support level. Would it be useful to raise a public one as well?

Software Engineer in Alfresco Search Team.
Alfresco Employee

Re: idx_alf_conturl_cr violated on document creation

I found it: https://issues.alfresco.com/jira/browse/MNT-17060 

In the end they classified this as "not a bug"...

For me it is a bug, especially since when I was importing millions of rows the database log filled my disk. :)

Software Engineer in Alfresco Search Team.
Senior Member

Re: idx_alf_conturl_cr violated on document creation

Tagging ‌ for input!

Master

Re: idx_alf_conturl_cr violated on document creation

Interestingly, I saw a similar issue with the alf_qname table on Alfresco 5.0.d before. Probably they did not check the cache before attempting to create the QName back then...

Totally agreed that this is a bug. There already is a getOrCreateContentUrl in AbstractContentDataDAOImpl, so I see no reason why they would call createOrGetByValue on the contentUrlCache in createContentDataEntity directly instead of reusing that existing method. This looks like a poor job on the part of ACE-3948; they seem to have completely misunderstood the root cause.

Content URL reuse is extremely common, e.g. when copying around nodes that reference the same content. Uploading a file via Share involves creating a new version, which involves copying, so in that case a conflict is guaranteed with a createOrGet approach. The same can happen when copying a node to a different location, which may cause dozens or hundreds of key conflicts depending on how many children get recursively copied. One additional SELECT (which they complain about in ACE-3948) is always preferable to consciously causing, and then ignoring, a key conflict.
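The select-before-insert pattern argued for above can be sketched with a plain map standing in for the alf_content_url table. The names and structure here are illustrative, not Alfresco's actual DAO code:

```java
import java.util.HashMap;
import java.util.Map;

public class GetOrCreateSketch {
    // Stand-in for the alf_content_url table, keyed by content URL.
    private final Map<String, Long> table = new HashMap<>();
    private long nextId = 1;

    // getOrCreate: one extra lookup, but never a duplicate-key conflict,
    // because an existing row is reused instead of blindly inserted.
    public long getOrCreateContentUrl(String contentUrl) {
        Long existing = table.get(contentUrl);   // the "one SELECT more"
        if (existing != null) {
            return existing;                     // reuse, e.g. node copy / new version
        }
        long id = nextId++;
        table.put(contentUrl, id);               // INSERT only for genuinely new URLs
        return id;
    }

    public static void main(String[] args) {
        GetOrCreateSketch dao = new GetOrCreateSketch();
        long first = dao.getOrCreateContentUrl("store://2017/1/1/abc.bin");
        long copy  = dao.getOrCreateContentUrl("store://2017/1/1/abc.bin"); // copy/versioning path
        System.out.println(first == copy); // same row reused, no key violation
    }
}
```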

If they really wanted to avoid the SELECT, they should have carried the content URL ID along inside the ContentData object, so they could check it against null when determining whether they can reuse an existing entity or have to create a new one. When a new node is uploaded, the content URL ID would be null and a new entity would be created; when the content is copied as part of versioning, the ID would be set and they could simply look it up in the cache (which should contain it, since it was created in that transaction).
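The alternative suggested above, carrying the row ID inside the ContentData value, could look roughly like this. This is a sketch; the field and null check are assumptions, not Alfresco's real ContentData class:

```java
public class ContentDataIdSketch {
    // Simplified ContentData carrying the content URL row ID once known.
    static class ContentData {
        final String contentUrl;
        final Long contentUrlId; // null = never persisted yet
        ContentData(String contentUrl, Long contentUrlId) {
            this.contentUrl = contentUrl;
            this.contentUrlId = contentUrlId;
        }
    }

    static String persist(ContentData cd) {
        if (cd.contentUrlId == null) {
            // Fresh upload: no row can exist yet, insert without a prior SELECT.
            return "INSERT new row for " + cd.contentUrl;
        }
        // Copy/versioning path: the row exists, just reference it via the cache.
        return "REUSE row " + cd.contentUrlId;
    }

    public static void main(String[] args) {
        System.out.println(persist(new ContentData("store://a.bin", null)));
        System.out.println(persist(new ContentData("store://a.bin", 42L)));
    }
}
```

This keeps the bulk-import path free of the extra SELECT while making the copy path conflict-free.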

Member II

Re: idx_alf_conturl_cr violated on document creation

Thank you very much, Axel, for your detailed analysis.

I totally agree with your statement that one additional SELECT is always preferable to provoking a duplicate key exception. Furthermore, an INSERT always consumes more resources in the database than an additional SELECT, which the database can serve from its cache if a similar statement was executed only seconds before.

Intermediate

Re: idx_alf_conturl_cr violated on document creation

Thanks for the conversation. I shared it with the architects for this area of the product.

As described in MNT-17060, we rely on the database to enforce the key constraint because it is the most performant way of doing bulk inserts of new data. When the database sees the violation, we pick that up and respond appropriately. This is a common pattern with our current architecture, and is a convenient optimization.

We selected this architecture as a result of performance profiling in order to optimize new data creation, because bulk import is a critical use case. As a result, there is one fewer SELECT statement for every file. This is documented in ACE-3948.
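The insert-first approach described in this reply, attempting the insert and letting the unique constraint report the conflict, can be sketched as follows. This is a simplified illustration, not the actual Alfresco DAO code:

```java
import java.util.HashMap;
import java.util.Map;

public class InsertFirstSketch {
    // Stand-in for the unique-violation error raised by idx_alf_conturl_cr.
    static class DuplicateKeyException extends RuntimeException {}

    private final Map<String, Long> table = new HashMap<>();
    private long nextId = 1;

    // Insert first; the unique constraint does the duplicate check for us.
    private long insert(String contentUrl) {
        if (table.containsKey(contentUrl)) {
            throw new DuplicateKeyException();
        }
        long id = nextId++;
        table.put(contentUrl, id);
        return id;
    }

    // "Pick up the violation and respond appropriately": fall back to a lookup.
    public long createOrGet(String contentUrl) {
        try {
            return insert(contentUrl); // no SELECT in the common bulk-import case
        } catch (DuplicateKeyException e) {
            return table.get(contentUrl); // conflict already logged by the database
        }
    }

    public static void main(String[] args) {
        InsertFirstSketch dao = new InsertFirstSketch();
        long a = dao.createOrGet("store://x.bin");
        long b = dao.createOrGet("store://x.bin"); // triggers and absorbs the conflict
        System.out.println(a == b);
    }
}
```

The trade-off debated in this thread is visible here: the conflict is handled correctly, but each collision still produces an ERROR entry in the database log.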