idx_alf_conturl_cr violated on document creation

Member II


Hi!

On creation, and even on deletion, of any document I get a duplicate key exception in the PostgreSQL database:

### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: duplicate key violated

Unique constraint "idx_alf_conturl_cr" violated.

The statement is executed twice (apparently a retry):

insert into alf_content_url (id, content_url, content_url_short, content_url_crc, content_size, orphan_time)
          values (?, ?, ?, ?, ?, ?)

The documents are created without errors in Alfresco (thanks to the retrying transaction helper), but where do these exceptions come from?
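The "created without errors despite the logged exception" behaviour can be illustrated with a minimal, self-contained sketch. The `DuplicateKeyException` class and `retry` helper below are simplified stand-ins for Alfresco's `RetryingTransactionHelper`, not its real API:

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Simplified stand-in for a database duplicate-key error.
    static class DuplicateKeyException extends RuntimeException {}

    // Minimal retry loop: re-runs the work when a retryable error occurs,
    // which is roughly what Alfresco's RetryingTransactionHelper does.
    static <T> T retry(Callable<T> work, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (DuplicateKeyException e) {
                if (attempt >= maxAttempts) throw e;
                // The failed attempt was still reported by the database,
                // which is where the ERROR lines in the PostgreSQL log come from.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // First attempt collides on the unique index, second succeeds.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] == 1) throw new DuplicateKeyException();
            return "content URL created on attempt " + calls[0];
        }, 3);
        System.out.println(result);
    }
}
```

So the document creation ultimately succeeds, while each failed first attempt leaves an ERROR entry in the database log.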

Thank you very much in advance for some hints.

6 Replies
Alfresco Employee

Re: idx_alf_conturl_cr violated on document creation

This behaviour was introduced in Alfresco 5.1. The issue has been raised, but I don't know if it's going to be fixed.

I looked for a public issue, but it only exists at support level. Would it be useful to raise a public one as well?

Software Engineer in Alfresco Search Team.
Alfresco Employee

Re: idx_alf_conturl_cr violated on document creation

I found it: https://issues.alfresco.com/jira/browse/MNT-17060 

In the end they classified this as "not a bug"...

For me it is a bug, especially since when I was importing millions of rows the database log filled my disk. :)

Software Engineer in Alfresco Search Team.
Senior Member

Re: idx_alf_conturl_cr violated on document creation

Tagging ‌ for input!

Master

Re: idx_alf_conturl_cr violated on document creation

Interestingly, I saw a similar issue with the alf_qname table on Alfresco 5.0.d before. Probably they did not check the cache before attempting to create the QName back then...

Totally agreed that this is a bug. There already is a getOrCreateContentUrl in AbstractContentDataDAOImpl, so I see no reason why they would call createOrGetByValue on the contentUrlCache in createContentDataEntity directly instead of reusing that existing method. This looks like a poor job on the part of ACE-3948; they seem to have completely misunderstood the root cause.

Content URL reuse is extremely common, e.g. when copying around nodes that reference the same content. Uploading a file via Share involves creating a new version, which involves copying, so in that case a conflict is guaranteed with a createOrGet approach. The same can happen when copying a node to a different location, which may cause dozens or hundreds of key conflicts depending on how many children get recursively copied. One additional SELECT (which they complain about in ACE-3948) is always preferable to consciously causing, and then ignoring, a key conflict.
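The select-before-insert pattern argued for above can be sketched with a plain map standing in for the alf_content_url table. The names and structure here are illustrative, not Alfresco's actual DAO code:

```java
import java.util.HashMap;
import java.util.Map;

public class GetOrCreateSketch {
    // Stand-in for the alf_content_url table, keyed by content URL.
    private final Map<String, Long> table = new HashMap<>();
    private long nextId = 1;

    // getOrCreate: one extra lookup, but never a duplicate-key conflict,
    // because an existing row is reused instead of blindly inserted.
    public long getOrCreateContentUrl(String contentUrl) {
        Long existing = table.get(contentUrl);   // the "one SELECT more"
        if (existing != null) {
            return existing;                     // reuse, e.g. node copy / new version
        }
        long id = nextId++;
        table.put(contentUrl, id);               // INSERT only for genuinely new URLs
        return id;
    }

    public static void main(String[] args) {
        GetOrCreateSketch dao = new GetOrCreateSketch();
        long first = dao.getOrCreateContentUrl("store://2017/1/1/abc.bin");
        long copy  = dao.getOrCreateContentUrl("store://2017/1/1/abc.bin"); // copy/versioning path
        System.out.println(first == copy); // same row reused, no key violation
    }
}
```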

If they really wanted to avoid the SELECT, they should have carried the content URL ID along inside the ContentData object, so they could check it against null when determining whether they can reuse an existing entity or have to create a new one. When a new node is uploaded, the content URL ID would be null and a new entity would be created; when the content is copied as part of versioning, the ID would be set and they could simply look it up in the cache (which should contain it, since it was created in that transaction).
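The alternative suggested above, carrying the row ID inside the ContentData value, could look roughly like this. This is a sketch; the field and null check are assumptions, not Alfresco's real ContentData class:

```java
public class ContentDataIdSketch {
    // Simplified ContentData carrying the content URL row ID once known.
    static class ContentData {
        final String contentUrl;
        final Long contentUrlId; // null = never persisted yet
        ContentData(String contentUrl, Long contentUrlId) {
            this.contentUrl = contentUrl;
            this.contentUrlId = contentUrlId;
        }
    }

    static String persist(ContentData cd) {
        if (cd.contentUrlId == null) {
            // Fresh upload: no row can exist yet, insert without a prior SELECT.
            return "INSERT new row for " + cd.contentUrl;
        }
        // Copy/versioning path: the row exists, just reference it via the cache.
        return "REUSE row " + cd.contentUrlId;
    }

    public static void main(String[] args) {
        System.out.println(persist(new ContentData("store://a.bin", null)));
        System.out.println(persist(new ContentData("store://a.bin", 42L)));
    }
}
```

This keeps the bulk-import path free of the extra SELECT while making the copy path conflict-free.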

Member II

Re: idx_alf_conturl_cr violated on document creation

Thank you very much, Axel, for your detailed analysis.

I totally agree with your statement that one additional SELECT is always preferable to provoking a duplicate key exception. Furthermore, an INSERT always consumes more resources in the database than an additional SELECT, which the database can serve from its cache if a similar statement was executed only seconds before.

Intermediate

Re: idx_alf_conturl_cr violated on document creation

Thanks for the conversation. I shared it with the architects for this area of the product.

As described in MNT-17060, we rely on the database to enforce the key constraint because it is the most performant way of doing bulk inserts of new data. When the database sees the violation, we pick that up and respond appropriately. This is a common pattern with our current architecture, and is a convenient optimization.

We selected this architecture as a result of performance profiling in order to optimize new data creation, because bulk import is a critical use case. As a result, there is one fewer SELECT statement for every file. This is documented in ACE-3948.
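The insert-first approach described in this reply, attempting the insert and letting the unique constraint report the conflict, can be sketched as follows. This is a simplified illustration, not the actual Alfresco DAO code:

```java
import java.util.HashMap;
import java.util.Map;

public class InsertFirstSketch {
    // Stand-in for the unique-violation error raised by idx_alf_conturl_cr.
    static class DuplicateKeyException extends RuntimeException {}

    private final Map<String, Long> table = new HashMap<>();
    private long nextId = 1;

    // Insert first; the unique constraint does the duplicate check for us.
    private long insert(String contentUrl) {
        if (table.containsKey(contentUrl)) {
            throw new DuplicateKeyException();
        }
        long id = nextId++;
        table.put(contentUrl, id);
        return id;
    }

    // "Pick up the violation and respond appropriately": fall back to a lookup.
    public long createOrGet(String contentUrl) {
        try {
            return insert(contentUrl); // no SELECT in the common bulk-import case
        } catch (DuplicateKeyException e) {
            return table.get(contentUrl); // conflict already logged by the database
        }
    }

    public static void main(String[] args) {
        InsertFirstSketch dao = new InsertFirstSketch();
        long a = dao.createOrGet("store://x.bin");
        long b = dao.createOrGet("store://x.bin"); // triggers and absorbs the conflict
        System.out.println(a == b);
    }
}
```

The trade-off debated in this thread is visible here: the conflict is handled correctly, but each collision still produces an ERROR entry in the database log.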