Kubernetes install failing with message "Another Alfresco Cluster Node is updating the DB"

bgkoneill
Active Member

We are using the Helm chart to deploy within Minikube. We can see the Alfresco ACS container starting, and the logs show it connecting to the DB container, creating tables and applying patches. However, it fails when it does the SchemaBootstrap updates, which it attempts 24 times. Each time it responds with:

Applied patches detected: 4

Bootstrap marker still present in DB

Another Alfresco Cluster node is updating the DB

The current Alfresco cluster node is waiting for another chance to bootstrap the DB schema.

Once it has tried 24 times, it tries to start Alfresco again and goes back into this loop.

We know that the retries are configured via db.schema.update.lockRetryCount. However, why is the DB locked in the first place when the container tries to start? We have disabled clustering in the Helm chart, so what other Alfresco cluster node is updating the DB?

Any help greatly appreciated.

Regards

Brian

1 Solution

Accepted Solutions
afaust
Master

The Helm / Kubernetes deploy is quite brittle - at least from my experience of my evaluation + customer workshop last year - relying on restarts and timed readiness probes to succeed probabilistically / "sometime". During the first start of the first pod / container, when the database is being bootstrapped, there is a chance that a failed readiness probe (if the whole process takes too long) results in the pod / container being killed / restarted mid-bootstrap. As a result, the lock table may remain, requiring manual intervention to either reset the entire database or manually delete the lock table and hope that the bootstrap continues in the next restart cycle.
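
To illustrate the timing problem described above, here is a minimal sketch that estimates how long a container has before a failing probe triggers a restart. It uses the common rough approximation initialDelaySeconds + periodSeconds × failureThreshold and ignores probe timeouts and scheduling jitter. Note that in Kubernetes it is strictly the liveness probe that restarts a container (a failing readiness probe only takes the pod out of service), so the numbers would come from whichever probe the chart actually uses to restart ACS. All values below are hypothetical and not taken from the Alfresco Helm chart.

```python
def probe_restart_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Rough time (in seconds) a container has to become healthy before
    consecutive probe failures lead to a restart.

    Approximation only: ignores timeoutSeconds and kubelet jitter.
    """
    return initial_delay + period * failure_threshold


def bootstrap_fits(bootstrap_seconds: int, initial_delay: int, period: int, failure_threshold: int) -> bool:
    """True if the (estimated) DB bootstrap finishes within the probe budget."""
    return bootstrap_seconds < probe_restart_budget_seconds(initial_delay, period, failure_threshold)


if __name__ == "__main__":
    # Hypothetical probe settings: 60 s initial delay, probe every 20 s, 3 failures allowed.
    budget = probe_restart_budget_seconds(60, 20, 3)
    print(f"restart budget: {budget} s")
    # A first-start bootstrap assumed to take 5 minutes would not fit in that budget.
    print("bootstrap fits:", bootstrap_fits(300, 60, 20, 3))
```

If the estimated bootstrap time exceeds the budget, raising the initial delay or failure threshold for the first start is one way to avoid the mid-bootstrap restart.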

bgkoneill
Active Member

Hi Axel,

I appreciate your help on this. Your suggestion worked. We connected to the DB container and deleted the alf_bootstrap_lock table, and then the ACS container started correctly on its next loop. I think you are right about the readiness probe: the ACS server creates the table on startup but fails to finish in time before the readiness probe kills it, so it does not have time to delete the table, and the next time around it's locked.

We are also doing an evaluation of the containers for a client, and so far our experience with the current Helm charts does not give us a lot of confidence. We are going to try the Helm charts and setup for AWS next and then make a call on what we recommend to the client. Have you heard of any customers running Alfresco from these charts yet? Unless I am missing something here, it seems that Alfresco has architected 6.1 so that you have to run the solution as a bunch of containers. Or is there another way to run Alfresco 6.1 using a container such as Tomcat?
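
For anyone repeating this fix, a minimal sketch of the manual step follows. It only builds the kubectl command rather than running it, so it can be reviewed first. It assumes the database is PostgreSQL with `psql` available in the DB container; the pod name, namespace, database user and database name are hypothetical placeholders and need to be replaced with the values from your own deployment. Only the table name alf_bootstrap_lock comes from this thread.

```python
import shlex


def drop_lock_table_command(pod: str, namespace: str = "default",
                            db_user: str = "alfresco", db_name: str = "alfresco") -> list:
    """Build a kubectl command that drops the stale bootstrap lock table.

    Assumptions (not from the thread): PostgreSQL with psql in the pod,
    and the given user / database names.
    """
    sql = "DROP TABLE IF EXISTS alf_bootstrap_lock;"
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "psql", "-U", db_user, "-d", db_name, "-c", sql,
    ]


if __name__ == "__main__":
    # "my-postgresql-0" is a made-up pod name; look up the real one with `kubectl get pods`.
    print(shlex.join(drop_lock_table_command("my-postgresql-0")))
```

Only run the resulting command while no ACS pod is genuinely mid-bootstrap, since dropping the table removes the only guard against two nodes updating the schema at once.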

Regards

Brian

afaust
Master

Until last week I had personally not heard of any customers (of mine or of my contacts within the community) that have deployed Alfresco in Kubernetes, either on AWS or locally. Last week I was contacted about a project for a customer here in Germany who apparently did take the leap, though I do not know yet if they have used the default Helm charts.

Though Alfresco 6.1 has been architected around containerisation, it can just as well still be deployed into an "old-style" Tomcat setup. I typically deploy all my local dev environments that way, as that simplifies matters like remote debugging, configuration changes and my general development flow. In actual test / production systems, I tend to use / nudge customers towards using Docker though, but I would never recommend / nudge them towards using Kubernetes at this point. Just using Docker-Compose works way better in 99.9..% of scenarios I have encountered so far. Only if Netflix, Amazon themselves or another comparably large organisation wanted me to deploy Alfresco in a dynamically scalable deployment would I consider Kubernetes as an option...

My opinion is obviously biased by my local market. Here in central EU / Germany, most customers typically set up dedicated servers / VMs for business applications for their entire lifetime, rarely consider US cloud providers (encumbered by a culture of comparably insufficient concerns for data security, and the global reach of a surveillance-heavy government) as hosting options, and thus typically never scale up / down apart from scheduled system upgrades every few years.

bgkoneill
Active Member

Hi Axel,

Thanks for the insight. At the moment I tend to agree with you on Kubernetes being a bit full-blown for what we want to do. I had not considered using a docker-compose option for doing the same thing, as I thought the compose releases were only for dev. Our customers are on AWS, and AWS now has EKS for container management, so Kubernetes seemed like the logical option. Our customers are pretty much all on a VM architecture, like in Germany, and moving to containers is a leap for them, so we really want to make sure it works. While on the topic of containers, what is your approach to the dev cycle for customisations that were traditionally rolled out using AMPs? We are currently thinking that we can use the new SDK for dev/test of the customisations and then package the changes as a container image, which we then reference as an image in compose or Kubernetes. (I am assuming that there is a mvn lifecycle to build an image in the new SDK but have not checked yet.) Does this seem like the correct approach?

Regards

Brian  

afaust
Master

I am typically using neither the default Docker images nor the SDK provided by Alfresco - rather I build my own, and subsequently my dev / deployment cycles around that. I have base images for Alfresco Repository and Share which are capable of downloading the base Alfresco WARs and AMPs from a Maven-compatible repository and initialising themselves during the first container start. Those images may also be used to build images with specific pre-built constellations by simply running the same initialisation logic as part of a build stage.

My projects for building extensions / modules remain unchanged between 5.x and 6.x lines of Alfresco - they still produce both regular AMPs as well as "installable" JARs for those users / customers that - for some reason - are allergic to a decent AMP. In those module projects I may use a Docker Maven plugin from fabric8.io to run integration tests, using a regular Maven assembly descriptor to build up the runtime container.

The projects for building the actual deployable artifact for a specific customer system have changed only slightly. They are still regular Maven WAR builds, with an additional stage again using the fabric8.io Docker Maven plugin to build and push the Docker container images.

I actually do not know how SDK 4 builds projects. Due to its various opinionated and inflexible defaults (geared to make it easy for people to start - not necessarily to be productive as advanced professionals), the SDK has never been an option for my day-to-day projects and I only use it in the odd community project (like OOTBee Support Tools, where we still use a 2.x version).