db connection loss & huge number of blocked threads

cancel
Showing results for 
Search instead for 
Did you mean: 
mrks_js1
Established Member II

db connection loss & huge number of blocked threads

Jump to solution

We are running Alfresco GA 201702 internally (rhel 6.8, ~25gb of ram, quadcore, mysql, solr4)

Since a few weeks we have outages where the alfresco process is running but the application does not respond anymore. Only a restart of alfresco solves this issue.

The last error message in the log is this:

2017-01-23 06:44:08,454 ERROR [      Thread-32] [ecutor.AcquireJobsRunnableImpl] exception during job acquisition: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.GenericJDBCException: Cannot open connection
org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.GenericJDBCException: Cannot open connection
        at org.springframework.orm.hibernate3.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:597)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:372)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
        at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:45)
        at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
        at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
        at org.activiti.engine.impl.jobexecutor.AcquireJobsRunnableImpl.run(AcquireJobsRunnableImpl.java:54)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.hibernate.exception.GenericJDBCException: Cannot open connection
        at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:103)
        at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:91)
        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:29)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:426)
        at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:144)
        at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:119)
        at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:57)
        at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1326)
        at org.springframework.orm.hibernate3.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:556)
        ... 8 more
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at org.springframework.orm.hibernate3.LocalDataSourceConnectionProvider.getConnection(LocalDataSourceConnectionProvider.java:83)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:423)
        ... 13 more
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
        at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1174)
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
        ... 16 more

When analyzing the jstack / threaddump we notice that there is a huge number (~250 threads) of blocked threads:

Thread 24077: (state = BLOCKED)
- net.sf.jooreports.openoffice.converter.OpenOfficeDocumentConverter.convertInternal(java.io.File, net.sf.jooreports.converter.DocumentFormat, java.io.File, net.sf.jooreports.converter.DocumentFormat) @bci=8, line=105 (Interpreted frame)
- net.sf.jooreports.openoffice.converter.AbstractOpenOfficeDocumentConverter.convert(java.io.File, net.sf.jooreports.converter.DocumentFormat, java.io.File, net.sf.jooreports.converter.DocumentFormat) @bci=156, line=88 (Interpreted frame)
- org.alfresco.repo.content.transform.OpenOfficeContentTransformerWorker.convert(java.io.File, net.sf.jooreports.converter.DocumentFormat, java.io.File, net.sf.jooreports.converter.DocumentFormat) @bci=9, line=125 (Interpreted frame)
- org.alfresco.repo.content.transform.OOoContentTransformerHelper.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=550, line=393 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor1384.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
- org.alfresco.repo.management.subsystems.SubsystemProxyFactory$1.invoke(org.aopalliance.intercept.MethodInvocation) @bci=22, line=79 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.springframework.aop.framework.JdkDynamicAopProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=322, line=204 (Compiled frame)
- com.sun.proxy.$Proxy30.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=24 (Interpreted frame)
- org.alfresco.repo.content.transform.ProxyContentTransformer.transformInternal(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=24, line=107 (Interpreted frame)
- org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=167, line=272 (Compiled frame)
- org.alfresco.repo.content.transform.ComplexContentTransformer.transformInternal(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=257, line=492 (Interpreted frame)
- org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=167, line=272 (Compiled frame)
- org.alfresco.repo.content.transform.FailoverContentTransformer.transformInternal(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=274, line=215 (Interpreted frame)
- org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=167, line=272 (Compiled frame)
- org.alfresco.repo.content.transform.ComplexContentTransformer.transformInternal(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=257, line=492 (Interpreted frame)
- org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=167, line=272 (Compiled frame)
- org.alfresco.repo.content.ContentServiceImpl.failoverTransformers(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions, java.lang.String, java.util.List) @bci=210, line=668 (Interpreted frame)
- org.alfresco.repo.content.ContentServiceImpl.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=232, line=617 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor1383.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
- org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=7, line=317 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint() @bci=12, line=183 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=19, line=150 (Compiled frame)
- net.sf.acegisecurity.intercept.method.aopalliance.MethodSecurityInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=9, line=80 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.alfresco.repo.model.ml.MLContentInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=227, line=136 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=1, line=53 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.alfresco.repo.audit.AuditMethodInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=15, line=166 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation() @bci=4, line=96 (Compiled frame)
- org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(java.lang.reflect.Method, java.lang.Class, org.springframework.transaction.interceptor.TransactionAspectSupport$InvocationCallback) @bci=58, line=260 (Compiled frame)
- org.springframework.transaction.interceptor.TransactionInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=40, line=94 (Compiled frame)
- org.springframework.aop.framework.ReflectiveMethodInvocation.proceed() @bci=101, line=172 (Compiled frame)
- org.springframework.aop.framework.JdkDynamicAopProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=322, line=204 (Compiled frame)
- com.sun.proxy.$Proxy52.transform(org.alfresco.service.cmr.repository.ContentReader, org.alfresco.service.cmr.repository.ContentWriter, org.alfresco.service.cmr.repository.TransformationOptions) @bci=24 (Interpreted frame)
- org.alfresco.repo.rendition.executer.AbstractTransformationRenderingEngine$TransformationCallable$1.doWork() @bci=54, line=447 (Interpreted frame)
- org.alfresco.repo.rendition.executer.AbstractTransformationRenderingEngine$TransformationCallable$1.doWork() @bci=1, line=1 (Interpreted frame)
- org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(org.alfresco.repo.security.authentication.AuthenticationUtil$RunAsWork, java.lang.String) @bci=30, line=555 (Compiled frame)
- org.alfresco.repo.rendition.executer.AbstractTransformationRenderingEngine$TransformationCallable.call() @bci=47, line=437 (Interpreted frame)
- org.alfresco.repo.rendition.executer.AbstractTransformationRenderingEngine$TransformationCallable.call() @bci=1, line=1 (Interpreted frame)
- java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=745 (Compiled frame)

We have a monitoring system in place and we are able to notice the problem instantly but we want do understand what is happening and if the 2 errors are connected. 

Increasing max_connections would probably expand the time to each restart but the question really is if there is a jdbc connection leakage and how it is produced.

I was profiling and stressing our test environment for quite some time but I can't reproduce the issue and i can not see jdbc connections not being closed properly.

1 Solution

Accepted Solutions
cesarista
Customer

Re: db connection loss & huge number of blocked threads

Jump to solution

Hi Markus:

An upgrade is a big change to be considered in terms of monitoring behaviour. So, did your Libreoffice version change in this upgrade ? IMO, last versions of Alfresco Community seem to consume more resources, even GA versions. Some possibilites may be:

- Try to increase maximum threads in Tomcat (server.xml) + db.pool.max (alfresco-global.properties) + Mysql max_connections (my.conf) until 400, for example to try if this threshold is reached.

- Consider to separate Libreoffice processes in an external machine too, or even using Jodconverter in an external server. Last one may be configured for being used with several transformation threads. 

- If you have transformations taking a long time, try to reduce maximum transformation sizes in Alfresco configuration, or even giving a reasonable timeout for these transformations, via alfresco-global.properties

Regards.

--C.

View solution in original post

9 Replies
cesarista
Customer

Re: db connection loss & huge number of blocked threads

Jump to solution

Hi Markus:

Relating "when":

  • Is it an instant outage, or can see your CPU, Memory, threads and DB connections going higher during the time the server is up ? For example with OOTB Support Tools addon
  • Do the outages correspond with a time-defined users workload (such as daily uploader batches, massive and recurrent desktop synchronizations, non-adequate use of Alfresco network drive endpoints), filesystem bulk imports with massive transformations, custom scheduled actions or internal Alfresco scheduled actions ?
  • Do you include any new customization, integration or process recently in Alfresco ?


Relating your environment and monitoring:

  • What is your value for db.pool.max in Alfresco and max_connections for your Mysql database ? Are these values coherent with maximum number of threads for avoiding bottlenecks ? Do you have/monitor slow queries in your database ? Is your database dump much bigger that some weeks ago ?
  • Do you have your Mysql server and Solr 4 in the same instance of Alfresco repository ? If they are in the same machine, how do the three systems compete for resources such as CPU or RAM ? Do you detect in your monitoring high CPU uses, memory swapping or high load averages during these weeks ?
  • How about Libreoffice/Openoffice processes ? Do they have enough RAM to work ? Did you experience previous problems with them ?
  • Is your instance machine virtualized ? Do you think your CPU / RAM resources are being overbooked ?

Relating logs:

  • Only one ERROR in catalina.out or alfresco.log ? No previous related WARNs or ERROR ?


Hope this questioning helps, or at least to re-check some points of your system. By the way, may be that your custom-log4j setup is cropping part of the class name in your logs ? 

Regards.

--C.

afaust
Master

Re: db connection loss & huge number of blocked threads

Jump to solution

Cesar's question regarding the db.pool.max and max_connections are one part of the solution to your problem. From the error message you included I suspect that it is db.pool.max that is not set correctly. The behaviour for a too low max_connections value is a bit different (I think you should get connection refused / closed errors).

The other affects the issue you are seeing with a lot of threads blocked on the OpenOffice transformation. Each of these threads will have a connection assigned to it which does not get released until the transformation completes / fails. So if your db.pool.max or max_connections is too low you will have connection starvation. Unfortunately, OpenOffice is know to stall on transformations from time to time, using 100% CPU in one process / thread and lead to transformations being blocked. Various community members have implemented an OpenOffice watchdog feature to detect such cases and force OpenOffice to restart. This will free all of the blocked threads and the associated connections.

mrks_js1
Established Member II

Re: db connection loss & huge number of blocked threads

Jump to solution

Hi - thanks for the answers guys! I am in vacation right now.

Ootbee support tools is installed.

Alf DB Connections is set to 275 and MySQL max connections are set to 300.

The Office watchdog is installed, but we still have these blocking transformation threads.

mrks_js1
Established Member II

Re: db connection loss & huge number of blocked threads

Jump to solution

Relating "when":
Is it an instant outage, or can see your CPU, Memory, threads and DB connections going higher during the time the server is up ? For example with OOTB Support Tools addon
> it is an instant outage, process is not terminated but the application is not reachable anymore.

Do the outages correspond with a time-defined users workload (such as daily uploader batches, massive and recurrent desktop synchronizations, non-adequate use of Alfresco network drive endpoints), filesystem bulk imports with massive transformations, custom scheduled actions or internal Alfresco scheduled actions ?
> no. the threads seem to pile up and at some point the db pool is not large enough anymore. we can not see a user-pattern or something that triggers our issue.

Do you include any new customization, integration or process recently in Alfresco ?
> no, there was no additional customization in the last few months but we did upgrade from 5.0.c to 201702GA some weeks ago.

Relating your environment and monitoring:
What is your value for db.pool.max in Alfresco and max_connections for your Mysql database ?
Are these values coherent with maximum number of threads for avoiding bottlenecks ?
> Alf DB Connections is set to 275 and MySQL max connections are set to 300.
> yes, it seems that when the number of blocked thread reaches ~250 we get the issue.

Do you have/monitor slow queries in your database ? Is your database dump much bigger that some weeks ago ?
> i profiled the application but i can not see anything unusual and the db backup did not increase in an unusual way.

Do you have your Mysql server and Solr 4 in the same instance of Alfresco repository ?
If they are in the same machine, how do the three systems compete for resources such as CPU or RAM ?
> yes, its all on the same host, we have some memory reserve for the OS and the other processes but it might be that we need to reserve more than that in the future, good point!

Do you detect in your monitoring high CPU uses, memory swapping or high load averages during these weeks ?
> yes, we see peaks in cpu usage due to transformation. but that has always been the case (even before the upgrade). allocate more cpu is probably needed in the future.

How about Libreoffice/Openoffice processes ? Do they have enough RAM to work ? Did you experience previous problems with them ?
Is your instance machine virtualized ? Do you think your CPU / RAM resources are being overbooked ?
> i hope so! the libreoffice/oo process is crashing (as usual) that is why we have installed the watchdog. transformation was working as usual (some documents can't be converted, sometimes taking a long time) - but we did not have any major issues with it as of now.

Relating logs:
Only one ERROR in catalina.out or alfresco.log ? No previous related WARNs or ERROR ?
> the error is being thrown over and over again, there is no warn message before that particular error.

Hope this questioning helps, or at least to re-check some points of your system. By the way, may be that your custom-log4j setup is cropping part of the class name in your logs ?
> seems so! i will check. thanks!

Regards.
--C.

cesarista
Customer

Re: db connection loss & huge number of blocked threads

Jump to solution

Hi Markus:

An upgrade is a big change to be considered in terms of monitoring behaviour. So, did your Libreoffice version change in this upgrade ? IMO, last versions of Alfresco Community seem to consume more resources, even GA versions. Some possibilites may be:

- Try to increase maximum threads in Tomcat (server.xml) + db.pool.max (alfresco-global.properties) + Mysql max_connections (my.conf) until 400, for example to try if this threshold is reached.

- Consider to separate Libreoffice processes in an external machine too, or even using Jodconverter in an external server. Last one may be configured for being used with several transformation threads. 

- If you have transformations taking a long time, try to reduce maximum transformation sizes in Alfresco configuration, or even giving a reasonable timeout for these transformations, via alfresco-global.properties

Regards.

--C.

mrks_js1
Established Member II

Re: db connection loss & huge number of blocked threads

Jump to solution

these are all good points, i will consider them! thank you Cesar! :-)

what I don't understand yet is the fact that even though i have all these ~200 blocking threads, i don't see ~200+ db connections (taken from the /alfresco/service/ootbee/admin/active-sessions webscript from the support tools).

How can this be?

cesarista
Customer

Re: db connection loss & huge number of blocked threads

Jump to solution

Well, 1 thread does not imply 1 DB connection. When starting Alfresco, you may see around 150 threads active and maybe just 10 DB connections. 

Regards.

--C.

mrks_js1
Established Member II

Re: db connection loss & huge number of blocked threads

Jump to solution

yes, i was under the same impression! but as pointed out:

"The other affects the issue you are seeing with a lot of threads blocked on the OpenOffice transformation. Each of these threads will have a connection assigned to it which does not get released until the transformation completes / fails."

seems that there should be a higher amount as compared to what i am seeing.

afaust
Master

Re: db connection loss & huge number of blocked threads

Jump to solution

My remarks is based on the fact that your blocked thread contain the following stack frames:

 - org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(java.lang.reflect.Method, java.lang.Class, org.springframework.transaction.interceptor.TransactionAspectSupport$InvocationCallback) @bci=58, line=260 (Compiled frame)
- org.springframework.transaction.interceptor.TransactionInterceptor.invoke(org.aopalliance.intercept.MethodInvocation) @bci=40, line=94 (Compiled frame)

This should have started / opened a connection to the DB (all operations on the ContentService are strangely flagged for a read-only transaction if none is yet active).