I am running Alfresco 5.1.1 and ran into an issue yesterday under peak load.
A couple of servers got into a bad state, so we tried a rolling restart of Alfresco.
The servers would not start up because of a Hazelcast timeout, probably because the cluster was so busy.
We had to stop everything to clear the cluster and then start the servers again. This fixed the problem but caused a full outage for our clients.
Looking back, I think I need to increase a timeout and/or get creative: restart one Alfresco server with it removed from the cluster, get it up and running together with, say, one other server outside the cluster, then take down the servers that are still in the cluster, start them up with a fresh cache, and finally add the other servers back to the cluster.
Has anyone run into this situation before, or does anyone have other ideas?
Here is the error:
2017-10-18 16:38:32,379 ERROR [web.context.ContextLoader] [localhost-startStop-1] Context initialization failed
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_PUT] Operation Timeout (with no response!): 20000
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:464)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:460)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1894)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1818)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1682)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:632)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:606)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:173)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:124)
at org.alfresco.enterprise.repo.cluster.cache.HazelcastSimpleCache.put(HazelcastSimpleCache.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:241)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$AbstractClusteredObjectProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:326)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory.upgradeCaches(ClusteredObjectProxyFactory.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.upgradeClusterObjects(ClusteringBootstrap.java:136)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.bootstrapWork(ClusteringBootstrap.java:127)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:69)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:65)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:78)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:75)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.onBootstrap(ClusteringBootstrap.java:74)
at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:56)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:214)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:185)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:950)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:410)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:306)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:70)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5016)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5524)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Oct 18, 2017 4:38:32 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.alfresco.web.app.ContextLoaderListener
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_PUT] Operation Timeout (with no response!): 20000
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:464)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:460)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1894)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1818)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1682)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:632)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:606)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:173)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:124)
at org.alfresco.enterprise.repo.cluster.cache.HazelcastSimpleCache.put(HazelcastSimpleCache.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:241)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$AbstractClusteredObjectProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:326)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory.upgradeCaches(ClusteredObjectProxyFactory.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.upgradeClusterObjects(ClusteringBootstrap.java:136)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.bootstrapWork(ClusteringBootstrap.java:127)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:69)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:65)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:78)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:75)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.onBootstrap(ClusteringBootstrap.java:74)
at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:56)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:214)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:185)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:950)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:410)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:306)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:70)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5016)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5524)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Oct 18, 2017 4:38:32 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.alfresco.web.app.ContextListener
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_PUT] Operation Timeout (with no response!): 20000
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:464)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:460)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1894)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1818)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1682)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:632)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:606)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:173)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:124)
at org.alfresco.enterprise.repo.cluster.cache.HazelcastSimpleCache.put(HazelcastSimpleCache.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:241)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$AbstractClusteredObjectProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:326)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory.upgradeCaches(ClusteredObjectProxyFactory.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.upgradeClusterObjects(ClusteringBootstrap.java:136)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.bootstrapWork(ClusteringBootstrap.java:127)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:69)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:65)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:78)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:75)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.onBootstrap(ClusteringBootstrap.java:74)
at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:56)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:214)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:185)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:950)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:410)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:306)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:70)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5016)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5524)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Oct 18, 2017 4:38:32 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.alfresco.repo.webdav.WebDAVSessionListener
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_PUT] Operation Timeout (with no response!): 20000
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:464)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsObject(BaseManager.java:460)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsObject(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1894)
at com.hazelcast.impl.ConcurrentMapManager$MPut.txnalPut(ConcurrentMapManager.java:1818)
at com.hazelcast.impl.ConcurrentMapManager$MPut.put(ConcurrentMapManager.java:1682)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:632)
at com.hazelcast.impl.MProxyImpl$MProxyReal.put(MProxyImpl.java:606)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:173)
at com.hazelcast.impl.MProxyImpl.put(MProxyImpl.java:124)
at org.alfresco.enterprise.repo.cluster.cache.HazelcastSimpleCache.put(HazelcastSimpleCache.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:241)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.transferCollectedItems(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$AbstractClusteredObjectProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:326)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory$CacheProxyInvoker.upgradeBackingObject(ClusteredObjectProxyFactory.java:204)
at org.alfresco.enterprise.repo.cluster.core.ClusteredObjectProxyFactory.upgradeCaches(ClusteredObjectProxyFactory.java:108)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.upgradeClusterObjects(ClusteringBootstrap.java:136)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.bootstrapWork(ClusteringBootstrap.java:127)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:69)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$1.execute(ClusteringBootstrap.java:65)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:78)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap$2.doWork(ClusteringBootstrap.java:75)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at org.alfresco.enterprise.repo.cluster.core.ClusteringBootstrap.onBootstrap(ClusteringBootstrap.java:74)
at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:56)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:214)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:185)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:950)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:410)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:306)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:70)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5016)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5524)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Oct 18, 2017 4:38:32 PM com.sun.xml.ws.transport.http.servlet.WSServletContextListener contextInitialized
INFO: WSSERVLET12: JAX-WS context listener initializing
Oct 18, 2017 4:38:40 PM com.sun.xml.ws.transport.http.servlet.WSServletDelegate <init>
INFO: WSSERVLET14: JAX-WS servlet initializing
Oct 18, 2017 4:38:42 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error listenerStart
Oct 18, 2017 4:38:42 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/alfresco] startup failed due to previous errors
Oct 18, 2017 4:38:42 PM com.sun.xml.ws.transport.http.servlet.WSServletDelegate destroy
INFO: WSSERVLET15: JAX-WS servlet destroyed
Oct 18, 2017 4:38:42 PM com.sun.xml.ws.transport.http.servlet.WSServletContextListener contextDestroyed
INFO: WSSERVLET13: JAX-WS context listener destroyed
Oct 18, 2017 4:38:42 PM org.apache.catalina.core.ApplicationContext log
INFO: Closing Spring root WebApplicationContext
2017-10-18 16:38:43,807 INFO [cluster.core.ClusteringBootstrap] [localhost-startStop-1] Clustering has shutdown.
2017-10-18 16:38:43,808 WARN [context.support.XmlWebApplicationContext] [localhost-startStop-1] Exception thrown from ApplicationListener handling ContextClosedEvent
java.lang.NullPointerException
at org.alfresco.repo.workflow.activiti.ActivitiEngineInitializer.onShutdown(ActivitiEngineInitializer.java:65)
at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:67)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:214)
at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:190)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1051)
at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1012)
at org.springframework.web.context.ContextLoader.closeWebApplicationContext(ContextLoader.java:586)
at org.springframework.web.context.ContextLoaderListener.contextDestroyed(ContextLoaderListener.java:143)
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5063)
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5719)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:160)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1081)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1877)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Hazelcast timeouts can be caused by many things, from actual networking issues to CPU overload to memory / garbage collection issues on the other cluster node. The issue I have seen most often is the latter: a system that is poorly configured and very close to garbage collection hell, where only a slight change in circumstances would bring down the entire cluster. You need to investigate which issue you were actually suffering from. I'd advise running some JVM monitoring, e.g. via jvisualvm, during startup (on all cluster nodes) to get a picture of what's going on.
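If a GUI tool is not practical on a server, the same kind of numbers can be read from the standard java.lang.management MXBeans. The following is only a minimal sketch, not something Alfresco ships: the class name JvmPressureSampler and its methods are made up for illustration, and it reports on the JVM it runs in. To watch an Alfresco node you would have to run equivalent logic inside the repository JVM or read the same MXBeans remotely over JMX, which is an assumption about your setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;

// Hypothetical helper for illustration only; it is not part of Alfresco or Hazelcast.
public class JvmPressureSampler {

    /** Total milliseconds spent in garbage collection so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of the interval spent in GC, given two readings of totalGcMillis(). */
    static double gcFraction(long gcBefore, long gcAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0;
        }
        return Math.max(0, gcAfter - gcBefore) / (double) intervalMillis;
    }

    /** One log line with GC share, used heap, and system load for the current JVM and host. */
    static String sample(long gcBefore, long gcAfter, long intervalMillis) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format(Locale.ROOT,
                "gc=%.1f%% heapUsedMB=%d loadAvg=%.2f cpus=%d",
                gcFraction(gcBefore, gcAfter, intervalMillis) * 100.0,
                heap.getUsed() / (1024 * 1024),
                os.getSystemLoadAverage(), // negative when the platform cannot report it
                os.getAvailableProcessors());
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 1000;
        int samples = 3;
        long before = totalGcMillis();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMillis);
            long after = totalGcMillis();
            System.out.println(sample(before, after, intervalMillis));
            before = after;
        }
    }
}
```

A GC share that stays high during startup, or a load average well above the CPU count, would point toward the memory or CPU causes rather than a networking problem.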
In some circumstances you might even be able to avoid doing a full restart of your entire cluster, e.g. if only the communication / cluster state is affected. Using the JavaScript Console you can restart only the Hazelcast layer, and using the Caches tool of the OOTBee Support Tools addon you can purge data caches to remove potentially stale data.
Have you tried disabling multicast and instead listing the members of the cluster individually in the Hazelcast config?
It looks something like:
<hz:join>
  <hz:multicast enabled="false"
                multicast-group="224.2.2.5"
                multicast-port="54327"/>
  <hz:tcp-ip enabled="true">
    <hz:members>10.84.1.151,10.84.1.152</hz:members>
  </hz:tcp-ip>
</hz:join>
On the Repository tier, Hazelcast multicast is disabled by default. The config example from Jeff applies only to the Share tier, where the Hazelcast config is embedded in Spring; for Share, the Alfresco documentation provides the configuration with multicast enabled. The error messages in the logs point to Repository-tier issues, though.
Thanks for the replies, Axel Faust and Jeff Potts. The actual root problem was that all of the Alfresco servers in the cluster were close to being maxed out on CPU.
Under peak load, a few custom background processes kicked off and pushed the servers over the edge.
In the heat of the moment we removed all of the servers from the cluster and simply had our API layer talk to Alfresco via CMIS through a load balancer, unclustered. We thought we were all good: the servers seemed healthy in terms of CPU, JVM, and the number of requests we were handling. But after looking into the logs, we found that a majority of the document update calls were failing with messages like the following in our custom API logs:
Expected xxxx bytes but retrieved 0 bytes!
We reproduced this issue in our Performance environment and resolved it by adding the servers back into the cluster. The odd thing is that only updates were affected: adding new documents worked fine, and only binary updates failed. I wonder if this is a bug in the CMIS implementation.