Using Apache to load balance and cluster Alfresco Share with Hazelcast

kevinr1
Established Member

This post assumes reasonable Alfresco sys-admin knowledge: you should already be familiar with setting up the Alfresco Repository in a cluster configuration and with configuring an Apache web-server instance. You should read the previous post first.

Since my last blog post on this subject, there has been quite a bit of interest in load balancing Alfresco Share. This is good news, but it also means that customers and the community have found some issues that needed looking at - good, because more use and testing of Alfresco means more stability for everyone once we fix the issues - that's part of the fun of having a very active user community!

Three main points were raised:

  1. A bug relating to dashboard template layouts not updating between web-tier nodes - quickly resolved for 3.4.8 and 4.0; see ALF-12318
  2. A request for an example load balancing config for Share itself, as I previously only posted info on the forward proxy config for Share.
  3. An issue raised regarding a noticeable drop in performance when the cache config detailed in the previous post was added to each Share instance (which was required for load balancing multiple instances) - see this related blog post http://blog.alfrescian.com/?p=146 and ALF-12336

I'll address all three points here, and share some exciting performance news related to item 3 as well!

1. A bug that manifested when a user changed the template layout selection for a dashboard - for example, from a 2 to a 3 column layout. The problem was that Spring Surf PageView objects were internally caching the Page object rather than just the PageId - easily and quickly fixed.

2. Config for load balancing Share is very similar to that for load balancing Alfresco.

Set up two Tomcat instances containing the 'share.war' webapp, with the 'share-config-custom.xml' and 'custom-slingshot-application-context.xml' config as detailed in the previous post. Remember that the ports exposed by Tomcat need tweaking if you have those instances on the same physical machine - increment the HTTP, AJP and redirect ports in the tomcat/conf/server.xml config. Also ensure you have set the 'jvmRoute' attribute to a different value on each instance, ready for the load balancing config - I used 'tomcat3' and 'tomcat4' as I now have a lot of servers running on a single box!
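For example, the second instance's tomcat/conf/server.xml might look something like the following sketch - the shutdown, HTTP and redirect port values here are purely illustrative; only the AJP port (8024) and jvmRoute (tomcat4) need to match the Apache balancer config shown below:

      <!-- Bumped shutdown port for the second instance on the same box -->
      <Server port='8105' shutdown='SHUTDOWN'>
         ...
         <!-- Bumped HTTP connector and redirect port -->
         <Connector port='8180' protocol='HTTP/1.1' redirectPort='8543'/>
         <!-- Bumped AJP connector - matches the BalancerMember entry below -->
         <Connector port='8024' protocol='AJP/1.3' redirectPort='8543'/>
         <!-- jvmRoute must match the 'route' attribute in the Apache config -->
         <Engine name='Catalina' defaultHost='localhost' jvmRoute='tomcat4'>
         ...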

Create another Apache instance - I just copied the one I used to load balance the Alfresco cluster. Again, bump the Apache listener port value if it exists on the same physical machine. Finally, configure Apache 'httpd.conf' to load balance against your Share web-tier instances:

      # Reverse Proxy Settings (Share multi-instance load balancing)

      ProxyRequests Off

      ProxyPassReverse /share balancer://app

      ProxyPass /share balancer://app stickysession=JSESSIONID|jsessionid nofailover=On

      <Proxy balancer://app>

      BalancerMember ajp://localhost:8019/share route=tomcat3

      BalancerMember ajp://localhost:8024/share route=tomcat4

      </Proxy>
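Note that these directives assume the relevant Apache proxy modules are loaded. On an Apache 2.2 build where they are not already enabled, that means uncommenting or adding something like the following in 'httpd.conf' (module paths vary by distribution):

      LoadModule proxy_module modules/mod_proxy.so
      LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
      LoadModule proxy_balancer_module modules/mod_proxy_balancer.so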

Simply point your client browsers at your new Apache instance. If you have set up your Share instances to use, in turn, an Apache instance that load balances against an Alfresco cluster, then you now have a full 2x Alfresco Cluster + Apache + 2x Alfresco Share + Apache set up!

This is great - BUT it leads on to point 3...

3. Scalability and fail-over capability have improved by having multiple Share instances - but individual performance per Share node is reduced. Now you may consider (as I did at first) that this is expected; there is, after all, some additional work going on - in the case of the Alfresco cluster it's inter-node communication overhead, and in the case of the Share nodes it's reduced performance due to the caches that have been disabled. What's apparent is that if you just cluster Alfresco and keep to a single Share instance, a single instance of Share can easily service a 4-node Alfresco cluster - so in practice there is little need to load balance Share for performance reasons, but you certainly might want to for high-availability reasons. Perhaps a price worth paying for the ability to remove and drop in additional nodes without your users knowing or having to update their URLs... But it would be nice if there wasn't such a noticeable performance drop in Share.

The good news is that this has all changed in Alfresco 3.4.8/4.0.1 - in response to the community blog post and our drive to continually improve the performance of Alfresco, a new clustering technique has been implemented for the web-tier.

For a load balanced environment, Alfresco Share now uses Hazelcast to provide multicast messaging between web-tier nodes. The end result is that all caches are now enabled again on each node, and a very simple cache invalidation message is sent to all nodes when appropriate. So the performance degradation is gone - each node is as fast as a single Share instance.
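Under the hood this is the standard Hazelcast distributed topic pattern: every node subscribes to a shared topic, and a node that changes an object publishes an invalidation message to it. Here is a minimal sketch of that pattern only - written against the 1.9.x-era Hazelcast API, with a hypothetical class name and message payload; this is not Alfresco's actual implementation:

      import com.hazelcast.core.Hazelcast;
      import com.hazelcast.core.ITopic;
      import com.hazelcast.core.MessageListener;

      public class InvalidationSketch {
         public static void main(String[] args) {
            // Every node subscribes to the same named topic
            ITopic<String> topic = Hazelcast.getTopic("slingshot-topic");
            topic.addMessageListener(new MessageListener<String>() {
               public void onMessage(String path) {
                  // A real implementation would evict this path from its local cache
                  System.out.println("Invalidate cached object: " + path);
               }
            });
            // The node that modified an object publishes its path to the others
            topic.publish("alfresco/site-data/pages/user/admin/dashboard.xml");
         }
      }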

The only changes required on each node are in 'custom-slingshot-application-context.xml' - generally located in tomcat/shared/classes/alfresco/web-extension and used to override the Spring application context beans for Share. There is an example 'custom-slingshot-application-context.xml.sample' provided in the Alfresco distribution which now includes this config.

Enable this section on each Share Tomcat instance to turn on the Hazelcast cluster messaging:




         <!--

              Hazelcast distributed messaging configuration - Share web-tier cluster config (3.4.8 and 4.0.1)

              - see http://www.hazelcast.com/docs.jsp

              - and specifically http://www.hazelcast.com/docs/1.9.4/manual/single_html/#SpringIntegration

         -->

         <!-- Configure cluster to use either Multicast or direct TCP-IP messaging - multicast is default -->

         <!-- Optionally specify network interfaces - server machines likely to have more than one interface -->

         <!-- The messaging topic - the 'name' is also used by the persister config below -->

         <hz:topic id='topic' instance-ref='webframework.cluster.slingshot' name='slingshot-topic'/>

         <hz:hazelcast id='webframework.cluster.slingshot'>

            <hz:config>

               <hz:group name='slingshot' password='alfresco'/>

               <hz:network port='5801' port-auto-increment='true'>

                  <hz:join>

                     <hz:multicast enabled='true'

                           multicast-group='224.2.2.5'

                           multicast-port='54327'/>

                     <hz:tcp-ip enabled='false'>

                        <hz:members></hz:members>

                     </hz:tcp-ip>

                  </hz:join>

                  <hz:interfaces enabled='false'>

                     <hz:interface>192.168.1.*</hz:interface>

                  </hz:interfaces>

               </hz:network>

            </hz:config>

         </hz:hazelcast>

         <bean id='webframework.slingshot.persister.remote' class='org.alfresco.web.site.ClusterAwarePathStoreObjectPersister' parent='webframework.sitedata.persister.abstract'>

            <property name='store' ref='webframework.webapp.store.remote' />

            <property name='pathPrefix'><value>alfresco/site-data/${objectTypeIds}</value></property>

            <property name='hazelcastInstance' ref='webframework.cluster.slingshot' />

            <property name='hazelcastTopicName'><value>slingshot-topic</value></property>

         </bean>

         <bean id='webframework.factory.requestcontext.servlet' class='org.alfresco.web.site.ClusterAwareRequestContextFactory' parent='webframework.factory.base'>

            <property name='linkBuilderFactory' ref='webframework.factory.linkbuilder.servlet' />

            <property name='extensibilityModuleHandler' ref='webscripts.extensibility.handler' />

            <property name='clusterObjectPersister' ref='webframework.slingshot.persister.remote' />

         </bean>

The config enables the Hazelcast Spring integration, which starts the Hazelcast server. It is easily configurable and can use either multicast (the default, and minimal effort) or direct TCP-IP if preferred - see http://www.hazelcast.com/docs.jsp for more info. For the default set up, identical config can be applied to each Share node and it will 'just work'.
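For example, on a network where multicast is blocked, the join section of the config above could be switched to direct TCP-IP like this (the member addresses are illustrative - list your own Share nodes):

         <hz:join>
            <hz:multicast enabled='false'
                  multicast-group='224.2.2.5'
                  multicast-port='54327'/>
            <hz:tcp-ip enabled='true'>
               <hz:members>192.168.2.8:5801,192.168.2.9:5801</hz:members>
            </hz:tcp-ip>
         </hz:join>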

      When you start Share you'll see something like this:

      INFO: /192.168.2.8:5801 [slingshot] Hazelcast 1.9.4.6 (20120105) starting at Address[192.168.2.8:5801]

      19-Jan-2012 13:58:57 com.hazelcast.system

      INFO: /192.168.2.8:5801 [slingshot] Copyright (C) 2008-2011 Hazelcast.com

      19-Jan-2012 13:58:57 com.hazelcast.impl.LifecycleServiceImpl

      INFO: /192.168.2.8:5801 [slingshot] Address[192.168.2.8:5801] is STARTING

      19-Jan-2012 13:58:59 com.hazelcast.impl.MulticastJoiner

      INFO: /192.168.2.8:5801 [slingshot]

      Members [1] {

              Member [192.168.2.8:5801] this

      }

      19-Jan-2012 13:58:59 com.hazelcast.impl.management.ManagementCenterService

      INFO: /192.168.2.8:5801 [slingshot] Hazelcast Management Center started at port 5901.

      19-Jan-2012 13:58:59 com.hazelcast.impl.LifecycleServiceImpl

      INFO: /192.168.2.8:5801 [slingshot] Address[192.168.2.8:5801] is STARTED


This means the config has driven the initialisation of Hazelcast successfully. That's all there is to creating a Share instance in the cluster: if the config is present, the instance will become a cluster node; if the config is not present (such as for a default install), Hazelcast never starts. Once each node is started, they will find each other automatically. Then, as your users interact with Share, cache invalidation messages are sent from the affected node to the others in the cluster only when the following operations occur:

• an existing site/user dashboard layout is modified
• a new site or user dashboard is created
• runtime application properties are changed (currently just the Share theme)

This keeps chatter to a minimum and performance up!

      10 Comments
      blog_commenter
      Active Member
      Hi Kev,

thanks a lot for fixing this issue so quickly!

      cheers, jan
      blog_commenter
      Active Member
      Hallo Kevin,



definitely cool stuff!

Are there any plans to leverage Hazelcast in the repository as well?



      cheers

      Andreas
      kevinr1
      Established Member
      Thanks. Yes we are looking at it as a replacement for JGroups - as the LGPL license with JGroups is not always desirable.
      blog_commenter
      Active Member
      Hi Kev,



      Very interesting! thanks for the blog post.



I'll have to try it out :)
      blog_commenter
      Active Member
Recently I set up an Alfresco 4.0 two-node HA cluster and did some tests, but was disappointed that when the node I am logged on to in Share goes down, I have to log on again on the second node. I wanted real session failover, and it seems that I have achieved it.



      Can someone point out if the configuration is somehow incompatible with other clustering used in Alfresco?



      In Apache I used nofailover=Off:



      <IfModule mod_proxy.c>

              ProxyPass / balancer://alfcluster/ stickysession=JSESSIONID nofailover=Off

              <Proxy balancer://alfcluster>

                      BalancerMember ajp://alf1:8009 route=alf1

                      BalancerMember ajp://alf2:8009 route=alf2

              </Proxy>

      </IfModule>





      And in Tomcat I enabled cluster:



      <Engine name='Catalina' defaultHost='localhost' jvmRoute='alf1'>

      <Cluster className='org.apache.catalina.ha.tcp.SimpleTcpCluster' channelSendOptions='6'>

         <Manager className='org.apache.catalina.ha.session.DeltaManager'

            expireSessionsOnShutdown='false'

            notifyListenersOnReplication='true'/>



         <Channel className='org.apache.catalina.tribes.group.GroupChannel'>

        <Membership className='org.apache.catalina.tribes.membership.McastService'

           address='228.0.0.4'

           port='45564'

           frequency='500'

           dropTime='3000'/>

        <Receiver className='org.apache.catalina.tribes.transport.nio.NioReceiver'

            address='192.168.31.1'

            port='4000'

            autoBind='100'

            selectorTimeout='5000'

            maxThreads='6'/>



        <Sender className='org.apache.catalina.tribes.transport.ReplicationTransmitter'>

          <Transport className='org.apache.catalina.tribes.transport.nio.PooledParallelSender'/>

        </Sender>

        <Interceptor className='org.apache.catalina.tribes.group.interceptors.TcpFailureDetector'/>

        <Interceptor className='org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor'/>

         </Channel>



         <Valve className='org.apache.catalina.ha.tcp.ReplicationValve'

          filter=''/>

         <Valve className='org.apache.catalina.ha.session.JvmRouteBinderValve'/>



         <ClusterListener className='org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener'/>

         <ClusterListener className='org.apache.catalina.ha.session.ClusterSessionListener'/>

      </Cluster>

      ...





      Now when the node where I am logged on in Share goes down, it automatically moves me to the second node and I can continue to work where I stopped.
      blog_commenter
      Active Member
I can confirm that Hazelcast has replaced JGroups in the Repository for Alfresco 4.2. Clustering the Alfresco server will now be much easier, and there is also a new admin screen dedicated to clustering.
      blog_commenter
      Active Member
Nice - but is there a reason Share clustering hasn't been documented yet?
      kevinr1
      Established Member
      It's coming in the next set of 4.1.x and 4.2 documentation.
      blog_commenter
      Active Member
I had a problem with NTLM passthru SSO on Alfresco Share via an Apache load balancer. Single sign-on was working perfectly if I connected to a Tomcat instance directly, but when I tried to open Share via the Apache URL, I just got an empty page. This was only reproducible in Internet Explorer.



After investigating, it appeared that by default Apache always closes connections, and thus the NTLM handshake fails. The solution was to add (override) the following configuration in Apache:



      #

      # KeepAlive: NTLM handshake does not work if all connections are closed,

      # so this must be set On for Alfresco Single-Sign-On to work.

      #

      KeepAlive On