ASR Performance Optimization

resplin

Obsolete Pages{{Obsolete}}

The official documentation is at: http://docs.alfresco.com



{{AVMWarning}}


Introduction


There are many things you can do to optimize the performance of an Alfresco System Receiver (ASR).  Herein we will cover many of them. 


Optimize XML Metadata Extraction


First, I’ll talk specifically about XML Metadata Extractors for Alfresco Web Content Management (WCM).  So what is an XML metadata extractor, and why should you care about it?  Let’s put it in the context of a diagram:

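[Diagram: WCM authoring environment with web forms deploying content to an ASR configured with an XML Metadata Extractor and a DM content model]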

In the diagram above we see that the WCM authoring environment is configured with web forms.  This allows business users to enter content into the system; an article to be published, for example.  To do so, the user does not have to be skilled in web-related technologies such as HTML; they simply fill out a form with the content to be published.  Once their content is entered it is saved as XML, submitted to the staging sandbox, and ultimately deployed, in this case to an ASR.  The ASR is shown configured with an XML Metadata Extractor and a DM content model, which defines aspects that will be applied to the deployed content.  So for an article content type on the authoring side, there would be an article aspect defined in the content model on the ASR.  The XML metadata extractor is used to extract values from the deployed content (the article) and store them according to the aspect defined in the DM content model.  As a result, the content delivered via the web form can be indexed by Lucene, enabling optimized search performance on retrieval.

The problem with this approach is that the ASR is likely serving a live production web site that may have thousands (or more) concurrent visitors:

[Diagram: ASR serving a live production web site with many concurrent visitors]

As such, it is less than ideal to have the ASR itself do the processing required to extract metadata from the XML content.  Wouldn’t it be better if that were done on the authoring server?  You bet it would.  Hence the first ASR optimization: perform XML metadata extraction in the authoring environment:

[Diagram: XML metadata extraction performed in the authoring environment instead of on the ASR]

By configuring the XML metadata extraction to occur on the authoring environment, we save some cycles on the ASR, which is certainly a good thing if we’re using it to back a high traffic web site.

For assistance setting up XML metadata extraction for WCM, see Metadata_Extraction#XML_Meta-data_Extractor_Configuration_for_WCM.


Disable Permission Checking


One of the major features Alfresco brings to the table as an ECM system is the ability to secure content by user, group, or role (or some combination thereof).  As such, when searches are carried out on behalf of a user, results are pruned by default by the Alfresco PermissionService, thus only returning results that the authenticated user has read access to.

In the case of an ASR though, content is typically retrieved by a single calling application using a single login.  This very common scenario does not require “pruning” of the content that the calling application is trying to retrieve.  Avoiding the unnecessary (and expensive) calls to the database for permission checking, particularly as the result set grows in size, yields significant savings in response time.  Fortunately, it is very simple in Alfresco to turn off permission checking via configuration; note that this turns it off repository wide.  This should be done with extreme care - not all use cases are created the same, so be sure to fully evaluate your specific requirements before taking this action.

In my tests, the response time with permission checking turned off was 33% faster on average over comparative tests for 1, 2, 3, 4, 5, 10, 20, and 110 concurrent users.  The chart below reflects these numbers:

[Chart: response time with permission checking on vs. off, by number of concurrent users]

To disable permission checking for your entire repository, simply rename “alfresco/tomcat/shared/classes/alfresco/extension/unsecured-public-services-security-context.xml.sample” to “alfresco/tomcat/shared/classes/alfresco/extension/unsecured-public-services-security-context.xml” in your installation.  Note that I found a defect in 3.1.1 Enterprise and 3.2 Community regarding this issue.  You will need to use the “unsecured-public-services-security-context.xml.sample” file attached to this issue: https://issues.alfresco.com/jira/browse/ETHREEOH-2604 instead of the one that the installer lays down for you.


Use 64 bit


Moving to a 64 bit operating system and database (MySQL) from 32 bit showed a performance gain of ~20% on average for test runs ranging from five to three hundred concurrent users, as shown below.  Therefore I highly recommend moving to a 64 bit operating system and database to further enhance the performance of your ASR.

[Chart: 32 bit vs. 64 bit performance, by number of concurrent users]


Tune your JVM


Optimize your JVM settings!  By default, Alfresco ships with a maximum heap size of 512MB.  Under high load, an ASR can max out a heap of that size and thus serve content more slowly, so you definitely want to increase the maximum heap size to as high as you can.  Also, depending on how you set up your minimum and maximum heap sizes, you may want to adjust the size of newly allocated heap space using the -XX:NewSize command line option.  There are a few other things you can adjust in your JVM as well, such as enabling HotSpot pre-compilation and the size of the stack, amongst others.  See the Repository Hardware page for more details.  Ultimately, for my testing on my MacBook Pro running Alfresco on Windows XP via VMware Fusion, I used the following settings (in alfresco.bat):



set JAVA_OPTS=-Xms768m -Xmx1536m -Xss1m -XX:MaxPermSize=128m -Xcomp -Xbatch -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=384m -XX:CMSInitiatingOccupancyFraction=80 -server
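
A quick way to confirm that heap options such as -Xms and -Xmx are actually being picked up is to ask the JVM what it sees.  The following is only a minimal sketch, not part of Alfresco: the class name HeapCheck is hypothetical, and it is meant to be launched as a standalone program with the same heap options you pass to Alfresco.  The reported maximum is usually close to, but not always exactly, the -Xmx value.

```java
/** Hypothetical helper (not part of Alfresco) that prints the heap limits the JVM is running with. */
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper bound set with -Xmx (approximately).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved; it starts near -Xms.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

For example, running it as "java -Xms768m -Xmx1536m HeapCheck.java" (Java 11 or later) should report a maximum heap in the neighborhood of 1536 MiB.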



Don't Virtualize


Speaking of virtualization, that reminds me.  Don’t virtualize!  For best performance, use the actual hardware!  Virtualization adds overhead.  I have not yet tested ASR performance without virtualization to be able to prove the impact that it has, but I will, and when I do, you can be sure that I’ll update this page, so come back sometime soon!


Increase Database Connection Pool


What about the database?  Increasing the size of the database connection pool will enable your ASR to handle more concurrent users.  To do so, in custom-repository.properties in your extension directory (Alfresco 3.1 and earlier) or in alfresco-global.properties (Alfresco 3.2+), you can set the following:



db.pool.initial=10
db.pool.max=350


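Note that for MySQL, you will also have to update the global 'max_connections' variable.  In the command line mysql client, you can do this by typing 'set global max_connections=350;'.  To confirm that it worked, type 'show variables;' and locate max_connections.  A value set this way lasts only until the MySQL server restarts; to make it permanent, also set max_connections in the MySQL configuration file.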


Increase Tomcat Thread Pool


In addition to increasing the database connection pool, you should increase the Tomcat thread pool as well.  This can be done by modifying the following line in alfresco/tomcat/conf/server.xml:



<Executor name='tomcatThreadPool' namePrefix='catalina-exec-' maxThreads='300' minSpareThreads='4'/>


You will also have to comment out the active connector for port 8080, and uncomment the one that makes use of the threadpool:


<!--


    <Connector port='8080' protocol='HTTP/1.1' URIEncoding='UTF-8'
               connectionTimeout='20000'
               redirectPort='8443' />


-->



    <Connector executor='tomcatThreadPool'
               port='8080' protocol='HTTP/1.1'
               connectionTimeout='20000'
               redirectPort='8443' />





Note that we keep the Tomcat thread pool smaller than the database connection pool, as Alfresco has a number of background threads that also require database connections.


Remove/Disable What You Don't Need


Now that we’ve tuned some parts of the server, let’s turn off some things we don’t need.  For example, you should remove the share web application (share.war) that is installed with Alfresco by default, as well as Web Studio (studio.war) and the mobile web application (mobile.war) if you’re running Alfresco Community.  Be sure to remove (or just move elsewhere) all of the unneeded .war files as well as their exploded directories.  The only application you need on an ASR is alfresco.war.  Additionally, you may want to download a clean copy of alfresco.war (making sure you have the correct version for your installation), and be sure NOT to apply the SharePoint Protocol AMP file to it.  No sense running another listener that no one will ever use!  Also, you typically want to lock down an ASR so that content is only accessed by your web application via the web scripts you’ve authored to expose your web content.  Therefore, you don’t need to be running virtual filesystems such as CIFS for shared drives, and FTP.  This can be done very simply, by setting the following in custom-repository.properties (3.1) or alfresco-global.properties (3.2):



ftp.enabled=false
cifs.enabled=false



Optimize Web Scripts


Another very useful exercise is to optimize your web scripts.  For example, if you have a page in your web application that has three content areas managed by Alfresco, don’t have the web application call three separate web scripts, each of which returns only content related to that area.  Network hops are expensive, so they should be minimized to just one hop per page type.  In this scenario, performance would likely be improved by having the web application call a single web script per page type which returns all of the content to be displayed by that page at that time.
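
To make the cost difference concrete, here is a minimal sketch that counts round trips for the two approaches.  Everything in it is hypothetical: FakeAsr merely stands in for the ASR (each call to fetch() represents one network hop to a web script), and the area names are made up.  It does not use any Alfresco API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of per-area requests vs. one request per page type. */
final class PageContentFetch {

    /** Stand-in for the ASR: each call to fetch() represents one network round trip. */
    static final class FakeAsr {
        int roundTrips = 0;

        Map<String, String> fetch(List<String> areas) {
            roundTrips++;
            Map<String, String> result = new LinkedHashMap<>();
            for (String area : areas) {
                result.put(area, "content for " + area);
            }
            return result;
        }
    }

    /** One request per content area: three areas cost three round trips. */
    static Map<String, String> perAreaCalls(FakeAsr asr, List<String> areas) {
        Map<String, String> page = new LinkedHashMap<>();
        for (String area : areas) {
            page.putAll(asr.fetch(List.of(area)));
        }
        return page;
    }

    /** One request per page type: all areas come back in a single round trip. */
    static Map<String, String> singleCall(FakeAsr asr, List<String> areas) {
        return asr.fetch(areas);
    }
}
```

Both methods return the same content for the page; the only difference is how many times the web application has to cross the network to get it.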


Cache


That leads to the next point: build caching into your web application, or as a layer between your web application and Alfresco, and be sure to determine and implement a cache update strategy.  If the web application can simply hit a cache that is local to it while the cache is updated asynchronously via some other process, the web application will fly as compared to making calls to the ASR every single time a user views a page.
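
As an illustration of the idea, the sketch below shows one possible shape for such a cache: page requests are served from a local map, and a background task refreshes the entries off the request path.  It is an assumption-laden example rather than a recommended implementation: the class name PageContentCache is hypothetical, the loader function is a stand-in for whatever call your application makes to an ASR web script, and a fixed-interval refresh is just one of many possible cache update strategies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical page-content cache that sits between a web application and the ASR. */
final class PageContentCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    // Stand-in for a call to an ASR web script; supplied by the caller.
    private final Function<String, String> loader;

    PageContentCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Serves from the local cache; only the first request for a key reaches the ASR. */
    String get(String pageKey) {
        return entries.computeIfAbsent(pageKey, loader);
    }

    /** Reloads every cached key; intended to run off the request path. */
    void refreshAll() {
        for (String key : entries.keySet()) {
            entries.put(key, loader.apply(key));
        }
    }

    /** Schedules refreshAll() on a background daemon thread at a fixed interval. */
    ScheduledExecutorService startBackgroundRefresh(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "page-cache-refresh");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refreshAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With this shape, a page view costs a map lookup rather than a call to the ASR; how stale an entry may get is bounded by the refresh interval you choose.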


Conclusion


To conclude, here’s a handy list of steps you can take to optimize ASR performance, in my estimated order of decreasing impact:

  1. Cache
  2. Turn permissions checking off
  3. Use 64 bit
  4. Optimize your web scripts per page type on your web site
  5. Don’t virtualize
  6. Tune your JVM for performance
  7. Increase the database connection pool
  8. Increase the Tomcat thread pool
  9. Remove/disable things you don’t need
         * CIFS Server
         * FTP Server
         * Share
         * Web Studio
         * Mobile
         * SharePoint Protocol (SPP) support
  10. Configure XML metadata extraction so that it occurs on your authoring server, not the ASR

If you have additional suggestions, please share!


Note about 3.2 ASR performance


November 25, 2009 - I had a chance to create a nightly build of the 3.2 Enterprise source and run my tests against it.  Below is how it performed compared to 3.1.1 Enterprise for me.  On average, 3.2 performed 51% faster than 3.1.1 in my tests.  This significant improvement, to the best of my knowledge, can be attributed to upgrading Lucene and refactoring at the database layer.
[Chart: 3.2 Enterprise vs. 3.1.1 Enterprise ASR performance]