Notice: Now that HAProxy 1.5.x has been released, I thought I'd update this to bring the configuration in line with some of the syntax changes. The changes are small, but without them the haproxy service won't start.

-----------------------------------------------------------------------------------------

For the cloud service we (Alfresco DevOps) used to use Apache for all our load balancing and reverse proxy needs, but more recently we switched to HAProxy for this task. In this article I'll list some of the settings we use, and give a final example that could be used (with a few environment-specific modifications) for a general Alfresco deployment.

The main website for HAProxy is: http://haproxy.1wt.eu/
The docs can be found here: http://cbonte.github.io/haproxy-dconv/configuration-1.5.html

I suggest that for any of the settings covered in the rest of this article, the HAProxy docs are consulted to gain a deeper understanding of what they do.

The 'global' section:

global
    pidfile /var/run/haproxy.pid
    log 127.0.0.1 local2 info
    stats socket /var/run/haproxy.stat user nagios group nagios mode 600 level admin

A quick breakdown of these:

- global - defines global settings.
- pidfile - writes the pids of all daemons into file <pidfile>.
- log - adds a global syslog server. Optional.
- stats socket - sets up a statistics output socket. Optional.

The 'defaults' section:

defaults
    mode http
    log global

A quick breakdown of these:

- defaults - defines the default settings.
- mode - sets the working mode to http (rather than tcp).
- log - sets the log context.

Now we configure some options that specify how HAProxy works. These options are very important to get your service working properly:
    option httplog
    option dontlognull
    option forwardfor
    option http-server-close
    option redispatch
    option tcp-smart-accept
    option tcp-smart-connect

These options do the following:

- option httplog - enables logging of HTTP requests, session state and timers.
- option dontlognull - disables logging of null connections, as these can pollute the logs.
- option forwardfor - enables the insertion of the X-Forwarded-For header in requests sent to servers.
- option http-server-close - enables HTTP connection closing on the server side. See the HAProxy docs for more info on this setting.
- option redispatch - enables session redistribution in case of connection failure, which is important in an HA environment.
- option tcp-smart-accept - a performance tweak, saving one ACK packet during the accept sequence.
- option tcp-smart-connect - a performance tweak, saving one ACK packet during the connect sequence.
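To make option forwardfor a bit more concrete, here's a minimal Python sketch (the function name is mine, purely illustrative) of what the header insertion amounts to - the client IP is appended to any X-Forwarded-For chain already present, so the backend can see the real origin:

```python
def add_forwarded_for(headers, client_ip):
    """Mimic HAProxy's 'option forwardfor': append the client IP
    to any existing X-Forwarded-For chain, or create the header."""
    existing = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = (
        f"{existing}, {client_ip}" if existing else client_ip
    )
    return headers

# A request arriving directly from a client:
print(add_forwarded_for({}, "203.0.113.7"))
# A request that already passed through another proxy:
print(add_forwarded_for({"X-Forwarded-For": "198.51.100.2"}, "203.0.113.7"))
```

This is also why the https frontend further down captures X-Forwarded-For in its logs - it's the only record of the original client address once the request has been proxied.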
Next we define the timeouts - these are fairly self-explanatory:

    timeout http-request 10s
    timeout queue 1m
    timeout connect 5s
    timeout client 2m
    timeout server 2m
    timeout http-keep-alive 10s
    timeout check 5s

We then configure gzip compression to reduce the amount of data being sent across the wire - I'm sure no configuration ever misses out on this easy performance optimisation:

    compression algo gzip
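To get a feel for what this buys you, here's a quick standalone Python sketch (not part of the HAProxy config) compressing a repetitive HTML-ish payload of the kind a webapp typically serves:

```python
import gzip

# A repetitive payload, standing in for a typical HTML/JSON response.
payload = b"<div class='node'>Alfresco Share</div>\n" * 200

compressed = gzip.compress(payload)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
# Text with lots of repetition shrinks dramatically under gzip.
assert len(compressed) < len(payload)
```

Markup and JSON compress extremely well, which is why enabling this in the proxy is such cheap bandwidth savings.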
The next section is some error message housekeeping. Change these paths to wherever you want to put your error messages:

    errorfile 400 /var/www/html/errors/400.http
    errorfile 403 /var/www/html/errors/403.http
    errorfile 408 /var/www/html/errors/408.http
    errorfile 500 /var/www/html/errors/500.http
    errorfile 502 /var/www/html/errors/502.http
    errorfile 503 /var/www/html/errors/503.http
    errorfile 504 /var/www/html/errors/504.http

Now that we have finished setting up all our defaults, we can start to define our frontends (listening ports). We first define our frontend on port 80. This just does a redirect to the https frontend:

# Front end for http to https redirect
frontend http
    bind *:80
    redirect location https://my.yourcompany.com/share/
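The behaviour of this frontend is easy to model - here's a tiny Python sketch (purely illustrative, not HAProxy code) of what any plain-http request gets back. HAProxy's redirect location answers with a 302 unless you ask for another code:

```python
def handle_http_request(method, path):
    """Model the port-80 frontend: whatever arrives, answer with a
    redirect to the https Share front end."""
    return 302, {"Location": "https://my.yourcompany.com/share/"}

status, headers = handle_http_request("GET", "/share/page/site-index")
print(status, headers["Location"])
```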
Next we define our https frontend, which is where all traffic to Alfresco is handled:

# Main front end for all services
frontend https
    bind *:443 ssl crt /path/to/yourcert/yourcert.pem
    capture request header X-Forwarded-For len 64
    capture request header User-Agent len 256
    capture request header Cookie len 64
    capture request header Accept-Language len 64

We now get into the more 'fun' part of configuring HAProxy - setting up the acls. These acls are the mechanism used to match requests to the appropriate backend to fulfil those requests, or to deny unwanted traffic to the service. If you are unfamiliar with HAProxy, I suggest you have a good read of the docs for acls and what they can achieve (section 7 in the docs).

We separate out all the different endpoints for Alfresco into their own sub-domain, e.g. my.alfresco.com for Share access, webdav.alfresco.com for webdav, sp.alfresco.com for sharepoint access. I'll use these three endpoints in the examples below, using the following mapping:

- Share - my.yourcompany.com
- Webdav - webdav.yourcompany.com
- Sharepoint - sp.yourcompany.com

We first set up some acls that check the host name being accessed and match on those. Anything coming in that doesn't match these won't get an acl associated (and therefore won't get forwarded to any service).
# ACL for backend mapping based on host header
acl is_my hdr_beg(host) -i my.yourcompany.com
acl is_webdav hdr_beg(host) -i webdav.yourcompany.com
acl is_sp hdr_beg(host) -i sp.yourcompany.com

These are in the syntax:

acl acl_name match_expression case_insensitive(-i) what_to_match

So, acl is_my hdr_beg(host) -i my.yourcompany.com states:

- acl - define this as an acl.
- is_my - give the acl the name 'is_my'.
- hdr_beg(host) - set the match expression to use the host HTTP header, checking the beginning of the value.
- -i - set the check to be case insensitive.
- my.yourcompany.com - the value to check for.
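A minimal Python sketch of what hdr_beg(host) -i is doing (function name is mine, purely illustrative) - a case-insensitive prefix match against the Host header value, which also copes with a port being appended to the header:

```python
def hdr_beg_host(host_header, prefix):
    """Mimic HAProxy's hdr_beg(host) -i: case-insensitive
    prefix match against the Host header value."""
    return host_header.lower().startswith(prefix.lower())

# Matches, including when a port is appended to the Host header:
print(hdr_beg_host("MY.yourcompany.com", "my.yourcompany.com"))      # True
print(hdr_beg_host("my.yourcompany.com:443", "my.yourcompany.com"))  # True
# A different sub-domain does not match:
print(hdr_beg_host("webdav.yourcompany.com", "my.yourcompany.com"))  # False
```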
We then do some further mapping based on url paths in the request, using some standard regex patterns:

# ACL for backend mapping based on url paths
acl robots path_reg ^/robots\.txt$
acl alfresco_path path_reg ^/alfresco/.*
acl share_path path_reg ^/share/.*/proxy/alfresco/api/solr/.*
acl share_redirect path_reg ^$|^/$

These do the following:

- acl robots - checks for a web bot harvesting the robots.txt file.
- acl alfresco_path - checks whether the request is trying to access the alfresco webapp. We deny direct access to the Alfresco Explorer webapp, so you can remove this check if you want that webapp available for use.
- acl share_path - we use this to deny direct access to the Solr API.
- acl share_redirect - checks whether there is any context at the end of the request (e.g. /share).
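These path_reg expressions are ordinary regexes, so you can sanity-check them with Python's re module before reloading HAProxy. The helper below (names mine, purely illustrative) reports which of the four acls a given request path would set:

```python
import re

# The same patterns used by the path_reg acls above.
patterns = {
    "robots": r"^/robots\.txt$",
    "alfresco_path": r"^/alfresco/.*",
    "share_path": r"^/share/.*/proxy/alfresco/api/solr/.*",
    "share_redirect": r"^$|^/$",
}

def matching_acls(path):
    """Return the names of every path acl that matches the request path."""
    return [name for name, pat in patterns.items() if re.match(pat, path)]

print(matching_acls("/robots.txt"))      # ['robots']
print(matching_acls("/alfresco/faces"))  # ['alfresco_path']
print(matching_acls("/"))                # ['share_redirect']
print(matching_acls("/share/page"))      # []
```

Note the escaped dot in the robots pattern - an unescaped ^/robots.txt$ would also match paths like /robotsxtxt, since . matches any character.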
We next add in some 'good practice' - a HSTS header. You can find out more about HSTS here: https://www.owasp.org/index.php/HTTP_Strict_Transport_Security

Note, my.alfresco.com is in the internal HSTS list in both Chrome and Firefox, so neither of these browsers will ever try to access the service using plain http (see http://www.chromium.org/sts).

# Changes to header responses
rspadd Strict-Transport-Security:\ max-age=15768000

We next set up some deny settings; you can ignore these if you don't want to limit access to any service. The example below denies access to the Alfresco Explorer webapp from public use via the 'my.yourcompany.com' route. These use the acls matched earlier, and can include multiple acls that must all be true:

# Denied paths
http-request deny if alfresco_path is_my

Now we redirect to /share/ if this wasn't in the url path used to access the service:

# Redirects
redirect location /share/ if share_redirect is_my

Next we set up the list of backends to use, matched against the already defined acls:

# List of backends
use_backend share if is_my
use_backend webdav if is_webdav
use_backend sharepoint if is_sp

Then we set up the default backend to use as a catch-all:

default_backend share

Now we define the backends, the first being for share:
backend share

On this backend we also enable the stats page. The stats page gives you a visual view of the health of your backends and is a very powerful monitoring tool:

# Enable the stats page on share backend
stats auth <user>:<password>
stats uri /monitor
stats refresh 2s
option httpchk GET /share
balance leastconn
cookie JSESSIONID prefix
server tomcat1 server1:8080 cookie share1 check inter 5000
server tomcat2 server2:8080 cookie share2 check inter 5000

These define the following:

- backend share - defines a backend called share, which is used by the use_backend config from above.
- option httpchk GET /share - enables http health checks, using a http GET, on the /share path. Server health checks are one of the most powerful features of HAProxy and work hand in hand with tomcat session replication to move an active session to another server if the server your active session is on fails its health checks.
- balance leastconn - sets the balancing algorithm. leastconn selects the server with the lowest number of connections to receive the connection.
- cookie JSESSIONID prefix - enables cookie-based persistence in a backend. Share requires a sticky session, and this is also used in session replication.
- server tomcat1 server1:8080 cookie share1 check inter 5000 - this breaks down into:
  - server - declares a server and its parameters.
  - tomcat1 - the server name; it appears in the logs.
  - server1:8080 - the server address (and port).
  - cookie share1 - checks the cookie defined above and, if matched, routes the user to the relevant server. The 'share1' value has to match the jvmRoute set on the appserver for Share/Alfresco (for Tomcat see http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html).
  - check inter 5000 - sets the health check, with an inter(val) of 5000 ms.
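The leastconn algorithm itself is simple enough to sketch in a few lines of Python (illustrative only - the connection counts here are hypothetical):

```python
def leastconn(servers):
    """Mimic the leastconn balancing algorithm: pick the server
    with the fewest active connections (first wins on a tie)."""
    return min(servers, key=lambda name: servers[name])

# Hypothetical active connection counts for the two Share tomcats:
active = {"tomcat1": 42, "tomcat2": 17}
print(leastconn(active))  # tomcat2 gets the next connection
```

In the real config, persistence takes priority: a request carrying a matching JSESSIONID cookie goes back to its pinned server, and leastconn only decides where sessions without a cookie land.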
Define the webdav backend. Here we hide the need to enter /alfresco/webdav on the url path, which gives a neater and shorter url for accessing webdav, and again we enable server health checking:

backend webdav
option httpchk GET /alfresco
reqrep ^([^\ ]*)\ /(.*) \1\ /alfresco/webdav/\2
server tomcat1 server1:8080 check inter 5000
server tomcat2 server2:8080 check inter 5000
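The reqrep line rewrites the HTTP request line itself before it reaches the backend. Here's a quick Python sketch of the same regex (function name mine, purely illustrative) so you can see what the substitution produces:

```python
import re

def rewrite_request_line(request_line):
    """Apply the same rewrite as the reqrep rule above: insert
    /alfresco/webdav/ in front of whatever path the client sent."""
    return re.sub(r"^([^ ]*) /(.*)", r"\1 /alfresco/webdav/\2", request_line)

print(rewrite_request_line("GET /docs/report.docx HTTP/1.1"))
# GET /alfresco/webdav/docs/report.docx HTTP/1.1
```

The first capture group grabs the method, the second the original path plus the HTTP version, so the client can keep using short urls like webdav.yourcompany.com/docs/report.docx.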
Define the SPP backend. Here we define the backend for the sharepoint protocol, again with health checks:

backend sharepoint
balance url_param VTISESSIONID check_post
cookie VTISESSIONID prefix
server tomcat1 server1:7070 cookie share1 check inter 5000
server tomcat2 server2:7070 cookie share2 check inter 5000

Once this is all in place, you should be able to start HAProxy. If there are any errors, you will be told which lines of the config they are on. Or, if you have HAProxy as a service, you should be able to run 'service haproxy check' to check the config without starting HAProxy.

There are many more cool things you can do with HAProxy, so give it a go - and don't forget to have a good read of the docs!