The official documentation is at: http://docs.alfresco.com
The primary role of the virtualization server is to support in-context preview of Alfresco managed content. It is not, however, the only way to achieve this; the other options (along with a comparison between them) are described at WCM Preview.
Note that the other functions the virtualization server has been used for in the past (including dynamic Web Form includes) have been superseded by mechanisms that do not depend on the virtualization server, and those are recommended going forward. For that reason, if you're not using the virtualization server for in-context preview, it is redundant and should not be installed.
The virtualization server's job is to allow data within certain portions of the AVM repository to be interpreted as a set of virtual websites, and to allow these websites to be browsed by end users (and QA tools) prior to deployment (or submission to a 'common staging area'). These websites may include virtualized servlets and JSPs, but completely static websites can be virtualized too (.html, .gif, .png, etc).
See the wiki article: Configuring the Virtualization Server.
The virtualization server is a self-contained Tomcat 5.5 instance that listens on port 8180 and is part of the WCM tarball. To get it running, download the Linux WCM tarball, unpack it, and run virtual_alf.sh start. Everything it needs is contained under the virtual-server directory.
On Unix/Linux, do the following:
# Start or stop Alfresco server
# Set the following to where Tomcat is installed
Ideally, the virtualization server would be started as a service from /etc/init.d, but you'll have to write that script yourself.
On Windows, the 2.0 Alfresco installer attempts to run the virtualization server as a console application; however, this can cause problems (e.g.: the server can hang because it is blocked on I/O from logging). Some have suggested introducing a pause during startup, but that is inappropriate too, because it only postpones the inevitable hang. The proper fix is to install the virtualization server as a Windows service, and start/stop it from the service manager console.
Here's how to set it up:
cmd> set JAVA_HOME=c:\java\jdk1.5.0_08
cmd> cd c:\alfresco\virtual-tomcat
cmd> bin\service.bat install alfrescoVirtualTomcat
cmd> bin\tomcat5w //ES//alfrescoVirtualTomcat
The service can then be managed from Control Panel > Administrative Tools > Services.
To modify a service setting, right-click the appropriate service, choose 'Properties', and on the 'General' tab modify settings such as 'Startup type'. NOTE: If you start the virtualization service automatically, be sure to make the Tomcat instance hosting the Alfresco webapp start automatically too; otherwise, set them both to 'Manual'. A bug has been filed on this (see: WCM-422).
In some configurations, it was necessary to delay starting the virtualization server until after the Alfresco webapp was up (see WCM-750). This issue has been resolved in Alfresco 2.1E.
Edit the configuration file:
Modify this file so that the following lines appear within it:
org.alfresco.catalina.host.AVMHostConfig.level = FINE
org.apache.catalina.startup.HostConfig.level = FINE
After you've saved this change, restart the virtualization server.
In the WCM authoring environment, you can navigate to your author sandbox of an empty web project
that you've created, click on the 'create' menu, then select 'create web content'. You can create
a file called 'index.html' of type HTML and then enter the text 'hello world'.
Assuming your virtualization server is running, you can click on the icon that looks like an
eyeball (i.e. 'preview') and you'll see your one page 'website' virtualized from your sandbox.
Alternatively, suppose you have a static website in your UNIX or Windows native file system.
Let's say this is in a directory called 'xxx', and it just has one
file called 'index.html'. This is about as simple
a website as you can get:
% mkdir xxx
% cd xxx
% echo 'hello world' > index.html
Now, to import this website into Alfresco, you've got to bundle it up
as a war file. That's quite easy:
% jar -cfv example.war *
Just for the fun of it, let's see what's in that war file:
% jar -tvf example.war
0 Thu Mar 22 12:51:20 EDT 2007 META-INF/
71 Thu Mar 22 12:51:20 EDT 2007 META-INF/MANIFEST.MF
12 Thu Mar 22 12:40:50 EDT 2007 index.html
OK, now you've got a war file that contains the website you want
to import. Again, this war file (example.war) could have contained
lots of files & directories filled with servlets, JSPs, filters, and so
on, or it could have been something utterly mundane like a static
website. In the end, it's a war file -- that's all.
So now we have our war file, and it's time to import it using
'bulk import'. You can navigate to your author sandbox of a web project
that you've created, and click on the 'create' menu, then select 'bulk import'.
You'll be prompted to find the war file you just created in your native
UNIX/Windows file system. Do the import, and that's it! Again, assuming
your virtualization server is running, you can click on the icon that looks like an
eyeball (i.e. 'preview') and you'll see your website virtualized from your sandbox.
Put another way, just because you're importing a war file doesn't
mean the data *within* the war file has to be a full-blown webapp.
Simple websites can be virtualized too.
The ROOT webapp is associated with the virtual host relative path: '/'
Suppose you only have the ROOT webapp, and you issue the request:
The virtualization server knows you're talking about the AVM path
However, suppose you had a webapp named 'foo'. Now the same
request would be mapped to the AVM path:
In other words, having a webapp named 'foo' makes /foo/bar.html
refer to the /bar.html file within foo (as opposed to the /foo/bar.html
within ROOT). The fact that the 1st segment of the request path in the URL
either maps to a webapp name or a path within a webapp
(depending on whether or not you have a webapp by that name)
is something that I personally dislike, but that's the spec
(grumble grumble!). To compound matters, servlet containers
get very unhappy in certain circumstances if you do not have a
Therefore, here's what's done by the Alfresco GUI:
Unless you are doing something funky/snazzy/weird, you should
not be moving files into the avm_webapps dir directly. Instead,
just import your war normally via the web project's wizard.
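To make the request-path rule above concrete, here is a small Java sketch of it. It assumes a sandbox's webapps live under a directory such as mysite:/www/avm_webapps (the layout seen in the JNDIPath example further down this page); the class and method names are made up for illustration and are not part of Alfresco.

```java
import java.util.Set;

// Hypothetical illustration of the servlet-spec rule above; not Alfresco code.
// The first request-path segment selects a webapp only if a webapp with that
// name exists; otherwise the whole path is resolved inside ROOT.
final class WebappMapper {
    private final String webappsDir;   // assumed layout, e.g. "mysite:/www/avm_webapps"
    private final Set<String> webapps; // deployed webapp names, including "ROOT"

    WebappMapper(String webappsDir, Set<String> webapps) {
        this.webappsDir = webappsDir;
        this.webapps = webapps;
    }

    String toAvmPath(String requestPath) {
        String path = requestPath.startsWith("/") ? requestPath : "/" + requestPath;
        int end = path.indexOf('/', 1);                 // end of the first segment
        String first = (end < 0) ? path.substring(1) : path.substring(1, end);
        // ROOT is never addressed by name: it owns the context path "/".
        if (!first.isEmpty() && !first.equals("ROOT") && webapps.contains(first)) {
            String rest = (end < 0) ? "/" : path.substring(end);
            return webappsDir + "/" + first + rest;     // e.g. .../foo/bar.html
        }
        return webappsDir + "/ROOT" + path;             // e.g. .../ROOT/foo/bar.html
    }
}
```

With only ROOT deployed, /foo/bar.html lands in ROOT; as soon as a webapp named foo exists, the very same request lands in foo instead, which is exactly the ambiguity grumbled about above.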
An acronym for 'Domain Name System'. The DNS protocol creates what amounts to a distributed telephone book for computer networks; however, instead of associating human names like 'John Q. Public' with telephone numbers (for example, 617-555-1212), DNS allows computer networks to associate domain names (for example, mail.yahoo.com) with IP addresses (for example, 192.168.1.5). Within Alfresco, the DNS Name you provide when creating a new web project is used to create a separate virtual website for each virtualized webapp.
Everything reachable via the web at a given protocol/hostname/port. Each of the following URLs points to the splash page of a 'different' website:
A website that shares the same IP address and port with other websites. The server disambiguates which site is being accessed on the basis of the host name that the client used to reach it. Because DNS allows arbitrarily many host names to be associated with a single IP address, there is no fixed limit on the number of virtual websites a server can host.
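Name-based virtual hosting boils down to a lookup keyed on the host name the client sent in the HTTP Host header. Here is a minimal Java sketch, using a hypothetical map from host names to document roots:

```java
import java.util.Locale;
import java.util.Map;

// Minimal sketch of name-based virtual hosting: all sites share one IP address
// and port, and the server chooses a site from the Host header the client sent.
final class VirtualHostRouter {
    private final Map<String, String> siteRoots; // lower-case host name -> document root
    private final String defaultRoot;            // used when no site matches

    VirtualHostRouter(Map<String, String> siteRoots, String defaultRoot) {
        this.siteRoots = siteRoots;
        this.defaultRoot = defaultRoot;
    }

    String rootFor(String hostHeader) {
        if (hostHeader == null) {
            return defaultRoot;                          // e.g. an old HTTP/1.0 client
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.lastIndexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);             // drop ":8180" (IPv6 literals not handled)
        }
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1); // "example.com." == "example.com"
        }
        return siteRoots.getOrDefault(host, defaultRoot);
    }
}
```

Because the lookup key comes from the request rather than from the socket, adding another site only means adding another DNS name and another map entry.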
Within DNS it is possible to associate all subdomains of a domain with a common property, such as an IP address. For example, consider the following domains:
Note that aaa.example.com and bbb.example.com are two distinct subdomains of example.com. Further, ccc.bbb.example.com is a subdomain of bbb.example.com. The following expression refers to all subdomains of bbb.example.com:
Therefore, if we had a wildcard domain record within a nameserver that associated *.bbb.example.com with the IP address 192.168.1.5, then here are a few examples of domain names that would share this same IP address:
In Alfresco the virtualization domain is a wildcard domain that is used to associate a large number of virtual websites with a common IP address.
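The wildcard matching described above can be sketched as a suffix test. This is a simplification, assuming a plain wildcard record: real nameservers also apply the RFC 4592 rule that a name explicitly defined beneath the wildcard's parent stops the wildcard from applying to that branch.

```java
import java.util.Locale;

// Simplified sketch of which names a DNS wildcard such as *.bbb.example.com covers.
// Ignores the RFC 4592 "closest encloser" details described in the lead-in.
final class DnsWildcard {
    static boolean covers(String wildcard, String name) {
        if (!wildcard.startsWith("*.")) {
            throw new IllegalArgumentException("not a wildcard: " + wildcard);
        }
        String parent = wildcard.substring(2).toLowerCase(Locale.ROOT); // "bbb.example.com"
        String n = name.toLowerCase(Locale.ROOT);
        // Any name strictly below the parent matches, at any depth;
        // the parent itself does not.
        return n.endsWith("." + parent) && n.length() > parent.length() + 1;
    }
}
```

Multi-level matching is what Alfresco relies on: a single wildcard under the virtualization domain covers both mysite.www--sandbox.<virtualization-domain> and alice.mysite.www--sandbox.<virtualization-domain>.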
A set of virtual websites bound together in various ways to facilitate collaborative website development. Virtual websites that are part of the same web project share a common subdomain of www--sandbox.<virtualization-domain>.
Most systems use one of the following methods to provide a 'virtual view' of a website:
Instead, Alfresco encodes the information required for virtualization within a 'virtual hostname'. This approach has several advantages:
The general format of an Alfresco virtual hyperlink is:
In order for this to work, virtualization-domain and all its subdomains must resolve in DNS to the IP address of the Virtualization Server. This is sometimes known as a 'wildcard DNS' address mapping. There are two ways to achieve this:
While the second method will let you use your Virtualization Server even when disconnected from the Internet, many people will find the first method easier.
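Once the wildcard resolves, preview host names are built by prefixing the virtualization domain. The Java helper below is a hypothetical sketch that follows the pattern of the example host names used later on this page (alice's sandbox and the staging area of the 'mysite' web project). It is not the complete format; anything beyond those two cases, such as the ephemeral workflow sandboxes, is left out.

```java
// Hypothetical helper, inferred from the example host names on this page.
// It covers only two cases: a web project's staging area and a user sandbox.
final class PreviewHost {
    // e.g. "mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net"
    static String staging(String project, String virtualizationDomain) {
        return project + ".www--sandbox." + virtualizationDomain;
    }

    // e.g. "alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net"
    static String userSandbox(String user, String project, String virtualizationDomain) {
        return user + "." + staging(project, virtualizationDomain);
    }

    // Preview URL for a host served by the virtualization server (port 8180 by default).
    static String url(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }
}
```

Every name this produces sits below the virtualization domain, which is why the wildcard mapping has to be in place before any preview link can work.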
It's fairly obvious to most people why using cookies and/or extra query string arguments is a bad way to do virtualization (see above). However, the reasons against embedding virtualization information in the request path portion of a URI are more subtle. For starters, this technique requires every integration you forward-proxy to embed a custom plugin that duplicates the same logic for internal subrequests (e.g.: SSIs). You also lose the contents of POSTs across redirects, unless you cache these arbitrarily long POST bodies across redirects for N clients in parallel, thereby exposing yourself to a range of scalability issues and potential security problems. Even then, you'd *still* have to rely upon cookies, because when pages contain frames, some browsers get confused and send back bogus 'Referer' headers, thereby busting virtualization. Path-based virtualization also makes poorer use of network resources, due to the large number of redirects required to keep requests in 'canonical request-path-mangled form'; if you don't do this, subsequent clicks end up with the wrong 'Referer' if you traverse a link of the form href='/...' (thereby propagating the wrong value into your cookie).
The only really 'tricky' problem created by embedding virtualization information into the hostname is how to make it easy for users to set up a DNS wildcard that resolves all subdomains of your 'virtualization domain' back to the same place. The solution to this problem is the free/public EchoDNS service provided at ip.alfrescodemo.net.
EchoDNS is a special-purpose nameserver developed at Alfresco. It infers the IP address to return in response to a lookup request from the host label prior to the zone it's serving (and any subdomains of that host label). This IP-bearing host label is expected to contain digits separated by hyphens. The domain 'ip.alfrescodemo.net' is served by EchoDNS nameservers ('ns1.alfrescodemo.net' and 'ns2.alfrescodemo.net'). Thus, suppose EchoDNS is asked to translate the following FQDN into an IP address:
Its response would be:
This is because the host label immediately prior to ip.alfrescodemo.net is:
Alfresco virtualizes content on the basis of name-mangled hostnames (rather than using cookies or request-path mangling). Therefore, it's important to be able to map wildcard domains back to a specific IP address. The problem is, which IP address? The answer depends upon the IP address of the machine you've installed Alfresco's virtualization server on. Ideally, people in a larger organization would have installed Tinydns, BIND, or some other nameserver, and they could just configure things for themselves. However, what about those who just don't want the hassle? Note: /etc/hosts is not a viable option because it lacks support for wildcarding; you really do need to use a nameserver for such tasks.
The common way out of an install problem like this would be to use a nameserver on the Internet that allows wildcarding, and just take advantage of *their* setup. Unfortunately (but understandably), companies like http://dyndns.com require you to register with them, choose a password, etc. That's a hassle too... plus hosting something like this carries security issues of its own, because of the dynamic updates involved.
EchoDNS allows Alfresco to provide a wildcard DNS domain for all possible IP addresses, without requiring users to install and/or configure a name server themselves. Further, there's no need to register a dynamic wildcard domain, because the 'answer' (e.g.: 192.168.1.5) is embedded in the question itself (e.g.: alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net).
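The rule EchoDNS applies is simple enough to sketch in a few lines of Java. This illustrates the lookup described above and is not EchoDNS's actual implementation; in particular, the strict "four numbers from 0 to 255" validation is an assumption.

```java
import java.util.Locale;

// Sketch of the EchoDNS idea: the answer is embedded in the question.
// Not the real EchoDNS code.
final class EchoDnsSketch {
    static final String ZONE = "ip.alfrescodemo.net";

    // "192.168.1.5" -> "192-168-1-5.ip.alfrescodemo.net"
    static String virtualizationDomain(String ipv4) {
        return ipv4.replace('.', '-') + "." + ZONE;
    }

    // Returns the dotted IPv4 address encoded in the label just before the zone,
    // or null if the name is not under the zone or that label is not an address.
    static String resolve(String fqdn) {
        String name = fqdn.toLowerCase(Locale.ROOT);
        if (name.endsWith(".")) {
            name = name.substring(0, name.length() - 1);
        }
        if (!name.endsWith("." + ZONE)) {
            return null;
        }
        String prefix = name.substring(0, name.length() - ZONE.length() - 1);
        String label = prefix.substring(prefix.lastIndexOf('.') + 1); // e.g. "192-168-1-5"
        String[] parts = label.split("-", -1);
        if (parts.length != 4) {
            return null;
        }
        StringBuilder ip = new StringBuilder();
        for (String p : parts) {
            if (!p.matches("\\d{1,3}") || Integer.parseInt(p) > 255) {
                return null;
            }
            if (ip.length() > 0) {
                ip.append('.');
            }
            ip.append(Integer.parseInt(p));
        }
        return ip.toString();
    }
}
```

Any labels in front of the IP-bearing label (alice.mysite.www--sandbox in the example above) are simply ignored, which is what makes every subdomain resolve to the same address.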
Your virtualization domain can be configured within:
By default, this configuration file sets the virtualization domain to:
However, you can (and typically should) change this to allow any browser in your LAN to access the virtualization server. When the virtualization server starts up, it registers itself with the Alfresco webapp, and tells it the value of your alfresco.virtserver.domain property. The Alfresco webapp then uses this value when creating clickable 'preview' links (cf. the 'eyeball' icon).
For more details, see: Configuring the Virtualization Server.
Yes. If you want to work on a machine with no access to the Internet (thus precluding using XXX-XXX-XXX-XXX.ip.alfrescodemo.net as your virtualization domain), then you should set up your own name server (or reconfigure your existing one) so that it includes a wildcard domain for virtualization. If you rely on a hack like editing /etc/hosts, with some pain you'll be able to set things up for a static set of users & staging areas, but reviewers will never be able to preview content that comes to them via a workflow. This is because workflows use ephemeral AVM stores with ephemeral DNS names; it's an ever-changing set of GUIDs. Unfortunately, there's no way to create a DNS wildcard in the 'hosts' file, so you must use a nameserver (e.g.: Djbdns or BIND).
As mentioned in Configuring the Virtualization Server, you can configure BIND on Windows or UNIX to create a wildcard DNS domain for virtualization. If you're on a UNIX platform, you have at least two free/common options: BIND and Djbdns.
Shameless plug: Djbdns (tinydns + dnscache) is secure, stable, and well-designed. No security flaws have been seen in Djbdns for the past few years, despite a $500 reward for the first person to find one. By contrast, a fresh set of horrible security issues is discovered with each new release of BIND. If you're on Unix/Linux, and you don't already use Djbdns, consider the merits of switching. On the down side, the docs for Djbdns are only so-so, and the config may strike you as a little strange at first (Daniel Bernstein has a different approach to daemons than most folks in the world of UNIX). It may take you a day or two to feel like you've got the whole thing under your belt. Once you've gotten over this initial hurdle, it's very smooth sailing.
There's no denying one thing: DJB's programs are well thought-through and executed with a level of care that has won him a devoted following.
Yes. Edit the configuration file:
Within this, set the attribute alfresco.rmi.services.host to the name of the host running the AVM.
And also ensure that alfresco.rmi.services.port is correct.
When you restart the virtualization server, it will register itself with the Alfresco webapp at alfresco.rmi.services.host, and let WCM know that it can be contacted back at all subdomains of alfresco.virtserver.domain.
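If you edit these settings by hand, a quick sanity check before restarting can save a round trip. The Java sketch below loads the three properties named on this page with java.util.Properties and rejects missing or malformed values. The class name is hypothetical, the values in the test are placeholders, and which file actually holds the properties depends on your install.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-restart check for the settings discussed above.
final class VirtServerSettings {
    final String rmiHost;    // alfresco.rmi.services.host: machine running the AVM
    final int rmiPort;       // alfresco.rmi.services.port
    final String virtDomain; // alfresco.virtserver.domain: used to build preview links

    VirtServerSettings(Properties p) {
        rmiHost = require(p, "alfresco.rmi.services.host");
        virtDomain = require(p, "alfresco.virtserver.domain");
        rmiPort = Integer.parseInt(require(p, "alfresco.rmi.services.port"));
        if (rmiPort < 1 || rmiPort > 65535) {
            throw new IllegalArgumentException("alfresco.rmi.services.port out of range: " + rmiPort);
        }
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    // Parses text in java.util.Properties format (the contents of the file you edited).
    static VirtServerSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new VirtServerSettings(p);
    }
}
```

An IllegalArgumentException (NumberFormatException is a subclass) names the offending property, so you can fix it before the virtualization server tries to register with the wrong host.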
URLs to virtualized assets embed the information needed to determine the store and version being accessed. That said, if you want to create a pretty-looking facade for a single store/version, this can be accomplished using a reverse proxy.
For example, suppose we have a sandbox for the user alice within the web project 'mysite', which is hosted by a virt server on port 8180 of a machine whose IP address is 192.168.1.5. Normally, you'd access it via your browser using a URL that looks something like this:
Instead though, suppose we want this to appear to the outside world as:
All you'd need to do is set up Apache2 as a reverse proxy on example.com that fetches its real data from the virt server. On example.com, your Apache2 config would look like this:
ProxyPass / http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
ProxyPassReverse / http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
Now, suppose a user in the outside world goes to:
What happens is Apache 2 on example.com fetches data from:
That data is then served back to the user. The idea is that you can hide a backend server (i.e.: the virt server) behind a front end (i.e.: Apache2), and the end user's browser will never know. Thus you can make the URLs look however you wish. Typically, you'd do this for a staging sandbox, not a user sandbox, so in the real world our config might look more like this:
ProxyPass / http://mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
ProxyPassReverse / http://mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
Once you get your mind around configuring Apache2, there's a lot of power here!
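What the two directives do can be sketched as a pair of prefix substitutions, shown here in Java for the single '/' mapping used above. This is only an illustration: mod_proxy also rewrites other headers (e.g. Content-Location), supports many mappings, and does its own URL normalization. The front-end URL http://example.com/ is an assumption based on the host named above.

```java
// Illustration of the prefix substitutions performed by
//   ProxyPass        / <backend>
//   ProxyPassReverse / <backend>
// for a single "/" mapping. Not mod_proxy's actual logic.
final class ReverseProxyMapping {
    private final String frontPrefix;   // what the browser sees, e.g. "http://example.com/"
    private final String backendPrefix; // the virt server URL, ending in "/"

    ReverseProxyMapping(String frontPrefix, String backendPrefix) {
        this.frontPrefix = frontPrefix;
        this.backendPrefix = backendPrefix;
    }

    // ProxyPass: path requested on the front end -> URL fetched from the virt server
    String forward(String requestPath) {
        String rel = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        return backendPrefix + rel;
    }

    // ProxyPassReverse: e.g. a Location header in a redirect from the virt server,
    // rewritten so the browser keeps talking to the front end
    String reverse(String headerUrl) {
        return headerUrl.startsWith(backendPrefix)
                ? frontPrefix + headerUrl.substring(backendPrefix.length())
                : headerUrl;
    }
}
```

The reverse direction is the part people tend to forget: without it, the first redirect issued by the virt server would send the browser straight to the backend host name.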
Yes. You need to use an X.509 wildcard cert. They are easy to create/buy, and browsers work with them nicely. If you want to experiment with this yourself, just install openssl; when you get to the step of creating the actual certificate request, do something like this:
openssl req -new -x509 -keyout demoCA/private/cakey.pem -out demoCA/cacert.pem -days 3652
You'll be asked to specify the 'Common Name' (or 'CN'); just use a wildcard at the appropriate subdomain level (e.g.: *.your-virtualization-domain). It's that simple. It works like any other cert.
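One detail worth checking is how far a wildcard reaches. Browsers generally follow the rule in RFC 6125 (and RFC 2818 before it): the * must be the entire left-most label, and it matches exactly one label. Many current browsers also look only at the certificate's subjectAltName entries and ignore the CN, so the wildcard belongs there too. The simplified Java check below, assuming that rule, shows that a certificate for *.www--sandbox.<virtualization-domain> would cover staging hosts such as mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net but not user sandboxes such as alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net; those would need a wildcard at the next level (e.g. *.mysite.www--sandbox.<virtualization-domain>).

```java
import java.util.Locale;

// Simplified RFC 6125-style host-name check: "*" may only be the whole
// left-most label and matches exactly one label. Partial-label wildcards
// such as "f*.example.com" are not handled.
final class CertWildcard {
    static boolean matches(String pattern, String host) {
        String p = pattern.toLowerCase(Locale.ROOT);
        String h = host.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return p.equals(h);                  // exact name, no wildcard
        }
        int dot = h.indexOf('.');
        if (dot <= 0) {
            return false;                        // need a non-empty first label
        }
        return h.substring(dot + 1).equals(p.substring(2));
    }
}
```

Note the contrast with DNS wildcards, which cover names at any depth below the parent.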
The term 'native file system path' refers to a file or directory within the 'native file system' of the host computer. Absolute native file system paths on UNIX begin with a '/' character. For example:
On Windows, absolute file system paths begin with a drive letter followed by a colon. For example:
Java Naming and Directory Interface (JNDI) is a Java-based API that provides the ability to construct very generic/abstract 'naming and directory services'. JNDI is just an interface; it is the pluggable implementation behind this interface that determines exactly what is being 'named', and how the 'lookup' operation is performed. JNDI can be used for naming files in a file system, users in LDAP, DNS, NDS, or even objects in an arbitrary data structure.
Rather than access files and directories via APIs specifically geared toward native file system paths, Tomcat uses abstract JNDI-based APIs, in combination with a 'pluggable' concrete implementation that interprets the 'names' given to it as native file system paths; from there, the filesystem-specific implementation uses normal file system APIs to fetch the result, and returns it back to the abstract JNDI-based interface invoked by the user. The beauty of JNDI is that because the interface is abstract, the user-level code that relies upon it does not need to know or care how the actual 'lookup' is done. Therefore, it's possible for Tomcat to combine the abstract JNDI interface with different concrete implementations for different purposes. For example, one way Tomcat uses this generality is to allow it to fetch data from unexpanded '.war' files.
Alfresco's Tomcat-based virtualization server replaces the default concrete JNDI implementation (which accesses the local native file system) with one that accesses the AVM via calls to AVMRemote. Because of the generality of JNDI, 'names' can really look like just about anything (e.g.: 'names' for LDAP don't look like file system paths at all). However, the virtualization server uses a JNDI naming scheme that happens to look exactly like native file system paths. Therefore, if you call the standard servlet function getRealPath() what you'll see is a JNDI name that looks like a valid file system path for whatever operating system is hosting the virtualization server. For example, on Unix, you might see a path like:
The interesting thing to note is this: you can still preview this index.html file in your browser via the virtualization server even though there is no such 'file' visible in your native file system! This is because getRealPath() returns a JNDI name (which is 'abstract'), and this name only looks like a native file system path. The concrete JNDI binding used to fetch this 'name' employs AVMRemote, not the native filesystem-centric APIs. Behind the scenes, this 'name' is translated into an AVMRemote call that fetches the data within index.html by transforming the JNDI path into an AVM path and a version number. In this example, the associated AVM path is:
To look anything up within the AVM, you must always specify a version number and an AVM path. This is because the AVM is a versioning content repository. The special version number '-1' refers to the 'latest' contents. All other version numbers are non-negative integers. The AVM path itself is composed of two components: the virtual repository name, and the repository-relative path; a colon is used to delimit these two pieces of information. Therefore, in the previous example, the virtual repository (aka: 'store') being accessed is:
The repository-relative path is:
Many webapps never call getRealPath() because it is considered a 'best practice' to avoid making assumptions about whether or not the application server has expanded your webapp's .war file; instead, such webapps merely generate pages with URLs, which are then handled via the container's built-in JNDI binding. However, because many people are not too familiar with JNDI, sometimes it's easier for them to use filesystem-centric functions directly. In other cases, a webapp uses a library that makes direct calls to the file system, so it can't control what's being done. In situations like this, you'd really like the JNDI name to correspond to an actual file in the native file system.
The JNDI names used for files and directories within the AVM have been designed so that they contain all the information needed by AVMRemote (the version number, the store name, and the store-relative path), while still conforming to the constraints imposed by the host operating system for native file system paths. Alfresco has the ability to mount the AVM using a CIFS client. If your CIFS mount point agrees with the value you've specified in $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties, then your webapp can use the JNDI names returned by getRealPath() just as if they *were* 'native file system paths'. CIFS handles the mapping to AVM assets for you.
By default, the CIFS mount point on UNIX is:
By default, the CIFS mount point on Windows is:
For more details, see the internal documentation within
The utility class org.alfresco.util.JNDIPath allows you to manually translate between a JNDI name and its associated AVM version and path. This allows webapps to use the full power of AVMRemote, and eliminates the need for setting up a CIFS mount.
If you want to use the JNDI path you fetched from getRealPath() directly as a valid file system path, then you'll need to create a CIFS mount so that when Java goes to look for your file, that 'file system' will actually be there. If you try to use the JNDI path you get from getRealPath() as if it were a valid native file system path without having created a CIFS mount in the location indicated by $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties, you'll get null pointer exceptions. While using a CIFS mount is a nice way of working around a webapp that relies on a file system, it has a few disadvantages. For one thing, you don't have access to the full AVMRemote API, and for another, it creates another out-of-band config to get right.
However, you could avoid all reliance on CIFS mounts if you translate the JNDI path you've gotten back from getRealPath() into an AVM path. If you do this, you can then use AVMRemote to fetch whatever you want (and have the full power of the remote API at your disposal, not just what's exposed in CIFS). The JNDI names used to reference all assets in the AVM begin with a string whose value you can fetch via the static method:
This mount point can then be used to parse the JNDI path you get from the servlet method getRealPath() to produce an AVM version & AVM path that can be passed as args to AVMRemote using:
This class is available within:
Therefore, it is already on the classpath, so you can use it from within your webapp. Here's the constructor:
public JNDIPath(String mount_point, String jndi_path)
Therefore, you could say:
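// Prefix shared by the JNDI names of all assets in the AVM
String mount_point = org.alfresco.jndi.AVMFileDirContext.getAVMFileDirMountPoint();
// e.g. inside a servlet: the JNDI name of the webapp's top-level directory
String real_path = getServletContext().getRealPath("/");
org.alfresco.util.JNDIPath p = new org.alfresco.util.JNDIPath(mount_point, real_path);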
Now do whatever you want with 'p.getAvmVersion()' and 'p.getAvmPath()'.
For example, on UNIX, if the constructor args are:
mount_point == /media/alfresco/cifs/v
jndi_path == /media/alfresco/cifs/v/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT
Or in Windows, if the constructor args are:
mount_point == v:
jndi_path == v:/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT
Then, in either case:
getAvmVersion() == -1
getAvmPath() == mysite:/www/avm_webapps/ROOT
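The translation shown in these examples can be sketched as follows. This is a hypothetical re-implementation for illustration only, with the name layout <mount>/<store>/VERSION/v<version>/DATA/<store-relative path> inferred from the two examples above; in a real webapp, use org.alfresco.util.JNDIPath itself. The splitAvmPath helper splits an AVM path at its first colon into the store name and the store-relative path, as described earlier.

```java
// Hypothetical re-implementation for illustration; use org.alfresco.util.JNDIPath
// in real code. Layout inferred from the examples above:
//   <mount>/<store>/VERSION/v<version>/DATA/<store-relative path>
final class JndiNameSketch {
    final int avmVersion;  // -1 means "latest"
    final String avmPath;  // "<store>:<store-relative path>"

    JndiNameSketch(String mountPoint, String jndiPath) {
        String mount = mountPoint.replace('\\', '/'); // accept Windows separators
        String name = jndiPath.replace('\\', '/');
        if (!name.startsWith(mount + "/")) {
            throw new IllegalArgumentException("not under mount point: " + jndiPath);
        }
        // store, "VERSION", "v<n>", "DATA", optional remainder
        String[] parts = name.substring(mount.length() + 1).split("/", 5);
        if (parts.length < 4 || !parts[1].equals("VERSION")
                || !parts[2].startsWith("v") || !parts[3].equals("DATA")) {
            throw new IllegalArgumentException("unexpected JNDI name: " + jndiPath);
        }
        avmVersion = Integer.parseInt(parts[2].substring(1)); // "v-1" -> -1
        String rest = (parts.length == 5) ? parts[4] : "";
        avmPath = parts[0] + ":/" + rest;
    }

    // Splits "mysite:/www/avm_webapps/ROOT" into {"mysite", "/www/avm_webapps/ROOT"}.
    static String[] splitAvmPath(String avmPath) {
        int colon = avmPath.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("no store name: " + avmPath);
        }
        return new String[] {avmPath.substring(0, colon), avmPath.substring(colon + 1)};
    }
}
```

Because the version and store travel inside the name itself, one parse gives you everything AVMRemote needs, with no CIFS mount involved.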
From here, you can use the values returned by getAvmVersion() and getAvmPath() to query AVMRemote. Note that the value for the JNDI mount point is configured in $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties.
The property on UNIX is:
The property on Windows is:
There are two different senses of 'virtualization': low-level 'content virtualization' (provided by the AVM), and higher-level 'website virtualization' (provided by the Tomcat-based virtualization server). Currently, support for virtualizing .NET & PHP websites does not include integration with the Alfresco GUI. However, if all you want to do is see the content within a specific set of workareas rendered within your webserver (e.g.: PHP pages served using Apache 2), then you can do the following:
This approach has some obvious limitations: all virtualized areas must be configured within your content server (e.g.: Apache 2) as a virtual host manually. It also won't scale as nicely as the virtualization server does in terms of having many different users automatically sharing the same underlying libraries (the virtualization server does some fancy footwork to share the jar files in the staging WEB-INF/lib directory with the user-level workareas when they're identical). Further, you won't be able to just click on a URL in the Alfresco GUI and see a window pop up with your website in it.
However, if you have a limited number of author workareas & staging areas, using CIFS to expose the contents of the AVM as a file system does work quite well -- it is surprisingly fast. If you don't want to go through the hassle of trying to customize the Alfresco GUI, there's a simple low-tech solution that might be adequate for you in the short term: bookmarks.
The Tomcat-based virtualization server can deal with static sites, and full-blown java-based websites (webapps/servlets/jsps, etc). Providing a rich integration for virtualizing Apache 2 might be the next logical step; the goal would be to make it so that you'll end up with a different Apache 2 virtual host per area automatically, just like the current Tomcat-based virtualization server does for webapps. Another goal would be to make it easier to configure an Apache 2 / Tomcat stack. If this work gets underway in earnest, announcements will be made & this page will be updated.
Incidentally, the bias towards Apache 2 over 1.x is due to the fact that Apache 2 works nicely on both Windows & UNIX/UNIX-like operating systems, while Apache 1.x behaves poorly on Windows (it's slow & buggy).
Yes. However, there are two potential issues you must be aware of:
Most web developers are familiar with client-side caching issues. A common solution is to just hit refresh (CTRL-R) within the browser. A more drastic and less efficient solution is to turn off the browser's cache entirely for some amount of time; this is seldom a good idea, particularly because it's possible to configure the virtualization server to inject different Cache-Control headers into its responses for each type of workarea (author, preview, workflow, and staging). For more details, see the 'Cache-Control parameters' section of
Updates to WEB-INF and META-INF via CIFS are another matter. When you make modifications to the contents of WEB-INF or META-INF via the browser-based interface provided by the Alfresco webapp, the webapp issues JMX messages to the virtualization server informing it that the webapp must be reloaded (along with any other virtual webapps that are transparently layered over it). However, if you modify the contents of your webapp's WEB-INF or META-INF directory via CIFS (e.g.: adding or changing a jar file), the virtualization server will have no idea it needs to refresh the set of classes loaded on behalf of your webapp. Therefore, you must go back to the Alfresco webapp GUI and hit 'Refresh' manually on the appropriate sandbox in your web project.
jar xvf alfresco-sample-website.war
If you've done this properly, you should see the WEB-INF and META-INF within:
At this point, you can go back to the GUI and hit the 'Refresh'
control associated with the staging sandbox of your web project.
Once the virtualization server has reloaded it, you should be
able to click on the 'eyeball' icon in staging (or any of the
associated author sandboxes) and see your webapp virtualized.
Again, this is only necessary if you upload content via
CIFS that modifies the META-INF or WEB-INF directories.
Otherwise, you can modify content directly in CIFS and/or
the GUI and no special action needs to be taken by the user.
If you really want to see what's going on at a deeper level, you can edit:
org.alfresco.catalina.host.AVMHostConfig.level = FINE
However, changing the debug level of the virtualization server requires you to restart it;
you only need to restart the virt server, not the Alfresco webapp, if you want to employ
this low-level debugging technique.
The virtualization server is slower at deploying web applications than a normal version of Tomcat, since each web application is read out of the Alfresco repository via RMI rather than being read directly off disk (as happens with a vanilla installation of Tomcat). Once deployment is complete however, performance should be fairly comparable. You can test this for yourself quite easily by doing a CIFS mount, copying the contents of a webapp to a native file system, and pointing a 'pristine' version of Tomcat at it. From there, a cheap and cheerful way to do a comparison on your hardware is to just point a spider at the splash page of each server.
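If you want something even cheaper than a spider, the Java 11+ sketch below repeatedly fetches one page and reports the median latency, which is enough for a rough comparison between a pristine Tomcat and the virtualization server serving the same webapp. The class name and the sample counts (5 warm-up fetches, 50 timed ones) are arbitrary choices; pass the splash-page URLs of the two servers on the command line.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Crude timing sketch: re-fetches a single page rather than crawling a site.
final class SplashTiming {
    // Median of nanosecond samples, converted to milliseconds.
    static double medianMillis(List<Long> nanos) {
        if (nanos.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<Long> sorted = new ArrayList<>(nanos);
        Collections.sort(sorted);
        int n = sorted.size();
        double mid = (n % 2 == 1) ? sorted.get(n / 2)
                                  : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
        return mid / 1_000_000.0;
    }

    static List<Long> time(HttpClient client, String url, int runs) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            client.send(req, HttpResponse.BodyHandlers.ofByteArray()); // reads the full body
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    // Usage: java SplashTiming <url> [<url> ...]
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String url : args) {
            time(client, url, 5); // warm-up, discarded
            System.out.printf("%s  median %.1f ms%n", url, medianMillis(time(client, url, 50)));
        }
    }
}
```

The warm-up pass matters here: as noted above, the virtualization server is slower while deploying, and the point of the comparison is steady-state serving, not the first load.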
Technically yes, although this is not recommended by Alfresco. The thing you've got to assess for yourself is the performance and scalability of the virtualization server in your environment, with your traffic and load characteristics. There are many ways to measure this, and many different kinds of load your site might experience depending upon the nature of your content, and the pattern of use experienced by the site. Some sites are very bandwidth heavy, some demand more CPU and/or memory, some tend to get more bursts of activity from users than others, and so on.
Here are some things to consider:
The last two points are probably worth discussing in more detail. While serving content directly out of an Alfresco AVM instance will offer you a wider set of features, nothing is going to beat the performance of deploying your webapp to a native file system and serving content directly off of that (e.g.: ext3 on UNIX, NTFS on NT). Whether you serve content out of the AVM directly or off a native file system, it's advisable to make your entire live/customer-facing infrastructure totally independent of the one you're using for development (e.g.: machines, processes, databases, disks, subnets, etc.). The central idea is to arrange things so that no matter how hard the users or developers pound on things, no matter what security issues arise, what network traffic is generated, or what sort of maintenance/upgrade/reboot shenanigans are necessary from time to time, doing something to one system does not affect the other in any way.
If you are going to serve content directly out of the AVM, caching is likely to matter more. There are so many variables here that the best advice is to test with a realistic load, measure, tune, and experiment with the many options available to you. Once you've done that, throw your system some curve-balls such as alternating periods of quiescence and high load, long sustained periods of activity, lots of simultaneous requests, fewer but heavier requests, and so forth. The approach(es) that work best will depend upon the sheer scale of what you are doing, the audience you're serving, and your site's technical and business requirements.
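A minimal sketch of such a test harness follows, assuming you wrap "fetch one page" in a `Callable` yourself (for example with `java.net.http.HttpClient` on Java 11+; that part is left out). `LoadProbe` and its methods are hypothetical names. The probe fires concurrent bursts, records per-request latency, and alternates bursts with idle periods so you can compare the 95th-percentile latency across phases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LoadProbe {
    /** Runs `total` calls of `request` on `concurrency` threads and returns each call's latency in ms. */
    static List<Double> burst(Callable<?> request, int concurrency, int total) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                pending.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    request.call();  // e.g. fetch a page from the server under test
                    return (System.nanoTime() - start) / 1_000_000.0;
                }));
            }
            List<Double> latencies = new ArrayList<>();
            for (Future<Double> f : pending) {
                latencies.add(f.get());  // rethrows a failed request as ExecutionException
            }
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during burst", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    /** Nearest-rank percentile of a non-empty list, p in (0, 100]. */
    static double percentile(List<Double> latencies, double p) {
        if (latencies.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<Double> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    /** Alternates bursts with idle periods and returns the 95th-percentile latency of each burst. */
    static List<Double> alternate(Callable<?> request, int phases, int concurrency, int total, long idleMillis) {
        List<Double> p95 = new ArrayList<>();
        for (int i = 0; i < phases; i++) {
            p95.add(percentile(burst(request, concurrency, total), 95));
            try {
                Thread.sleep(idleMillis);  // quiescent period between bursts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return p95;
    }
}
```

Running the same probe against the virtualization server and against a plain Tomcat serving a copy of the webapp from disk gives you the side-by-side comparison suggested earlier on this page.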
If you are willing to suffer some performance penalty, you can map external links back into a sandbox. A browser-specific plugin would be a nicer way to do this (it would be faster and much more scalable), but there is a forward proxy solution based on Apache2 that you could use right now.
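The core of such a proxy is a rewriting rule: a link aimed at the live site is redirected to the equivalent path on a sandbox's virtual host. The sketch below illustrates only that rule; it is not the Apache2 solution referenced in WCM-128, and the class name `SandboxLinkMapper`, the live host name and the sandbox host name are all hypothetical inputs you would supply.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of the rewriting rule only, not the WCM-128 proxy.
class SandboxLinkMapper {
    /** Rewrites links aimed at liveHost so they point at sandboxHost:port; other links pass through. */
    static String toSandbox(String link, String liveHost, String sandboxHost, int port) {
        try {
            URI u = new URI(link);
            // Relative links (no host) and links to other sites are left alone.
            if (u.getHost() == null || !u.getHost().equalsIgnoreCase(liveHost)) {
                return link;
            }
            return new URI("http", u.getUserInfo(), sandboxHost, port,
                    u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            return link;  // leave unparseable links untouched
        }
    }
}
```

Relative links need no rewriting because the browser already resolves them against the sandbox host.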
For more details, see: WCM-128
Yes, it already does.
The link validation feature built into the Alfresco 2.1 webapp GUI uses the virtualization server to fetch pages in the website; the hyperlinks within those pages are then validated as well. Because link validation fetches each page as a browser would, both static and programmatically generated links are validated. To build self-consistent link validation reports, the pages being virtualized within the staging area are all taken from the same snapshot. Thus, the combination of the AVM's built-in versioning and the virtualization server's ability to virtualize an archived edition makes it possible to construct validation information incrementally (from checkin to checkin) and to eliminate all 'version skew' from the reports it creates.
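Link validation of this kind boils down to fetching a page, extracting its links, and resolving each one against the page's own URL before fetching it in turn. The hypothetical `LinkExtractor` below sketches the extraction step with a deliberately crude regular expression; it is not Alfresco's implementation, and a real checker should use a proper HTML parser.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Crude pattern for href="..." / src='...'; the fragment (#...) is dropped, pure "#x" links are skipped.
    private static final Pattern LINK = Pattern.compile(
            "(?i)\\b(?:href|src)\\s*=\\s*[\"']([^\"'#]+)(?:#[^\"']*)?[\"']");

    /** Returns absolute URIs for the links on a fetched page, resolved against the page's own URI. */
    static Set<URI> links(URI page, String html) {
        Set<URI> out = new LinkedHashSet<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            try {
                out.add(page.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // malformed link: a validator would report it rather than fetch it
            }
        }
        return out;
    }
}
```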
There are some limits to what can be virtualized; for example, if you have a singleton of any sort (not just a Java singleton), it will be shared by all virtual webapps based on the project's staging webapp.
For example, suppose your webapp writes to a database table. If users Alice and Bob are viewing their 'separate' virtualized instances of the webapp within their own sandboxes, and Alice does something that makes this webapp modify the 'moo' table, then Bob will see the change Alice has made to the 'moo' table immediately. In other words, the virtualization server can virtualize files but can't magically restructure arbitrary hard-coded programs.
Currently, Alfresco's ability to collect XML data via browser-based XForms that are auto-generated from XSDs is tied to the Alfresco webapp. The current integration also makes it possible to associate the data that's collected with one or more 'data rendering' templates (such as XSLT or Freemarker); this allows you to do things like produce HTML output in one easy step. From there, the current integration can also create a URL to these generated web pages that allows them to be previewed via the virtualization server.
Eventually, we'd like to repackage things in such a way that the websites you're building can use these auto-generated forms too.
Any data rendering template that is powerful enough to allow for programmatic extensions (e.g.: XSLT, Freemarker) is also powerful enough to generate an arbitrary set of output. For example, it is possible to create a Freemarker template that produces multi-page output (several HTML files connected via next/previous links). It's also possible for the names of a template's output files to be based on arbitrary dynamic logic and/or on manipulations of the XML data itself.
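As a rough illustration of multi-page output whose file names are computed at render time, the hypothetical `Paginator` below splits a list of items into `base-1.html`, `base-2.html`, and so on, each linked to its neighbours. It is plain Java standing in for what a Freemarker or XSLT template would do, not Alfresco's rendering API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Paginator {
    /** Renders items into pages named base-1.html, base-2.html, ... linked by next/previous. */
    static Map<String, String> render(String base, List<String> items, int perPage) {
        if (perPage <= 0) {
            throw new IllegalArgumentException("perPage must be positive");
        }
        int pages = Math.max(1, (items.size() + perPage - 1) / perPage);
        Map<String, String> files = new LinkedHashMap<>();  // output file name -> HTML
        for (int p = 1; p <= pages; p++) {
            StringBuilder html = new StringBuilder("<html><body><ul>\n");
            int from = (p - 1) * perPage;
            int to = Math.min(p * perPage, items.size());
            for (String item : items.subList(from, to)) {
                html.append("<li>").append(item).append("</li>\n");  // items assumed already HTML-escaped
            }
            html.append("</ul>\n");
            if (p > 1) {
                html.append("<a href=\"").append(fileName(base, p - 1)).append("\">previous</a>\n");
            }
            if (p < pages) {
                html.append("<a href=\"").append(fileName(base, p + 1)).append("\">next</a>\n");
            }
            files.put(fileName(base, p), html.append("</body></html>\n").toString());
        }
        return files;
    }

    // Output names are computed at render time, not fixed in advance.
    private static String fileName(String base, int page) {
        return base + "-" + page + ".html";
    }
}
```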
Note also that it's fairly common for someone who is exploring different visual tradeoffs to want to experiment with 'what if' scenarios. Depending on the template, seemingly minor changes to the data or the template's internal logic could cause a very different set of output files to be created. The problem then becomes allowing users the freedom to experiment with different visual possibilities while ensuring that none of their tests overwrite 'precious' data in their workarea (or leave detritus behind afterwards). The solution is this:
Thus, you can consider the 'preview' store within an author's sandbox as a sort of 'scratch area'. Suppose you have a web project named 'mysite', and an author sandbox named 'alice'. A URL like the following one is like looking through a transparent 3-layer sandwich:
A URL like the following one will omit the contents of the 'preview' layer within the 'alice' sandbox entirely:
This arrangement makes it possible for the user to 'accept' the output of the template (thereby moving it into the 'main' working layer of the 'alice' sandbox), or to reject it without fear that the experiment has polluted the sandbox's main working layer in any way.
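The 'sandwich' can be modelled as an ordered list of layers where the topmost layer containing a path wins. The hypothetical `LayeredSandbox` below is a toy model of that lookup rule, with 'accept' and 'reject' as described above. It assumes in-memory maps instead of AVM stores and ignores deletions, directories and versioning, all of which real AVM transparency handles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LayeredSandbox {
    private final Map<String, String> staging = new HashMap<>();  // web project's staging store
    private final Map<String, String> main = new HashMap<>();     // author's main working layer
    private final Map<String, String> preview = new HashMap<>();  // scratch layer for template output

    void putStaging(String path, String content) { staging.put(path, content); }
    void putMain(String path, String content) { main.put(path, content); }
    void putPreview(String path, String content) { preview.put(path, content); }

    /** Topmost layer wins: preview (if included), then main, then staging. */
    Optional<String> read(String path, boolean includePreview) {
        List<Map<String, String>> layers = new ArrayList<>();
        if (includePreview) {
            layers.add(preview);
        }
        layers.add(main);
        layers.add(staging);
        for (Map<String, String> layer : layers) {
            if (layer.containsKey(path)) {
                return Optional.of(layer.get(path));
            }
        }
        return Optional.empty();
    }

    /** Accept: move the scratch output into the main working layer. */
    void accept() {
        main.putAll(preview);
        preview.clear();
    }

    /** Reject: discard the scratch output; the main layer is untouched. */
    void reject() {
        preview.clear();
    }
}
```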
Incidentally, you might also notice that even if your browser's cache time is long, by default it won't end up caching the contents of a preview layer for more than a few seconds. This feature allows you to do multiple 'what-if' experiments in sequence on the same file without requiring the user to do an explicit browser refresh (CTRL-R). See the 'Cache-Control parameters' section within $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties for more details.
The 'Cache-Control parameters' section within $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties explains all this, but here's the same information in wiki form:
The virtualization server limits how long a browser may cache items received from different virtual hosts by injecting 'Cache-Control: max-age=...' HTTP headers into its responses. The actual value used for max-age depends upon the name of the virtual host. By default, low values are used for 'preview' hosts, and longer values are used for 'staging'. You can tune these settings in that file.
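A minimal sketch of choosing a max-age from the virtual host name follows, assuming host names follow the `*.www--sandbox.*` pattern and the reversed store naming described elsewhere on this page (so an author's preview layer starts with `preview.`). The three numbers are hypothetical placeholders, not the defaults shipped in `alfresco-virtserver.properties`, and the middle category for author sandboxes is an assumption.

```java
import java.util.Locale;

class CacheControlPolicy {
    // Hypothetical placeholder values in seconds, NOT the shipped defaults.
    static final int PREVIEW_MAX_AGE = 2;
    static final int SANDBOX_MAX_AGE = 30;
    static final int STAGING_MAX_AGE = 300;

    /** Returns a Cache-Control header value for a request addressed to the given virtual host. */
    static String maxAgeValue(String virtualHost) {
        String host = virtualHost.toLowerCase(Locale.ROOT);
        int cut = host.indexOf(".www--sandbox.");
        String storePart = cut < 0 ? host : host.substring(0, cut);  // e.g. "preview.alice.mysite"
        String[] labels = storePart.split("\\.");
        int maxAge;
        if (labels[0].equals("preview")) {
            maxAge = PREVIEW_MAX_AGE;   // scratch output changes often: keep it fresh
        } else if (labels.length == 1) {
            maxAge = STAGING_MAX_AGE;   // e.g. "mysite": the staging store
        } else {
            maxAge = SANDBOX_MAX_AGE;   // e.g. "alice.mysite": an author's main workarea
        }
        return "max-age=" + maxAge;
    }
}
```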
For generic browsing on the Internet, a user might choose to configure their browser to cache data for a very long time. It's a personal preference: the longer an item is permitted to be cached, the better the browser's performance is likely to be. However, it also becomes more likely that 'stale' versions of files will be viewed.
When browsing content on the virtualization server, the trade-offs are not only different from 'generic' Internet browsing, they're also different for each major category of virtual host (i.e.: 'staging', 'preview', etc.).
If the user does encounter 'stale' data, they can always hit their browser's 'refresh' button. The goal of these settings is to help them avoid having to refresh explicitly *most* of the time.
If an administrator errs too much on the side of short max-age values, performance will suffer. Browser caches are particularly important on slow links.
If an administrator errs too much on the side of lengthy max-age Cache-Control settings, users might end up turning off their browser's cache entirely. This would hurt performance even more. Again, it's a trade-off.
The $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties file specifies the following default settings:
# Time is given in seconds:
The AVM is a very general-purpose content repository; some of its features are being used to support web content management (WCM), but it is capable of performing many other tasks. For example, we plan to create default configurations that will be suited to the needs of source code management (SCM).
In WCM, every virtual repository (aka: 'store') contains a top-level directory named 'www'. By default, the 'www' directory of the 'main store' of each author and workflow sandbox is just a transparent overlay on the 'www' directory of the corresponding staging sandbox's 'main store'.
The Tomcat-based virtualization server transforms the virtualized webapp files in the AVM into a set of name-mangled virtual webapps. If you're curious, you can see some of this in action if you look at the virtualization server's 'work' directory. For example, on Unix/Linux, if you have two web projects ('mysite' and 'silly'), the 'work' directory will be:
Its contents might look something like this:
Caution: this is being shown only to give you a peek under the covers. The specifics of the name-mangling are subject to change at any time, without notice.
Now suppose that the virtualization server gets a request for:
At this point, it reverse-proxies the request to the appropriate version of the appropriate name-mangled webapp. In this example, the HEAD versions of all the webapps associated with the 'mysite' and 'silly' projects have been transformed into virtual webapps (each web project just has a single 'ROOT' webapp). In addition, version 42 of the staging repository in 'mysite' has also been 'brought online' explicitly.
When you do a CIFS mount of the AVM and look at the top-level directories, you might see a bunch of directories like this:
Each of these directories corresponds to a store within the AVM. A sandbox is a collection of stores. Each store (aka: 'virtual repository') is very much like a Subversion (SVN) repository, only you can overlay them using a mechanism called 'transparency'. See also: Collaborative Content Production.
When each user works in their own sandbox, they all enjoy the ability to isolate their development environment (e.g.: what files are changed) from everyone else's. The reason why sandboxes include both a 'main' working store and a 'preview' store is so that templating operations can be 'previewed' without clobbering any of the precious data in the main working store of the user's sandbox. If you like what you 'preview', you can accept it; otherwise, you can toss it away and be confident that *nothing* in your main workarea has been altered.
Just as the store used for the author's main workarea is overlaid on your web project's staging store, ephemeral 'preview' data is sent to a store that's overlaid on the author's main workarea. A triple-decker transparent sandwich, if you will.
Just as a sandbox is a higher-level structure of AVM content stores bound together by metadata & workflows, a web project is an even higher-level structure of sandboxes bound together by metadata & workflows too. A web project is the high-level object that corresponds to *all* the various areas & objects used to create a multi-user collaborative environment devoted to building a website.
Suppose you do a directory listing in CIFS, and you see something like:
What you're seeing here are stores (low-level virtual repositories) within the AVM. The default naming scheme of these stores gives you a very good idea of how things are organized under the hood (unless someone has renamed them). For example, all these stores probably belong to the web project named 'mysite'. Within the 'mysite' web project, the user Alice has a sandbox that consists of two stores: mysite--alice and mysite--alice--preview. Note that the mysite project also has sandboxes for bob & admin. The 'staging area' of the mysite project is contained within a store named 'mysite'; note also that there's yet another store called 'mysite--preview' in case you want to do fancy stuff later, like deferring the final promotion of submitted content to the main staging area. Currently, all sandboxes in Alfresco consist of just 2 stores (the 'main' store and the scratch 'preview' store), but this will probably be made more flexible & tunable later on. As things stand, the CIFS projection does not show you all the stores by default: it omits most of the 'preview' stores because people typically don't want the visual clutter of directories that just contain ephemeral/scratch data.
Another interesting thing to note is that the leading portion of the default DNS name corresponding to each store follows a very similar scheme to the one used by CIFS, only the order is reversed. For example, the URL for the main working store of Alice's sandbox looks like:
By making the CIFS name for this store look like 'mysite--alice' the users of the file system get to see the 'real' store names and everything lines up nicely when you do a directory listing (and works nicely with tab-completion). File systems are organized from least specific to most specific, so this makes perfect sense.
However, DNS is organized the opposite way: from most to least specific. Thus, in order for things like wildcard cookies and wildcard X.509 certs to work properly, 'alice' is a subdomain of 'mysite'. Similarly, when you're previewing ephemeral 'what if' data in templating, your URL will look like this:
By default, that will correspond to the mysite--alice--preview store. Several important details are being glossed over in this discussion, because the real associations are determined by metadata values associated with the stores, not by the store names. However, if you don't change the naming scheme by renaming stores, the simplified description provided here is valid. Relying on metadata is more robust because it allows us to do store renaming (if/when necessary), and still have all the workflow & webapp logic dealing with stores, sandboxes and web projects work.
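Under the default naming scheme, the DNS prefix is simply the CIFS store name split on `--` and reversed. The hypothetical `StoreNames` helper below captures that simplified rule; as noted above, the real association comes from store metadata, so treat this as an aid to reading URLs rather than a specification. The `.www--sandbox.` infix follows the `*.www--sandbox.*` pattern this page describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StoreNames {
    /** "mysite--alice--preview" becomes "preview.alice.mysite": CIFS order reversed into DNS order. */
    static String dnsPrefix(String storeName) {
        List<String> parts = new ArrayList<>(Arrays.asList(storeName.split("--")));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    /** Hypothetical full host name: DNS prefix + ".www--sandbox." + virtualization domain. */
    static String virtualHost(String storeName, String domain) {
        return dnsPrefix(storeName) + ".www--sandbox." + domain;
    }
}
```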
Within the Alfresco webapp suppose you have a web project named 'mysite', and two users: Alice and Bob. Within Alice's sandbox, the 'eyeball' icon will have URLs for Alice of the form:
The 'eyeball' icon within the Alfresco webapp will have URLs for Bob like this:
Let's say your virtualization server is listening on IP address 192.168.1.5, and that you're using the EchoDNS server at ip.alfrescodemo.net to deal with DNS wildcards. Any host name matching *.192-168-1-5.ip.alfrescodemo.net then resolves to the IP address 192.168.1.5. Consequently, the following URL will resolve to 192.168.1.5 (on port 8180):
Note that the same is true of the following URL for Bob:
The wiki page entitled Configuring the Virtualization Server covers this in greater detail.
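The wildcard rule is easy to mimic when you want to sanity-check a host name offline. The hypothetical `EchoDns` helper below recovers the IP address that a name under `ip.alfrescodemo.net` encodes (dashes in place of dots); it only reproduces the rule stated above and is not how the EchoDNS service itself is implemented.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoDns {
    // "<anything>.A-B-C-D.ip.alfrescodemo.net" encodes the IPv4 address A.B.C.D.
    private static final Pattern DASHED_IP = Pattern.compile(
            "(?:^|\\.)(\\d{1,3})-(\\d{1,3})-(\\d{1,3})-(\\d{1,3})\\.ip\\.alfrescodemo\\.net$");

    /** Returns the dotted IPv4 address encoded in the host name, if any. */
    static Optional<String> addressFor(String host) {
        Matcher m = DASHED_IP.matcher(host.toLowerCase(Locale.ROOT));
        if (!m.find()) {
            return Optional.empty();
        }
        StringBuilder ip = new StringBuilder();
        for (int i = 1; i <= 4; i++) {
            int octet = Integer.parseInt(m.group(i));
            if (octet > 255) {
                return Optional.empty();  // not a valid IPv4 octet
            }
            ip.append(i == 1 ? "" : ".").append(octet);
        }
        return Optional.of(ip.toString());
    }
}
```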
In any event, using the area info embedded into the 'virtual host name' the virtualization server is able to figure out the associated AVM repository and area-ize the request path accordingly when it makes calls to AVMRemote behind the scenes. There's a lot of other stuff going on too, like how it knows when to reload a webapp, how it maps between JNDI/webapp/CIFS/URL namespaces, and so on.
The virtualization server is a modified version of Tomcat that knows how to interpret the area-izing information embedded within a URL (*.www--sandbox.*:8180/...) and fetch data via AVMRemote. It also does some fancy footwork with classloaders to allow many users to have what appears to be a separate version of your webapp in the 'staging' area, and yet still share the jar files they have in common, when possible (this allows the system to scale).
At startup, the virtualization server attempts to connect to the Alfresco webapp every few seconds until it succeeds. Once connected, it registers itself with the Alfresco webapp (to receive JMX event notifications later), and queries the AVM to determine which virtual repositories contain webapp data, and what DNS name has been associated with each of them. Then, a classloader hierarchy for the virtual webapps is created that parallels the overlay structure of the AVM stores containing them (this is used whenever possible, to allow the virtualization server to scale). From there, a webapp name mangling scheme is used to create a lexically unique name for every virtual webapp. Reverse proxy logic within the virtualization server maps the DNS name provided by clickable 'eyeball' URLs in the Alfresco webapp into a request for the appropriate 'name mangled' webapp. From there, Tomcat's normal servlet container logic takes over. Access to the 'file system' is abstracted via JNDI, so Tomcat does not know or care that it is fetching 'file data' using AVMRemote, rather than 'native file system APIs'. Certain events within the Alfresco webapp (such as the creation or destruction of a sandbox) cause JMX event notification messages to be sent to the virtualization server. This allows the set of virtual webapps hosted by the virtualization server to remain in-sync with the virtualized file and directory data maintained by the AVM.
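The 'retry every few seconds until it succeeds' step can be sketched as a simple loop. In the hypothetical `StartupRetry` class below, `tryConnect` stands in for the real connection attempt over RMI, and the loop is capped at `maxAttempts` so it can be tested; the actual server keeps retrying indefinitely.

```java
import java.util.function.BooleanSupplier;

class StartupRetry {
    /**
     * Calls tryConnect until it returns true, sleeping intervalMillis between attempts.
     * Returns the number of attempts used, or -1 if every attempt failed (or the thread was interrupted).
     */
    static int connectWithRetry(BooleanSupplier tryConnect, long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                System.err.println("Alfresco webapp not reachable yet (attempt " + attempt + "), retrying");
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```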
This is an over-simplification, and like most implementation-level details, it's subject to change without notice. However, it's nice to have a rough idea how things work as an aid during trouble-shooting sessions.
By default, neither the installation path of Alfresco's Tomcat-based virtualization server nor that of the Tomcat instance hosting the Alfresco webapp contains whitespace characters (or any other characters that require URL encoding). If you customize your installation, you must abide by the same constraint. This is due to a long-standing bug in the JRE.
If you're curious, you can read all about it on Sun's bug database: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4273532
There's also another related issue on Sun's bug database that's marked as a 'duplicate', but it's worth reading too: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4496398
Here's a helpful checklist:
Make sure you didn't do anything illegal, like customizing the installation of either the virtualization server or the Tomcat hosting the Alfresco webapp so that it sits on a path that includes a space character (or any other character that requires URL encoding).
Currently, if you change the password for the 'admin' user in the Alfresco webapp, you must also propagate that change to $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties by hand. You can find the relevant properties at the end of this file. By default, they look like this:
# Admin level user and password to connect to login to Alfresco server.
There are two cases:
# Admin level user and password to connect to login to Alfresco server.
Also, to rule out any possible browser caching issues, if you do get a blank page, be sure you hit refresh after the virtualization server stops warning about not being able to connect yet.
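For the first checklist item, a quick check of an installation path can help. The hypothetical `InstallPathCheck` below flags the first character that falls outside the URL-safe set (letters, digits, `-._~`) plus the path separators `/`, `\` and `:`; it is a conservative heuristic, not a reproduction of the JRE bug referenced above.

```java
class InstallPathCheck {
    // Unreserved punctuation from RFC 3986 (-._~) plus path separators (/ \ :).
    private static final String EXTRA_SAFE = "-._~/\\:";

    /** Returns the first character of path that would need URL encoding, or -1 if there is none. */
    static int firstUnsafeChar(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum && EXTRA_SAFE.indexOf(c) < 0) {
                return c;
            }
        }
        return -1;
    }
}
```

For example, it flags the space in `C:\Program Files\Alfresco`, which is exactly the kind of customized location the checklist warns against.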
In order for the virtualization server to work properly, firewalls must not block the ports it requires to operate. Here are the ports that must be open by default:
Port   | Machine                       | Purpose                                                                                         | Configuration
50500  | Alfresco webapp machine       | RMI registry port for all remote services                                                       | Within virt server config: conf/alfresco-shared.properties, see alfresco.rmi.services.port
50510* | Virtualization server machine | RMI registry port for the virtualization server, used to enable the Alfresco webapp to do callbacks | Within virt server config: conf/alfresco-virtserver.properties, see alfresco.virtserver.jmxrmi.port
8180   | Virtualization server machine | HTTP port for servicing requests for virtualized content (e.g.: html)                          | See also: conf/server.xml. Within virt server config, see:
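To confirm that a firewall is not in the way, you can try opening a TCP connection to each port in the table from the machine on the other side (for example, 50500 on the Alfresco webapp machine from the virtualization server machine, and 50510 the other way around). The hypothetical `PortCheck` helper below does just that; a successful connection only proves the port is reachable, not that RMI or HTTP is working behind it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** True if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;  // refused, filtered (timeout), or unresolvable host
        }
    }
}
```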
Information regarding the development/availability of new features will be announced via Alfresco's normal channels, but in case you're wondering 'what's next?', the following sections should provide some idea. As always, your feedback is greatly appreciated.
Currently, anybody can browse anything via the virtualization server; no Alfresco-specific login is required (though the webapp you're virtualizing may insist upon a login). For intranet-based content development, that's often OK because webapps are mostly public anyhow, and services that are only exposed on an intranet are typically not subject to the same level of security threat that is seen on a live Internet site.
However, real threats do exist within intranets, and malicious (or careless) users should be thwarted by the virtualization server just as they are by the Alfresco webapp. The plan is that if you access the virtualization server by clicking on an asset within the webapp, you won't be prompted to log in again (nor will you be challenged if you click on any of the links within this or any other page). However, if you just click on a link to an asset within the virtualization server *without* having authenticated at some point, you'll be temporarily redirected to the login page; once you have provided proper credentials, you'll be able to surf around again, as usual. All actions on the virtualization server will then be done *as* the logged-in user. Credentials will be bound to the connection internally, and maintained across connections on the client side via cookies.
As a nice side-benefit, this enhancement will eliminate the common configuration error of changing the admin password within the Alfresco webapp but neglecting to update the alfresco.server.password property within $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties.
Security will be turned on by default.
The AVM supports a very powerful feature that lets you take 'snapshots' of websites as they were at any given moment in time. However, the 2.0.1 GUI does not expose the virtualization server's ability to virtualize these snapshots. All that remains is to provide GUI support (everything else works).
Assuming you have a web project named 'mysite', and your virtualization domain looks like:
Then if you're browsing the 'version 4' snapshot in staging, the URL will look like:
Note also that the following URLs will be equivalent:
This is because version '-1' and the 'HEAD' version are the same.
Bringing a snapshot online or offline in the virtualization server will probably be a privileged operation within the Alfresco webapp, because of the resources consumed; these settings will be persistent across restarts. Just as when you're browsing a 'HEAD' version, you'll be able to surf around and set bookmarks (you'll also be able to browse multiple snapshots and/or HEAD versions simultaneously).
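As a toy model of which (store, version) pairs are online, the hypothetical `OnlineWebapps` class below normalizes version `-1` and 'HEAD' to the same key and tracks explicitly onlined snapshots such as version 42 of 'mysite'. The key format is invented for the sketch, and persistence across restarts is not modelled.

```java
import java.util.Set;
import java.util.TreeSet;

class OnlineWebapps {
    static final int HEAD = -1;  // version -1 and HEAD denote the same thing

    private final Set<String> online = new TreeSet<>();

    /** Accepts "HEAD" or a version number; "-1" and "HEAD" both map to HEAD. */
    static int parseVersion(String v) {
        return "HEAD".equalsIgnoreCase(v.trim()) ? HEAD : Integer.parseInt(v.trim());
    }

    // Invented key format: "<store>@HEAD" or "<store>@<n>".
    private static String key(String store, int version) {
        if (version < HEAD) {
            throw new IllegalArgumentException("bad version: " + version);
        }
        return store + "@" + (version == HEAD ? "HEAD" : Integer.toString(version));
    }

    void bringOnline(String store, int version) { online.add(key(store, version)); }
    void takeOffline(String store, int version) { online.remove(key(store, version)); }
    boolean isOnline(String store, int version) { return online.contains(key(store, version)); }
}
```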
It would be nice if the Alfresco webapp could be configured so that the load of servicing different virtualized webapps is distributed amongst a number of servers on different machines. A related feature is the ability to make finer-grained mappings between assets in the AVM and their associated content server using custom logic (e.g.: regex rules, logic encapsulated within beans, etc.). For a bit more discussion, see this forum post.
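The finer-grained mapping idea could look something like the hypothetical `ContentServerRouter` below: an ordered list of regex rules from store name (or asset path) to content server, with a fallback. This is a sketch of a proposed feature under assumed inputs; none of these names exist in Alfresco.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ContentServerRouter {
    private static final class Rule {
        final Pattern pattern;
        final String server;

        Rule(Pattern pattern, String server) {
            this.pattern = pattern;
            this.server = server;
        }
    }

    private final List<Rule> rules = new ArrayList<>();  // first matching rule wins
    private final String fallback;

    ContentServerRouter(String fallback) {
        this.fallback = fallback;
    }

    ContentServerRouter rule(String regex, String server) {
        rules.add(new Rule(Pattern.compile(regex), server));
        return this;
    }

    /** Returns the content server that should virtualize the given store name or asset path. */
    String serverFor(String storeOrPath) {
        for (Rule r : rules) {
            if (r.pattern.matcher(storeOrPath).matches()) {
                return r.server;
            }
        }
        return fallback;
    }
}
```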
Currently, Alfresco's only virtualized content server is based on Tomcat 5.5. Deep integration of Apache2 would be the first step toward first-class support for an efficient collaborative environment for PHP developers. As things stand, PHP developers can virtualize specific areas by creating virtual hosts 'by hand' within Apache, setting up CIFS, then making Apache act as a reverse proxy for Tomcat in all cases where the site needs to do something webapp-ish/jsp-ish. Right now, PHP developers don't have a good solution for previewing items within a workflow 'in context'. If Apache2 were to be virtualized and integrated, that (and more) should be possible.
The virtualization server requires version 5.5 of Tomcat.
The Alfresco webapp generates various JMX events to keep the virtualization server's set of virtual contexts in sync. Ideally, these (and other) events should go over an ESB (perhaps Mule), so that as 3rd-party applications are integrated, there will be a common and maintainable framework for everyone to use. The first client of this enhanced architecture will almost certainly be the virtualization server(s).
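For readers unfamiliar with JMX notifications, the sketch below shows the pattern with the standard `javax.management` API: a broadcaster emits 'sandbox created/destroyed' notifications and a listener keeps a set of virtual contexts in sync. The notification type strings and the `SandboxEvents`/`VirtualContexts` classes are hypothetical; they are not the event names Alfresco actually sends.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

class SandboxEvents extends NotificationBroadcasterSupport {
    // Hypothetical notification types, not the names Alfresco actually uses.
    static final String CREATED = "example.sandbox.created";
    static final String DESTROYED = "example.sandbox.destroyed";

    private final AtomicLong sequence = new AtomicLong();

    /** Emits a notification whose message carries the affected store name. */
    void publish(String type, String storeName) {
        sendNotification(new Notification(type, this, sequence.incrementAndGet(), storeName));
    }
}

class VirtualContexts implements NotificationListener {
    final Set<String> stores = new ConcurrentSkipListSet<>();

    @Override
    public void handleNotification(Notification n, Object handback) {
        if (SandboxEvents.CREATED.equals(n.getType())) {
            stores.add(n.getMessage());     // a real server would also deploy a virtual webapp here
        } else if (SandboxEvents.DESTROYED.equals(n.getType())) {
            stores.remove(n.getMessage());  // ...and undeploy it here
        }
    }
}
```

With the no-argument `NotificationBroadcasterSupport` constructor, listeners run in the thread that calls `sendNotification`, which keeps the example deterministic.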