Configuring the Share HTML processing black/white list

kevinr1
Established Member
Alfresco Share has a number of features to protect against XSS (Cross Site Scripting) attacks, session hijacking and similar. One of the most aggressive features is the automatic processing of 3rd party HTML to 'sanitise' or 'strip' out unwanted HTML tags and attributes before rendering it in the page. By 3rd party HTML, I mean any HTML content displayed in Share that is sourced from a node content stream, such as a Wiki page, Blog post or Discussion post. In other words, this covers any content that may be user-edited or could come from any source (not just Share itself!).



This is a well tested feature that handles all commonly known XSS attack holes and many less well known ones - including all the attack vectors listed here: http://ha.ckers.org/xss.html



One of the downsides to this is that some otherwise useful HTML attributes and elements are stripped, mainly to work around issues in legacy browsers such as IE6 and IE7. Consider the STYLE attribute: not a problem attribute, you would assume. How could setting a STYLE cause an XSS attack?! Well, in IE8, Firefox, Safari, Chrome etc. it can't. But in IE6/7 Microsoft, in their wisdom, allowed JavaScript to be inserted into a STYLE attribute (called 'CSS Expressions'; a better name would have been 'CSS Hacks'). This is a potential XSS hole that only affects those legacy browsers, but the HTML stripping process cannot rely on your browser user agent (which of course could be faked), so it must always assume the worst and strip those STYLE attributes.



For the majority of Alfresco users who discarded IE6 (or even just IE...) long ago, why should they be punished with this limitation? And it is an annoying limitation, as most of the in-line editing capabilities of TinyMCE, and of other in-line editors that can potentially be used with Alfresco, use STYLE attributes to apply formatting to much of their generated content.



In Alfresco 3.4.9/4.0.2 and onwards, it is now possible to fully configure the black/white list of HTML tags and attributes that the HTML stripping process will use.



This is the default configuration that is applied OOTB (out of the box):

      <!-- the set of HTML tags considered safe for rendering when mixing with existing client-side output -->
      <!-- NOTE: define all tags in UPPER CASE only -->
      <property name='tagWhiteList'>
         <set>
            <value>!DOCTYPE</value>
            <value>HTML</value>
            <value>HEAD</value>
            <value>BODY</value>
            <value>META</value>
            <value>BASE</value>
            <value>TITLE</value>
            <value>LINK</value>
            <value>CENTER</value>
            <value>EM</value>
            <value>STRONG</value>
            <value>SUP</value>
            <value>SUB</value>
            <value>P</value>
            <value>B</value>
            <value>I</value>
            <value>U</value>
            <value>BR</value>
            <value>UL</value>
            <value>OL</value>
            <value>LI</value>
            <value>H1</value>
            <value>H2</value>
            <value>H3</value>
            <value>H4</value>
            <value>H5</value>
            <value>H6</value>
            <value>SPAN</value>
            <value>DIV</value>
            <value>A</value>
            <value>IMG</value>
            <value>FONT</value>
            <value>TABLE</value>
            <value>THEAD</value>
            <value>TBODY</value>
            <value>TR</value>
            <value>TH</value>
            <value>TD</value>
            <value>HR</value>
            <value>DT</value>
            <value>DL</value>
            <value>DT</value>
            <value>PRE</value>
            <value>BLOCKQUOTE</value>
            <value>BUTTON</value>
            <value>CODE</value>
            <value>FORM</value>
            <value>OPTION</value>
            <value>SELECT</value>
            <value>TEXTAREA</value>
         </set>
      </property>

      <!-- The set of HTML tag attributes that are to be removed before rendering -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <!-- IMPORTANT: JavaScript event handler attributes starting with 'on' are always removed -->
      <property name='attributeBlackList'>
         <set>
            <value>STYLE</value>
         </set>
      </property>

      <!-- The set of HTML tag attributes that are considered for sanitisation i.e. script content removed -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <property name='attributeGreyList'>
         <set>
            <value>SRC</value>
            <value>DYNSRC</value>
            <value>LOWSRC</value>
            <value>HREF</value>
            <value>BACKGROUND</value>
         </set>
      </property>


As you can see, it's quite a list. The important config for STYLE attribute processing is here:

      <property name='attributeBlackList'>
         <set>
            <value>STYLE</value>
         </set>
      </property>


To change it, simply override the black list in the stringutils bean in your custom-slingshot-application-context.xml file (generally found in \tomcat\shared\classes\alfresco\web-extension), as detailed in previous blog posts:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Override HTML processing black list -->
   <bean id='webframework.webscripts.stringutils' parent='webframework.webscripts.stringutils.abstract'
         class='org.springframework.extensions.webscripts.ui.common.StringUtils'>
      <property name='attributeBlackList'>
         <set></set>
      </property>
   </bean>

</beans>


Restart the Share web-application and STYLE attributes will no longer be removed by Share.
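The other lists can presumably be adjusted the same way. The fragment below is a minimal sketch, not an official recipe: it assumes (as the override above implies) that the default lists are defined on the parent bean 'webframework.webscripts.stringutils.abstract', and it uses Spring's standard merge='true' collection attribute so that one tag is appended to the inherited tagWhiteList instead of replacing the whole set. IFRAME is a purely illustrative choice; whitelisting it can reopen exactly the kind of hole this processing is meant to close, so only consider it if you trust every source of the content. Property names may also differ between versions, so check against the Share version you are running.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Sketch only: keep the default HTML processing lists, append one tag to the white list -->
   <!-- ASSUMPTION: the default tagWhiteList is defined on the parent bean, so merge='true' adds to it -->
   <bean id='webframework.webscripts.stringutils' parent='webframework.webscripts.stringutils.abstract'
         class='org.springframework.extensions.webscripts.ui.common.StringUtils'>
      <property name='tagWhiteList'>
         <!-- merge='true' combines this set with the parent's set rather than replacing it -->
         <set merge='true'>
            <!-- Illustrative only: tags must be defined in UPPER CASE -->
            <value>IFRAME</value>
         </set>
      </property>
      <!-- Empty black list, so STYLE attributes are kept as well -->
      <property name='attributeBlackList'>
         <set></set>
      </property>
   </bean>

</beans>
```

If you already override this bean, add the extra property to that existing definition rather than declaring the same bean id a second time in the file, then restart Share as before.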
23 Comments
blog_commenter
Active Member
Hi Kev,



Thank you for this article; this is exactly what I need right now. However, after saving custom-slingshot-application-context.xml, I get the following error from Alfresco when restarting:



ERROR [web.context.ContextLoader] [pool-2-thread-1] Context initialization failed

org.springframework.beans.factory.BeanDefinitionStoreException: Invalid bean definition with name 'webframework.webscripts.stringutils' defined in file [/home/alfresco/tom_shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml]: Could not resolve parent bean definition 'webframework.webscripts.stringutils.abstract'; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'webframework.webscripts.stringutils.abstract' is defined



The parent bean seems to be missing. Am I missing something in my configuration?



I'm running Alfresco 4.0.0.d.



Regards
kevinr1
Established Member
Unfortunately you need to be running a more recent version; this is a relatively new feature.



Either a HEAD nightly build or one of the Enterprise builds such as 4.0.3.
blog_commenter
Active Member
Thank you. We are using the community 4.0.0d, I guess I'll have to wait for the next update :-/
kevinr1
Established Member
Yes, the development blogs do tend to be around the newer features. Fortunately the good news is that a new Community release is coming in the next few weeks. You are welcome to try out a nightly build until then to test the feature if you like.
blog_commenter
Active Member
[...] Kev’s blog A blog from Alfresco Engineering      « Configuring the Share HTML processing black/white list [...]
blog_commenter
Active Member
[...] same for plain text files (rendered in browser instead of transformed to flash). Note that Alfresco sanitize some html tags for security [...]
blog_commenter
Active Member
[...] (viewed directly in the browser instead of being converted to Flash). Note that Alfresco strips out some HTML tags for [...]
blog_commenter
Active Member
Hi Kevin,



We implemented this (whitelisted the  attribute) and it works fine for the Wiki module, but the attribute is still stripped out in the other TinyMCE instances (Blog, Discussion, HTML content). Any idea why?



We assumed that this is a system wide configuration.



We are on Enterprise version 4.1.1.9.



Alfresco support has not been helpful so far. Any hint is really appreciated.



Thanks!

Arno
blog_commenter
Active Member
I have the problem that Alfresco removes specific HTML tags (e.g. 'iframe') after saving content (in Share, but also when working with WebDAV, so this should not be a Share issue), but I can find no 'custom-slingshot-application-context.xml' file and no 'StringUtils' class in any Spring context?!

Besides this blog, I can find no information about this issue (which makes me wonder how Alfresco can have no documentation about it???)
kevinr1
Established Member
>We assumed that this is a system wide configuration.

It affects all Share processing of HTML, including Wiki, Blog, Discussion etc.



>but I find no ‘custom-slingshot-application-context.xml’ File and I find no ‘StringUtils’ class in any Spring Context?

You create the custom-slingshot-application-context.xml file in your web-extension folder (which is part of the installed Tomcat, e.g. W:\apache-tomcat\shared\classes\alfresco\web-extension).

An example file called: custom-slingshot-application-context.xml.sample is provided which you can edit.
blog_commenter
Active Member
Hi Kevin,



thanks for your answer, but I have exactly the same problem as Arno Hagen:

Wiki pages accept 'IFRAME' tags, but Blog pages do not (and my customer would like to write blog posts with YouTube content, which does not work).

I also found this JIRA issue, which also suggests to me that the problem is something different:

https://issues.alfresco.com/jira/browse/ALF-17862



Any idea???
blog_commenter
Active Member
Dear Kevin,



thanks for the interesting article. I found that Share also applies this 'sanitisation' to any HTML file that is uploaded (4.2c CE). My customer created such a file with Word, and guess what: many tags are gone.

Also, it seems that the process does not work properly - in many cases, the opening tag is in upper case but the closing tag is still lowercase.

Since Word creates an XHTML file, the tags must (by standard) be lowercase anyway, so the process in effect breaks the document...

In principle I can understand the reasoning behind this, but it frightens people if suddenly a document is altered in Alfresco for no apparent reason.



Just my 2 cents (I will file it in Jira)



Cheers,

Daniel
blog_commenter
Active Member
In fact, someone else already did it:



https://issues.alfresco.com/jira/browse/ALF-18696
kevinr1
Established Member
Hi Daniel,



Actually I agree - however we only sanitise HTML/XHTML documents on display in the browser, not on download. So we are only modifying the content for reasons of security when it is viewed directly in the browser. If you download the document the content is not modified.



We recently upgraded the html parser library used for this process and it looks like there is a regression in that we now see mixed case start/end tags which you have identified. I will ensure this is fixed for 4.2.d.
kevinr1
Established Member
The issue has now been fixed in SpringSurf. It will appear in 4.2.d during the next trunk merge.
blog_commenter
Active Member
Dear Kevin,



thanks, I saw your activity in Jira, that's good news!

However, in our 4.2c CE, the HTML is also changed when you download the file in Share (this is how we found out).



Cheers,

Daniel
kevinr1
Established Member
Yes - that was spotted also and already fixed for 4.2.d. Thanks.
blog_commenter
Active Member
Kevin,

I need to backport this to the Enterprise version 4.0.2.9, as one of our customers wants to upload/download reports (HTML files) generated in some other tool.



For a quick test I built the spring-webscripts-1.2.0-SNAPSHOT.jar from the sources and replaced the version that comes with 4.0.2.9 with this snapshot. Unfortunately that did not work very well - got errors related to LocalWebScriptRuntimeContainer.addExtensibilityDirectives at server startup.



What are the chances of getting Alfresco support to provide a patch (generated via svn) that could cover that task?
kevinr1
Established Member
>What are the chances of getting Alfresco support to provide a patch (generated via svn) that could cover that task?
It would be possible to generate an SP with this backport in it; please raise a request via support.
blog_commenter
Active Member
This solution is useful in 4.2.d to enable the background color on tables.



BTW, the option align='center' for the table still doesn't work at all. Any solution? (I've created a post in Jira: https://issues.alfresco.com/jira/browse/ALF-20059)
udaikumar26
Customer

Need help on this. We are facing a similar issue where the Share UI is removing tags from the HTML file. It also seems to be removing attributes from certain tags; for example, we found that the content attribute of the <meta> tag is removed.

We need to remove all of the black listed tags.

We are using Alfresco 5.0.2 and need to know if this change will work in 5.0.2. Are the changes the same in v5, or do we need to do more? Kindly help.

udaikumar26
Customer

Thanks for this article. We are using Alfresco 5.2.0 and found that Alfresco is removing include tags from HTML, as well as some other information, such as the content attribute of the <meta> tag. I tried the change suggested in this article, and after the change the include tags and the content attribute of the meta tag are retained. However, even after making this change, I find that some information is removed from the DOCTYPE. For example, in the tag below everything after 'html' is removed by Alfresco.

<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>

When we save the html file in share UI, the above tag becomes,

<!DOCTYPE html>

 

Any idea why this is happening and how to avoid it? Also, I would like to understand the impact of making this change in the custom-slingshot file.

SalmanEdhi
Member II

Hi @kevinr1, does this still work? I tried it on Alfresco 6.2 and get the error below:

my-all-in-one-share_1 | 2022-05-24 12:48:21,490 WARN [context.support.XmlWebApplicationContext] [main] Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'webframework.webscripts.stringutils' defined in URL [jar:file:/usr/local/tomcat/webapps/share/WEB-INF/lib/my-all-in-one-share-1.0-SNAPSHOT.jar!/alfresco/web-extension/custom-slingshot-application-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'attributeBlackList' of bean class [org.springframework.extensions.webscripts.ui.common.StringUtils]: Bean property 'attributeBlackList' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?
my-all-in-one-share_1 | 2022-05-24 12:48:21,535 ERROR [web.context.ContextLoader] [main] Context initialization failed
my-all-in-one-share_1 | org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'webframework.webscripts.stringutils' defined in URL [jar:file:/usr/local/tomcat/webapps/share/WEB-INF/lib/my-all-in-one-share-1.0-SNAPSHOT.jar!/alfresco/web-extension/custom-slingshot-application-context.xml]: Error setting property values; nested exception is org.springframework.beans.NotWritablePropertyException: Invalid property 'attributeBlackList' of bean class [org.springframework.extensions.webscripts.ui.common.StringUtils]: Bean property 'attributeBlackList' is not writable or has an invalid setter method. Does the parameter type of the setter match the return type of the getter?