The Future of CMS Technologies

pmonks2

Julian Wraith recently started a discussion entitled 'The future of content management' that has kicked off quite a few interesting responses.



Of those, the one that really grabbed my attention was Justin Cormack's great response entitled 'CMS technology choices'.  By strange coincidence it closely echoes (but far more eloquently and in a lot more detail!) a conversation Kevin Cochrane and I had on Twitter at about the same time, and while I almost entirely agree with everything Justin has written, the Twitter conversation does highlight my one fundamental disagreement with the post.  Here's the transcript of my side of that conversation:

Managing web content is about more than simply supporting the technical constructs the web uses (REST, stateless etc.).



eg. the graph of relationships between the content items making up a site can be an important source of information for authors.



But the web itself has no direct support for graph data structures (beyond humble 'pointers': <a href> tags and the like).



And perhaps as a consequence many (most?) Web CMSes don't have support for that either. ;-)



IMNSHO the future is: schemaless (ala CouchDB, MongoDB, et al), graph based (ala Neo4J), distributed version control (ala Git).


(in hindsight I should also have mentioned 'queryable (ala RDBMS, MongoDB, etc.)')



To better describe my divergence from Justin's vision of the future: I believe that management of, and visibility into, the 'content graph' (the set of links / relationships / associations / dependencies / call-them-what-you-will) is one of the more important features a CMS can provide, particularly for web content management, where the link structure (including, but not limited to, the site's navigation model) is so integral to the consumer's final experience of the content.



So what 'content graph' features, specifically, should a hypothetical CMS provide?



In my opinion a CMS needs to support at least the following operations on the content graph (a rough code sketch of these operations follows the list):



  • Track all links between assets that are under management, in such a way that the content graph can be:



    • bi-directionally traversed ie. the CMS can quickly and efficiently answer questions such as 'which assets does asset X refer to?', 'which assets refer to asset X?'


    • used within queries ie. the CMS can quickly and efficiently answer questions such as 'show me all content items that are within 3 degrees of separation from asset X, are of type 'press release', and were published in the last month by 'Peter Monks''





  • Flag any content modifications that 'break' the content graph eg. deletion of an asset that is the target of one or more references



    • From a usability perspective our hypothetical CMS would offer the user requesting the breaking change the option to automatically 'fix' the breakages, eg. by correcting the soon-to-be invalid (dangling) links in the source item(s)





  • Support arbitrary metadata on references, preferably using the same metadata modeling language that is used for 'real' content assets


  • Support basic validity checking of external links - links that point to assets that are not under management (eg. URIs that point to other web sites)
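
To make those operations concrete, here's a minimal in-memory sketch of a content graph supporting bi-directional traversal, bounded-depth queries and breakage detection.  It's a toy illustration only (all names are hypothetical, and the equality-only filters are a simplification) - a real CMS would back this with a graph database such as Neo4J:

    from collections import defaultdict, deque

    class ContentGraph:
        def __init__(self):
            self.assets = {}                  # asset id -> metadata dict
            self.outbound = defaultdict(set)  # asset id -> ids it refers to
            self.inbound = defaultdict(set)   # asset id -> ids referring to it

        def add_asset(self, asset_id, **metadata):
            self.assets[asset_id] = metadata

        def add_link(self, source, target):
            # Track every link in both directions so traversal is cheap either way.
            self.outbound[source].add(target)
            self.inbound[target].add(source)

        def refers_to(self, asset_id):
            # 'Which assets does asset X refer to?'
            return set(self.outbound[asset_id])

        def referred_to_by(self, asset_id):
            # 'Which assets refer to asset X?'
            return set(self.inbound[asset_id])

        def within_degrees(self, asset_id, max_degrees, **filters):
            # Breadth-first search over links (in either direction), returning
            # assets within max_degrees hops whose metadata matches the given
            # equality filters (a real CMS would need richer predicates, eg.
            # date ranges for 'published in the last month').
            seen, frontier, results = {asset_id}, deque([(asset_id, 0)]), []
            while frontier:
                current, depth = frontier.popleft()
                if depth >= max_degrees:
                    continue
                for neighbour in self.outbound[current] | self.inbound[current]:
                    if neighbour in seen:
                        continue
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
                    metadata = self.assets.get(neighbour, {})
                    if all(metadata.get(k) == v for k, v in filters.items()):
                        results.append(neighbour)
            return results

        def delete_asset(self, asset_id):
            # Flag modifications that would 'break' the graph: refuse to delete
            # an asset that is still the target of one or more references.
            dangling_sources = self.inbound[asset_id]
            if dangling_sources:
                raise ValueError("deleting %r would break links from %s"
                                 % (asset_id, sorted(dangling_sources)))
            for target in self.outbound.pop(asset_id, set()):
                self.inbound[target].discard(asset_id)
            del self.assets[asset_id]

For example:

    graph = ContentGraph()
    graph.add_asset('home', type='page')
    graph.add_asset('pr-42', type='press release', author='Peter Monks')
    graph.add_link('home', 'pr-42')
    graph.within_degrees('home', 3, type='press release')  # -> ['pr-42']
    graph.delete_asset('pr-42')  # raises: still linked from 'home'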



Other than linking, I think Justin's post pretty much nails it.  I'm a big fan of schemaless repositories, having worked extensively with several 'schemaed' CMSes that made seemingly simple steps (such as adding or removing a single property from a content type that happened to have instances in existence) a lengthy exercise in frustration.
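
As a toy illustration of the schemaless point (hypothetical data, not any particular product): with document-style storage, adding a property to a content type is just a matter of writing it on the instances that need it - no migration of existing instances required:

    press_release = {'type': 'press release', 'title': 'Q3 results'}
    press_release['embargo_date'] = '2009-10-01'  # new property, no schema change
    older_item = {'type': 'press release', 'title': 'Q2 results'}  # still valid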



I'm also a big fan of 'structural' versioning (ala SVN, Git, Mercurial etc.), as it's the only way to properly support rollback in the presence of deletions.  Trying to explain to an irate user that they just deleted not only an asset but also its entire revision history is not something I particularly relish!
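
As a toy sketch of what I mean by structural versioning (hypothetical names, not any particular product's model): the repository versions the tree itself, so a deletion is just another revision of the tree, and the deleted asset's revision history survives, ready to be rolled back:

    class VersionedRepository:
        def __init__(self):
            self.history = {}           # asset id -> every revision, never purged
            self.trees = [frozenset()]  # tree revisions: which assets exist when

        def save(self, asset_id, content):
            self.history.setdefault(asset_id, []).append(content)
            self.trees.append(self.trees[-1] | {asset_id})

        def delete(self, asset_id):
            # Deletion removes the asset from the *tree*, not from history.
            self.trees.append(self.trees[-1] - {asset_id})

        def rollback(self, tree_revision):
            # Rolling back restores deleted assets, revision history and all.
            self.trees.append(self.trees[tree_revision])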



Rich query and search facilities are a given - it's one thing to put content into a CMS, but if you can't query and search that content, it's little better than a filesystem.



Replication (as in CouchDB, Git, etc.) is also an inevitable requirement for CMSes - I regularly see requirements for a CMS that can provide efficient access to documents across locations that are widely geographically distributed (including cases where connectivity to some of those locations is low bandwidth and/or intermittent).  Replication (with automatic conflict detection and sophisticated features to assist with the inevitably manual process of conflict resolution) is the only mechanism I'm aware of that can handle these cases.
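
Here's a minimal sketch of what conflict detection might look like, loosely modelled on CouchDB's approach (the revision ids below are made up): each replica keeps the document's revision history, and a conflict exists whenever two replicas' histories have diverged, ie. neither is a prefix of the other:

    def is_conflict(history_a, history_b):
        # Revision histories are lists of revision ids, oldest first.  If one
        # history simply extends the other it's a clean fast-forward; anything
        # else is a divergence to be flagged for (manual) resolution.
        shorter = min(len(history_a), len(history_b))
        return history_a[:shorter] != history_b[:shorter]

    # Both replicas edited the same document while disconnected: conflict.
    assert is_conflict(['rev-1', 'rev-2', 'rev-3a'], ['rev-1', 'rev-2', 'rev-3b'])

    # One replica is simply behind: a clean fast-forward, no conflict.
    assert not is_conflict(['rev-1', 'rev-2', 'rev-3a'], ['rev-1', 'rev-2'])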



And in closing, a big thank you to Julian Wraith for initiating this discussion - it's extremely refreshing to discover other folks who are as passionate and (if I may say) as opinionated about CMS technology as I am!
Comments
blog_commenter
Hi there, glad you liked the post!



Actually I think we are in agreement here too. There are three ways you need to use links: one, the link from the author's point of view (I want to link here, so I reference it with a URL or some sort of internal reference); two, internally, whether it is strictly a foreign key or not; and three, what you can do with (valid) links, ie. all the things you describe, which are, as you say, key things a CMS needs to give you.



I think that the problems with strict foreign keys/internal references are too big (no sharding, how do you link to a page you have not written yet, all sorts of issues), so you need URI-based links to name things (although those might not be deployment URLs - they are resource identifiers more generally). The state of being broken then has to be allowable, but with tools to help the user manage that brokenness and fix it.



Internally you do need access to the full graph of (non-broken!) links though, in order to build your IA.



Links need to be first class objects too, as they need to be permissioned and versioned (eg you cannot see in your graph a private link someone has made).
pmonks2
Justin, it was spooky how much of your post mirrored my feelings on these topics!



That said, while I agree that links should be represented as 'dumb' URIs to the author, that representation isn't sufficient to be able to support the kinds of operations I mentioned above (specifically efficient bi-directional traversal and queryability).  For those, the CMS needs to have a persistence model that directly models the graph that those URIs represent - hence the mention of Neo4J (although it's by no means the only graph database out there - CODASYL has been around for quite a while!)  ;-)



As you say, sharding is a thorny problem for interrelated data structures such as graphs - there's not necessarily a clean way to split the content graph into shards without losing the ability to efficiently traverse / query the entire graph.  But in my experience sharding is primarily a performance optimisation (ie. it is of most use in the delivery tier, where internet scale is important), while many of the operations mentioned above are primarily in support of authoring activities (where traffic is minuscule, by internet standards).  In other words these two requirements (sharding and efficient graph traversal / query, which can't coexist) don't have to intersect - the CMS can optimise the authoring environment for efficient full-graph operations (at the cost of not being able to shard), and optimise the other way around for the delivery tier.



In fact I think replication is more important in authoring than sharding, as that supports the requirement for disconnected and/or geographically distributed authoring (think multi-national corporation with authors on every continent).
blog_commenter
I keep meaning to have a serious look at Neo4J actually, but I haven't had a chance yet. I think that some sort of incremental update model, kind of like CouchDB views but for graph algorithms, would be nice. There is not a lot of literature on incremental graph algorithms though, even less than on other incremental stuff. Most edits don't change the graph much.



You are right about authoring though - there is absolutely no reason not to throw lots of dedicated CPU time at making the authoring experience easier; may as well shift the work to the client side if possible and make those multicore laptops do something. As you say this is a low traffic environment.



I notice Google Wave tries to do shared editing without merge conflicts (atomic updates only, always ordered by the server, no attempt to support disconnected operation as far as I can see), but I think you do need disconnected operation and interfaces for resolving merges.
blog_commenter
Peter,



You really need to pop into London Town so you can join me and Justin discussing this kind of thing :-) These are the kinds of posts I was hoping to see in response to Julian's meme-starting antics. I wish I'd posted something more serious than a drunken MLK babble...



Jon
blog_commenter
[...] technical decisions that have to be made that there is not yet agreement on (except between me and Peter Monks!) and the existing putative standards (CMIS and JCR) do not extend far enough to take a position [...]
pmonks2
Jon, by strange coincidence I was there (Windsor, actually) just last week.  There's a chance I'll be visiting the mother ship again later in the year, in which case it'd be great to catch up for beers and blather!
blog_commenter
[...] open up if content is stored in a more modern, non-file/folder based repository (see my earlier post for a brief discussion of the storage mechanism such a hypothetical CMS might [...]
blog_commenter
[...] NoSQL and CMS – a Match made in Heaven?  As anyone who’s visited planet Earth in the last year or so knows, the NoSQL (“Not Only SQL”) movement is rapidly gaining both momentum and mind share, despite a number of prominent detractors. Rather than entering into a lengthy debate on the general pros and cons of NoSQL technologies, I’d like to reflect on the possible applications of these technologies to the specific problems of content management, a use case that (to my mind) it seems particularly well suited to. I briefly scraped the surface of this topic in a prior post. [...]