How can we run tests on production data without copying large amounts of data all the time?
We are thinking about integrating Alfresco Content Services (ACS) into one of our applications and are now confronted with the question of how to approach development, in particular integration testing. We expect ACS to hold large amounts of data at some point in the future, which will make copying data to staging/test systems impractical. Ideally, we could fire up a throw-away instance for development/testing that reads from a production (or at least staging) system and copies data on write, so that our tests never corrupt the source data.
Has anyone had this issue before? How would you go about this problem?
We were thinking that we could implement a custom content store, but this would only partially solve our problem because metadata would still come from the working store, which is based on the RDBMS. This would result in discrepancies between the file contents and the metadata. We could also provide a custom implementation of the NodeService, but it is not a supported extension point and we want to stay compatible with future ACS versions. Does anyone have experience with custom implementations of the NodeService?
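To make the copy-on-write idea concrete, here is a minimal, language-agnostic sketch in Python of the behaviour we have in mind for the content side. It is not Alfresco code: `CopyOnWriteStore`, its methods, and the example URL are hypothetical names chosen for illustration, and a plain dict stands in for the shared production store. It covers only binary content, not metadata, which is exactly the gap described above.

```python
class CopyOnWriteStore:
    """Reads fall through to a read-only base; writes and deletes stay in an overlay."""

    _DELETED = object()  # tombstone marking a key as deleted in the overlay only

    def __init__(self, base):
        self._base = base    # shared data (production/staging); never mutated here
        self._overlay = {}   # throw-away changes made by the test instance

    def read(self, key):
        # Prefer the overlay so that test writes shadow the shared data.
        if key in self._overlay:
            value = self._overlay[key]
            if value is self._DELETED:
                raise KeyError(key)
            return value
        return self._base[key]

    def write(self, key, content):
        # Writes never reach the base store.
        self._overlay[key] = content

    def delete(self, key):
        self.read(key)  # raises KeyError if the key is not currently visible
        self._overlay[key] = self._DELETED

    def reset(self):
        # Discard every change made through this instance.
        self._overlay.clear()


# Hypothetical content URL; the dict plays the role of the shared store.
base = {"store://2024/report.pdf": b"original bytes"}
store = CopyOnWriteStore(base)
store.write("store://2024/report.pdf", b"changed by a test")
```

After the write, reads through `store` return the changed bytes while `base` still holds the original, and `reset()` throws the test changes away.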
Re: How can we run tests on production data without copying large amounts of data all the time?
I don't know why you would consider customizing the NodeService to help with the "test against production data" problem.
Everyone faces the "how can my test data look as close to production as possible" problem. You can either approximate the production data with a test set that you re-load into a fresh repository when you are ready to test, or you can snapshot your database and content store volumes, spin up a clone of the production server, create volumes from the snapshots, attach them, and then run your tests.
Of course, the larger your production data gets, the more time it takes to snapshot and clone the data, but that's one of the costs you pay to test against actual production data.
One thing you might think about is this: if the actual content (the binary files and the metadata) affects the functioning of the application so much that testing with anything but real production data is not sufficient, perhaps you are asking Alfresco to do too much and the architecture should be revisited.
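As a toy illustration of the snapshot-and-clone workflow, here is a small Python sketch that copies a snapshot directory to a temporary location, hands the copy to the tests, and deletes it afterwards. The function name `throwaway_clone` and the directory layout are my own assumptions. This does a full file copy, so it also shows where the time goes as the data grows; a real setup would use the storage layer's volume snapshots for the database and the content store instead.

```python
import contextlib
import pathlib
import shutil
import tempfile


@contextlib.contextmanager
def throwaway_clone(snapshot_dir):
    """Copy a snapshot directory to a temp location and remove the copy on exit."""
    with tempfile.TemporaryDirectory(prefix="acs-clone-") as tmp:
        clone = pathlib.Path(tmp) / "data"
        # Full copy: the cost grows with the size of the snapshot.
        shutil.copytree(snapshot_dir, clone)
        yield clone
    # Leaving the TemporaryDirectory block deletes the clone and everything in it.
```

Tests would run inside `with throwaway_clone(path) as clone:` and point the test instance at `clone`; whatever they change disappears when the block exits, and the snapshot itself is never touched.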
Integration tests using the Alfresco SDK work quite well, and they can be developed to import a consistent and reusable test data set on each run. To me, this kind of setup is far preferable.
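To show what I mean by a consistent, reusable data set, here is a rough sketch in Python. It is not Alfresco SDK code (SDK integration tests are written in Java): `build_test_documents`, `InMemoryRepository`, and `load_fixture` are hypothetical names, and the in-memory class merely stands in for a fresh repository. The point is only that a fixed seed yields the same documents on every run.

```python
import random


def build_test_documents(seed=42, count=5):
    """Generate the same list of fake documents for a given seed."""
    rng = random.Random(seed)  # private generator, so the output depends only on the seed
    doc_types = ["invoice", "contract", "report"]
    docs = []
    for i in range(count):
        doc_type = rng.choice(doc_types)
        docs.append({
            "name": f"{doc_type}-{i:03d}.txt",
            "properties": {"type": doc_type, "pages": rng.randint(1, 20)},
            "content": f"Sample {doc_type} number {i}".encode("utf-8"),
        })
    return docs


class InMemoryRepository:
    """Stand-in for a fresh, empty repository created for one test run."""

    def __init__(self):
        self.nodes = {}

    def create_document(self, name, properties, content):
        if name in self.nodes:
            raise ValueError(f"duplicate document name: {name}")
        self.nodes[name] = {"properties": properties, "content": content}


def load_fixture(repo, docs):
    """Import every generated document into the given repository."""
    for doc in docs:
        repo.create_document(doc["name"], doc["properties"], doc["content"])
    return repo


# Each test run starts from an empty repository and the same data.
repo = load_fixture(InMemoryRepository(), build_test_documents())
```

Because the data is generated rather than copied, the set stays small, lives in version control along with the tests, and can be extended whenever production reveals a case the tests missed.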