At the end of my last post I alluded to an improved ingestion pipeline using Step Functions. This post looks at how we can use Comprehend, the recently announced AWS service, to analyse text files.
The updated architecture is shown below.
The Lambda function that fetches the content sets a flag to indicate whether the content is an image. The Step Function definition uses this flag to decide whether to call the ProcessImage or the ProcessText Lambda function, as shown in the Step Function definition below:
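As a rough illustration of that branching (a sketch, not the definition from the repository), the Choice state could look something like the following, written here as a Python dictionary in Amazon States Language form. Only ProcessImage and ProcessText are names from this post; the FetchContent and ImageOrText state names, the isImage field and the ARN placeholders are assumptions made for the example.

```python
import json

# Hypothetical state machine definition. Only ProcessImage and ProcessText
# are named in the post; every other name and ARN here is a placeholder.
definition = {
    "StartAt": "FetchContent",
    "States": {
        "FetchContent": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FetchContent",
            "Next": "ImageOrText",
        },
        "ImageOrText": {
            "Type": "Choice",
            "Choices": [
                # Assumed flag name set by the fetching Lambda function.
                {"Variable": "$.isImage", "BooleanEquals": True, "Next": "ProcessImage"}
            ],
            "Default": "ProcessText",
        },
        "ProcessImage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ProcessImage",
            "End": True,
        },
        "ProcessText": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ProcessText",
            "End": True,
        },
    },
}


def next_state(choice_state, state_input):
    """Tiny local stand-in for the Choice state, for illustration only.

    It understands just BooleanEquals rules on top-level fields ("$.name").
    """
    for rule in choice_state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading "$."
        if state_input.get(field) is rule["BooleanEquals"]:
            return rule["Next"]
    return choice_state["Default"]


if __name__ == "__main__":
    # The JSON form is what would be supplied as the state machine definition.
    print(json.dumps(definition, indent=2))
```

Calling next_state(definition["States"]["ImageOrText"], {"isImage": False}) walks the same decision locally and falls through to the default ProcessText branch.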
We covered the behaviour of the ProcessImage function in the last post. The new ProcessText Lambda function takes the text and sends it to Comprehend to detect entities and to perform sentiment analysis. The function then looks for Person, Location and Date entities in the text and compares the positive and negative scores from the sentiment analysis to determine the values for the properties on the acme:insuranceClaimReport custom type.
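The logic described above can be sketched in Python roughly as follows. This is a minimal illustration rather than the code from the repository: the function names and the keys of the returned dictionary are hypothetical, and picking the highest-scoring entity of each type is an assumption about how a single value might be chosen. The detect_entities and detect_sentiment calls and the response fields they read are part of the Comprehend API as exposed by boto3.

```python
def best_entity(entities, entity_type):
    """Return the text of the highest-scoring entity of a type, or None."""
    matches = [e for e in entities if e["Type"] == entity_type]
    if not matches:
        return None
    return max(matches, key=lambda e: e["Score"])["Text"]


def summarise(entities_response, sentiment_response):
    """Reduce the two Comprehend responses to a few property values.

    The keys below are illustrative; the real property names on
    acme:insuranceClaimReport are not given in the post.
    """
    entities = entities_response["Entities"]
    scores = sentiment_response["SentimentScore"]
    return {
        "person": best_entity(entities, "PERSON"),
        "location": best_entity(entities, "LOCATION"),
        "date": best_entity(entities, "DATE"),
        # Compare the positive and negative scores; a tie counts as negative.
        "sentiment": "positive" if scores["Positive"] > scores["Negative"] else "negative",
    }


def process_text(text, comprehend, language_code="en"):
    """Send the text to Comprehend and summarise the results.

    `comprehend` is any object with detect_entities and detect_sentiment
    methods, for example a boto3 Comprehend client.
    """
    entities_response = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    sentiment_response = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    return summarise(entities_response, sentiment_response)
```

In a Lambda function the client would typically be created once with boto3.client("comprehend") and passed in. Taking the client as a parameter also makes the function easy to exercise with a stub.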
Everything required to deploy and run the demo is available in this GitHub repository. Clone the repository to your local machine using the command below and follow the deployment instructions.
git clone https://github.com/gavincornwell/firehose-step-functions-demo.git .
Once the demo is deployed and running, follow the detailed demo steps to upload images and text files to the system and watch the metadata in Alfresco update automatically.
I'll be doing a live demo of this at the forthcoming DevCon in Lisbon, Portugal. Hope to see you there!