Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications, simplifying development with security, privacy and responsible AI. Amazon Bedrock leverages AWS Lambda for invoking actions, Amazon S3 for training and validation data, and Amazon CloudWatch for tracking metrics.
With Amazon Bedrock, you can get a summary of textual content such as articles, blog posts, books, and documents to get the gist without having to read the full content. Other use cases that Amazon Bedrock can quickly get you started on are listed in the Amazon Bedrock FAQ.
In this post, we describe how to use the Alfresco Content Services REST API and the Amazon Bedrock SDK APIs to summarize the textual content of a document stored in an Alfresco repository. We assume that you are competent at developing Java software using these APIs.
We first add an aspect to the repository and apply it to the documents we want summarized. Next, the AmazonBedrockSummarizationAlfresco Lambda function is triggered; it queries the repository for documents marked for summarization and extracts their textual content. The prompt is read from the crestBedrock:generateSummary property, concatenated with the textual content, and sent to Amazon Bedrock. If the length of the textual content is less than ArbitrarySynchronousBedrockInvocationLength, Amazon Bedrock’s synchronous API is used; otherwise, Amazon Bedrock’s batch API is used. Additionally, if the length of the textual content is greater than BedrockBatchClaudePromptMaxLength, the textual content is truncated to fit Amazon Bedrock’s allowable prompt length.
A separate Lambda function, AmazonBatchedBedrockSummarizationAlfresco, retrieves JSONL files from Amazon S3 and processes them with Amazon Bedrock’s batch API. Amazon Bedrock writes completed batches back to Amazon S3, and the same Lambda function retrieves them and writes the generated summaries into the crestBedock:summary attributes of the respective document nodes in the Alfresco repository.
If the textual content is shorter than ArbitrarySynchronousBedrockInvocationLength, the Amazon Bedrock runtime synchronous API is used to generate the summary. If the textual content is longer than BedrockBatchClaudePromptMaxLength, it is truncated before it is packaged and staged at s3Uri for asynchronous batch processing. Textual content longer than ArbitrarySynchronousBedrockInvocationLength but shorter than BedrockBatchClaudePromptMaxLength is packaged and staged at the same s3Uri without truncation.
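The length-based routing described above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class name SummarizationRouter, the Route enum, and the threshold values used in main are hypothetical, and the sketch assumes that length is measured in Java characters (the post does not say whether characters, bytes, or tokens are counted).

```java
/** Hypothetical sketch of the length-based routing described in the post. */
final class SummarizationRouter {

    /** The three outcomes the post describes. */
    enum Route { SYNCHRONOUS, BATCH, BATCH_TRUNCATED }

    private SummarizationRouter() {}

    /**
     * Chooses a route from the text length.
     * syncThreshold stands in for ArbitrarySynchronousBedrockInvocationLength,
     * batchMaxLength stands in for BedrockBatchClaudePromptMaxLength.
     */
    static Route route(int textLength, int syncThreshold, int batchMaxLength) {
        if (syncThreshold < 0 || batchMaxLength < syncThreshold) {
            throw new IllegalArgumentException("expected 0 <= syncThreshold <= batchMaxLength");
        }
        if (textLength < syncThreshold) {
            return Route.SYNCHRONOUS;      // short text: synchronous runtime API
        }
        if (textLength > batchMaxLength) {
            return Route.BATCH_TRUNCATED;  // too long: truncate, then stage for batch
        }
        return Route.BATCH;                // in between: stage for batch unchanged
    }

    /** Returns the text cut to at most maxLength characters (assumption: plain character truncation). */
    static String truncate(String text, int maxLength) {
        return text.length() <= maxLength ? text : text.substring(0, maxLength);
    }

    public static void main(String[] args) {
        int syncThreshold = 1_000;     // illustrative value only
        int batchMaxLength = 100_000;  // illustrative value only
        for (int length : new int[] {500, 50_000, 250_000}) {
            System.out.println(length + " -> " + route(length, syncThreshold, batchMaxLength));
        }
    }
}
```

Keeping the decision in a pure function like this makes the boundary cases (text exactly at either threshold) easy to unit test before any call to Amazon Bedrock is made. Note that cutting a Java string at an arbitrary index can split a surrogate pair, so production code may want a more careful truncation.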
The string contained in modelOutput.completion is used to update the Alfresco document node’s crestBedock:summary attribute. The Alfresco node is identified in the name of the *.jsonl.out object as the substring after OutputRandomizerPrefixLength and before the .jsonl.out extension.
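As an illustration of the naming convention just described, the following sketch recovers a node identifier from the name of a *.jsonl.out object. The class and method names are hypothetical, the prefix length is passed in as a parameter standing in for OutputRandomizerPrefixLength, and the sketch assumes that any S3 key prefix before the last "/" is not part of the object name.

```java
/** Hypothetical helper that recovers an Alfresco node ID from a batch output object key. */
final class NodeIdParser {

    static final String OUTPUT_SUFFIX = ".jsonl.out";

    private NodeIdParser() {}

    /**
     * Returns the substring of the object name that follows the randomizer prefix
     * and precedes the .jsonl.out extension.
     * randomizerPrefixLength stands in for OutputRandomizerPrefixLength.
     */
    static String nodeIdFromObjectKey(String objectKey, int randomizerPrefixLength) {
        // Assumption: anything up to the last '/' is an S3 "folder" prefix, not part of the name.
        String name = objectKey.substring(objectKey.lastIndexOf('/') + 1);
        if (!name.endsWith(OUTPUT_SUFFIX)) {
            throw new IllegalArgumentException("not a .jsonl.out object: " + objectKey);
        }
        int end = name.length() - OUTPUT_SUFFIX.length();
        if (randomizerPrefixLength < 0 || randomizerPrefixLength >= end) {
            throw new IllegalArgumentException("object name too short for prefix length: " + name);
        }
        return name.substring(randomizerPrefixLength, end);
    }

    public static void main(String[] args) {
        // Made-up key: an 8-character random prefix followed by a made-up node ID.
        String key = "output/a1b2c3d41a0b110f-1e09-4ca2-b367-fe25e4964a4e.jsonl.out";
        System.out.println(nodeIdFromObjectKey(key, 8));
    }
}
```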
See this page. This project places the user’s instruction, read from crestBedrock:prompt, after the text extracted from the document when prompting Amazon Bedrock.
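The ordering described here, document text first and instruction last, might be assembled as in the sketch below. The class name is hypothetical, and the "Human:"/"Assistant:" framing is an assumption based on the Anthropic Claude text completions prompt format; the post does not show the exact prompt string the project sends.

```java
import java.util.Objects;

/** Hypothetical prompt builder that places the instruction after the document text. */
final class SummaryPromptBuilder {

    // Assumption: Claude text completions framing; the project's exact framing is not shown in the post.
    static final String HUMAN_TURN = "\n\nHuman: ";
    static final String ASSISTANT_TURN = "\n\nAssistant:";

    private SummaryPromptBuilder() {}

    /**
     * Builds a prompt with the extracted document text first and the user's
     * instruction (for example, the value read from crestBedrock:prompt) last.
     */
    static String build(String documentText, String instruction) {
        Objects.requireNonNull(documentText, "documentText");
        Objects.requireNonNull(instruction, "instruction");
        return HUMAN_TURN
                + documentText.trim()
                + "\n\n"
                + instruction.trim()
                + ASSISTANT_TURN;
    }

    public static void main(String[] args) {
        String prompt = build(
                "Alfresco Content Services stores documents as nodes in a repository.",
                "Summarize the text above in one sentence.");
        System.out.println(prompt);
    }
}
```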
This post walks through the salient points in the design to enable document summarization on Alfresco Content Services with Amazon Bedrock.
The sample code, software libraries, command line tools, proofs of concept, templates, scripts, or other related technology (including any of the foregoing that are provided) is provided “as is” without warranty, representation, or guarantee of any kind. All that is provided in the post is without obligation of the author or anyone to provide any support, update, enhance, or guarantee its functionality, quality, or performance. You use the technology in this post at your own risk. Neither the author nor anyone else except you is liable or responsible for any issues arising from errors, omissions, inaccuracies or your use of this post. You are solely responsible for reviewing, testing, validating and determining the suitability of this post for your own purposes. By utilizing this post, you release the author and anyone else from any liability related to your use or implementation of it. You should not use this content in your production accounts, or on production or other critical data.
You are responsible for testing, securing, and optimizing this content, such as sample code and/or template, as appropriate for production-grade use based on your specific quality control practices and standards. Deploying this content may incur AWS charges for creating or using AWS chargeable resources, such as running inference on Amazon Bedrock or programs in AWS Lambda.