In this video, we walk through how to use Amazon CloudWatch to monitor and alert on large language models (LLMs) deployed on Amazon Bedrock. You’ll learn how to observe default Bedrock metrics such as input tokens, output tokens, and invocation latency, and how to extend CloudWatch with custom metrics such as Time To First Token (TTFT). We also cover setting up alarms and email notifications so you can catch issues in production proactively.

You’ll learn how to:
- Monitor Bedrock LLMs using CloudWatch default metrics
- Track input tokens, output tokens, and invocation latency
- Add a custom CloudWatch metric for Time To First Token (TTFT)
- Visualize LLM performance in CloudWatch dashboards
- Configure alarms based on token usage thresholds
- Receive email alerts using Amazon SNS
- Test and validate alarms with real model traffic

Timestamps:
0:00 - Overview of CloudWatch for LLM monitoring
0:44 - Default Bedrock metrics: tokens and latency
1:22 - Tracking Time To First Token (TTFT) with custom metrics
2:27 - Viewing metrics by model ID in CloudWatch
3:35 - Understanding averages, sums, and P99 metrics
4:33 - Adding custom TTFT metrics to the dashboard
5:28 - Creating alarms on LLM metrics
7:29 - Triggering and validating alerts with SNS

Watch this video if you’re deploying LLMs on AWS, operating production AI systems, or need visibility into the cost, latency, and performance of Bedrock-hosted models.

This video is part of the LLM Engineering and Deployment Certification Program by Ready Tensor.
Enroll now: https://app.readytensor.ai/certifications/llm-engineering-and-deployment-DAROCXlj

About Ready Tensor:
Ready Tensor helps AI and ML professionals build, evaluate, and deploy intelligent systems through certifications, competitions, and real-world project publications.
Learn more: https://www.readytensor.ai/

Like the video? Subscribe and tell us what other LLM deployment or monitoring topics you’d like us to cover.
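As a rough sketch of the custom TTFT metric covered in the video (not the exact code shown on screen): assuming boto3, the Bedrock Converse streaming API, and CloudWatch `put_metric_data`, time to first token can be measured from the first streamed content chunk and published under a custom namespace. The `Custom/Bedrock` namespace, model ID, and prompt below are illustrative placeholders.

```python
import time


def elapsed_ms(start: float) -> float:
    """Milliseconds elapsed since a time.monotonic() start mark."""
    return (time.monotonic() - start) * 1000.0


def record_ttft(bedrock_runtime, cloudwatch, model_id: str, prompt: str) -> float:
    """Invoke a Bedrock model with streaming, measure time to first token,
    and publish it as a custom CloudWatch metric dimensioned by model ID."""
    start = time.monotonic()
    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    ttft_ms = None
    for event in response["stream"]:
        # The first contentBlockDelta event carries the first generated token.
        if "contentBlockDelta" in event:
            ttft_ms = elapsed_ms(start)
            break
    cloudwatch.put_metric_data(
        Namespace="Custom/Bedrock",  # illustrative custom namespace
        MetricData=[{
            "MetricName": "TimeToFirstToken",
            "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            "Value": ttft_ms,
            "Unit": "Milliseconds",
        }],
    )
    return ttft_ms


if __name__ == "__main__":
    # Live demo requires AWS credentials and Bedrock model access.
    import boto3

    record_ttft(
        bedrock_runtime=boto3.client("bedrock-runtime"),
        cloudwatch=boto3.client("cloudwatch"),
        model_id="my-model-id",  # placeholder; use your Bedrock model ID
        prompt="Hello!",
    )
```

Once published, the custom metric appears in CloudWatch alongside the default Bedrock metrics and can be added to the same dashboard.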
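For the alarm step, a minimal sketch: CloudWatch alarms can be created with `put_metric_alarm` against the default `AWS/Bedrock` metrics (e.g. `InputTokenCount`), with an Amazon SNS topic as the alarm action to deliver email alerts. The SNS topic ARN, model ID, and threshold below are placeholders, not values from the video.

```python
def token_alarm_params(model_id: str, sns_topic_arn: str, threshold: float) -> dict:
    """Build put_metric_alarm kwargs that fire when the hourly sum of
    input tokens for one model exceeds a threshold."""
    return {
        "AlarmName": f"bedrock-input-tokens-{model_id}",
        "Namespace": "AWS/Bedrock",          # default Bedrock metric namespace
        "MetricName": "InputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",                  # total tokens, not the average
        "Period": 3600,                      # evaluate over one-hour windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],     # SNS topic with an email subscription
    }


if __name__ == "__main__":
    # Requires AWS credentials and an existing SNS topic.
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**token_alarm_params(
        model_id="my-model-id",  # placeholder
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:llm-alerts",  # placeholder
        threshold=100_000,
    ))
```

To validate, send real traffic to the model until the token sum crosses the threshold, as demonstrated at the end of the video; the alarm transitions to ALARM and SNS emails the subscribers.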