Chatting up CloudWatch

Building a conversational Slack bot I use all the time with Dialogflow

Jordan Place
Aug 29th, 2019

When something goes awry in production, I’m itching to investigate. I love debugging, and the hunt is on. But, nearly every time, my momentum is stymied by the same foe: the AWS CloudWatch UI.

There’s no gentle way to put this — I’m really bad at the CloudWatch UI. There are so many “gotchas” in navigating it! I’m constantly surprised by the log filtering syntax. I look up logs by UTC timestamp when I mean to use local time. I accidentally select misleading axes for graphs (and then stare at them for way too long). Despite years of practice, my CloudWatch repertoire remains limited.

I’ve wondered, what would a better CloudWatch experience look like? Hmmm, how about an intern to use the UI for me? I could ask them things like:

  • “Hey, get the stack trace for request id dbe3e44a-480f-4cba-b100-c81dd2ed9348.”
  • “Show me the number of requests between 5-6PM last night.”
  • “After you grab me that coffee, find out how many times this error occurred yesterday.”

Fortunately for them, our interns work on more significant projects. And, it turns out I can get this same experience with just some automation and NLP. It took a few hours of experimentation in Transposit to yield my debugging Slack bot, IggyIgz.

IggyIgz uses Dialogflow, so I can chat with it instead of treating it like a glorified CLI. It uses Transposit to pull data from CloudWatch, so our conversations are actually useful (sorry SmarterChild). I get to start debugging in Slack and escalate to the CloudWatch UI only for dire situations. Perfect.

In the rest of this blog post, I’ll walk through how I built IggyIgz. Use this as a roadmap to build a useful NLP chatbot for your own data.

Dialogflow

Dialogflow is a platform for building conversational interfaces. You pass it natural language input and train it to understand a set of intents.

IggyIgz is an agent that understands just one intent: search CloudWatch logs by requestId. This intent requires two parameters:

  • requestId – Find log statements for this HTTP request
  • instance – Search either “staging” or “production” log streams

To start, I wrote out phrases that I thought should indicate this intent. I annotated them to train my agent to parse parameters.
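In Dialogflow's v2 API, an annotated training phrase is stored as a list of parts: plain text parts, plus annotated parts that carry an `entityType` and an `alias` naming the parameter. The sketch below converts a hypothetical `[text](parameter)` shorthand into that shape. The shorthand, the helper name `toTrainingPhraseParts`, and the entity types (`@sys.any` for `requestId`, a custom `@instance` entity) are assumptions for illustration, not how IggyIgz was actually configured.

```javascript
// Hypothetical mapping from parameter name to Dialogflow entity type.
const ENTITY_TYPES = {
  requestId: "@sys.any",
  instance: "@instance",
};

// Convert "find [abc](requestId) in [staging](instance)" into Dialogflow-style
// training phrase parts: plain text parts plus annotated parts (entityType/alias).
function toTrainingPhraseParts(annotated) {
  const parts = [];
  const pattern = /\[([^\]]+)\]\(([^)]+)\)/g;
  let lastIndex = 0;
  let match;
  while ((match = pattern.exec(annotated)) !== null) {
    // Plain text between the previous annotation and this one.
    if (match.index > lastIndex) {
      parts.push({ text: annotated.slice(lastIndex, match.index) });
    }
    const [, text, alias] = match;
    const entityType = ENTITY_TYPES[alias];
    if (!entityType) {
      throw new Error(`Unknown parameter: ${alias}`);
    }
    parts.push({ text, entityType, alias, userDefined: true });
    lastIndex = pattern.lastIndex;
  }
  // Trailing plain text after the last annotation.
  if (lastIndex < annotated.length) {
    parts.push({ text: annotated.slice(lastIndex) });
  }
  return { type: "EXAMPLE", parts };
}
```

For example, `toTrainingPhraseParts("get logs for [dbe3e44a-480f-4cba-b100-c81dd2ed9348](requestId) on [staging](instance)")` yields four parts, two of them annotated. In practice the Dialogflow console builds these for you when you highlight words in a training phrase.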

Dialogflow gave me fine-grained control over parameters. I could teach my agent to understand synonyms, assume default values, or even follow up if a required parameter was omitted from a chat message.
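Dialogflow handles synonyms, defaults, and follow-up prompts itself once they are configured in its console. To make those three behaviors concrete, here is a rough local equivalent. The synonym table, the `production` default, and the assumption that request ids are UUIDs (like the one in the examples above) are illustrative choices, not IggyIgz's real configuration.

```javascript
// Hypothetical synonym table: every accepted spelling maps to a canonical instance.
const INSTANCE_SYNONYMS = {
  production: "production",
  prod: "production",
  live: "production",
  staging: "staging",
  stage: "staging",
};
const DEFAULT_INSTANCE = "production"; // assumed default when none is given
// Assumes request ids are UUIDs.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns { ok: true, params } when everything resolves, or
// { ok: false, prompt } with the follow-up question to ask the user.
function resolveParameters(raw) {
  // Missing and empty-string parameters are treated the same way.
  const requestId = (raw.requestId || "").trim();
  if (!requestId) {
    return { ok: false, prompt: "Which request id should I look up?" };
  }
  if (!UUID_PATTERN.test(requestId)) {
    return { ok: false, prompt: `"${requestId}" doesn't look like a request id. Can you double-check it?` };
  }
  const spoken = (raw.instance || "").trim().toLowerCase();
  const instance = spoken ? INSTANCE_SYNONYMS[spoken] : DEFAULT_INSTANCE;
  if (!instance) {
    return { ok: false, prompt: "Should I search staging or production?" };
  }
  return { ok: true, params: { requestId, instance } };
}
```

The request id is deliberately left in its original case, since CloudWatch filter patterns match case-sensitively.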

I used Dialogflow’s Slack integration to expose my agent as a Slack bot. I could DM @IggyIgz to test out the parsing. Every message received was saved as training data, so I could correct mistakes to improve the agent.

Transposit

After about 100 training messages, my Slack bot reliably understood me. But it didn’t yet know how to respond.

To let it respond, I configured a Dialogflow webhook to call out to Transposit for intent fulfillment.

I tested with a simple webhook that echoed into Slack. My bot was working!

// fulfillment
({ http_event }) => {
  return {
    status_code: 200,
    headers: { "Content-Type": "application/json" },
    body: {
      payload: {
        slack: {
          text: `\`\`\`${JSON.stringify(http_event.parsed_body, null, 2)}\`\`\``
        }
      }
    }
  };
};
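Because the echo handler is a pure function, it can also be exercised outside Transposit by feeding it a hand-built event. In this sketch only the `parsed_body` field of Transposit's `http_event` is modeled; its contents follow the shape of a Dialogflow v2 webhook request, and every id and the `search_logs_by_request_id` intent name are placeholders.

```javascript
// The echo fulfillment, given a name so it can be called directly.
const echoFulfillment = ({ http_event }) => {
  return {
    status_code: 200,
    headers: { "Content-Type": "application/json" },
    body: {
      payload: {
        slack: {
          text: `\`\`\`${JSON.stringify(http_event.parsed_body, null, 2)}\`\`\``
        }
      }
    }
  };
};

// Stand-in for the event Transposit passes in (placeholder ids throughout).
const sampleEvent = {
  parsed_body: {
    responseId: "example-response-id",
    session: "projects/example-project/agent/sessions/example-session",
    queryResult: {
      queryText: "find request dbe3e44a-480f-4cba-b100-c81dd2ed9348 on staging",
      parameters: { requestId: "dbe3e44a-480f-4cba-b100-c81dd2ed9348", instance: "staging" },
      allRequiredParamsPresent: true,
      intent: {
        name: "projects/example-project/agent/intents/example-intent-id",
        displayName: "search_logs_by_request_id"
      },
      languageCode: "en"
    }
  }
};

const response = echoFulfillment({ http_event: sampleEvent });
// Prints the request body wrapped in a Slack code block.
console.log(response.body.payload.slack.text);
```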

As a last step, I changed the webhook to act on the intent of the user. I used our AWS CloudWatch data connector to query logs. SQL was the easiest way to find log groups and filter log statements by requestId.

-- describe_log_groups
SELECT * FROM aws_cloudwatch_logs.describe_log_groups
-- ---->
-- [
--   {
--     "arn": "arn:aws:logs:us-west-2:967604848322:log-group:/aws/ecs/staging/web:*",
--     "creationTime": 1554959217198,
--     "logGroupName": "/aws/ecs/staging/web",
--     "metricFilterCount": 15,
--     "storedBytes": 892687097
--   },
--   ...
-- ]

-- describe_log_streams
SELECT logStreamName FROM aws_cloudwatch_logs.describe_log_streams
  WHERE $body.logGroupName="/aws/ecs/" + @instance + "/web"
  AND $body.orderBy="LastEventTime"
  AND $body.descending=TRUE
  LIMIT 1

-- filter_log_events
SELECT * FROM aws_cloudwatch_logs.filter_log_events
  WHERE $body.logGroupName="/aws/ecs/" + @instance + "/web"
  AND $body.filterPattern='"' + @requestId + '"'
  AND $body.logStreamNames = (
    SELECT [logStreamName] FROM this.describe_log_streams
    WHERE instance = @instance
  )
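For readers not using Transposit, the two queries map onto the CloudWatch Logs DescribeLogStreams and FilterLogEvents API parameters. The sketch below builds those parameter objects with hypothetical helper names; it does not call AWS. The filter pattern is wrapped in double quotes because CloudWatch requires terms containing non-alphanumeric characters (such as the hyphens in a UUID) to be quoted.

```javascript
// Mirrors the log-group naming convention used in the SQL.
function logGroupName(instance) {
  return `/aws/ecs/${instance}/web`;
}

// DescribeLogStreams parameters: only the most recently active stream.
function describeLatestStreamParams(instance) {
  return {
    logGroupName: logGroupName(instance),
    orderBy: "LastEventTime",
    descending: true,
    limit: 1,
  };
}

// FilterLogEvents parameters for an exact-term match on the request id.
function filterByRequestIdParams(instance, requestId, logStreamNames) {
  if (requestId.includes('"')) {
    // A stray quote would end the quoted term early, so refuse it outright.
    throw new Error("requestId must not contain double quotes");
  }
  return {
    logGroupName: logGroupName(instance),
    logStreamNames,
    // Quoted so the hyphenated id is matched as a single term.
    filterPattern: `"${requestId}"`,
  };
}
```

One design trade-off worth noting: pinning `logStreamNames` to the single most recently active stream keeps the search cheap, but a request handled by a different ECS task (and therefore a different stream) would be missed. Omitting `logStreamNames` searches the whole log group at the cost of a slower scan.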

I used JavaScript to format this data as a Slack message and post a response via Dialogflow.

// fulfillment
({ http_event }) => {
  const parsed_body = http_event.parsed_body;
  // Full intent resource name; unused while IggyIgz knows only one intent.
  const intent = parsed_body.queryResult.intent.name;
  const parameters = parsed_body.queryResult.parameters;

  // fetch logs from AWS
  const log_events = api.run("this.filter_log_events", {
    instance: parameters.instance,
    requestId: parameters.requestId
  });

  // format a message for slack
  const message = [
    {
      type: "section",
      text: {
        type: "mrkdwn",
        text: `I searched \`${parameters.instance}\` for request \`${parameters.requestId}\` :`
      }
    }
  ];
  if (log_events.length === 0) {
    message.push({
      type: "section",
      text: {
        type: "mrkdwn",
        text: "_No logs matched_ :cry:"
      }
    });
  }
  for (const log_event of log_events) {
    // Show only the first 7 characters of the stream name to keep the message compact.
    const short_log_stream_name = log_event.logStreamName.substring(0, 7);
    const log_message = log_event.message;
    message.push({
      type: "section",
      text: {
        type: "mrkdwn",
        text: `_(${short_log_stream_name}...)_\n\`\`\`${log_message}\`\`\``
      }
    });
  }

  // post message to slack
  return {
    status_code: 200,
    headers: { "Content-Type": "application/json" },
    body: {
      payload: {
        slack: {
          attachments: [
            {
              blocks: message
            }
          ]
        }
      }
    }
  };
};
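The fulfillment pushes one section block per log event with no size checks. Slack's Block Kit documents a 3,000-character limit on section text and a 50-block limit per message, so a noisy request could produce a message Slack rejects. Here is a sketch of the same formatting with both limits enforced; the helper names are hypothetical, and truncating (rather than splitting into several messages) is just one possible choice.

```javascript
const MAX_SECTION_TEXT = 3000; // Slack's documented limit for section text
const MAX_BLOCKS = 50;         // Slack's documented per-message block limit

// Shorten text so that, including the suffix, it fits in `max` characters.
function truncate(text, max) {
  const suffix = "\n…(truncated)";
  return text.length <= max ? text : text.slice(0, max - suffix.length) + suffix;
}

// One section block for a log event, kept within the section text limit.
function logSection(logEvent) {
  const shortStream = logEvent.logStreamName.substring(0, 7);
  const fence = "```";
  const prefix = `_(${shortStream}...)_\n${fence}`;
  // Leave room for the prefix and the closing fence.
  const body = truncate(logEvent.message, MAX_SECTION_TEXT - prefix.length - fence.length);
  return { type: "section", text: { type: "mrkdwn", text: `${prefix}${body}${fence}` } };
}

function formatLogBlocks(instance, requestId, logEvents) {
  const blocks = [
    { type: "section", text: { type: "mrkdwn", text: `I searched \`${instance}\` for request \`${requestId}\`:` } },
  ];
  if (logEvents.length === 0) {
    blocks.push({ type: "section", text: { type: "mrkdwn", text: "_No logs matched_ :cry:" } });
    return blocks;
  }
  // Reserve one block for the header and one for a possible "more" notice.
  const room = MAX_BLOCKS - 2;
  for (const logEvent of logEvents.slice(0, room)) {
    blocks.push(logSection(logEvent));
  }
  if (logEvents.length > room) {
    blocks.push({ type: "section", text: { type: "mrkdwn", text: `_…and ${logEvents.length - room} more log lines_` } });
  }
  return blocks;
}
```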

I committed my code and my Slack bot became functional!

Next steps

Beyond what I’ve outlined here, there are lots of ways to make IggyIgz the best CloudWatch experience.

Teach new intents

Teach IggyIgz to take over more CloudWatch tedium:

  • Search CloudWatch logs within a timespan
  • Check the frequency of a log statement
  • Graph a CloudWatch metric
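Searching within a timespan runs straight into the UTC-versus-local-time confusion from the start of this post. FilterLogEvents takes `startTime` and `endTime` as milliseconds since the Unix epoch, so a bot has to convert "between 5-6PM last night" into absolute timestamps. This sketch assumes the caller supplies a fixed UTC offset (for example -420 minutes for UTC-7); handling daylight-saving transitions properly would need a timezone database, which is out of scope here.

```javascript
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_DAY = 24 * 60 * MS_PER_MINUTE;

// Convert a wall-clock hour on a date, in a zone with a fixed UTC offset,
// into epoch milliseconds. utcOffsetMinutes is e.g. -420 for UTC-7.
function wallClockToEpochMillis(year, month, day, hour, utcOffsetMinutes) {
  // Date.UTC reads the fields as UTC; subtracting the offset shifts to true UTC.
  return Date.UTC(year, month - 1, day, hour) - utcOffsetMinutes * MS_PER_MINUTE;
}

// Build the startTime/endTime pair that FilterLogEvents expects.
function hourRange({ year, month, day, startHour, endHour, utcOffsetMinutes }) {
  const startTime = wallClockToEpochMillis(year, month, day, startHour, utcOffsetMinutes);
  let endTime = wallClockToEpochMillis(year, month, day, endHour, utcOffsetMinutes);
  // A range like 11PM to 1AM crosses midnight, so the end falls on the next day.
  if (endTime <= startTime) endTime += MS_PER_DAY;
  return { startTime, endTime };
}
```

For 5-6PM on Aug 28, 2019 at UTC-7, `hourRange({ year: 2019, month: 8, day: 28, startHour: 17, endHour: 18, utcOffsetMinutes: -420 })` starts at midnight UTC on Aug 29.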

Create each new intent in Dialogflow. Then, write code in Transposit to fulfill them.
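Once there is more than one intent, the fulfillment needs to route on which intent Dialogflow matched. The current fulfillment reads `queryResult.intent.name`, which in Dialogflow v2 is the full resource path (`projects/.../agent/intents/<id>`). This sketch routes on the human-readable `displayName` instead; the intent names and handler bodies are hypothetical placeholders. Keying on the id instead would survive an intent being renamed in the console.

```javascript
// Hypothetical handlers, one per Dialogflow intent. Each receives the
// parameters Dialogflow extracted and returns Slack message text.
const handlers = {
  search_logs_by_request_id: (p) => `Searching ${p.instance} for ${p.requestId}...`,
  search_logs_in_timespan: (p) => `Searching ${p.instance} between ${p.start} and ${p.end}...`,
};

// Route a Dialogflow v2 webhook request body to the matching handler.
function dispatch(parsedBody) {
  const { intent, parameters } = parsedBody.queryResult;
  const handler = handlers[intent.displayName];
  if (!handler) {
    return `Sorry, I don't know how to handle "${intent.displayName}" yet.`;
  }
  return handler(parameters);
}
```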

Improve responsiveness

When IggyIgz takes a moment to respond, have it post a quick message: “Hold on a sec! Lemme see…”

Use our Slack data connector to post an ephemeral message instead of responding through Dialogflow. This will give you much more control over the conversation.
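The Transposit Slack connector's own operation names aren't covered in this post, so this sketch goes straight to the Slack Web API method `chat.postEphemeral`, which shows a message only to one user in one channel. It assumes you already have a bot token and the Slack channel and user ids at hand; the helper names are hypothetical, and sending relies on the global `fetch` available in Node 18+.

```javascript
// Build a chat.postEphemeral request. Token, channel, and user are placeholders.
function buildEphemeralRequest({ token, channel, user, text }) {
  return {
    url: "https://slack.com/api/chat.postEphemeral",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json; charset=utf-8",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ channel, user, text }),
    },
  };
}

// Send the "hold on" message; Slack replies with { ok, error? }.
async function postHoldOn(token, channel, user) {
  const { url, options } = buildEphemeralRequest({ token, channel, user, text: "Hold on a sec! Lemme see…" });
  const response = await fetch(url, options);
  const result = await response.json();
  if (!result.ok) throw new Error(`Slack error: ${result.error}`);
  return result;
}
```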

Post interactive messages

Make IggyIgz ask for approval before performing a particularly expensive CloudWatch query.

Post an interactive message in Slack and give the user a clear yes/no interface.
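A yes/no prompt maps naturally onto a Block Kit `actions` block containing two buttons. In this sketch, the `approve_query` and `reject_query` action ids and the `queryId` value are hypothetical; they are simply what an interactivity handler would match on.

```javascript
// Build Block Kit blocks asking the user to approve an expensive query.
function approvalBlocks(description, queryId) {
  return [
    {
      type: "section",
      text: { type: "mrkdwn", text: `This query looks expensive: ${description}\nShould I run it?` },
    },
    {
      type: "actions",
      elements: [
        {
          type: "button",
          text: { type: "plain_text", text: "Yes, run it" },
          style: "primary",
          action_id: "approve_query",
          value: queryId,
        },
        {
          type: "button",
          text: { type: "plain_text", text: "No" },
          style: "danger",
          action_id: "reject_query",
          value: queryId,
        },
      ],
    },
  ];
}
```

When the user clicks a button, Slack sends a `block_actions` payload to the app's interactivity Request URL. The handler can read `actions[0].action_id` to tell yes from no and `actions[0].value` to find the pending query.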

Build your own IggyIgz

I built IggyIgz for my specific needs, so it’s pretty tied to Transposit’s AWS infrastructure; that’s what makes it useful. To build your own bot, fork my code and tweak it for your own infrastructure! Get started here.
