这是indexloc提供的服务,不要输入任何密码
Skip to content
This repository was archived by the owner on Feb 13, 2025. It is now read-only.

Conversation

@johnewing1
Copy link
Contributor

@johnewing1 johnewing1 commented May 29, 2018

Requires a role with the ListMetrics,GetMetricData and GetMetricStatistics capabilities.
Role can be supplied using either the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or by attaching it to the EC2 instance on which bosun runs.

Example query

$region="eu-west-1"
$namespace="AWS/EC2"
$metric="CPUUtilization"
$period="60"
$statistics="Average"
$dimensions="InstanceId:i-0123456abcedfgh"
$start = 1h
cloudwatch($region, $namespace, $metric, $period, $statistics, $dimensions, "$start" ,"")

@captncraig captncraig requested a review from kylebrandt June 6, 2018 17:09
Copy link
Contributor

@captncraig captncraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments. I am not sure exatly what to make of the api before I see a concrete example in some docs.

@@ -0,0 +1,105 @@
// Package cloudwatch defines structures for interacting with Cloudwatch Metrics.
package cloudwatch // import "bosun.org/cloudwatch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is a better place for this package than off of the root? Perhaps cmd/bosun/cloudwatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd based it off the graphite backend, so was copying the pattern where there was an top level graphite and opentsdb folder with the struct representing the query objects. I notice that the newer data sources like elastic and influx are in cmd/bosun/expr/
I could refactor it to just all live in the cmd/bosun/expr/cloudwatch.go file if that works for you ?

return parsed
}

func CloudWatchQuery(e *State, T miniprofiler.Timer, region, namespace, metric, period, statistic, dimensions, sduration, eduration string) (*Results, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docs that explain how to use this please? Its a bit hard to visualize without an example as the end-user will call into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation for the public method, configuration section and a full example of an alert using cloud watch metrics

@kylebrandt
Copy link
Member

Like the idea, been thinking about doing something similar for azure. General thoughts on top of what Craig said:

  • Maybe this should be done like elastic in that multiple cloudwatch instances (different auth) could be used from the same bosun instance?
  • Expr name: clouldwatch is long, and also if you want to add more function types in the future they might be cloudwatchsomething, maybe just cw instead.

@johnewing1 johnewing1 force-pushed the cloudwatch-support branch from d76c901 to 6bfb9b2 Compare June 15, 2018 09:59
@johnewing1
Copy link
Contributor Author

I've updated it so it now supports passing in a profile name in a prefix key similar to the elastic datasource.

$region = "eu-west-1"
$namespace = "AWS/EC2"
$metric = "CPUUtilization"
$period = "60"
$statistics = "Average"

$prodDim = "InstanceId:i-1234567890abcdef0"
$testDim = "InstanceId:i-0598c7d356eba48d7"

$p = ["prod"]cw($region, $namespace, $metric, $period, $statistics, $prodDim, "1h" ,"")
$t = ["test"]cw($region, $namespace, $metric, $period, $statistics, $testDim, "1h" ,"")

b, _ := json.MarshalIndent(req, "", " ")
T.StepCustomTiming("cloudwatch", "query", string(b), func() {

// disabled caching fuctionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no caching?

Copy link
Member

@kylebrandt kylebrandt Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually to put it differently querying is pretty essential for the expression language. Since Bosun expressions don't have "real" variables that hold values but just text expansion, operations that combine sets can result in queries being made repeatedly in a single expression. Therefore it is important that those queries be cache hits in this case.

svc := new(mockCloudWatchClient)
c.Profiles["bosun-default"] = svc
e := State{
now: time.Date(2018, time.January, 1, 0, 0, 0, 0, time.UTC),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining this Time value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I lost that I was in a test function - ignore comment :-/

alert elb_error_rate {
critNotification = default
template = elb_error_rate
$region="eu-west-1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I don't know much about cloudwatch.

I wonder if we may want operations that can get N items and tag results for things like region and or/namespaces. So for example, might I want something like (again, making things up since I don't know CW): cw(getregions("myApp", "eu-*"), ...). Then one query is made for your regions and you are writing one alert that applies to whatever regions are for that application. However, maybe CW doesn't even work like this so this doesn't mean much. If that would be better, we can add new types to the language - like a awsRegions type that returns a list of regions easily. So if you think changing certain arguments to fetch and pass multiple items might reduce alert repition we can explore that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have other ways to reduce duplication though through macros. That way things are a bit more declarative but you are not repeating all the contents of the alert. So as a CW person, asking you to imagine your own environment and what other people might be doing with CW for what might make sense.

It also isn't the end of the world if we go forward and make some new functions that can do more if they are needed. Don't want to over-engineer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been having a think about this.
It would probably be most useful to have a function which can populate the dimensions parameter based on filter criteria similar to the grafana dimension_values templating function
http://docs.grafana.org/features/datasources/cloudwatch/#query-variable

A good example of where this is useful is that it's hard to write an alert to check the individual CPU usage of EC2 instances in a auto scaling group. Cloudwatch doesn't have the ability to wildcard on the instance id and if you only supply the id of the autoscaling group it will return a single series with the average for the whole group.

in theory this would look something like

$prodDim = getdimensions("instanceId","tag:asg=webservers-scaling-group")
 ["prod"]cw($region, $namespace, $metric, $period, $statistics, $prodDim, "1h" ,"")

The drawback is that you need to do a getMetricStatistics call for each unique dimension value

For namespace & metric I don't think it makes sense because it would be hard to write a query that would actually return anything due to the way cloud watch dimensions work.

Similarly I'm not sure how effective pulling from multiple regions in a single query is, because it's comparable to querying multiple tsdb servers with one call.

Is this something you'd want tackled immediately ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At Stack we are currently looking to do something similar with Azure monitoring this month. So the main reason I am interested is to see how our requirements for that will relate to this - and what the commonalities are.

In terms of which arguments are strings (maybe a csv within a string) there are two ways we might be able to get both things in certain instances:

  • Make an argument type that takes either a string that has a list of dimensions, or a awsDemensionType that would be the return of a getdimensions call (or something like this, can replace dimensions with regions etc)
  • Have multiple functions with different names that are similar, but take different sorts of arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had an attempt at adapting to allow multiple dimensions, and think the approach of having multiple functions which return the groups of dimensions to use in the query is the cleanest.

The cw() function dimension parameter accepts a series of helper functions which give it a list of dimensions to apply.

cwdim() just parses the dimensions directly from a string. So in the simplest case

$dim=cwdim("LoadBalancerName:dynamic-content-elb")

Or if you wanted to specify multiple entries

$dim=cwdim("LoadBalancerName:dynamic-content-elb|LoadBalancerName:static-content-elb")

If for things like s3 where you need 2 or more dimensions per query do

$dim = cwdim("type:IA,name:prod|type:IA,name:test")

Which would perform two queries and return a seriesSet with with two series with tags
{type=IA,name=test}
{type=IA,name=prod}

End to end example

$region="eu-west-1"
$namespace="AWS/ELB"
$metric="UnHealthyHostCount"
$period="300"
$statistics="Maximum"
$dimensions=cwdim("LoadBalancerName:dynamic-content-elb|LoadBalancerName:static-content-elb")
$unhealthy_hosts=["production"]cw($region, $namespace, $metric, $period, $statistics, $dimensions, "12h" ,"")
$unhealthy_hosts

For queries where you want to dynamically determine the dimensions have functions like cwdesc to query for the values you want

cwdesc("eu-west-1","InstanceId", "tag:Stack:Elasticsearch")

This would use e2_describe api to get the instance ids for all ec2 instances tagged with Stack=Elasticsearch

I have a working prototype using cw & cwdim. If you think this approach is ok I'll finish the other functions and tidy the code up so it's suitable for review
@kylebrandt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya that sounds similar to like what I did with Azure. You probably want to create a type for it in the expression language (see the Azure commit) so the type checking can make sure the argument is sane.

The other outstanding commit which may get which may require some rebasing or whatever is the moving miniprofiler into the state, but that is just simple find and replace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR with the new functions to support dynamically generating the dimensions, and updated the docs. Query format looks like this now

Directly specifying dimension values

$region="eu-west-1"
$namespace="AWS/EC2"
$metric="CPUUtilization"
$period=60
$statistics="Average"
$dimensions=cwdim('''[ {"InstanceId":"i-0123456abcedfgh"} ] ''')
$start = 1h

cw($region, $namespace, $metric, $period, $statistics, $dimensions, "$start" ,"")

Getting dimension values by query ec2_describe api

$region="eu-west-1"
$namespace="AWS/EC2"
$metric="CPUUtilization"
$period=60
$statistics="Average"
$filter = '''[{"tag:Purpose":["database"]} , {"InstanceType":["m4.2xlarge"]}]'''
$dimensions = cwdimq("eu-west-1","EC2","InstanceId",$filter) 

$start = 1h

cw($region, $namespace, $metric, $period, $statistics, $dimensions, "$start" ,"")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kylebrandt @captncraig If you have time it be great to get feedback on the new query format. Thanks

@johnewing1
Copy link
Contributor Author

Closing in favour of #2484

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants