SRE here. I feel for your situation. Here's some advice. One simple thing you could do is set up AWS billing alarms and have them delivered to a notification app like PagerDuty.
If you don't want to pay for PD, you can patch together any number of ways to get your phone to scream and holler when it gets an email from ohshit@amazonasws.com. It's also good to have clear expectations as to whose responsibility it is to deal with problem x between the hours of y and z and exactly what they are supposed to do.
Keep the alerts restricted to the really important stuff, because if your team becomes overloaded with useless alerts they will 1) dislike you and 2) be more prone to accidentally mistaking a five alarm fire for a burnt casserole.
There are more complex systems you could build, but that's a start.
Thank you for this. How can anyone run ANY service with ANY company and not add a clause in the contract (and then have the alerts up an running) in controlling costs?
I remember PagerDuty was advertising (a lot) on Leo Laporte's podcasts a few years back.
A clause in the contract: if monthly bill reaches $Xk amount then:
(a) seek written approval by client, and
(b) continue until $Yk or approval is given with a new ceiling price.
I was just playing around with AWS a while ago and was surprised that I could not find any option to put a cap on the amount I'd spend in a month. Only thing I could do was set up alerts.
I imagine AWS would have 0 problems suspending all my services if I can't pay, so why can't it do the same thing when it reaches my arbitrary cap?
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori....
If you don't want to pay for PD, you can patch together any number of ways to get your phone to scream and holler when it gets an email from ohshit@amazonasws.com. It's also good to have clear expectations as to whose responsibility it is to deal with problem x between the hours of y and z and exactly what they are supposed to do.
Keep the alerts restricted to the really important stuff, because if your team becomes overloaded with useless alerts they will 1) dislike you and 2) be more prone to accidentally mistaking a five alarm fire for a burnt casserole.
There are more complex systems you could build, but that's a start.