Questions to consider for post-incident improvement

Jason Yip
Apr 1, 2024
Goals for post-incident improvement: faster detection, faster diagnosis, faster containment, reduced impact, prevention

How might we have detected the problem faster? Preferably before our users do.

How might we have diagnosed the problem faster? See observability.

How might we have contained the problem faster? See containment vs countermeasure.

How might we have reduced the impact of the problem? For example, see staged rollout.

How might we have prevented the problem?
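Staged rollout, mentioned above as a way to reduce impact, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed implementation. It assumes users are bucketed deterministically by hashing a user ID together with a feature name. It also assumes a simple error-rate gate between stages. The helper names (`bucket`, `is_enabled`, `next_percent`), the stage percentages, and the 1% error threshold are all hypothetical. An unhealthy stage drops exposure to 0%, which doubles as a containment step.

```python
import hashlib

# Hypothetical rollout stages (percent of users exposed); illustrative only.
STAGES = [1, 5, 25, 50, 100]


def bucket(user_id: str, feature: str) -> int:
    """Deterministically map a user to a bucket in [0, 100) for one feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """A user sees the change only if their bucket is below the current percentage."""
    return bucket(user_id, feature) < percent


def next_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Advance to the next stage if healthy; otherwise roll back to 0% exposure.

    The error-rate gate and its default threshold are assumptions for this sketch.
    """
    if error_rate > max_error_rate:
        return 0  # contain: switch the change off for everyone
    for stage in STAGES:
        if stage > current:
            return stage
    return current  # already fully rolled out
```

For example, `next_percent(1, error_rate=0.002)` moves from 1% to 5%, while `next_percent(5, error_rate=0.05)` rolls back to 0%. Hashing the feature name together with the user ID keeps each user's exposure stable across requests. It also avoids always exposing the same users first for every feature.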

Jason Yip

Senior Manager Product Engineering at Grainger. Extreme Programming, Agile, Lean guy. Ex-Spotify, ex-ThoughtWorks, ex-CruiseControl