Give yourself a test…

If a business-critical application failed, such as the sales team’s CRM or a customer-facing video streaming service, how long would each of the following take?

  1. Identify the outage
  2. Gauge the severity by the number of users impacted
  3. Find the root cause (this is where roughly 70% of MTTR is spent)
  4. Remediate the failure
  5. Update SLA adherence metrics and report out business impact
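One way to put numbers on the test above is to record a timestamp at the end of each step and work out what share of the total recovery time each step consumed. The following is only a sketch: the event names and timestamps are invented for illustration (chosen so that root cause analysis dominates), and `phase_shares` is a hypothetical helper, not part of any monitoring product.

```python
from datetime import datetime

# Hypothetical incident timeline; names and timestamps are invented.
timeline = [
    ("failure_started", "2024-01-01T09:00"),
    ("outage_identified", "2024-01-01T09:10"),
    ("severity_understood", "2024-01-01T09:20"),
    ("root_cause_found", "2024-01-01T10:30"),
    ("service_restored", "2024-01-01T10:40"),
]

def phase_shares(events):
    """Return the fraction of total recovery time spent reaching each event."""
    times = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    total = (times[-1][1] - times[0][1]).total_seconds()
    shares = {}
    # Pair each event with the one before it to get the phase duration.
    for (_, start), (name, end) in zip(times, times[1:]):
        shares[name] = (end - start).total_seconds() / total
    return shares

shares = phase_shares(timeline)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Tracking these shares across real incidents shows where time is actually going, which is the input you need before deciding what to automate.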

A client of ours was recently working through exactly these questions. Answering them for a handful of applications is not the hard part. Most customers can do that with a data store and an application map (sometimes kept in Excel or Access, which I don’t recommend). The real challenge comes with scale. That word is tossed around a lot, so let’s be specific: when you need to manage 5,700 business applications, with thousands of people making changes to servers, networks, and the applications themselves, that is scale.

So how would you answer the questions above? You need accurate configuration item (CI) data and a process to discover and absorb changes to the environment. That data then feeds application dependency mapping, so engineers understand the risk of their changes. Keeping the model current requires a significant amount of process automation to match the pace of change in the environment; without it, the data quickly becomes stale.
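To make the idea of dependency mapping concrete, here is a minimal sketch of how CI relationships could answer the question "if I change this, what else is at risk?". It assumes the dependency data is available as simple (dependent, dependency) pairs; the CI names and the `impacted_by` function are hypothetical and do not reflect any particular CMDB product’s schema or API.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (dependent CI, CI it depends on).
DEPENDS_ON = [
    ("crm-app", "crm-db"),
    ("crm-app", "auth-service"),
    ("video-streaming", "cdn-edge"),
    ("video-streaming", "auth-service"),
    ("auth-service", "ldap-server"),
]

def build_reverse_index(edges):
    """Map each CI to the set of CIs that depend directly on it."""
    dependents = defaultdict(set)
    for dependent, dependency in edges:
        dependents[dependency].add(dependent)
    return dependents

def impacted_by(ci, edges):
    """Return every CI that directly or transitively depends on `ci`."""
    dependents = build_reverse_index(edges)
    seen = set()
    queue = deque([ci])
    # Breadth-first walk up the dependency chain.
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(impacted_by("ldap-server", DEPENDS_ON)))
```

In this made-up example, a change to `ldap-server` puts `auth-service` at risk, and through it both `crm-app` and `video-streaming`. With thousands of applications, the same traversal only stays trustworthy if the edges are kept current automatically.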

Start with the configuration management database (CMDB) and use discovery tools such as HPE’s Universal Discovery to automatically detect changes and update it. The CMDB becomes the single source of truth from which operations teams build business service and application maps of system dependencies. Both engineers and service desk teams win here: once changes are deployed, the CMDB updates with new data that can be used during future failures.
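The detect-and-update loop can be sketched in a few lines. This is an illustration only, not how Universal Discovery works internally: the in-memory dictionaries stand in for the CMDB and for a discovery scan result, and `reconcile` and all field names are assumptions made for the example.

```python
from datetime import datetime, timezone

def reconcile(cmdb, discovered, now=None):
    """Merge discovered CI attributes into the CMDB and return the changes made.

    Each change is a tuple: (ci_id, attribute, old_value, new_value).
    """
    now = now or datetime.now(timezone.utc)
    changes = []
    for ci_id, attrs in discovered.items():
        # Create a record for CIs the CMDB has never seen before.
        record = cmdb.setdefault(ci_id, {"attributes": {}, "last_seen": None})
        for key, new_value in attrs.items():
            old_value = record["attributes"].get(key)
            if old_value != new_value:
                changes.append((ci_id, key, old_value, new_value))
                record["attributes"][key] = new_value
        record["last_seen"] = now
    return changes

# Hypothetical CMDB contents and scan result.
cmdb = {"crm-db": {"attributes": {"os": "RHEL 7", "ip": "10.0.0.5"}, "last_seen": None}}
discovered = {
    "crm-db": {"os": "RHEL 8", "ip": "10.0.0.5"},
    "crm-app": {"ip": "10.0.0.9"},
}

changes = reconcile(cmdb, discovered)
for change in changes:
    print(change)
```

The returned change list is the useful part for operations: it is a record of what moved in the environment, which the service desk can consult when the next failure occurs.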