Severity levels
Service360 defines 5 risk levels based on the severity of the threat posed to
the business.
- Critical: direct threat to the day-to-day business operations
  - Abandoned service
  - No active contributors
  - Dependency on the RETIRED service
- High: poses a threat to the business operations in the short-to-mid term
  - Service has no owner
  - Service depends on the Abandoned service
  - Service depends on the Deprecated service
  - Service has no active SME
  - Service has low bus-factor
  - Service is fragile
- Medium: poses a threat to the business operations in the mid-to-long term
  - Service has no passport
  - Service has no recent releases
  - Service is deprecated
  - Service uses too many ASSESS/TRIAL technologies
  - Service uses ALIEN technologies
  - Service depends on the unknown service
  - Service has no dependencies documentation
  - Service has no disaster recovery documentation
  - Service audit is required
- Low: poses a threat to business operations in the long term
  - Service uses HOLD technologies
  - Service has no description
  - Service has no used technologies listed
  - Service has no explicit status
  - Service has no release documentation
  - Service has no development documentation
  - Service has no monitoring documentation
  - Service audit is older than 180 days
- Info: not a direct threat to business operations, but fixing the issue might improve day-to-day IT operations or increase overall system robustness
  - Service is a potential SPOF
  - Service has no how-to documentation
  - Service has no post-mortem documentation
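The five levels form an ordered scale, which a tool consuming these risks could model as an ordered enumeration. The following is a minimal sketch only: the `Severity` class, its numeric values, and the `worst` helper are illustrative assumptions, not part of Service360.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The five risk levels; a higher value means a more urgent risk (assumed ordering)."""
    INFO = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


def worst(severities):
    """Return the most severe level in a collection, or None if it is empty."""
    return max(severities, default=None)
```

With such an ordering, `worst([Severity.LOW, Severity.CRITICAL])` picks the level that should drive prioritization for a service with several open risks.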
Critical severity risks
Direct threat to the day-to-day business operations
Abandoned service
- Severity
- CRITICAL
- Trigger
- No releases in the last 365 days
- Explanation
- Abandoned production services (abandonware) pose a direct threat to normal business operations because of software rot
- Mitigation
- Schedule service audit session
- Retire service
No active contributors
- Severity
- CRITICAL
- Trigger
- There are no active contributors left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
- Explanation
- Increased probability of losing knowledge and team trust in the app, and of a slow response to any incident
- Mitigation
- Schedule a knowledge transfer/service audit session within the service owner team
- Reassign service owner
- Retire service
Dependency on the RETIRED service
- Severity
- CRITICAL
- Trigger
- Service depends on a RETIRED service (according to the Service Dependency Graph)
- Explanation
- A retired service might stop functioning at any moment (if it has not already) and cause an outage of your service
- Mitigation
- Replace retired service with its alternative
- Retire this service
- Restore retired dependency service
- Update dependency documentation, if it is outdated
High severity risks
Poses a threat to the business operations in the short-to-mid term
Service has no owner
- Severity
- HIGH
- Trigger
- Owner for the service is missing in ServicePassport
- Explanation
- True collective ownership in an enterprise environment is extremely rare (if it has ever existed). In most cases people tend to treat “collective” as “no one's”. In IT, “no one's” services introduce many additional failure modes to the business, ranging from ordinary operational failures to possible security issues and leaks.
- High risk of software architecture/domain fragmentation/bloat.
- Mitigation
- Assign service owner
Service depends on the Abandoned service
- Severity
- HIGH
- Trigger
- Service depends on an abandoned service (according to the Service Dependency Graph)
- Explanation
- Abandoned services carry a high risk of outages and longer recovery times
- Mitigation
- Replace the abandoned service with its alternative
- Release (un-abandon) the abandoned service
- Deprecate and retire the service
Service depends on the Deprecated service
- Severity
- HIGH
- Trigger
- Service depends on a deprecated service (according to the Service Dependency Graph)
- Explanation
- A deprecated service might stop functioning after its deprecation period and cause an outage of your service
- Mitigation
- Replace deprecated service with its alternative
- Retire/deprecate the service
- Consider un-deprecation of the dependency service
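The three dependency-based triggers above (RETIRED, abandoned, deprecated) share one shape: walk the service's dependencies and look up each one's state. A minimal sketch of that lookup follows. It assumes the Service Dependency Graph is available as a plain mapping and that each service's state is stored as a lowercase string; both data shapes and all names here are hypothetical.

```python
# Severity per dependency state, taken from the risk descriptions above.
DEPENDENCY_RISKS = {
    "retired": "CRITICAL",
    "abandoned": "HIGH",
    "deprecated": "HIGH",
}


def dependency_risks(service, graph, statuses):
    """Return sorted (dependency, severity) pairs for risky dependencies of `service`.

    graph:    hypothetical mapping of service name -> list of dependency names
    statuses: hypothetical mapping of service name -> lowercase state string
    """
    risks = []
    for dep in graph.get(service, []):
        # Dependencies with an unlisted or healthy state yield no severity.
        severity = DEPENDENCY_RISKS.get(statuses.get(dep))
        if severity is not None:
            risks.append((dep, severity))
    return sorted(risks)
```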
Service has no active SME
- Severity
- HIGH
- Trigger
- There are no active SMEs (Subject Matter Experts) left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
- Explanation
- Increased probability of losing knowledge and team trust in the app, and of a slow response to any incident
- Mitigation
- Schedule knowledge transfer/audit session
- Retire/deprecate the service
Service has low bus-factor
- Severity
- HIGH
- Trigger
- There is only one active contributor with hands-on knowledge of the service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
- Explanation
- Increased probability of losing knowledge and team trust in the app, and of a slow response to any incident
- Mitigation
- Schedule knowledge transfer/audit session
- Reassign service owner
- Retire/deprecate the service
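The contributor-based triggers (no active contributors, low bus-factor) both depend on the same 4-week activity window. The sketch below shows one way to derive them from last-contribution dates. The input mapping, the function name, and treating exactly 28 days as still active are assumptions made for illustration.

```python
from datetime import date, timedelta

ACTIVE_WINDOW = timedelta(weeks=4)  # "active" means a contribution in the last 4 weeks


def contributor_risk(last_contributions, today):
    """Map contributors' last contribution dates to a risk name, or None.

    last_contributions: hypothetical mapping of contributor -> date of last contribution
    """
    active = [
        name for name, last in last_contributions.items()
        if today - last <= ACTIVE_WINDOW
    ]
    if not active:
        return "No active contributors"  # CRITICAL
    if len(active) == 1:
        return "Service has low bus-factor"  # HIGH
    return None
```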
Service is fragile
- Severity
- HIGH
- Trigger
- Service has more than 3 hard (synchronous) dependencies according to the Service Dependency Graph
- Explanation
- High potential of a cascading failure for your application landscape
- Mitigation
- Consider switching to async communication
- Consider reducing the number of dependencies
- Consider using anti-fragility approaches: timeouts, circuit breakers, etc.
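The fragility trigger is a count of outgoing hard dependencies. A minimal sketch, assuming the dependency graph is exposed as `(source, target, kind)` tuples where `kind` is `"sync"` for hard dependencies; that edge format and the function name are hypothetical.

```python
FRAGILE_THRESHOLD = 3  # the trigger fires on more than 3 hard dependencies


def is_fragile(service, edges):
    """Return True if `service` has more than 3 distinct synchronous dependencies.

    edges: hypothetical iterable of (source, target, kind) tuples,
           where kind is "sync" for hard dependencies and "async" otherwise.
    """
    hard = {
        target for source, target, kind in edges
        if source == service and kind == "sync"
    }
    return len(hard) > FRAGILE_THRESHOLD
```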
Medium severity risks
Poses a threat to the business operations in the mid-to-long term
Service has no passport
- Severity
- MEDIUM
- Trigger
- ServicePassport is missing or malformed
- Explanation
- A missing passport blocks early detection of a variety of other risks and prolongs the onboarding of new team members.
- Mitigation
- Introduce Service Passport
Service has no recent releases
- Severity
- MEDIUM
- Trigger
- Service was not released in the last 90 days
- Explanation
- Increased probability of losing knowledge and team trust in the app, and of a slow response to any incident
- Mitigation
- Release the application
- If active service development is finished, consider marking it with Context “MaintenanceOnly” in the ServicePassport
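The two release-age triggers in this document (365 days for an abandoned service, 90 days for no recent releases) can be evaluated from a single last-release date. The sketch below is one possible reading: it reports only the more severe of the two, which is an assumption, since the text does not say whether both risks are raised together. The function name and return shape are also illustrative.

```python
from datetime import date


def release_risk(last_release, today):
    """Return (risk name, severity) based on days since the last release, or None."""
    age_days = (today - last_release).days
    if age_days > 365:
        return ("Abandoned service", "CRITICAL")
    if age_days > 90:
        return ("Service has no recent releases", "MEDIUM")
    return None
```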
Service is deprecated
- Severity
- MEDIUM
- Trigger
- Service is marked as Deprecated in the Service Passport
- Explanation
- A deprecated service poses a direct threat to the business for the same reason it was deprecated.
- Mitigation
- Retire the service
Service uses too many ASSESS/TRIAL technologies
- Severity
- MEDIUM
- Trigger
- Service uses more than 3 technologies marked as ASSESS/TRIAL in TechRadar
- Explanation
- Every technology that is not yet proven in production (ASSESS/TRIAL) and is used in the service exponentially increases project/service risks
- Mitigation
- Consider reducing the number of risk-inducing technologies by replacing them with ADOPT alternatives
- Consider promoting ASSESS/TRIAL technologies to ADOPT in TechRadar
Service uses ALIEN technologies
- Severity
- MEDIUM
- Trigger
- Service uses technologies not listed in TechRadar
- Explanation
- Technologies not listed in the TechRadar are unknown to the company and may pose security or other business threats.
- Mitigation
- Consider replacing ALIEN technologies with the ADOPT/TRIAL alternatives
- Consider adding ALIEN technologies into the TechRadar
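Both TechRadar-based triggers above reduce to looking up each technology's ring. A minimal sketch, assuming the TechRadar is available as a mapping from technology name to ring name; that mapping and the function name are hypothetical.

```python
def tech_risks(technologies, radar):
    """Return the names of TechRadar-related MEDIUM risks for a service.

    technologies: names of technologies the service uses
    radar:        hypothetical mapping of technology -> ring
                  ("ADOPT", "TRIAL", "ASSESS" or "HOLD")
    """
    risks = []
    unproven = [t for t in technologies if radar.get(t) in ("ASSESS", "TRIAL")]
    if len(unproven) > 3:  # trigger: more than 3 ASSESS/TRIAL technologies
        risks.append("Service uses too many ASSESS/TRIAL technologies")
    if any(t not in radar for t in technologies):  # trigger: not listed at all
        risks.append("Service uses ALIEN technologies")
    return risks
```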
Service depends on the unknown service
- Severity
- MEDIUM
- Trigger
- Service depends on an unknown service according to the Service Dependency Graph
- Explanation
- Blind spots in the architecture landscape might lead to suboptimal solutions on the landscape evolution path
- Missing external services might lead to missed learning opportunities and wasted reimplementation/reintegration time
- Mitigation
- Add unknown services to the ExternalServices repository
- Remove dependency on the unknown service
Service has no dependencies documentation
- Severity
- MEDIUM
- Trigger
- Service has no dependency documentation
- Explanation
- Missing service dependency documentation blocks early detection of a variety of other risks and prolongs the onboarding of new team members
- Mitigation
- Introduce the Service Dependency Graph documentation
Service has no disaster recovery documentation
- Severity
- MEDIUM
- Trigger
- Disaster-recovery documentation for the service is missing
- Explanation
- Disaster recovery documentation should describe the service recovery process for severe failures: database failure, network outage, backup procedures, etc.
- Disaster recovery runbooks reduce the time needed to recover the service after a failure
- Disaster recovery runbooks make developers think about possible service failure modes and thus mitigate them early
- Mitigation
- Introduce service disaster-recovery documentation
Service audit is required
- Severity
- MEDIUM
- Trigger
- No service audit has taken place for at least 360 days
- Explanation
- Documentation tends to become outdated over time. Regular reviews are necessary to keep it up to date.
- Mitigation
- Schedule a service audit session
Low severity risks
Poses a threat to business operations in the long term
Service uses HOLD technologies
- Severity
- LOW
- Trigger
- Service uses technologies marked as HOLD in TechRadar
- Explanation
- Technologies are marked as HOLD in TechRadar for a reason. Using them in production services is allowed only for legacy applications. Usage of HOLD technologies exposes the organization to various risks.
- Mitigation
- Consider replacing HOLD technologies with the ADOPT/TRIAL alternatives
Service has no description
- Severity
- LOW
- Trigger
- Service has no description section in ServicePassport
- Explanation
- Potential loss of knowledge and wasted time during investigations (time is wasted for both investigator and SMEs)
- Mitigation
- Add description to the ServicePassport
Service has no used technologies listed
- Severity
- LOW
- Trigger
- Service has no Tech section in ServicePassport
- Explanation
- The list of used technologies is a very important part of the Service Passport: it makes it possible to detect various application landscape risks and eases the onboarding of new contributors.
- Mitigation
- Add Tech section to the ServicePassport
Service has no explicit status
- Severity
- LOW
- Trigger
- Service has no Status section in ServicePassport
- Explanation
- An explicit service status (production/in development/retired) gives service consumers better transparency and allows early mitigation of multiple risks
- Mitigation
- Add Status section to the ServicePassport
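The three passport-section triggers above (description, Tech, Status) can be checked in one pass over a parsed passport. The sketch below assumes the ServicePassport is already parsed into a dictionary with lowercase keys and treats an empty section as missing. The key names, that empty-section rule, and the function name are all assumptions, since the passport format is not shown here.

```python
# Hypothetical passport keys mapped to the LOW risk raised when the section is absent.
SECTION_RISKS = {
    "description": "Service has no description",
    "tech": "Service has no used technologies listed",
    "status": "Service has no explicit status",
}


def missing_section_risks(passport):
    """Return the risk names for passport sections that are absent or empty.

    passport: hypothetical dict parsed from a ServicePassport
    """
    return [risk for key, risk in SECTION_RISKS.items() if not passport.get(key)]
```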
Service has no release documentation
- Severity
- LOW
- Trigger
- Release documentation for the service is missing
- Explanation
- Release documentation should describe the service release and rollback procedures for production, staging, and any other environment
- Deployment (release/rollback) runbooks significantly reduce the risk of knowledge loss
- Mitigation
- Introduce service release documentation
Service has no development documentation
- Severity
- LOW
- Trigger
- Development documentation for the service is missing
- Explanation
- Development runbooks significantly reduce the risk of knowledge loss and shorten onboarding time for new contributors
- Mitigation
- Introduce service development documentation
Service has no monitoring documentation
- Severity
- LOW
- Trigger
- Monitoring documentation for the service is missing
- Explanation
- Monitoring runbooks significantly reduce the risk of knowledge loss
- Monitoring runbooks provide transparency into how the service is maintained and enable fast audits and assessments of alerting gaps
- Mitigation
- Introduce service monitoring documentation
Service audit is older than 180 days
- Severity
- LOW
- Trigger
- Service had no audit in the last 180 days
- Explanation
- Documentation tends to become outdated over time. Regular reviews are necessary to keep it up to date.
- Mitigation
- Schedule a service audit session
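The two audit-age triggers in this document (at least 360 days for MEDIUM, more than 180 days for LOW) escalate with time since the last audit. A minimal sketch of that escalation follows. Reporting only the more severe risk is an assumption, as is the function name. A service that has never been audited is not covered, because the text does not say how it is treated.

```python
from datetime import date


def audit_risk(last_audit, today):
    """Return (risk name, severity) based on days since the last audit, or None."""
    age_days = (today - last_audit).days
    if age_days >= 360:
        return ("Service audit is required", "MEDIUM")
    if age_days > 180:
        return ("Service audit is older than 180 days", "LOW")
    return None
```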
Info severity risks
Not a direct threat to business operations, but fixing the issue might improve day-to-day IT operations or increase overall system robustness
Service is a potential SPOF
- Severity
- INFO
- Trigger
- More than 4 services have synchronous (hard) dependencies on this service.
- Explanation
- High potential of becoming a cascading failure entry point for your application landscape
- Mitigation
- Consider switching to async/soft dependency for dependent services
- Consider introducing “fragility” mitigation patterns in the dependent services
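The SPOF trigger counts incoming hard dependencies rather than outgoing ones. A minimal sketch, assuming the dependency graph is exposed as `(source, target, kind)` tuples with `kind` set to `"sync"` for hard dependencies; the edge format and function name are hypothetical.

```python
SPOF_THRESHOLD = 4  # the trigger fires when more than 4 services depend on this one


def is_potential_spof(service, edges):
    """Return True if more than 4 distinct services depend synchronously on `service`.

    edges: hypothetical iterable of (source, target, kind) tuples,
           where kind is "sync" for hard dependencies and "async" otherwise.
    """
    dependents = {
        source for source, target, kind in edges
        if target == service and kind == "sync"
    }
    return len(dependents) > SPOF_THRESHOLD
```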
Service has no how-to documentation
- Severity
- INFO
- Trigger
- How-to documentation for the service is missing
- Explanation
- How-to runbooks significantly reduce the risk of knowledge loss and save the time and mental capacity needed to perform manual operations
- Mitigation
- Introduce service how-to documentation
Service has no post-mortem documentation
- Severity
- INFO
- Trigger
- Post-mortem documentation for the service is missing
- Explanation
- Every service will fail sooner or later. Every failure is a learning opportunity that should not be missed. Every significant or customer-impacting failure should be documented so that future maintainers of the service can learn from it and avoid repeating the same mistakes
- Post-mortem reports are concentrated experience. They provide huge learning opportunities for the team.
- Mitigation
- Introduce service post-mortem reports