Risks glossary


Severity levels

Service360 defines five risk levels based on the severity of the threat posed to the business.

  • Critical: poses a direct threat to day-to-day business operations
    • Abandoned service
    • No active contributors
    • Dependency on a RETIRED service
  • High: poses a threat to business operations in the short-to-mid term
    • Service has no owner
    • Service depends on an Abandoned service
    • Service depends on a Deprecated service
    • Service has no active SME
    • Service has a low bus factor
    • Service is fragile
  • Medium: poses a threat to business operations in the mid-to-long term
    • Service has no passport
    • Service has no recent releases
    • Service is deprecated
    • Service uses too many ASSESS/TRIAL technologies
    • Service uses ALIEN technologies
    • Service depends on an unknown service
    • Service has no dependency documentation
    • Service has no disaster recovery documentation
    • Service audit is required
  • Low: poses a threat to business operations in the long term
    • Service uses HOLD technologies
    • Service has no description
    • Service has no used technologies listed
    • Service has no explicit status
    • Service has no release documentation
    • Service has no development documentation
    • Service has no monitoring documentation
    • Service audit is older than 180 days
  • Info: not a direct threat to business operations, but fixing the issue may improve day-to-day IT operations or increase overall system robustness
    • Service is a potential SPOF
    • Service has no how-to documentation
    • Service has no post-mortem documentation
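Because the five levels form an ordered scale, a tool that consumes these risks can compare and aggregate them, for example to report a service's worst open risk. A minimal Python sketch, assuming a hypothetical `Severity` enum and `worst` helper (illustrative names, not a published Service360 API):

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering of the five Service360 levels; higher is more severe."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def worst(severities):
    """Return the most severe level in an iterable, or None if it is empty."""
    return max(severities, default=None)


detected = [Severity.LOW, Severity.HIGH, Severity.MEDIUM]
print(worst(detected).name)  # HIGH
```

Using `IntEnum` keeps comparisons and `max` working without a separate ranking table.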

Critical severity risks

Poses a direct threat to day-to-day business operations

Abandoned service
Severity
CRITICAL
Trigger
No releases in the last 365 days
Explanation
Abandoned production services (abandonware) pose a direct threat to normal business operations due to software rot
Mitigation
Schedule a service audit session
Retire the service
No active contributors
Severity
CRITICAL
Trigger
There are no active contributors left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
Explanation
Increased probability of knowledge loss, loss of the team's trust in the app, and slow response to any incident
Mitigation
Schedule a knowledge transfer or service audit session within the service owner's team
Reassign the service owner
Retire the service
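The 4-week activity rule above is also used by the bus-factor and no-active-SME risks under High severity. A minimal sketch, assuming a hypothetical input that maps each contributor to the date of their latest contribution; the function names and risk identifiers are illustrative, not Service360's. For the SME risk, the same count would be taken over the SMEs only.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(weeks=4)  # "active" = any contribution in the last 4 weeks


def active_contributors(last_contribution, now):
    """Names of contributors whose latest contribution is within ACTIVE_WINDOW.

    last_contribution maps a contributor name to the datetime of their most
    recent contribution (a hypothetical input shape).
    """
    return {name for name, when in last_contribution.items() if now - when <= ACTIVE_WINDOW}


def contributor_risk(last_contribution, now):
    """Return a hypothetical risk id, or None if there are 2+ active contributors."""
    count = len(active_contributors(last_contribution, now))
    if count == 0:
        return "no-active-contributors"  # CRITICAL
    if count == 1:
        return "low-bus-factor"  # HIGH
    return None


now = datetime(2024, 6, 1)
history = {
    "alice": datetime(2024, 5, 20),  # 12 days ago: active
    "bob": datetime(2024, 1, 10),    # about five months ago: inactive
}
print(contributor_risk(history, now))  # low-bus-factor
```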
Dependency on a RETIRED service
Severity
CRITICAL
Trigger
Service depends on a RETIRED service (according to the Service Dependency Graph)
Explanation
A retired service may stop functioning at any moment (if it has not already) and cause an outage of your service
Mitigation
Replace the retired service with an alternative
Retire this service
Restore the retired dependency service
Update the dependency documentation if it is outdated
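The dependency-based risks are lookups over the Service Dependency Graph: for each dependency, check whether the catalog knows it and what its status is. A sketch under stated assumptions: the graph is a plain dict, every known service has a status label, and "ABANDONED" is a precomputed label (in this glossary, abandonment is derived from release history). The unknown-dependency case corresponds to the Medium-severity risk of depending on a service the catalog does not know. All names and risk identifiers are hypothetical.

```python
# Hypothetical mapping: dependency status -> (risk_id, severity) for the dependent service.
STATUS_RISKS = {
    "RETIRED": ("retired-dependency", "CRITICAL"),
    "ABANDONED": ("abandoned-dependency", "HIGH"),
    "DEPRECATED": ("deprecated-dependency", "HIGH"),
}
UNKNOWN_RISK = ("unknown-dependency", "MEDIUM")


def dependency_risks(service, graph, status):
    """List (dependency, risk_id, severity) for each risky dependency of service.

    graph: service name -> list of dependency names.
    status: service name -> status label for every service known to the catalog.
    """
    risks = []
    for dep in graph.get(service, []):
        if dep not in status:
            risk_id, severity = UNKNOWN_RISK
        elif status[dep] in STATUS_RISKS:
            risk_id, severity = STATUS_RISKS[status[dep]]
        else:
            continue  # healthy dependency, nothing to report
        risks.append((dep, risk_id, severity))
    return risks


graph = {"checkout": ["payments", "legacy-cart", "geo-lookup"]}
status = {"payments": "PRODUCTION", "legacy-cart": "RETIRED"}
for row in dependency_risks("checkout", graph, status):
    print(row)
# ('legacy-cart', 'retired-dependency', 'CRITICAL')
# ('geo-lookup', 'unknown-dependency', 'MEDIUM')
```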

High severity risks

Poses a threat to business operations in the short-to-mid term

Service has no owner
Severity
HIGH
Trigger
The service owner is missing from the ServicePassport
Explanation
True collective ownership in an enterprise environment is extremely rare, if it exists at all. In most cases people tend to treat “collective” as “no one's”. In IT, “no one's” services introduce many additional failure modes for the business, ranging from ordinary operational failures to possible security issues and data leaks.
There is also a high risk of software architecture and domain fragmentation or bloat.
Mitigation
Assign a service owner
Service depends on an Abandoned service
Severity
HIGH
Trigger
Service depends on an abandoned service (according to the Service Dependency Graph)
Explanation
Abandoned services carry a high risk of outages and longer recovery times
Mitigation
Replace the abandoned service with an alternative
Release (un-abandon) the abandoned service
Deprecate and retire the service
Service depends on a Deprecated service
Severity
HIGH
Trigger
Service depends on a deprecated service (according to the Service Dependency Graph)
Explanation
A deprecated service may stop functioning once the deprecation period ends and cause an outage of your service
Mitigation
Replace the deprecated service with an alternative
Retire/deprecate the service
Consider un-deprecating the dependency service
Service has no active SME
Severity
HIGH
Trigger
There are no active SMEs (Subject Matter Experts) left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
Explanation
Increased probability of knowledge loss, loss of the team's trust in the app, and slow response to any incident
Mitigation
Schedule a knowledge transfer or audit session
Retire/deprecate the service
Service has a low bus factor
Severity
HIGH
Trigger
There is only one active contributor with hands-on knowledge of the service (a contributor is treated as active if they made any contribution to the service in the last 4 weeks)
Explanation
Increased probability of knowledge loss, loss of the team's trust in the app, and slow response to any incident
Mitigation
Schedule a knowledge transfer or audit session
Reassign the service owner
Retire/deprecate the service
Service is fragile
Severity
HIGH
Trigger
Service has more than 3 hard (synchronous) dependencies according to the Service Dependency Graph
Explanation
High potential for cascading failures across your application landscape
Mitigation
Consider switching to asynchronous communication
Consider reducing the number of dependencies
Consider using anti-fragility approaches: timeouts, circuit breakers, etc.
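To illustrate the circuit-breaker approach named above, here is a minimal, single-threaded Python sketch of the general pattern (not a Service360 feature): after a number of consecutive failures the breaker opens and rejects calls immediately, then lets a trial call through once a reset timeout has elapsed. The class name, thresholds, and injectable clock are assumptions made for the example; production code would usually rely on an established resilience library and combine the breaker with per-call timeouts.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after max_failures consecutive failures; once reset_timeout
    seconds have passed, the next call is let through as a trial."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            # Open: fail fast instead of adding load to a struggling dependency.
            raise CircuitOpenError("dependency unavailable, failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky():
    raise TimeoutError("upstream timed out")


for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass

try:
    breaker.call(flaky)  # rejected without calling flaky
except CircuitOpenError as exc:
    print(exc)  # dependency unavailable, failing fast
```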

Medium severity risks

Poses a threat to business operations in the mid-to-long term

Service has no passport
Severity
MEDIUM
Trigger
ServicePassport is missing or malformed
Explanation
A missing passport blocks early detection of a variety of other risks and prolongs onboarding for new team members.
Mitigation
Introduce a Service Passport
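The ServicePassport format is not specified in this glossary. Assuming, hypothetically, a JSON document whose top-level keys match the section names used in the entries of this glossary (Owner, Description, Tech, Status), a check for a missing or malformed passport and for missing sections might look like this; the schema, keys, and risk identifiers are all assumptions.

```python
import json

# Hypothetical section key -> (risk_id, severity); the real passport schema may differ.
REQUIRED_SECTIONS = {
    "Owner": ("no-owner", "HIGH"),
    "Description": ("no-description", "LOW"),
    "Tech": ("no-tech-listed", "LOW"),
    "Status": ("no-explicit-status", "LOW"),
}


def passport_risks(raw):
    """Return (risk_id, severity) pairs detected from a raw passport string."""
    try:
        passport = json.loads(raw) if raw else None
    except json.JSONDecodeError:
        passport = None
    if not isinstance(passport, dict):
        # Missing, unparsable, or not a JSON object: the section checks cannot run.
        return [("no-passport", "MEDIUM")]
    # An empty value is treated the same as a missing section.
    return [risk for section, risk in REQUIRED_SECTIONS.items() if not passport.get(section)]


print(passport_risks('{"Owner": "team-payments", "Tech": ["Kotlin"]}'))
# [('no-description', 'LOW'), ('no-explicit-status', 'LOW')]
print(passport_risks("{not json"))
# [('no-passport', 'MEDIUM')]
```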
Service has no recent releases
Severity
MEDIUM
Trigger
Service was not released in the last 90 days
Explanation
Increased probability of knowledge loss, loss of the team's trust in the app, and slow response to any incident
Mitigation
Release the application
If active development of the service is finished, consider marking it with the Context “MaintenanceOnly” in the ServicePassport
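Release recency drives two risks: Abandoned service (no release in the last 365 days, CRITICAL) and this one (no release in the last 90 days, MEDIUM). A minimal sketch; treating the MaintenanceOnly context as an exemption from the 90-day check is an assumption based on the mitigation above, and the function and risk identifiers are hypothetical.

```python
from datetime import date, timedelta

ABANDONED_AFTER = timedelta(days=365)  # CRITICAL: Abandoned service
STALE_AFTER = timedelta(days=90)       # MEDIUM: Service has no recent releases


def release_risk(last_release, today, context=None):
    """Return (risk_id, severity) for release recency, or None if releases are recent.

    last_release: date of the latest release, or None if there was none.
    context: the passport Context value; exempting "MaintenanceOnly" services
    from the 90-day check is an assumption.
    """
    age = None if last_release is None else today - last_release
    if age is None or age > ABANDONED_AFTER:
        return ("abandoned", "CRITICAL")
    if age > STALE_AFTER and context != "MaintenanceOnly":
        return ("no-recent-releases", "MEDIUM")
    return None


today = date(2024, 6, 1)
print(release_risk(date(2024, 1, 15), today))                     # ('no-recent-releases', 'MEDIUM')
print(release_risk(date(2024, 1, 15), today, "MaintenanceOnly"))  # None
print(release_risk(date(2023, 3, 1), today))                      # ('abandoned', 'CRITICAL')
```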
Service is deprecated
Severity
MEDIUM
Trigger
Service is marked as Deprecated in the Service Passport
Explanation
A deprecated service poses a direct threat to the business because of whatever reason led to its deprecation.
Mitigation
Retire the service
Service uses too many ASSESS/TRIAL technologies
Severity
MEDIUM
Trigger
Service uses more than 3 technologies that are marked as ASSESS/TRIAL in TechRadar
Explanation
Every technology not yet proven in production (ASSESS/TRIAL) that the project uses compounds the project/service risks
Mitigation
Consider reducing the number of risk-inducing technologies by replacing them with ADOPT alternatives
Consider promoting the ASSESS/TRIAL technologies to ADOPT in TechRadar
Service uses ALIEN technologies
Severity
MEDIUM
Trigger
Service uses technologies not listed in TechRadar
Explanation
Technologies not listed in the TechRadar are unknown to the company and may pose security or other business threats.
Mitigation
Consider replacing ALIEN technologies with ADOPT/TRIAL alternatives
Consider adding ALIEN technologies to the TechRadar
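The three TechRadar-based risks (too many ASSESS/TRIAL technologies, ALIEN technologies, and HOLD technologies under Low severity) can all be derived by looking up each technology's ring. A sketch assuming the radar is available as a name-to-ring dict; the technology names in the example are made-up placeholders and the risk identifiers are hypothetical.

```python
from collections import Counter

MAX_UNPROVEN = 3  # more than 3 ASSESS/TRIAL technologies triggers the risk


def tech_risks(service_tech, radar):
    """Derive TechRadar-based risks for one service.

    service_tech: technology names from the passport Tech section.
    radar: technology name -> ring ("ADOPT", "TRIAL", "ASSESS" or "HOLD").
    Technologies missing from the radar are counted as "ALIEN".
    """
    rings = Counter(radar.get(tech, "ALIEN") for tech in service_tech)
    risks = []
    if rings["ASSESS"] + rings["TRIAL"] > MAX_UNPROVEN:
        risks.append(("too-many-assess-trial", "MEDIUM"))
    if rings["ALIEN"]:
        risks.append(("alien-technologies", "MEDIUM"))
    if rings["HOLD"]:
        risks.append(("hold-technologies", "LOW"))
    return risks


radar = {"stable-db": "ADOPT", "new-queue": "ASSESS", "old-orm": "HOLD"}
print(tech_risks(["stable-db", "new-queue", "old-orm", "homegrown-dsl"], radar))
# [('alien-technologies', 'MEDIUM'), ('hold-technologies', 'LOW')]
```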
Service depends on an unknown service
Severity
MEDIUM
Trigger
Service depends on an unknown service (according to the Service Dependency Graph)
Explanation
Blind spots in the architecture landscape may lead to suboptimal decisions as the landscape evolves
Undocumented external services may lead to missed learning opportunities and time wasted on reimplementation/reintegration
Mitigation
Add unknown services to the ExternalServices repository
Remove the dependency on the unknown service
Service has no dependency documentation
Severity
MEDIUM
Trigger
Service has no dependency documentation
Explanation
Missing service dependency documentation blocks early detection of a variety of other risks and prolongs onboarding for new team members
Mitigation
Introduce the Service Dependency Graph documentation
Service has no disaster recovery documentation
Severity
MEDIUM
Trigger
Disaster-recovery documentation for the service is missing
Explanation
Disaster recovery documentation should describe how to recover the service from severe failures (database failure, network outage, etc.) and document the backup procedures.
Disaster recovery runbooks reduce the time needed to recover the service after a failure
Disaster recovery runbooks make developers think about the service's possible failure modes and thus mitigate them early
Mitigation
Introduce service disaster-recovery documentation
Service audit is required
Severity
MEDIUM
Trigger
No service audit has taken place for at least 360 days
Explanation
Documentation tends to become outdated over time, so regular reviews are necessary to keep it up to date.
Mitigation
Schedule a service audit session

Low severity risks

Poses a threat to business operations in the long term

Service uses HOLD technologies
Severity
LOW
Trigger
Service uses technologies marked as HOLD in TechRadar
Explanation
Technologies are marked as HOLD in TechRadar for a reason. Using them in production services is allowed only for legacy applications, and any use of HOLD technologies exposes the organization to various risks.
Mitigation
Consider replacing HOLD technologies with ADOPT/TRIAL alternatives
Service has no description
Severity
LOW
Trigger
Service has no description section in the ServicePassport
Explanation
Potential loss of knowledge and wasted time during investigations (for both the investigator and the SMEs)
Mitigation
Add description to the ServicePassport
Service has no used technologies listed
Severity
LOW
Trigger
Service has no Tech section in the ServicePassport
Explanation
The list of used technologies is a very important part of the Service Passport: it enables detection of various application landscape risks and eases onboarding of new contributors.
Mitigation
Add a Tech section to the ServicePassport
Service has no explicit status
Severity
LOW
Trigger
Service has no Status section in the ServicePassport
Explanation
An explicit service status (production/in development/retired) provides better transparency to the service consumers and allows early mitigation of multiple risks
Mitigation
Add a Status section to the ServicePassport
Service has no release documentation
Severity
LOW
Trigger
Release documentation for the service is missing
Explanation
Release documentation should describe how to release the service to, and roll it back in, production, staging, or any other environment
Deployment (release/rollback) runbooks significantly reduce the risk of knowledge loss
Mitigation
Introduce service release documentation
Service has no development documentation
Severity
LOW
Trigger
Development documentation for the service is missing
Explanation
Development runbooks significantly reduce the risk of knowledge loss and shorten onboarding time for new contributors
Mitigation
Introduce service development documentation
Service has no monitoring documentation
Severity
LOW
Trigger
Monitoring documentation for the service is missing
Explanation
Monitoring runbooks significantly reduce the risk of knowledge loss
Monitoring runbooks provide transparency into how the service is maintained and enable fast audits and assessments of alerting gaps
Mitigation
Introduce service monitoring documentation
Service audit is older than 180 days
Severity
LOW
Trigger
Service has had no audit in the last 180 days
Explanation
Documentation tends to become outdated over time, so regular reviews are necessary to keep it up to date.
Mitigation
Schedule a service audit session
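The two audit risks differ only in their threshold: 360 days for the Medium-severity "Service audit is required" risk and 180 days for this Low-severity one. A minimal sketch; reporting only the more severe of the two is an assumption, and the names are hypothetical.

```python
from datetime import date, timedelta

AUDIT_REQUIRED_AFTER = timedelta(days=360)  # MEDIUM: Service audit is required
AUDIT_OLD_AFTER = timedelta(days=180)       # LOW: Service audit is older than 180 days


def audit_risk(last_audit, today):
    """Return (risk_id, severity) for the audit age, or None if the audit is recent.

    last_audit is the date of the latest audit, or None if there never was one.
    Only the more severe of the two audit risks is reported.
    """
    if last_audit is None or today - last_audit >= AUDIT_REQUIRED_AFTER:
        return ("audit-required", "MEDIUM")
    if today - last_audit > AUDIT_OLD_AFTER:
        return ("audit-older-than-180-days", "LOW")
    return None


today = date(2024, 6, 1)
print(audit_risk(date(2024, 3, 1), today))   # None
print(audit_risk(date(2023, 11, 1), today))  # ('audit-older-than-180-days', 'LOW')
print(audit_risk(date(2023, 5, 1), today))   # ('audit-required', 'MEDIUM')
```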

Info severity risks

Not a direct threat to business operations, but fixing the issue may improve day-to-day IT operations or increase overall system robustness

Service is a potential SPOF
Severity
INFO
Trigger
Over 4 services have synchronous (hard) dependencies on this service.
Explanation
High potential to become the entry point of a cascading failure across your application landscape
Mitigation
Consider switching dependent services to async/soft dependencies
Consider introducing fragility mitigation patterns (timeouts, circuit breakers, etc.) in the dependent services
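Both the fragility trigger (a service with more than 3 hard dependencies) and this SPOF trigger (more than 4 services depending on it synchronously) are degree counts over the hard edges of the Service Dependency Graph: out-degree for fragility, in-degree for SPOF. A sketch, assuming the graph is available as (consumer, provider) pairs; the service names and risk identifiers are hypothetical.

```python
from collections import defaultdict

FRAGILE_OVER = 3  # more than 3 hard dependencies: Service is fragile (HIGH)
SPOF_OVER = 4     # over 4 hard dependents: Service is a potential SPOF (INFO)


def degree_risks(hard_edges):
    """Map service name -> list of (risk_id, severity) from hard dependency edges.

    hard_edges: iterable of (consumer, provider) pairs, each meaning that
    consumer calls provider synchronously. Repeated pairs count once.
    """
    providers_of = defaultdict(set)  # out-edges per consumer
    consumers_of = defaultdict(set)  # in-edges per provider
    for consumer, provider in hard_edges:
        providers_of[consumer].add(provider)
        consumers_of[provider].add(consumer)

    risks = defaultdict(list)
    for service, providers in providers_of.items():
        if len(providers) > FRAGILE_OVER:
            risks[service].append(("fragile", "HIGH"))
    for service, consumers in consumers_of.items():
        if len(consumers) > SPOF_OVER:
            risks[service].append(("potential-spof", "INFO"))
    return dict(risks)


edges = [("gateway", p) for p in ("auth", "catalog", "cart", "pricing")]
edges += [(c, "auth") for c in ("cart", "catalog", "search", "orders")]
print(degree_risks(edges))
# {'gateway': [('fragile', 'HIGH')], 'auth': [('potential-spof', 'INFO')]}
```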
Service has no how-to documentation
Severity
INFO
Trigger
How-to documentation for the service is missing
Explanation
How-to runbooks significantly reduce the risk of knowledge loss and save the time and mental effort needed to perform manual operations
Mitigation
Introduce service how-to documentation
Service has no post-mortem documentation
Severity
INFO
Trigger
Post-mortem documentation for the service is missing
Explanation
Every service will fail sooner or later. Every failure is a learning opportunity that should not be missed. Every significant or customer-impacting failure should be documented so that future maintainers of the service can learn from it and avoid repeating the same mistakes.
Post-mortem reports are concentrated experience and provide valuable learning opportunities for the team.
Mitigation
Introduce service post-mortem reports
