Service360.io - Risks glossary

Severity levels

Service360 defines five risk levels based on the severity of the threat posed to the business.

Critical: direct threat to the day-to-day business operations
- Abandoned service
- No active contributors
- Dependency on the RETIRED service
High: poses a threat to business operations in the short-to-mid term
- Service has no owner
- Service depends on the Abandoned service
- Service depends on the Deprecated service
- Service has no active SME
- Service has low bus-factor
- Service is fragile
Medium: poses a threat to business operations in the mid-to-long term
- Service has no passport
- Service has no recent releases
- Service is deprecated
- Service uses too many ASSESS/TRIAL technologies
- Service uses ALIEN technologies
- Service depends on the unknown service
- Service has no dependencies documentation
- Service has no disaster recovery documentation
- Service audit is required
Low: poses a threat to business operations in the long term
- Service uses HOLD technologies
- Service has no description
- Service has no used technologies listed
- Service has no explicit status
- Service has no release documentation
- Service has no development documentation
- Service has no monitoring documentation
- Service audit is older than 180 days
Info: not a direct threat to business operations, but fixing the issue might either improve day-to-day IT operations or increase overall system robustness
- Service is a potential SPOF
- Service has no how-to documentation
- Service has no post-mortem documentation
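
The ordering of the five levels can be made explicit in code. The following is a minimal sketch, not part of Service360 itself: the `Severity` enum, the `RISK_SEVERITY` mapping, and the `worst_first` helper are hypothetical names introduced here for illustration, and only a handful of risks from the list above are included.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The five severity levels; a higher value means a more urgent threat."""
    INFO = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


# One example risk per level, keyed by its glossary name (illustrative subset).
RISK_SEVERITY = {
    "Abandoned service": Severity.CRITICAL,
    "Service has no owner": Severity.HIGH,
    "Service has no passport": Severity.MEDIUM,
    "Service uses HOLD technologies": Severity.LOW,
    "Service is a potential SPOF": Severity.INFO,
}


def worst_first(risks):
    """Sort risk names from most to least severe."""
    return sorted(risks, key=lambda name: RISK_SEVERITY[name], reverse=True)
```

Sorting this way gives a simple triage order: critical findings surface first and informational ones last.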

Critical severity risks

Direct threat to the day-to-day business operations

Abandoned service

Severity: CRITICAL
Trigger: No releases in the last 365 days
Explanation: Abandoned production services (abandonware) pose a direct threat to normal business operations due to software rot
Mitigation: Schedule service audit session; Retire service

No active contributors

Severity: CRITICAL
Trigger: There are no active contributors left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the library in the last 4 weeks)
Explanation: Increased probability of loss of knowledge and team trust in the app, and slow response to any incident that occurs
Mitigation: Schedule knowledge transfer/service audit session within service owner team; Reassign service owner; Retire service

Dependency on the RETIRED service

Severity: CRITICAL
Trigger: Service depends on the RETIRED service (according to the Service Dependency Graph)
Explanation: A retired service might stop functioning at any moment (if it has not already) and cause an outage of your service
Mitigation: Replace the retired service with an alternative; Retire this service; Restore the retired dependency service; Update the dependency documentation if it is outdated

High severity risks

Poses a threat to business operations in the short-to-mid term

Service has no owner

Severity: HIGH
Trigger: Owner for the service is missing in ServicePassport
Explanation: True collective ownership in an enterprise environment is extremely rare (if it has ever existed). In most cases people tend to treat “collective” as “no one's”. In IT, “no one's” services introduce many additional failure modes to the business, ranging from ordinary operational failures to possible security issues and leaks; High risk of software architecture/domain fragmentation/bloat
Mitigation: Assign service owner

Service depends on the Abandoned service

Severity: HIGH
Trigger: Service depends on the abandoned service (according to the Service Dependency Graph)
Explanation: Abandoned services carry a high risk of an outage or a longer recovery time
Mitigation: Replace the abandoned service with its alternative; Release (un-abandon) the abandoned service; Deprecate and retire the service

Service depends on the Deprecated service

Severity: HIGH
Trigger: Service depends on the deprecated service (according to the Service Dependency Graph)
Explanation: A deprecated service might stop functioning after its deprecation period and cause an outage of your service
Mitigation: Replace deprecated service with its alternative; Retire/deprecate the service; Consider un-deprecation of the dependency service
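
The three dependency risks described so far (retired, abandoned, deprecated) share one shape: walk the service's outgoing edges and look at the state of each dependency. A minimal sketch follows, assuming the Service Dependency Graph is available as a plain dict of service name to dependency names and that each service's state is known as a lowercase string. Both data shapes and the `dependency_risks` function are assumptions made for illustration; they do not describe how Service360 stores or evaluates the graph.

```python
# Risk name and severity per dependency state, taken from the glossary entries.
DEPENDENCY_RISKS = {
    "retired": ("Dependency on the RETIRED service", "CRITICAL"),
    "abandoned": ("Service depends on the Abandoned service", "HIGH"),
    "deprecated": ("Service depends on the Deprecated service", "HIGH"),
}


def dependency_risks(service, graph, state):
    """Return (dependency, risk name, severity) for each risky dependency.

    graph: hypothetical dict mapping a service to the services it depends on.
    state: hypothetical dict mapping a service to its current state.
    """
    found = []
    for dep in graph.get(service, []):
        risk = DEPENDENCY_RISKS.get(state.get(dep))
        if risk is not None:
            found.append((dep, *risk))
    return found
```

For example, a `checkout` service that depends on a retired `legacy-auth` and a deprecated `catalog` would be reported with one CRITICAL and one HIGH risk.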

Service has no active SME

Severity: HIGH
Trigger: There are no active SMEs (Subject Matter Experts) left with hands-on knowledge of this service (a contributor is treated as active if they made any contribution to the library in the last 4 weeks)
Explanation: Increased probability of loss of knowledge and team trust in the app, and slow response to any incident that occurs
Mitigation: Schedule knowledge transfer/audit session; Retire/deprecate the service

Service has low bus-factor

Severity: HIGH
Trigger: There is only one active contributor with hands-on knowledge of the service (a contributor is treated as active if they made any contribution to the library in the last 4 weeks)
Explanation: Increased probability of loss of knowledge and team trust in the app, and slow response to any incident that occurs
Mitigation: Schedule knowledge transfer/audit session; Reassign service owner; Retire/deprecate the service
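
The "active contributor" rule (any contribution in the last 4 weeks) drives both the "No active contributors" and the "Service has low bus-factor" triggers. The sketch below shows one way to express it. The input shape (a dict of contributor name to the date of their latest contribution) and the function names are assumptions for illustration, and the choice to count a contribution made exactly 28 days ago as still active is also an assumption, since the glossary does not define the boundary.

```python
from datetime import date, timedelta

ACTIVE_WINDOW = timedelta(weeks=4)  # "in the last 4 weeks"


def active_contributors(last_contribution, today):
    """Names whose latest contribution falls inside the 4-week window."""
    return {
        name
        for name, when in last_contribution.items()
        if today - when <= ACTIVE_WINDOW  # boundary handling is an assumption
    }


def contributor_risk(last_contribution, today):
    """Map the number of active contributors to a glossary risk name, or None."""
    active = active_contributors(last_contribution, today)
    if not active:
        return "No active contributors"      # CRITICAL
    if len(active) == 1:
        return "Service has low bus-factor"  # HIGH
    return None
```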

Service is fragile

Severity: HIGH
Trigger: Service has >3 hard (synchronous) dependencies according to the Service Dependency Graph
Explanation: High potential for a cascading failure across your application landscape
Mitigation: Consider switching to async communication; Consider reducing the number of dependencies; Consider using anti-fragility approaches: timeouts, circuit breakers, etc.
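
As an illustration of the circuit-breaker mitigation mentioned above, here is a minimal sketch of the pattern. It is a generic teaching example, not something Service360 provides: the class name, the default thresholds, and the injectable `clock` parameter are all assumptions chosen for this sketch, and a production service would more likely use an established resilience library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency temporarily disabled")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Wrapping each hard dependency call in `breaker.call(...)` means that a failing downstream service is skipped quickly instead of tying up the caller, which limits how far a failure can cascade.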

Medium severity risks

Poses a threat to business operations in the mid-to-long term

Service has no passport

Severity: MEDIUM
Trigger: ServicePassport is missing or malformed
Explanation: A missing passport blocks early detection of a variety of other risks and prolongs onboarding time for new team members
Mitigation: Introduce Service Passport

Service has no recent releases

Severity: MEDIUM
Trigger: Service was not released in the last 90 days
Explanation: Increased probability of loss of knowledge and team trust in the app, and slow response to any incident that occurs
Mitigation: Release the application; If active service development is finished, consider marking it with Context “MaintenanceOnly” in the ServicePassport
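
This trigger and the "Abandoned service" trigger differ only in their threshold: 90 days versus 365 days since the last release. A minimal sketch of that comparison follows. The function name is hypothetical, and two details are assumptions because the glossary does not specify them: the strict `>` comparison at the boundary, and reporting only the more severe of the two risks rather than both.

```python
from datetime import date, timedelta

ABANDONED_AFTER = timedelta(days=365)  # "Abandoned service" (CRITICAL)
STALE_AFTER = timedelta(days=90)       # "Service has no recent releases" (MEDIUM)


def release_risk(last_release, today):
    """Return the release-age risk name for a service, or None."""
    age = today - last_release
    if age > ABANDONED_AFTER:
        return "Abandoned service"
    if age > STALE_AFTER:
        return "Service has no recent releases"
    return None
```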

Service is deprecated

Severity: MEDIUM
Trigger: Service is marked as Deprecated in the Service Passport
Explanation: A deprecated service poses a direct threat to the business for the same reason it was deprecated
Mitigation: Retire the service

Service uses too many ASSESS/TRIAL technologies

Severity: MEDIUM
Trigger: Service uses more than 3 technologies that are marked as ASSESS/TRIAL in TechRadar
Explanation: Every technology that is not yet proven in production (ASSESS/TRIAL) and is used in the project exponentially increases project/service risk
Mitigation: Consider reducing the number of risk-inducing technologies by replacing them with ADOPT alternatives; Consider promoting ASSESS/TRIAL technologies to ADOPT in TechRadar

Service uses ALIEN technologies

Severity: MEDIUM
Trigger: Service uses technologies not listed in TechRadar
Explanation: Technologies not listed in TechRadar are unknown to the company and may pose security or other business threats
Mitigation: Consider replacing ALIEN technologies with the ADOPT/TRIAL alternatives; Consider adding ALIEN technologies into the TechRadar
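
Both TechRadar-based checks in this section can be expressed as lookups against the radar. The sketch below assumes TechRadar is available as a dict of technology name to ring (`ADOPT`, `TRIAL`, `ASSESS`, `HOLD`) and that the service's technologies come as a list of names. These shapes, the `tech_risks` function, and the `max_unproven` parameter are hypothetical and used only for illustration.

```python
def tech_risks(used, radar, max_unproven=3):
    """Return (risk name, offending technologies) pairs for one service.

    used:  technology names the service declares (hypothetical input shape).
    radar: dict of technology name -> TechRadar ring (hypothetical input shape).
    """
    risks = []
    # ASSESS/TRIAL technologies count toward the "too many" threshold.
    unproven = sorted(t for t in used if radar.get(t) in {"ASSESS", "TRIAL"})
    # ALIEN technologies are simply absent from the radar.
    alien = sorted(t for t in used if t not in radar)
    if len(unproven) > max_unproven:
        risks.append(("Service uses too many ASSESS/TRIAL technologies", unproven))
    if alien:
        risks.append(("Service uses ALIEN technologies", alien))
    return risks
```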

Service depends on the unknown service

Severity: MEDIUM
Trigger: Service depends on the unknown service according to the Service Dependency Graph
Explanation: Blind spots in the architecture landscape might lead to suboptimal decisions as the landscape evolves; Missing external services might lead to missed learning opportunities and time wasted on reimplementation/reintegration
Mitigation: Add unknown services to the ExternalServices repository; Remove the dependency on the unknown service

Service has no dependencies documentation

Severity: MEDIUM
Trigger: Service has no dependency documentation
Explanation: Missing service dependency documentation blocks early detection of a variety of other risks and prolongs onboarding time for new team members
Mitigation: Introduce the Service Dependency Graph documentation

Service has no disaster recovery documentation

Severity: MEDIUM
Trigger: Disaster-recovery documentation for the service is missing
Explanation: Disaster recovery documentation should describe how to recover the service after severe failures (database failure, network outage, etc.), including backup procedures; Disaster recovery runbooks reduce the time needed to recover the service after a failure; Disaster recovery runbooks make developers think about possible service failure modes and thus mitigate them early
Mitigation: Introduce service disaster-recovery documentation

Service audit is required

Severity: MEDIUM
Trigger: No service audit has taken place for at least 360 days
Explanation: Documentation tends to become outdated over time. Regular reviews are necessary to keep it up to date.
Mitigation: Schedule a service audit session

Low severity risks

Poses a threat to business operations in the long term

Service uses HOLD technologies

Severity: LOW
Trigger: Service uses technologies marked as HOLD in TechRadar
Explanation: Technologies are marked as HOLD in TechRadar for a reason. Using them in production services is allowed only for legacy applications. Usage of HOLD technologies exposes the organization to a range of risks.
Mitigation: Consider replacing HOLD technologies with the ADOPT/TRIAL alternatives

Service has no description

Severity: LOW
Trigger: Service has no description section in ServicePassport
Explanation: Potential loss of knowledge and wasted time during investigations (time is wasted for both investigator and SMEs)
Mitigation: Add description to the ServicePassport

Service has no used technologies listed

Severity: LOW
Trigger: Service has no Tech section in ServicePassport
Explanation: The list of used technologies is a very important part of the Service Passport: it makes it possible to detect various application landscape risks and eases the onboarding of new contributors.
Mitigation: Add Tech section to the ServicePassport

Service has no explicit status

Severity: LOW
Trigger: Service has no Status section in ServicePassport
Explanation: Explicit status of the service (production/in development/retired) provides better transparency to the service consumers and allows early mitigation of multiple risks
Mitigation: Add Status section to the ServicePassport
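
Several risks in this glossary reduce to "a section is missing from the ServicePassport". The sketch below shows how such checks could be grouped. The passport is modelled here as a plain dict with lowercase keys (`owner`, `description`, `tech`, `status`); that representation is purely an assumption for illustration, since this glossary does not define the actual ServicePassport format, and `passport_risks` is a hypothetical helper.

```python
# (passport key, risk name, severity); the keys are hypothetical, the risk
# names and severities come from the glossary entries.
PASSPORT_CHECKS = [
    ("owner", "Service has no owner", "HIGH"),
    ("description", "Service has no description", "LOW"),
    ("tech", "Service has no used technologies listed", "LOW"),
    ("status", "Service has no explicit status", "LOW"),
]


def passport_risks(passport):
    """Return (risk name, severity) pairs for missing passport sections."""
    if passport is None:
        # No passport at all: the section-level checks cannot run.
        return [("Service has no passport", "MEDIUM")]
    return [
        (risk, severity)
        for key, risk, severity in PASSPORT_CHECKS
        if not passport.get(key)  # missing or empty section
    ]
```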

Service has no release documentation

Severity: LOW
Trigger: Release documentation for the service is missing
Explanation: Release documentation should describe the procedures for releasing the service to, and rolling it back from, production, staging, or any other environment; Deployment (release/rollback) runbooks significantly reduce the risk of knowledge loss
Mitigation: Introduce service release documentation

Service has no development documentation

Severity: LOW
Trigger: Development documentation for the service is missing
Explanation: Development runbooks significantly reduce the risk of knowledge loss and shorten onboarding time for new contributors
Mitigation: Introduce service development documentation

Service has no monitoring documentation

Severity: LOW
Trigger: Monitoring documentation for the service is missing
Explanation: Monitoring runbooks significantly reduce the risk of knowledge loss; Monitoring runbooks provide transparency into how the service is maintained and enable fast audits and assessments of alerting gaps
Mitigation: Introduce service monitoring documentation

Service audit is older than 180 days

Severity: LOW
Trigger: Service had no audit in the last 180 days
Explanation: Documentation tends to become outdated over time. Regular reviews are necessary to keep it up to date.
Mitigation: Schedule a service audit session

Info severity risks

Not a direct threat to business operations, but fixing the issue might either improve day-to-day IT operations or increase overall system robustness

Service is a potential SPOF

Severity: INFO
Trigger: More than 4 services have synchronous (hard) dependencies on this service
Explanation: High potential of becoming the entry point of a cascading failure in your application landscape
Mitigation: Consider switching to async/soft dependency for dependent services; Consider introducing “fragility” mitigation patterns in the dependent services
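
The SPOF trigger is the mirror image of the "Service is fragile" trigger: fragility counts a service's outgoing hard dependencies (more than 3), while SPOF counts incoming ones (more than 4). A minimal sketch of both counts over the same edge list follows. The edge representation `(consumer, provider, kind)` and the `graph_risks` function are assumptions for illustration, not the actual Service Dependency Graph format.

```python
from collections import Counter

FRAGILE_ABOVE = 3  # "Service is fragile": more than 3 hard dependencies
SPOF_ABOVE = 4     # "Service is a potential SPOF": more than 4 hard dependents


def graph_risks(edges):
    """Return (fragile services, potential SPOFs), each sorted by name.

    edges: iterable of (consumer, provider, kind), kind being "hard" or "soft".
    """
    # Only synchronous (hard) dependencies count; a set drops duplicate edges.
    hard = {(consumer, provider) for consumer, provider, kind in edges if kind == "hard"}
    outgoing = Counter(consumer for consumer, _ in hard)
    incoming = Counter(provider for _, provider in hard)
    fragile = sorted(s for s, n in outgoing.items() if n > FRAGILE_ABOVE)
    spof = sorted(s for s, n in incoming.items() if n > SPOF_ABOVE)
    return fragile, spof
```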

Service has no how-to documentation

Severity: INFO
Trigger: How-to documentation for the service is missing
Explanation: How-to runbooks significantly reduce risks of knowledge-loss and save time and mental capacity needed to perform manual operations
Mitigation: Introduce service how-to documentation

Service has no post-mortem documentation

Severity: INFO
Trigger: Post-mortem documentation for the service is missing
Explanation: Every service will fail sooner or later. Every failure is a learning opportunity that should not be missed. Every significant or customer-impacting failure should be documented so that future maintainers of the service can learn from it and avoid repeating the same mistakes; Post-mortem reports are concentrated experience and provide huge learning opportunities for the team.
Mitigation: Introduce service post-mortem reports
