Problem

Feature Overview

The platform's AI algorithm engine monitors alert events in real time and aggregates alerts that share the same root cause within an effective observation window into Problems. Each Problem provides in-depth root cause analysis and impact analysis, both horizontally and vertically.
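
The aggregation idea can be illustrated with a minimal sketch. It is not the platform's actual algorithm: it assumes each alert already carries a known `root_cause` label (the real engine infers this), and it assumes an alert joins an existing group when it arrives within the observation window of that group's latest alert. The names `Alert`, `ProblemGroup`, and `aggregate` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    entity: str
    root_cause: str  # assumption: already known; the real engine infers it
    time: datetime


@dataclass
class ProblemGroup:
    root_cause: str
    alerts: list = field(default_factory=list)


def aggregate(alerts, window=timedelta(minutes=30)):
    """Group alerts that share a root cause and arrive within `window`
    of the latest alert already in the group."""
    groups = []
    latest = {}  # root_cause -> most recent group for that root cause
    for alert in sorted(alerts, key=lambda a: a.time):
        group = latest.get(alert.root_cause)
        # Start a new group if none exists or the window has elapsed.
        if group is None or alert.time - group.alerts[-1].time > window:
            group = ProblemGroup(alert.root_cause)
            groups.append(group)
            latest[alert.root_cause] = group
        group.alerts.append(alert)
    return groups


t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    Alert("host-a", "db-1", t0),
    Alert("app-b", "db-1", t0 + timedelta(minutes=10)),
    Alert("host-d", "net-1", t0 + timedelta(minutes=5)),
    Alert("app-c", "db-1", t0 + timedelta(hours=2)),
]
groups = aggregate(sample)
```

In this sample, the first two `db-1` alerts fall into one group, while the `db-1` alert two hours later starts a new group because the assumed 30-minute window has passed.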

Concepts

Problems have the following three statuses:

  • Open: Once a Problem is created, it enters the "Open" status. As long as at least one affected entity is still in an abnormal state, the Problem remains in the "Open" status.

  • Resolved: When all affected entities return to normal (abnormal metrics disappear) and all events are recovered, the problem status changes to "Resolved".

  • Closed: When a Problem is in the "Resolved" status and no new events are aggregated into it within the observation window (default 30 minutes), its status is set to "Closed", indicating that the problem has been truly resolved.
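
The three statuses can be read as a small state machine. The sketch below is one interpretation under stated assumptions, not the platform's implementation: it assumes a Resolved Problem returns to Open if abnormal entities or unrecovered events reappear, and that Closed is terminal. `Problem`, `Status`, and `update_status` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "Open"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


@dataclass
class Problem:
    status: Status = Status.OPEN
    resolved_at: Optional[datetime] = None


def update_status(problem, abnormal_entities, unrecovered_events, now,
                  window=timedelta(minutes=30)):
    """Advance the status from the current counts of abnormal entities
    and unrecovered events. `window` is the observation window."""
    if problem.status is Status.CLOSED:
        return problem.status  # assumption: Closed is terminal
    if abnormal_entities > 0 or unrecovered_events > 0:
        # Assumption: new abnormal activity (re)opens the Problem.
        problem.status = Status.OPEN
        problem.resolved_at = None
    elif problem.status is Status.OPEN:
        # All entities normal and all events recovered.
        problem.status = Status.RESOLVED
        problem.resolved_at = now
    elif now - problem.resolved_at >= window:
        # Quiet for the whole observation window.
        problem.status = Status.CLOSED
    return problem.status


t0 = datetime(2024, 1, 1, 12, 0)
p = Problem()
s1 = update_status(p, 2, 3, t0)
s2 = update_status(p, 0, 0, t0 + timedelta(minutes=5))
s3 = update_status(p, 0, 0, t0 + timedelta(minutes=20))
s4 = update_status(p, 0, 0, t0 + timedelta(minutes=35))
```

Here the Problem is resolved at minute 5, is still Resolved at minute 20, and becomes Closed at minute 35, once the default 30-minute window has elapsed without new events.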

Use Cases

After alert events occur in the system, use Problems to identify, locate, and analyze the underlying issues so that you can resolve them quickly.

Configuration Approach

  1. Configure alerts. In Alert Configuration, select "AI Smart Algorithm" as the detection method so that Problems are generated from the resulting alert events.
  2. Configure notification strategies for Problems.

Configuration Guide

For alert rule configuration of Problems, see "Alert > Alert Configuration".

For notification strategy configuration of Problems, see "Alert > Notification Configuration".

Problem List

| Area | Item | Description |
|---|---|---|
| Left search bar | Status | Open, Resolved, Closed |
| Left search bar | Type | Availability, Error, Slow, Resource, Custom Event |
| Left search bar | Impact Level | User Experience, Application, Process, Pod, Host, Hardware, Deployment Environment |
| Chart statistics | Open Count | Number of problems with the status Open within the current query window |
| Chart statistics | Resolved Count | Number of problems with the status Resolved within the current query window |
| Chart statistics | Closed Count | Number of problems with the status Closed within the current query window |
| Chart statistics | Bar Chart | Bar trend chart showing status statistics |
| List | Problem Description | Brief description of the problem |
| List | Type | Availability, Error, Slow, Resource, Custom Event |
| List | Status | Open, Resolved, Closed |
| List | Root Cause | Root cause entity |
| List | Affected Entities | All alert entities |
| List | Affected Entity Count | Number of all alert entities |
| List | Created Time | Time when the problem occurred |
| List | ID | Unique identifier of the problem |
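
As a rough illustration of how the Open, Resolved, and Closed counts relate to the list, the sketch below tallies problems by status within a query window. The record shape (dicts with `status` and `created_time` keys) and the rule that a problem belongs to the window by its created time are assumptions made for the example; the platform may select problems differently.

```python
from collections import Counter
from datetime import datetime


def status_counts(problems, start, end):
    """Count problems per status among those created in [start, end)."""
    in_window = [p for p in problems if start <= p["created_time"] < end]
    counts = Counter(p["status"] for p in in_window)
    # Always report all three statuses, even when a count is zero.
    return {s: counts.get(s, 0) for s in ("Open", "Resolved", "Closed")}


problems = [
    {"id": "P-1", "status": "Open", "created_time": datetime(2024, 1, 1, 9, 0)},
    {"id": "P-2", "status": "Closed", "created_time": datetime(2024, 1, 1, 10, 0)},
    {"id": "P-3", "status": "Open", "created_time": datetime(2024, 1, 1, 11, 0)},
    {"id": "P-4", "status": "Resolved", "created_time": datetime(2024, 1, 2, 9, 0)},
]
counts = status_counts(
    problems, datetime(2024, 1, 1), datetime(2024, 1, 2)
)
```

With this sample data, `P-4` falls outside the query window, so the Resolved count is zero.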

Problem Details

| Area | Item | Description |
|---|---|---|
| Problem Attributes | Status | Describes the current status of the problem |
| Problem Attributes | Problem ID | Unique identifier of the problem |
| Problem Attributes | Type | Type of the problem |
| Problem Attributes | Detection Time | Problem occurrence time and problem duration |
| Statistics | Related Entities | Composed of alert entities and automatically detected anomaly entities |
| Statistics | Event Count | Number of events generated by alerts |
| Problem Path | Root Cause Propagation Path | Request-level topology diagram describing how the root cause propagates |
| Related Entities | Entity Name | Name of the entity object |
| Related Entities | Entity Type | Type of the entity, such as Host, Application, System |
| Related Entities | Root Cause | Root cause entity |
| Related Entities | Event Count | Number of alerts generated by the entity |
| Events | Event Description | Description of the alert event |
| Events | Event Metric Trend Chart | Metric trend chart of the alert event |
| Events | Event Metadata | Properties of the alert event |