Multi-dimensional Anomaly Detection

Updated at: 2022-12-09 03:49:50

Multi-dimensional anomaly detection analyzes data across multiple influencing dimensions (fields) at once. It currently supports the Mahalanobis distance algorithm.
A complete multi-dimensional anomaly detection task consists of 3 parts: data preview, data preprocessing, and model computation:
► Data Preview: Visualizes the data and dimensions that you want to analyze;
► Data Preprocessing: Includes missing value processing and preprocessing configuration:
• Missing value processing configuration: Provides 9 methods for handling missing values, so that missing data does not prevent the computation from running;
• Preprocessing configuration: Applies the selected preprocessing methods to improve data quality and the accuracy and performance of subsequent model computation.
► Model Computation: Lets you select the algorithm and parameters used for model computation as needed.
To create a new multi-dimensional anomaly detection task, follow these steps:
1. Click Machine Learning > New to create a new machine learning task. Click Help to view the Brief and Scenario for Show of the anomaly detection machine learning task.

2. Click Multi-Dimension to configure the parameters of the new multi-dimensional anomaly detection. Click Help in the upper right corner to view the brief, usage help, parameter configuration guidance, and algorithm introduction for multi-dimensional anomaly detection.

3. Configure data preview:
The data preview visualizes the raw data, data statistics, and trend graph.
1) Configure data preview: Fill in the data preview configuration information. The configuration is the same as for one-dimensional anomaly detection, except that multiple fields can be added to Metric. For details, refer to the section One-dimensional Anomaly Detection;
2) Click Preview to view the data preview result.

The data preview results include statistics, trend graph and Raw Data list, as follows:

Field Name	Description
Statistics	The statistical information of the data, including: Name, Normal Distribution (Yes/No), Median, Average, Standard Deviation
Trend	The trend of the selected data: ► Hover over the trend graph to show a tooltip with the Bucket field and Metric information; ► The time range of the trend graph can be zoomed in/out: • Hover over the trend graph and scroll the mouse wheel to zoom the time range in/out; • Drag the scaler below the graph to zoom in/out.
Raw Data	The first column of the Raw Data is the Bucket field information, and the second and following columns are the values of the selected Metric fields.

4. Configure data preprocessing:
Data preprocessing for multi-dimensional anomaly detection includes two parts: Missing Value Filled with and Preprocessing.
1) Configure Missing Value Filled with: Select a missing value processing method to fill in the missing values and improve the quality of the data;
• Configure Default Method: The default method applies to all selected fields except those that have their own field-specific method;
• To configure the missing value processing method for a specific field, click Add and set the method for that field.
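The relationship between the default method and the field-specific methods can be sketched as follows. This is only an illustration: the method names (`mean`, `zero`, `previous`) and the helper `fill_missing` are hypothetical, since the 9 methods offered by the product are not listed in this section.

```python
from statistics import mean

# Hypothetical fill methods, for illustration only.
def fill_mean(values):
    present = [v for v in values if v is not None]
    m = mean(present)
    return [m if v is None else v for v in values]

def fill_zero(values):
    return [0.0 if v is None else v for v in values]

def fill_previous(values):
    filled, last = [], None
    for v in values:
        if v is None:
            v = last  # stays None when there is no earlier value
        filled.append(v)
        last = v
    return filled

METHODS = {"mean": fill_mean, "zero": fill_zero, "previous": fill_previous}

def fill_missing(columns, default_method, overrides=None):
    """Apply default_method to every field unless overrides names another one."""
    overrides = overrides or {}
    return {
        name: METHODS[overrides.get(name, default_method)](values)
        for name, values in columns.items()
    }

data = {
    "cpu": [10.0, None, 30.0],
    "mem": [None, 2.0, 4.0],
}
# "cpu" uses the default method; "mem" has a field-specific override.
filled = fill_missing(data, default_method="mean", overrides={"mem": "zero"})
print(filled)
```

Here `cpu` is filled by the default method, while `mem` uses its own override, which mirrors the Default Method plus Add behavior described above.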

2) Configure Preprocessing: Choose the methods used for data preprocessing. Currently, two methods are available: Standardization and Dimensionality Reduction. Multiple preprocessing tasks can be added;
• Standardization: Rescales the fields to a common scale so that fields with different units or value ranges can be compared. Multi-dimensional anomaly detection supports range standardization and Z-Score standardization;

• Range standardization: The range (also known as the full range) is the difference between the maximum and minimum values of a field. It reflects the fluctuation range of a group of data: the greater the range, the greater the degree of dispersion. Range standardization rescales the data relative to this range.
• Z-Score standardization: Also known as zero-mean standardization or standard deviation standardization. It rescales the data to a mean of 0 and a standard deviation of 1. It suits scenarios where the maximum and minimum values of the data are hard to determine, or where the data contains many outliers.
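The two standardization methods can be compared with a minimal sketch. It assumes that range standardization means min-max scaling onto [0, 1] and that Z-Score uses the population standard deviation; the exact formulas used by the product are not stated here, and the field name `latency_ms` is made up.

```python
from statistics import mean, pstdev

def range_standardize(values):
    """Min-max scaling: map values onto [0, 1] using the range (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant field: no spread to scale
    return [(v - lo) / span for v in values]

def z_score_standardize(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

latency_ms = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = range_standardize(latency_ms)
z_scores = z_score_standardize(latency_ms)
print(scaled)
print(z_scores)
```

Range standardization depends entirely on the minimum and maximum, so a single extreme value compresses all other values; Z-Score depends on the mean and standard deviation instead, which is why it is the usual choice when the extremes are unreliable.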

• Dimensionality Reduction: Reduces the number of features in a data set while avoiding the loss of too much information and maintaining or improving model performance. It is also helpful for data visualization. Multi-dimensional anomaly detection supports PCA and Kernelized PCA:

• PCA (Principal Component Analysis) is a technique for analyzing and simplifying data sets. It compresses the dimensions (complexity) of the source data by extracting a smaller set of new variables from a large number of existing variables. PCA loses a small amount of information in exchange, and can be used to reduce the number of features in regression or clustering analysis;
• Kernelized PCA (KPCA) extends PCA with a kernel function to handle data that is not linearly separable. The kernel function implicitly maps the data into a higher-dimensional space, where PCA is then performed. For data points that are difficult to separate linearly in the original space, KPCA can find a suitable linear separating plane in the higher-dimensional space.
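As a rough illustration of what PCA does, the sketch below reduces two correlated fields to a single new field. It finds only the first principal component, using power iteration on the covariance matrix; this is a simplification chosen for brevity, not necessarily how the product computes PCA, and it does not cover Kernelized PCA. The sample rows are invented.

```python
from math import sqrt

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def first_principal_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

# Two strongly correlated fields (second is roughly twice the first).
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
cov, centered = covariance_matrix(rows)
pc1 = first_principal_component(cov)

# Project each two-field row onto one new field.
reduced = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
print(pc1)
print(reduced)
```

Because the two fields move together, one new variable along `pc1` keeps most of the variation, which is the "small amount of information lost" trade-off described above.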

3) Click Apply. After the preprocessing has run successfully, click View Result to view the resulting fields.

5. Configure model computation:
Multi-dimensional anomaly detection supports the Mahalanobis Distance algorithm. The model is computed from the configured calculated fields, threshold method, threshold parameter, and sliding window.
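For readers unfamiliar with the Mahalanobis distance, the following sketch shows the standard definition, d(x) = sqrt((x - mu)^T S^-1 (x - mu)), where mu is the mean vector and S the covariance matrix of the fields. It is a plain-Python illustration with invented sample data, not the product's implementation.

```python
from math import sqrt

def mean_and_covariance(rows):
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis(x, mu, cov):
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)  # z = cov^-1 @ diff, without forming the inverse
    return sqrt(sum(d * zi for d, zi in zip(diff, z)))

# Two correlated fields: the second is roughly twice the first.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
mu, cov = mean_and_covariance(rows)

on_trend = mahalanobis([5.0, 10.0], mu, cov)   # far from the mean, but follows the trend
off_trend = mahalanobis([3.0, 9.0], mu, cov)   # closer to the mean, but breaks the trend
print(on_trend, off_trend)
```

In this example the off-trend point gets a much larger distance than the on-trend point, even though it is closer to the mean in plain Euclidean terms. Accounting for the correlation between fields is what makes the measure suitable for multi-dimensional detection.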

1) The configuration parameters are as follows:

Field Name	Description
* Algorithm	Mahalanobis Distance algorithm is available.
* Calculated Field	The fields on which multi-dimensional anomaly detection is performed. They are selected from the fields produced by data preprocessing.
* Threshold Method	The threshold method can be standard deviation, median absolute deviation, or quartile deviation. Select the method according to the characteristics of the data and the outliers to be detected.
* Threshold Parameter	Adjust the value of this parameter according to the estimated number of outliers. The larger the threshold parameter, the fewer data points are flagged as outliers. If standard deviation is selected as Threshold Method, the recommended threshold parameter is 1 to 4. For normally distributed data: • 1 means that about 68.27% of the data is normal; • 2 means that about 95.45% of the data is normal; • 3 means that about 99.73% of the data is normal; • 4 means that about 99.99% of the data is normal.
* Sliding Window	If the distribution of the data changes over time, a sliding window can be used to obtain a more accurate threshold. It is recommended that the sliding window value be greater than the number of metrics.
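The interaction of Threshold Method, Threshold Parameter, and Sliding Window can be sketched as below. The sketch assumes that the threshold is a one-sided upper bound of the form center + k × spread applied to the Mahalanobis distances, and that the sliding window uses the preceding `window` values; both are assumptions, as the product's exact formulas are not given here. The function names and sample distances are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles

def upper_threshold(distances, method, k):
    """Upper bound for 'normal' distances; k plays the role of Threshold Parameter."""
    if method == "std":   # standard deviation
        return mean(distances) + k * pstdev(distances)
    if method == "mad":   # median absolute deviation
        med = median(distances)
        mad = median([abs(d - med) for d in distances])
        return med + k * mad
    if method == "iqr":   # quartile deviation
        q1, _, q3 = quantiles(distances, n=4)
        return q3 + k * (q3 - q1)
    raise ValueError(f"unknown threshold method: {method}")

def flag_anomalies(distances, method="std", k=3.0, window=None):
    """Flag each distance that exceeds the threshold.

    With window=None, one threshold is computed from all distances.
    Otherwise, each point is compared with a threshold computed from
    the preceding `window` distances only.
    """
    flags = []
    for i, d in enumerate(distances):
        history = distances if window is None else distances[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to estimate a threshold
            continue
        flags.append(d > upper_threshold(history, method, k))
    return flags

distances = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0, 1.1, 0.95]
print(flag_anomalies(distances, method="mad", k=3.0))
print(flag_anomalies(distances, method="std", k=3.0, window=4))
```

A larger `k` raises the bound and flags fewer points. The median-based methods are less affected by the outliers themselves than the standard deviation, which a single extreme value can inflate.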

2) Click Compute to view the multi-dimensional anomaly detection result.

Multi-dimensional anomaly detection model computation result description:

Field Name	Description
Total Anomaly	The total number of anomalous events detected
Event Count	The total number of events included in the computation
Details of Mahalanobis Distance	The calculation result of the Mahalanobis distance algorithm: ► Hover over the graph to show a tooltip: the first line is the actual value, and the second line is the threshold range calculated by the algorithm. Values exceeding the threshold range are marked as outliers; ► The time range of the graph can be zoomed in/out: • Hover over the graph and scroll the mouse wheel to zoom the time range in/out; • Drag the scaler below the graph to zoom in/out.
Anomaly Trend of Calculated Field	The anomaly trend of the selected calculated fields. After an anomaly is detected by Mahalanobis distance, an impact indicator is calculated to show which field has the greater impact on the anomaly.
Anomaly Raw Data	The anomaly status of all raw data: • The first column: Bucket field information; • The second column: The status, either Normal or Abnormal; the list can be filtered by status; • The third column: Impact factor, that is, the calculated field that contributes most to the anomaly; • The following columns: The specific value of each calculated field.
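How the impact factor is computed is not described in this section. One common way to attribute a Mahalanobis anomaly to individual fields, shown here purely as an assumption, is to split the squared distance into per-field terms (x - mu)_i × (S^-1 (x - mu))_i, which sum to the squared distance, and report the field with the largest term. The helper names and values below are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def field_contributions(x, mu, cov, names):
    """Per-field share of the squared Mahalanobis distance (assumed attribution)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)  # z = cov^-1 @ diff
    return {name: d * zi for name, d, zi in zip(names, diff, z)}

names = ["cpu", "mem"]
mu = [0.0, 0.0]
cov = [[1.0, 0.0], [0.0, 4.0]]  # "mem" normally varies more than "cpu"

contrib = field_contributions([3.0, 2.0], mu, cov, names)
top_field = max(contrib, key=contrib.get)
print(contrib, top_field)
```

In this example `cpu` dominates because a deviation of 3 is large relative to its variance of 1, while a deviation of 2 in `mem` is small relative to its variance of 4.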

6. After completing the above configuration, click Save, fill in the machine learning task Name, and click OK to finish creating the machine learning task.
