In economics and finance, an anomaly occurs when the actual result under a given set of assumptions differs from the result a model predicts. An anomaly is evidence that the assumption or model does not hold in practice.
In data analysis, anomalous observations are those that appear inconsistent with the remainder of the data set. Anomalies occur very rarely in the data set, and their features differ significantly from most of the data.
The “anomalousvalue” command in Splunk helps predict failures and identify anomalies in data infrastructure by finding anomalous behaviour within your data. For each field of every event, the command generates an anomaly score, which is relative to the values of that field across all other events.
When we run this command, a new field is added alongside each original field that is determined to be anomalous, named according to the following scheme.
If the original field is numeric, such as size, and contains an anomaly, the new field will be Anomaly_Score_Num(size).
If the field is non-numeric, such as name, the new field will be Anomaly_Score_Cat(name).
| anomalousvalue <av-option>... [action] [pthresh] [field_list]
All arguments are optional.
Descriptions for the av-option arguments:
- maxanofreq: Value must be between 0 and 1. If the ratio of anomalous occurrences of the field to the total number of occurrences of the field is greater than the maxanofreq value, the field is removed from consideration.
- minnormfreq: Value must be between 0 and 1. If the ratio of anomalous occurrences of the field to the total number of occurrences of the field is smaller than the minnormfreq value, the field is removed from consideration.
- minsupcount: Value must be a positive integer. If the field appears fewer than minsupcount times in the input events, the field is removed from consideration.
- minsupfreq: Value must be between 0 and 1. The minsupfreq argument checks the ratio of occurrences of the field to the total number of events. If this ratio is smaller than the minsupfreq value, the field is removed from consideration.
- [action]: action=annotate | filter | summary
- [pthresh]: pthresh=<num>
Probability threshold (decimal) that has to be met for a value to be considered anomalous (default: 0.01).
- [field_list]: The list of fields to consider. If no field list is provided, all fields are considered.
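The four av-option checks described above can be pictured as a simple eligibility filter over fields. The following Python sketch is only an illustration of those rules as worded here, not Splunk's implementation: the function name eligible_fields, the anomalous_counts input (a count of occurrences already judged anomalous per field), and the sample events are all hypothetical.

```python
from collections import Counter

def eligible_fields(events, anomalous_counts, maxanofreq, minnormfreq,
                    minsupcount, minsupfreq):
    """Return the fields that survive the four av-option checks.

    events: list of dicts mapping field name -> value.
    anomalous_counts: hypothetical dict of field name -> number of
    occurrences that were already judged anomalous.
    """
    # How many events contain each field.
    occurrences = Counter(field for event in events for field in event)
    total_events = len(events)
    kept = []
    for field, count in occurrences.items():
        anomalous_ratio = anomalous_counts.get(field, 0) / count
        if count < minsupcount:                  # minsupcount check
            continue
        if count / total_events < minsupfreq:    # minsupfreq check
            continue
        if anomalous_ratio > maxanofreq:         # maxanofreq check
            continue
        if anomalous_ratio < minnormfreq:        # minnormfreq check
            continue
        kept.append(field)
    return sorted(kept)

# Hypothetical sample: 100 events, "status" in all of them,
# "content_size" in 60, "note" in only 3.
events = []
for i in range(100):
    event = {"status": 200}
    if i < 60:
        event["content_size"] = 512
    if i < 3:
        event["note"] = "x"
    events.append(event)

anomalous_counts = {"status": 2, "content_size": 30, "note": 0}
result = eligible_fields(events, anomalous_counts,
                         maxanofreq=0.05, minnormfreq=0.01,
                         minsupcount=5, minsupfreq=0.05)
print(result)
```

In this sample, "note" is dropped because it appears too few times, and "content_size" is dropped because half of its occurrences are anomalous, which exceeds maxanofreq.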
Now let's move on to practice. We will take the same set of events over the same time range and try different optional arguments of the command to understand its usage.
index=main sourcetype=web_ping | anomalousvalue action=annotate pthresh=0.02
In the query above we used action=annotate. It returns all events in the result, whether or not they contain an anomalous value that meets the probability threshold.
As proof, anomaly scores were generated for only 3.7% of the events, yet we still got all 673 events, because action=annotate includes every event.
index=main sourcetype=web_ping | anomalousvalue action=annotate pthresh=0.02 minsupcount=500
When we ran the query with the av-option “minsupcount” along with pthresh, the coverage changed from 3.7% to 3.12%, showing that the av-option parameters affect which values are scored as anomalous.
index=main sourcetype=web_ping | anomalousvalue action=filter pthresh=0.02
For this query we used action=filter with the same time range and the same event count, but the result contains only 21 events, because filter returns only the events in which an anomaly was detected.
Collectively, the above list of fields brings 17 events under anomalous behaviour, as shown in the diagram above.
The summary action returns a table summarizing the anomaly statistics for each field generated. The table includes how many events contained this field, the fraction of events that were anomalous, what type of test (categorical or numerical) was performed, and so on.
index=main sourcetype=web_ping | anomalousvalue action=summary pthresh=0.02
Again we ran the same query over the same time range, this time with action=summary. It returns 19 statistics rows in total, which equals the total number of fields in the data. Each row provides information about one field, such as its anomaly status, data type, anomaly frequency, and count.
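To make the difference between the three actions concrete, here is a toy Python model of annotate, filter, and summary. It is a sketch under loud assumptions, not how Splunk computes scores: every field is treated as categorical, a value counts as anomalous when its relative frequency within its field is below pthresh, and the summary columns count and anomalous_count, like the sample host and result fields, are hypothetical.

```python
from collections import Counter

def anomalous_value(events, action="annotate", pthresh=0.01):
    """Toy model of the three actions (assumption: frequency-based test)."""
    # Count how often each value occurs within each field.
    value_counts = {}
    for event in events:
        for field, value in event.items():
            value_counts.setdefault(field, Counter())[value] += 1

    annotated, flagged, anomalies_per_field = [], [], Counter()
    for event in events:
        row, is_anomalous = dict(event), False
        for field, value in event.items():
            total = sum(value_counts[field].values())
            frequency = value_counts[field][value] / total
            if frequency < pthresh:
                # Toy score: the value's relative frequency.
                row[f"Anomaly_Score_Cat({field})"] = frequency
                anomalies_per_field[field] += 1
                is_anomalous = True
        annotated.append(row)
        flagged.append(is_anomalous)

    if action == "annotate":   # every event, with score fields where flagged
        return annotated
    if action == "filter":     # only events containing an anomalous value
        return [row for row, hit in zip(annotated, flagged) if hit]
    if action == "summary":    # one row of statistics per field
        return [{"fieldname": field,
                 "count": sum(counts.values()),
                 "anomalous_count": anomalies_per_field[field]}
                for field, counts in value_counts.items()]
    raise ValueError(f"unknown action: {action}")

# Hypothetical sample: 49 normal pings and one timeout.
events = [{"host": "web01", "result": "ok"} for _ in range(49)]
events.append({"host": "web01", "result": "timeout"})

all_rows = anomalous_value(events, action="annotate", pthresh=0.05)
only_anomalous = anomalous_value(events, action="filter", pthresh=0.05)
per_field = anomalous_value(events, action="summary", pthresh=0.05)
print(len(all_rows), len(only_anomalous), len(per_field))
```

As in the queries above, annotate keeps every event, filter keeps only the flagged ones, and summary returns one row per field rather than one row per event.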
index=main sourcetype=web_ping | anomalousvalue action=summary pthresh=0.01 | search isNum=YES AND fieldname=content_size
We can further filter the stats table to select a particular type or a particular field of interest.
index=main sourcetype=web_ping | anomalousvalue action=summary pthresh=0.09 | search isNum=YES AND fieldname=content_size
Because we changed the probability threshold “that has to be met for a value to be considered anomalous”, the output changed.
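The effect of raising pthresh can be seen with a few lines of Python. This is a simplified illustration that assumes a purely frequency-based test (a value is flagged when its share of all values is below pthresh); the helper flagged_values and the sample status codes are hypothetical and do not come from the web_ping data.

```python
from collections import Counter

def flagged_values(values, pthresh):
    """Return the distinct values whose relative frequency is below pthresh."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, n in counts.items() if n / total < pthresh)

# Hypothetical sample: 200 status codes, mostly "200".
values = ["200"] * 189 + ["404"] * 10 + ["503"] * 1

strict = flagged_values(values, pthresh=0.01)  # only the rarest value qualifies
loose = flagged_values(values, pthresh=0.09)   # a higher threshold flags more values
print(strict, loose)
```

With the lower threshold only the single rarest value is flagged, while the higher threshold also catches the moderately rare one, which is why the output changes between the two queries.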