Problem Statement : Determining Telecom Supplier-end Outages
Solution Overview :
Whiteklay partnered with Tata Telecommunications to capture real-time metrics from telecom device logs, identify anomalies, and enable predictive maintenance.
To deliver the use case, we calculated a set of metrics from the raw CDR (Call Data Record) data. These metrics serve as inputs to machine learning models that compute scores used to detect anomalies.
Input Parameters : Raw Call Data Records in CSV/text format
Calculated Metrics : We aggregated the raw CDR data per device per day, for outbound calls only, and stored the data in IZAC using custom workflows. Below are the aggregated metrics that were calculated:
|Count of Total In country calls|
|% In country Calls|
- Get the log messages into the IZAC Kafka queue.
- Translate the data into tables with aggregated metrics.
- Run transformation flows in IZAC and insert the data in a time-series format.
Identifying Supplier-end Outages
Input Parameters: Raw Call Data Records in CSV/text format
Step 1: Calculating the given metrics
|Total Count of Calls|
|Total Count of Inbound Invalid Cause Codes for Supplier|
|Total Count of Outbound Invalid Cause Codes for Supplier|
|Standard Deviation in Call Duration|
|Total Invalid Cause Codes|
|Count of Calls in Business Hours Origin Country|
|Count of Calls in Business Hours Destination Country|
The time interval over which the records were aggregated was chosen based on the data set and discussions with the user team. The metrics identified above are calculated and loaded into a MapR-DB table using custom Apache Spark code.
Standard deviation in call duration: helps identify outliers where calls are disconnected at a specific call duration/time.
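The production job computes these aggregates with custom Apache Spark code writing to MapR-DB; the pandas sketch below shows the equivalent per-supplier, per-interval aggregation. All column names, the one-hour interval, and the set of "invalid" Q.850 cause codes are assumptions for illustration, not the real CDR schema.

```python
import pandas as pd

# Hypothetical CDR sample; real field names depend on the raw CDR format.
cdrs = pd.DataFrame({
    "supplier":     ["S1", "S1", "S1", "S2"],
    "start_time":   pd.to_datetime(["2021-01-01 09:05", "2021-01-01 09:40",
                                    "2021-01-01 10:10", "2021-01-01 09:15"]),
    "duration_sec": [120, 0, 300, 45],
    "cause_code":   [16, 41, 16, 16],        # ITU-T Q.850 cause codes
    "direction":    ["in", "out", "out", "in"],
})

INVALID_CODES = {34, 38, 41}                 # assumed "invalid" cause codes

# Bucket each record into an aggregation interval (one hour here).
cdrs["interval"] = cdrs["start_time"].dt.floor("1h")
cdrs["invalid"] = cdrs["cause_code"].isin(INVALID_CODES)
cdrs["invalid_in"] = cdrs["invalid"] & (cdrs["direction"] == "in")
cdrs["invalid_out"] = cdrs["invalid"] & (cdrs["direction"] == "out")

# Per-supplier, per-interval metrics matching the table above.
agg = cdrs.groupby(["supplier", "interval"]).agg(
    total_calls=("duration_sec", "size"),
    invalid_in=("invalid_in", "sum"),
    invalid_out=("invalid_out", "sum"),
    duration_std=("duration_sec", "std"),
    total_invalid=("invalid", "sum"),
).reset_index()
```

The same `groupBy(...).agg(...)` shape carries over almost verbatim to the Spark DataFrame API used in production.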
Step 2: Identifying sudden rise or fall in any of the metrics
We start with a simple calculation of the slope of each of the metrics above at every interval. If the slope is too close to vertical, i.e. too steep (based on a threshold that can be decided after analysing the provided data), it indicates a sudden rise or drop in that metric, which can be a sign of a major fault such as a power outage. A major warning can be raised at this step if such a situation occurs.
Step 3: Seasonality removal (or Deseasonalization)
Before we could proceed further we had to remove seasonality from the data, since our data varies heavily depending on the time of day. We consider only daily seasonality, not seasonality across months or years. STL analysis is also to be done to remove trends.
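As a self-contained illustration of daily deseasonalization, the sketch below subtracts the mean profile for each hour of the day. This is a simple stand-in for the STL decomposition named above (STL would additionally remove the trend component); the synthetic sine-shaped call-count series is invented for the example.

```python
import numpy as np

def remove_daily_seasonality(hourly_values):
    """Subtract the mean value observed at each hour of the day.

    A simple stand-in for STL: it removes the daily seasonal
    component but, unlike STL, does not remove the trend.
    """
    x = np.asarray(hourly_values, dtype=float)
    n_days = len(x) // 24
    by_hour = x[: n_days * 24].reshape(n_days, 24)
    seasonal = by_hour.mean(axis=0)            # average per hour-of-day
    return x[: n_days * 24] - np.tile(seasonal, n_days)

# Synthetic call counts with a strong daily pattern:
hours = np.arange(24 * 7)
pattern = 100 + 50 * np.sin(2 * np.pi * hours / 24)
residual = remove_daily_seasonality(pattern)   # pattern fully removed
```

In practice `statsmodels.tsa.seasonal.STL` with `period=24` would be used on the real series so that trend and seasonality are removed together.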
Step 4: Training the model and identifying the outliers
The approach was to feed each calculated metric into its own ARIMA model: a set of models, each "learning" the trend of one of the metrics listed above. Each model then predicts what the value of its metric should be on the next day; if the actual value differs from the predicted value by more than the standard deviation, that data point is classified as an anomaly for that metric. The results of the individual models give us anomalous data points per metric. These results are combined into a single output using a weighted sum (with the weights tweaked during the testing phase), from which we can determine whether the data point is an anomaly or not.
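The per-metric anomaly test and weighted-sum combination can be sketched as below. To keep the example self-contained, a rolling-mean forecast stands in for the per-metric ARIMA model (in production a fitted ARIMA, e.g. `statsmodels` `ARIMA`, would supply the one-step prediction); the metric values, weights, and 0.5 decision threshold are all illustrative.

```python
import numpy as np

def anomaly_flags(series, window=7):
    """Flag points whose one-step forecast error exceeds one standard
    deviation. A rolling-mean forecast stands in for the per-metric
    ARIMA model described above."""
    x = np.asarray(series, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        history = x[t - window:t]
        predicted = history.mean()             # ARIMA forecast stand-in
        if abs(x[t] - predicted) > history.std():
            flags[t] = True
    return flags

# One flag per metric for the latest day, combined with tweakable weights:
metrics = {
    "total_calls":   [100, 102, 98, 101, 99, 100, 103, 20],  # sudden drop
    "invalid_codes": [5, 6, 4, 5, 5, 6, 5, 5],               # normal
}
weights = {"total_calls": 0.7, "invalid_codes": 0.3}         # illustrative

severity = sum(weights[m] * anomaly_flags(v, window=7)[-1]
               for m, v in metrics.items())
is_anomaly = severity > 0.5        # threshold tweaked in the testing phase
```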
Output: The above produces a calculated score that determines the severity of a supplier-end outage. Based on the severity score, it can be determined whether a particular supplier has outages beyond a normal level.