Telecom – Predictive Maintenance Pilot

Problem Statement: Determining telecom supplier-end outages

Solution Overview:

Whiteklay partnered with Tata Telecommunications to capture real-time metrics from telecom device logs, identify anomalies, and enable predictive maintenance.

To deliver the use case, we calculated a set of metrics from the raw CDR (call data record) data. These metrics serve as input to the machine learning models, which produce scores used to detect anomalies.

Input Parameters: Raw call data records (CDRs) in CSV/text format

Calculated Metrics: We aggregated the raw CDR data per device per day, for outbound calls only, and stored the results in IZAC using custom workflows. The aggregated metrics are listed below, followed by a sketch of the aggregation.

  • Country
  • Count of total in-country calls
  • % of in-country calls
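A minimal pandas sketch of this daily aggregation, assuming hypothetical CDR column names (device_id, direction, origin_country, destination_country, call_time); the actual fields used in the IZAC workflow may differ:

    import pandas as pd

    # Load one day of raw CDRs (hypothetical column names).
    cdr = pd.read_csv("cdr_day.csv", parse_dates=["call_time"])

    # Keep outbound calls only, as the pilot aggregated outbound traffic.
    outbound = cdr[cdr["direction"] == "outbound"].copy()
    outbound["day"] = outbound["call_time"].dt.date
    outbound["in_country"] = outbound["origin_country"] == outbound["destination_country"]

    # Aggregate per device per day.
    daily = (
        outbound.groupby(["device_id", "day", "origin_country"])
        .agg(total_calls=("in_country", "size"),
             in_country_calls=("in_country", "sum"))
        .reset_index()
    )
    daily["pct_in_country"] = 100.0 * daily["in_country_calls"] / daily["total_calls"]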

Deliverables:

  • Ingest the log messages into the IZAC Kafka queue (a sketch follows this list).
  • Translate the data into tables with aggregated metrics.
  • Run transformation flows in IZAC and insert the data in a time-series format.
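A minimal sketch of the first deliverable using the kafka-python client, with a hypothetical broker address and topic name (cdr-raw); the actual IZAC ingestion endpoint may differ:

    from kafka import KafkaProducer

    # Hypothetical broker address for the IZAC Kafka queue.
    producer = KafkaProducer(bootstrap_servers="izac-broker:9092")

    # Stream raw CDR lines from the CSV/text export into the queue.
    with open("cdr_day.csv", "rb") as f:
        for line in f:
            producer.send("cdr-raw", value=line.rstrip(b"\n"))

    producer.flush()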

Identifying Supplier-end Outages

Input Parameters: Raw call data records (CDRs) in CSV/text format

Step 1: Calculating the metrics listed below

  • Total count of calls
  • Total count of inbound invalid cause codes for the supplier
  • Total count of outbound invalid cause codes for the supplier
  • ASR (answer-seizure ratio)
  • ACD (average call duration)
  • Standard deviation in call duration
  • NER (network effectiveness ratio)
  • Total invalid cause codes
  • Count of calls during business hours in the origin country
  • Count of calls during business hours in the destination country

The time interval over which the records are aggregated was chosen based on the data set and on discussions with the user team. The metrics identified above are calculated and loaded into a MapR-DB table using custom Apache Spark code.

Standard deviation in call duration: helps identify outliers where calls are disconnected at a specific call duration/time.
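A condensed PySpark sketch of the Step 1 aggregation, assuming hypothetical column names (supplier_id, call_start, duration_sec, answered, cause_code, direction) and an illustrative invalid cause-code list; the MapR-DB write is shown as a generic save, since the exact connector configuration depends on the cluster:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("supplier_outage_metrics").getOrCreate()
    cdr = spark.read.csv("cdr_day.csv", header=True, inferSchema=True)

    # Illustrative set of cause codes treated as invalid; the real list came from the user team.
    invalid_codes = [3, 34, 38, 41]
    is_invalid = F.col("cause_code").isin(invalid_codes).cast("int")

    metrics = (
        cdr.groupBy("supplier_id", F.window("call_start", "1 hour").alias("interval"))
        .agg(
            F.count("*").alias("total_calls"),
            F.sum(F.when(F.col("direction") == "inbound", is_invalid).otherwise(0)).alias("inbound_invalid"),
            F.sum(F.when(F.col("direction") == "outbound", is_invalid).otherwise(0)).alias("outbound_invalid"),
            F.avg(F.col("answered").cast("int")).alias("asr"),                              # answer-seizure ratio
            F.avg(F.when(F.col("answered") == 1, F.col("duration_sec"))).alias("acd"),      # average call duration
            F.stddev("duration_sec").alias("duration_stddev"),
        )
        .withColumn("ner", 1 - F.col("outbound_invalid") / F.col("total_calls"))            # simplified NER proxy
    )

    # Persist for downstream scoring; in the pilot this landed in a MapR-DB table.
    metrics.write.mode("overwrite").parquet("supplier_metrics")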

Step 2: Identifying a sudden rise or fall in any of the metrics

We start with a simple slope calculation for each of the metrics above at each interval. If the slope is too close to vertical or to horizontal (based on thresholds decided after analysing the provided data), the interval is flagged. A sudden rise or drop can indicate a major fault such as a power outage, and a major warning is raised at this step when such a situation occurs.
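A small numpy sketch of the sudden rise/fall (near-vertical slope) check, with the threshold left as a placeholder to be tuned on the provided data:

    import numpy as np

    def slope_alerts(values, threshold=5.0):
        """Flag intervals where a metric rises or falls too sharply.

        values: metric values at consecutive aggregation intervals.
        threshold: absolute change per interval above which a major
        warning is raised; tuned after analysing the data.
        """
        values = np.asarray(values, dtype=float)
        slopes = np.diff(values)  # change per interval, i.e. rise over a unit run
        return np.flatnonzero(np.abs(slopes) > threshold) + 1  # indices of alerting intervals

    # Example: a sudden drop in ASR between two intervals triggers an alert.
    asr = [0.62, 0.61, 0.63, 0.18, 0.17]
    print(slope_alerts(asr, threshold=0.3))   # -> [3]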

Step 3: Seasonality removal (deseasonalization)

Before proceeding further we had to remove seasonality from the data, since the data varies heavily with the time of day. We consider only daily seasonality, not seasonality between months or across years. STL analysis is also performed to remove trends.
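A brief statsmodels sketch of the STL step, assuming hourly aggregation so that daily seasonality corresponds to a period of 24 (the file and column names are hypothetical):

    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    # Hourly metric series (e.g. ASR per interval); daily seasonality => period of 24.
    series = pd.read_csv("asr_hourly.csv", index_col="interval", parse_dates=True)["asr"]

    result = STL(series, period=24, robust=True).fit()
    deseasonalized = series - result.seasonal      # remove daily seasonality
    detrended = deseasonalized - result.trend      # optionally remove the trend as well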

Step 4: Training the models and identifying the outliers

The approach is to feed each calculated metric into its own ARIMA model, i.e. a set of models, each one "learning" the trend of one of the metrics listed above. Each model predicts what the value of its metric should be on the next day; if the actual value differs from the predicted value by more than the standard deviation, that data point is classified as an anomaly for that metric. The results from the individual models give us anomalous data points per metric. These results are combined into a single output using a weighted sum (with weights tweaked during the testing phase), which determines whether the data point as a whole is an anomaly.
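A compact sketch of the per-metric ARIMA scoring using statsmodels, with a hypothetical metric table, illustrative weights, and a placeholder decision threshold; the real weights were tuned during the testing phase:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    def anomaly_flag(history, actual, order=(1, 1, 1)):
        """Fit one ARIMA model for a metric and flag the next observation
        if it deviates from the forecast by more than one standard deviation."""
        fit = ARIMA(history, order=order).fit()
        predicted = fit.forecast(steps=1)[0]
        return int(abs(actual - predicted) > np.std(history))

    # Deseasonalized metric history per supplier (hypothetical table layout).
    metrics = pd.read_csv("supplier_metrics_deseasonalized.csv")
    weights = {"asr": 0.4, "acd": 0.2, "ner": 0.3, "duration_stddev": 0.1}  # illustrative

    flags = {
        name: anomaly_flag(metrics[name].values[:-1], metrics[name].values[-1])
        for name in weights
    }
    severity = sum(weights[n] * flags[n] for n in weights)
    is_anomaly = severity > 0.5   # placeholder decision threshold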

Output: The score calculated above determines the severity of a supplier-end outage. Based on the severity score, it can be determined whether a particular supplier has outages beyond a normal level.