
Anomaly Detection with TensorFlow Autoencoder

Published:
4:34pm May 04, 2022

Anomaly Detection – what it is and why we care

 

The conductor of an orchestra stops the rehearsal. “Mr. Smith, you’re flat.” 

The conductor has identified an anomaly. From the many streams of sound coming from dozens of instrumentalists, he was able to pinpoint a problem. It's neither what is expected nor what is wanted. We would like to perform the same feat in other domains. In high-tech manufacturing, we would like to identify problems in a large process that may produce hundreds or thousands of data streams. Can we do as well as the conductor?

The conductor has an advantage: he has the score. (One might say: he knows the score.) He knows quite well what is required from each instrument at each point in the piece. It is not feasible for the factory manager to know the specs for every process measurement in his factory. There may be a library of specifications in the basement, but beyond its sheer volume, the published specifications are not and cannot be sufficiently detailed to identify anything but the simplest problems. For example, the specs do not include the relationships between the measurements. Published specifications are sometimes wrong and always incomplete.

Due to process high dimensionality, complexity and limited product-to-market time, it is difficult to build batch process models based on first-principle. Therefore, multivariate statistical modeling methods, which only require process history data, have been widely applied.

Yan Chai, Hua Yang, Lifeng Zhao

Statistical Process Control in its simplest form monitors each variable separately from all the others. A good overview of the process and some of the TIBCO tools that enable it can be found here. Quoting from that paper,

“While this is an essential first step in any process control program, it is really only a starting point.” It is the most fundamental form of anomaly detection.

In a multivariate context, many methods build on Principal Components Analysis (PCA). These learn a linear transformation that approximately reproduces the input while simultaneously performing dimensionality reduction by projecting onto a lower-dimensional subspace. Here's a link to a detailed introduction to these methods.
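To make the idea concrete, here is a minimal sketch of PCA-based reconstruction error using scikit-learn; the stand-in data, the scaling, and the number of components are illustrative assumptions, not part of the template:

```python
# PCA anomaly scoring sketch: project, reconstruct, and measure the residual.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# stand-in for process history data: 1000 observations of 20 measurements
X = np.random.RandomState(0).normal(size=(1000, 20))
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=5)              # the lower-dimensional subspace
scores = pca.fit_transform(X_std)      # linear projection onto the components
X_hat = pca.inverse_transform(scores)  # approximate reconstruction of the input

# Squared Prediction Error per observation; large values flag potential anomalies
spe = ((X_std - X_hat) ** 2).sum(axis=1)
```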

In this blog we describe an advanced multivariate AI-based method enabled by a Spotfire Template. You can try it out in your browser to see how it operates, as described below.

 

The Autoencoder

 

An autoencoder is the deep learning analogue of Principal Component Analysis.  Dropping the linearity assumption, the autoencoder creates an approximate reproduction of the input, with dimensionality reduction added explicitly with a bottleneck layer in the network architecture.  Just as with Principal Components, some loss occurs as a result. The advantage of an autoencoder is its ability to model complex, nonlinear functions with interactions and previously unknown features. A full introduction to Autoencoders can be found here.
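To make the architecture concrete, here is a minimal Keras sketch of an autoencoder with a bottleneck layer; the layer sizes, activations, and training settings are illustrative assumptions, not the configuration used by the template:

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 20  # number of process measurements (assumed)
# stand-in for standardized "normal" process data
X_train = np.random.RandomState(0).normal(size=(1000, n_features)).astype("float32")

autoencoder = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu"),            # encoder
    layers.Dense(4, activation="relu"),             # bottleneck: explicit dimensionality reduction
    layers.Dense(16, activation="relu"),            # decoder
    layers.Dense(n_features, activation="linear"),  # approximate reproduction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=10, batch_size=64, verbose=0)
```

The bottleneck forces the network to compress the input, so it can only reproduce patterns that were common in the training data.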

Anomalies stand out in relation to the approximate reconstruction of the input, which degrades when an anomaly is present. A relatively high reconstruction error is a symptom of lack of fit for a particular data point, most likely because nothing similar was seen in the training set. A simple example: a point has both high temperature and low pressure, and this combination is very unusual. The autoencoder would fit the usual correlation between temperature and pressure, but this rare occurrence would not fit well.
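Continuing the sketch above, scoring amounts to reconstructing new data and measuring how badly the reconstruction fails for each point; the quantile threshold is one common heuristic, not the template's rule:

```python
# score new observations against the trained autoencoder
X_new = np.random.RandomState(1).normal(size=(200, n_features)).astype("float32")
X_hat = autoencoder.predict(X_new, verbose=0)

# per-point Reconstruction Error: mean squared error across all variables
recon_error = ((X_new - X_hat) ** 2).mean(axis=1)

# flag points whose error is extreme relative to the error distribution
threshold = np.quantile(recon_error, 0.99)
anomalies = recon_error > threshold
```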

The AI App

 

This blog is a short guide to exploring the online demo. A more detailed document may be found here.

 

Tab 1 – Explore

Simple tools for exploring the data. Note that this demo has preloaded data; to use the tool with your own data, download the Anomaly Detection Template for TIBCO Spotfire dxp and install it in your environment.

  • Use the controls to set the x and y axes in the scatter plot and the histogram. Add color, size, shape, or trellising to help explore the raw data.
  • You can mark the data in any of the views to specify what data you want to use in the following steps.
  • Then press the Summarize button to prepare for the next steps.

 

Tab 2 – Model

We need Training, Validation, and Test samples. We assign them by date ranges rather than at random so that adjacent time periods mostly stay together, since temporal adjacency is an important feature of the data. The data we model in the autoencoder are cross-sections at a single point in time; we examine the time dimension in the post-processing described below.
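A minimal sketch of this kind of date-based assignment with pandas; the column name, granularity, and cutoff dates are illustrative assumptions:

```python
import pandas as pd

# stand-in for time-stamped process data
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=1000, freq="h"),
    "value": range(1000),
})

# contiguous date ranges keep adjacent time periods together
train = df[df["timestamp"] < "2021-01-20"]
val = df[(df["timestamp"] >= "2021-01-20") & (df["timestamp"] < "2021-01-30")]
test = df[df["timestamp"] >= "2021-01-30"]
```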

We need to specify the column in the data that has the relevant dates/times. For the assignment of the data to the samples, we specify a convenient time granularity and update the display with the Summarize button. We can then use the mouse to specify time intervals and assign them using the buttons. The bar chart shows the number of samples in each group.

 

The next section shows summaries of the data we selected previously.

  • We can assign roles (Predictor, ID, factor, integer) to these for use in the model build. Select one or more rows as the target of an action chosen from the pull-down, and then press Update. Repeat until all variables play their desired roles.
  • Finally, specify the hyperparameters to use in building the autoencoder and press “Create Model”.
  • When the model fit is complete, the Learning Curve and the Reconstruction Error Histogram will be updated. You may repeat this process until you are happy with the results.
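For orientation, the sketch below shows how a learning curve and a reconstruction-error histogram typically come out of a Keras fit (continuing the earlier sketches; the matplotlib plotting is an illustration, not what Spotfire does internally):

```python
import matplotlib.pyplot as plt

# refit while holding out a validation split to obtain a learning curve
history = autoencoder.fit(X_train, X_train, epochs=50, batch_size=64,
                          validation_split=0.2, verbose=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="training loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_title("Learning Curve")
ax1.legend()

ax2.hist(recon_error, bins=50)  # errors computed in the earlier scoring sketch
ax2.set_title("Reconstruction Error Histogram")
plt.show()
```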

Tab 3 – Results

Here we have the tools needed to understand and utilize the results. The key metric is the Reconstruction Error, which we examine over time. It is closely related to the Squared Prediction Error (SPE) that is often used in multivariate statistical process control.

Starting with the central scatter plot, we can use the zoom slider to control the level of detail we see.

The anomalies show up as unusually high values on the y-axis. Points above the mass of points represent times when the reconstruction of the autoencoder input has failed more severely than usual. Of particular interest is the fact that such points are often clustered together in time. We call these “incidents”, i.e., times of atypical behavior.

Using Spotfire marking, we can explore these incidents in more detail.

The marked points are used to create the bar chart below, which shows the contribution of each component to the Reconstruction Error. The components which contribute the most give us insight into the anomaly and what was anomalous about the process at that time.
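Continuing the earlier sketches, per-variable contributions can be computed by decomposing the squared reconstruction error column by column over the marked points; here the Spotfire marking is simulated with the points flagged earlier:

```python
# rows the user marked; here, the anomalies flagged in the scoring sketch
X_marked = X_new[anomalies]
X_marked_hat = X_hat[anomalies]

sq_err = (X_marked - X_marked_hat) ** 2   # shape: (n_marked, n_features)
contribution = sq_err.sum(axis=0)         # total contribution of each variable
ranked = np.argsort(contribution)[::-1]   # variables ordered by contribution
```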

By highlighting any one of the bars, we obtain three time series views for the highlighted variable. The trellised view (reproduced below) shows the time series of the overall Reconstruction Error (as seen in the center visualization), the time series of the marked variable's contribution to the total, and the time series of the raw value of that variable.

We want to know what kinds of anomalies have been identified and how they might be distributed over time. To this end, we group the outliers into incidents and then cluster the incidents.

The dxp provides convenient tools to do this: 

  • Adjust the cutoff to strike a good compromise between how severe a point must be to count as an outlier and the number of outliers available for clustering.
  • Use k-means clustering and see the drivers of the different clusters (a sketch of the idea follows).
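A rough sketch of both steps, grouping outliers into incidents by time gaps and clustering the incidents with scikit-learn's k-means; the gap tolerance, the incident features, and the number of clusters are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
cols = [f"var{i}" for i in range(5)]

# stand-in for outlier points: per-variable error contributions plus timestamps
outliers = pd.DataFrame(rng.random((60, 5)), columns=cols)
minutes = np.sort(rng.choice(np.arange(2000), size=60, replace=False))
outliers["timestamp"] = pd.Timestamp("2021-01-01") + pd.to_timedelta(minutes, unit="min")

# a gap of more than 5 minutes starts a new incident
gap = outliers["timestamp"].diff() > pd.Timedelta("5min")
outliers["incident"] = gap.cumsum()

# summarize each incident by its mean contribution profile, then cluster
profiles = outliers.groupby("incident")[cols].mean()
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
```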


A final view shows the incidence of each cluster over time.