Insights: Knowledge Mining software that extracts knowledge from data. Gather. Mine. Extract. Predict. Identify. Simulate. Discover. Associate. Easily and Reliably.
Ultra-fast, parallel, self-organizing, high-dimensional modeling of real-world processes and complex systems.
Multi-level Model Validation.

Innovation for reliable predictive models.

One original and powerful feature implemented in Insights is an additional external evaluation of self-organized linear and nonlinear analytic models. This document shows how this new model evaluation approach helps answer the question of whether an obtained model reflects a valid, reliable and, in the best case, causal relationship, or whether it merely pretends model accuracy due to chance correlation. It also introduces a new model quality measure that takes into account both the noise filtering power of the modeling algorithm and model complexity: Descriptive Power.

The Problem

A key problem in data mining is the final evaluation of developed models. This evaluation is an important precondition for deploying data mining models. By learning from a finite set of data alone, it is hardly possible to decide whether the developed model reflects a valid relationship between input and output or whether it is just a stochastic model with non-causal correlations. In addition to a properly working noise filtering procedure that avoids overfitting the learning data, model evaluation needs some new external information to justify a model's quality, i.e., both its predictive and descriptive power.

Why

Let's have a look at this example: based on an artificial data set of 2 outputs, 4 inputs, and 15 samples, Insights self-organizes an analytical model for each output variable, Y1 and Y2 (fig. 1; red line: the model, blue line (almost hidden): the original data).

Fig. 1: Model graph of the two models M1 and M2. a) Model 1: Y1 = f1(x); b) Model 2: Y2 = f2(x).

For both models, an accuracy of 99.9% is reported (model fit on the learning data: R2, for example, or a more complex criterion like PSE, AIC or BIC). Judging from this accuracy and from the graphs in fig. 1, there is no reason not to consider both models "true" models that reflect a causal relation between output and input. This assumption seems to be underlined by the fact that KnowledgeMiner Insights, unlike the vast majority of data mining tools, already implements a powerful noise filtering procedure in its inductive self-organizing model synthesis (see also the "Self-Organizing Data Mining" book, section 3.2).
Now, assume you are told that only one model actually describes a causal relationship while the other simply reflects stochastic correlations. Even with this information, which is usually not available in real-world problems, you cannot decide from the data at hand which of the two models is the true model and which is the chance model. Only applying the models to some new data (which adds new information) reveals model M2 as the only valid model (fig. 2):

Fig. 2: Prediction of models M1 and M2. a) Model 1: invalid; b) Model 2: valid.
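
The effect illustrated by this example can be reproduced with a small experiment. The following is a minimal sketch, not the Insights algorithm: it assumes a one-input linear model, uses purely random data, and deliberately exaggerates the number of candidate inputs (200 instead of 4) so that the chance fit is easy to see. All function names and parameter values are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Coefficient of determination of the line a + b*x on data (x, y)."""
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


def chance_model_demo(n_learn=15, n_new=1000, n_inputs=200, seed=1):
    """Pick the best-fitting of many random inputs, then test it on new data."""
    rng = random.Random(seed)
    # Output and inputs are independent noise: no real relationship exists.
    y = [rng.gauss(0, 1) for _ in range(n_learn)]
    best = None
    for _ in range(n_inputs):
        x = [rng.gauss(0, 1) for _ in range(n_learn)]
        a, b = fit_line(x, y)
        r2 = r_squared(x, y, a, b)
        if best is None or r2 > best[0]:
            best = (r2, a, b)
    r2_learn, a, b = best
    # New samples of the same (unrelated) input and output variables.
    x_new = [rng.gauss(0, 1) for _ in range(n_new)]
    y_new = [rng.gauss(0, 1) for _ in range(n_new)]
    return r2_learn, r_squared(x_new, y_new, a, b)


r2_learn, r2_new = chance_model_demo()
print(f"R2 on learning data: {r2_learn:.2f}, R2 on new data: {r2_new:.2f}")
```

The fit on the learning data looks like a real relationship only because the best of many random candidates was selected; on new data the same model does no better than predicting the mean.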

The Noise Filtering Behavior of an Algorithm

This example clearly shows that no "closeness-of-fit" measure is sufficient on its own to evaluate a model's predictive and descriptive power. Recent research has shown that model evaluation requires at least a two-stage validation approach:

Level 1
Noise filtering that avoids overfitting the learning data. It is an integrated part of the "Model Learning" process and is based on external information (hypothesis testing) that was not used for creating a model candidate (hypothesis). The corresponding tool that Insights has used within "Model Learning" from the beginning is leave-one-out cross-validation.

Level 2
A characteristic that describes the noise filtering behavior of the "Model Learning" process and justifies model quality based on external information not used in the first validation level. Insights implements this noise filtering characteristic for the first time for linear and nonlinear analytical models. The characteristic was obtained from a large number of Monte Carlo simulation runs. It provides new and independent external knowledge against which any model has to be adjusted.
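
The first validation level can be sketched as follows. This is a generic illustration of leave-one-out cross-validation for a one-input linear model, with hypothetical function names and data; how Insights applies the criterion inside "Model Learning" is not described here. Each sample is predicted by a model that was fitted without it, so the error is measured on information not used to create the candidate.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_press_line(x, y):
    """Leave-one-out prediction error sum of squares for y = a + b*x."""
    press = 0.0
    for i in range(len(x)):
        # Fit on all samples except sample i, then predict sample i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    return press


def loo_press_mean(y):
    """Leave-one-out error of the simplest candidate: predicting the mean."""
    total = sum(y)
    n = len(y)
    return sum((yi - (total - yi) / (n - 1)) ** 2 for yi in y)


# Hypothetical learning data with a clear linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

a, b = fit_line(x, y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
press_line = loo_press_line(x, y)
press_mean = loo_press_mean(y)
print(f"in-sample SSE: {sse:.3f}, LOO PRESS (line): {press_line:.3f}, "
      f"LOO PRESS (mean only): {press_mean:.3f}")
```

The leave-one-out error is never smaller than the in-sample error, which is what makes it usable for comparing candidates of different complexity.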

Figure 3 shows a detail of the characteristic for linear analytical models.

Fig. 3: Noise filtering characteristic
M: number of inputs; N: number of samples; Qu: virtual quality of a model
Qu = 1: noise filtering does not work at all; Qu = 0: ideal filtering

There are two reasons for a second validation level: (1) the noise filtering implemented in level 1 is not an ideal noise filter and thus does not work properly in every case (see the example above), and (2) it yields a new model quality measure that is adjusted by the noise filtering power of the modeling algorithm.

The noise sensitivity characteristic expresses a pretended model quality Qu that can be obtained simply from a data set of M potential inputs and N random samples. It is a pretended model quality (accuracy) because, by definition, there is no causal relationship between stochastic variables a priori (the true and best model quality is Q = 0). So, when random samples are used, any model of quality Q > 0 merely pretends to have that better quality and to have found an input-output relationship that we know does not actually exist. This means that, given a number of potential inputs M and a number of samples N, Insights calculates a threshold quality Qu = f(N, M) that any model of quality Q must exceed to be considered valid with respect to describing a relevant relationship between input and output. A model of quality Q <= Qu, in contrast, is assumed invalid: its quality Q can also be obtained from random variables alone, so it does not differ significantly from that of a chance model, and the model has to be considered unreliable.
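
The idea of a chance-level threshold Qu = f(N, M) can be approximated with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the characteristic used by Insights: "quality" is taken to be the R2 of the best single-input linear model, the threshold is the average over all runs, and the function names are hypothetical.

```python
import random


def squared_correlation(x, y):
    """R2 of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def chance_quality(n_samples, n_inputs, runs=200, seed=0):
    """Monte Carlo estimate of the quality reachable on pure noise.

    Assumption: quality = R2 of the best of n_inputs one-input models,
    averaged over the runs. The true quality of every model is 0 here.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = 0.0
        for _ in range(n_inputs):
            x = [rng.gauss(0, 1) for _ in range(n_samples)]
            best = max(best, squared_correlation(x, y))
        total += best
    return total / runs


def is_valid(q, n_samples, n_inputs):
    """A model counts as valid only if its quality exceeds the chance level."""
    return q > chance_quality(n_samples, n_inputs)


qu_short = chance_quality(n_samples=15, n_inputs=4)
qu_long = chance_quality(n_samples=100, n_inputs=4)
print(f"Qu(N=15, M=4) = {qu_short:.3f}, Qu(N=100, M=4) = {qu_long:.3f}")
```

Even in this simplified setting the chance level rises as the number of samples shrinks relative to the number of candidate inputs, which is why the same accuracy Q can be convincing for one data set and meaningless for another.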

Descriptive Power

Besides deciding whether a model appears to be valid, the noise sensitivity characteristic is also a tool for directly calculating the descriptive power of an input-output model. It introduces a new model quality measure that is adjusted by model complexity and by the algorithm's noise sensitivity behavior and that is, finally, independent of the learning data set dimensions. The Descriptive Power (DP) is defined as:

[Formula: Descriptive Power DP as a function of Q and Qu(N, L)]

with Q as the obtained accuracy of the evaluated model and Qu(N, L) as the reference accuracy calculated from the number of samples N the model was created on and the number of input variables L the model is actually composed of (the selected relevant inputs), with L <= M. Descriptive Power is thus a chance-correlation-adjusted quality measure that is independent of the dimensions of the data set used to develop the model. Figure 4 shows an example of two models M1 and M2 that have the same accuracy Q = Q1 = Q2 but different Descriptive Power, because the models were obtained from data sets of different sample lengths and thus with different noise immunity of the modeling algorithm.

Fig. 4: Descriptive Power of two models.
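
The defining formula for Descriptive Power is not reproduced in this text. As an assumption for illustration only, the sketch below uses a common chance-corrected normalization, DP = (Q - Qu) / (1 - Qu), clipped at zero. The exact definition used by Insights may differ, and the numeric values are hypothetical. The point is only to show how two models with the same accuracy Q can end up with different Descriptive Power.

```python
def descriptive_power(q, qu):
    """Chance-corrected quality (ASSUMED form, not taken from the source).

    q  : accuracy of the evaluated model, 0 <= q <= 1
    qu : reference (chance) accuracy Qu(N, L), 0 <= qu < 1
    Returns 0 for q <= qu (no better than chance) and 1 for a perfect fit.
    """
    if not 0.0 <= qu < 1.0:
        raise ValueError("qu must lie in [0, 1)")
    return max(0.0, (q - qu) / (1.0 - qu))


# Same accuracy, different chance levels (hypothetical values): the model
# built where chance fits are easier gets the lower Descriptive Power.
dp_short = descriptive_power(0.90, 0.80)  # few samples, high Qu
dp_long = descriptive_power(0.90, 0.50)   # more samples, lower Qu
print(f"DP (short data set): {dp_short:.2f}, DP (long data set): {dp_long:.2f}")
```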

Model Evaluation

The concepts of an algorithm's noise sensitivity and of Descriptive Power provide the additional external information required to check a model's validity, i.e., whether and to what extent it differs from a chance model. For the example at the top of this page, this means that suspect models can now be identified automatically right after modeling. For models M1 and M2, Insights reports the following evaluation results:
MODEL EVALUATION: INVALID

The requested noise immunity could not be applied for the chosen sample length. Instead, VERY POOR noise immunity was used for modeling, only. To get the requested noise immunity, increase the number of samples to at least 21.

The model seems not reflecting a valid relationship. The likelihood that the data used for modeling is actually random data with no existing input-output relationship is 33%. Keep in mind, however, that the model was built using VERY POOR noise immunity. This makes evaluation of the model more uncertain.

a) Report of Model 1 --> status: invalid

MODEL EVALUATION: VALID

The requested noise immunity could not be applied for the chosen sample length. Instead, VERY POOR noise immunity was used for modeling, only. To get the requested noise immunity, increase the number of samples to at least 21.

The model seems to establish a valid relationship. The Descriptive Power of the model relative to a chance model is 42% for the actually used noise immunity.

b) Report of Model 2 --> status: valid

This means that you, the modeler, know instantly that model 2 indeed does well with a Descriptive Power of 42%, while model 1 is considered invalid with a likelihood of 33%. Following the recommendation given in the report of model 1 and increasing the number of samples to 21, KnowledgeMiner Insights comes up with this report in a second modeling run:

MODEL EVALUATION: INVALID

The model seems not reflecting a valid relationship. The likelihood that the data used for modeling is actually random data with no existing input-output relationship is 67%.

The model was generated by self-organizing high-dimensional modeling.

Insights now reports an increased certainty of 67% that this model is just a chance model and therefore has to be rejected. It is also interesting to note that this tiny modeling problem has been identified as a high-dimensional modeling task, which sounds strange at first. However, "high-dimensional" has to be seen not only in absolute but also in relative terms: every modeling problem with a high inputs-to-samples ratio is actually a high-dimensional modeling task with respect to model building and validation and has to be handled as such.

Noise Immunity Levels in Insights

In KnowledgeMiner Insights, the noise immunity levels shown in figure 6 are available to the user for building models of corresponding validity and reliability. Each level prevents pretended model accuracy above the value assigned to it. If you choose a GOOD noise immunity for building a model, for example, Insights takes care that the resulting model usually does not have a pretended accuracy above 8%, while for a POOR noise immunity the pretended accuracy can be as high as 30%. It is important to note that under certain conditions, especially if the number of samples in the learning data set is small, validity and reliability of a model on the one hand and model accuracy on the other may become competing goals: if you request increased model reliability, model accuracy may decrease, and vice versa.

Fig. 6: Noise immunity levels for model self-organization.


Summary

The two-stage model validation approach implemented in KnowledgeMiner Insights provides, for the first time in data mining software, active decision support in model evaluation. It minimizes the risk of misinterpreting a model's quality and power and of using invalid models, which in fact just reflect a chance correlation, for prediction and classification tasks. In combination with our original Live Prediction Validation technology, it gives you the highest degree of reliability for your data mining models available from software on the market today.