Preface
The Theoretical Background of Insights.

The rapid development of information technology, the continuing computerization of almost every field of human activity, and distributed computing have led to a flood of data stored in databases and data warehouses. In the 1960s, Management Information Systems (MIS) and then, in the 1970s, Decision Support Systems (DSS) were praised for their potential to supply executives with the mountains of data needed to carry out their jobs. While these systems have supplied some useful information for executives, they have not lived up to their proponents' expectations: they simply supplied too much data and not enough information to be generally useful. Today, there is an increased need for information, that is, contextual data that is non-obvious and valuable for decision making, extracted from a large collection of data.

Commonly, a large data set is one that has many cases or records. In this book, however, 'large' refers rather to the number of variables describing each record. When there are more variables than cases, the best-known algorithms run into problems (in mathematical statistics, for instance, the covariance matrix becomes singular so that it cannot be inverted; Neural Networks fail to learn). Even if the data are well-behaved, a large number of variables means that the data are distributed in a high-dimensional hypercube, causing the well-known dimensionality problem.

Therefore, decision making based on analysing data is an interactive and iterative process of various subtasks and decisions, and is called Knowledge Discovery from Data. The engine of Knowledge Discovery, where data is transformed into knowledge, is Data Mining. There are very different data mining tools available, and many papers have been published describing data mining techniques. We think it is most important for a more sophisticated data mining technique to limit the user's involvement in the entire data mining process to the inclusion of well-known a priori knowledge.
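The singular-covariance problem mentioned above is easy to demonstrate. The following small sketch (using NumPy, not any tool from this book) builds a data set with more variables than cases and shows that its sample covariance matrix is rank-deficient and therefore not invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_vars = 5, 10            # more variables than cases
X = rng.normal(size=(n_cases, n_vars))

cov = np.cov(X, rowvar=False)      # 10 x 10 sample covariance matrix

# with n_cases observations the rank is at most n_cases - 1,
# so the matrix is singular and inversion is impossible
rank = np.linalg.matrix_rank(cov)
print(rank < n_vars)               # True
```

Any method that needs the inverse of this matrix, such as classical multivariate regression, fails on such data.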
Limiting user involvement in this way makes the process more automated and more objective. Most users' primary interest is in generating useful and valid model results without having extensive knowledge of mathematical, cybernetic and statistical techniques, or sufficient time for complex dialog-driven modelling tools. Soft computing, i.e., fuzzy modelling, Neural Networks, Genetic Algorithms and other methods of automatic model generation, is a way to mine data by generating mathematical models from empirical data more or less automatically. In recent years there has been much publicity about the ability of Artificial Neural Networks to learn and to generalize, despite important problems with the design, development and application of Neural Networks:
this book introduces principles of evolution (inheritance, mutation and selection) for generating a network structure systematically, enabling automatic model structure synthesis and model validation. Models are generated from the data in the form of networks of active neurons in an evolutionary fashion: populations of competing models of growing complexity are repeatedly generated, validated and selected until an optimally complex model, neither too simple nor too complex, has been created. That is, a tree-like network is grown out of seed information (the input and output variables' data) in an evolutionary fashion of pairwise combination and survival-of-the-fittest selection, from a single simple individual (neuron) to a desired final, not overspecialized behavior (model). Neither the number of neurons and layers in the network nor the actual behavior of each created neuron is predefined. All this is adjusted during the process of self-organisation, which is why the approach is called self-organising data mining. Self-organising data mining creates optimally complex models systematically and autonomously by employing both parameter and structure identification. An optimally complex model is a model that optimally balances model quality on a given learning data set ("closeness of fit") against its generalisation power on new, previously unseen data, with respect to the data's noise level and the task of modelling (prediction, classification, modelling, etc.). It thus solves the basic problem of experimental systems analysis: systematically avoiding "overfitted" models on the basis of the data's information alone. This makes self-organising data mining a highly automated, fast and very efficient supplement and alternative to other data mining methods. The differences between Neural Networks and this new approach centre on Statistical Learning Networks and induction.
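The growth process described above can be illustrated with a deliberately minimal sketch. It is not the algorithm implemented in KnowledgeMiner, only an assumption-laden toy in its spirit: each "active neuron" is a small polynomial fitted by least squares on a learning set, all pairwise combinations of the current inputs compete, the fittest survive as inputs to the next layer, and growth stops when the error on a separate validation set (the external selection criterion) no longer improves:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# synthetic data: y depends nonlinearly on some of five candidate inputs
X = rng.normal(size=(60, 5))
y = 1.5 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=60)

# split into a learning set and a validation set (external criterion)
Xtr, Xva, ytr, yva = X[:40], X[40:], y[:40], y[40:]

def fit_neuron(u, v, t):
    # least-squares fit of a simple neuron: t ~ a0 + a1*u + a2*v + a3*u*v
    A = np.column_stack([np.ones_like(u), u, v, u * v])
    coeffs, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coeffs

def apply_neuron(c, u, v):
    return c[0] + c[1] * u + c[2] * v + c[3] * u * v

best_err, layers = np.inf, 0
while layers < 10:                       # hard cap as a safeguard
    # generation: all pairwise combinations of the current inputs
    cands = []
    for i, j in itertools.combinations(range(Xtr.shape[1]), 2):
        c = fit_neuron(Xtr[:, i], Xtr[:, j], ytr)
        err = np.mean((apply_neuron(c, Xva[:, i], Xva[:, j]) - yva) ** 2)
        cands.append((err, i, j, c))
    cands.sort(key=lambda t: t[0])
    if cands[0][0] >= best_err:          # no improvement: optimal complexity
        break
    best_err = cands[0][0]
    layers += 1
    survivors = cands[:4]                # survival of the fittest
    Xtr = np.column_stack([apply_neuron(c, Xtr[:, i], Xtr[:, j])
                           for _, i, j, c in survivors])
    Xva = np.column_stack([apply_neuron(c, Xva[:, i], Xva[:, j])
                           for _, i, j, c in survivors])

print(layers, best_err)
```

Note that nothing here fixes the number of layers or the choice of input pairs in advance; both emerge from generation and selection, which is the essence of self-organisation as used in this book.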
The first Statistical Learning Network algorithm of this new type, the Group Method of Data Handling (GMDH), was developed by A.G. Ivakhnenko in 1967. Considerable improvements were introduced in the 1970s and 1980s by versions of the Polynomial Network Training algorithm (PNETTR) by Barron and the Algorithm for Synthesis of Polynomial Networks (ASPN) by Elder, as Adaptive Learning Networks and GMDH flowed together. Further enhancements of the GMDH algorithm have been realized in the KnowledgeMiner software described in, and enclosed with, this book. KnowledgeMiner is a powerful and easy-to-use modelling and prediction tool designed to support the knowledge extraction process on a highly automated level. It implements three advanced self-organising modelling technologies: GMDH, Analog Complexing and self-organising Fuzzy Rule Induction. Three different GMDH modelling algorithms are implemented (active neurons, enhanced network synthesis and creation of systems of equations) to make knowledge extraction systematic, fast and easy even for large and complex systems. The Analog Complexing algorithm is suitable for predicting the fuzziest processes, such as financial or other markets. It is a multidimensional search engine that selects, from a given data set, the past system states most similar to a chosen (actual) reference state. All selected patterns are then synthesized into a most likely, a most optimistic and a most pessimistic prediction. KnowledgeMiner does this in an objective way, using GMDH to find the optimal number of synthesized patterns and their composition. Fuzzy modelling is an approach to forming a system model using a description language based on fuzzy logic with fuzzy predicates. Such a language can describe a dynamic multi-input/multi-output system qualitatively by means of a system of fuzzy rules. Therefore, the generated models can be
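The core idea behind Analog Complexing, searching a data set for past patterns similar to the current reference state and synthesizing their continuations, can be sketched in a few lines. This is a simplified illustration under stated assumptions (level-adjusted squared distance as the similarity measure, a fixed number k of analogs), not KnowledgeMiner's actual algorithm, which selects the number and composition of patterns objectively via GMDH:

```python
import numpy as np

def analog_complexing(series, window, horizon, k=3):
    """Toy sketch: find the k past windows most similar to the most
    recent one and synthesize their continuations into predictions."""
    ref = series[-window:]
    scores = []
    for start in range(len(series) - window - horizon):
        pattern = series[start:start + window]
        # similarity after removing the level difference between patterns
        d = np.sum(((pattern - pattern.mean()) - (ref - ref.mean())) ** 2)
        scores.append((d, start))
    scores.sort(key=lambda t: t[0])
    continuations = []
    for _, start in scores[:k]:
        follow = series[start + window:start + window + horizon]
        # shift each analog's continuation to the reference level
        continuations.append(follow + (ref[-1] - series[start + window - 1]))
    continuations = np.array(continuations)
    # most likely, most optimistic and most pessimistic predictions
    return (continuations.mean(axis=0),
            continuations.max(axis=0),
            continuations.min(axis=0))

# usage on a strictly periodic toy series, predicting 5 steps ahead
t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
likely, optimistic, pessimistic = analog_complexing(series[:195],
                                                    window=10, horizon=5)
```

On a strictly periodic series the nearest analogs are exact repetitions, so the synthesized "most likely" prediction reproduces the true continuation; on fuzzy real-world data the spread between the optimistic and pessimistic predictions conveys the forecast uncertainty.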
This book provides a thorough introduction to self-organising data mining technologies for business executives, decision makers and specialists involved in developing Executive Information Systems (EIS) or in modelling, data mining or knowledge discovery projects. It is a book for working professionals in many fields of decision making: economics (banking, finance, marketing), business-oriented computer science, ecology, medicine and biology, sociology, the engineering sciences, and all other fields concerned with modelling ill-defined systems. Each chapter includes practical examples and a reference list for further reading. The accompanying diskette/Internet download contains the KnowledgeMiner demo version and several executable examples. This book offers a comprehensive view of all major issues related to self-organising data mining and its practical application to solving real-world problems. It not only gives an introduction to self-organising data mining, but also provides answers to questions like:
The book spans eight chapters. Chapter 1 discusses several aspects of knowledge discovery from data as an introductory overview, such as why it is worth building models for decision support and how we think forecasting can be applied today to obtain valuable predictive control solutions. Also considered are the pros, cons and difficulties of the two main approaches to modelling: theory-driven and data-driven modelling. Chapter 2 explains the idea of self-organising data mining and puts it in the context of several automated data-driven modelling approaches. The algorithm of self-organising data mining is introduced, and we describe how self-organisation works generally, what conditions it requires, and how existing theoretical knowledge can be embedded into the process. Chapter 3 introduces and describes some important terms in self-organising modelling: Statistical Learning Networks, the inductive approach, GMDH, non-physical models, and models of optimal complexity. Chapter 4 focuses on parametric, regression-based GMDH algorithms. Several algorithms based on the principles of self-organisation are considered, and the important problem of choosing selection criteria and some model validation aspects are discussed. In chapter 5, three nonparametric algorithms are discussed. First, there is the Objective Cluster Analysis algorithm, which operates on pairs of closely spaced sample points. For the fuzziest objects, the Analog Complexing algorithm is recommended, which selects the most similar patterns from a given data set. Thirdly, self-organising fuzzy-rule induction can help to describe and predict complex objects qualitatively. In chapter 6 we point to some application opportunities for self-organising data mining from our own experience. Selected application fields and ideas on how a self-organising modelling approach can contribute to improving the results of other modelling methods, namely simulation, Neural Networks and econometric modelling (statistics), are suggested.
Also included in this chapter is a discussion of the synthesis of model results, its goals and its options, while the last part gives a short overview of existing self-organising data mining software. In chapter 7 the KnowledgeMiner software is described in more detail to give the reader an understanding of its self-organising modelling implementations and to help in examining the examples included on the accompanying diskette or in the Internet download. Chapter 8 explains, based on several sample applications from economics, ecology, medicine and sociology, how complex modelling, prediction, classification or diagnosis tasks can be solved systematically and quickly using the knowledge extraction capabilities of a self-organising data mining approach. This book also serves all KnowledgeMiner users as documentation and a guide to the theory and application of self-organising data mining.

March 6, 2000
Johann-Adolf Müller
Frank Lemke