Previous Contents Next

Chapter 7   Machine Learning

machine learning For large, complex systems it is difficult to formulate effective rules. The large amounts of data and the patches of unknown variables make it near impossible to formalise clear reasons for certain behaviour. Computers can help here, either by finding patterns in databases of cases, or by refining models to use as simulations. This is the aim of machine learning. Section 7.1 explains the process of induction. Section 7.2 describes qualitative modelling which has been used in many ill defined areas. Section 7.3 describes the process of abduction which can clarify models to obtain useful simulations.

7.1   Rule Induction

induction The computer looks for connections between the examples. Systematically the examples are processed. The program needs both positive and negative examples, that is, examples showing when a certain inference can be made and examples where that inference cannot be made. This helps the program to classify the examples. Classifications are formed, tested and refined as more examples are processed.

It helps the program to have some background knowledge, such as hierarchies of concepts. This allows the program to make generalisations, such as in the ARCHES problem described in chapter 18 of [Bratko, 1990]. The aim was to learn the conditions for a group of objects shown in Figure 7.1 to form an arch. The program worked out that you needed two rectangular objects side by side (but not touching each other) and another stable shape on top. The classification of a stable shape was made because the program knew that certain triangles, rectangles and trapeziums were stable. This means that the program will be able to correctly identify more examples.



Figure 7.1: Examples for learning in ARCHES from page 464 [Bratko, 1990]


7.1.1   Decision Trees

decision trees Decision trees are a knowledge representation particularly suited to induction. The task of classifying things into a hierarchy of concepts is more easily represented by a tree than by if-then rules. It is possible to translate between the two representations.

The actions to be taken are treated as classes. The conditions that are required to recommend each action are treated as the attributes of the different items in each class.

The first step is to find a test. This will be used on the attributes to divide the examples into subsets. This is makes the first layer of the tree. Now each of these sets is treated similarly. Recursively, each subset is split up into further subsets. The process finishes when the set contains examples of only one class, or the set is empty.

The problem of how to divide each set is important in making good decision trees. Any division that results in at least two non-empty sets will eventually bring about a decision tree, but it may be large and unwieldy and difficult to follow. Preferably we would like as small and compact a decision tree as possible.

The tests usually involve just one attribute, otherwise the tree can get too complex. This attribute may be a discrete value with distinct possible cases, like the severity of an alarm; or a more meaningful discrete value which can be classed into groups, such as the object ID number which helps identify what sort of object is issuing the alarm; or a continuous value, like an analogue thermocouple reading.

C4.5 Quinlan, J. R.

Tests based on a discrete value are easy to determine. Tests on a continuous value, however, require a threshold value to divide the examples. For the temperature example, the division may be between (<0ºC) (0 -- 65ºC) (>65ºC). To find the threshold C4.5[Quinlan, 1993] sorts all the training cases according to the continuous value in question. A finite number of divisions is possible, according to the number of examples in the training set, so each is evaluated.

The evaluation of a test in C4.5 is based on information theory. The amount of information in each subset is estimated. The gain to be had from the division is measured by subtracting the entropies of pairs of sets. The aim is to make subsets of roughly equal information. Each set should then produce subsets with similar levels of complexity. This will produce a tree that is balanced.

7.1.2   Knowledge Maintenance

knowledge maintenance knowledge maintenance!by induction One way to use machine learning in knowledge maintenance is to keep an extensive database of examples. This will be the training set for the induction process.
Once the new examples are generated then the problem is to add them to the knowledge base without invalidating the existing functionality. Changes are limited by the structure of the knowledge base. Diamantidis and Giakoumakis take the more drastic way of inducing a whole new knowledge base. Diamantidis and Giakoumakis

The system they advocate is one of recursively testing a rules base with the domain expert. This is not appropriate for all situations. For example, the DENDRALDENDRAL system for finding all possible molecular structures of chemical compounds from the results of a spectrum analyser. That system could not be tested like this since the process of identifying molecular structure is too time consuming. Also, it would be even more difficult to tell whether all the possible cases had been found.

The other problem with this system is that even for systems where the results are easier to verify, it is still time consuming to run these sessions. The problems are artificially generated, so there may be a problem with maintaining the interest and cooperation of the domain expert, if the situations seem fake and pointless.

7.2   Qualitative Modelling

qualitative modelling To generate a set of cases first we create a model of the system, then simulate what happens when certain elements fail. Then the simulations can be captured into cases ready for rule induction.

Modelling a telecommunication system for fault management is a daunting task. A telecommunications system can be very simple if you only consider the telephone connections. In Figure 8.7 we see that the network is made up of local networks held together by junction boxes which in turn are connected together by switching offices.

The system gets more complex when you consider all the other telecommunication systems that are now in use. There is a fairly heterogeneous collection of different types of networks all connected together.

Meira, D.

The model proposed by Meira [Meira, 1997] includes cellular networks, analog and digital networks, terrestrial radio, coaxial cable networks and many others. At the top level each of these are treated as black box subnetworks. It is left to different layers to fill out the complexities of each subnetwork.

Recursive Multifocal Correlation He calls this Recursive Multifocal Correlation (page 76 [Meira, 1997]). In Figure 7.2 there are three layers. The top layer contains only a few elements. Each of these can be explored in more detail in the lower levels.



Figure 7.2: Recursive focus in a network from page 76 [Meira, 1997]


Ideally these different layers should be able to update each other, so that adding a new element in the detail of a subnetwork may update the way that network acts in the higher, more abstract layers.

KARDIO In KARDIO [Bratko etal., 1989] these levels of abstraction are shown in an hierarchy of models. KARDIO aims to model the relationship between arrhythmias (abnormalities in the heart) and the electro cardio grams (ECGs). In Figure 7.3 two layers are shown. The top layer is a simplistic representation of the system. The second layer gives a better idea of what is going on. The abstract layer may be incomplete, it may not answer all that you want to know about how the system works. For example, it can't tell you how a particular disfunction in the left ventricle shows up on the ECG.



Figure 7.3: Two layers of the model of a heart from page 167 [Bratko etal., 1989]


hypothesis testing A little vocabulary needs to be defined here. An hypothesis is a model of a system proposed to cover known facts. Hypotheses are made up of clauses which are the constraints which define the valid facts - combinations of conditions and conclusions. The two properties of hypotheses relevant here are:
complete
the hypothesis must cover all known facts
consistent
the hypothesis must not cover any known negative facts

A detailed layer must be consistent with the corresponding abstract layer. Testing for consistency is discussed under abduction in the next section. consistency

In Meira's work the behaviour of a network was modelled by Bayesian probabilities. Each node in the network was assigned probabilities of: Bayesian Networks

  1. Being at fault
  2. Raising a critical alarm
  3. Raising an alarm
  4. Being normal

Each node can have a combination of these states.

The main task is to identify these nodes and the connections. Finding the functions of each node can be done through abduction.

7.3   Abduction

abduction From the names one could infer that abduction would be the opposite of induction. Abduction goes from the rules about a system to generating specific models whereas induction takes the specific cases and comes up with general rules.

In some situations it is impossible to get the quantities of data that are needed for induction[Menzies and Compton, 1997][Bratko etal., 1989]. Instead there may be known relationships between values, such as if inflation rises, prices rise; if prices rise, sales drop; if inflation rises, wages rise; if wages rise, sales rise. This is an indeterminate model. There is no sure link between inflation and sales in this model. The model is incomplete.

One proposal on how to make use of such information is to generate `worlds' [Menzies and Compton, 1997] which takes the proposed model as an hypothesis. From the model comes a set of proofs or traces of the way things work in the logic of the model. Next the model is used to generate worlds.



Figure 7.4: Overview of semi-automatic construction of qualitative models from [Bratko etal., 1989]


The structure of the qualitative model is passed into an interpreter which derives some behaviour of the model. This is compared to the intended behaviour given by the user. After the behaviour is sorted out, instances of this behaviour are passed to the learner that works like an inductive learner to come up with definitions of the functions of the nodes in the original model. These definitions are fed back to the interpreter to keep the cycle going.

The debugger tests for consistency in the model. It does this by trying to derive the intended behaviour from the model. If this fails then the hypothesis must be incomplete. The hypothesis is refined as shown in Figure 7.5. Incremental refinement involves:
  1. tossing out inconsistent clauses
  2. adding covering clauses



Figure 7.5: Refining an hypothesis


Many hypotheses are generated at first. If an hypothesis becomes too complex through the addition of clauses, then it can be discarded.

7.4   Summary

Machine Learning is a good way to generate a rules base when there is not the time to form rules by incremental RDR or when there is already a large set of training cases available. Even when a large set of cases is not available, machine learning techniques can help create a qualitative model. This, in turn, can be used to generate the rules.

As mentioned in the previous chapter, the process of creating a knowledge base in RDR can also provide the cases for an induced rules base.


Previous Contents Next