|
Review by Slawomir T. Wierzchon, Polish
Academy of Sciences, in Control Engineering Practice, March 1997,
pp. 442-443.
Glenn Shafer is known to the AI community as the author of "A Mathematical
Theory of Evidence" - a monograph published exactly twenty years
ago, and devoted to a generalization of the Bayesian theory of subjective
probability judgement. Within this new and not uncontroversial theory,
subjective judgements are expressed in terms of so-called "belief
functions", i.e. using not necessarily additive but monotone "probabilities".
However, there is one serious problem: how to combine such functions effectively,
particularly when they are defined on different domains. An algorithm
for doing this task, called the propagation algorithm, was worked out
by G. Shafer and his colleagues. During further studies the algorithm
proved to be a useful tool in solving discrete optimization problems,
decision-making problems and reasoning in predicate calculus. Since the
probability measure is a special kind of belief function, the propagation
algorithm supports the task of probabilistic reasoning, which relies upon
computing the conditional probability of a variable, given a set of observations.
It is remarkable that, although almost trivial from a textbook point of
view, this problem is extremely difficult from the computational standpoint.
In 1978 two influential AI researchers, P. Szolovits and S.G. Pauker,
argued in their paper "Categorical and probabilistic reasoning"
(published in Artificial Intelligence, 11, pp. 115-144)
that, because of their enormous data and space requirements, the pure
probabilistic models may be used only in small problems. For instance,
if a model engages only 56 binary variables, then 256
values must be computed to specify the joint probability distribution.
If the computer can calculate the terms of the probability distribution
at one million values per second, then it will take 2,283 years to come
up with the whole probability distribution. Hopeless! But no-one knew
that an effective method for performing this task in a reasonable time
already existed. This had been suggested in 1970 by geneticists analyzing
pedigree data. Unfortunately, it was published in Clinical Genetics
a journal unknown to the AI community, who had to wait until 1983 when
J. Pearl and J. Kim developed the first effective algorithm for computing
the posterior probabilities in problems represented by tree structures.
In the late eighties, algorithms oriented toward probability propagation
in general graphical structures were developed by S. Lauritzen & D.
Spiegelhalter, G. Shafer & P. Shenoy, and F.V. Jensen.
The three approaches referred to above are discussed in depth in Shafer's
new book. When implementing these algorithms it is better to think of
appropriate probabilities as tables (vectors) of numbers satisfying certain
conditions. This point of view is consequently used in the whole book,
and the author starts his presentation by tracing the main properties
of probability tables that represent conditional and unconditional probabilities.
Under certain conditions imposed on the set of conditionals, their products
represent a joint probability distribution, and a sequence of conditionals
satisfying these requirements is said to be a "construction sequence".
So the author studies the main properties of such sequences, and he mentions
their various graphical representations including belief (or Bayesian)
networks, bubble graphs (this is a new notion introduced by him), chain
graphs (studies by N. Wermuth and S.L. Lauritzen) and valuation networks
(introduced by P. Shenoy). All these representations are discussed int
eh context of a practical example concerning the external audit of an
organization's financial statement, which allows a reader to see the advantages
and disadvantages of the corresponding representations.
Knowing a construction sequence, the problem of probabilistic reasoning
can be briefly described as the process of the elimination, one by one,
of "unnecessary" (at the current moment) variables. This is
a rather old idea, used in 1972 by U. Bertele and F. Brioschi to solve
so-called "nonserial dynamic programming problems". Shafer and
Shenoy renewed this approach, and gave it a new appeal. Instead of working
with graphs (as its originators have done) they use so-called "join
trees", well known to those working on the mathematical foundations
of database systems. A particularly good part of the book is the presentation
of how the join trees occur naturally during the elimination of variables
from a construction sequence. The rest of the book is devoted the various
implementations of the propagation algorithm in join trees.
In summary, this is a clever and concise book guaranteeing a quick and
detailed introduction to the domain of probability propagation. Exercises
added at the end of each chapter extend its content, and allow a reader
a deeper understanding of the core ideas. The short comments added to
the biographical material included at the end of the book opens a perspective
to other and new approaches to this problem (especially those based on
Markov-chain Monte Carlo simulation, mentioned in the closing section).
Lastly, the book contains the electronic addresses of the main web sites
where it is possible to find further information and computer software.
Bayesian networks, a type of probabilistic system, are currently used
in domains such as diagnosis, planning and control, dynamic systems and
time series, data analysis or computer vision. Hence, the material presented
in the book may be of use, not only for students, but also for research
workers from these domains.
|