Markov Blanket

Jin
7 min read · May 17, 2020

We are all familiar with the quip of chaos theory — a butterfly flaps its wings in Brazil, setting off a hurricane in Texas. Every being on Earth is so deeply interconnected with every other that it seems almost impossible to make any reasonable prediction about the future state of the world without accounting for every factor. Yet, I recently learned that many predictive models rely on the fact that world processes are mostly independent of one another.

In this article, I write about a fascinating concept I learned from Judea Pearl’s groundbreaking book Probabilistic Reasoning in Intelligent Systems — the Markov Blanket — which provides theoretical insight into this phenomenon and into how it can be exploited in inference problems. I attempt to write as intuitively as possible while conveying the essential message, and will make occasional reference to that trusty source, “common sense”.

The Basics

An understanding of the Markov Blanket begins with an understanding of the Markov process. It is one approach to modeling stochastic (random) processes, which assumes that the probability of future events depends only on the present state. In other words, the future state is conditionally independent of past states, given an observation of the present state.

Real-life processes that obey this Markovian property, such as the weather, the stock market and the Brownian motion of small particles, can be analyzed to yield interesting insights. It is important to note that the information available at each state, i.e. the state representation, must be sufficiently descriptive for the Markovian property to hold. For instance, knowing only the positions of air molecules in the atmosphere does not allow us to predict future weather, but knowledge of both the position and velocity of every air molecule does.
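
To make the idea concrete, here is a minimal sketch in Python of a two-state weather chain. The states and transition probabilities are invented purely for illustration; the point is that sampling tomorrow’s weather consults only today’s state, never the history:

```python
import random

# A two-state Markov chain for the weather: tomorrow depends only on today.
# The transition probabilities below are made up purely for illustration.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    """Sample tomorrow's weather from today's state alone (the Markov property)."""
    states, weights = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=weights)[0]

# Simulate a week; the full history is never consulted.
state = "sunny"
for day in range(1, 8):
    state = next_state(state)
    print(f"day {day}: {state}")
```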

With the fundamentals laid out, let’s investigate a special property that emerges when we have a network of interacting Markov processes, starting from the commonly cited example of the Sprinkler problem.

Fig. 1 Sprinkler Problem

In this problem, there are four interacting entities — Cloudy, Sprinkler, Rain and WetGrass — and we are tasked with finding the probability of one occurrence, for instance WetGrass=True, given some observations such as Sprinkler=True and Rain=False. This problem can be compactly expressed in the form of a Bayesian Network, as shown in the diagram above. A Bayesian Network (also commonly referred to as a Bayes net, which sounds far more alluring) is a directed acyclic graph representation of Markovian relationships, such that:

  • A node represents an event and its probability, and
  • A directed edge from parent to child implies that the child node is conditionally dependent on the parent node

This representation is particularly powerful when applied to Markov processes because it visually expresses the implications of the Markov property. Let’s say we pick the combined states of Sprinkler and Rain as our current state representation. According to the Markov property, given the current state, i.e. Sprinkler=True and Rain=False, knowledge about the preceding state Cloudy becomes irrelevant when evaluating P(WetGrass). When visualized through a Bayes net, we can intuitively see how the Sprinkler and Rain nodes “cut off” the connection between Cloudy and WetGrass.

Knowledge of Cloudy can be omitted
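
This can be checked numerically. The sketch below enumerates the joint distribution of the sprinkler network; the conditional probability tables are illustrative numbers of my own choosing, but the conclusion is structural: once Sprinkler and Rain are observed, further conditioning on Cloudy leaves P(WetGrass) unchanged.

```python
from itertools import product

# Illustrative conditional probability tables (assumed, not from the figure).
P_C = {True: 0.5, False: 0.5}                      # P(Cloudy)
P_S = {True: {True: 0.1, False: 0.9},              # P(Sprinkler | Cloudy)
       False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2},              # P(Rain | Cloudy)
       False: {True: 0.2, False: 0.8}}
P_W = {(True, True): 0.99, (True, False): 0.90,    # P(WetGrass=True | S, R)
       (False, True): 0.90, (False, False): 0.0}

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) from the chain rule of the Bayes net."""
    pw = P_W[(s, r)] if w else 1.0 - P_W[(s, r)]
    return P_C[c] * P_S[c][s] * P_R[c][r] * pw

def p_wet(evidence):
    """P(WetGrass=True | evidence) by brute-force enumeration."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        assign = {"C": c, "S": s, "R": r, "W": w}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(c, s, r, w)
        den += p
        if w:
            num += p
    return num / den

# Once Sprinkler and Rain are fixed, Cloudy adds nothing:
print(p_wet({"S": True, "R": False}))              # 0.9
print(p_wet({"S": True, "R": False, "C": True}))   # 0.9
print(p_wet({"S": True, "R": False, "C": False}))  # 0.9
```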

Having familiarized ourselves with the Bayes net representation through a concrete example, we can now look at the abstract version and attempt to make some generalizations. The diagram below shows a cut-out of a generic Bayes net centered around node X.

Fig. 2 Markov Blanket surrounding Node X in a Bayesian Network

In this much more convoluted diagram, the task of identifying the relevant nodes to factor in when calculating P(X) becomes non-trivial. From the previous example, we know that the parent nodes ought to be considered, but what about child nodes, or children of child nodes? To answer these questions, we first consider the different ways that Markov relations can interact with each other.

The mutual interaction between Markov processes can be divided into three categories: causal chain, common cause and common effect. They are illustrated below:

(i) Causal Chain:

In causal chains, nodes are linked in series, with the direction of the edges lined up. Following the definition of the Markov property, knowledge of node B (conditioning upon B) renders node A conditionally independent of node C, and vice versa. We say that A and C are d-separated by B.

The state of A does not contribute additional info to C, if B is already known
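
A quick numerical check of this, using a hypothetical chain A→B→C with made-up probabilities: once B is fixed, flipping A leaves P(C) untouched.

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C (numbers are invented).
P_A = {True: 0.3, False: 0.7}
P_B = {True: {True: 0.9, False: 0.1},   # P(B | A)
       False: {True: 0.2, False: 0.8}}
P_C = {True: {True: 0.7, False: 0.3},   # P(C | B)
       False: {True: 0.1, False: 0.9}}

def p_c_true(evidence):
    """P(C=True | evidence) by enumerating the joint P(A) P(B|A) P(C|B)."""
    num = den = 0.0
    for a, b, c in product([True, False], repeat=3):
        assign = {"A": a, "B": b, "C": c}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = P_A[a] * P_B[a][b] * P_C[b][c]
        den += p
        if c:
            num += p
    return num / den

# With B known, A becomes irrelevant to C:
print(p_c_true({"B": True, "A": True}))    # 0.7
print(p_c_true({"B": True, "A": False}))   # 0.7
```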

(ii) Common Cause:

In common causes, multiple child nodes share the same parent. In the absence of any prior knowledge, the child nodes are dependent on one another. This is apparent through simple intuition: if it is snowing outside, then it is reasonable to predict that tomorrow will be windy, because one can infer that it is probably wintertime (the common cause). However, when we condition upon C (i.e. when we have certain knowledge about C), A and B become conditionally independent (d-separated by the knowledge of C). The intuition here is that within the winter season itself, the chances of getting a windy day and a snowy day are uncorrelated.

A and B become conditionally independent when the common cause C is known
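
The same enumeration trick verifies the common cause case. Below is a hypothetical Winter→Snowy, Winter→Windy fork with invented numbers: snow and wind are correlated on their own, but become independent once the season is known.

```python
from itertools import product

# Hypothetical CPTs for the fork Snowy <- Winter -> Windy (numbers invented).
P_WINTER = {True: 0.25, False: 0.75}
P_SNOWY = {True: {True: 0.6, False: 0.4},     # P(Snowy | Winter)
           False: {True: 0.05, False: 0.95}}
P_WINDY = {True: {True: 0.5, False: 0.5},     # P(Windy | Winter)
           False: {True: 0.2, False: 0.8}}

def p_windy(evidence):
    """P(Windy=True | evidence) by enumerating the joint."""
    num = den = 0.0
    for w, s, g in product([True, False], repeat=3):
        assign = {"winter": w, "snowy": s, "windy": g}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = P_WINTER[w] * P_SNOWY[w][s] * P_WINDY[w][g]
        den += p
        if g:
            num += p
    return num / den

# Unconditionally, observing snow raises the chance of wind (dependence)...
print(p_windy({}))                                 # 0.275
print(p_windy({"snowy": True}))                    # 0.44
# ...but once the season is known, snow adds nothing (d-separation):
print(p_windy({"winter": True}))                   # 0.5
print(p_windy({"winter": True, "snowy": True}))    # 0.5
```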

(iii) Common Effect:

In a common effect, multiple parent nodes share the same child. We call a shared child a ‘collider’, and colliders (C in this case) d-separate their parents (A, B). However, when we condition upon a collider, its parents become conditionally dependent (also known as d-connected).

This effect is a little counterintuitive and is best demonstrated with an example. Imagine this: a patient walks into a clinic. Prior to the diagnosis, the doctor has some guesses as to which ailments might plague the patient, and these candidate causes are independent of one another (the patient could be suffering from inflammation, flu, indigestion or any combination of them). At this point, the causes are d-separated by the unobserved symptoms. However, upon finding that the patient has a fever, P(inflammation) and P(flu) become entangled. If flu is the correct diagnosis, then P(inflammation) is significantly reduced, because flu explains away the fever, and vice versa.

In the absence of knowledge about C, A and B are independent. In the presence of C, they are dependent (not shown)
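
Explaining away can also be verified numerically. The sketch below uses a hypothetical Flu→Fever←Inflammation collider with invented numbers (fever is likely whenever either cause is present): flu and inflammation start out independent, become coupled once the fever is observed, and learning that the patient has flu pushes P(inflammation) back down.

```python
from itertools import product

# Hypothetical priors for the two causes (numbers invented).
P_FLU = {True: 0.10, False: 0.90}
P_INF = {True: 0.05, False: 0.95}

def p_fever(flu, inf):
    """P(Fever=True | Flu, Inflammation): likely if either cause is present."""
    return 0.9 if (flu or inf) else 0.01

def p_inf(evidence):
    """P(Inflammation=True | evidence) by enumerating the joint."""
    num = den = 0.0
    for f, i, fv in product([True, False], repeat=3):
        assign = {"flu": f, "inf": i, "fever": fv}
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = P_FLU[f] * P_INF[i] * (p_fever(f, i) if fv else 1 - p_fever(f, i))
        den += p
        if i:
            num += p
    return num / den

# Before any symptom is observed, flu says nothing about inflammation:
print(p_inf({}))                               # 0.05
print(p_inf({"flu": True}))                    # 0.05
# Observing the fever couples the causes, and flu explains it away:
print(p_inf({"fever": True}))                  # ~0.32
print(p_inf({"fever": True, "flu": True}))     # back down to 0.05
```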

Having identified three possible types of interactions between Markov relations, we now attempt to dissect Fig. 2. We begin with the upper portion consisting of the parents and parents’ parents of X.

The setup of V→U→X is isomorphic to (i) the causal chain, so we know that X is dependent on its immediate parents U1 and U2, but independent of any further relations upstream due to the Markov property. The setup X2←U1→X is isomorphic to (ii) the common cause, and since U1 is known, X and X2 are conditionally independent. Hence, from the upstream portion of the graph, only the states of Parents(X) are needed to evaluate P(X).

P(X) is dependent upon Y1 and Y2 by Bayes’ rule (P(A|B) = αP(B|A)P(A), where α is a normalizing constant). The setup X→Y2←Z21 is that of (iii) the common effect, and since Y2 must be known, X and Z21 become conditionally dependent. Thus, we know that Children(X) and Parents(Children(X)) are both necessary factors in calculating P(X).

In totality, P(X) is dependent upon Parents(X), Children(X) and Parents(Children(X)), and once all of them are known, X is independent of every other factor in the Bayes net. Combining the abovementioned proportionality relationships, we arrive at the following relation, written here in its standard form, where MB(X) denotes the Markov Blanket of X:
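
$$P\big(X \mid \text{all other nodes}\big) = P\big(X \mid MB(X)\big) \propto P\big(X \mid \mathrm{Parents}(X)\big)\prod_{Y \in \mathrm{Children}(X)} P\big(Y \mid \mathrm{Parents}(Y)\big)$$

Each factor P(Y | Parents(Y)) involves X itself (as one of Y’s parents) together with Y’s other parents, which is exactly how Children(X) and Parents(Children(X)) enter the picture.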

Just like that, we have discovered that in making predictions about the likelihood of event X having happened, our best estimates can be obtained by assessing its root causes, its consequences and the potential alternative sources of those consequences. To determine whether your bag of bagels has molded, think about where it has been stored, whether it has turned purple, and whether that could have just been a result of the blueberry variety that you purchased. This constitutes the “Markov Blanket” of knowledge necessary to make accurate inferences about a real-world event, allowing us to filter out all other information as noise and lead a life free of needless worries.
