In this post I will try to cover some of the widely used graphical models and how can we convert one to another. This post is largely influenced by discussions with my classmate Ankush Ganguly.
Some material which I used extensively to learn about this topic, and also strongly suggest include:
- Paper by Brendan J. Frey, “Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models “
- Bayesian Reasoning and Machine Learning by David Barber
- Lecture slides by Prof. Amos Storkey on Factor Graphs.
It is important that you know about graph theory. I found Chapter 2 of the book by Barber to be very helpful in refreshing my concepts. It is a short chapter (5-7 pages, I guess)
I found this topic very useful since it helped me understand probability much better. Using this you can better understand how the variables in a probability distribution are linked to each other and effect each other. Most importantly, how would the monstrous probability expression look like in the form of a picture.
A factor graph has two types of nodes: Factors and Variables. Factors define the relation between variables. Typically any probability distribution can be written in the following form:
This follows from the fact that exponential function is defined for all real-valued inputs and output is constrained to be positive, an essential condition for any probability distribution. We ensure that this distribution sums to 1 by dividing it with normalisation constant Z.
The function E could be decomposed as a sum of smaller components. For example:
Alternatively, the probability distribution can also be decomposed as a product of smaller components:
In both the cases, we get a combination of smaller components which define some dependency between variables. This is represented by a single factor. For example, consider the following distribution
This can be written as a combination of 4 factors Φ1, Φ2, Φ3 and Φ4:
The factor graph for this would look like the following:
Factor Φ2 captures the relation between variables x2, x3, x4 and hence has a connection with them. Factors Φ3 and Φ4 give information about variables x3 nd x4 respectively. Φ3 and Φ4 are also called unary factors sometimes.
Note that we have drawn an undirected factor graph. It lacks the information how variables are related with each other or the direction of information flow. A directed factor graph can be drawn using the expression defining the probability distribution in terms of other probabilities. Variables that are in the conditioning set will have arrow heads into that factors, and the variable not being conditioned on will have an arrow coming from the factor. For factor Φ2, which was a simplification of p(x2|x3,x4), the arrows from x3, x4 will point into the factor node and factor node will point towards x2. The directed factor graph is given by Fig 2. Try understanding it with the equation: p(x1, x2, x3, x4) = p(x1|x4)p(x2|x3,x4)p(x3)p(x4)
Another type of graphical model is Belief Network. A Belief Network is a directed model without any factors. To arrive at a Belief Network from a directed factor graph, we connect the variables directly represented by that factor and remove all the factors from our graph. Thus we arrive at Fig 3.
Note that the unary factors would go away without leaving a trace behind. This is an example of how we lose some information while moving from directed factor graphs to belief network.
Next, let’s look at another form of a graphical model called Markov Network. Markov Networks are an undirected graphical model and can be derived from a factor graph or a Belief Network. Let’s see how we can convert the above Belief Network into a Markov Network. The rule would be to draw a link between two nodes which have the same child node. In this case, nodes x3 and x4, which have a common child x2. After connecting all such nodes, do away with arrow heads. Thus we arrive at Fig 4.
The same Markov network can be derived from undirected factor graph in Fig 1. Try to do it yourself. I am too lazy to write down the repetitive details, it is the same as directed graph to belief network to Markov network.
It is a fun thing to note how different in representation Fig 1, Fig 2, Fig 3 and Fig 4 are in terms of information about the model being captured or conveyed to the reader. As, an easy representation, the information loss can be illustrated with the help of Fig 5. Where going down the arrow means a loss in information about the model.
I believe I have covered most of the information about Factor Graphs and Probabilistic Graphical Models in general, except how to check for conditional independence between variables (Coming Soon) and message passing. Please let me know if I missed something (one or two words are also good enough), it might help someone else.