Markov Random Field is basically a model of joint probability distribution. It is commonly used in many places where an output get biased depending on the output of its neighbors which including Computer Vision. In computer vision one of the major problem that we are aware of is its noise problem. This noise usually depends on its neighbors which satisfies markovian properties.

Now lets represent the neighborhood relationship in a undirected graph where the vertices represents random variables and edges defines the neighborhood. We assume that this random variables that can be picked can be denoted as a finite set L of cardinality h. If I have n random variable then I can say that I have h^n possible combination out of which one is responsible for all. So we can represent it in a probability distribution, p(x). MRF has its own way to assumes this property for p(x). It assumes a vertice conditionally independent of all other vertices other than its neighbor. Mainly because it would help us to use Hammersly Clifford theorem.

In graph theory, there is a common term called clique. Clique is basically the vartices that are somehow connected via an edge. If we consider clique potential function of size 2 then clique potential between 2 vartice will return a non negative value. The P(x) is proportional to the multiplication of all clique potentials.

Previously we were busy with only the outputs, but now we will try to predict inputs as well. So x_n were unobserved data and d_n’s are observed data. So now we have another edge which will be subject to clique potential as well. So now the probablity of d given x, P(d|x) is proportional to the multiplication of all clique potentials between x and d.

But the function that we get can be normalized using an exponential function, P(x)=Exp(-E(x))/z

Here we get another function, E(x) which is known as Energy function. E(x) is summation instead of product as because we have an exponential in the original function. E(x)=Summation of theta(x)

This E(x) is closely related with Clique Potential because its just analogous to -log(psi(x))

Let me give an example of this energy function. Suppose we are separating out background and foreground image. We can mark which one is backgroun and which one is foreground by marking them in different label. The pixel that has a probability denoted like this P(x_p,l_1)=2 and this P(x_p,l_0)=5 then it is more probable ffor x_p to be in foreground. Now we have to calculate total energy function to get the result.

E(x)=Summation of theta(x) [for unary potential]

But if we want more accurate result we would go for binary potential. So we will get connected in all possible way.

E(x)=Summation of theta(xp) + Summation of theta(xp,xq) [for binary potential]

Now using dynamic programming we would figure out the minimum energy so that we get a higher probability in total.

Now we know that the only thing we care about is energy function. So we can change the value of potential function to make the calculation more convenient by using Reperamatrization. Reperamatrization is a trick where we change the value of potential function without leting the Energy function being changed. Few of the example of that are given below.