Say I have this amazing friend who travels a lot, say he visits every corner of the world. If I had the job/money I would have done the same, but unfortunately I need to be passionate a little bit to make it happen. So he want to share all his photos with me. There could have been the picture of the hill top buildings of italy, the underground city of turkey, Plitvice lakes of Croatia, Prague of Czech Republic. Obviously some of the photos will be in Zambia capturing The Southern Cross. Bazaruto Archipelago of Mozambique will obviously be there, Some of them will capture Nyiragongo Volcano of Congo. And so on, not one picture in each, but instead, there would be multiples of it. At this age, we don’t just take one picture at a place, right? But as I have seen these places in only in my dream, I could not really tell, which pictures are taken from where just by looking at it. I need to take a closer look, I need to figure out something unique in that picture and correspond that unique feature with other pictures I can tell if the place is same or not. They key point of this para is “Feature detection”.

So as a human we do feature detection. We want our algorithm to do that for us, by nature computer can’t tell a man “a man”, because everything is just a bunch of number that represent pixels, how would it know if they are feature or not? For the simplicity sake lets say we know there are two images where the picture where the image has been captured from the same place. Say we have two corresponding images representing the same scene, we need to match one point of the image to the other point. If we try to correspond each pixel of one image with other image. It will be extremely costly, because even a simple can have 1000×600 pixels. So better approach would be to get few unique identical points that is very rare in the images. Say within 1000×600=100000 pixels, we choose 100 points that does not comes too often, within two images. Since both of the images represent the same scene then something that is unique feature in one image should also be unique in other. So now our new aim will be to correspond 100 points of one image to the other.

Feature pixcels such as mountain peaks, building corners, doorways, or interestingly shaped patches are kinds of localized feature are often called keypoint features, interest points or even corners. They are basically defined by the appearance of patches of pixels surrounding the point location. Another set of important features are edges, say the profile of mountains against the sky.

As we see, similarity plays a great role in it, so we can use summed square difference here between two images, mathematically,

w is the weight, say some pixcel that is close to x_i has a different weight when it is far far away. Usually, it is a gaussian filter.

Which is essentially the correlation equation.

Now we know are going to consider pixels as a series of 2D array, like I(x+u,y+v). As we know any function can be represented as a tylor series, so do our I(). So our I(x+u,y+v) can be represented as

So putting all together:

Now we will compute eigenvalues, it helps us to find least squares plane of some data. It helps us to find the surface normal of point cloud.

is a 2×2 matrix, it can have two eigenvalues, λ1 and λ2.

If λ1 and λ2 are too small then flat.

If λ1 and λ2 are large then it is corner.

If λ1 or λ2 are too small or too large from other it is edge.

Harris measured this cornerness using R=λ1 * λ2 – k (λ1 + λ2)^2 = det M – k(trace M)^2,

if R is negative then it is edge, R is small then flat, R is greater than corner.

from scipy.ndimage import * from pylab import * from PIL import * def compute_harris_response(im,sigma=3): """ Compute the Harris corner detector response function for each pixel in a graylevel image. """ # derivatives imx = zeros(im.shape) filters.gaussian_filter(im, (sigma,sigma), (0,1), imx) imy = zeros(im.shape) filters.gaussian_filter(im, (sigma,sigma), (1,0), imy) # compute components of the Harris matrix Wxx = filters.gaussian_filter(imx*imx,sigma) Wxy = filters.gaussian_filter(imx*imy,sigma) Wyy = filters.gaussian_filter(imy*imy,sigma) # determinant and trace Wdet = Wxx*Wyy - Wxy**2 Wtr = Wxx + Wyy return Wdet / Wtr def get_harris_points(harrisim,min_dist=10,threshold=0.1): """ Return corners from a Harris response image min_dist is the minimum number of pixels separating corners and image boundary. """ # find top corner candidates above a threshold corner_threshold = harrisim.max() * threshold harrisim_t = (harrisim > corner_threshold) * 1 # get coordinates of candidates coords = array(harrisim_t.nonzero()).T # ...and their values candidate_values = [harrisim[c[0],c[1]] for c in coords] # sort candidates index = argsort(candidate_values) # store allowed point locations in array allowed_locations = zeros(harrisim.shape) allowed_locations[min_dist:-min_dist,min_dist:-min_dist] = 1 # select the best points taking min_distance into account filtered_coords = [] for i in index: if allowed_locations[coords[i,0],coords[i,1]] == 1: filtered_coords.append(coords[i]) allowed_locations[(coords[i,0]-min_dist):(coords[i,0]+min_dist), (coords[i,1]-min_dist):(coords[i,1]+min_dist)] = 0 return filtered_coords def plot_harris_points(image,filtered_coords): """ Plots corners found in image. """ figure() gray() imshow(image) plot([p[1] for p in filtered_coords],[p[0] for p in filtered_coords],'*') axis('off') show() im = array(Image.open('empire.jpg').convert('L')) harrisim = compute_harris_response(im) filtered_coords = get_harris_points(harrisim,10) plot_harris_points(im, filtered_coords)

Thanks to:

1. Professor Mubarak Shah for sharing his course online.

2. Professor Szeleski, for making his book draft public

3. Jan Erik Solem for his book on programming computer vision.

4. That random guy at physics forum who tells me why it works, after full day of wondering, why it works.