Project 3 - [Auto]Stitching Photo Mosaics

Project Overview

The goal of this assignment is to explore homographies, image warping, and mosaicing. This project demonstrates both manual and automatic approaches to creating panoramic mosaics, with automatic stitching using Harris corner detection, feature matching, and RANSAC for robust homography estimation.

Shoot and Digitize Pictures

We start by shooting and digitizing pictures of the scenes used for the final mosaics. The three sets I chose were a lovely park by my house, some stairs near a building at Berkeley, and the exterior of a building towards the center of campus.

It is critical to take pictures that can be related to each other by a homography. To do this, all images in a set must share the same center of projection, so that each one captures the same bundle of rays through a different image plane. In practice, this means rotating the camera slightly between pictures and never translating its center of projection.

Park Set

Park 1 - source

Park 2 - source

Park 3 - source

Stairs Set

Stairs 1 - source

Stairs 2 - source

Stairs 3 - source

Berkeley Set

Berkeley 1 - source

Berkeley 2 - source

Berkeley 3 - source

Recover Homographies

Now that we have our images, we need to recover the homography transformations between them. We can do this by finding point correspondences between the images (at first manually), solving for the homography matrix, and then using that matrix to warp the images into a common coordinate system.

Consider the projective transform relating the second and third pictures of the park scene. By manually saving the correspondences between the two images, we can solve an overdetermined system of linear equations to recover the homography matrix. We can see the correspondences visualized below.

Point Correspondences

park_2_to_3_points (scaled for visibility)

Mathematical Foundations

Now let's figure out how to mathematically transform our images via a homography so we can use them to create mosaics. A homography is a projective transformation that maps points from one plane to another. For image mosaicing, we need to find the homography matrix that relates corresponding points between two images.

Homography Formulation

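For each correspondence $(x,y) \to (u,v)$, write the points in homogeneous coordinates as $\mathbf{p} = (x, y, 1)^\top$ and $\mathbf{p}' = (wu, wv, w)^\top$, where $w$ is an unknown scale factor. The projective transformation gives:

$$\mathbf{p}' = \mathbf{H}\,\mathbf{p}, \quad \mathbf{H} = \begin{bmatrix} h_{11}&h_{12}&h_{13} \\ h_{21}&h_{22}&h_{23} \\ h_{31}&h_{32}&1 \end{bmatrix}$$

Substituting $w = h_{31}x + h_{32}y + 1$ from the third row into the first two rows and rearranging gives two linear equations: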

$$\begin{aligned} h_{11}x + h_{12}y + h_{13} - h_{31}ux - h_{32}uy &= u \\ h_{21}x + h_{22}y + h_{23} - h_{31}vx - h_{32}vy &= v \end{aligned}$$

This forms the linear system $\mathbf{D}\mathbf{h} = \mathbf{b}$ where:

$$\mathbf{D} = \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -ux & -uy \\ 0 & 0 & 0 & x & y & 1 & -vx & -vy \end{bmatrix}, \quad \mathbf{h} = \begin{bmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} u \\ v \end{bmatrix}$$

With $n$ correspondences, we stack $2n$ equations and solve using least squares to find the 8 unknown homography parameters.
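The stacked least-squares solve can be sketched with NumPy as follows. This is a minimal sketch, not necessarily the code used for the results here: the function names `compute_homography` and `apply_homography` are hypothetical, and it assumes points are given as rows of (x, y) with $h_{33}$ fixed to 1 as in the formulation above.

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Least-squares homography mapping src (x, y) to dst (u, v), with h33 = 1."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = src.shape[0]
    if n < 4:
        raise ValueError("need at least 4 correspondences for 8 unknowns")
    x, y = src[:, 0], src[:, 1]
    u, v = dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros(n), np.ones(n)
    # One "u" row and one "v" row of D per correspondence.
    rows_u = np.stack([x, y, ones, zeros, zeros, zeros, -u * x, -u * y], axis=1)
    rows_v = np.stack([zeros, zeros, zeros, x, y, ones, -v * x, -v * y], axis=1)
    D = np.vstack([rows_u, rows_v])   # shape (2n, 8)
    b = np.concatenate([u, v])        # shape (2n,), same row order as D
    h, *_ = np.linalg.lstsq(D, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map (x, y) points through H and divide out the homogeneous scale w."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

With exactly four correspondences the system is square and the fit is exact. Additional correspondences make it overdetermined, which averages out small clicking errors.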

Warped Result

We can now visualize the projective transform between the second and third pictures of the park scene. The recovered homography matrix can be applied to any point in the second image to map it into the third image's coordinates. This gives us a good foundation to start stitching together image mosaics.

park_2_to_3 (projective transform)

$$\mathbf{H} = \begin{bmatrix} 1.40 & -0.0436 & -1310 \\ 0.188 & 1.21 & -438 \\ 1.05 \times 10^{-4} & -7.63 \times 10^{-6} & 1.00 \end{bmatrix}$$

Recovered Homography Matrix

Warp the Images (Rectification and Interpolation)

We have figured out how to mathematically recover homography matrices. We can now apply those matrices to images.

Inverse Warping

Instead of forward warping (which can create holes), we use inverse warping. For each pixel $(x', y')$ in the output image, we find the corresponding location $(x, y)$ in the source image:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \mathbf{H}^{-1} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$

We then sample the source image at $(x, y)$ using interpolation (nearest neighbor or bilinear) and place that value at $(x', y')$ in the output image. This ensures every output pixel gets a value without holes.
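The inverse warp with bilinear sampling might look like the sketch below. The names `bilinear_sample` and `inverse_warp` are hypothetical, and the sketch assumes the output canvas shares its origin with the target coordinate system (a real mosaic usually also needs a translation offset so that negative coordinates fit on the canvas).

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample img at float coords (xs, ys); coords must lie inside the image."""
    h, w = img.shape[:2]
    # Top-left neighbor, clipped so that x0 + 1 and y0 + 1 stay in bounds.
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    fx, fy = xs - x0, ys - y0
    if img.ndim == 3:  # broadcast the weights over color channels
        fx, fy = fx[:, None], fy[:, None]
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def inverse_warp(img, H, out_shape):
    """Warp img by H into an (out_h, out_w) canvas; also return a validity mask."""
    out_h, out_w = out_shape
    ys_out, xs_out = np.mgrid[0:out_h, 0:out_w]
    dst = np.stack([xs_out.ravel(), ys_out.ravel(), np.ones(out_h * out_w)])
    src = np.linalg.inv(H) @ dst            # output pixel -> source location
    xs, ys = src[0] / src[2], src[1] / src[2]
    h, w = img.shape[:2]
    valid = (xs >= 0) & (xs <= w - 1) & (ys >= 0) & (ys <= h - 1)
    out = np.zeros((out_h * out_w,) + img.shape[2:], dtype=float)
    out[valid] = bilinear_sample(img.astype(float), xs[valid], ys[valid])
    return out.reshape((out_h, out_w) + img.shape[2:]), valid.reshape(out_h, out_w)
```

Nearest neighbor interpolation would replace `bilinear_sample` with a lookup at the rounded coordinates. The validity mask marks which output pixels actually received source data, which is useful later when blending.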

Besides building mosaics, inverse warping can also be used to rectify images: by choosing correspondences between the corners of a planar object and an axis-aligned rectangle, the recovered homography warps that object into a flat, front-facing rectangle.

Unrectified Skull Point Correspondences

Rectified Image (Nearest Neighbor Interpolation)

Rectified Image (Bilinear Interpolation)

Interpolation Tradeoffs

Tradeoffs and Comparisons: The two interpolation methods produce very similar-looking images. However, bilinear interpolation is more computationally expensive, so its image took longer to generate than the nearest neighbor one.

We can visualize more examples of rectification and interpolation below. (Recognize any albums?)

Point Correspondences

Rectified Image (Bilinear Interpolation)

Point Correspondences

Rectified Image (Bilinear Interpolation)

Blend Images into a Mosaic

Now for the main event. Using homographies, inverse warping, and interpolation, we can blend our images into beautiful mosaics. To do this, we recover the homography between each source image and a reference image, inverse warp each source image into the reference image's coordinate system, and then use weighted blending to combine the warped images into a single mosaic.

Blending Masks

Below are the masks used for weighted averaging to reduce edge artifacts in the final mosaics. These were generated with scipy.ndimage.distance_transform_edt, which gives each pixel a weight based on its distance to the nearest edge of its image, so the images fade smoothly into one another instead of meeting at a hard seam. Here we can visualize the first source image blended with the center reference image for the park set.

park_source_mask_blended

park_reference_mask_blended
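One way such masks could be built and applied is sketched below. Only `scipy.ndimage.distance_transform_edt` comes from the text above. The helper names `distance_weight` and `blend`, the one-pixel padding, and the normalization to a peak of 1 are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_weight(valid_mask):
    """Weight rising from near 0 at the edge of the valid region to 1 deep inside."""
    # Pad with False so the canvas border also counts as an edge.
    padded = np.pad(valid_mask.astype(bool), 1, constant_values=False)
    # Distance from every valid pixel to the nearest invalid pixel.
    dist = distance_transform_edt(padded)[1:-1, 1:-1]
    peak = dist.max()
    return dist / peak if peak > 0 else dist

def blend(images, masks, eps=1e-8):
    """Weighted average of already-warped images that share one canvas."""
    weights = [distance_weight(m) for m in masks]
    total = np.maximum(np.sum(weights, axis=0), eps)  # avoid dividing by zero
    acc = np.zeros_like(images[0], dtype=float)
    for img, w in zip(images, weights):
        acc += img * (w[..., None] if img.ndim == 3 else w)
    return acc / (total[..., None] if acc.ndim == 3 else total)
```

Where only one image covers a pixel, the weighted average reduces to that image's value. In the overlap, each image contributes in proportion to how far the pixel is from that image's own edge.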

Using these blending masks and everything we learned so far, we can finally visualize our final mosaics! Pay attention to any edge artifacts or blurring especially in the Berkeley Mosaic. These will be reduced when we implement feature detection and autostitching in the next section.

Park Mosaic

Source: park_1

Source: park_2

Source: park_3

park_mosaic (weighted blending)

Stairs Mosaic

Source: stairs_1

Source: stairs_2

Source: stairs_3

stairs_mosaic (weighted blending)

Berkeley Mosaic

Source: berkeley_1

Source: berkeley_2

Source: berkeley_3

berkeley_mosaic (weighted blending)

We saw how we can create image mosaics using manual correspondences between images. However, this is not always practical and is prone to errors, as we saw above. For this next part of the project, we will implement methods to automatically detect features in images and autostitch our images together.

The following parts of this project follow the methodology described in "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al. The paper presents a comprehensive approach to automatic image stitching that includes Harris corner detection, adaptive non-maximal suppression, feature descriptor extraction, and robust matching techniques. Note that we do not implement the full framework (such as the wavelet transforms), but the core of the paper nonetheless provides the foundation for our automatic mosaicing system.

Harris Corner Detection

The Harris corner detector identifies points of interest in images by analyzing local intensity changes. These corners are characterized by significant intensity variations in multiple directions, making them robust features for image matching.
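A compact version of the detector can be sketched as follows. This is an illustration under stated assumptions, not the exact detector used for the figures: it uses Sobel gradients, a Gaussian window, the classic response $\det(M) - k\,\mathrm{tr}(M)^2$ with an assumed $k = 0.05$, and a relative threshold. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def harris_response(gray, sigma=1.0, k=0.05):
    """Harris corner strength at every pixel of a grayscale image."""
    gray = gray.astype(float)
    ix = sobel(gray, axis=1)   # horizontal intensity change
    iy = sobel(gray, axis=0)   # vertical intensity change
    # Entries of the structure tensor M, averaged over a Gaussian window.
    sxx = gaussian_filter(ix * ix, sigma)
    syy = gaussian_filter(iy * iy, sigma)
    sxy = gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    # Large only when intensity varies strongly in two directions.
    return det - k * trace ** 2

def harris_corners(gray, rel_thresh=0.01, window=3, **kwargs):
    """Return (x, y) corner coordinates and their strengths."""
    r = harris_response(gray, **kwargs)
    is_peak = r == maximum_filter(r, size=window)   # local maxima only
    strong = r > rel_thresh * r.max()               # basic filtering
    ys, xs = np.nonzero(is_peak & strong)
    return np.column_stack([xs, ys]), r[ys, xs]
```

Along a straight edge the response is negative and in flat regions it is zero, so only points where two edge directions meet survive the threshold.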

While Harris corner detection is a good starting point, even with a bit of filtering it produces a great many corners. We can see the effect of basic Harris corner detection on the first image of the park scene below.

Basic Harris Corner Detection

park_1_harris_corners - Basic Harris corner detection with filtering

Adaptive Non-Maximal Suppression (ANMS)

Let's work on optimizing the corners we select. ANMS reduces the number of detected corners while ensuring they are well-distributed across the image. This prevents clustering of corners in high-texture regions and maintains spatial diversity. Below I highlight the effect of ANMS on the above image by filtering down to 250 corners versus 125 corners. Note that the corners stay relatively uniformly distributed across the image.

park_1_harris_suspress_250_corners - ANMS with 250 corners

park_1_harris_suspress_125_corners - ANMS with 125 corners
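The suppression step can be sketched as below. For each corner it computes the distance to the nearest corner that is sufficiently stronger, then keeps the corners with the largest such radii. The function name `anms` is hypothetical, the robustness constant `c_robust = 0.9` is an assumed value following the MOPS paper, and the pairwise distance matrix makes this O(n²) in memory, so it suits a few thousand corners at most.

```python
import numpy as np

def anms(coords, strengths, num_keep=250, c_robust=0.9):
    """Return indices of up to num_keep corners chosen by suppression radius."""
    coords = np.asarray(coords, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Squared distance between every pair of corners, shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Corner j suppresses corner i only if i is clearly weaker than j.
    suppresses = strengths[:, None] < c_robust * strengths[None, :]
    dist2 = np.where(suppresses, dist2, np.inf)
    # Suppression radius: distance to the nearest suppressing corner
    # (infinite for corners that nothing suppresses).
    radii = dist2.min(axis=1)
    order = np.argsort(-radii)   # largest radius first
    return order[:num_keep]
```

A weak corner that is far from anything stronger gets a large radius and survives, while a strong corner sitting next to an even stronger one is dropped. That is what keeps the selection spread across the image.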

We have successfully found a way to automatically detect a good set of corners for our images. We now need to work on implementing a method to match these corners between images.

Feature Descriptor Extraction

Feature descriptors capture the local appearance around each detected corner point. With good feature descriptors, it becomes significantly easier to match correspondences between images. For this project, each 8x8 descriptor is sampled from a larger, blurred 40x40 window around the corner and then bias/gain-normalized to make it robust to illumination changes. Here are some example feature descriptors extracted from the first image of the park scene and the first image of the Berkeley scene.

The first row is the original 40x40 patch, the second row is the blurred 40x40 patch, and the third row is the 8x8 descriptor patch.

Extracted Feature Descriptors

park_1_500_descriptors - Normalized 8x8 feature descriptors

berkeley_1_500_descriptors - Normalized 8x8 feature descriptors
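The extraction described above (40x40 window, blur, 8x8 subsample, bias/gain normalization) could be sketched like this. The function name, the blur `sigma`, and the choice to skip corners too close to the border are assumptions. The patches here are axis-aligned, with no rotation handling.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(gray, coords, window=40, patch=8, sigma=2.0, eps=1e-8):
    """Return (kept indices, descriptors) for corners given as (x, y)."""
    gray = gray.astype(float)
    h, w = gray.shape
    half = window // 2
    step = window // patch          # 5-pixel spacing for 40 -> 8
    kept, descs = [], []
    for idx, (x, y) in enumerate(np.asarray(coords, dtype=int)):
        # Skip corners whose window would fall outside the image.
        if x - half < 0 or y - half < 0 or x + half > w or y + half > h:
            continue
        win = gray[y - half:y + half, x - half:x + half]
        blurred = gaussian_filter(win, sigma)   # low-pass before subsampling
        vec = blurred[step // 2::step, step // 2::step].ravel()   # 8x8 -> 64
        # Bias/gain normalization: zero mean, unit standard deviation.
        vec = (vec - vec.mean()) / (vec.std() + eps)
        kept.append(idx)
        descs.append(vec)
    return np.array(kept, dtype=int), np.array(descs).reshape(-1, patch * patch)
```

Blurring before subsampling avoids aliasing, and the normalization makes two patches compare equal even if one image is uniformly brighter or has higher contrast than the other.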

Feature Matching

With good feature descriptors, we can now match correspondences between images. To accomplish this, we use the ratio test (Lowe's method) to filter matches based on the ratio between the distances to the first and second nearest neighbors, which helps eliminate ambiguous matches.

Lowe's Ratio Test

For each descriptor in image 1, we find its two nearest neighbors in image 2 using Euclidean distance:

$$d_1 = \min_{j} ||\mathbf{d}_1^{(i)} - \mathbf{d}_2^{(j)}||_2, \quad d_2 = \min_{j \neq j_1} ||\mathbf{d}_1^{(i)} - \mathbf{d}_2^{(j)}||_2$$

where $\mathbf{d}_1^{(i)}$ is the $i$-th descriptor in image 1, and $j_1$ is the index of the first nearest neighbor.

The ratio test accepts a match if:

$$\frac{d_1}{d_2} < \tau$$

where $\tau$ is typically 0.7. This ensures the best match is significantly better than the second-best match, reducing false correspondences.
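The ratio test can be sketched in a few lines of NumPy. The function name `match_ratio_test` is hypothetical, `tau = 0.7` simply mirrors the typical value quoted above, and the brute-force distance matrix assumes a modest number of descriptors per image.

```python
import numpy as np

def match_ratio_test(desc1, desc2, tau=0.7):
    """Return an array of (index in image 1, index in image 2) matches."""
    d1 = np.asarray(desc1, dtype=float)
    d2 = np.asarray(desc2, dtype=float)
    if len(d2) < 2:
        raise ValueError("need at least two descriptors in image 2")
    # Euclidean distance between every descriptor pair, shape (n1, n2).
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :2]   # two nearest neighbors
    rows = np.arange(len(d1))
    best = dists[rows, nearest[:, 0]]            # d_1 in the formula above
    second = dists[rows, nearest[:, 1]]          # d_2 in the formula above
    keep = best < tau * second                   # same as d_1 / d_2 < tau
    return np.column_stack([rows[keep], nearest[keep, 0]])
```

Writing the test as `best < tau * second` rather than as a division avoids a divide-by-zero when the second-nearest distance is exactly zero.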

Using Lowe's ratio test, we can now visualize feature matches between a pair of images. Below are the feature matches between the first image of the park scene and the second image of the park scene, and the first image of the Berkeley scene and the second image of the Berkeley scene. Note that there are a significant number of outliers in the Berkeley image due to the repetitive architectural features like windows and building facades that create many similar-looking corners. We can reduce these using RANSAC in the next step.

Feature Correspondences

park_1_to_2_500_normal - Clean feature matches with minimal outliers

berkeley_1_to_2_500_normal - Many outliers due to repetitive structures (windows, buildings)

RANSAC for Robust Homography

We have good feature descriptors and a way to match them, but there are still too many outliers, which will hurt our homography estimation. RANSAC (Random Sample Consensus) robustly estimates a homography by repeatedly selecting a random subset of correspondences, fitting a homography to that subset, and keeping the transformation that agrees with the largest number of correspondences. This effectively filters out outlier matches and produces more accurate mosaics.
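The loop can be sketched as follows. The 4-point sample size is the minimum needed for a homography. The iteration count, the 2-pixel inlier threshold, the fixed random seed, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography from (x, y) to (u, v) with h33 fixed to 1."""
    x, y, u, v = src[:, 0], src[:, 1], dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros(len(src)), np.ones(len(src))
    D = np.vstack([
        np.stack([x, y, ones, zeros, zeros, zeros, -u * x, -u * y], axis=1),
        np.stack([zeros, zeros, zeros, x, y, ones, -v * x, -v * y], axis=1),
    ])
    h, *_ = np.linalg.lstsq(D, np.concatenate([u, v]), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (x, y) points; degenerate fits may yield inf or NaN."""
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, n_iters=1000, inlier_thresh=2.0, seed=0):
    """Return (H, inlier mask) for matched points src -> dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: 4 correspondences determine a homography.
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        errors = np.linalg.norm(project(H, src) - dst, axis=1)
        errors = np.nan_to_num(errors, nan=np.inf)  # degenerate fit: no inliers
        inliers = errors < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("RANSAC failed to find a consistent homography")
    # Refit on every inlier of the best model for the final estimate.
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Refitting on the full inlier set at the end uses all the consistent matches rather than just the four that happened to be sampled, which generally gives a more stable final homography.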

Manual vs Automatic Mosaics

Below are comparisons between manually stitched mosaics (using hand-selected correspondences) and automatically stitched mosaics (using RANSAC-filtered correspondences). The automatic approach demonstrates the robustness of the feature matching and RANSAC pipeline.

Park Mosaic

Manual Mosaic - Hand-selected correspondences

Automatic RANSAC Mosaic - Feature matching + RANSAC

Stairs Mosaic

Manual Mosaic - Hand-selected correspondences

Automatic RANSAC Mosaic - Feature matching + RANSAC

Berkeley Mosaic

Manual Mosaic - Hand-selected correspondences

Automatic RANSAC Mosaic - Feature matching + RANSAC