Fun With Filters and Frequencies!

Image Processing & Multiresolution Blending

San/Eboshi Hybrid

Project Overview

This project explores fundamental concepts in digital image processing, focusing on spatial domain filtering, frequency domain analysis, and image blending techniques. The project is divided into two main parts: Part 1 focuses on building intuitions about 2D convolutions and filtering, while Part 2 explores frequency domain processing, hybrid images, and multiresolution blending.

Key Learning Objectives: Understanding 2D convolutions, implementing filters from scratch, exploring the relationship between spatial and frequency domains, and creating seamless image composites.

Part 1: Fun with Filters

In this part, we build intuitions about 2D convolutions and filtering, beginning with the humble finite difference filters in the x and y directions. These allow us to detect vertical and horizontal edges respectively.

Part 1.1: Convolutions from Scratch!

First as a recap to convolution, I implemented an implementation with two four loops and four for loops to compare and contrast the runtime performance of the convolution operations. I also compared these implementations with the built-in convolution function scipy.signal.convolve2d.

I took a picture of myself (read as grayscale), wrote out a 9x9 box filter, and convolved the picture with the box filter. I also applied the finite difference operators Dx and Dy.

Code Implementation Comparison

Four Loop Implementation
Four Loop Implementation
Two Loop Implementation
Two Loop Implementation

Runtime Performance Comparison

After averaging the results across 5 experiments, we can see that scipy is significantly faster due to numerical optimizations and proper vectorization. It is also worth noting that the extra cost of having to compute a patch by patch dot product slows down the four loop implementation significantly.

Implementation Average Time
Two Loops 55.9 seconds
Four Loops 581.1 seconds
Scipy 2.1 seconds

Image Processing Results

To begin apply convolutions to images, I first applied a 9x9 box filter, a finite difference operator D_x, and a finite difference operator D_y to my self portrait. The finite difference operators are defined as:

$$D_x = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}, \quad D_y = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
Original Self Portrait
Original Self Portrait
Finite Difference Dx
Finite Difference Dx
Box Filter (Low Pass)
Box Filter (Low Pass)
Finite Difference Dy
Finite Difference Dy

We can see the box filter as a simple, but effective low pass filter resulting in a blurred image due to averaging out the pixel values. We can also see the finite difference operators as simple edge detectors for horizontal and vertical edges. For proper edge detection however, we need to use a more sophisticated operator.

Part 1.2: Finite Difference Operator

Lets move on to proper edge detection. We can convolve this image of "The Cameraman" with the finite difference operators D_x and D_y. This gives us once again simple horizontal and vertical edge detection. Next, we can combine the results of each of these operators through the gradient magnitude defined below where Ix and Iy are the images resulting from the finite difference operators:

$$|\nabla I| = \sqrt{I_x^2 + I_y^2}$$

Threshold Selection: I chose a threshold that balances finding all real edges while suppressing noise. For the above images, thresholding at 3e-1 was a good balance between finding all edges and suppressing noise.

Part 1.3: Derivative of Gaussian (DoG) Filter

Even with threshold tuning, the results with just the gradient magnitude were rather noisy. Luckily, a gaussian filter can be used to smooth the image before applying the graident magnitude. Not only this but through the properties of convolution, this is equivelant to convolving with the derivative of the gaussian filter which means we only need to convolve a filter with an image once in each respective direction!

$$\text{DoG}_x = G * D_x, \quad \text{DoG}_y = G * D_y$$

We can see the results of applying these filters is a much smoother and sharper edge detection then before. Additionally the two methods result in the exact same image. We have made a pretty nice edge detector simply through convolution!

Implementation Note: I used cv2.getGaussianKernel() to create 1D Gaussian kernels and took the outer product to create 2D Gaussian filters. The DoG filters were created by convolving the Gaussian with D_x and D_y operators.

DoG Filter Components

Part 2: Fun with Frequencies!

Part 2.1: Image "Sharpening"

We can also use convolution and gaussian filters to sharpen images. Because the gaussian filter is a low pass filter that retains only low frequencies, by subtracting the blurred version of an image from an original image we get the high frequencies. Adding these with a scalar factor, alpha, to the original image allows us to artificially introduce high frequency features resulting in images looking sharper.

$$I_{\text{sharp}} = I + \alpha(I - I_{\text{blur}})$$

This combines into a single convolution operation called the unsharp mask filter.

Sharpening Results with Different Alpha Values

Original Images and High Frequency Components
Original Taj Mahal
Original Taj Mahal Image
Taj High Frequency
Taj High Frequency Component
Original Lizard
Original Lizard Image
Taj Mahal Sharpening Results
Sharpened Taj Mahal α=0.5
Sharpened Taj Mahal (α=0.5)
Sharpened Taj Mahal α=0.75
Sharpened Taj Mahal (α=0.75)
Sharpened Taj Mahal α=1.0
Sharpened Taj Mahal (α=1.0)
Sharpened Taj Mahal α=2.0
Sharpened Taj Mahal (α=2.0)
Sharpened Taj Mahal α=2.5
Sharpened Taj Mahal (α=2.5)
Lizard Sharpening Results
Sharpened Lizard α=0.5
Sharpened Lizard (α=0.5)
Sharpened Lizard α=0.75
Sharpened Lizard (α=0.75)
Sharpened Lizard α=1.0
Sharpened Lizard (α=1.0)
Sharpened Lizard α=2.0
Sharpened Lizard (α=2.0)
Sharpened Lizard α=2.5
Sharpened Lizard (α=2.5)

Note: It should be noted this technique is cheating a bit. By adding in scaled high frequency features to the image, we are not actually adding any new information that "sharpens" the image. Noneoftherefore, the effect is still noticeable and a useful enhancemnet that is very common in cheaper cameras.

Part 2.2: Hybrid Images

Hybrid images are static images that change in interpretation as a function of viewing distance. Because high frequencies tend to dominate human perception whenver available, but low frequency (smoother) parts of images appear at a distance, it is possible to construct images that are composed of two images. One seen further away, and the other seen only up close. This technique is based off the SIGGRAPH 2006 paper by Oliva, Torralba, and Schyns and can be used to create some extremely interesting results.

Hybrid Image Results

Derek Original
Derek (Low Frequency)
Nutmeg Original
Nutmeg (High Frequency)
Derek and Nutmeg Hybrid
Derek/Nutmeg
Heisenberg Original
Heisenberg (Low Frequency)
Walter Original
Walter (High Frequency)
Walter and Heisenberg Hybrid
Walter/Heisenberg
Mononoke Original
San (Low Frequency)
Eboshi Original
Lady Eboshi (High Frequency)
Mononoke and Eboshi Hybrid
San/Eboshi

Frequency Analysis

For my favorite result (San + Eboshi), I illustrate the process through frequency analysis:

Original Images and Their Frequency Components
San Original
San (Original)
San Frequency Domain
San Frequency Domain
Eboshi Original
Eboshi (Original)
Eboshi Frequency Domain
Eboshi Frequency Domain
Post-Filtered Images and Their Frequency Spectra
Post Low Pass Filter
San (Low Pass Filtered)
San Low Pass Frequency
San Low Pass Frequency
Post High Pass Filter
Eboshi (High Pass Filtered)
Eboshi High Pass Frequency
Eboshi High Pass Frequency
Cutoff Frequency Choice

For the San/Eboshi hybrid, I carefully selected the cutoff frequencies to achieve optimal blending:

  • Low Pass Filter (San): σ = 4.0 - This preserves the smooth, low-frequency features of San's face while removing fine details
  • High Pass Filter (Eboshi): σ = 3.5 - This retains the sharp, high-frequency details of Eboshi's features while removing the smooth background

The choice of these sigma values was determined through experimentation to find the optimal balance between preserving important facial features and achieving seamless blending in the final hybrid image.

Implementation Note: In order to create hybrid images, it was necessary to "align" images so San's eyes can overlay Eboshi's eyes. This results in a slight rotation as seen by the post low pass filtered image. This also means I cropped the final image so it appears more clean.

Final Hybrid Result
San/Eboshi Hybrid
San/Eboshi
San/Eboshi Frequency
San/Eboshi Frequency

Part 2.3: Gaussian and Laplacian Stacks

An exciting application of Gaussian filters and frequency analysis is that it can be combined to create seamless image blending. A gaussian stack can be built by applying a gaussian filter at different scales to the same image. A laplacian stack can then be created by subtracting each level of the gaussian stack from the next lower level. Since Gaussian filters are low pass filters, this resulting difference gives an image composed of frequencies in a particular band allowing us to progressivley blend two images together. The resulting stack from the differences is known as a Laplacian stack. Each level L_i of the stack can be described by the following:

$$L_i = G_i - G_{i+1}, \quad \text{where } G_i \text{ is the ith level of the Gaussian stack}$$

This process of image blending is described in detail in Burt & Adelson (1983).

To start, I recreated Figure 3.42 in Szelski's book to show the process of image blending using Gaussian and Laplacian stacks on the famous "Oraple" example. We can see that as we go down the stack the image becomes inceasingly blurred due to reaching lower and lower frequency bands. This is seen in the first 3 rows of the below figure. The final row are the weighted image blends of each image and the final result.

We can also visualize the Gaussian and Laplacian stacks directly for a stack with more levels. It is important to note the very last level of the stack is the image with the lowest frequencies. Additionally, I used a larger laplacian stack here to show the results more clearly.

Gaussian and Laplacian Stacks

Gaussian Stacks
Laplacian Stacks

Part 2.4: Multiresolution Blending (a.k.a. the Oraple!)

Finally, I implemented multiresolution blending described in the paper using these laplacian stacks. For each blend, I take a mask and alpha blend two images together with an increasingly blurrier mask as we go down the stack. This results in a seamless blend of two images. Mathematically, this can be described below for each level of the blended stack we reconstruct.

Mathematical Foundation of Alpha Blending

For each level i of the blend stack, the blended Laplacian level is calculated as:

$$L_i^{blend} = L_i^{(1)} \cdot M_i + L_i^{(2)} \cdot (1 - M_i)$$

where:

  • $L_i^{(1)}$ and $L_i^{(2)}$ are the i-th levels of the Laplacian stacks for images 1 and 2
  • $M_i$ is the i-th level of the Gaussian mask stack (increasingly blurred)
  • $(1 - M_i)$ represents the complementary mask

The final blended image is reconstructed by summing all blended levels: $I_{final} = \sum_{i=0}^{n} L_i^{blend}$, then normalized to [0,1] range.

Oraple - The Classic Blend

Final Oraple
The Oraple

Custom Blended Images

Additionally, I created a few custom blended images using irregular masks to demonstrate the power of multiresolution blending:

Black Hole Sun
Sun Image
Sun Image
Black Hole Image
Black Hole Image
Black Hole Mask
Blending Mask
Black Hole Sun Blend
Black Hole Sun
Night Sky Road
Night Sky Image
Night Sky Image
Road Image
Road Image
Road Mask
Blending Mask
Night Road Blend
Starry Road

Conclusion

This project provided comprehensive hands-on experience with fundamental image processing techniques, from basic spatial filtering to advanced frequency domain operations and multiresolution blending. The combination of theoretical understanding and practical implementation revealed the elegance and power of digital image processing algorithms.

The project successfully demonstrated how different filtering approaches can be used for various applications, from edge detection to image enhancement and creative compositing. The multiresolution blending technique proved particularly effective for creating seamless image composites, while hybrid images showcased the fascinating intersection of computer vision and human perception.