Project 2 - Fun With Filters and Frequencies!

Project Overview

This project explores fundamental concepts in digital image processing, focusing on spatial domain filtering, frequency domain analysis, and image blending techniques. The project is divided into two main parts: Part 1 focuses on building intuitions about 2D convolutions and filtering, while Part 2 explores frequency domain processing, hybrid images, and multiresolution blending.

                    Key Learning Objectives: Understanding 2D convolutions, implementing filters from scratch, exploring the relationship between spatial and frequency domains, and creating seamless image composites.
                

Part 1: Fun with Filters

In this part, we build intuitions about 2D convolutions and filtering, beginning with the humble finite difference filters in the x and y directions. These allow us to detect vertical and horizontal edges respectively.

Part 1.1: Convolutions from Scratch!

First as a recap to convolution, I implemented an implementation with two four loops and four for loops to compare and contrast the runtime performance of the convolution operations. I also compared these implementations with the built-in convolution function scipy.signal.convolve2d.

I took a picture of myself (read as grayscale), wrote out a 9x9 box filter, and convolved the picture with the box filter. I also applied the finite difference operators Dx and Dy.

Code Implementation Comparison

Four Loop Implementation

Two Loop Implementation

Runtime Performance Comparison

After averaging the results across 5 experiments, we can see that scipy is significantly faster due to numerical optimizations and proper vectorization. It is also worth noting that the extra cost of having to compute a patch by patch dot product slows down the four loop implementation significantly.

Implementation	Average Time
Two Loops	55.9 seconds
Four Loops	581.1 seconds
Scipy	2.1 seconds

Image Processing Results

To begin apply convolutions to images, I first applied a 9x9 box filter, a finite difference operator D_x, and a finite difference operator D_y to my self portrait. The finite difference operators are defined as:

$$D_x = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}, \quad D_y = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$

Original Self Portrait

Finite Difference Dx

Box Filter (Low Pass)

Finite Difference Dy

We can see the box filter as a simple, but effective low pass filter resulting in a blurred image due to averaging out the pixel values. We can also see the finite difference operators as simple edge detectors for horizontal and vertical edges. For proper edge detection however, we need to use a more sophisticated operator.

Part 1.2: Finite Difference Operator

Lets move on to proper edge detection. We can convolve this image of "The Cameraman" with the finite difference operators D_x and D_y. This gives us once again simple horizontal and vertical edge detection. Next, we can combine the results of each of these operators through the gradient magnitude defined below where Ix and Iy are the images resulting from the finite difference operators:

$$|\nabla I| = \sqrt{I_x^2 + I_y^2}$$

Partial Derivative in X (D_x)

Partial Derivative in Y (D_y)

Gradient Magnitude

Binarized Edge Image

Threshold Selection: I chose a threshold that balances finding all real edges while suppressing noise. For the above images, thresholding at 3e-1 was a good balance between finding all edges and suppressing noise.

Part 1.3: Derivative of Gaussian (DoG) Filter

Even with threshold tuning, the results with just the gradient magnitude were rather noisy. Luckily, a gaussian filter can be used to smooth the image before applying the graident magnitude. Not only this but through the properties of convolution, this is equivelant to convolving with the derivative of the gaussian filter which means we only need to convolve a filter with an image once in each respective direction!

$$\text{DoG}_x = G * D_x, \quad \text{DoG}_y = G * D_y$$

We can see the results of applying these filters is a much smoother and sharper edge detection then before. Additionally the two methods result in the exact same image. We have made a pretty nice edge detector simply through convolution!

Edge Detection with Gaussian Smoothing

Derivative of Gaussian (DoG) Edge Detection

                    Implementation Note: I used cv2.getGaussianKernel() to create 1D Gaussian kernels and took the outer product to create 2D Gaussian filters. The DoG filters were created by convolving the Gaussian with D_x and D_y operators.
                

DoG Filter Components

DoG_x Filter

DoG_y Filter

Part 2: Fun with Frequencies!

Part 2.1: Image "Sharpening"

We can also use convolution and gaussian filters to sharpen images. Because the gaussian filter is a low pass filter that retains only low frequencies, by subtracting the blurred version of an image from an original image we get the high frequencies. Adding these with a scalar factor, alpha, to the original image allows us to artificially introduce high frequency features resulting in images looking sharper.

$$I_{\text{sharp}} = I + \alpha(I - I_{\text{blur}})$$

This combines into a single convolution operation called the unsharp mask filter.

Sharpening Results with Different Alpha Values

Original Images and High Frequency Components

Original Taj Mahal Image

Taj High Frequency Component

Original Lizard Image

Taj Mahal Sharpening Results

Sharpened Taj Mahal (α=0.5)

Sharpened Taj Mahal (α=0.75)

Sharpened Taj Mahal (α=1.0)

Sharpened Taj Mahal (α=2.0)

Sharpened Taj Mahal (α=2.5)

Lizard Sharpening Results

Sharpened Lizard (α=0.5)

Sharpened Lizard (α=0.75)

Sharpened Lizard (α=1.0)

Sharpened Lizard (α=2.0)

Sharpened Lizard (α=2.5)

Note: It should be noted this technique is cheating a bit. By adding in scaled high frequency features to the image, we are not actually adding any new information that "sharpens" the image. Noneoftherefore, the effect is still noticeable and a useful enhancemnet that is very common in cheaper cameras.

Part 2.2: Hybrid Images

Hybrid images are static images that change in interpretation as a function of viewing distance. Because high frequencies tend to dominate human perception whenver available, but low frequency (smoother) parts of images appear at a distance, it is possible to construct images that are composed of two images. One seen further away, and the other seen only up close. This technique is based off the SIGGRAPH 2006 paper by Oliva, Torralba, and Schyns and can be used to create some extremely interesting results.

Hybrid Image Results

Derek (Low Frequency)

Nutmeg (High Frequency)

Derek/Nutmeg

Heisenberg (Low Frequency)

Walter (High Frequency)

Walter/Heisenberg

San (Low Frequency)

Lady Eboshi (High Frequency)

San/Eboshi

Frequency Analysis

For my favorite result (San + Eboshi), I illustrate the process through frequency analysis:

Original Images and Their Frequency Components

San (Original)

San Frequency Domain

Eboshi (Original)

Eboshi Frequency Domain

Post-Filtered Images and Their Frequency Spectra

San (Low Pass Filtered)

San Low Pass Frequency

Eboshi (High Pass Filtered)

Eboshi High Pass Frequency

Cutoff Frequency Choice

For the San/Eboshi hybrid, I carefully selected the cutoff frequencies to achieve optimal blending:

Low Pass Filter (San): σ = 4.0 - This preserves the smooth, low-frequency features of San's face while removing fine details
High Pass Filter (Eboshi): σ = 3.5 - This retains the sharp, high-frequency details of Eboshi's features while removing the smooth background

The choice of these sigma values was determined through experimentation to find the optimal balance between preserving important facial features and achieving seamless blending in the final hybrid image.

Implementation Note: In order to create hybrid images, it was necessary to "align" images so San's eyes can overlay Eboshi's eyes. This results in a slight rotation as seen by the post low pass filtered image. This also means I cropped the final image so it appears more clean.

Final Hybrid Result

San/Eboshi

San/Eboshi Frequency

Part 2.3: Gaussian and Laplacian Stacks

An exciting application of Gaussian filters and frequency analysis is that it can be combined to create seamless image blending. A gaussian stack can be built by applying a gaussian filter at different scales to the same image. A laplacian stack can then be created by subtracting each level of the gaussian stack from the next lower level. Since Gaussian filters are low pass filters, this resulting difference gives an image composed of frequencies in a particular band allowing us to progressivley blend two images together. The resulting stack from the differences is known as a Laplacian stack. Each level L_i of the stack can be described by the following:

$$L_i = G_i - G_{i+1}, \quad \text{where } G_i \text{ is the ith level of the Gaussian stack}$$

This process of image blending is described in detail in Burt & Adelson (1983).

To start, I recreated Figure 3.42 in Szelski's book to show the process of image blending using Gaussian and Laplacian stacks on the famous "Oraple" example. We can see that as we go down the stack the image becomes inceasingly blurred due to reaching lower and lower frequency bands. This is seen in the first 3 rows of the below figure. The final row are the weighted image blends of each image and the final result.

Figure 3.42

We can also visualize the Gaussian and Laplacian stacks directly for a stack with more levels. It is important to note the very last level of the stack is the image with the lowest frequencies. Additionally, I used a larger laplacian stack here to show the results more clearly.

Gaussian and Laplacian Stacks

Gaussian Stacks

Orange Gaussian Stack

Apple Gaussian Stack

Laplacian Stacks

Orange Laplacian Stack

Apple Laplacian Stack

Part 2.4: Multiresolution Blending (a.k.a. the Oraple!)

Finally, I implemented multiresolution blending described in the paper using these laplacian stacks. For each blend, I take a mask and alpha blend two images together with an increasingly blurrier mask as we go down the stack. This results in a seamless blend of two images. Mathematically, this can be described below for each level of the blended stack we reconstruct.

Mathematical Foundation of Alpha Blending

For each level i of the blend stack, the blended Laplacian level is calculated as:

$$L_i^{blend} = L_i^{(1)} \cdot M_i + L_i^{(2)} \cdot (1 - M_i)$$

where:

$L_i^{(1)}$ and $L_i^{(2)}$ are the i-th levels of the Laplacian stacks for images 1 and 2
$M_i$ is the i-th level of the Gaussian mask stack (increasingly blurred)
$(1 - M_i)$ represents the complementary mask

The final blended image is reconstructed by summing all blended levels: $I_{final} = \sum_{i=0}^{n} L_i^{blend}$, then normalized to [0,1] range.

Oraple - The Classic Blend

The Oraple

Custom Blended Images

Additionally, I created a few custom blended images using irregular masks to demonstrate the power of multiresolution blending:

Black Hole Sun

Sun Image

Black Hole Image

Blending Mask

→

Black Hole Sun

Night Sky Road

Night Sky Image

Road Image

Blending Mask

→

Starry Road

Conclusion

This project provided comprehensive hands-on experience with fundamental image processing techniques, from basic spatial filtering to advanced frequency domain operations and multiresolution blending. The combination of theoretical understanding and practical implementation revealed the elegance and power of digital image processing algorithms.

The project successfully demonstrated how different filtering approaches can be used for various applications, from edge detection to image enhancement and creative compositing. The multiresolution blending technique proved particularly effective for creating seamless image composites, while hybrid images showcased the fascinating intersection of computer vision and human perception.