Digging into OpenMVS fundamentals

I like going down rabbitholes! That’s where I learn the most

Context

I have some toys & souvenirs I want to 3D scan - for example, this frog I actually made myself in a ceramics studio

frog-images — Images I'm using for reconstructing my frog

I’m using Colmap to estimate all parameters of the photos & get a rough estimate of the pointcloud, but then I need an MVS program to densify the pointcloud

OpenMVS is a good one, but it requires strictly CUDA for executing Pointcloud Densification - and I want to do everything locally on my Mac M3 (no CUDA 😔)

Turns out there’s such a difference in the execution model for Metal vs CUDA, that it’s not just “rewriting a CUDA kernel to Metal”, it would require a whole reengineering of the approach. Well at least that’s what Claude told me

So I thought… do I really trust Claude? I decided I wanted to have a deeper understanding of how OpenMVS works (and then how programming for Apple Silicon looks) to decide for myself whether it’s really that hard of a task

Diving in

OpenMVS

Went ahead to the OpenMVS paper — first thing I realized — I need a solid grasp on PatchMatch, so that’s where I went

(Any other progress on reading OpenMVS is outside of the scope of this post lol)

PatchMatch

It wasn’t at all what I expected! I thought the original PatchMatch was also intended for MVS, but turned out it was for image editing - specifically tools like Content-aware Fill!

content-aware fill — Example of Content-Aware fill. Works funky, but great in some cases!

PatchMatch proposed a very efficient way to find matches between patches (haha) which wasn’t bruteforce. Though it was pretty simple - it assumed both patches were ailgned and we could match them pixel-by-pixel with just a single offset value

The algorithm is interesting - I feel like we can draw a parallel to RANSAC - we’re replacing bruteforce searching with some random sampling + fit metric, in PatchMatch we also have very efficient propagation

patch match process — Example of applying patch match to two images. It tries to reconstruct first image simply by offsetting pixels from the second one

But the efficient algorithm is the main thing that PatchMatch did, and it wasn’t actually the paper that proposed using these matches for image editing - it was Summarizing visual data using bidirectional similarity

Bidirectional similarity

Core idea - it defined a “image similarity” function which evaluated whether the contents of two images are identical - by compared patches at varying scales.

I.e. if an image has all the same objects but just in a different arrangement, the similarity is satisfied

bidirectional similarity examples — Examples showing what completeness and coherence mean here

Then we can do stuff like:

Image summarization - generate an image that is 3x smaller but contains all the same content

Showimg image summarization - we downscale by 0.95, then optimize for image similarity

Photo merging - we just concatenate images and then optimize for similarity to all input images - then the discontinuities become some kind of merging

Showimg image merging - we concatenate images, then optimize for similarity to all input images. Discontinuous seams are replaced with something ok

Image synthesis - we stretch the image & fill in the remaining things optimizing for coherence

I was reeeeally surprising that such simple formulas lead to such good basic image processing tools. Ofc this is very slow, and also performs terrible on harder cases, but this is very interesting!!! Pre-AI image fill

And in its algorithm results in compared with Seam Carving which turned out to be very interesting!

Comparing to Seam Carving

Seam Carving

I was amazed at the algorithm!!! It’s so stunningly simple yet so beautiful in how it works

The idea is so simple I didn’t even have to look at the paper after I saw the animation of how it works. And there’s this website that lets you try it out: https://www.aryan.app/seam-carving/

Seam carving my cat