Some time ago I wrote a simple ray tracer in Go. It’s able to generate 3D scenes by simulating a ray of light. I did it by following the book The Ray Tracer Challenge by Jamis Buck. I didn’t worry about performance so much at that moment. Now I’d like to see if I can make this raytracer faster.

I have a test that I can run to generate a sample image. Let’s run it with the flag to generate cpu profile.

Executing

go test samples/refraction_sample_test.go -cpuprofile=cpu.prof

produces an image like this sample image

That took 206 seconds :(. Let’s explore generated profile.

go tool pprof -http=":8081" samples.test cpu.prof

profile001

Following the red line we can see that a lot of time spent in ColorAt func. ShadeHit, IsShadowed, RefractedColor and ReflectedColor functions are not so interesting because to do their job they invoke ColorAt. The next big red box is Matrix4.Inverse method.

func (m *Matrix4) Inverse() *Matrix4 {
	determinant := m.determinant()
	inverse := &Matrix4{}
	for i := 0; i < L4; i++ {
		for j := 0; j < L4; j++ {
			inverse[j][i] = cofactor4x4(m, i, j) / determinant
		}
	}
	return inverse
}

Matrix4.Inverse can be invoked multiple times on the same receiver. The method is expensive let’s try to cache its result. Matrix4 type defined as type Matrix4 [L4][L4]float64, which means there is no place to store inverse matrix. Maybe it’s time to convert Matrix4 to a struct. For now let’s create a global cache.

var cache = make(map[*Matrix4]*Matrix4)

func (m *Matrix4) Inverse() *Matrix4 {
	if cached, ok := cache[m]; ok {
		return cached
	}
	determinant := m.determinant()
	inverse := &Matrix4{}
	for i := 0; i < L4; i++ {
		for j := 0; j < L4; j++ {
			inverse[j][i] = cofactor4x4(m, i, j) / determinant
		}
	}
	cache[m] = inverse
	return inverse
}

Let’s profile again.

profile002

Now the test is executed in 15s. It’s 10x faster. Matrix4.Inverse is not a performance bottleneck anymore.

Of course a global cache presents its own problems. It’s OK-ish to have it for a command line application that render single image and exists. But if we decide to use rendering in a web service we will have to worry about the size of the cache. This naive fix may also be a problem if we decide to render scenes concurrently. I’ll have to revisit this solution later.