Profiling Go Raytracer
Some time ago I wrote a simple ray tracer in Go. It’s able to generate 3D scenes by simulating a ray of light. I did it by following the book The Ray Tracer Challenge by Jamis Buck. I didn’t worry about performance so much at that moment. Now I’d like to see if I can make this raytracer faster.
I have a test that I can run to generate a sample image. Let’s run it with the flag to generate cpu profile.
Executing
go test samples/refraction_sample_test.go -cpuprofile=cpu.prof
produces an image like this
That took 206 seconds :(. Let’s explore generated profile.
go tool pprof -http=":8081" samples.test cpu.prof
Following the red line we can see that a lot of time spent in ColorAt
func. ShadeHit
, IsShadowed
, RefractedColor
and ReflectedColor
functions are not so interesting because to do their job they invoke ColorAt
. The next big red box is Matrix4.Inverse
method.
func (m *Matrix4) Inverse() *Matrix4 {
determinant := m.determinant()
inverse := &Matrix4{}
for i := 0; i < L4; i++ {
for j := 0; j < L4; j++ {
inverse[j][i] = cofactor4x4(m, i, j) / determinant
}
}
return inverse
}
Matrix4.Inverse
can be invoked multiple times on the same receiver. The method is expensive let’s try to cache its result. Matrix4
type defined as type Matrix4 [L4][L4]float64
, which means there is no place to store inverse matrix. Maybe it’s time to convert Matrix4
to a struct. For now let’s create a global cache.
var cache = make(map[*Matrix4]*Matrix4)
func (m *Matrix4) Inverse() *Matrix4 {
if cached, ok := cache[m]; ok {
return cached
}
determinant := m.determinant()
inverse := &Matrix4{}
for i := 0; i < L4; i++ {
for j := 0; j < L4; j++ {
inverse[j][i] = cofactor4x4(m, i, j) / determinant
}
}
cache[m] = inverse
return inverse
}
Let’s profile again.
Now the test is executed in 15s. It’s 10x faster. Matrix4.Inverse
is not a performance bottleneck anymore.
Of course a global cache presents its own problems. It’s OK-ish to have it for a command line application that render single image and exists. But if we decide to use rendering in a web service we will have to worry about the size of the cache. This naive fix may also be a problem if we decide to render scenes concurrently. I’ll have to revisit this solution later.