NumPy-like syntax for Halide with Numlide

When introducing others to Halide in Python, one piece of feedback comes up again and again: while it is nice to use Halide from Python instead of C++, the syntax is not familiar.

Developers with this complaint are typically used to the syntax of NumPy, which is quite a bit different from Halide.

A simple sum - where NumPy is slightly more readable

Take for instance this snippet of code, adding together two arrays in NumPy:

import numpy as np

image_a = np.random.randn(640, 480)
image_b = np.random.randn(640, 480)

summed = image_a + image_b

In Halide, this becomes:

import halide as hl
import numpy as np

image_a = np.random.randn(640, 480)
buffer_a = hl.Buffer(image_a)
image_b = np.random.randn(640, 480)
buffer_b = hl.Buffer(image_b)

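# Pure variables that index the output.
x, y = hl.Var("x"), hl.Var("y")

summed = hl.Func("summed")
summed[x, y] = (
    buffer_a[x, y] + buffer_b[x, y]
)

# Realize over the buffer's own extents.
summed.realize([buffer_a.width(), buffer_a.height()])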

Reductions - where NumPy is way more readable

Calculating the mean value of an image is pretty straightforward in NumPy:

average = np.mean(image)

Compare that to the following in Halide:

rdom = hl.RDom([(0, image.width()), (0, image.height())])

average = hl.Func("average")
average[()] = 0.0
average[()] += image[rdom.x, rdom.y]
average[()] /= image.width() * image.height()

To be fair, it is possible to create abstractions similar to Halide’s hl.maximum(..) to simplify this to something like:

rdom = hl.RDom([(0, image.width()), (0, image.height())])

average = hl.Func("average")
average[()] = mean(image[rdom.x, rdom.y])  # mean(): hypothetical helper, not part of Halide

But this simplification still requires an explicit RDom, and the width and height must either be passed in explicitly or be taken from the input ImageParam, as done here.
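For readers who think in loops rather than reduction domains, the three stages of the Halide definition above (initialize, accumulate over the RDom, divide) correspond roughly to the plain Python below. This is only a sketch of the semantics, not of how Halide executes or schedules the reduction. The function name `mean_like_rdom` is made up for illustration, and the NumPy indexing order (row first) is an assumption of this sketch.

```python
import numpy as np


def mean_like_rdom(image):
    """Mimic the three Halide stages: init, accumulate over the RDom, divide."""
    height, width = image.shape       # NumPy order: rows (y), then columns (x)
    average = 0.0                     # average[()] = 0.0
    for ry in range(height):          # rdom.y spans [0, height)
        for rx in range(width):       # rdom.x spans [0, width)
            average += image[ry, rx]  # average[()] += image[rdom.x, rdom.y]
    average /= width * height         # average[()] /= width * height
    return average


image = np.random.randn(48, 64)
matches_numpy = np.isclose(mean_like_rdom(image), np.mean(image))
print(matches_numpy)
```

The explicit loop bounds are exactly what the RDom has to spell out in Halide, and what `np.mean` hides.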

Convolutions - where Halide is more readable

NumPy is not always a clear winner in terms of syntax. One of the cases where Halide shines is when combining multiple images with offset indices.

Take for instance this box filter in NumPy:

import numpy as np

image = np.random.randn(640, 480)

filtered = (
    image[1:-1, 1:-1] +
    image[2:, 1:-1] +
    image[:-2, 1:-1] +
    image[1:-1, 2:] +
    image[1:-1, :-2]
) / 5
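Part of what makes the NumPy version harder to read is that each slice silently encodes an offset: `[1:-1]` is the centre, `[2:]` is +1 and `[:-2]` is -1. The sketch below checks that mapping for a single pixel. The pixel coordinates `i` and `j` are arbitrary values picked for illustration.

```python
import numpy as np

image = np.random.randn(640, 480)

filtered = (
    image[1:-1, 1:-1] + image[2:, 1:-1] + image[:-2, 1:-1]
    + image[1:-1, 2:] + image[1:-1, :-2]
) / 5

# Slice -> offset relative to the centre pixel (i, j):
#   [1:-1] -> 0,  [2:] -> +1,  [:-2] -> -1
i, j = 10, 20  # arbitrary interior pixel of the original image
expected = (
    image[i, j] + image[i + 1, j] + image[i - 1, j]
    + image[i, j + 1] + image[i, j - 1]
) / 5

# The result is cropped by one pixel per side, so (i, j) lands at (i - 1, j - 1).
assert np.isclose(filtered[i - 1, j - 1], expected)
print(filtered.shape)  # (638, 478)
```

Note that the sliced result is smaller than the input, because the border pixels have no complete neighbourhood.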

In Halide, this is much more readable in my opinion:

import halide as hl
import numpy as np

image = np.random.randn(640, 480)
buffer = hl.Buffer(image)

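# Pure variables that index the output.
x, y = hl.Var("x"), hl.Var("y")

# Clamp out-of-range reads to the nearest edge so the border can be computed too
# (the NumPy version crops the border instead).
clamped = hl.BoundaryConditions.repeat_edge(buffer)

filtered = hl.Func("filtered")
filtered[x, y] = (
    clamped[x, y] +
    clamped[x + 1, y] +
    clamped[x - 1, y] +
    clamped[x, y + 1] +
    clamped[x, y - 1]
) / 5

filtered.realize([buffer.width(), buffer.height()])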

Halide with NumPy syntax

I started to grow curious about what Halide would look like if it used the same syntax as NumPy. So I made Numlide: a Halide-wrapper with NumPy-like syntax.

Here is the mean example from above in Numlide:

average = nl.mean(image)

Yes, that is basically the same syntax as NumPy.

It is very unlikely that I will get around to implementing every function NumPy has to offer, but Numlide could still be a useful wrapper, for instance when introducing people to Halide or when quickly porting algorithms from NumPy.

Performance

How does it perform? Well, it depends.

Take, for instance, the mean of a 2D array with 5 million elements. Measured with pytest-benchmark, the result is clearly in NumPy’s favor:

---------------- benchmark: 2 tests ---------------
Name (time in ms)        Mean
---------------------------------------------------
test_mean_numpy        3.9120 (1.0)
test_mean_numlide     39.9363 (10.21)
---------------------------------------------------

However, if we implement something with more per-pixel operations and no reductions, the tables turn.

For instance, here is a simplified algorithm for Gray code decoding in structured light imaging:

def _structured_light(m, images):
    patterns = images.shape[-1]
    minimum = m.min(images, axis=2)
    maximum = m.max(images, axis=2)
    threshold = (minimum + maximum) / 2
    binary_code = threshold[:, :, m.newaxis] < images
    decimal_code = m.sum(2 ** (patterns - m.arange(0, patterns) - 1)[m.newaxis, m.newaxis, :] * binary_code, axis=2)
    return decimal_code

Then the results are clearly in favor of Halide and Numlide:

---------------- benchmark: 2 tests ----------------
Name (time in ms)                     Mean
----------------------------------------------------
test_structured_light_numlide       1.8369 (1.0)
test_structured_light_numpy       146.3858 (79.69)
----------------------------------------------------
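To make the decoding function above easier to follow, here is a small sketch that runs it with NumPy passed in as `m` on synthetic data. The function body is the same logic, only split across one more line. The test data (a 2 x 3 image, three patterns, dark and bright intensities of 20 and 200) is made up for illustration and is not the benchmark input.

```python
import numpy as np


def _structured_light(m, images):
    # Same logic as in the post; `m` is the array module (NumPy here).
    patterns = images.shape[-1]
    minimum = m.min(images, axis=2)
    maximum = m.max(images, axis=2)
    threshold = (minimum + maximum) / 2
    binary_code = threshold[:, :, m.newaxis] < images
    weights = 2 ** (patterns - m.arange(0, patterns) - 1)
    return m.sum(weights[m.newaxis, m.newaxis, :] * binary_code, axis=2)


# Each pixel stores a known 3-bit code. The codes are chosen so that every
# pixel sees at least one dark and one bright pattern: a pixel that is bright
# in all patterns has min == max, and the strict comparison decodes it as 0.
codes = np.array([[1, 2, 3], [4, 5, 6]])
patterns = 3
shifts = patterns - 1 - np.arange(patterns)     # most significant bit first
bits = (codes[:, :, np.newaxis] >> shifts) & 1  # shape (2, 3, 3)
images = 20.0 + 180.0 * bits                    # dark = 20, bright = 200

decoded = _structured_light(np, images)
print(np.array_equal(decoded, codes))
```

Because the array module is a parameter, the same function body can be handed either NumPy or a NumPy-like wrapper, which is what makes a side-by-side benchmark like this possible.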

Where is this going?

Making Numlide was mostly a fun exercise. I wanted to see whether NumPy-like syntax for Halide code was possible at all. But given the promising results, I think it would be fun to implement a few more algorithms and put Numlide to the test.

After all, Numlide could be a nice drop-in replacement for any NumPy-based algorithm, with the potential for a significant speedup. Alternatively, Numlide could be used to speed up development of Halide code.

Before Numlide becomes useful as anything but an experiment, I will need to add quite a few more operations and functions. And before I release it on PyPI, a proper Halide package should be released there first. Currently, there is only a somewhat outdated package on test.pypi.org, which is not ideal to rely on.

I also need to think a bit about how Numlide should expose a nice API for scheduling. I am currently making some assumptions about which schedule you want when you, for instance, calculate the mean value of an array. I already see some potential for performance improvements, but there is obviously no schedule that fits every case. The user will definitely need some way to disable my scheduling and roll their own, or to call an autoscheduler.

Time will tell.