NumPy-like syntax for Halide with Numlide
When introducing others to Halide in Python, there is one piece of feedback I often get: while it is nice to use Halide from Python instead of C++, the syntax is unfamiliar.
Developers with this complaint are typically used to the syntax of NumPy, which is quite a bit different from Halide.
A simple sum - where NumPy is slightly more readable
Take for instance this snippet of code, adding together two arrays in NumPy:
import numpy as np
image_a = np.random.randn(640, 480)
image_b = np.random.randn(640, 480)
summed = image_a + image_b
In Halide, this becomes:
import halide as hl
import numpy as np
image_a = np.random.randn(640, 480)
buffer_a = hl.Buffer(image_a)
image_b = np.random.randn(640, 480)
buffer_b = hl.Buffer(image_b)
x, y = hl.Var("x"), hl.Var("y")
summed = hl.Func("summed")
summed[x, y] = (
    buffer_a[x, y] + buffer_b[x, y]
)
result = summed.realize([buffer_a.width(), buffer_a.height()])
Reductions - where NumPy is way more readable
Calculating the mean value of an image is pretty straightforward in NumPy:
average = np.mean(image)
Compare that to the following in Halide:
rdom = hl.RDom([(0, image.width()), (0, image.height())])
average = hl.Func("average")
average[()] = 0.0
average[()] += image[rdom.x, rdom.y]
average[()] /= image.width() * image.height()
To be fair, it is possible to create abstractions similar to Halide’s inline reductions such as hl.maximum(..) to simplify this to something like:
rdom = hl.RDom([(0, image.width()), (0, image.height())])
average = hl.Func("average")
average[()] = mean(image[rdom.x, rdom.y])  # mean() is a hypothetical helper, not part of Halide
But this simplification still requires an explicit RDom, and the width and height have to be passed in, either explicitly or via the input ImageParam as we do here.
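To make the reduction in the Halide average Func explicit: the pure definition initializes the value, the first update runs once for every point of the RDom, and the second update divides by the pixel count. A rough plain-Python equivalent is sketched below. It ignores scheduling entirely and assumes the image is a NumPy array indexed as image[x, y]; the function name is hypothetical.

```python
import numpy as np


def mean_as_loops(image):
    """Hypothetical loop-nest equivalent of the Halide 'average' Func."""
    width, height = image.shape  # assumed indexing: image[x, y]
    average = 0.0                # pure definition: average[()] = 0.0
    # First update: visit every point of the RDom (rdom.x is the innermost loop)
    for y in range(height):
        for x in range(width):
            average += image[x, y]
    # Second update: divide by the number of pixels
    average /= width * height
    return average


image = np.random.default_rng(0).standard_normal((64, 48))
print(mean_as_loops(image), np.mean(image))
```

The two printed values should agree up to floating-point rounding, since both sum every pixel and divide by the pixel count.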
Convolutions - where Halide is more readable
NumPy is not always a clear winner in terms of syntax. One of the cases where Halide shines is when combining multiple images with offset indices.
Take for instance this simple blur in NumPy, which averages each pixel with its four direct neighbors:
import numpy as np
image = np.random.randn(640, 480)
filtered = (
    image[1:-1, 1:-1] +
    image[2:, 1:-1] +
    image[:-2, 1:-1] +
    image[1:-1, 2:] +
    image[1:-1, :-2]
) / 5
In Halide, this is much more readable in my opinion:
import halide as hl
import numpy as np
image = np.random.randn(640, 480)
buffer = hl.Buffer(image)
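x, y = hl.Var("x"), hl.Var("y")
# Clamp reads outside the buffer to the nearest edge pixel, so x - 1 and x + 1 stay valid at the borders
# (the NumPy version above instead drops the one-pixel border)
clamped = hl.BoundaryConditions.repeat_edge(buffer)
filtered = hl.Func("filtered")
filtered[x, y] = (
    clamped[x, y] +
    clamped[x + 1, y] +
    clamped[x - 1, y] +
    clamped[x, y + 1] +
    clamped[x, y - 1]
) / 5
result = filtered.realize([buffer.width(), buffer.height()])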
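The correspondence between the NumPy slices and the Halide-style offsets can be checked with a plain-Python loop. This sketch (the function name is hypothetical) writes the filter with explicit x + 1, x - 1, y + 1 and y - 1 offsets over the interior pixels only, which is exactly the region the sliced NumPy version computes:

```python
import numpy as np


def cross_filter_loops(image):
    """Hypothetical loop version of the five-point blur, interior pixels only."""
    w, h = image.shape
    out = np.empty((w - 2, h - 2))
    for x in range(1, w - 1):
        for y in range(1, h - 1):
            out[x - 1, y - 1] = (
                image[x, y] +       # matches image[1:-1, 1:-1]
                image[x + 1, y] +   # matches image[2:, 1:-1]
                image[x - 1, y] +   # matches image[:-2, 1:-1]
                image[x, y + 1] +   # matches image[1:-1, 2:]
                image[x, y - 1]     # matches image[1:-1, :-2]
            ) / 5
    return out


image = np.random.default_rng(0).standard_normal((32, 24))
sliced = (
    image[1:-1, 1:-1] + image[2:, 1:-1] + image[:-2, 1:-1] +
    image[1:-1, 2:] + image[1:-1, :-2]
) / 5
```

Each slice is just the interior window shifted by one pixel, which is why the Halide version can express the same thing with plain index arithmetic.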
Halide with NumPy syntax
I grew curious about what Halide would look like if it used the same syntax as NumPy. So I made Numlide: a Halide wrapper with NumPy-like syntax.
Here is the mean example from above in Numlide:
average = nl.mean(image)
Yes, that is basically the same syntax as NumPy.
It is very unlikely that I will get around to implementing every function NumPy has to offer, but Numlide could still be a useful wrapper, for instance when introducing people to Halide or when quickly porting algorithms from NumPy.
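To illustrate the general idea behind a NumPy-like front end for Halide (this is a toy, not how Numlide is implemented): each array can be represented like a Halide Func, as a rule that computes the value at an index, and operators such as + only compose these rules, computing nothing until the result is realized. All class and function names below are hypothetical.

```python
import numpy as np


class LazyArray:
    """Toy stand-in for a Halide Func: a per-index rule, evaluated on demand."""

    def __init__(self, fn, shape):
        self.fn = fn        # maps an index tuple to a value
        self.shape = shape

    @staticmethod
    def from_numpy(array):
        return LazyArray(lambda idx: array[idx], array.shape)

    def __add__(self, other):
        # Builds a new rule; nothing is computed yet
        return LazyArray(lambda idx: self.fn(idx) + other.fn(idx), self.shape)

    def __truediv__(self, scalar):
        return LazyArray(lambda idx: self.fn(idx) / scalar, self.shape)

    def realize(self):
        # Evaluate the composed rule at every index, loosely like Func.realize
        out = np.empty(self.shape)
        for idx in np.ndindex(*self.shape):
            out[idx] = self.fn(idx)
        return out


def mean(a):
    """Toy reduction over every index: a sum divided by the element count."""
    total = sum(a.fn(idx) for idx in np.ndindex(*a.shape))
    return total / np.prod(a.shape)


a = LazyArray.from_numpy(np.arange(6.0).reshape(2, 3))
b = LazyArray.from_numpy(np.ones((2, 3)))
print(((a + b) / 2).realize())
print(mean(a))
```

In a Halide-backed wrapper, these rules would presumably become hl.Func definitions that Halide compiles and schedules, rather than being evaluated element by element in Python.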
Performance
How does it perform? Well, it depends.
Take, for instance, the mean function applied to a 2D array with 5 million elements. Measured with pytest-benchmark, the result is clearly in NumPy’s favor:
---------------- benchmark: 2 tests ---------------
Name (time in ms)             Mean
---------------------------------------------------
test_mean_numpy               3.9120 (1.0)
test_mean_numlide            39.9363 (10.21)
---------------------------------------------------
However, if we implement something with more per-pixel work and no reductions over the whole image, the tables turn.
For instance, here is a simplified algorithm for Gray code decoding in structured light imaging:
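def _structured_light(m, images):
    # m is the array module (e.g. numpy); images is an (H, W, patterns) stack
    patterns = images.shape[-1]
    # Per-pixel min and max over all patterns give a per-pixel threshold
    minimum = m.min(images, axis=2)
    maximum = m.max(images, axis=2)
    threshold = (minimum + maximum) / 2
    # A pixel is "on" in a pattern if it is brighter than its threshold
    binary_code = threshold[:, :, m.newaxis] < images
    # Weight pattern i by 2 ** (patterns - 1 - i) and sum to get the decimal code
    decimal_code = m.sum(2 ** (patterns - m.arange(0, patterns) - 1)[m.newaxis, m.newaxis, :] * binary_code, axis=2)
    return decimal_code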
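To try the decoder, one needs an (H, W, patterns) stack of images. The sketch below, with a hypothetical helper name, synthesizes such a stack from known per-pixel codes. It assumes image i carries bit (patterns - 1 - i) of the code, which matches the weights in the decoder, and it renders plain binary patterns rather than real camera captures of Gray code patterns.

```python
import numpy as np


def synthesize_patterns(codes, patterns, low=0.2, high=0.8):
    """Hypothetical helper: build an (H, W, patterns) stack in which image i is
    bright wherever bit (patterns - 1 - i) of the per-pixel code is set."""
    shifts = patterns - np.arange(patterns) - 1        # most significant bit first
    bits = (codes[:, :, np.newaxis] >> shifts) & 1      # shape (H, W, patterns), 0 or 1
    return np.where(bits == 1, high, low)               # dark pixels = low, bright = high


codes = np.array([[0, 5], [6, 7]])
images = synthesize_patterns(codes, patterns=3)
print(images.shape)
```

Passing these images to _structured_light(np, images) should give back 5 and 6 for the two pixels whose intensity changes across the patterns. Pixels that look the same in every pattern (here codes 0 and 7) decode to 0, because their threshold equals their intensity and the strict < comparison never fires.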
For this algorithm, the results are clearly in favor of Halide and Numlide:
---------------- benchmark: 2 tests ----------------
Name (time in ms)                    Mean
----------------------------------------------------
test_structured_light_numlide      1.8369 (1.0)
test_structured_light_numpy      146.3858 (79.69)
----------------------------------------------------
Where is this going?
Making Numlide was just a fun exercise: I wanted to see whether NumPy-like syntax for Halide code was possible at all. But given the promising results, I think it would be fun to implement a few more algorithms and put Numlide to the test.
After all, Numlide could be a nice drop-in replacement for NumPy-based algorithms, with the potential for a significant speedup. Alternatively, Numlide could be used to speed up development of Halide code.
Before Numlide becomes useful for anything but an experiment, I will need to add quite a few more operations and functions. And before releasing this to PyPI, a proper Halide package should be released there first. Currently, there is only a somewhat outdated package on test.pypi.org, which is not ideal to rely on.
I also need to think a bit about how Numlide should expose a nice API for scheduling. I am currently making some assumptions about which schedule you want when you, for instance, calculate the mean value of an array. I already see some potential for performance improvements, but there is obviously no schedule that fits every case. The user will definitely need some way to disable my scheduling and either roll their own or call an autoscheduler.
Time will tell.