Notes on Metal compute shaders. I'll keep adding to these as I go.
What is [[ thread_position_in_grid ]]?
kernel void add_vectors(const device float4 *inA [[ buffer(0) ]],
                        const device float4 *inB [[ buffer(1) ]],
                        device float4 *out [[ buffer(2) ]],
                        uint id [[ thread_position_in_grid ]])
{
    out[id] = inA[id] + inB[id];
}
The variable is commonly named id, gid, or tid.
The Metal Shading Language Specification / Guide describes it simply, in the section "Attribute Qualifiers for Kernel Function Input":
thread_position_in_grid identifies its position in the grid.
When a kernel is submitted for execution, it executes over an N-dimensional grid of threads, where N is
one, two or three. A thread is an instance of the kernel that executes for each point in this grid, and
thread_position_in_grid identifies its position in the grid.
Threads are organized into threadgroups. Threads in a threadgroup cooperate by sharing data through
threadgroup memory and by synchronizing their execution to coordinate memory accesses to both
device and threadgroup memory. The threads in a given threadgroup execute concurrently on a
single compute unit on the GPU. Within a compute unit, a threadgroup is partitioned into multiple
smaller groups for execution.
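The relationship between the grid, threadgroups, and threads described in this passage can be emulated on the CPU. The following is a minimal C++ sketch for the one-dimensional case; it is not Metal code. The names `ThreadIds` and `emulate_dispatch` are hypothetical and exist only for illustration, and the loops run sequentially whereas a GPU runs the threads of a threadgroup concurrently. It assumes a uniform dispatch, where thread_position_in_grid equals threadgroup_position_in_grid * threads_per_threadgroup + thread_position_in_threadgroup.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of the indices a kernel thread could receive (1D case).
struct ThreadIds {
    uint32_t threadgroup_position_in_grid;   // which threadgroup the thread is in
    uint32_t thread_position_in_threadgroup; // index inside that threadgroup
    uint32_t thread_position_in_grid;        // index inside the whole grid
};

// Sequentially enumerates every thread of a 1D dispatch.
// Illustration only: a real GPU does not run threads in this order.
std::vector<ThreadIds> emulate_dispatch(uint32_t num_threadgroups,
                                        uint32_t threads_per_threadgroup) {
    std::vector<ThreadIds> ids;
    ids.reserve(static_cast<std::size_t>(num_threadgroups) * threads_per_threadgroup);
    for (uint32_t g = 0; g < num_threadgroups; ++g) {
        for (uint32_t t = 0; t < threads_per_threadgroup; ++t) {
            // Grid position = threadgroup offset + position inside the threadgroup.
            ids.push_back({g, t, g * threads_per_threadgroup + t});
        }
    }
    return ids;
}
```

For example, `emulate_dispatch(3, 4)` yields 12 entries whose `thread_position_in_grid` runs from 0 to 11; the entry at index 5 belongs to threadgroup 1 and is thread 1 within it.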
`thread_position_in_grid` is an index (an integer) in the grid that takes values in the ranges you specify in `dispatchThreadgroups:threadsPerThreadgroup:`. It's up to you to decide how many thread groups you want, and how many threads per group.
In the following sample code you can see that `threadsPerGroup.width * numThreadgroups.width == inputImage.width` and `threadsPerGroup.height * numThreadgroups.height == inputImage.height`. In this case, a position in the grid will thus be a non-normalized (integer) pixel coordinate.
I see: the sizes passed when calling `dispatchThreadgroups:threadsPerThreadgroup:` (`threadsPerGroup` and `numThreadgroups`) determine the extent of the "grid", and `thread_position_in_grid` holds the index within that grid.
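To make the arithmetic concrete, here is a small C++ sketch of the size calculation only; it does not call the Metal API. `Size2D`, `threadgroups_to_cover`, `grid_size`, and `in_image` are hypothetical names. It assumes the common approach of rounding the threadgroup count up so that the grid covers the whole image. When the image size is not a multiple of the threadgroup size, the grid ends up larger than the image, so a kernel would need to skip positions outside the image bounds.

```cpp
#include <cstdint>

// Hypothetical 2D size type, standing in for the width/height of an MTLSize.
struct Size2D {
    uint32_t width;
    uint32_t height;
};

// Threadgroups needed per dimension to cover the image (ceiling division).
Size2D threadgroups_to_cover(Size2D image, Size2D threads_per_group) {
    return {(image.width + threads_per_group.width - 1) / threads_per_group.width,
            (image.height + threads_per_group.height - 1) / threads_per_group.height};
}

// Extent of the grid: thread_position_in_grid ranges over
// [0, width) x [0, height) of the returned size.
Size2D grid_size(Size2D num_threadgroups, Size2D threads_per_group) {
    return {num_threadgroups.width * threads_per_group.width,
            num_threadgroups.height * threads_per_group.height};
}

// True if a grid position corresponds to an actual pixel of the image.
bool in_image(uint32_t x, uint32_t y, Size2D image) {
    return x < image.width && y < image.height;
}
```

With a 1920×1080 image and 16×16 threads per group, this gives 120×68 threadgroups and a 1920×1088 grid, so grid positions with y of 1080 or more fall outside the image. With a 512×512 image the grid matches the image exactly, which is the situation the quoted answer describes.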
kernel void computeShader(texture2d<float, access::read> tex [[ texture(0) ]],
float w = tex.get_width(); float h = tex.get_height();