jhlee525 7 hours ago

This is incredibly useful. Thanks for making the kernels public.

I'm curious: has anyone tried generalizing this to batched matmuls, or to sparse inputs on Ada?