Reduction with CK Tile

This example demonstrates parallel reduction (sum, max, etc.) using the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.


Algorithm and Math

Given a tensor $X$ and a reduction axis of length $N$, compute the following, where $i$ runs over the reduction axis:

  • Sum: $Y = \sum_i X_i$

  • Max: $Y = \max_i X_i$

  • Mean: $Y = \frac{1}{N} \sum_i X_i$

  • Tilewise Reduction: Each thread block reduces one tile of the input. Threads first accumulate partial results in registers, then combine them through shared memory into a single value per tile.
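To make the formulas concrete, here is a minimal host-side reference in C++, assuming a row-major 2D tensor of shape M x N reduced along its last axis. The function names (`reduce_last_axis`, `reduce_sum`, `reduce_max`, `reduce_mean`) are hypothetical and not part of the CK Tile API. A sequential reference like this could serve as a baseline when checking GPU results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical host-side reference (not part of CK Tile): reduce a row-major
// M x N tensor along its last axis (length N), producing M outputs.
// `identity` is the neutral element of `op` (0 for sum, -inf for max).
template <typename Op>
std::vector<float> reduce_last_axis(const std::vector<float>& x, std::size_t M,
                                    std::size_t N, float identity, Op op) {
    assert(x.size() == M * N);
    std::vector<float> y(M, identity);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            y[m] = op(y[m], x[m * N + n]);
    return y;
}

// Sum: Y = sum_i X_i
std::vector<float> reduce_sum(const std::vector<float>& x, std::size_t M, std::size_t N) {
    return reduce_last_axis(x, M, N, 0.0f, std::plus<float>{});
}

// Max: Y = max_i X_i
std::vector<float> reduce_max(const std::vector<float>& x, std::size_t M, std::size_t N) {
    return reduce_last_axis(x, M, N, -std::numeric_limits<float>::infinity(),
                            [](float a, float b) { return std::max(a, b); });
}

// Mean: Y = (1/N) sum_i X_i, computed as a sum followed by one division.
std::vector<float> reduce_mean(const std::vector<float>& x, std::size_t M, std::size_t N) {
    assert(N > 0);
    std::vector<float> y = reduce_sum(x, M, N);
    for (float& v : y) v /= static_cast<float>(N);
    return y;
}
```

Max starts from negative infinity, the identity element of max, so rows containing only negative values still reduce correctly. Mean reuses the sum and divides once per output rather than per element.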


Tile Programming Model

  • Tiles: Each thread block processes a tile (block) of the input tensor.
  • Pipeline: The pipeline is modular and can be extended for fused reductions or post-processing.
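The following CPU simulation sketches the two-stage pattern behind a tilewise sum: per-thread register accumulation, then a tree reduction through a shared array. It is an illustration under stated assumptions, not the CK Tile kernel. `kThreads`, `kTileSize`, `block_reduce_tile`, and `tilewise_sum` are all hypothetical names. The real kernel maps elements to threads through tile distributions and runs on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU simulation of a tilewise sum (not the CK Tile kernel).
constexpr std::size_t kThreads = 8;    // simulated threads per block
constexpr std::size_t kTileSize = 64;  // elements per tile (one tile per block)
static_assert((kThreads & (kThreads - 1)) == 0, "kThreads must be a power of two");

// One simulated thread block reducing one tile of `len` elements.
float block_reduce_tile(const float* tile, std::size_t len) {
    assert(len <= kTileSize);
    float smem[kThreads];  // stands in for shared memory (LDS)
    // Stage 1: each thread accumulates a strided subset in a private register.
    for (std::size_t t = 0; t < kThreads; ++t) {
        float acc = 0.0f;
        for (std::size_t i = t; i < len; i += kThreads) acc += tile[i];
        smem[t] = acc;
    }
    // Stage 2: tree reduction over shared memory; on a GPU, each halving
    // step would be followed by a block-wide barrier.
    for (std::size_t stride = kThreads / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t) smem[t] += smem[t + stride];
    return smem[0];
}

// Each loop iteration plays the role of one thread block; combining the
// per-tile partials would need a second pass (or atomics) on a GPU.
float tilewise_sum(const std::vector<float>& x) {
    float total = 0.0f;
    for (std::size_t start = 0; start < x.size(); start += kTileSize) {
        std::size_t len = std::min(kTileSize, x.size() - start);
        total += block_reduce_tile(x.data() + start, len);
    }
    return total;
}
```

Because the additions happen in a different order than a sequential loop, floating-point results can differ from the reference in the last bits for general inputs. For small integer-valued data, such as 0 through 199 summing to 19900, the result is exact.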

Build & Run

mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_reduce -j
./bin/tile_example_reduce -?

Source Structure

  • Kernel: reduce.hpp (tile-programming kernel template)
  • Executable: reduce.cpp (argument parsing, kernel launch)
  • Build: CMakeLists.txt

Related CK Tile Examples

For tile distribution (how a tile's elements are mapped onto threads), see include/ck_tile/tile_program/tile_distribution/.

