Add FFT support via AbstractFFTs interface by KaanKesginLW · Pull Request #713 · JuliaGPU/Metal.jl

KaanKesginLW · 2025-12-03T09:42:08Z

Adds FFT support for MtlArray via the AbstractFFTs.jl interface.

Heavily based on CUDA.jl's implementation of the AbstractFFTs.jl interface, with MPSGraph providing the FFT functionality.

using Metal

x = MtlArray(randn(ComplexF32, 2048, 2048))
y = fft(x)  # Just works!

Performance

Benchmarked on Apple M2 Max with 30-core GPU against FFTW.jl on CPU:

Size	CPU (FFTW)	GPU (Metal)
512×512	4.1ms	5.3ms
1024×1024	19.7ms	8.5ms
2048×2048	119.7ms	10.5ms
4096×4096	460.6ms	15.8ms
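
To make the crossover point in the table easier to see, the timings can be turned into speedup ratios. This is only a small sketch over the numbers reported above; the variable names are illustrative and nothing here re-runs the benchmark.

```julia
# Timings in milliseconds, copied from the table above.
sizes  = [512, 1024, 2048, 4096]
cpu_ms = [4.1, 19.7, 119.7, 460.6]
gpu_ms = [5.3, 8.5, 10.5, 15.8]

# A ratio above 1 means the GPU run was faster than FFTW on the CPU.
speedup = cpu_ms ./ gpu_ms

for (n, s) in zip(sizes, speedup)
    println(n, "×", n, ": ", round(s; digits = 1), "x")
end
```

On these numbers the GPU is slower at 512×512 and roughly 29 times faster at 4096×4096, so the benefit depends strongly on problem size.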

Example Usage

using Metal

# Complex FFT
x = MtlArray(randn(ComplexF32, 1024, 1024))
y = fft(x)
z = ifft(y)  # z ≈ x

# Real FFT  
r = MtlArray(randn(Float32, 1024, 1024))
c = rfft(r)           # Real → Complex
r2 = irfft(c, 1024)   # Complex → Real, r2 ≈ r

# FFT along specific dimensions
y = fft(x, 1)         # First dimension only
y = fft(x, (1, 2))    # Both dimensions (same as fft(x) for a 2-D array)

# Plan reuse
x = MtlArray(randn(ComplexF32, 1024, 1024))
another_x = MtlArray(randn(ComplexF32, 1024, 1024))
p = plan_fft(x)
y1 = p * x
y2 = p * another_x    # Same plan, different data
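
The `irfft(c, 1024)` call above passes the original length explicitly because the size of the `rfft` output does not determine it uniquely. The sketch below illustrates that with plain integer arithmetic; `rfft_length` and `candidates` are hypothetical helper names, not part of AbstractFFTs.jl or this PR.

```julia
# Size of the first dimension of rfft's output for a real input of length n.
rfft_length(n::Integer) = n ÷ 2 + 1

# An even and an odd input length map to the same output length...
@assert rfft_length(1024) == 513
@assert rfft_length(1025) == 513

# ...so the output size alone leaves two candidates for the original length.
candidates(m::Integer) = (2 * (m - 1), 2 * (m - 1) + 1)
@assert candidates(513) == (1024, 1025)
```

Passing `1024` as the second argument to `irfft` selects the first of those two candidates.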

Close #270

codecov · 2025-12-03T11:38:44Z

Codecov Report

❌ Patch coverage is 81.63265% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.99%. Comparing base (b94fd4b) to head (d9d399a).

Files with missing lines	Patch %	Lines
src/fft.jl	79.54%	27 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #713      +/-   ##
==========================================
- Coverage   83.09%   82.99%   -0.10%     
==========================================
  Files          62       64       +2     
  Lines        2851     2999     +148     
==========================================
+ Hits         2369     2489     +120     
- Misses        482      510      +28

github-actions

Metal Benchmarks

Details

Benchmark suite	Current: `9bd038f`	Previous: `b94fd4b`	Ratio
`array/accumulate/Float32/1d`	`1063959` ns	`1098958` ns	`0.97`
`array/accumulate/Float32/dims=1`	`1615375` ns	`1554708` ns	`1.04`
`array/accumulate/Float32/dims=1L`	`10382979` ns	`9848583.5` ns	`1.05`
`array/accumulate/Float32/dims=2`	`2040354` ns	`1886771` ns	`1.08`
`array/accumulate/Float32/dims=2L`	`7545979.5` ns	`7256459` ns	`1.04`
`array/accumulate/Int64/1d`	`1287958` ns	`1261958` ns	`1.02`
`array/accumulate/Int64/dims=1`	`1886709` ns	`1824291.5` ns	`1.03`
`array/accumulate/Int64/dims=1L`	`12201292` ns	`11664208.5` ns	`1.05`
`array/accumulate/Int64/dims=2`	`2271083.5` ns	`2170333.5` ns	`1.05`
`array/accumulate/Int64/dims=2L`	`9885208` ns	`10120062.5` ns	`0.98`
`array/broadcast`	`549000.5` ns	`605916` ns	`0.91`
`array/construct`	`6750` ns	`6292` ns	`1.07`
`array/permutedims/2d`	`1216916.5` ns	`1168125` ns	`1.04`
`array/permutedims/3d`	`1807000` ns	`1673084` ns	`1.08`
`array/permutedims/4d`	`2506313` ns	`2365959` ns	`1.06`
`array/private/copy`	`821834` ns	`545792` ns	`1.51`
`array/private/copyto!/cpu_to_gpu`	`724208` ns	`802916` ns	`0.90`
`array/private/copyto!/gpu_to_cpu`	`674583.5` ns	`801917` ns	`0.84`
`array/private/copyto!/gpu_to_gpu`	`522249.5` ns	`634458` ns	`0.82`
`array/private/iteration/findall/bool`	`1464854` ns	`1402750` ns	`1.04`
`array/private/iteration/findall/int`	`1588875` ns	`1564021` ns	`1.02`
`array/private/iteration/findfirst/bool`	`2058334` ns	`2055916` ns	`1.00`
`array/private/iteration/findfirst/int`	`2117375` ns	`2064479.5` ns	`1.03`
`array/private/iteration/findmin/1d`	`2597812.5` ns	`2499959` ns	`1.04`
`array/private/iteration/findmin/2d`	`1850812.5` ns	`1790791` ns	`1.03`
`array/private/iteration/logical`	`2697166` ns	`2631896` ns	`1.02`
`array/private/iteration/scalar`	`3268062` ns	`5047625` ns	`0.65`
`array/random/rand/Float32`	`840396` ns	`582958` ns	`1.44`
`array/random/rand/Int64`	`926000` ns	`775667` ns	`1.19`
`array/random/rand!/Float32`	`543354` ns	`574750` ns	`0.95`
`array/random/rand!/Int64`	`532709` ns	`550792` ns	`0.97`
`array/random/randn/Float32`	`1045959` ns	`1006937.5` ns	`1.04`
`array/random/randn!/Float32`	`721688` ns	`755666` ns	`0.96`
`array/reductions/mapreduce/Float32/1d`	`808125` ns	`1029500` ns	`0.78`
`array/reductions/mapreduce/Float32/dims=1`	`815875` ns	`840875` ns	`0.97`
`array/reductions/mapreduce/Float32/dims=1L`	`1360958` ns	`1324000` ns	`1.03`
`array/reductions/mapreduce/Float32/dims=2`	`848500` ns	`860875` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=2L`	`1820166.5` ns	`1799541` ns	`1.01`
`array/reductions/mapreduce/Int64/1d`	`1311062.5` ns	`1374875` ns	`0.95`
`array/reductions/mapreduce/Int64/dims=1`	`1123083` ns	`1097625` ns	`1.02`
`array/reductions/mapreduce/Int64/dims=1L`	`2005208` ns	`2002854` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`1162042` ns	`1145000` ns	`1.01`
`array/reductions/mapreduce/Int64/dims=2L`	`3647916` ns	`3614000` ns	`1.01`
`array/reductions/reduce/Float32/1d`	`832500` ns	`1028437.5` ns	`0.81`
`array/reductions/reduce/Float32/dims=1`	`808125` ns	`832667` ns	`0.97`
`array/reductions/reduce/Float32/dims=1L`	`1360333` ns	`1318416.5` ns	`1.03`
`array/reductions/reduce/Float32/dims=2`	`844270.5` ns	`853041.5` ns	`0.99`
`array/reductions/reduce/Float32/dims=2L`	`1816916.5` ns	`1810250` ns	`1.00`
`array/reductions/reduce/Int64/1d`	`1317166` ns	`1516958` ns	`0.87`
`array/reductions/reduce/Int64/dims=1`	`1130667` ns	`1095375` ns	`1.03`
`array/reductions/reduce/Int64/dims=1L`	`2042083` ns	`2023499.5` ns	`1.01`
`array/reductions/reduce/Int64/dims=2`	`1170625` ns	`1240750` ns	`0.94`
`array/reductions/reduce/Int64/dims=2L`	`4132354.5` ns	`4233875` ns	`0.98`
`array/shared/copy`	`215000` ns	`252417` ns	`0.85`
`array/shared/copyto!/cpu_to_gpu`	`83625` ns	`80750` ns	`1.04`
`array/shared/copyto!/gpu_to_cpu`	`84000` ns	`80667` ns	`1.04`
`array/shared/copyto!/gpu_to_gpu`	`84667` ns	`83083` ns	`1.02`
`array/shared/iteration/findall/bool`	`1478334` ns	`1427208.5` ns	`1.04`
`array/shared/iteration/findall/int`	`1586958` ns	`1559875` ns	`1.02`
`array/shared/iteration/findfirst/bool`	`1672604` ns	`1649000` ns	`1.01`
`array/shared/iteration/findfirst/int`	`1713729` ns	`1672458` ns	`1.02`
`array/shared/iteration/findmin/1d`	`2206812.5` ns	`2115583` ns	`1.04`
`array/shared/iteration/findmin/2d`	`1871021` ns	`1792625` ns	`1.04`
`array/shared/iteration/logical`	`2259416` ns	`2292167` ns	`0.99`
`array/shared/iteration/scalar`	`204562.5` ns	`199958` ns	`1.02`
`integration/byval/reference`	`1582563` ns	`1544250` ns	`1.02`
`integration/byval/slices=1`	`1595791` ns	`1560229.5` ns	`1.02`
`integration/byval/slices=2`	`2704083` ns	`2598333.5` ns	`1.04`
`integration/byval/slices=3`	`19409291.5` ns	`8092333` ns	`2.40`
`integration/metaldevrt`	`859791` ns	`868125` ns	`0.99`
`kernel/indexing`	`473937.5` ns	`592667` ns	`0.80`
`kernel/indexing_checked`	`500021` ns	`598292` ns	`0.84`
`kernel/launch`	`12625` ns	`11791.5` ns	`1.07`
`kernel/rand`	`538000` ns	`570709` ns	`0.94`
`latency/import`	`1446676250` ns	`1425597062.5` ns	`1.01`
`latency/precompile`	`25920507833` ns	`25453724708` ns	`1.02`
`latency/ttfp`	`2381656291.5` ns	`2341177208` ns	`1.02`
`metal/synchronization/context`	`19750` ns	`19667` ns	`1.00`
`metal/synchronization/stream`	`19375` ns	`18459` ns	`1.05`

This comment was automatically generated by workflow using github-action-benchmark.

aplavin · 2026-03-23T00:55:27Z

+end
+
+@inline function _unsafe_execute!(f, p::MtlFFTPlan{T, S, backward, inplace, N}, x, y) where {T <: FFTNumber, S <: FFTNumber, N, backward, inplace}
+    graph = MPSGraph()


Does each execution create a new MPSGraph? That causes quite some latency; it would be great to cache the graph somehow.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds FFT support for MtlArray by implementing the AbstractFFTs.jl interface on top of MPSGraph FFT operations, along with a comprehensive test suite and required dependency wiring.

Changes:

Implemented AbstractFFTs plans and execution for Metal-backed arrays using cached MPSGraph FFT graphs.
Added MPSGraphs bindings for FFT descriptor creation and FFT operations.
Added FFT-focused tests (including Float16 CPU shims) and updated project dependencies.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/fft.jl`	Implements `AbstractFFTs` plans/execution for `MtlArray`, including MPSGraph graph caching.
`lib/mpsgraphs/fft.jl`	Adds Objective-C bindings for MPSGraph FFT descriptor and FFT ops.
`lib/mpsgraphs/MPSGraphs.jl`	Includes the new FFT bindings module.
`src/Metal.jl`	Wires FFT module into the main package and adds `Reexport`.
`Project.toml`	Adds `AbstractFFTs` and `Reexport` deps/compat.
`test/Project.toml`	Adds `AbstractFFTs` and `FFTW` for testing.
`test/fft.jl`	Adds a large FFT test suite and Float16 CPU reference shims.

Copilot · 2026-04-11T20:28:12Z

+# Get or create cached graph
+function _get_cached_graph!(graph_cache_lock, graph_cache, key::FFTGraphKey)
+    # Fast path: check cache without lock (safe for reads)
+    cached = get(graph_cache, key, nothing)
+    if cached !== nothing
+        return cached
+    end
+
+    # Slow path: acquire lock and build graph
+    @lock graph_cache_lock get!(graph_cache, key) do
+        CachedFFTGraph(key)
+    end
+end


Reading from a Dict concurrently with writes is not thread-safe in Julia; the lockless get(graph_cache, ...) can race with the get! mutation and lead to memory corruption/crashes. Protect all Dict accesses (including reads) with the lock, or switch to a thread-safe cache strategy (e.g., always @lock around get! / get, or use a concurrency-safe map implementation if available in this codebase).

christiangnrd · 2026-04-11T21:49:49Z

@maleadt This failure happened because AppleAccelerate.jl was loaded in the flopscomp example test, which happened to be on the same runner as the fft tests. Should we just remove AppleAccelerate?

Also, Copilot seems to think that Dict reads while writes are potentially happening aren't thread safe. If that's true we'll need to fix the MPSGraph caching too.

maleadt · 2026-04-14T13:16:03Z

> @maleadt This failure happened because AppleAccelerate.jl was loaded in the flopscomp example test, which happened to be on the same runner as the fft tests. Should we just remove AppleAccelerate?

Yeah I guess. It's unfortunate that AppleAccelerate breaks stuff.

> Also, Copilot seems to think that Dict reads while writes are potentially happening aren't thread safe. If that's true we'll need to fix the MPSGraph caching too.

Yes, Dict isn't threadsafe. You'd want to wrap those concurrent accesses in a ReentrantLock.
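
A minimal sketch of that suggestion, assuming a cache keyed by plan parameters: every access to the `Dict`, reads included, goes through one `ReentrantLock`. The names `GraphKey`, `build_graph`, `GRAPH_CACHE`, and `get_cached_graph!` are hypothetical stand-ins for illustration and are not the types or functions used in this PR; a string takes the place of the real graph object.

```julia
# Hypothetical key type standing in for the PR's cache key.
struct GraphKey
    dims::NTuple{2, Int}
    inverse::Bool
end

# Counts how often the (pretend) expensive build actually runs.
const BUILD_COUNT = Threads.Atomic{Int}(0)

function build_graph(key::GraphKey)
    Threads.atomic_add!(BUILD_COUNT, 1)
    return "graph for $(key.dims), inverse=$(key.inverse)"
end

const GRAPH_CACHE = Dict{GraphKey, String}()
const GRAPH_CACHE_LOCK = ReentrantLock()

# Lookup and insertion both happen while holding the lock,
# so no task ever reads the Dict while another mutates it.
function get_cached_graph!(key::GraphKey)
    lock(GRAPH_CACHE_LOCK) do
        get!(GRAPH_CACHE, key) do
            build_graph(key)
        end
    end
end

key = GraphKey((1024, 1024), false)
tasks = [Threads.@spawn get_cached_graph!(key) for _ in 1:8]
results = fetch.(tasks)
```

Holding the lock during the build serializes graph construction, which costs some concurrency but also means each key is built at most once. Whether that trade-off suits the real cache is a design decision for the PR authors.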

KaanKesginLW mentioned this pull request Dec 3, 2025

FFT support #270

Open

github-actions Bot reviewed Dec 3, 2025

christiangnrd reviewed Dec 4, 2025

Comment thread .gitignore Outdated

christiangnrd reviewed Dec 4, 2025

Comment thread lib/mpsgraphs/operations.jl

KaanKesginLW force-pushed the feature/fft-support branch from 40871e3 to b88d77f Compare December 4, 2025 17:05

christiangnrd force-pushed the feature/fft-support branch 2 times, most recently from 130ed6a to e3aeeea Compare January 19, 2026 01:09

christiangnrd marked this pull request as draft February 2, 2026 01:36

christiangnrd force-pushed the feature/fft-support branch 2 times, most recently from e8d6b2c to ffdffe8 Compare February 4, 2026 02:59

christiangnrd marked this pull request as ready for review February 4, 2026 03:31

christiangnrd force-pushed the feature/fft-support branch from ffdffe8 to 6f6aa3c Compare February 4, 2026 03:32

gbene mentioned this pull request Feb 11, 2026

Other GPU backends gbene/HighSeas.jl#7

Open

4 tasks

christiangnrd force-pushed the feature/fft-support branch 3 times, most recently from 3c46dc2 to 60077f5 Compare February 19, 2026 16:38

simone-silvestri mentioned this pull request Feb 20, 2026

Poisson eigenvalues should use Float32 for Metal GPUs CliMA/Oceananigans.jl#5316

Open

christiangnrd force-pushed the feature/fft-support branch from 60077f5 to 25108fd Compare February 24, 2026 15:42

christiangnrd force-pushed the feature/fft-support branch from 25108fd to 57a51b7 Compare March 5, 2026 00:56

aplavin reviewed Mar 23, 2026

christiangnrd force-pushed the feature/fft-support branch from 57a51b7 to e0db641 Compare March 24, 2026 16:52

christiangnrd force-pushed the feature/fft-support branch from e0db641 to c115f38 Compare April 8, 2026 21:14

christiangnrd added 22 commits April 11, 2026 16:38

[NFC] Cleanup

af40fbc

Rename to MPSGraphFFTDescriptor

460dc93

assert_applicable

2938c12

Add inplace stuff

73708c3

Merge MtfFFTPlan and MtlFFTInplacePlan

eb6bbe7

Move some code around

0b2110a

Remove unecessary helper function

31cf0e3

Add output type parameter

1a76cf4

Big merge

37d9e91

Fix show

f6eb0fb

Metaprogram ALL the things

874ad87

Minor refactor

cf6bac7

Bring implementation more in line with the AbstractFFTs interface

19ad529

Add some missing CUDA stuff

6355910

autoreleasepool tweaks

19efec0

More validation + fixup tests

5c9a4ff

Rename stuff for consistency with CUDA fft test

f812b1f

Integer FFTs

dbde2dc

Remove unused MPSGraph operations

f9a1d96

They can be added in a different PR

Cleanup

23be5d7

Comment cleanup

555d5f4

Cache plans

d9d399a

Claude was instructed to implement this like the MPSGraphs matmul caching

christiangnrd force-pushed the feature/fft-support branch from d3d9416 to d9d399a Compare April 11, 2026 19:38

christiangnrd requested a review from Copilot April 11, 2026 20:20

Copilot AI reviewed Apr 11, 2026

Copilot started reviewing on behalf of christiangnrd April 11, 2026 20:31 View session

Copilot suggestion

9bd038f

AshtonSBradley mentioned this pull request May 5, 2026

Run examples in isolated test processes #771

Open
