diff --git a/docs_nnx/examples/core_examples.rst b/docs_nnx/examples/core_examples.rst
index e714fe74c..530152589 100644
--- a/docs_nnx/examples/core_examples.rst
+++ b/docs_nnx/examples/core_examples.rst
@@ -20,7 +20,7 @@ Transformers
Transformer encoder trained on the One Billion Word Benchmark.
-- :octicon:`mark-github;0.9em` `Diffusion Models `__ :
+- :octicon:`mark-github;0.9em` `Diffusion Models `__ :
A simple example of an image diffusion model using a U-Net architecture.
Toy examples
diff --git a/docs_nnx/examples/digits_diffusion_model.ipynb b/examples/digits_diffusion_model.ipynb
similarity index 99%
rename from docs_nnx/examples/digits_diffusion_model.ipynb
rename to examples/digits_diffusion_model.ipynb
index 2516fbc7b..f98b9b041 100644
--- a/docs_nnx/examples/digits_diffusion_model.ipynb
+++ b/examples/digits_diffusion_model.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Example: Train a diffusion model for image generation\n",
"\n",
- "This example guides you through developing and training a simple [diffusion model](https://en.wikipedia.org/wiki/Diffusion_model) using a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) for image generation using Flax NNX. "
+ "This example guides you through developing and training a simple [diffusion model](https://en.wikipedia.org/wiki/Diffusion_model) using a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) for image generation with Flax NNX."
]
},
{
@@ -118,7 +118,7 @@
"source": [
"## Building the U-Net\n",
"\n",
- "In this example, we'll use a specific diffusion variant known as *flow matching*. A flow matching diffusion model is a neural network representing a velocity field that can transform Gaussian noise into a space of images. To get an image, we'll sample a starting value $x_0 \\sim \\mathcal{N}(0, 1)$, and then propagate it according the differential equation $\\frac{dx}{dt} = f_\\theta(x,t)$ where $f_\\theta(x,t)$ applies the neural network with parameters $\\theta$. We'll integrate the differential equation from $t=0$ starting at state $x_0$ up to $t=1$, producing image $x_1$. "
+ "In this example, we'll use a specific diffusion variant known as *flow matching*. A flow matching diffusion model is a neural network representing a velocity field that can transform Gaussian noise into a space of images. To get an image, we'll sample a starting value $x_0 \\sim \\mathcal{N}(0, 1)$, and then propagate it according to the differential equation $\\frac{dx}{dt} = f_\\theta(x,t)$ where $f_\\theta(x,t)$ applies the neural network with parameters $\\theta$. We'll integrate the differential equation from $t=0$ starting at state $x_0$ up to $t=1$, producing image $x_1$."
]
},
{
@@ -468,7 +468,7 @@
"id": "58539a2e",
"metadata": {},
"source": [
- "While training, we'll want to periodically visualize the image samples our diffusion model generates. To do this, we'll use the `diffrax` library for numerical integration. "
+ "While training, we'll want to periodically visualize the image samples our diffusion model generates. To do this, we'll use the `diffrax` library for numerical integration."
]
},
{
@@ -699,7 +699,7 @@
"source": [
"## Visualizing Reconstructions\n",
"\n",
- "A well-trained flow matching model defines an invertible map between noise and images: integrating the learned velocity field forward takes noise to images, and integrating it *backward* (negating the field, running time from 1 to 0) takes images back to noise. We can use this to sanity-check the model — encoding real images to noise and decoding back should recover the originals faithfully. If this check fails, it would mean that our learned velocity field wasn't smooth enough for numerical integration to work properly. "
+ "A well-trained flow matching model defines an invertible map between noise and images: integrating the learned velocity field forward takes noise to images, and integrating it *backward* (negating the field, running time from 1 to 0) takes images back to noise. We can use this to sanity-check the model — encoding real images to noise and decoding back should recover the originals faithfully. If this check fails, it would mean that our learned velocity field wasn't smooth enough for numerical integration to work properly."
]
},
{
@@ -765,7 +765,7 @@
"id": "4f2a3447-e04c-46c2-8cc7-7518c8d15177",
"metadata": {},
"source": [
- "As we hoped, encoding and decoding an image brings us back to the same place. The middle row supposedly representing Gaussian noise samples doesn't look exactly Gaussian: you can still make out the '2' lying in the background. This is a known property of flow matching diffusion models: the latent codes are just noisy versions of the original images. "
+ "As we hoped, encoding and decoding an image brings us back to the same place. The middle row, which supposedly represents Gaussian noise samples, doesn't look exactly Gaussian: you can still make out the '2' lying in the background. This is a known property of flow matching diffusion models: the latent codes are just noisy versions of the original images."
]
},
{
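As an illustrative aside on the sampling procedure the notebook describes, integrating $\frac{dx}{dt} = f_\theta(x,t)$ from $t=0$ to $t=1$ starting from Gaussian noise can be sketched with a plain forward-Euler loop. This is only a hedged sketch: `toy_field` is a hypothetical stand-in for the trained U-Net, and the actual notebook uses `diffrax` for numerical integration rather than hand-rolled Euler steps.

```python
import jax
import jax.numpy as jnp

def sample_image(velocity_field, key, shape=(8, 8), num_steps=100):
    """Integrate dx/dt = f(x, t) from t=0 to t=1 with forward Euler."""
    x = jax.random.normal(key, shape)  # x_0 ~ N(0, 1)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_field(x, t)  # one Euler step
    return x  # x_1: the generated "image"

# Hypothetical velocity field standing in for the trained U-Net:
# it pushes every sample toward a fixed target as t -> 1.
target = jnp.ones((8, 8))
def toy_field(x, t):
    return target - x

img = sample_image(toy_field, jax.random.PRNGKey(0))
```

With this linear toy field, the ODE has the closed form $x(t) = \text{target} + (x_0 - \text{target})e^{-t}$, so the Euler result can be checked against $x(1)$; the trained model's field is of course nonlinear, which is why the notebook reaches for an adaptive solver.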
diff --git a/docs_nnx/examples/digits_diffusion_model.md b/examples/digits_diffusion_model.md
similarity index 99%
rename from docs_nnx/examples/digits_diffusion_model.md
rename to examples/digits_diffusion_model.md
index ca636827c..c304dc0cf 100644
--- a/docs_nnx/examples/digits_diffusion_model.md
+++ b/examples/digits_diffusion_model.md
@@ -12,7 +12,7 @@ jupyter:
# Example: Train a diffusion model for image generation
-This example guides you through developing and training a simple [diffusion model](https://en.wikipedia.org/wiki/Diffusion_model) using a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) for image generation using Flax NNX.
+This example guides you through developing and training a simple [diffusion model](https://en.wikipedia.org/wiki/Diffusion_model) using a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) for image generation with Flax NNX.
In this tutorial, you'll learn how to:
@@ -72,7 +72,7 @@ plt.show()
## Building the U-Net
-In this example, we'll use a specific diffusion variant known as *flow matching*. A flow matching diffusion model is a neural network representing a velocity field that can transform Gaussian noise into a space of images. To get an image, we'll sample a starting value $x_0 \sim \mathcal{N}(0, 1)$, and then propagate it according the differential equation $\frac{dx}{dt} = f_\theta(x,t)$ where $f_\theta(x,t)$ applies the neural network with parameters $\theta$. We'll integrate the differential equation from $t=0$ starting at state $x_0$ up to $t=1$, producing image $x_1$.
+In this example, we'll use a specific diffusion variant known as *flow matching*. A flow matching diffusion model is a neural network representing a velocity field that can transform Gaussian noise into a space of images. To get an image, we'll sample a starting value $x_0 \sim \mathcal{N}(0, 1)$, and then propagate it according to the differential equation $\frac{dx}{dt} = f_\theta(x,t)$ where $f_\theta(x,t)$ applies the neural network with parameters $\theta$. We'll integrate the differential equation from $t=0$ starting at state $x_0$ up to $t=1$, producing image $x_1$.
Our neural network will use a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) which consists of the following:
@@ -341,7 +341,7 @@ def train_step(model, data, rngs, optimizer):
## Visualizing Samples
-While training, we'll want to periodically visualize the image samples our diffusion model generates. To do this, we'll use the `diffrax` library for numerical integration.
+While training, we'll want to periodically visualize the image samples our diffusion model generates. To do this, we'll use the `diffrax` library for numerical integration.
```python
import diffrax as dfx
@@ -405,7 +405,7 @@ plt.show()
## Visualizing Reconstructions
-A well-trained flow matching model defines an invertible map between noise and images: integrating the learned velocity field forward takes noise to images, and integrating it *backward* (negating the field, running time from 1 to 0) takes images back to noise. We can use this to sanity-check the model — encoding real images to noise and decoding back should recover the originals faithfully. If this check fails, it would mean that our learned velocity field wasn't smooth enough for numerical integration to work properly.
+A well-trained flow matching model defines an invertible map between noise and images: integrating the learned velocity field forward takes noise to images, and integrating it *backward* (negating the field, running time from 1 to 0) takes images back to noise. We can use this to sanity-check the model — encoding real images to noise and decoding back should recover the originals faithfully. If this check fails, it would mean that our learned velocity field wasn't smooth enough for numerical integration to work properly.
```python
@nnx.jit
@@ -442,7 +442,7 @@ plt.tight_layout()
plt.show()
```
-As we hoped, encoding and decoding an image brings us back to the same place. The middle row supposedly representing Gaussian noise samples doesn't look exactly Gaussian: you can still make out the '2' lying in the background. This is a known property of flow matching diffusion models: the latent codes are just noisy versions of the original images.
+As we hoped, encoding and decoding an image brings us back to the same place. The middle row, which supposedly represents Gaussian noise samples, doesn't look exactly Gaussian: you can still make out the '2' lying in the background. This is a known property of flow matching diffusion models: the latent codes are just noisy versions of the original images.
## Summary
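The reconstruction check described above relies on the map being invertible: integrating the field backward (from $t=1$ to $t=0$) encodes an image to noise, and integrating forward again decodes it. A minimal round-trip sketch, again with a hypothetical linear field in place of the trained model:

```python
import jax.numpy as jnp

def integrate(field, x, t0, t1, num_steps=100):
    """Forward Euler from t0 to t1 (dt is negative when t1 < t0)."""
    dt = (t1 - t0) / num_steps
    for i in range(num_steps):
        t = t0 + i * dt
        x = x + dt * field(x, t)
    return x

# Hypothetical velocity field standing in for the trained model
# (ignores t for simplicity).
def toy_field(x, t):
    return -x

x_img = jnp.linspace(0.0, 1.0, 16).reshape(4, 4)  # a stand-in "image"
z = integrate(toy_field, x_img, 1.0, 0.0)          # image -> noise (backward)
x_rec = integrate(toy_field, z, 0.0, 1.0)          # noise -> image (forward)
```

Up to discretization error of the Euler steps, `x_rec` matches `x_img`; a large mismatch here would signal the kind of non-smooth velocity field the reconstruction check is designed to catch.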