05 Jun 03:12

e3e134c

Latest

🛠️ Foundry Local v1.2.1 Release Notes

We're excited to announce Foundry Local v1.2.1, a patch on top of v1.2.0 with bring-your-own-model (BYOM) cache discovery while running, more resilient Azure catalog/registry calls, broader multilingual ASR coverage, and a Windows DLL-load fix for non-ANSI paths.

🆕 What's New

🔄 BYOM cache discovery — no service restart required

You can now drop a model directory into the local cache while Foundry Local is running and have the SDKs surface it without restarting the service or waiting for the 4-hour catalog refresh using getCachedModels / getLoadedModels / getModel(alias) / getModelVariant(id).

using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging.Abstractions;

var config = new Configuration { AppName = "MyApp" };
await FoundryLocalManager.CreateAsync(config, NullLogger.Instance);
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();

// At any point during the process lifetime, drop a model directory into the ModelCacheDir.
// For a valid model, inference_model.json must set the model's id, e.g. { "Name": "my-byom-model:1" }.

// 1. No service restart needed - the SDK re-scans the cache on miss for getCachedModelsAsync.
foreach (var m in await catalog.GetCachedModelsAsync())
     Console.WriteLine($"cached: {m.Alias} ({m.Id})");

var byom   = await catalog.GetModelVariantAsync("my-byom-model:1"); // 2. resolves the BYOM id
var hot    = await catalog.GetModelAsync("my-byom-model");                 // 3. resolves the BYOM alias
var loaded = await catalog.GetLoadedModelsAsync();

JavaScript

import { FoundryLocalManager } from 'foundry-local-sdk';
const manager = FoundryLocalManager.create({ appName: 'MyApp' });
const catalog = manager.catalog;

// At any point during the process lifetime, drop a model directory into the modelCacheDir.
// For a valid model, inference_model.json must set the model's id, e.g. { "Name": "my-byom-model:1" }.

// 1. No service restart needed - the SDK re-scans the cache on miss for getCachedModels.
for (const m of await catalog.getCachedModels()) {
     console.log(`cached: ${m.alias} (${m.id})`);
}

const byom   = await catalog.getModelVariant('my-byom-model:1'); // 2. resolves the BYOM id
const hot    = await catalog.getModel('my-byom-model');                  // 3. resolves the BYOM alias
const loaded = await catalog.getLoadedModels();

Python

from foundry_local_sdk import Configuration, FoundryLocalManager

config = Configuration(app_name="MyApp")
FoundryLocalManager.initialize(config)
catalog = FoundryLocalManager.instance.catalog

// At any point during the process lifetime, drop a model directory into the model_cache_dir.
// For a valid model, inference_model.json must set the model's id, e.g. { "Name": "my-byom-model:1" }.

// 1. No service restart needed - the SDK re-scans the cache on miss for get_cached_models.
for m in catalog.get_cached_models():
     print(f"cached: {m.alias} ({m.id})")

byom   = catalog.get_model_variant("my-byom-model:1")  # 2. resolves the BYOM id
hot    = catalog.get_model("my-byom-model")                    # 3. resolves the BYOM alias
loaded = catalog.get_loaded_models()

Rust

use foundry_local_sdk::{FoundryLocalConfig, FoundryLocalManager};

let config = FoundryLocalConfig::new("MyApp");
let manager = FoundryLocalManager::create(config)?;
let catalog = manager.catalog();

// At any point during the process lifetime, drop a model directory into the model.
// For a valid model, inference_model.json must set the model's id, e.g. { "Name": "my-byom-model:1" }.

// 1. No service restart needed - the SDK re-scans the cache on miss for get_cached_models.
for m in catalog.get_cached_models().await? {
     println!("cached: {} ({})", m.alias(), m.id());
}

let byom   = catalog.get_model_variant("my-byom-model:1").await?; // 2. resolves the BYOM id
let hot    = catalog.get_model("my-byom-model").await?;                   // 3. resolves the BYOM alias
let loaded = catalog.get_loaded_models().await?;

🐛 Fixed

JavaScript SDK: Fixed a bug on Windows where onnxruntime.dll failed to load when the path contained characters outside the active ANSI code page.

⚡ Improved

🌐 Regional fallback for Azure catalog and registry requests

A single throttled or unhealthy Azure region (HTTP 408/429/5xx, DNS/connect failures, per-attempt timeouts) no longer blocks model listing or download. The client retries against nearby regions and caches a healthy one for the process.

🗣️ Multilingual ASR language coverage

Language-to-ID mappings in AudioStreamingSession are aligned with the canonicalNVIDIA-Nemotron-3.5-ASR-Streaming-Multilingual-0.6b prompt dictionary:

- Added region-specific variants (e.g., hi-IN, ja-JP, ko-KR)
- Added Lithuanian, Thai, Vietnamese, Estonian, Latvian, Slovenian, Hebrew, Maltese, and Norwegian variants

📦 Runtime & Dependency Updates

Upgraded to ONNX Runtime GenAI 0.14.1 for performance, security, and model support improvements

📚 Resources

Resource	Link
📖 MSLearn Docs	learn.microsoft.com/en-us/azure/foundry-local/get-started
🐙 GitHub	github.com/microsoft/Foundry-Local
🧪 Samples	samples/

💙 Thank You

Thank you to our community for your feedback and contributions!

Assets 2

04 Jun 20:35

Wayne-Ch

cli-preview-0.10.0

4b430a4

Foundry Local CLI 0.10.0 (Preview) Pre-release

Pre-release

Public preview of the Foundry Local CLI — the first release of the new CLI built on the Foundry Local SDKs, replacing the earlier service-based CLI.

Important

Preview. This is an early build — expect rough edges, missing polish, and changes between releases. Please file issues (tag titles with [cli]) for anything you hit.

What it's for

The Foundry Local CLI is the terminal entry point to Foundry Local: install and manage models, start and inspect the local server, run quick chat / completion / transcription tests, and pull diagnostics. It pairs with the SDK — the CLI is what you reach for from a terminal or script, the SDK is what you call from your app.

Install

Pick the asset for your platform from the downloads list below.

Platform	Asset
Windows x64	`foundry-0.10.0-win-x64-winml.msix` (recommended) or `-win-x64.msix`
Windows ARM64	`foundry-0.10.0-win-arm64-winml.msix` (recommended) or `-win-arm64.msix`
macOS (Apple Silicon)	`foundry-0.10.0-osx-arm64.pkg`
Linux x64	`foundry-0.10.0-linux-x64.tar.gz`
Linux ARM64	`foundry-0.10.0-linux-arm64.tar.gz`

On Windows and macOS you can also double-click the installer to install with the OS installer. The commands below are the scripted equivalents.

# Windows
Add-AppxPackage .\foundry-0.10.0-win-x64-winml.msix
foundry --version

# macOS
sudo installer -pkg foundry-0.10.0-osx-arm64.pkg -target /
foundry --version

# Linux
tar xzf foundry-0.10.0-linux-x64.tar.gz
./foundry/foundry --version

All binaries are signed by Microsoft Corporation. SHA-256 digests are shown on the asset list above (click an asset's filename).

Quick start

foundry status                                    # system info and service state
foundry model list                                # available models
foundry model load qwen3-0.6b                     # download (if needed) and load
foundry chat qwen3-0.6b                           # interactive chat
foundry transcribe -m whisper-tiny -f audio.wav   # transcribe a local audio file
foundry server stop                               # release the daemon when done

Most commands accept --output json for scripting. Run foundry --help for the full command surface.

Coming from the older Foundry Local CLI?

The earlier service-based CLI (required to use Foundry Local before the SDKs shipped) has been replaced. Common commands map roughly as follows:

Old (service-based CLI)	New
`foundry service start` / `stop` / `restart` / `ps` / `diag`	`foundry server start` / `stop` / `restart` / `logs`
`foundry cache remove`	`foundry cache rm`
`foundry model run <alias>`	`foundry run <alias>`
`foundry model info <alias>`	`foundry model show <alias>`

Run foundry --help for the full surface.

Feedback

File issues at https://github.com/microsoft/Foundry-Local/issues with the [cli] tag and include the output of foundry report.

Assets 10

28 May 23:20

bmehta001

v1.2.0

2992fda

v1.2.0 Foundry Local

🚀 Foundry Local v1.2.0 Release Notes

We're excited to announce Foundry Local v1.2.0. This release builds on v1.1 with improvements across SDK usability, runtime stability, platform support, and the EP download experience.

🆕 What's New

🛑 Cancellable model and EP downloads

Long-running model downloads and execution provider (EP) downloads can now be canceled from each SDK using the platform’s standard cancellation pattern. This lets applications stop downloads from a timeout, UI cancel button, signal handler, or background task.

Note: this applies to model and EP downloads, not inference requests such as chat completions.

// mgr and model already initialized
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
await mgr.DownloadAndRegisterEpsAsync(ct: cts.Token);
await model.DownloadAsync(ct: cts.Token);

JavaScript

// manager and model already initialized
const controller = new AbortController();
setTimeout(() => controller.abort(), 5000);
await manager.downloadAndRegisterEps(controller.signal);
await model.download(controller.signal);

Python

import threading
# manager and model already initialized
cancel_event = threading.Event()
threading.Timer(5.0, cancel_event.set).start()
manager.download_and_register_eps(cancel_event=cancel_event)
model.download(cancel_event=cancel_event)

Rust

use std::sync::{
     atomic::AtomicBool,
     Arc,
};
// manager and model already initialized
let cancel_flag = Arc::new(AtomicBool::new(false));
// Set cancel_flag to true from another task or signal handler to cancel.
manager
     .download_and_register_eps_builder()
     .cancel(Arc::clone(&cancel_flag))
     .run()
     .await?;
model
     .download_builder()
     .cancel(Arc::clone(&cancel_flag))
     .run()
     .await?;

C++

#include <atomic>
std::atomic<bool> cancelRequested{false};
auto isCancellationRequested = [&]() {
     return cancelRequested.load();
};
// manager and model already initialized
manager.DownloadAndRegisterEps(nullptr, isCancellationRequested);
model->Download(
     [](float pct) {
         printf("\rDownloading: %5.1f%%", pct);
         fflush(stdout);
         return true;
     },
     isCancellationRequested);

🧰 SDK & API

Support for SDK-based model versioning, so only models supported by the SDK version in use are shown
Added support for multilingual ASR speech models

🖥️ Platform & Runtime

Upgraded to WinML 2.0
- The WinAppSDKRuntime bootstrapper is no longer needed
- Extended support for WinML EP downloads to Windows 10.0.18362.0 and higher
Added WebGPU EP variant for WinML and plug-in auto-update support
Added support for Linux ARM64 / aarch64

🐛 Fixed

Fixed EP progress reporting issue where 100% could be reported prematurely, causing exceptions
Restored the missing Microsoft.AI.Foundry.Local.Core dependency in the Microsoft.AI.Foundry.Local C# NuGet package
Addressed issues loading Microsoft.AI.Foundry.Local.Core.dll in netstandard2.0 applications

⚡ Improved

🔎 Observability & Debugging

Improved SDK logging to include ORT, GenAI, EP versions, and detailed stack traces on error

🧱 Reliability

Added support for region-based downloads to improve model download latency and throughput
Improved handling of large model downloads by removing the 5-minute timeout

📦 Runtime & Dependency Updates

Updated dependencies for improved performance, security, and stability
Upgraded to ONNX Runtime 1.26.0 and ONNX Runtime GenAI 0.14.0 for performance, security, and model support improvements

📚 Resources

Resource	Link
📖 MSLearn Docs	learn.microsoft.com/en-us/azure/foundry-local/get-started
🐙 GitHub	github.com/microsoft/Foundry-Local
🧪 Samples	samples/

💙 Thank You

Thank you to our community for your feedback and contributions!

Assets 2

05 May 20:34

prathikr

v1.1.0

598fc19

v1.1.0 Foundry Local

🚀 Foundry Local v1.1.0 Release Notes

We're excited to announce Foundry Local v1.1.0 — packed with new capabilities for on-device AI! This release brings expanded platform support, new model types, and performance improvements across the board.

🆕 What's New

🎯 .NET `netstandard2.0` / `net8.0` Support

The C# SDK now targets both net8.0 and netstandard2.0, broadening compatibility to .NET Framework 4.6.1+, .NET Core 2.0+, Xamarin, Unity, and more. Ship on-device AI to virtually any .NET application!

<PackageReference Include="Microsoft.AI.Foundry.Local" Version="1.1.0" />

📖 C# SDK Documentation

🎙️ Live Audio Transcription

Real-time speech-to-text is here! Stream microphone audio directly to the SDK and receive transcription results as they arrive — no cloud round-trips, no latency. Built on the Nemotron ASR model with an OpenAI Realtime-compatible API surface.

Python

audio_client = model.get_audio_client()
session = audio_client.create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "en"

session.start()

# Push audio
session.append(pcm_bytes)

# Read results (typically on a background thread)
for result in session.get_stream():
    print(result.content[0].text)        # transcribed text
    print(result.is_final)               # True for final results

session.stop()

📖 Python live audio transcription sample

JavaScript

const audioClient = model.createAudioClient();
const session = audioClient.createLiveTranscriptionSession();
session.settings.sampleRate = 16000;
session.settings.channels = 1;
session.settings.language = 'en';

await session.start();

// Push audio
await session.append(pcmBytes);

// Read results
for await (const result of session.getStream()) {
    console.log(result.content[0].text);       // transcribed text
    console.log(result.is_final);              // true for final results
}

await session.stop();

📖 JavaScript live audio transcription sample

var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio
await session.AppendAsync(pcmBytes);

// Read results
await foreach (var result in session.GetStream())
{
    Console.WriteLine(result.Content[0].Text);       // transcribed text
    Console.WriteLine(result.IsFinal);               // true for final results
}

await session.StopAsync();

📖 C# live audio transcription sample

Rust

let audio_client = model.create_audio_client();
let session = audio_client.create_live_transcription_session();
session.start(None).await?;

// Push audio
session.append(&pcm_bytes).await?;

// Read results
let mut stream = session.get_stream().await?;
while let Some(result) = stream.next().await {
    let r = result?;
    if let Some(content) = r.content.first() {
        println!("{}", content.text);       // transcribed text
        println!("{}", r.is_final);         // true for final results
    }
}

session.stop().await?;

📖 Rust live audio transcription sample

📐 Embeddings

Generate text embeddings entirely on-device for semantic search, RAG, clustering, and more. The new qwen3-0.6b-embedding model delivers high-quality vector representations in a compact footprint.

Python

model = manager.catalog.get_model("qwen3-0.6b-embedding")
model.download()
model.load()

client = model.get_embedding_client()

# Single embedding
response = client.generate_embedding("The quick brown fox jumps over the lazy dog")
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

# Batch embeddings
batch_response = client.generate_embeddings([
    "Machine learning is a subset of artificial intelligence",
    "The capital of France is Paris",
    "Rust is a systems programming language",
])

📖 Python embeddings sample

JavaScript

const model = await manager.catalog.getModel('qwen3-0.6b-embedding');
await model.download();
await model.load();

const embeddingClient = model.createEmbeddingClient();

// Single embedding
const response = await embeddingClient.generateEmbedding(
    'The quick brown fox jumps over the lazy dog'
);
console.log(`Dimensions: ${response.data[0].embedding.length}`);

// Batch embeddings
const batchResponse = await embeddingClient.generateEmbeddings([
    'Machine learning is a subset of artificial intelligence',
    'The capital of France is Paris',
    'Rust is a systems programming language'
]);

📖 JavaScript embeddings sample

var model = await catalog.GetModelAsync("qwen3-0.6b-embedding");
await model.DownloadAsync();
await model.LoadAsync();

var embeddingClient = await model.GetEmbeddingClientAsync();

// Single embedding
var response = await embeddingClient.GenerateEmbeddingAsync(
    "The quick brown fox jumps over the lazy dog");
var embedding = response.Data[0].Embedding;
Console.WriteLine($"Dimensions: {embedding.Count}");

// Batch embeddings
var batchResponse = await embeddingClient.GenerateEmbeddingsAsync([
    "Machine learning is a subset of artificial intelligence",
    "The capital of France is Paris",
    "Rust is a systems programming language"
]);

📖 C# embeddings sample

Rust

📖 Rust embeddings sample

👁️ Qwen 3.5 Vision Language Model

Introducing Qwen 3.5 VL — a multimodal vision-language model that runs entirely on-device. Analyze images, understand visual content, and answer questions about what's in a picture — all without sending data to the cloud.

model = manager.catalog.get_model("qwen3.5-vision")
model.download()
model.load()

📦 JavaScript SDK — Koffi Dependency Removed

The JavaScript SDK no longer depends on koffi for native interop. This results in a leaner dependency tree, faster installs, and fewer compatibility issues across platforms and Node.js versions.

✅ Smaller node_modules — no more large native FFI dependency
✅ Fewer platform quirks — prebuilt N-API addon replaces runtime FFI binding
✅ Faster install times — less to download, nothing to compile

📖 JavaScript SDK Documentation

🖥️ WebGPU Plugin Execution Provider

The new WebGPU execution provider is delivered as a plug-in — it's not bundled with the core runtime, keeping your base binary small. When WebGPU acceleration is needed, Foundry Local automatically downloads and registers the EP on the fly, so your users only pay the size cost if their hardware benefits from it.

This approach means:

✅ Smaller default install — the core package stays lean (~20 MB)
✅ On-demand download — the WebGPU EP is fetched and registered at runtime only when needed
✅ Broader GPU coverage — unlocks hardware acceleration through the WebGPU Execution Provider for our cross-platform solution

Please ensure you've run the necessary download and register EPs function to enable WebGPU EP

Python

# Discover available EPs
eps = manager.discover_eps()
for ep in eps:
    print(f"  {ep.name} (registered: {ep.is_registered})")

# Download and register all EPs with progress
current_ep = ""

def on_progress(ep_name: str, percent: float) -> None:
    global current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name}  {percent:5.1f}%", end="", flush=True)

result = manager.download_and_register_eps(progress_callback=on_progress)
print()
print(f"Success: {result.success}, Status: {result.status}")

JavaScript

// Discover available EPs
const eps = await manager.discoverEps();
for (const ep of eps) {
    console.log(`  ${ep.name} (registered: ${ep.isRegistered})`);
}

// Download and register all EPs with progress
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') {
            process.stdout.write('\n');
        }
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName}  ${percent.toFixed(1)}%`);
});
process.stdout.write('\n');

// Discover available EPs
var eps = await mgr.DiscoverEpsAsync();
foreach (var ep in eps)
{
    Console.WriteLine($"  {ep.Name} (registered: {ep.IsRegistered})");
}

// Download and register all EPs with progress
string currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "")
        {
            Console.WriteLine();
        }
        currentEp = epName;
    }
    Console.Write($"\r  {epName}  {percent,6:F1}%");
});
Co...

Assets 2

09 Apr 23:09

baijumeswani

v1.0.0

3504910

v1.0.0 Foundry Local - General Availability

We are excited to announce the General Availability of Foundry Local, a unified on-device AI runtime that brings generative AI directly into your applications. All inference runs locally: user data never leaves the device, responses are instant with zero network latency, and everything works offline. No per-token costs, no backend infrastructure.

SDKs

Foundry Local ships production SDKs for C#, JavaScript, Python, and Rust, each providing a consistent API surface for model management, chat completions, audio transcription, and tool calling.

	SDK	Package
	C#	`Microsoft.AI.Foundry.Local`
	JavaScript	`foundry-local-sdk`
🐍	Python	`foundry-local-sdk`
🦀	Rust	`foundry-local-sdk`

WinML Variants

Each SDK also ships a WinML variant that unlocks more GPU and NPU devices on Windows, available through the Windows ML execution provider catalog.

	SDK	Package
	C#	`Microsoft.AI.Foundry.Local.WinML`
	JavaScript	`foundry-local-sdk-winml`
🐍	Python	`foundry-local-sdk-winml`
🦀	Rust	`foundry-local-sdk` with `winml` feature flag

Platform Support

	OS	Architectures
	Windows	x64, ARM64
	macOS	ARM64
	Linux	x64

What You Can Build

Chat Completions

Full OpenAI-compatible chat completions API with multi-turn conversations, and configurable inference parameters (temperature, top-k, top-p, max tokens, frequency/presence penalty, random seed).

Audio Transcription

On-device speech-to-text. Transcribe audio files with language selection and temperature control.

Embedded Web Server

Start an OpenAI-compatible HTTP server from your application with a single call. Useful for multi-process architectures or bridging to tools that speak the OpenAI REST protocol.

Hardware Acceleration

Powered by ONNX Runtime, Foundry Local automatically detects available hardware and selects the best execution provider, with zero hardware detection code needed in your application.

Supported execution providers:

Execution Provider	Hardware	Platform
CPU	Universal fallback	All platforms
WebGPU	GPU acceleration	Windows x64, macOS arm64
CUDA	NVIDIA GPUs	Windows x64, Linux x64
OpenVINO	Intel GPUs and NPUs	Windows x64
QNN	Qualcomm NPUs	Windows ARM64
TensorRT RTX	NVIDIA GPUs	Windows x64
VitisAI	AMD NPUs	Windows x64

Execution providers can be discovered, downloaded, and registered at runtime through the SDK's discoverEps() and downloadAndRegisterEps() APIs, with per-provider progress callbacks.

Model Catalog & Management

Foundry Local includes a built-in model catalog with popular open-source models, optimized with state-of-the-art quantization and compression for on-device performance.

Model management features:

Browse & search the catalog programmatically
Multi-variant models - each alias maps to multiple variants optimized for different hardware (CPU, GPU, NPU)
Automatic variant selection - the SDK picks the best variant based on what's cached and what hardware is available, with manual override via selectVariant()
Download with progress tracking - real-time percentage callbacks
Load / unload lifecycle - explicit control over which models are in memory
Version management - query the catalog for the latest version of any model

Get Started

Documentation and Samples

Language	Cross-platform	Windows ML
JavaScript	`npm install foundry-local-sdk`	`npm install foundry-local-sdk-winml`
C#	`dotnet add package Microsoft.AI.Foundry.Local`	`dotnet add package Microsoft.AI.Foundry.Local.WinML`
Python	`pip install foundry-local-sdk`	`pip install foundry-local-sdk-winml`
Rust	`cargo add foundry-local-sdk`	`cargo add foundry-local-sdk --features winml`

Assets 2

22 Jan 01:26

natke

v0.8.119

449bd19

Foundry Local Release 0.8.119 Pre-release

Pre-release

Foundry Local 0.8.119 Release Notes 🚀

This release is an incremental build, targeting tool calling scenarios.

🐛 Bug fixes

#373 Function specs without parameters cause server error
#372 Tools not indexed in streaming mode

Assets 6

23 Dec 18:58

natke

v0.8.117

495b266

Foundry Local Release 0.8.117 Pre-release

Pre-release

Foundry Local 0.8.117 Release Notes 🚀

This release is an incremental build, targeting tool calling scenarios.

🐛 Bug fixes

#346 Tool calling doesn't return tool_calls results in streaming mode
#341 Exception when network is disconnected

📝 Known issues

#363 Tool calling fails on NVIDIA GPUs.

Assets 6

12 Dec 22:53

natke

v0.8.115

fdcce52

Foundry Local Release 0.8.115 Pre-release

Pre-release

Foundry Local Release Notes: v0.8.115 🚀

This release is an incremental build targeting tool calling scenarios.

🐛 Bug fixes

#335 Guidance error when tool_choice=required
#336 Foundry Local enforcing "required" field of function parameters

📝 Known issues

#346 Tool calling doesn't return tool_calls results in streaming mode

Assets 6

26 Nov 20:00

natke

v0.8.113

111b6cd

Foundry Local Release 0.8.113 Pre-release

Pre-release

Foundry Local Release Notes: v0.8.113 🚀

✨ New Features

Add support for tool calling. Models that support tool calling have the supportsToolCalling tag, which is also exposed via the SDKs.

🐛 Bug fixes

Fix crash on context length exhaustion. CLI now exits when context length is exhausted and the REST API returns an error if the request requires more tokens than max_length configuration allows.

📝 Known issues

This release only allows one tool call per request.

Assets 6

12 Nov 18:32

natke

v0.8.103

5366b90

Foundry Local Release 0.8.103 Pre-release

Pre-release

Foundry Local Release Notes: v0.8.103 🚀

🔨 Filter out automatic speech recognition models from foundry model list

These models can be listed using the /foundry/list endpoint and run using the standalone SDK

⭐ Sign Up for Foundry Local SDK vNext Private Preview – Fill in form ⭐

Assets 6

Releases: microsoft/Foundry-Local

v1.2.1 Foundry Local

🛠️ Foundry Local v1.2.1 Release Notes

🆕 What's New

🔄 BYOM cache discovery — no service restart required

🐛 Fixed

⚡ Improved

🌐 Regional fallback for Azure catalog and registry requests

🗣️ Multilingual ASR language coverage

📦 Runtime & Dependency Updates

📚 Resources

💙 Thank You

Uh oh!

Foundry Local CLI 0.10.0 (Preview)

What it's for

Install

Quick start

Coming from the older Foundry Local CLI?

Links

Feedback

Uh oh!

v1.2.0 Foundry Local

🚀 Foundry Local v1.2.0 Release Notes

🆕 What's New

🛑 Cancellable model and EP downloads

🧰 SDK & API

🖥️ Platform & Runtime

🐛 Fixed

⚡ Improved

🔎 Observability & Debugging

🧱 Reliability

📦 Runtime & Dependency Updates

📚 Resources

💙 Thank You

Uh oh!

v1.1.0 Foundry Local

🚀 Foundry Local v1.1.0 Release Notes

🆕 What's New

🎯 .NET netstandard2.0 / net8.0 Support

🎙️ Live Audio Transcription

📐 Embeddings

👁️ Qwen 3.5 Vision Language Model

📦 JavaScript SDK — Koffi Dependency Removed

🖥️ WebGPU Plugin Execution Provider

Uh oh!

v1.0.0 Foundry Local - General Availability

SDKs

WinML Variants

Platform Support

What You Can Build

Chat Completions

Audio Transcription

Embedded Web Server

Hardware Acceleration

Model Catalog & Management

Get Started

Uh oh!

Foundry Local Release 0.8.119

Foundry Local 0.8.119 Release Notes 🚀

Uh oh!

Foundry Local Release 0.8.117

Foundry Local 0.8.117 Release Notes 🚀

Uh oh!

Foundry Local Release 0.8.115

Foundry Local Release Notes: v0.8.115 🚀

Uh oh!

Foundry Local Release 0.8.113

Uh oh!

Foundry Local Release 0.8.103

Foundry Local Release Notes: v0.8.103 🚀

Uh oh!

🎯 .NET `netstandard2.0` / `net8.0` Support