    Repositories list

    • ScaleLLM

      Public
      A high-performance inference system for large language models, designed for production environments.
      C++
      Apache License 2.0
      40497488 Updated Dec 19, 2025
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      Other
      1.8k000 Updated Nov 6, 2025
    • nixl

      Public
      NVIDIA Inference Xfer Library (NIXL)
      C++
      Apache License 2.0
      292000 Updated Nov 4, 2025
    • dynamo

      Public
      A Datacenter Scale Distributed Inference Serving Framework
      Rust
      Apache License 2.0
      1k000 Updated Nov 4, 2025
    • whl

      Public
      Repository to host Python whl packages.
      HTML
      0000 Updated Sep 13, 2025
    • flux

      Public
      A fast communication-overlapping library for tensor/expert parallelism on GPUs.
      C++
      Apache License 2.0
      100000 Updated Apr 15, 2025
    • 3FS

      Public
      A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
      C++
      MIT License
      1k000 Updated Feb 28, 2025
    • FlashInfer: Kernel Library for LLM Serving
      Cuda
      Apache License 2.0
      903000 Updated Feb 27, 2025
    • FlashMLA

      Public
      C++
      MIT License
      1k000 Updated Feb 26, 2025
    • 0000 Updated Jun 5, 2024
    • Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      2.6k000 Updated Oct 15, 2023
    • 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
      Rust
      Apache License 2.0
      1.1k000 Updated Aug 4, 2023
    • xformers

      Public
      Hackable and optimized Transformers building blocks, supporting a composable construction.
      Python
      Other
      772000 Updated Aug 1, 2023
    • Transformer related optimization, including BERT, GPT
      C++
      Apache License 2.0
      935000 Updated Jul 28, 2023
    • Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
      C++
      Apache License 2.0
      37000 Updated Jul 24, 2023