lzyrapx/LLM-Grandmaster-Notes


LLM Grandmaster Notes

📚The path to LLM mastery is paved with broken embeddings and resurrected gradients.

  • base
    • transformer
    • vision transformer (ViT)
    • lm head
    • kv cache
    • GPU Architecture
      • SM80
      • SM90
      • SM100
      • SM120
  • attention
    • self attention
    • online attention
    • flash attention
    • flash attention 2
    • flash attention 3
    • flash decoding
    • flash decoding++
    • scaled dot-product attention (SDPA)
    • multi-head self-attention (MHSA)
    • multi-head attention (MHA)
    • grouped-query attention (GQA)
    • multi-query attention (MQA)
    • multi-head latent attention (MLA)
    • multi-token attention (MTA)
    • sage attention 1
    • sage attention 2
    • sage attention 2++
    • sage attention 3
    • paged attention
    • ring attention
    • ring flash attention
    • linear attention
    • lightning attention
    • native sparse attention (NSA)
    • grouped latent attention (GLA)
    • grouped-tied attention (GTA)
  • softmax
    • softmax
    • safe softmax
    • online softmax
  • kv cache optimization
    • sparse
    • quantization
    • allocator
    • window
    • share
  • norm
    • Batch Norm
    • Layer Norm
    • RMS Norm
  • position embedding
    • RoPE
    • ALiBi
    • 2D RoPE
    • 3D RoPE
    • NTK-Aware RoPE
    • YaRN
  • quantization
    • SmoothQuant
    • AWQ
    • KIVI
    • GPTQ
  • design
    • chunked prefill
    • continuous batching
    • speculative decoding
      • Medusa
      • Lookahead decoding
      • NGram
      • OSD
      • EAGLE 1, 2, 3
    • sliding window
    • multi-token prediction (MTP)
  • reinforcement learning
    • PPO
    • GRPO
    • DAPO
    • GPG
  • gemm
    • deep gemm
    • cutlass
      • cooperative and ping-pong gemm scheduler
    • cublas
  • open source
    • flash mla
  • ptx instructions
    • mbarrier
    • cp.async
    • ldmatrix
    • mma
    • wgmma

About

🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients. (CHINESE NOTES)
