r/LocalLLaMA · noneabove1182 · 11mo ago

TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B

https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/docs/source/blogs/H200launch.md

The H200 is up to 1.9x faster than the H100 on this workload; the gain comes from the H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform
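A quick sanity check on the numbers in the post: if the H200 hits 11,819 tokens/s and is "up to 1.9x" faster than the H100, the implied H100 baseline works out to roughly 6,200 tokens/s. Note this baseline is an inference from the two figures above, not a number NVIDIA published:

```python
# Back-of-the-envelope check using the figures from the post.
h200_tps = 11_819          # Llama2-13B tokens/s on H200 (TensorRT-LLM blog)
speedup = 1.9              # "up to 1.9x faster than H100" (NVIDIA press release)

# Implied H100 throughput -- an inference, not an official benchmark result.
implied_h100_tps = h200_tps / speedup
print(f"Implied H100 baseline: ~{implied_h100_tps:,.0f} tokens/s")
```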
