r/LocalLLaMA · noneabove1182 · 11mo ago

TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B

https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/docs/source/blogs/H200launch.md

The H200 is up to 1.9x faster than the H100 on this workload; the gain comes from the H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform
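A quick sanity check on the numbers in the post: if the H200 hits 11,819 tokens/s and is "up to 1.9x" faster than the H100, the implied H100 baseline works out to roughly 6,200 tokens/s. Note this baseline is an inference from the two figures above, not a number NVIDIA published:

```python
# Back-of-the-envelope check using the figures from the post.
h200_tps = 11_819          # Llama2-13B tokens/s on H200 (TensorRT-LLM blog)
speedup = 1.9              # "up to 1.9x faster than H100" (NVIDIA press release)

# Implied H100 throughput -- an inference, not an official benchmark result.
implied_h100_tps = h200_tps / speedup
print(f"Implied H100 baseline: ~{implied_h100_tps:,.0f} tokens/s")
```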
