A few things I’d love community input on:
- Has anyone reproduced the energy regression of LLM.int8() with default settings on Hopper (H100/H200)? My data only goes as far as Blackwell (RTX 5090), with no Hopper coverage.
- Any GPTQ / AWQ / GGUF k-quant numbers with measured power (NVML for the GPU, RAPL for the CPU) rather than estimates? My benchmark only covers bitsandbytes NF4 / INT8.
- Apple Silicon / Jetson: unified memory likely changes the dequantization story; I have no numbers there.
Scripts + raw CSVs: hongping-zh/ecocompute-ai on GitHub (RTX 5090 energy benchmark suite for LLMs; real NVML power data, not estimates)
Dataset DOI: EcoCompute: Energy Efficiency Benchmark for Quantized Language Models
Happy to add submitted hardware rows to the public dataset with attribution.
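
If you want to contribute numbers but don't have a harness handy, here is a minimal sketch of one way to turn NVML power readings into an energy figure. It is not the script from the repo: the names `energy_joules` and `measure_gpu_energy`, the 50 ms polling interval, and the trapezoidal integration are all my assumptions for illustration. It needs the `pynvml` bindings only when you actually call the sampler.

```python
import threading
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def energy_joules(samples: List[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule (watts * seconds = joules)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def measure_gpu_energy(workload: Callable[[], None],
                       gpu_index: int = 0,
                       interval_s: float = 0.05) -> float:
    """Poll NVML board power in a background thread while `workload` runs.

    Hypothetical helper for illustration; the polling interval is an assumption.
    """
    import pynvml  # imported lazily so energy_joules works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        samples: List[Sample] = []
        stop = threading.Event()

        def poll() -> None:
            while not stop.is_set():
                # nvmlDeviceGetPowerUsage reports milliwatts
                watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
                samples.append((time.monotonic(), watts))
                stop.wait(interval_s)

        thread = threading.Thread(target=poll, daemon=True)
        thread.start()
        try:
            workload()
        finally:
            stop.set()
            thread.join()
        return energy_joules(samples)
    finally:
        pynvml.nvmlShutdown()
```

The workload you pass in should block until the GPU work is actually finished (for example by synchronizing the device before returning), otherwise the sampling window ends too early. Dividing the result by the number of generated tokens gives a joules-per-token figure that is easy to compare across hardware.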