Getting quota exceeded even though requested seconds is less than what's left

error.jpg

The screenshot says it all. This never happened to me before the recent major changes to ZeroGPU (switching the backing hardware, etc.).

Hi @jc28735250 , thanks for reporting this! It might be related to the recent ZeroGPU backend migration. I’ll check with the team.

Further observations suggest that ZeroGPU is counting expended quota twice: once for the requested seconds and once for the actual seconds consumed. I checked this by looking at the dashboard “Billing” tab.

Currently the progress bar in the dashboard seems to first add the requested seconds to the progress, then reduce the progress to what was actually used once the job completes. I tested on my Space, which requested 35 seconds and actually used 26. The progress bar first went to 1.2 and then dropped to 0.8. So it seems to me that everything is counted twice.

The quota exceeded bug in the first post may be related as well. It’s possible that the quota exceeded check computes the remaining quota from the requested seconds of the last job instead of the actual seconds. Then, when the popup is shown, the remaining quota is computed from the actual seconds (a lower usage value), so the math looks wrong: the requested seconds of the new job are lower than the quota remaining, yet the request is rejected.

Also, if you try to do a generation and it fails because it can’t get a GPU, you still lose about half the quota time, which is super frustrating since it fails to get GPUs frequently. It’s probably related to the same bug: it adds twice the quota time, then refunds only the correct quota time on a failure, so you don’t get it all back.

@jc28735250
Thanks for the detailed reports, and sorry for the confusing behavior. Let me sort out what is going on, since a few different things are bundled together here.

1. The “2x consumption” you’re observing (this isn’t a bug)

What is most likely happening here is the new auto-fallback to xlarge. As part of the recent hardware migration, the GPUs backing ZeroGPU were changed (updated details are in the docs: Spaces ZeroGPU: Dynamic GPU Allocation for Spaces · Hugging Face). The per-GPU memory on the new hardware is smaller than before, so Spaces that no longer fit in large are automatically run on xlarge. Per the docs, xlarge has a 2x quota cost, but it also gives you 2x the GPU resources, so the higher quota cost generally corresponds to a faster wall-clock time per call. The exact speedup depends on the workload (compute-bound vs memory-bandwidth bound, whether the workload can fully utilize the larger GPU, etc.), so it is not always a clean 2x, but the extra quota is not pure overhead either.

This auto-fallback is not currently surfaced in the UI, which is the main reason it looks like everything is suddenly being counted twice. The progress bar going up to the reserved amount during a call and then settling to the actual usage afterward is the normal reserve-and-settle behavior; what changed is that both the reserved and settled values are now 2x compared to when the Space was running on large.
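To make the reserve-and-settle arithmetic concrete, here is a minimal sketch using the numbers from the report above (35 seconds requested, 26 actually used). It is only an illustration, not the actual backend logic: the `QuotaLedger` class, its method names, and the `SIZE_MULTIPLIER` table are hypothetical, and the only value taken from this thread is the 2x quota cost for xlarge.

```python
from dataclasses import dataclass

# Hypothetical multiplier table. Only the 2x cost for xlarge comes from the
# thread; treating large as 1x is an assumption made for the illustration.
SIZE_MULTIPLIER = {"large": 1.0, "xlarge": 2.0}


@dataclass
class QuotaLedger:
    """Toy model of a quota counter that reserves first and settles later."""

    used_seconds: float = 0.0

    def reserve(self, duration: float, size: str) -> float:
        # When the call starts, the full requested duration is charged,
        # scaled by the multiplier of the size the Space actually runs on.
        cost = duration * SIZE_MULTIPLIER[size]
        self.used_seconds += cost
        return cost

    def settle(self, reserved: float, actual_seconds: float, size: str) -> float:
        # When the call ends, the reservation is replaced by the actual usage.
        actual_cost = actual_seconds * SIZE_MULTIPLIER[size]
        self.used_seconds += actual_cost - reserved
        return actual_cost


ledger = QuotaLedger()
reserved = ledger.reserve(35, "xlarge")
during_call = ledger.used_seconds   # quota shown while the call is running
ledger.settle(reserved, 26, "xlarge")
after_call = ledger.used_seconds    # quota shown once the call has settled

print(during_call, after_call)
```

Under these assumptions the counter shows 70 quota-seconds during the call and 52 afterward, whereas the same call on large would show 35 and then 26. Nothing is charged twice; both the reserved and the settled values are simply scaled by the same factor.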

2. The “quota exceeded” popup (this part is a real bug, on the display side)

The popup was showing the value of your @spaces.GPU(duration=...) argument as the “requested” number, instead of the duration the backend was actually reserving for the call. For an xlarge-promoted call, the actual reservation is ~2x the displayed duration value, which is why “90s requested vs. 129s left” still triggered the exceeded message.

The fix is already in internally and will go out with the next deploy. After the fix, the popup will display the actual backend reservation, which can be larger than the duration= value in your code.
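As a rough illustration of why the popup looked inconsistent, the sketch below compares the displayed value with the value actually checked against the remaining quota. The function name, its fields, and the exact 2.0 multiplier are assumptions for the example (the reservation is only described as roughly 2x above); the 90 and 129 second figures are the ones from the screenshot.

```python
def quota_check(duration: float, remaining: float, multiplier: float) -> dict:
    """Hypothetical model of the quota check for a single call.

    duration   -- the duration= value passed to the decorator, in seconds
    remaining  -- quota seconds left for the user
    multiplier -- quota cost factor of the size the call runs on (assumed)
    """
    reservation = duration * multiplier
    return {
        # Before the fix the popup showed the raw duration= value.
        "displayed_before_fix": duration,
        # After the fix it shows what the backend actually reserves.
        "displayed_after_fix": reservation,
        # The check itself compares the reservation, not the raw duration.
        "exceeded": reservation > remaining,
    }


result = quota_check(duration=90, remaining=129, multiplier=2.0)
print(result)
```

With these inputs the reservation is 180 seconds, which exceeds the 129 seconds left even though the displayed 90 seconds does not. The check was consistent; only the number shown in the popup was misleading.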

3. On overall quota

As part of this migration, the per-user ZeroGPU daily quota was also increased to help offset the hardware change. The updated values are in the docs linked above. For the before-and-after of the migration (hardware, VRAM, and quota numbers), the diff is in this PR: [ZeroGPU] Blackwell update by cbensimon · Pull Request #2474 · huggingface/hub-docs · GitHub.

@Terotrous Thanks for the report. We’ll check this one separately on our side.