>>115995
Got some numbers for each model. My initial impressions from checking the output as it ran: Tiny seemed basically unusable, Base was okay, Small good, Medium better, and Large near-perfect, pretty much as you'd expect. Another thing I found interesting: there was no difference in VRAM usage between transcription and translation, but time taken varied a lot. Also, the VRAM figures for Whisper, at least the ones from Whisper.cpp, seem to be off pretty widely. That, or maybe it uses more VRAM if it's available? I'm not sure. Either way, my peak VRAM usage was ~11GB, NOT the 4.7GB Whisper.cpp claims.
Tiny:
Transcribe:
VRAM: 1795MB
Time: 4 minutes 3 seconds
Seconds-per-Minute: 2.5
Translate:
VRAM: 1804MB
Time: 6 minutes
Seconds-per-Minute: 3.7
Base:
Transcribe:
VRAM: 3223MB
Time: 5 minutes 38 seconds
Seconds-per-Minute: 3.5
Translate:
VRAM: 3192MB
Time: 3 minutes 47 seconds
Seconds-per-Minute: 2.3
Small:
Transcribe:
VRAM: 3157MB
Time: 4 minutes 26 seconds
Seconds-per-Minute: 2.7
Translate:
VRAM: 3172MB
Time: 4 minutes 42 seconds
Seconds-per-Minute: 2.9
Medium:
Transcribe:
VRAM: 6177MB
Time: 9 minutes 15 seconds
Seconds-per-Minute: 5.7
Translate:
VRAM: 6153MB
Time: N/A (forgot to write down the end time, whoops)
Seconds-per-Minute: N/A
Large:
Transcribe:
VRAM: 11251MB
Time: 11 minutes 52 seconds
Seconds-per-Minute: 7.37
Translate:
VRAM: 11242MB
Time: 8 minutes 48 seconds
Seconds-per-Minute: 5.4
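
If anyone wants to sanity-check the seconds-per-minute figures, here's a quick sketch. The length of the audio isn't stated above, so the script back-calculates it from the Large transcribe run (an assumption on my part): 11m52s total at 7.37 s/min implies roughly 96-97 minutes of audio, and the other runs line up with that.

```python
# Sanity check on the seconds-per-minute numbers above.
# NOTE: the audio length is inferred, not given - it's back-calculated
# from the Large transcribe run (712 s total / 7.37 s-per-min ~= 96.6 min).

def secs_per_min(total_seconds: float, audio_minutes: float) -> float:
    """Processing seconds spent per minute of input audio."""
    return total_seconds / audio_minutes

# Inferred audio length (assumption): ~96.6 minutes
audio_minutes = (11 * 60 + 52) / 7.37

runs = {
    "Tiny transcribe":  4 * 60 + 3,   # 4m3s
    "Tiny translate":   6 * 60,       # 6m
    "Base transcribe":  5 * 60 + 38,  # 5m38s
    "Large transcribe": 11 * 60 + 52, # 11m52s
}

for name, secs in runs.items():
    print(f"{name}: {secs_per_min(secs, audio_minutes):.1f} s per audio minute")
```

Rounded to one decimal, those come back out as 2.5, 3.7, 3.5, and 7.4, which matches the list above (Large shows as 7.4 instead of 7.37 only because of the rounding).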