Vocab_size value for facebook/w2v-bert-2.0

jcsilva · November 13, 2024, 11:15am

Hi,

I was trying to use AutoModelForCTC.from_pretrained("facebook/w2v-bert-2.0") to load the w2v-bert model, but I always get the error:

File "/home/jcsilva/huggingsound/.venv/lib/python3.11/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 1185, in __init__
    raise ValueError(
ValueError: You are trying to instantiate <class 'transformers.models.wav2vec2_bert.modeling_wav2vec2_bert.Wav2Vec2BertForCTC'> with a configuration that does not define the vocabulary size of the language model head. Please instantiate the model as follows: Wav2Vec2BertForCTC.from_pretrained(..., vocab_size=vocab_size). or define vocab_size of your model's configuration.

Investigating the issue, I saw two possible causes:

1. The vocab_size parameter in the model's config file (config.json · facebook/w2v-bert-2.0 at main) is set to null. @reach-vb or @ylacombe, would it be possible to remove this parameter from the config file? If not, what do you think about setting it to a valid value (e.g., 32, as in config.json · facebook/wav2vec2-large-xlsr-53 at main)?
2. The default vocab_size for the W2VBert model is None (please see it here), but it is 32 for Wav2Vec2 models, as you can see here. Could both defaults be 32? That way, the ValueError exception mentioned above would not be raised when using AutoModelForCTC.
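
For reference, the workaround suggested by the error message itself can be sketched as follows. This is a minimal sketch, not a fix for the config file: the helper names missing_vocab_size and load_ctc_model are hypothetical, and it assumes that keyword arguments passed to from_pretrained override the matching fields of the checkpoint's config.json. The transformers import is done lazily so that the config check can be run on its own.

```python
import json


def missing_vocab_size(config_text: str) -> bool:
    """Return True if a config.json payload has no usable vocab_size.

    Hypothetical helper: it only inspects the JSON text that it is given.
    """
    config = json.loads(config_text)
    # JSON null is parsed as None; a missing key is treated the same way here.
    return config.get("vocab_size") is None


def load_ctc_model(model_id: str, vocab_size: int):
    """Load a CTC model while overriding vocab_size, as the error suggests.

    Hypothetical wrapper around AutoModelForCTC.from_pretrained.
    """
    # Imported lazily so missing_vocab_size works without transformers.
    from transformers import AutoModelForCTC

    # Assumption: the vocab_size keyword replaces the null value stored in
    # the checkpoint's config before the model is instantiated.
    return AutoModelForCTC.from_pretrained(model_id, vocab_size=vocab_size)
```

With this in place, load_ctc_model("facebook/w2v-bert-2.0", vocab_size=n) should get past the ValueError, where n is ideally the size of your own CTC tokenizer vocabulary rather than an arbitrary value such as 32. Note that the CTC head created this way is newly initialized, so the model still needs fine-tuning before it produces useful transcriptions.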

Thank you
