How to get probabilities per label in finetuning classification task?

emmakelo · November 26, 2021, 11:32am

Hello, I followed the Hugging Face website to fine-tune FlauBERT for a classification task. What I would like to know is how to get probabilities for the classification, something like [0.75, 0.85, 0.25], because I have 3 classes. So far, when printing the results I get the output below, but it seems to correspond to the logits and not the probabilities? Furthermore, they contain negative numbers. I thought probabilities were positive numbers in the range [0, 1].

```
PredictionOutput(predictions=array([[ 0.53947556,  0.42591393, -0.8021714 ],
       [ 1.6963196 , -3.3902004 ,  1.8755357 ],
       [ 1.9264233 , -0.35482746, -2.339029  ],
       ...,
       [ 2.8833866 , -1.1608589 , -1.2109699 ],
       [ 1.1803235 , -1.4036949 ,  0.48559391],
       [ 1.9253297 , -1.0417538 , -1.2987505 ]], dtype=float32), label_ids=array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 1, 0, 2, 0, 0, 2, 0, 0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0,
       0, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 2, 0, 2, 1, 2, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 0, 2, 0, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 0, 0, 1, 1, 2, 1, 1, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 2, 0,
       2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 2, 0,
       2, 2, 0, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 1, 2, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 2, 1, 1, 1, 0, 2, 0, 0, 0, 2, 2, 0,
       1, 1, 1, 1, 1, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 1, 2, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 2, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 1, 1, 0,
       1, 0, 0, 1, 0, 1, 2, 2, 0, 1, 1, 0, 2, 1, 0, 0, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 0, 0, 2, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0,
       1, 0, 2, 2, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 2, 0, 0,
       0, 0, 1, 2, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 2, 0, 1, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 2, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 0, 0, 1, 0, 2, 0, 0,
       1, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 1, 0, 0, 2,
       0, 1, 0, 1, 2, 0, 0, 1, 0, 0]), metrics={'test_loss': 1.164217233657837, 'test_accuracy': 0.565028901734104, 'test_f1_mi': 0.565028901734104, 'test_f1_ma': 0.42953547487160565, 'test_runtime': 1.4322, 'test_samples_per_second': 483.16, 'test_steps_per_second': 7.68})

```


The code that produces these results is adapted from the notebook for fine-tuning on a classification task :slight_smile:
```


import os

import torch
from transformers import FlaubertModel, Trainer, TrainingArguments

PRE_TRAINED_MODEL_NAME = '/gpfswork/rech/kpf/umg16uw/expe_5/model/sm'

class FlauBertForSequenceClassification(FlaubertModel):
	"""
	FlauBert Model for Classification Tasks.

	"""
	def __init__(self, config, num_labels, freeze_encoder=False):

		"""
		@param    config: the FlauBERT configuration passed to the parent FlaubertModel
		@param    num_labels (int): number of target classes
		@param    freeze_encoder (bool): set `True` to freeze the encoder, `False` to fine-tune it
		"""

		# instantiate the parent class FlaubertModel
		super().__init__(config)
		
		# store the number of labels (the classifier's input size comes from config.emb_dim)
		self.num_labels = num_labels
		
		# instantiate and load a pretrained FlaubertModel 
		self.encoder = FlaubertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
		

		
		# freeze the encoder parameters if required
		if freeze_encoder:
			for param in self.encoder.parameters():
				param.requires_grad = False

		# the classifier: a feed-forward head on top of the encoder output
		self.classifier = torch.nn.Sequential(
			torch.nn.Linear(in_features=config.emb_dim, out_features=512),
			torch.nn.Tanh(),  # or nn.ReLU()
			torch.nn.Dropout(p=0.1),
			torch.nn.Linear(in_features=512, out_features=self.num_labels, bias=True),
		)
		# instantiate a dropout function for the classifier's input
		self.dropout = torch.nn.Dropout(p=0.1)


	def forward(
		self,
		input_ids=None,
		attention_mask=None,
		head_mask=None,
		inputs_embeds=None,
		labels=None,
		output_attentions=None,
		output_hidden_states=None,
	):
		# encode a batch of sequences
		encoder_output = self.encoder(
			input_ids=input_ids,
			attention_mask=attention_mask,
			head_mask=head_mask,
			inputs_embeds=inputs_embeds,
			output_attentions=output_attentions,
			output_hidden_states=output_hidden_states,
		)
		# extract the hidden representations from the encoder output
		hidden_state = encoder_output[0]  # (bs, seq_len, dim)
		pooled_output = hidden_state[:, 0]  # (bs, dim)
		# apply dropout
		pooled_output = self.dropout(pooled_output)  # (bs, dim)
		# feed into the classifier
		logits = self.classifier(pooled_output)  # (bs, num_labels)

		outputs = (logits,) + encoder_output[1:]
		
		if labels is not None:
			# multi-class classification: cross-entropy loss over the logits
			loss_fct = torch.nn.CrossEntropyLoss()
			loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
			outputs = (loss,) + outputs

		return outputs  # (loss), logits, (hidden_states), (attentions)

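from transformers import FlaubertConfig

# load the configuration that matches the pretrained checkpoint
config = FlaubertConfig.from_pretrained(PRE_TRAINED_MODEL_NAME)
model = FlauBertForSequenceClassification(
    config=config, num_labels=3, freeze_encoder=False
)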


training_args = TrainingArguments(
    output_dir='/gpfswork/rech/kpf/umg16uw/results_hf/sm',
    logging_dir='/gpfswork/rech/kpf/umg16uw/logs/sm',
    do_train=True,
    do_eval=True,
    evaluation_strategy="steps",
    logging_first_step=True,
    logging_steps=10,
    num_train_epochs=3.0,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=process_and_tokenize_file(X_train, y_train),
    eval_dataset=process_and_tokenize_file(X_val, y_val),
    compute_metrics=compute_metrics,
)

# Fine-tune the pre-trained model
print("Start training...\n")

train_results = trainer.train()
val_results = trainer.evaluate()


for root, subdirs, files in os.walk(test_dir):
    for f in files:
        path_file = os.path.join(root, f)
        # avoid shadowing the built-in `input`
        texts, labels = input_file(path_file)
        test_dataset = process_and_tokenize_file(texts, labels)
        test_results = trainer.predict(test_dataset)
        print(test_results)  # gives the results above

```

nielsr · November 26, 2021, 1:36pm

Hi,

The raw outputs of models in the Transformers library are called logits (they are called predictions in your case). These are the unnormalized scores for each class, for every example in a batch. You can turn them into probabilities by applying a softmax operation on the last dimension, like so:

import tensorflow as tf

probabilities = tf.math.softmax(predictions, axis=-1)
print(probabilities)
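
For reference, the softmax operation mentioned above can be written out explicitly. The notation here is an assumption added for illustration: $z_1, \dots, z_K$ are the logits of one example and $K = 3$ is the number of classes. The worked numbers use the first row of the `predictions` array posted above, rounded to three decimals.

$$
p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K
$$

$$
z = (0.539,\ 0.426,\ -0.802) \;\Rightarrow\; p \approx \left(\frac{1.715}{3.694},\ \frac{1.531}{3.694},\ \frac{0.448}{3.694}\right) \approx (0.464,\ 0.414,\ 0.121)
$$

Each $p_k$ lies strictly between 0 and 1 and the values in a row sum to 1 (up to rounding), even when some logits are negative.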

emmakelo · November 26, 2021, 3:44pm

Hi, thank you

But is it possible to use TensorFlow with PyTorch? I used PyTorch for the fine-tuning task.

Can I adapt the code you proposed to PyTorch?

nielsr · November 26, 2021, 4:00pm

Oh apologies, I’m also using PyTorch; for some reason I thought you were using TensorFlow.

In PyTorch, it can be done as follows:

from torch import nn

probabilities = nn.functional.softmax(predictions, dim=-1)
print(probabilities)

marlon89 · February 16, 2022, 1:16pm

Hey,

I have nearly the same output as emmakelo: a predictions array that includes the logits and the labels. When I use your PyTorch function I get the following error: `AttributeError: 'numpy.ndarray' object has no attribute 'softmax'`. Do you know how to tackle this? Thanks in advance
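
The error suggests that the array being passed is a NumPy array rather than a tensor. One way around it, sketched here without PyTorch, is to compute the softmax directly on the NumPy array. The helper name `softmax_np` is hypothetical, and the sample values are simply the first three rows of the output posted at the top of the thread.

```python
import numpy as np


def softmax_np(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the last axis of a NumPy array (hypothetical helper)."""
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# First three rows of the `predictions` array posted at the top of the thread.
predictions = np.array(
    [
        [0.53947556, 0.42591393, -0.8021714],
        [1.6963196, -3.3902004, 1.8755357],
        [1.9264233, -0.35482746, -2.339029],
    ],
    dtype=np.float32,
)

probabilities = softmax_np(predictions)
print(probabilities)               # one row per example, one column per class
print(probabilities.sum(axis=-1))  # each row sums to 1, up to float rounding
```

Softmax preserves the ordering of the scores, so the most probable class is the same as the class with the largest logit.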

emmakelo · February 18, 2022, 4:20pm

You have to convert the predictions to a tensor first and then apply the softmax function:

    # print(test_results)
    labels_id = test_results.label_ids
    # print("Original---- :", labels_id)
    # the predictions are a NumPy array, so convert them to a tensor first
    y_pred_pt = torch.from_numpy(test_results.predictions)
    probs = nn.functional.softmax(y_pred_pt, dim=-1)  # the predictions are the logits returned by self.classifier
    # print(probs)
    # Get the max probability and the index of the highest prediction for each output.
    max_probabilities, max_indices = torch.max(probs, dim=1)
    print(type(max_probabilities))
    predicted_labels = test_results.predictions.argmax(-1)
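
The steps above can be collected into a small self-contained sketch. The function name `logits_to_probabilities` and the `sample_logits` array are hypothetical stand-ins (the values are the first three rows posted in the question, used in place of `test_results.predictions`). The tuple handling is an assumption about what `trainer.predict` returns when the model's forward pass yields extra outputs such as hidden states.

```python
import numpy as np
import torch


def logits_to_probabilities(predictions):
    """Turn prediction logits into a (num_examples, num_labels) tensor of probabilities."""
    # Assumption: if the model returned extra outputs, `predictions` is a tuple
    # whose first element holds the logits.
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    logits = torch.from_numpy(np.asarray(predictions))
    return torch.nn.functional.softmax(logits, dim=-1)


# Stand-in for `test_results.predictions` (first three rows from the question).
sample_logits = np.array(
    [
        [0.53947556, 0.42591393, -0.8021714],
        [1.6963196, -3.3902004, 1.8755357],
        [1.9264233, -0.35482746, -2.339029],
    ],
    dtype=np.float32,
)

probs = logits_to_probabilities(sample_logits)
# Highest probability and its class index for each example.
max_probabilities, predicted_labels = torch.max(probs, dim=-1)
print(probs)
print(max_probabilities)
print(predicted_labels)
```

Because softmax does not change the ordering of the scores, `predicted_labels` matches the argmax taken directly on the logits.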
