Hugging Face Course 2026
2.Virtual Environment setup
a.Hugging Face Installation for Natural Language Processing
3.PyTorch and Hugging Face
4.Datasets
5.Tokenizers
6.Models
7.Tasks
A.Natural Language Processing
a.Table Question Answering
b.Zero-Shot Classification
c.Fill-Mask
d.Question Answering
e.Translation
f.Summary
g.Token Classification
h.Text Classification
8.Evaluate
Evaluate / accuracy, f1 score, precision, recall metrics example
Seqeval / classification report example
Evaluate / squad metric example
Advanced Hugging Face topics
How to fine-tune a pretrained Hugging Face model

Hugging Face
Hugging Face is an ecosystem that provides models, datasets, and other tools for developing NLP (Natural Language Processing) and other machine learning models. Natural language is inherently ambiguous, which makes NLP challenging to learn. However, Hugging Face tools can make the process easier. For example, you can use Hugging Face pretrained models instead of creating and training a model from scratch. Below you will find a brief introduction to Hugging Face Datasets, Tokenizers, and Models. You can explore these concepts in more detail in the Tasks section, where you'll learn how to use datasets, tokenizers, and models together. The process typically begins with preparing your data, followed by tokenizing the input. Then, you pass the tokenized data to the model. For some models, you need to decode the output as the final step.

Keep in mind that Hugging Face has its own complexity, and it takes time to learn how to use it. Pretrained models follow different syntax conventions, and you should know how to use them. Using pipelines is the easiest way to get a result from a Hugging Face model, but you can also load a model directly. There are many models, and you should find the right model for your dataset (and task). Scikit-learn metrics are also compatible with some models. Some models perform better than others; you can test them using your own data.
We will use other Python libraries as well. If you want to learn about the Pandas library, please visit the Pandas tutorial. If you want to learn about the NumPy library, please visit the NumPy tutorial. If you're interested in a specific topic, feel free to jump straight to it. Otherwise, every topic contains useful information. You will learn how to use the Hugging Face libraries locally, and you can use your own editor to test the code; Visual Studio Code is used for the examples below. You should be familiar with Python and PyTorch. The Hugging Face course below is for beginners, and you don't need any prior Hugging Face knowledge. You can always consult the Hugging Face website for more information.
Setting Up Your Environment for Hugging Face
Creating a Python Virtual Environment
You should set up your environment before installing the Hugging Face libraries. Start by creating a Python virtual environment. If you don't have virtualenv installed and you are using pip, run the command below:
pip install virtualenv
If you are using pip3 or pipx, use pip3 or pipx instead of pip.
You need to create a virtual environment in your Python project folder. If you are using pip, run the command below:
python -m venv new_env
If you are using python3, use python3 instead of python. We named the virtual environment "new_env", but you can choose another name.
You can activate the environment:
source new_env/bin/activate
We will need the following Python libraries as well, so you need to install them. If you are using pip, run the command below:
pip install -U scikit-learn
To install scikit-learn using conda, check the official website.
After the installation, you may need to close and reopen your project folder in your editor.
To check the version of the scikit-learn library:
import sklearn
print(sklearn.__version__)
We will use the pandas and numpy libraries. If you are using pip, run the commands below:
pip install pandas
pip install numpy
If you are using conda, run the commands below:
conda install pandas
conda install numpy
Import pandas and NumPy:
import pandas as pd
import numpy as np
How to Install Hugging Face Libraries for Natural Language Processing (NLP)
To use Hugging Face locally for Natural Language Processing, you need to install the relevant libraries. Install the datasets and tokenizers libraries to work with Hugging Face datasets and tokenizers:
pip install datasets
pip install tokenizers
Hugging Face datasets 3.5.0 and tokenizers 0.21.1 will be used for the tutorial below. If you installed a Hugging Face library successfully and want to check its version, you can run the commands below:
import datasets
print(datasets.__version__)
You can use the syntax above for other Hugging Face libraries; you just need to change the library name.
The Transformers library is required to run Transformer models:
pip install transformers
Transformers 4.51.0 will be used in the tutorial below.
You should install PyTorch as well:
pip install torch
PyTorch 2.6.0 will be used for the course below. If you want to run the code below without any installation, you can use Google Colab as well. However, some of the libraries, like datasets, may not be up to date, and they may not work as expected. You are likely to get ImportErrors.
Examples generated by AI tools may also be out of date, since they are often based on older versions of the libraries. Therefore, errors may occur if you run them locally.
Device Types and Hugging Face
PyTorch will be used for the Hugging Face examples below. The compute platform for PyTorch depends on your device and operating system. macOS will be used for the examples below; therefore, "MPS" will be used. If your device supports NVIDIA GPUs, you should use "CUDA". The CPU can be used if no GPU is available, or for small models, lightweight inference, and preprocessing tasks, but it is slower than both MPS and CUDA. You can run the following commands to check your compute platform:
import torch
print(torch.backends.cuda.is_built())
print(torch.backends.mps.is_available())
print(torch.backends.cpu.get_cpu_capability())
If you still need help, you can visit the PyTorch website. Keep in mind that training models can take a long time, and the device you use largely determines how long training takes.
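As a rough sketch (the fallback order here is just one reasonable choice), you can select the best available device at runtime and move your model and data to it:
import torch

# Prefer CUDA, then MPS, then fall back to the CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(device)

# Later, move your model and inputs to the selected device, for example:
# model = model.to(device)
# inputs = {k: v.to(device) for k, v in inputs.items()}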
Working with Hugging Face Datasets
Hugging Face Datasets is a library for Natural Language Processing tasks. If you haven't installed it yet, run the command below:
pip install datasets
You can check the full list of Hugging Face Datasets on the Hugging Face Hub. You can use the "Tasks", "Libraries", and "Languages" categories to filter your dataset search. If you need a dataset for a specific task, you can choose the task and see your dataset options. You can read the dataset card to get more detailed information about a dataset. For example, you can go to the Tasks category and choose the "Question Answering" task under the Natural Language Processing title.
How to load a Hugging Face dataset
If you want to use a dataset locally, you need to load it. Go to the dataset's page and click "</> Use this dataset". You can copy and paste the generated code, or write it manually:
from datasets import load_dataset
ds = load_dataset("dataset_name")
Keep in mind that some datasets have multiple versions (configurations); you should load the version you want.
Running NLP models can take a long time. You can select a part of the dataset instead of the full dataset:
small_dataset = ds["train"].shuffle(seed=42).select(range(300))
Your dataset must match what the model expects. Reading dataset cards can be very helpful. You can find the links (and credits) in the comments or below the examples.
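Before feeding a dataset to a model, it helps to inspect its splits, columns, and a sample row. A minimal sketch (the rajpurkar/squad dataset is just an illustrative choice; any dataset name works):
from datasets import load_dataset

ds = load_dataset("rajpurkar/squad")
print(ds)                    # available splits and number of rows
print(ds["train"].features)  # column names and types
print(ds["train"][0])        # the first example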
How to split a Hugging Face dataset
You can split your data into training and testing sets using different methods:
ds = load_dataset("dataset_name", streaming=True, split="train")
ds = load_dataset("dataset_name", split="test")
dataset = load_dataset("dataset_name", split="test[:5%]")
dataset = load_dataset("dataset_name", split="train[:1000]")
Understanding Tokenizers in Hugging Face
As mentioned earlier, you need to tokenize your data for the model. Hugging Face offers different types of tokenizers. If you haven't installed the tokenizers library yet, run the command below:
pip install tokenizers
You need to load and import your tokenizer:
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
You can also use auto classes to tokenize your data:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
The tokenizer and model should always be from the same checkpoint. Depending on your task, you may need to decode your output using tokenizers.
Tokenizers return input_ids, attention_mask, and token_type_ids (the latter is not present for DistilBERT). Input IDs are lists of token IDs. The attention mask indicates whether a token is real or padding. Some tokenizers may return additional outputs. Keep in mind that different models require different inputs.
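Here is a minimal sketch of what a tokenizer returns, using the bert-base-cased checkpoint from above (the sentence is just an example):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoding = tokenizer("Hugging Face makes NLP easier.", return_tensors="pt")
print(encoding["input_ids"])       # token IDs, including [CLS] and [SEP]
print(encoding["attention_mask"])  # 1 for real tokens, 0 for padding
print(encoding["token_type_ids"])  # segment IDs (not returned by DistilBERT)
print(tokenizer.decode(encoding["input_ids"][0]))  # decode back to text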
Exploring Hugging Face Models
As mentioned earlier, Hugging Face has great pretrained models. You can use them instead of training your model from scratch. If you haven't installed the transformers library yet, run the command below:
pip install transformers
Hugging Face Transformers
The Transformer is a deep learning architecture that enables machines to understand language contextually. Transformers consider all words in a sentence simultaneously, allowing machines to grasp meaning, relationships, and nuances more effectively. This approach has significantly advanced the field of Natural Language Processing and enabled more accurate language understanding tasks such as translation, question answering, and text generation. You should find the best model for your task and dataset. The process is similar to finding a dataset: you can choose a task and select a suitable model. You can read model cards to get more detailed information about a model, and you can click the "Use this model" button to see your options.
How to use a Hugging Face model
You can specify a task and use the pipeline() function:
from transformers import pipeline
qa_model = pipeline("question-answering")
The pipeline() function carries out all stages of the process: preprocessing, inference, and postprocessing. As in the example above, you don't have to specify a model, although using a pipeline without specifying a model name and revision is not recommended in production. You can also explicitly specify a model:
from transformers import pipeline
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
You can also load a model directly:
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
To train or fine-tune a pretrained model, you need to load it directly.
You can also use auto classes. You should find the auto classes for the model you want to use and specify the model:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
If you want to learn more about your model, you can check its configuration:
print(model.config)
Reading model cards can be very helpful. You can find the links (and credits) in the comments or below the examples. Unfortunately, some of the model cards lack sufficient information.
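As noted above, fine-tuning requires loading the model directly rather than through a pipeline. A minimal sketch of loading a checkpoint with a fresh classification head (the checkpoint and num_labels value are arbitrary examples):
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# num_labels replaces the pretrained head with a randomly initialized one sized for your labels
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)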
Natural Language Processing Tasks with Hugging Face
Natural Language Processing (NLP) Models in Hugging Face
Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language, and Hugging Face provides one of the largest collections of pretrained NLP models for developers and researchers. With Hugging Face, you can easily perform tasks like text classification, question answering, text summarization, translation, and token classification using powerful models such as BERT, DistilBERT, BART, TAPAS, and MarianMT. In this tutorial, we'll guide you through the most popular Hugging Face NLP models and show practical examples. To save time and storage, we will use smaller versions of the original models in these examples. The smaller models behave similarly and are suitable for learning and experimentation.
Table Question Answering Examples
How to Use Table Question Answering Models with Hugging Face Pipeline
We will use the pipeline for Table Question Answering. We will create simple synthetic data. The table will display the names of the products and the number of products.
from transformers import pipeline
import pandas as pd
# prepare table
data = {"Products": ["jeans", "jackets", "shirts"], "Number of products": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
#prepare your question
question = "how many shirts are there?"
# pipeline model
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq", aggregator="SUM")
# result
print(tqa(table=table, query=question))
{'answer': 'SUM > 69', 'coordinates': [(2, 1)], 'cells': ['69'], 'aggregator': 'SUM'}
If we change the data and add a second row for shirts, the answer changes:
#new data
data = {"Products": ["jeans", "jackets", "shirts", "shirts"], "Number of products": ["87", "53", "69", "21"]}
table = pd.DataFrame.from_dict(data)
print(tqa(table=table, query=question))
The answer:
{'answer': 'COUNT > 69, 21', 'coordinates': [(2, 1), (3, 1)], 'cells': ['69', '21'], 'aggregator': 'COUNT'}
We can get the total number of shirts:
z = tqa(table=table, query=question)["cells"]
x = []
for i in z:
    x.append(int(i))
print(sum(x))
The answer is 90.
*google/tapas-large-finetuned-wtq model from Hugging Face — licensed under the Apache 2.0 License.
How to use TAPAS Table Question Answering Model Without the Pipeline
In this example, we'll use the google/tapas-base-finetuned-wtq model to perform table question answering on the same data without using the pipeline:
from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd
import torch
# Load model and tokenizer
model_name = "google/tapas-base-finetuned-wtq"
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)
# Example table
data = {"Products": ["jeans", "jackets", "shirts"], "Number of products": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
# Question
question = "how many shirts are there?"
# Tokenize inputs
inputs = tokenizer(table=table, queries=[question], return_tensors="pt")
# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Decode predicted answer
logits = outputs.logits
logits_agg = outputs.logits_aggregation
# Get the most probable cell answer
predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs,
    logits,
    logits_agg
)
# Extract the answer from the table
answers = []
for coordinates in predicted_answer_coordinates:
    if not coordinates:
        answers.append("No answer found.")
    else:
        cell_values = [table.iat[row, column] for row, column in coordinates]
        answers.append(", ".join(cell_values))
# Print the result
print("Answer:", answers[0])
Answer: 69
*The pipeline model google/tapas-large-finetuned-wtq is from Hugging Face; the code above is based on https://huggingface.co/google/tapas-large-finetuned-wtq (Apache 2.0).
The model's syntax can be a bit complex. Let's analyze this step by step. logits are raw output scores. logits_aggregation returns the scores of numeric aggregation operations. The model can perform basic operations like SUM using the table data.
For more details, you can refer to the google/tapas-base-finetuned-wtq model card on Hugging Face.
Zero-Shot Classification Example
How to Use Zero-Shot Classification with Hugging Face Pipeline
Zero-shot classification is used to predict the class of unknown data. Zero-shot classification models require text and labels. Let's see an example of Zero-shot classification using a pipeline:
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
print(classifier(
"Is this a good time to buy gold?",
candidate_labels=["education", "politics", "business", "finance"]
))
{'sequence': 'Is this a good time to buy gold?', 'labels': ['finance', 'business', 'education', 'politics'], 'scores': [0.5152193307876587, 0.38664010167121887, 0.057615164667367935, 0.040525417774915695]}
You see the results in descending order. The "finance" label has the highest score.
How to Use BART Zero-Shot Classification Model Without the Pipeline
In this example, we'll use the facebook/bart-large-mnli model to perform zero-shot classification without using the pipeline. We will load the model directly:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
# Load model and tokenizer
model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Input sentence
sequence = "The pi is the ratio of the circumference of any circle to the diameter of that circle"
# Candidate labels
labels = ["education", "psychology", "sports", "finance", "math"]
# Create NLI-style premise-hypothesis pairs
premise = sequence
hypotheses = [f"This text is about {label}." for label in labels]
# Tokenize and get model outputs for each hypothesis
inputs = tokenizer([premise]*len(hypotheses), hypotheses, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Convert logits to probabilities (softmax over entailment class)
entailment_logits = logits[:, 2]
probabilities = F.softmax(entailment_logits, dim=0)
print(probabilities)
# Print results
for label, score in zip(labels, probabilities):
    print(f"{label}: {score:.4f}")
tensor([0.0125, 0.0091, 0.0089, 0.0109, 0.9586])
education: 0.0125
psychology: 0.0091
sports: 0.0089
finance: 0.0109
math: 0.9586
*facebook/bart-large-mnli model from Hugging Face — licensed under the MIT License.
Using the BART MNLI model directly involves a few extra steps, so let's break it down. We need to get model outputs for each hypothesis. There are five labels, so the premise (the "sequence") must be provided five times. The model returns logits for contradiction, neutral, and entailment. We are interested in entailment, whose index is 2; that's why we selected the logits at index 2.
For more details about this model, please refer to the facebook/bart-large-mnli model card on Hugging Face.
What's softmax?
softmax in PyTorch is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
The sum of the scores [0.0125 + 0.0091 + 0.0089 + 0.0109 + 0.9586] in the example above is 1 and the "math" label has the highest score.
For more information about softmax, visit the PyTorch docs.
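As a quick sketch, you can reproduce this behavior with made-up logits (the values below are for illustration only):
import torch
import torch.nn.functional as F

logits = torch.tensor([1.2, 0.9, 0.8, 1.1, 5.5])  # hypothetical entailment scores
probs = F.softmax(logits, dim=0)
print(probs)        # values between 0 and 1
print(probs.sum())  # sums to 1 (up to floating-point rounding)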
Fill-Mask Example
How to Use Fill-Mask Models with Hugging Face Pipeline
Fill-mask models fill in the masked word or words in a sentence.
from transformers import pipeline
unmasker = pipeline("fill-mask")
print(unmasker("The most popular sport in the world is <mask>.", top_k=2))
[{'score': 0.11612111330032349, 'token': 4191, 'token_str': ' soccer', 'sequence': 'The most popular sport in the world is soccer.'},
{'score': 0.10927936434745789, 'token': 5630, 'token_str': ' cricket', 'sequence': 'The most popular sport in the world is cricket.'}]
How to Use BERT Fill-Mask Model Without the Pipeline
In this example, we'll use the google-bert/bert-base-uncased model to perform fill-mask tasks without using the pipeline:
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
"google-bert/bert-base-uncased"
)
model = AutoModelForMaskedLM.from_pretrained(
"google-bert/bert-base-uncased",
torch_dtype=torch.float16,
device_map="auto",
attn_implementation="sdpa"
)
#See the device type explanation below
inputs = tokenizer("The most popular sport in the world is [MASK].", return_tensors="pt").to("mps")
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
prediction = tokenizer.decode(predicted_token_id)
print(f"The most popular sport in the world is {prediction}.")
The most popular sport in the world is football.
*google-bert/bert-base-uncased model from Hugging Face — licensed under the Apache 2.0 License.
You can use "mps" for macOS and "cuda" for devices compatible with CUDA. You can also remove it.
For more details about this model, please refer to the google-bert/bert-base-uncased model card on Hugging Face.
What's argmax?
The argmax returns the indices of the maximum value of all elements in the input tensor.
It returns the index of the maximum value to decode in the example above. For more information about argmax, visit the PyTorch docs.
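A short sketch with a toy tensor:
import torch

scores = torch.tensor([0.1, 2.7, 0.4])
print(torch.argmax(scores))   # tensor(1): index of the largest value
print(scores.argmax(dim=-1))  # the same result using the tensor method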
Question Answering Example
How to Use Question Answering with Hugging Face Pipeline
There are different types of Question Answering (QA) tasks. If you use a pipeline for QA without specifying a model, the distilbert/distilbert-base-cased-distilled-squad model is used. It is used for extractive QA tasks. In other words, the model extracts the answer from a given text. Let's see an example of an extractive QA task using a pipeline:
from transformers import pipeline
question_answerer = pipeline("question-answering")
print(question_answerer(
question="Where does Julia live?",
context="Julia is 40 years old. She lives in London and she works as a nurse."
))
{'score': 0.9954689741134644, 'start': 36, 'end': 42, 'answer': 'London'}
How to Use BERT Question Answering Model Without the Pipeline
In this example, we'll use the deepset/bert-base-cased-squad2 model to perform question answering without using the pipeline. You can load the QA model directly:
from transformers import AutoTokenizer, BertForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")
model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")
#question, text
question, text = "Where does Julia live?", "Julia is 40 years old. She lives in London and she works as a nurse."
#tokenize question and text
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
result = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
print(result)
London
*deepset/bert-base-cased-squad2 model from Hugging Face — licensed under the CC BY 4.0 License.
For more details about this model, please refer to the deepset/bert-base-cased-squad2 model card on Hugging Face.
Translation Example
How to Use Translation with Hugging Face Pipeline
Our model will translate a sentence from French to English. However, there are other models for other languages.
from transformers import pipeline
translator = pipeline("translation", "Helsinki-NLP/opus-mt-fr-en")
print(translator("C'est un beau roman."))
How to Use MarianMT Translation Model Without the Pipeline
In this example, we'll use the Helsinki-NLP/opus-mt-en-fr model to translate a sentence from English to French without using the pipeline:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
text = "The food is very delicious."
inputs = tokenizer(text, return_tensors="pt").input_ids
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
*"Helsinki-NLP/opus-mt-en-fr”model from Hugging Face — licensed under the Apache 2.0 License.
For more details about this model, please refer to the Helsinki-NLP/opus-mt-en-fr model card on Hugging Face.
Summary Example
How to use Summary Models with Hugging Face Pipeline
You can use summary models to summarize a text:
from transformers import pipeline
from datasets import load_dataset
ds = load_dataset("dataset_name")
text = ds["train"][0]["context"]
classifier = pipeline("summarization", max_length=100)
print(classifier(text))
How to Use BART Summary Model Without the Pipeline
In this example, we'll use the facebook/bart-large-cnn model to perform text summarization without using the pipeline. We'll summarize an article from the Hugging Face dataset abisee/cnn_dailymail, but you can use your own paragraph instead.
from transformers import AutoTokenizer, BartForConditionalGeneration
checkpoint = "facebook/bart-large-cnn"
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
from datasets import load_dataset
ds = load_dataset("abisee/cnn_dailymail", "1.0.0")
text = ds["train"][0]["article"]
inputs = tokenizer(text, max_length=100, return_tensors="pt")
# Generate Summary
summary_ids = model.generate(inputs["input_ids"], max_length=180,
min_length=40,
do_sample=False,
no_repeat_ngram_size=3)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
Harry Potter star Daniel Radcliffe turns 18 on Monday. He gains access to a reported $41.1 million fortune. Radcliffe says he has no plans to fritter his cash away on fast cars.
*"abisee/cnn_dailymail", "1.0.0" dataset from Hugging Face — licensed under the Apache 2.0 License.
*"facebook/bart-large-cnn" model from Hugging Face — licensed under the MIT License.
You can control how the model generates a summary. For example, you might set the minimum and maximum length of the output, as shown above.
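Other generation parameters can be adjusted in the same way. The sketch below reuses the model and inputs from the example above; the parameter values are arbitrary examples, not tuned recommendations:
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=180,          # upper bound on the summary length
    min_length=40,           # lower bound on the summary length
    num_beams=4,             # beam search instead of greedy decoding
    length_penalty=2.0,      # values above 1.0 favor longer summaries
    no_repeat_ngram_size=3,  # avoid repeating 3-grams
    early_stopping=True,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])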
For more details about this model, please refer to the facebook/bart-large-cnn model card on Hugging Face.
Token Classification Example
How to use Token Classification with Hugging Face Pipeline
Token classification models are used to identify entities in a text. What type of entities can a token classification model identify? It depends on the model. For example, dslim/bert-base-NER can identify four types of entities: location (LOC), organizations (ORG), person (PER), and miscellaneous (MISC).
from transformers import pipeline
classifier = pipeline("token-classification")
z = "I'm Alicia and I live in Milano."
d = classifier(z)
print(d)
for token in d:
    print(token["word"], token["entity"])
[{'entity': 'B-PER', 'score': np.float32(0.9941089), 'index': 4, 'word': 'Alicia', 'start': 4, 'end': 10},
{'entity': 'B-LOC', 'score': np.float32(0.9950382), 'index': 9, 'word': 'Milano', 'start': 25, 'end': 31}]
Alicia B-PER
Milano B-LOC
How to Use BERT Token Classification Model Without the Pipeline
In this example, we'll use the dslim/bert-base-NER model to perform token classification (NER) on the same text as before without using the pipeline.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification
# Load model and tokenizer
model_name = "dslim/bert-base-NER"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)
# Sample input
text = "I'm Alicia and I live in Milano."
# Tokenize
tokens = tokenizer(text, return_tensors="pt", truncation=True, is_split_into_words=False)
# Forward pass
with torch.no_grad():
    outputs = model(**tokens)
logits = outputs.logits # shape: (batch_size, seq_len, num_labels)
# Get predicted class indices
predictions = torch.argmax(logits, dim=2)
# Convert IDs to label names
id2label = model.config.id2label
# Token IDs
input_ids = tokens["input_ids"][0]
predicted_labels = [id2label[label_id.item()] for label_id in predictions[0]]
print(predicted_labels)
['O', 'O', 'O', 'O', 'B-PER', 'O', 'O', 'O', 'O', 'B-LOC', 'O', 'O']
*"dslim/bert-base-NER" model from Hugging Face — licensed under the MIT License.
The B- prefix marks the beginning of an entity: B-PER marks the beginning of a person's name right after another person's name, and B-LOC marks the beginning of a location right after another location. The O label marks tokens that are not part of any entity.
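To see which token each label belongs to, you can pair the tokens with the predicted labels; a small, optional addition to the example above:
tokens_str = tokenizer.convert_ids_to_tokens(input_ids.tolist())
for token, label in zip(tokens_str, predicted_labels):
    if label != "O":
        print(token, label)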
For more detailed information about the dslim/bert-base-NER model, please visit the dslim/bert-base-NER model card.
Text Classification Example
How to Use Text Classification with Hugging Face Pipeline
Text classification models are designed to categorize text into predefined labels. They are widely used in tasks like sentiment analysis, spam detection, and topic labeling. In the example below, the model will determine whether a given text expresses a positive or negative sentiment.
from transformers import pipeline
text = "Your dog is super cute."
pipe = pipeline("text-classification")
result = pipe(text)
print(result[0]["label"])
POSITIVE
How to Use DistilBERT Text Classification Model Without the Pipeline
In this example, we'll use the distilbert/distilbert-base-uncased-finetuned-sst-2-english model to perform text classification without using the pipeline.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("Your dog is super cute.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])
POSITIVE
*"distilbert-base-uncased-finetuned-sst-2-english" model from Hugging Face (Apache 2.0).
We used a simple text, but you can use the model for more complicated texts like reviews as well.
For more details about this model, please refer to the distilbert/distilbert-base-uncased-finetuned-sst-2-english model card on Hugging Face.
How to Evaluate Hugging Face Model Performance
There are several ways to evaluate your Hugging Face models. Hugging Face provides a variety of evaluation metrics tailored to different tasks. You can use the evaluate library to assess the performance of Hugging Face models and datasets. To get started, you need to install the evaluate library:
pip install evaluate
Different models return different data types, so it's important to choose the right evaluation metric for each model. You can find a full list of available metrics in the Hugging Face documentation. In this section, you will learn how to use the Hugging Face evaluate library. We will reuse some of the earlier examples to gain a better understanding of how Hugging Face models work. Keep in mind that using a full dataset may require slightly different syntax compared to evaluating a single sample. Feel free to revisit the earlier examples if you need a refresher on how the models work. We will be working with the stanfordnlp/sst2 dataset. Let's evaluate the sentiment analysis model using the accuracy, f1 score, precision, and recall metrics:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model.eval()
from datasets import load_dataset
ds = load_dataset("stanfordnlp/sst2")
sent = ds["validation"]["sentence"][:100]
labels = ds["validation"]["label"][:100]
inputs = tokenizer(sent, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
results = [i.argmax().item() for i in logits]
import evaluate
accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=results, references=labels)
print("Accuracy:", result)
from sklearn.metrics import precision_recall_fscore_support
precision, recall, f1, _ = precision_recall_fscore_support(labels, results, average='binary')
print("precision, recall, f1: ", precision, recall, f1)
Accuracy: {'accuracy': 0.94}
precision, recall, f1: 0.9259259259259259 0.9615384615384616 0.9433962264150944
We evaluated the AutoModelForSequenceClassification model using the "distilbert/distilbert-base-uncased-finetuned-sst-2-english" checkpoint. The evaluation included accuracy, precision, recall, and f1 scores. Accuracy is calculated using the Hugging Face evaluate library. Alternatively, you can use Scikit-learn's built-in metrics to evaluate Hugging Face models, as the example above shows: we used Scikit-learn to compute precision, recall, and the f1 score.
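If you prefer to stay inside the evaluate library, f1, precision, and recall are available there as well. A minimal sketch that reuses the results and labels from the example above:
import evaluate

# Bundle several metrics into one object and compute them together
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=results, references=labels))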
Seqeval is a Python library designed for evaluating sequence labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. It calculates precision, recall, and F1 score. You need to install the seqeval library to use the seqeval metrics:
pip install seqeval
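Seqeval expects one list of labels per sentence (a list of lists), not a flat list. A toy sketch with made-up labels:
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PER", "O", "O", "B-LOC"], ["O", "B-ORG", "O"]]
y_pred = [["B-PER", "O", "O", "O"], ["O", "B-ORG", "O"]]

print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))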
We will evaluate the token classification model example above using the "tomaarsen/conll2003" dataset. We will use the seqeval metric's classification report for the evaluation:
import torch
from transformers import BertTokenizerFast, BertForTokenClassification
from datasets import load_dataset
from seqeval.metrics import classification_report
import datasets
# Load model/tokenizer
model_name = "dslim/bert-base-NER"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)
model.eval()
# Load dataset
dataset = load_dataset("tomaarsen/conll2003")
texts = dataset["validation"]["tokens"][:100]
true_tags = dataset["validation"]["ner_tags"][:100]
label_names = dataset["validation"].features["ner_tags"].feature.names
id2label = model.config.id2label
predicted_labels = []
true_labels = []
#Tokenize the Dataset
for tokens, tag_ids in zip(texts, true_tags):
    encoding = tokenizer(tokens,
                         is_split_into_words=True,
                         return_tensors="pt",
                         padding=True,
                         truncation=True)
    with torch.no_grad():
        outputs = model(**encoding)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=2)
    word_ids = encoding.word_ids()
    preds = []
    trues = []
    prev_word_id = None
    for idx, word_id in enumerate(word_ids):
        if word_id is None or word_id == prev_word_id:
            continue  # Skip special tokens and subwords
        pred_id = predictions[0][idx].item()
        preds.append(id2label[pred_id])
        trues.append(label_names[tag_ids[word_id]])
        prev_word_id = word_id
    predicted_labels.append(preds)
    true_labels.append(trues)
# Evaluation
print(classification_report(true_labels, predicted_labels))
*The dataset used in the example above may cause errors in Google Colab. You can either test the code with a similar dataset or run it in your local environment.
As mentioned earlier, different models require different evaluation metrics. Question answering models have more complex syntax, which can make evaluation more challenging. We will use the rajpurkar/squad_v2 dataset to evaluate the question answering model. Let's evaluate the question answering model example above:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, BertForQuestionAnswering
import evaluate
import torch
# Load the SQuAD validation set
dataset = load_dataset("rajpurkar/squad_v2", split="validation[:50]")
# Load pretrained model and tokenizer
model_name = "deepset/bert-base-cased-squad2"
tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")
model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")
model.eval()
# Tokenize the question and the text
def preprocess(example):
    return tokenizer(
        example["question"],
        example["context"],
        truncation=True,
        padding="max_length",
        max_length=384,
        return_tensors="pt"
    )
tokenized_dataset = dataset.map(preprocess, batched=True)
answers = []
for example in tokenized_dataset:
    input_ids = example["input_ids"]
    attention_mask = example["attention_mask"]
    # Convert to tensors
    inputs = {
        "input_ids": torch.tensor([input_ids]),
        "attention_mask": torch.tensor([attention_mask])
    }
    with torch.no_grad():
        outputs = model(**inputs)
    start_idx = torch.argmax(outputs.start_logits)
    end_idx = torch.argmax(outputs.end_logits)
    # Decode answer span
    answer_ids = inputs["input_ids"][0][start_idx:end_idx + 1]
    answer = tokenizer.decode(answer_ids, skip_special_tokens=True)
    answers.append(answer)
import evaluate
metric = evaluate.load("squad_v2")
# Format predictions and references
pred = [ {"id": dataset[i]["id"], "prediction_text": answers[i], 'no_answer_probability': 0.}
for i in range(len(answers))]
references = [
{"id": dataset[i]["id"], "answers": dataset[i]["answers"]}
for i in range(len(answers))
]
results = metric.compute(predictions=pred, references=references)
print(results)
{'exact': 46.0, 'f1': 48.0407876230661, 'total': 50, 'HasAns_exact': 28.571428571428573, 'HasAns_f1': 33.430446721585966, 'HasAns_total': 21, 'NoAns_exact': 58.62068965517241, 'NoAns_f1': 58.62068965517241, 'NoAns_total': 29, 'best_exact': 64.0, 'best_exact_thresh': 0.0, 'best_f1': 64.22222222222223, 'best_f1_thresh': 0.0}
We evaluated the model using the SQuAD v2 metric.
What's the difference between SQuAD and SQuAD v2?
SQuAD is the official scoring script for version 1 of the Stanford Question Answering Dataset (SQuAD). You can find more details in the official SQuAD documentation. SQuAD v2 is the official scoring script for version 2 of the Stanford Question Answering Dataset (SQuAD). For more information, please refer to the official SQuAD 2.0 documentation. To perform well on SQuAD 2.0, systems must not only provide answers when they are supported by the paragraph but also correctly identify when no answer is available and refrain from answering.
How to Fine-Tune a Pretrained Hugging Face Model
This article serves as the capstone of our work with the Hugging Face ecosystem, walking through practical fine-tuning across multiple NLP tasks. It begins with sentiment analysis, progresses to named entity recognition, and concludes with a full question-answering setup. Together, these examples demonstrate how transformer models can be systematically adapted to different real-world objectives. For the complete implementation and detailed breakdown, continue to the full article.