Open Speech Emotion Recognition Leaderboard
Welcome to the Open SER Leaderboard — part of the CAMEO project!
This leaderboard tracks how well different models recognize emotions in speech across multiple languages.
Everything is open, transparent, and reproducible - you're invited to explore, evaluate, and contribute.
This tab shows how different models perform across the entire CAMEO collection. You’ll find macro F1, weighted F1, and accuracy scores for each model, tested at different temperature settings.
It's a great place to get a quick overview of how models compare on the full dataset.
Curious how models handle different languages? This view lets you compare performance across languages like English, French, German, and more. Use the checkboxes to pick which languages you want to see, and switch between metrics like macro F1, weighted F1, or accuracy using the radio buttons.
This is especially useful if you’re working on multilingual models or looking to improve performance in a specific language.
This tab breaks down results by individual datasets included in the CAMEO collection. You can choose which datasets to view and which metric to focus on.
It’s helpful for spotting differences in performance, potential data overlap, or just understanding how models behave on different kinds of emotional speech.
Which emotions are easier for models to recognize - and which ones still trip them up? This view shows how models perform on specific emotional states.
Pick the emotions and metric you’re interested in, and see which models handle them best. It's a great tool for digging deeper into model behavior.
📝 About
CAMEO (Collection of Multilingual Emotional Speech Corpora) is a benchmark dataset designed to support research in Speech Emotion Recognition (SER) - especially in multilingual and cross-lingual settings.
The collection brings together 13 emotional speech datasets covering 8 languages, including English, German, Spanish, French, Serbian, and more. In total, it contains 41,265 audio samples; each sample is annotated for emotion and, in most cases, also for speaker ID, gender, and age.
Here are a few quick facts about the dataset:
- Over 33% of the samples are in English.
- 17 distinct emotional states are represented across datasets.
- 93.5% of samples fall under the seven primary emotions: neutral, anger, sadness, surprise, happiness, disgust, and fear.
- Gender annotations are available for over 92% of samples.
All datasets included in CAMEO are openly available. We've made the full collection accessible on Hugging Face, along with metadata, tools, and a leaderboard for evaluation.
🔗 View the CAMEO Dataset on Hugging Face
Whether you're building SER models or exploring emotion understanding across languages, CAMEO is here to support your research.
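If you want a quick look at the data itself, a single subset can be loaded directly with the datasets library. The snippet below is only a minimal sketch: it points at the same amu-cai/CAMEO repository and split names used by the evaluation script further down, with the ravdess split chosen purely as an example.

from datasets import load_dataset

# Load one subset of the CAMEO collection; split names match the datasets listed in the evaluation script below
ds = load_dataset("amu-cai/CAMEO", split="ravdess")

# Fields used later for evaluation include "emotion", "language", and "file_id"
print(ds[0]["emotion"], ds[0]["language"], ds[0]["file_id"])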
🔢 Evaluate your model
To evaluate your model according to the methodology used in our paper, you can use the code below. It requires the datasets, scikit-learn, and Levenshtein Python packages.
import os
import string
from Levenshtein import ratio
from datasets import load_dataset, Dataset, concatenate_datasets
from sklearn.metrics import classification_report, f1_score, accuracy_score
# 🔧 Change this path to where your JSONL prediction files are stored
outputs_path = "./"
_DATASETS = [
    "cafe", "crema_d", "emns", "emozionalmente", "enterface",
    "jl_corpus", "mesd", "nemo", "oreau", "pavoque",
    "ravdess", "resd", "subesco",
]

# Minimum Levenshtein similarity for a word to count towards a label when fixing free-form predictions
THRESHOLD = 0.57
def get_expected(split: str) -> tuple[set, str, dict]:
    """Load expected emotion labels and language metadata from the CAMEO dataset."""
    ds = load_dataset("amu-cai/CAMEO", split=split)
    return set(ds["emotion"]), ds["language"][0], dict(zip(ds["file_id"], ds["emotion"]))

def process_outputs(dataset_name: str) -> tuple[Dataset, set, str]:
    """Clean and correct predictions, returning a Dataset with fixed predictions."""
    outputs = Dataset.from_json(os.path.join(outputs_path, f"{dataset_name}.jsonl"))
    options, language, expected = get_expected(dataset_name)

    def preprocess(x):
        # Normalize the raw model output (strip punctuation, lowercase) and attach the reference label
        return {
            "predicted": x["predicted"].translate(str.maketrans('', '', string.punctuation)).lower().strip(),
            "expected": expected.get(x["file_id"]),
        }

    outputs = outputs.map(preprocess)

    def fix_prediction(x):
        # Keep exact label matches; otherwise map the free-form prediction to the label whose
        # Levenshtein similarity to the predicted words (summed over words above THRESHOLD) is highest.
        if x["predicted"] in options:
            x["fixed_prediction"] = x["predicted"]
        else:
            predicted_words = x["predicted"].split()
            label_scores = {
                label: sum(r for r in (ratio(label, word) for word in predicted_words) if r > THRESHOLD)
                for label in options
            }
            # If no word clears the threshold, all scores are 0 and max() falls back to an arbitrary label
            x["fixed_prediction"] = max(label_scores, key=label_scores.get)
        return x

    outputs = outputs.map(fix_prediction)
    return outputs, options, language
def calculate_metrics(outputs: Dataset, labels: set) -> dict:
    """Compute classification metrics."""
    y_true = outputs["expected"]
    y_pred = outputs["fixed_prediction"]
    return {
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
        "accuracy": accuracy_score(y_true, y_pred),
        "metrics_per_label": classification_report(
            y_true, y_pred, labels=sorted(labels), target_names=sorted(labels), output_dict=True
        ),
    }
# 🧮 Main Evaluation Loop
results = []
outputs_per_language = {}
full_outputs, full_labels = None, set()

for dataset in _DATASETS:
    jsonl_path = os.path.join(outputs_path, f"{dataset}.jsonl")
    if not os.path.isfile(jsonl_path):
        print(f"JSONL file for {dataset} not found.")
        continue

    outputs, labels, language = process_outputs(dataset)
    metrics = calculate_metrics(outputs, labels)
    results.append({"language": language, "dataset": dataset, **metrics})

    if language not in outputs_per_language:
        outputs_per_language[language] = {"labels": labels, "outputs": outputs}
    else:
        outputs_per_language[language]["labels"] |= labels
        outputs_per_language[language]["outputs"] = concatenate_datasets([
            outputs_per_language[language]["outputs"], outputs
        ])

    full_outputs = outputs if full_outputs is None else concatenate_datasets([full_outputs, outputs])
    full_labels |= labels

# 🔤 Per-language evaluation
for language, data in outputs_per_language.items():
    metrics = calculate_metrics(data["outputs"], data["labels"])
    results.append({"language": language, "dataset": "all", **metrics})

# 🌍 Global evaluation
if full_outputs is not None:
    metrics = calculate_metrics(full_outputs, full_labels)
    results.append({"language": "all", "dataset": "all", **metrics})

# 💾 Save results
Dataset.from_list(results).to_json(os.path.join(outputs_path, "results.jsonl"))
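For reference, the script above looks for one predictions file per dataset, named <dataset>.jsonl, in outputs_path. Judging from the fields the evaluation code reads, each line needs at least a file_id matching the corresponding CAMEO sample and the model's raw predicted emotion. The snippet below is a minimal sketch of producing such a file; my_predict is a hypothetical placeholder for your own model's inference.

import json
from datasets import load_dataset

def my_predict(sample) -> str:
    # Hypothetical placeholder: replace with your model's inference, returning an emotion string
    return "neutral"

ds = load_dataset("amu-cai/CAMEO", split="ravdess")
with open("ravdess.jsonl", "w") as f:
    for sample in ds:
        record = {"file_id": sample["file_id"], "predicted": my_predict(sample)}
        f.write(json.dumps(record) + "\n")

In a full run you would repeat this for every dataset listed in _DATASETS, so that the evaluation script finds a matching JSONL file for each split.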
📬 Submit Here!
Want your model to appear on the leaderboard?
Send us an email at iwona.christop@amu.edu.pl with the subject line "CAMEO Leaderboard Submission".
Please include:
- Your model's name and a short description.
- The temperature setting you used.
- A JSONL file with your predictions for each dataset (in the format sketched above).
- Any other details you'd like to share.
If you don’t have access to the resources needed to run the evaluation yourself, no problem - just send us a link to the model (e.g., a Hugging Face model page), and we’ll do our best to run the evaluation for you.
We’ll review your submission and add your results to the leaderboard!