CIDEr Score Evaluator

Evaluate how closely a generated image caption matches a set of human-written reference captions using the CIDEr score.

Metric Card for CIDEr

This module implements the CIDEr metric for image-captioning evaluation.

Metric Description

CIDEr (Consensus-based Image Description Evaluation) measures the quality of a generated image caption by its consensus with a set of human-written reference captions. Each caption is represented as a vector of TF-IDF-weighted n-grams, and the candidate is scored by the average cosine similarity between its vector and those of the references; the TF-IDF weighting means that distinctive n-grams count for more than n-grams that are common across the whole corpus.
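
Concretely, in the notation of the original paper, let g^n(.) denote the TF-IDF weight vector over n-grams. For a candidate caption c_i with reference set S_i = {s_i1, ..., s_im}, the per-order score and the final score are

\mathrm{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert}

\mathrm{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \mathrm{CIDEr}_n(c_i, S_i)

with uniform weights w_n = 1/N and N = 4 in the paper.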

How to Use

To use this metric, load the module with evaluate.load and call its compute method with the following parameters:

Inputs

  • predictions (list of lists of strings): The generated caption(s) for each image; the expected nesting is sketched just below this list.
  • references (list of lists of strings): The reference captions for each generated caption.
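
For example, a batch of two images, each with one candidate caption and two reference captions, would be laid out as follows (the caption strings are placeholders):

predictions = [
    ["a candidate caption for image 1"],
    ["a candidate caption for image 2"],
]
references = [
    ["first reference for image 1", "second reference for image 1"],
    ["first reference for image 2", "second reference for image 2"],
]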

Output Values

  • score: The CIDEr score, returned by compute inside a dict. It ranges from 0 to 1, with higher scores indicating captions that agree more closely with the human references.

Examples

import evaluate

# Load the CIDEr module from the Hugging Face Hub.
metric = evaluate.load("sunhill/cider")

# Score one candidate caption against five reference captions
# written for the same image.
results = metric.compute(
    predictions=[["train traveling down a track in front of a road"]],
    references=[
        [
            "a train traveling down tracks next to lights",
            "a blue and silver train next to train station and trees",
            "a blue train is next to a sidewalk on the rails",
            "a passenger train pulls into a train station",
            "a train coming down the tracks arriving at a station",
        ]
    ],
)
print(results)
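
For intuition, here is a minimal, self-contained sketch of the computation described above. It is an illustration under simplifying assumptions, not the implementation behind sunhill/cider: the widely used CIDEr-D variant additionally applies stemming, a Gaussian length penalty, and count clipping, and the function name cider_sketch is made up for this example.

from collections import Counter
import math

def _ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cider_sketch(predictions, references, max_n=4):
    # predictions: one candidate caption per image (list of str).
    # references: one list of reference captions per image (list of list of str).
    num_images = len(references)
    per_n_scores = []
    for n in range(1, max_n + 1):
        # Document frequency: in how many images' reference sets
        # each n-gram occurs at least once.
        df = Counter()
        for refs in references:
            seen = set()
            for ref in refs:
                seen.update(_ngrams(ref.split(), n))
            df.update(seen)

        def tfidf(tokens):
            # Normalized term frequency times log inverse document frequency.
            counts = Counter(_ngrams(tokens, n))
            total = sum(counts.values()) or 1
            return {g: (c / total) * math.log(num_images / max(df[g], 1))
                    for g, c in counts.items()}

        sims = []
        for cand, refs in zip(predictions, references):
            g_c = tfidf(cand.split())
            ref_sims = []
            for ref in refs:
                g_r = tfidf(ref.split())
                dot = sum(w * g_r.get(g, 0.0) for g, w in g_c.items())
                norm = (math.sqrt(sum(w * w for w in g_c.values()))
                        * math.sqrt(sum(w * w for w in g_r.values())))
                ref_sims.append(dot / norm if norm else 0.0)
            # Average cosine similarity over this image's references.
            sims.append(sum(ref_sims) / len(ref_sims))
        per_n_scores.append(sum(sims) / len(sims))
    # Uniform weights over n-gram orders (w_n = 1/N in the paper).
    return sum(per_n_scores) / max_n

Note that the IDF term is only informative when the score is computed over many images at once; on a single image, as in the example above, every n-gram gets an IDF of log(1) = 0, so real implementations typically compute (or precompute) document frequencies over a full evaluation corpus.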

Citation

@InProceedings{Vedantam_2015_CVPR,
    author = {Vedantam, Ramakrishna and Lawrence Zitnick, C. and Parikh, Devi},
    title = {CIDEr: Consensus-Based Image Description Evaluation},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2015}
}
