Language Models and the Magic of Metaphor: A Comparative Evaluation with Human Judgments

Simone Mazzoli, Alice Suozzi, Gianluca Lebani
2025

Abstract

This study evaluates whether Italian-trained Large Language Models (LLMs) can interpret metaphors by comparing their performance to both human judgments and human-produced interpretations. Using three datasets containing metaphors, human interpretations, and implausible alternatives, we assess model performance via log-likelihood scores. Results show that LLMs partially replicate human understanding and are influenced by expression conventionality and linguistic context.
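
The abstract states that model performance is assessed via log-likelihood scores over human interpretations and implausible alternatives. The sketch below is a rough illustration of that kind of scoring, not the authors' implementation: the model name, the prompt template ("In altre parole, ..."), and the example sentences are all illustrative assumptions.

# Hypothetical sketch, not the authors' code: scoring candidate metaphor
# interpretations by log-likelihood with a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "sapienzanlp/Minerva-3B-base-v1.0"  # placeholder Italian-trained LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def total_log_likelihood(text: str) -> float:
    """Total log-likelihood the model assigns to `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over the predicted
    # tokens; multiply by their count to recover the total.
    n_predicted = enc["input_ids"].size(1) - 1
    return -out.loss.item() * n_predicted

metaphor = "Quell'avvocato è uno squalo."  # "That lawyer is a shark."
candidates = {
    "human interpretation":    "Quell'avvocato è spietato.",      # "ruthless"
    "implausible alternative": "Quell'avvocato nuota molto bene.",  # "swims very well"
}
for label, interpretation in candidates.items():
    score = total_log_likelihood(f"{metaphor} In altre parole, {interpretation}")
    print(f"{label}: {score:.2f}")

Under this setup, a higher score for the human-produced interpretation than for the implausible alternative would count as the model preferring the human reading; the paper's actual prompts, models, and normalization may differ.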
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Files in this record:
2025.clicit-1.68.pdf (publisher's version, open access, Creative Commons license, Adobe PDF, 2.58 MB)
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10278/5108949