Effect of Tool Specificity on the Performance of DNN-Based Saliency Prediction Methods
Yucel, Zeynep
2023-01-01
Abstract
This study examines the performance of saliency models on a specific class of images, namely those depicting hand tools. These objects are characterized by functionally distinct segments: parts with various manipulative roles (end-effectors) and parts that are used to grasp and operate the tool (handles). As highlighted by various studies in behavioral science, the end-effectors of tools inherently draw human attention. However, it remains unclear whether saliency models capture this intrinsic aspect. To shed light on this, we selected four recent and notable saliency models known for their reliance on transfer learning: EML-NET, SalGAN, DeepGaze IIE, and DeepGaze III. Our aim is to evaluate how well they capture the influence of semantic segments within tools. To conduct the assessment, we chose a set of images featuring tool and non-tool objects from a large standardized dataset and applied each saliency model to these images. The same images were then presented to a group of human participants, and empirical gaze data were recorded. Finally, we evaluated the correspondence (or discrepancy) between the saliency maps and the empirical data using six evaluation criteria (CC, NSS, LL, IG, KL, and SIM). Our findings reveal that, across all four models, the accuracy of saliency prediction on tool images often lags behind that on non-tool images. Moreover, two of the four models exhibited consistently lower accuracy on tool images than on non-tool images across all six evaluation criteria. This indicates that these recent saliency models do not adequately account for the specificity of tools, and highlights the need for solutions that address this limitation.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
c_43_scai_effect.pdf | Not available | Pre-print document | Publisher's copyright | 1.86 MB | Adobe PDF
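The abstract compares predicted saliency maps against empirical gaze data using six criteria. As a rough illustration of what four of them measure, the sketch below gives commonly used formulations of CC, NSS, SIM, and KL in NumPy. It is not the study's own implementation: the function names, the `EPS` constant, and the input conventions (non-negative 2-D maps, a boolean fixation mask) are assumptions made for this example. LL and IG are omitted because they additionally require a probabilistic model output and a baseline model.

```python
import numpy as np

EPS = 1e-12  # assumed small constant guarding against division by zero and log(0)


def _to_distribution(m):
    """Scale a non-negative map so that its values sum to (almost exactly) 1."""
    m = np.asarray(m, dtype=float)
    return m / (m.sum() + EPS)


def cc(saliency, density):
    """Pearson correlation between a saliency map and a fixation density map."""
    s = np.asarray(saliency, dtype=float).ravel()
    d = np.asarray(density, dtype=float).ravel()
    return float(np.corrcoef(s, d)[0, 1])


def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels (fixations is a boolean mask)."""
    s = np.asarray(saliency, dtype=float)
    z = (s - s.mean()) / (s.std() + EPS)
    return float(z[np.asarray(fixations, dtype=bool)].mean())


def sim(saliency, density):
    """Histogram intersection of the two maps viewed as distributions."""
    p, q = _to_distribution(saliency), _to_distribution(density)
    return float(np.minimum(p, q).sum())


def kl(saliency, density):
    """KL divergence from the density map to the saliency map (lower is better)."""
    p, q = _to_distribution(saliency), _to_distribution(density)
    return float(np.sum(q * np.log(EPS + q / (p + EPS))))


if __name__ == "__main__":
    # Toy 2x2 maps, purely illustrative.
    predicted = np.array([[0.0, 0.0], [0.0, 1.0]])
    gaze_density = np.array([[0.1, 0.2], [0.3, 0.4]])
    fixated = np.array([[False, False], [False, True]])
    print(cc(predicted, gaze_density), nss(predicted, fixated))
    print(sim(predicted, gaze_density), kl(predicted, gaze_density))
```

CC, SIM, and KL compare a predicted map with a continuous fixation density map, whereas NSS is evaluated only at the discrete fixated pixels. Higher values indicate better agreement for CC, NSS, and SIM; for KL, lower is better.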
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.