The academic and scientific world in general is increasingly concerned about their inability to determine and ascertain the identity of the writer of a text. More and more often the question arises as to whether a scientific article or work handed in by a student was actually produced by the alleged author of the questioned text. The role of artificial intelligence (AI) is increasingly debated due to its dangers of undeclared use. A current example is undoubtedly the undeclared use of ChatGPT to write a scientific text. The article promotes an AI model-independent redundancy measure to support discrimination between hypotheses on authorship of various multilingual texts written by humans or produced by intelligence media such as ChatGPT. The syntax of texts written by humans tends to differ from that of texts produced by AIs. This difference can be grasped and quantified even with short texts (i.e. 1800 characters). This aspect of length is extremely important, because short texts imply a greater difficulty of analysis to characterize authorship. To meet the efficiency criteria required for the evaluation of forensic evidence, a probabilistic approach is implemented. In particular, to assess the value of the redundancy measure and to offer a consistent classification criterion, a metric called Bayes factor is implemented. The proposed Bayesian probabilistic method represents an original approach in stylometry. Analyses performed over multilingual texts (English and French) covering different scientific and human areas of interest (forensic science and socio-psycho-artistic topics) reveal the feasibility of a successful authorship discrimination with limited misclassification rates. Model performance is satisfactory even with small sample sizes.

A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach

Bozza S.
;
2023-01-01

Abstract

The academic and scientific world in general is increasingly concerned about their inability to determine and ascertain the identity of the writer of a text. More and more often the question arises as to whether a scientific article or work handed in by a student was actually produced by the alleged author of the questioned text. The role of artificial intelligence (AI) is increasingly debated due to its dangers of undeclared use. A current example is undoubtedly the undeclared use of ChatGPT to write a scientific text. The article promotes an AI model-independent redundancy measure to support discrimination between hypotheses on authorship of various multilingual texts written by humans or produced by intelligence media such as ChatGPT. The syntax of texts written by humans tends to differ from that of texts produced by AIs. This difference can be grasped and quantified even with short texts (i.e. 1800 characters). This aspect of length is extremely important, because short texts imply a greater difficulty of analysis to characterize authorship. To meet the efficiency criteria required for the evaluation of forensic evidence, a probabilistic approach is implemented. In particular, to assess the value of the redundancy measure and to offer a consistent classification criterion, a metric called Bayes factor is implemented. The proposed Bayesian probabilistic method represents an original approach in stylometry. Analyses performed over multilingual texts (English and French) covering different scientific and human areas of interest (forensic science and socio-psycho-artistic topics) reveal the feasibility of a successful authorship discrimination with limited misclassification rates. Model performance is satisfactory even with small sample sizes.
2023
13
File in questo prodotto:
File Dimensione Formato  
Bozzaetal2023.pdf

non disponibili

Tipologia: Versione dell'editore
Licenza: Accesso chiuso-personale
Dimensione 1.13 MB
Formato Adobe PDF
1.13 MB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5056320
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact