Introducing Chiron, a Full-Stack Framework for Metrical Analysis: Part 1 – Data Collection

Daniele Fusi
2021-01-01

Abstract

This paper is the first part of an introduction to Chiron, a full-stack software framework for automated metrical analysis, applicable to any language, metrical tradition and digital format. This part focuses on data collection, which is kept carefully distinct from data interpretation. Within a layered and distributed architecture, the system provides a number of components, which are composed into an analysis pipeline configured to fit a specific scenario (e.g. poetry or prose, Greek or Latin, detection of appositives, etc.). The data model for its results is inspired by the linguistic structures underlying the analysis, down to the most granular level of phonemes and traits, and is stored in an RDBMS together with the texts and their metadata. This provides a fully detailed metrical database, usable as a sort of laboratory for observation and data analysis. At the same time, the architecture is designed to cope with a highly variable level of granularity in its input documents, including an outright lack of information, so that the system can produce results even from unmarked text sources. Prosodical, syntactical and metrical analyses can handle even the lowest level of information, and their algorithms are designed to extract as much data as possible by leveraging the system’s linguistic and metrical competences. They can also take advantage of any number and type of additional resources by integrating third-party functionalities, either at the beginning of the analysis process (e.g. macronizers or POS taggers) or at its end (e.g. data science tools, machine learning techniques, etc.). An overview of the system architecture and analysis flow, from text to metrical scansion through prosody and syntax, provides a number of methodologically relevant insights into the system, and into the role and potential of such tools in scholarly research and digital publications.
2021
63

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/3725652