
Deep and Shallow Linguistically Based Parsing

DELMONTE, Rodolfo
2005

Abstract

In this paper we present an approach to natural language processing that we define as "hybrid", in which symbolic and statistical approaches are reconciled. We claim that the analysis of natural language sentences (and texts) is at once a deterministic and a probabilistic process. In particular, the search space for syntactic analysis is inherently deterministic, in that it is severely limited by the grammar of the specific language, which in turn consists of a set of peripheral rules to be applied in concomitance with the more general rules of core grammar. Variations on these rules are determined only by genre, which can thus contribute a new set of peripheral rules, or sometimes just a set of rules that partially overlap with those already accepted by a given linguistic community. In the paper we criticize current statistical approaches as inherently ill-founded and derived from a false presupposition. Researchers working in the empirical framework have tried to accredit the view that what happened in the speech research paradigm was also applicable to the NLP paradigm as a whole. In other words, the independence hypothesis at the heart of the Markov models imported from the speech community into empirical statistical approaches to NLP does not seem well suited to the task at hand, simply because the linguistic units under consideration (the word, the single tag, or both) are insufficient to provide enough contextual information, given language-model-building techniques based on word tags with tagsets containing only lexical and part-of-speech information. In line with the effort carried out by innovative approaches like those proposed within the LinGO ERG proposal, we put forward our view that the implementation of a sound parsing algorithm must go hand in hand with sound grammar construction.
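The independence hypothesis criticized above can be made concrete with a toy bigram HMM tagger: each tag is conditioned only on the preceding tag, and each word only on its own tag, so any context richer than adjacent part-of-speech labels is invisible to the model. This is a minimal illustrative sketch with made-up probabilities, not the paper's system.

```python
# Toy bigram HMM: P(words, tags) = prod P(tag_i | tag_{i-1}) * P(word_i | tag_i).
# The factorization itself encodes the Markov independence assumption:
# no lexical or syntactic context beyond the previous tag is available.
# (Hypothetical, hand-set probabilities for illustration only.)

transitions = {  # P(tag_i | tag_{i-1}), "<s>" marks the sentence start
    ("<s>", "DT"): 0.6, ("DT", "NN"): 0.8, ("NN", "VB"): 0.5,
}
emissions = {  # P(word | tag)
    ("DT", "the"): 0.7, ("NN", "dog"): 0.1, ("VB", "barks"): 0.05,
}

def sequence_probability(words, tags):
    """Joint probability of a word/tag sequence under the bigram HMM."""
    p, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        p *= transitions.get((prev, tag), 0.0) * emissions.get((tag, word), 0.0)
        prev = tag
    return p

prob = sequence_probability(["the", "dog", "barks"], ["DT", "NN", "VB"])
# prob == 0.6*0.7 * 0.8*0.1 * 0.5*0.05 == 0.00084
```

Note that swapping "dog" for any other noun with the same emission probability leaves the score unchanged, which is exactly the loss of contextual information the abstract objects to.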
Extragrammaticalities can be coped with better within a solid linguistic framework than without it. A number of parsing strategies and graceful-recovery procedures are then proposed, which follow a strictly parameterized approach in their definition and implementation. Finally, a shallow or partial parser needs to be implemented in order to produce the default baseline output to be used by further computation, such as Information Retrieval, Summarization, and other similar tools, which nonetheless need precise linguistic information with much higher coverage than what is currently offered by the available parsers.
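The kind of default baseline output a shallow or partial parser can hand to downstream tasks may be sketched as pattern-based NP chunking over part-of-speech tags. This is a deliberately minimal illustration of the idea, not Delmonte's parser; the tag map and chunk pattern are assumptions for the example.

```python
import re

# Regex-based NP chunker over a POS-tagged sentence: one letter per tag,
# then maximal DT? JJ* NN+ runs are taken as noun-phrase chunks.
# (Illustrative sketch only; real shallow parsers use richer grammars.)

TAG_MAP = {"DT": "D", "JJ": "J", "NN": "N"}  # tags relevant to NP chunks

def np_chunk(tagged):
    """Return NP chunks (maximal DT? JJ* NN+ runs) from (word, tag) pairs."""
    code = "".join(TAG_MAP.get(tag, "O") for _, tag in tagged)
    return [[tagged[i][0] for i in range(m.start(), m.end())]
            for m in re.finditer(r"D?J*N+", code)]

tagged = [("the", "DT"), ("old", "JJ"), ("dog", "NN"),
          ("barks", "VB"), ("loudly", "RB")]
chunks = np_chunk(tagged)  # [["the", "old", "dog"]]
```

Such chunk lists are the sort of robust, always-available partial analysis that Information Retrieval or Summarization components can fall back on when a deep parse fails.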
UG and External Systems
Use this identifier to cite or link to this document: http://hdl.handle.net/10278/11159