Information retrieval and extraction from forums, complaints and technical reviews

Quintavalle, Bruno

Complaints and technical reviews often describe complex problems, usually in elaborate ways. On corpora of this kind, we consider three classical tasks: Information Retrieval, Text Classification and Information Extraction. In this context, however, these tasks must pay particular attention to sentence structure, and to verb phrases especially, since complaints usually describe actions that were performed when they should not have been (or that were not performed when they should have been). We want to leverage results from traditional NLP tasks such as Semantic Role Labeling and Dependency Parsing, while also employing the most recent advances in Word and Sentence Embedding. Moreover, Semantic Web technologies should be employed when background knowledge is required. To combine these three heterogeneous approaches, a particular implementation of the SPARQL query language has been developed. It provides a language for template extraction that seamlessly mixes state-of-the-art techniques from the above-mentioned tasks. Its main difference from standard SPARQL is the ability to deal with similarity and uncertainty. Its syntax, however, is strictly the same, which simplifies integration with OWL ontologies and allows its use as an endpoint for other engines in a federated query context. The case studies illustrated here focus mainly on problems related to telecommunication companies, using publicly available corpora and forum threads extracted from the web. However, the language has been designed to be used in any context that requires extracting information from user-generated corpora of complex technical descriptions.

Information retrieval and extraction from forums, complaints and technical reviews / Quintavalle, Bruno. - (2019 Jul 18).