Additive Bayesian networks for multivariate data: parameter learning, model fitting and applications in veterinary epidemiology

Pittavino Marta
2016-01-01

Abstract

Veterinary epidemiology, one of the many applications of statistics, primarily aims to investigate hypothesized relationships between covariates or predictors of interest and one or more outcome variables. The biological processes that generated the data are commonly extremely complex, resulting in multiple dependencies between explanatory and response variables. Standard epidemiological and statistical approaches have shown a limited ability to describe such interdependent multivariate connections. This work extends and improves a methodology that addresses these issues: additive Bayesian networks (ABNs). ABNs are a type of graphical model that extends the usual generalized linear model (GLM) to multiple dependent variables through the representation of their joint probability.

The PhD thesis consists of four parts. The work begins by presenting the ABN methodology as it is commonly used in veterinary epidemiology. Two relevant case studies are presented, giving evidence that ABN models offer added value compared to existing standard statistical and epidemiological methods, namely GLMs. The multivariate data analyzed are mainly binary, but also include continuous and count data. The objective of the first case study was to identify factors associated with Leptospira interrogans serovar Pomona infection by exploring the advantages and disadvantages of the two methodologies. Thanks to the capacity of ABNs to model the relationships between all the variables, the results show that wearing personal protective gear increased the odds of infection, and hence that the gear was in fact not protective. This information was not obtained when the data were analyzed with GLMs alone. The second case study examines the attitudes of Austrian veterinarians towards euthanasia of small animals. Associations of gender and age with views on euthanasia were found. The ABN methodology helped to disentangle the role of gender in relation to age: it was mainly young females working in small animal practices who influenced the outcome. These features were revealed by the ABN because of its ability to capture the natural complexity of the data more effectively. The importance of the number of veterinarians working together was evidenced by this variable having the highest number of links to other variables in the ABN models. This highlights the supporting role of a team in stressful situations. To ensure the robustness and reliability of the ABN models, a parametric bootstrapping approach was implemented using a Markov chain Monte Carlo (MCMC) technique in the software JAGS.

The third part consists of updating and improving the software for fitting and learning ABN models: the R package abn. Functions, chiefly those related to the graphical representation of models, were modified, and the package documentation was entirely restructured and rewritten.

The final part of this work improves the theory underlying ABN models. Two main challenges posed by Bayesian model selection are addressed: the specification of parameter priors and the computation of the resulting posterior model probabilities via the marginal likelihood. A suitable conjugate prior for ABNs, which generalizes the Dirichlet density for additive parameters, is introduced. This prior satisfies the desirable independence assumptions for Bayesian networks and overcomes the issue of complete data separation that occurs with previous prior choices. Furthermore, an analytic expression for the marginal likelihood was found, which avoids the use of the Laplace approximation or MCMC methods. Finally, the score equivalence property, i.e., that equivalent networks receive the same score, is shown.

This work promotes the ABN methodology by illustrating its practical application to veterinary epidemiology, by improving the software for working with these models, and by providing better knowledge of the posterior density and an easier computation of the marginal likelihood.
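
As an illustrative sketch of the definition of an ABN given in the abstract (the notation below is an assumption chosen for illustration, not taken from the thesis): for variables X_1, ..., X_n arranged in a directed acyclic graph, the joint probability factorizes into one conditional distribution per node, and each conditional is modeled as a GLM in that node's parents.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr),
\qquad
g_i\bigl(\mathbb{E}[X_i \mid \mathrm{Pa}(X_i)]\bigr)
  = \beta_{i0} + \sum_{X_j \in \mathrm{Pa}(X_i)} \beta_{ij}\, X_j
```

Here Pa(X_i) denotes the parents of node X_i in the graph, g_i is a link function (for example logit for a binary node, identity for a continuous node, log for a count node), and the coefficients beta_{ij} are the additive regression parameters attached to the arcs.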
Files in this item:
File: PhD_Thesis_Pittavino.pdf (open access)
Type: Publisher's version
License: Open access (no restrictions)
Size: 6.54 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5052360