Interpoint distance tests for high-dimensional comparison studies

Marco Marozzi
2020

Abstract

Modern data collection techniques make it possible to analyze a very large number of endpoints. In biomedical research, for example, the expression of thousands of genes is commonly measured on only a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of a normal distribution is often questionable for high-dimensional data, and some variables may also be highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means because they do not assume that the data come from normally distributed populations, and they include tests that are distribution-free, unbiased, consistent, and computationally feasible even when the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving the simultaneous comparison of means and variability, or of whole distribution shapes. The tests are shown to perform well in terms of power when the endpoints have complex dependence relations, as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.
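To make the idea of an interpoint distance test concrete, the following is a minimal sketch of a generic two-sample permutation test built on pairwise Euclidean distances. It is not one of the tests proposed in the paper: the statistic is a standard energy-type contrast (between-group versus within-group mean distance), and the function names `energy_statistic` and `interpoint_permutation_test` are hypothetical names chosen for this illustration. It shows why such tests stay feasible when the number of endpoints far exceeds the number of subjects: after one pass over the data, everything works on an n-by-n distance matrix.

```python
import numpy as np


def energy_statistic(dist, labels):
    """Energy-type contrast: 2 * mean between-group distance
    minus the two mean within-group distances."""
    a = labels == 0
    b = labels == 1
    between = dist[np.ix_(a, b)].mean()
    within_a = dist[np.ix_(a, a)].mean()
    within_b = dist[np.ix_(b, b)].mean()
    return 2.0 * between - within_a - within_b


def interpoint_permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for H0: x and y share the same distribution.

    x, y: arrays of shape (n_subjects, n_endpoints); n_endpoints may be
    much larger than n_subjects.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]

    # Pairwise Euclidean distances among all subjects (n x n matrix).
    sq = (pooled ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * pooled @ pooled.T
    dist = np.sqrt(np.clip(d2, 0.0, None))

    observed = energy_statistic(dist, labels)

    # Reshuffle group labels; the distance matrix is computed only once.
    exceed = 0
    for _ in range(n_perm):
        if energy_statistic(dist, rng.permutation(labels)) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Simulated data (an assumption for illustration): 1000 endpoints,
    # 10 and 12 subjects, with a mean shift in the second group.
    rng = np.random.default_rng(1)
    x = rng.normal(size=(10, 1000))
    y = rng.normal(loc=1.0, size=(12, 1000))
    stat, p = interpoint_permutation_test(x, y)
    print(f"statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```

Because the reference distribution comes from relabeling subjects, the sketch makes no normality assumption, which matches the motivation given in the abstract. The specific statistics studied in the paper may differ.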
Files in this item:
jas2020.pdf (not available)

Description: full article
Type: publisher's version
License: closed access (personal)
Size: 1.5 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/10278/3716345