Interpoint distance tests for high-dimensional comparison studies

Modern data collection techniques allow to analyze a very large number of endpoints. In biomedical research, for example, expressions of thousands of genes are commonly measured only on a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of normal distribution is often questionable for high-dimensional data, and some variables may be at the same time highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means, because they do not assume data to come from normally distributed populations and comprise tests that are distribution free, unbiased, consistent, and computationally feasible, even if the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving simultaneous comparison of means and variability, or the whole distribution shapes. The tests are shown to perform well in terms of power, when the endpoints have complex dependence relations, such as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.

Interpoint distance tests for high-dimensional comparison studies

Marco Marozzi;Amitava Mukherjee;Jan Kalina

2020-01-01

Abstract

Modern data collection techniques allow to analyze a very large number of endpoints. In biomedical research, for example, expressions of thousands of genes are commonly measured only on a small number of subjects. In these situations, traditional methods for comparison studies are not applicable. Moreover, the assumption of normal distribution is often questionable for high-dimensional data, and some variables may be at the same time highly correlated with others. Hypothesis tests based on interpoint distances are very appealing for studies involving the comparison of means, because they do not assume data to come from normally distributed populations and comprise tests that are distribution free, unbiased, consistent, and computationally feasible, even if the number of endpoints is much larger than the number of subjects. New tests based on interpoint distances are proposed for multivariate studies involving simultaneous comparison of means and variability, or the whole distribution shapes. The tests are shown to perform well in terms of power, when the endpoints have complex dependence relations, such as in genomic and metabolomic studies. A practical application to a genetic cardiovascular case-control study is discussed.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2020
			
	Titolo della Rivista
	
				JOURNAL OF APPLIED STATISTICS
			
	N° Volume
	
				47
			
	DOI
	
				https://dx.doi.org/10.1080/02664763.2019.1649374
			
	Appare nelle tipologie:
	
				2.1 Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
jas2020.pdf non disponibili Descrizione: articolo completo Tipologia: Versione dell'editore Licenza: Accesso chiuso-personale Dimensione 1.5 MB Formato Adobe PDF Visualizza/Apri	1.5 MB	Adobe PDF	Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3716345

Citazioni

ND

22

21

social impact