On selection and conditioning in multiple testing and selective inference

We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting as well as modern data-carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this article, we take a holistic view of such methods, considering the selection, conditioning and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We provide general theory and intuition before investigating in detail several case studies where a shift to a nonselective or unconditional perspective can yield a power gain.

On selection and conditioning in multiple testing and selective inference

Goeman, Jelle J;Solari, Aldo

2024

Abstract

We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting as well as modern data-carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this article, we take a holistic view of such methods, considering the selection, conditioning and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We provide general theory and intuition before investigating in detail several case studies where a shift to a nonselective or unconditional perspective can yield a power gain.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2024
			
	Titolo della Rivista
	
				BIOMETRIKA
			
	N° Volume
	
				111
			
	DOI
	
				https://dx.doi.org/10.1093/biomet/asad078
			
	Appare nelle tipologie:
	
				2.1 Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
asad078.pdf accesso aperto Tipologia: Versione dell'editore Licenza: Creative commons Dimensione 559.94 kB Formato Adobe PDF Visualizza/Apri	559.94 kB	Adobe PDF	Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5059422

Citazioni

ND

2

1

social impact