The matched case–control design, up until recently mostly pertinent to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case–control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account. We present an R package penalizedclr, that provides an implementation of the penalized conditional logistic regression model for analyzing matched case–control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model. The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case–control status. These variables can then be investigated in terms of functional interpretation or validation in further, more targeted studies.

penalizedclr: an R package for penalized conditional logistic regression for integration of multiple omics layers

Djordjilovic, Vera
;
2024-01-01

Abstract

The matched case–control design, up until recently mostly pertinent to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case–control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account. We present an R package penalizedclr, that provides an implementation of the penalized conditional logistic regression model for analyzing matched case–control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model. The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case–control status. These variables can then be investigated in terms of functional interpretation or validation in further, more targeted studies.
2024
25
File in questo prodotto:
File Dimensione Formato  
s12859-024-05850-2.pdf

accesso aperto

Tipologia: Versione dell'editore
Licenza: Creative commons
Dimensione 1.33 MB
Formato Adobe PDF
1.33 MB Adobe PDF Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5062863
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact