Matchtigs: minimum plain text representation of k-mer sets

We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26x over unitigs and 2.10x over previous work.
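
As a toy illustration of what a plain-text representation of a k-mer set is, the sketch below checks that a list of strings contains exactly a given set of k-mers and compares the total number of characters and strings needed. It is not the algorithm proposed in the paper: the helper names (`kmers`, `spectrum`, `cumulative_length`) and the example data are hypothetical, and reverse complements are ignored for simplicity.

```python
def kmers(s, k):
    """Return the set of all length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def spectrum(strings, k):
    """Return the union of the k-mers of all given strings."""
    out = set()
    for s in strings:
        out |= kmers(s, k)
    return out


def cumulative_length(strings):
    """Total number of characters needed to store the strings."""
    return sum(len(s) for s in strings)


k = 3
kmer_set = {"ACG", "CGT", "GTA", "CGA"}  # hypothetical toy input

# Naive representation: store every k-mer as its own string.
one_per_kmer = sorted(kmer_set)

# More compact representation: overlapping k-mers are merged into
# longer strings, so fewer characters and fewer strings are needed.
compact = ["ACGTA", "CGA"]

# Both representations contain exactly the k-mers of the set.
assert spectrum(one_per_kmer, k) == kmer_set
assert spectrum(compact, k) == kmer_set

print(len(one_per_kmer), cumulative_length(one_per_kmer))
print(len(compact), cumulative_length(compact))
```

Both lists represent the same k-mer set, but the second uses fewer strings and fewer characters. Those are the two quantities for which the abstract reports reductions.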

Schmidt, Sebastian; Khan, Shahbaz; Alanko, Jarno N; Pibiri, Giulio E; Tomescu, Alexandru I

2023-01-01

Publication year: 2023
Journal: Genome Biology
Volume: 24
DOI: https://dx.doi.org/10.1186/s13059-023-02968-z
Appears in types: 2.1 Journal article

Files in this item:

File	Size	Format
GBIO2023.pdf (open access; type: publisher's version; license: free access, no restrictions)	2.75 MB	Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5034706
