Document clustering games
TRIPODI, Rocco; PELILLO, Marcello
2016-01-01
Abstract
In this article we propose a new model for document clustering based on game-theoretic principles. Each document to be clustered is represented as a player, in the game-theoretic sense, and each cluster as a strategy that the players choose in order to maximize their payoff. The geometry of the data is modeled as a graph that encodes the pairwise similarity between documents, and the games are played among similar players. In each game, the players update their strategies according to which strategy has been effective in previous games. The Dominant Set clustering algorithm is used to find the prototypical elements of each cluster. This information is used to divide the players into two disjoint sets: labeled players, which always play a definite strategy, and unlabeled players, which update their strategy at each iteration of the games. The system was evaluated on 13 document datasets, and the results show that the proposed method performs well compared to other document clustering algorithms.
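The iterative games described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update rule or payoff function, so the code below assumes discrete-time replicator dynamics and an identity payoff (two players are rewarded, in proportion to their similarity, when they pick the same cluster). The function name `play_games`, its parameters, and the toy similarity matrix are hypothetical, and the labeled players are supplied by hand rather than extracted with the Dominant Set algorithm.

```python
def play_games(similarity, labels, n_clusters, iterations=50):
    """Assign each document (player) to a cluster (strategy).

    similarity: symmetric n x n list of lists with non-negative weights.
    labels:     dict mapping labeled player index -> fixed cluster index
                (assumed given here; the article derives them from
                Dominant Set prototypes).
    """
    n = len(similarity)

    # Labeled players play a fixed pure strategy; unlabeled players
    # start from a uniform mixed strategy over all clusters.
    x = []
    for i in range(n):
        if i in labels:
            x.append([1.0 if k == labels[i] else 0.0 for k in range(n_clusters)])
        else:
            x.append([1.0 / n_clusters] * n_clusters)

    for _ in range(iterations):
        new_x = []
        for i in range(n):
            if i in labels:
                new_x.append(x[i])  # labeled players never update
                continue
            # Payoff of cluster k for player i: similarity-weighted
            # support for k among the other players (assumed identity payoff).
            payoff = [
                sum(similarity[i][j] * x[j][k] for j in range(n) if j != i)
                for k in range(n_clusters)
            ]
            average = sum(x[i][k] * payoff[k] for k in range(n_clusters))
            if average > 0:
                # Replicator update (an assumption): strategies that paid
                # more than average gain probability mass.
                new_x.append([x[i][k] * payoff[k] / average for k in range(n_clusters)])
            else:
                new_x.append(x[i])
        x = new_x

    # Each player ends up in the cluster with the highest probability.
    return [max(range(n_clusters), key=lambda k: x[i][k]) for i in range(n)]


# Toy example: documents 0 and 1 are similar, as are documents 2 and 3.
toy_similarity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]
assignments = play_games(toy_similarity, labels={0: 0, 2: 1}, n_clusters=2)
print(assignments)
```

In this toy run, documents 0 and 2 act as labeled players, and the unlabeled documents 1 and 3 drift toward the cluster of their most similar labeled neighbor.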