Chapter 12 - Detecting conversational groups in images using clustering games

Vascon, Sebastiano; Pelillo, Marcello

doi:10.1016/B978-0-12-814601-9.00024-9

Detecting groups of people in images is important in many contexts, such as video surveillance, activity recognition and social robotics. Standing conversational groups (a.k.a. F-formations) are a well-studied class of social interactions that play a prominent role in everyday human life. An F-formation is a type of social aggregation that occurs when two or more persons are engaged in a conversation, such as those that take place at a cocktail party or during a coffee break. Essentially, an F-formation defines a set of constraints on how the interactants must be located and oriented relative to one another, as well as the plausible zone in which the interaction may occur. In this chapter, we describe an approach to detecting groups of conversing people in images based on game theory. The approach improves upon existing methods by building a stochastic model of social attention which captures the likelihood that two individuals take part in the same conversation. This likelihood is used to derive a payoff function between detected individuals, which in turn defines the underlying clustering game. As it turns out, the stable equilibrium points of this game represent maximally coherent groups, and we use simple and effective evolutionary game dynamics to extract them. Extensive experimental results on several publicly available benchmark datasets demonstrate the superiority of the proposed approach over standard methods.
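To make the idea of extracting a group as an equilibrium of a clustering game more concrete, the following is a minimal sketch and not the chapter's implementation. It rests on two assumptions that the abstract does not confirm. First, the payoff is a simplified stand-in for the stochastic model of social attention: each person "votes" for a point a fixed stride ahead of them, and the payoff between two people decays with the distance between their voted points. Second, the evolutionary dynamics are taken to be discrete-time replicator dynamics, one common choice. The function names and the parameters `sigma`, `stride` and `threshold` are hypothetical.

```python
import math


def payoff_matrix(positions, orientations, sigma=0.5, stride=0.5):
    """Hypothetical pairwise payoff (NOT the chapter's attention model).

    Each person votes for a point `stride` units ahead along their
    orientation; the payoff is a Gaussian of the distance between the
    two voted points. The diagonal is zero (no self-payoff).
    """
    centres = [
        (x + stride * math.cos(theta), y + stride * math.sin(theta))
        for (x, y), theta in zip(positions, orientations)
    ]
    n = len(centres)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = centres[i][0] - centres[j][0]
                dy = centres[i][1] - centres[j][1]
                A[i][j] = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
    return A


def replicator_dynamics(A, iters=1000, tol=1e-9):
    """Discrete-time replicator dynamics started from the simplex barycentre.

    Update rule: x_i <- x_i * (A x)_i / (x^T A x).
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        avg = sum(x[i] * Ax[i] for i in range(n))
        if avg == 0.0:
            # No positive payoff anywhere: nothing to update.
            break
        new_x = [x[i] * Ax[i] / avg for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


def extract_group(A, threshold=1e-4):
    """Return the indices whose equilibrium weight exceeds `threshold`."""
    x = replicator_dynamics(A)
    return [i for i, xi in enumerate(x) if xi > threshold]


# Toy scene: three people roughly facing a common point, one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8), (10.0, 10.0)]
orientations = [0.0, math.pi, -math.pi / 2, 0.0]
A = payoff_matrix(positions, orientations)
group = extract_group(A)
print(group)
```

In this toy scene one would expect the three mutually facing people to receive non-negligible weight and the distant person to be driven to zero. To find several groups, a common strategy (again an assumption here) is to remove the extracted members from the payoff matrix and run the dynamics again on the remainder.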