Document similarity self-join with MapReduce
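One common way to express a document similarity self-join as MapReduce uses two jobs. The first job builds an inverted index that maps each term to its postings list of (document, weight) entries. The second job has each postings list emit a partial dot product for every pair of documents that share the term, and a reducer sums those partials per pair. The sketch below is a minimal, in-memory illustration under stated assumptions. The helper `map_reduce` is a hypothetical stand-in for a real framework (it only simulates map, shuffle-by-key, and reduce). Weights are assumed to be raw term counts from whitespace tokenization, so the scores are unnormalized dot products rather than cosine similarities.

```python
from collections import defaultdict
from itertools import combinations


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job:
    run the mapper, group intermediate values by key (the shuffle),
    then run the reducer once per key in sorted key order."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


# Job 1: build an inverted index, term -> [(doc_id, weight), ...].
# Assumption: weight = raw term count after lowercasing and whitespace split.
def index_mapper(doc_id, text):
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def index_reducer(term, postings):
    # Sorting by doc_id makes every emitted pair appear as (smaller, larger).
    yield term, sorted(postings)


# Job 2: every pair of documents in a postings list shares this term,
# so emit that term's contribution to the pair's dot product.
def pairs_mapper(term, postings):
    for (d1, w1), (d2, w2) in combinations(postings, 2):
        yield (d1, d2), w1 * w2


def sum_reducer(pair, partials):
    # Summing over all shared terms gives the full (unnormalized) dot product.
    yield pair, sum(partials)


def similarity_self_join(docs):
    """docs: dict of doc_id -> text. Returns {(doc_a, doc_b): score}
    for every pair of documents that share at least one term."""
    index = map_reduce(docs.items(), index_mapper, index_reducer)
    return dict(map_reduce(index, pairs_mapper, sum_reducer))
```

For example, with `docs = {"a": "hadoop map reduce", "b": "map reduce join", "c": "self join"}`, documents `a` and `b` share two terms and `b` and `c` share one, so the result is `{("a", "b"): 2, ("b", "c"): 1}`. Documents `a` and `c` share no terms and never appear as a pair, which is what lets this approach skip most of the quadratic pair space. The trade-off is that very common terms produce long postings lists and therefore many pairs, so practical versions often drop such terms first.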