Optimal Algorithm for Max-Min Fair Bandit
Abstract
Reviews and Discussion
The authors study the max-min fair bandit optimization problem, where the objective is to maximize the minimum reward achieved in a multi-player multi-armed bandit instance. The paper designs a decentralized fair elimination algorithm that achieves an improved regret bound of O((N^2 + K) log T / Delta). They also provide a matching regret lower bound, which shows the tightness of the regret upper bound. The algorithm relies on finding a lower bound of the max-min objective using the LCB indices of the arms and uses this to eliminate arms whose UCB falls below that bound. The non-eliminated arms are then explored. When the optimal UCB-based matching is no higher than the lower bound, the algorithm terminates with the resulting matching. Numerical simulations show the regret improvement achieved in this paper.
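To make the mechanics of this elimination scheme concrete, below is a minimal, centralized Python sketch of the loop described above. This is our own paraphrase, not the authors' decentralized algorithm: the max-min helper is a brute force over matchings, collisions and decentralized coordination are ignored, and all names (including the `pull(i, k)` reward oracle) are hypothetical.

```python
import numpy as np
from itertools import permutations

def max_min_value(values, active):
    """Brute-force max-min value over collision-free matchings (exponential; illustration only)."""
    n_players, n_arms = values.shape
    best = -np.inf
    for assignment in permutations(range(n_arms), n_players):   # player i -> arm assignment[i]
        if all(assignment[i] in active[i] for i in range(n_players)):
            best = max(best, min(values[i, assignment[i]] for i in range(n_players)))
    return best

def fair_elimination_sketch(pull, n_players, n_arms, horizon):
    """Centralized caricature of LCB-based lower bounding and UCB-based elimination."""
    counts = np.ones((n_players, n_arms))                        # one initial pull per pair
    means = np.array([[pull(i, k) for k in range(n_arms)]
                      for i in range(n_players)], dtype=float)
    active = [set(range(n_arms)) for _ in range(n_players)]      # surviving arms per player
    t = n_players * n_arms
    while t < horizon:
        radius = np.sqrt(2.0 * np.log(horizon) / counts)
        ucb, lcb = means + radius, means - radius
        gamma_lcb = max_min_value(lcb, active)                   # pessimistic max-min value
        for i in range(n_players):                               # drop arms whose UCB cannot beat it
            active[i] = {k for k in active[i] if ucb[i, k] >= gamma_lcb}
        if max_min_value(ucb, active) <= gamma_lcb:              # optimism cannot do better: stop
            break
        for i in range(n_players):                               # explore the surviving pairs
            for k in active[i]:
                r = pull(i, k)
                means[i, k] += (r - means[i, k]) / (counts[i, k] + 1.0)
                counts[i, k] += 1.0
                t += 1
    return active
```

For example, calling `fair_elimination_sketch(lambda i, k: float(np.random.binomial(1, mu[i, k])), N, K, T)` with a hypothetical Bernoulli mean matrix `mu` would run the loop end to end.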
Edit after rebuttal: The authors provided clarifications to some of my questions. I maintain my positive score.
Questions for Authors
- Do you assume knowledge of in the collision-based communication?
Claims and Evidence
The claims seem well grounded by mathematical proofs to the best of my understanding.
Methods and Evaluation Criteria
This is a theoretical paper. The methodology seems sound.
Theoretical Claims
I checked the upper bound claims in some detail. The claims seem correct, at least order-wise. The constants were not checked.
I checked the lower bound proof at a high level. This claim also makes sense. It is unlikely, but possible, that I missed some details of the lower bound proof.
Experimental Design and Analysis
The numerical experiments look valid; however, they are somewhat anecdotal. The theoretical guarantees provide a better understanding of the merit of this paper.
Supplementary Material
I reviewed some parts of the proofs provided in the supplementary material.
Relation to Existing Literature
This work improves the literature on max-min fairness in bandit learning.
Missing Important References
Not that I know of.
Other Strengths and Weaknesses
Strengths: The regret bounds improve the state of the art. The regret bounds seem tight, at least in terms of the problem parameters.
Improvements: In future work, a fully instance-dependent algorithm/analysis could be explored.
Other Comments or Suggestions
N/A
Ethics Review Issues
This is a theoretical paper, so there are no data-related ethical issues. The paper discusses fairness, but max-min fairness is a well-established concept and is not controversial.
We thank the reviewer for the valuable and detailed comments. Please see our response below.
Q1. Do you assume knowledge of in the collision-based communication?
In the collision-based communication discussed in Remark 1, we assume the algorithm knows the order of such that the algorithm can use exactly rounds in each communication phase. However, when the target max-min matching is unique, we can design an algorithm with increasing communication lengths that still achieves communication regret. Specifically, after the -th exploration phase, the algorithm enters a communication phase of length , which is determined by the estimation precision of order . The exploration phases end once the precision order reaches , and the corresponding communication length is , which matches the result obtained when is known.
I thank the authors for clarifying my doubts. A way to make the communication work for the general case would be great, but I understand if that is out of scope for this work. Maybe the authors can add the above response to the main paper or appendix (as appropriate). I will maintain my positive score.
This paper studies the multi-player multi-armed bandit problem in a heterogeneous setting with collisions, focusing on max-min fairness. Instead of maximizing the total reward, the goal is to maximize the reward of the player who receives the lowest reward, ensuring fairness. The contributions are as follows: (i) a new algorithm that achieves the optimal regret bound; (ii) a regret lower bound; (iii) a demonstration of the algorithm on synthetic datasets.
Update after rebuttal
After the rebuttal, I will maintain my score as the contribution of this work remains clear.
Questions for Authors
- It is unclear how to explore the remaining arms when . How can be constructed without causing collisions?
- In the definitions of and , does each matching instance for maximization exclude cases where collisions occur?
Claims and Evidence
The claims are clear.
Methods and Evaluation Criteria
The method and evaluation criteria make sense.
Theoretical Claims
I reviewed the theoretical claims in the main paper but did not check the detailed proofs in the appendix. However, the arguments in the main part appear well-structured and logically sound.
Experimental Design and Analysis
Experimental designs are reasonable.
Supplementary Material
I did not review the supplementary material.
Relation to Existing Literature
They propose a novel algorithm that achieves the optimal regret bound in MP-MAB, which previous algorithms for this problem did not.
Missing Important References
It would be beneficial to reference matching bandits. Although the objectives of matching bandits and MP-MAB differ, elimination-based algorithms have been explored in matching bandits, similar to the approach used in the proposed algorithm.
Other Strengths and Weaknesses
Strengths
- The paper introduces a novel algorithm for achieving optimal regret.
- The paper provides a lower bound for regret.
- The paper demonstrates the algorithm using synthetic datasets.
Weaknesses
- I cannot find a weakness.
Other Comments or Suggestions
Typo: in Line 5 of Algorithm 2, \mathcal{P} --> \mathcal{K}_{m'}.
We thank the reviewer for the valuable and detailed comments. Please see our response below.
Q1. It is unclear how to explore the remaining arms when . How can be constructed without causing collisions?
When , we can still construct matchings over the remaining arms without collisions. Specifically, the situation can be viewed as if there were remaining arms but the last arms had all been eliminated. We can then directly apply the Assign Exploration algorithm (Algorithm 2). When a player would select an arm whose index exceeds the number of remaining arms, it instead turns to select (Line 11).
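As a generic illustration of how such collision-free exploration can be scheduled, here is a simplified sketch of a rank-plus-shift rule. This is our own illustration, not the paper's Algorithm 2, and it does not reproduce the boundary rule referenced in Line 11 above.

```python
def round_robin_arm(player_rank, sub_round, remaining_arms):
    """Pick an arm for this player in this exploration sub-round.

    Assumes every player knows its own rank in 0..N-1 and all players share the
    same ordered list of remaining arms. If there are at least as many remaining
    arms as players, shifting each player's index by the sub-round counter
    guarantees that no two players pick the same arm in the same sub-round.
    """
    return remaining_arms[(player_rank + sub_round) % len(remaining_arms)]
```

This sketch only covers the regular case; the boundary behavior discussed in the response above (where an index would exceed the set of remaining arms) is handled by Algorithm 2's dedicated rule and is not replicated here.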
Q2. In the definitions of and , does each matching instance for maximization exclude cases where collisions occur?
Yes, for simplicity we exclude cases where collisions occur, since the corresponding max-min reward is 0 whenever a collision exists. Moreover, a matching is defined as a set of edges in which no player or arm appears twice, which rules out collisions.
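For concreteness, writing $\mu_{n,k}$ for the expected reward of player $n$ on arm $k$ (our notation, which may differ slightly from the paper's), the collision-free max-min objective being maximized can be written as

$$
\max_{\pi \in \Pi}\; \min_{n \in [N]} \mu_{n,\pi(n)},
\qquad
\Pi = \bigl\{\pi : [N] \to [K] \ \big|\ \pi \text{ is injective}\bigr\},
$$

where injectivity of $\pi$ is exactly the requirement that no two players are matched to the same arm, so the zero-reward collision cases never need to be enumerated.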
The authors consider the multi-player multi-armed bandit setting, where N players each choose one of K arms in a cooperative but decentralized manner. The authors propose a new algorithm for this setting with optimal regret, and they include the factors of N and K in the regret bound. The authors also show a tight lower bound for this setting, showing that their algorithm is optimal. They also relax some assumptions from previous works, such as bounded distributions, , and requirements on the means of the arms.
Questions for Authors
NA
Claims and Evidence
Yes.
Methods and Evaluation Criteria
The definition of regret the authors use is consistent with previous works (comparing against the best max-min matching) and is the natural baseline for this problem.
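As a point of reference, one standard way to write this regret (consistent with comparing against the best max-min matching; the paper's exact definition may differ in details such as where expectations are taken) is

$$
R_T \;=\; T\,\gamma^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \min_{n \in [N]} \mu_{n,\pi_t(n)}\right],
\qquad
\gamma^{*} \;=\; \max_{\pi \in \Pi}\; \min_{n \in [N]} \mu_{n,\pi(n)},
$$

where $\pi_t$ is the assignment played at round $t$ and $\mu_{n,\pi_t(n)}$ is replaced by $0$ for any player involved in a collision.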
Theoretical Claims
The theoretical claims in the body of the paper seem correct.
Experimental Design and Analysis
The experiments compare the newly proposed method with the existing methods and show a drastic improvement (which I assume comes from the improved dependency on N and K). I appreciate that the authors used the same mean reward matrix that was used in previous papers, which shows that they are not cherry-picking situations where their algorithm performs well.
Supplementary Material
I skimmed but did not carefully review the proofs in the appendix.
Relation to Existing Literature
The setting of maxmin fairness in cooperative but decentralized bandits has been studied before by multiple papers, and is a natural setting for fair multi-armed bandits. The results in this paper improve the previous best regret bounds by a factor of loglog(T), which itself is not a super interesting improvement. The improvement on the factors of N and K is more interesting, as that makes the algorithm significantly more practical and also (I am guessing) leads to the significantly better performance in the experimental setting. The technical ideas in the proposed algorithm seem new and interesting, though it is not immediately obvious to me that they are useful outside of this setting.
Missing Important References
NA
Other Strengths and Weaknesses
Strengths
- The strength of this paper is that the authors give a new algorithm with a theoretical regret bound for this specific problem that improves on previous works by removing the loglog(T) factor and drastically improving the dependency on N and K.
- The matching lower bound presented in this paper also provides a complete picture of the hardness of the setting.
- The theoretical tools used in the algorithm seem both new and interesting, especially the algorithmic ideas for exploration.
- The writing throughout is clear, and the intuition for the algorithm and proofs in the body are well-written and do a good job of communicating the main ideas.
Weaknesses
- One of the main weaknesses of the paper is that the setting studied is very specific (decentralized but cooperative bandits). One of the main selling points for the authors is that their algorithm performs significantly better in terms of N and K and therefore is more practical. However, the application discussed is not very convincing to me. In wireless networks, it seems likely that the different players are either non-cooperative or have full communication. I do understand that this work is primarily a theoretical contribution (and I strongly believe the paper does present some interesting new theoretical ideas). However, despite the two previous works on maxmin fair bandits, I am not sure how much broader impact this work will have on either the bandits or fairness literature.
Other Comments or Suggestions
While maxmin fairness has been studied in these two previous works, there are other common notions of fairness that have been studied in bandit settings, such as envy-freeness and proportionality [1] [2]. It might be interesting to mention these works and discuss how these notions of fairness differ from maxmin fairness in the cooperative distributed bandits setting. While proving anything new about these fairness notions is definitely beyond the scope of this submission, it could also be interesting to discuss whether some of the algorithmic ideas from this paper could extend to fairness notions such as envy-freeness or proportionality.
[1] Yamada, Hakuei, et al. "Learning fair division from bandit feedback." International Conference on Artificial Intelligence and Statistics. PMLR, 2024.
[2] Procaccia, Ariel D., Ben Schiffer, and Shirley Zhang. "Honor among bandits: No-regret learning for online fair division." Advances in Neural Information Processing Systems 37 (2024): 13183-13227.
We thank the reviewer for the valuable and detailed comments. Please see our response below.
Q1. How much broader impact will this work have on either the bandits or fairness literature?
We emphasize that the optimal exploration design in Algorithm 2 has applicability beyond the current context. It can be effectively applied to other multi-player multi-armed bandit (MP-MAB) problems featuring heterogeneous rewards. In such scenarios, different player-arm pairs often require distinct exploration durations to achieve optimal regret. This exploration design is not limited to decentralized, cooperative bandit models; it can be extended to other setups, including those with communication channels or centralized control mechanisms.
Moreover, our lower bound analysis remains valid across all multi-player bandit settings where collisions occur. Whether the system is decentralized or cooperative, our analysis provides a reliable foundation for understanding performance limits.
Q2. Discuss if some of the algorithmic ideas from this paper could extend to fairness notions such as envy-freeness or proportionality.
We are sincerely grateful to the reviewer for bringing attention to alternative fairness objectives explored in the bandit setting, such as envy-freeness and proportionality. In response, we offer a discussion of how the algorithmic ideas presented in this paper can be extended to other fairness concepts. When fairness is examined from the players' perspective (for instance, max-min fairness and envy-freeness), these metrics can often be computed offline when the expected rewards are known. As a result, we can develop an elimination-based algorithm similar to Algorithm 1: it initially distributes exploration efforts across different arms and, once the learner has acquired sufficient confidence in its reward estimates, computes the relevant fairness metrics. Moreover, our exploration-allocation design is versatile and can be adapted to other fairness concepts, further expanding the applicability of our proposed algorithms to fair bandit problems.
Thank you for the detailed response! It could be interesting to include this discussion of the alternative fairness objectives in the final version of the paper.
This paper studies the learning problem of multi-player multi-armed bandits. The reward model is heterogeneous. Moreover, if two distinct players choose the same arm, both players receive zero reward. The goal is to minimize the max-min regret. This framework is interesting and useful for important real-world applications such as choosing channels in wireless systems. The paper is well written.
Update after rebuttal: after the rebuttal, I keep my current score.
Questions for Authors
Questions: In Section 3.1, why do and use the statistics of ?
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
Yes. I took a look at the proofs in the appendix.
Experimental Design and Analysis
N/A
Supplementary Material
N/A
Relation to Existing Literature
The theoretical framework in this work is useful for many important real-world applications.
Missing Important References
No
Other Strengths and Weaknesses
Strength: The construction of the problem instance for proving the regret lower bound is interesting.
Weakness: I highly recommend adding instance-independent regret bounds, as Delta could be quite small. Also, to evaluate the tightness of the proposed algorithm, it would be nice to have minimax regret lower bounds. Last, elimination-based algorithms have many downsides; developing UCB or Thompson Sampling-based algorithms would be more useful.
Other Comments or Suggestions
See previous box
We thank the reviewer for the valuable and detailed comments. Please see our response below.
Q1. It would be nice to have minimax regret upper/lower bounds.
We are grateful to the reviewer for highlighting the significance of deriving minimax regret bounds. Such bounds are crucial, as they showcase the algorithm's performance in scenarios where the minimum gap is extremely small. In our work, we derived instance-dependent regret upper and lower bounds, which aligns with common practice in the multi-player multi-armed bandit literature. Deriving minimax regret upper and lower bounds represents an interesting and valuable direction for future research, and we look forward to exploring this area further. A discussion of alternative approaches to deriving the minimax regret bound will be added in the updated version.
Q2. Developing UCB or TS-based algorithms would be more useful.
We recognize that algorithms based on UCB and TS are more adaptive than the elimination-based algorithm: once an arm is eliminated in the elimination-based algorithm, it will no longer be explored. However, it is important to note that in the heterogeneous multi-player bandit setting, the elimination-based algorithm excels at distributing exploration efforts among different players in a round-robin manner. This is a crucial step for conflict resolution and for adapting to a decentralized environment. In contrast, UCB or TS-based algorithms are more prone to collisions and to non-uniform exploration patterns among players. Furthermore, designing a decentralized UCB or TS-based algorithm poses significant challenges: the absence of a platform to allocate arms in each round makes it difficult to implement such algorithms in a decentralized context.
Q3. In Section 3.1, why do and use the statistics of ?
Since this paper studies the heterogeneous multi-player multi-armed bandit setting, each player-arm pair has its own expected reward . Thus we need to design and in terms of the statistics of that particular pair in order to control the confidence radius of the corresponding estimate .
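As one standard instantiation of such pair-specific statistics (our illustration; the paper's exact constants and confidence levels may differ), with $\hat\mu_{n,k}(t)$ the empirical mean and $T_{n,k}(t)$ the number of collision-free pulls of arm $k$ by player $n$ up to round $t$, the indices would look like

$$
\mathrm{UCB}_{n,k}(t) = \hat\mu_{n,k}(t) + \sqrt{\frac{2\log T}{T_{n,k}(t)}},
\qquad
\mathrm{LCB}_{n,k}(t) = \hat\mu_{n,k}(t) - \sqrt{\frac{2\log T}{T_{n,k}(t)}},
$$

so the confidence radius shrinks only with the pulls of that particular player-arm pair, which is why the statistics must be indexed by the pair rather than by the arm alone.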
This paper studies max-min fair bandit optimization: the objective is to maximize the minimum reward achieved in a multi-player multi-armed bandit instance. The rewards are fully heterogeneous; when two (or more) players compete for the same arm in a round, they "collide" and get zero reward; this models, e.g., channel access in wireless networks. The paper develops a decentralized fair elimination algorithm that achieves a significantly improved regret upper bound of O((N^2 + K) (log T) / Delta), where there are N players competing for K arms in T rounds, and Delta in (0,1) is the minimum reward gap; a matching regret lower bound is also shown. (The earlier dependence on 1/Delta was exponential.) The algorithm relies on finding a lower bound of the max-min objective using the LCB (lower confidence bound) indices of the arms and uses this to eliminate arms with "low" UCB (upper confidence bound); the non-eliminated arms are explored. Numerical simulations show the regret improvement promised.
Both the optimal algorithm and the reasonable formulation make this a paper suitable for the conference.