No rating data available
ICLR 2025
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
TL;DR
We propose an algorithm that can learn a stochastic policy for offline multi-armed bandits in the data-starved regime, where only a few samples are available for each arm.
Abstract
Keywords
Multi-armed bandit, high-dimensional decision making, reinforcement learning
Reviews and Discussion
Author Withdrawal Notice
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.