
No rating data available.

ICLR 2025

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

Submitted: 2024-09-27 · Updated: 2024-11-03
TL;DR

We propose an algorithm that can learn a stochastic policy for offline multi-armed bandits in the data-starved case, where only a few samples are available for each arm.
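The page does not describe the trust-region method itself. The sketch below is therefore only a hedged illustration of the data-starved setting. It uses a standard pessimistic baseline that is not the authors' algorithm: pick the arm with the highest Hoeffding lower confidence bound (LCB), union-bounded over K arms. The function name `lcb_policy`, the `delta` default, and the assumption that rewards lie in [0, 1] are all hypothetical choices made for this example.

```python
import math


def lcb_policy(data, delta=0.1):
    """Return (arm, lcb) for the arm with the highest lower confidence bound.

    Generic pessimistic offline-bandit baseline (NOT the paper's method).
    data: dict mapping arm -> list of logged rewards, assumed to lie in [0, 1].
    delta: overall failure probability (hypothetical default).
    """
    k = len(data)
    best_arm, best_lcb = None, -math.inf
    for arm, rewards in data.items():
        n = len(rewards)
        if n == 0:
            continue  # no logged samples: nothing to certify for this arm
        mean = sum(rewards) / n
        # Hoeffding radius with a union bound over k arms:
        # 2k * exp(-2 n eps^2) = delta  =>  eps = sqrt(log(2k/delta) / (2n))
        radius = math.sqrt(math.log(2 * k / delta) / (2 * n))
        lcb = mean - radius
        if lcb > best_lcb:
            best_arm, best_lcb = arm, lcb
    return best_arm, best_lcb


if __name__ == "__main__":
    # Data-starved log: at most two samples per arm.
    logged = {"a": [1.0, 1.0], "b": [0.0, 1.0], "c": []}
    print(lcb_policy(logged))
```

With only two samples per arm, even the best arm's lower bound falls below 0, the trivial bound for rewards in [0, 1]. The pessimistic certificate is then uninformative, which is the regime the TL;DR calls data-starved.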

Abstract

Keywords
Multi-armed bandit, high-dimensional decision making, reinforcement learning

Reviews and Discussion

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.