The paper is not clear to me. For example, in Algorithm 1

It seems only are involved in the following procedure. Why sampling on line 3?
Set is not initialized or populated anywhere, but referenced on line 6. What elements does contain?
There is summation over on line 6, but does not occur in the summed term.
Do and refer to the same tuple? If so, the notation should be unified.
On line 9, argmax is applied to prompt . How does this lead to tuple ?
The elements in are prompts and are tuples . How does the operation work?
I understand line 9 calls the oracle to obtain the preference, which should be explicitly stated.
Set is populated on line 11 but the terminal condition is on line 10. To my understanding the inner loop is an infinite loop?

I encourage the authors to revise the paper for the ease of understanding.