In Figure 1, the best traditional CTR baseline outperforms the fine-tuned LLM on both head and tail items. However, in Tables 2 and 3, traditional CTR baselines almost never outperform the LLM-based baselines. Why does this discrepancy arise? In addition, the ClickPrompt authors claimed that semantic information helps tail-item prediction relative to traditional CTR models, which seems at odds with the tail-item result in Figure 1.
The prompt strategy is important when fine-tuning an LLM for CTR tasks, and it also influences Equation 2. Should this be considered when the authors argue that feature attribution is important?
The datasets in this paper are too small to verify the cost of SLLMCTR, because the click embedding matrix depends heavily on the batch size, which is set to a small value of 32.
Can the authors give an in-depth analysis of why the adaptive temperature and the label matching loss work?