PaperHub

Overall score: 6.3 / 10
Poster · 4 reviewers
Ratings: 5, 4, 8, 8 (min 4, max 8, std 1.8)
Confidence: 4.3 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 3.3
NeurIPS 2024

Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists

Submitted: 2024-05-15 · Updated: 2024-11-06
TL;DR

Small user collectives can effectively promote artists through simple reordering of playlists, leveraging the sequential nature of transformer-based recommendation systems.

Abstract

We investigate algorithmic collective action in transformer-based recommender systems. Our use case is a collective of fans aiming to promote the visibility of an underrepresented artist by strategically placing one of their songs in the existing playlists they control. We introduce two easily implementable strategies to select the position at which to insert the song and boost recommendations at test time. The strategies exploit statistical properties of the learner to leverage discontinuities in the recommendations, and the long-tail nature of song distributions. We evaluate the efficacy of our strategies using a publicly available recommender system model released by a major music streaming platform. Our findings reveal that even small collectives (controlling less than 0.01% of the training data) can achieve up to $40\times$ more test time recommendations than songs with similar training set occurrences, on average. Focusing on the externalities of the strategy, we find that the recommendations of other songs are largely preserved, and the newly gained recommendations are distributed across various artists. Together, our findings demonstrate how carefully designed collective action strategies can be effective while not necessarily being adversarial.
Keywords
collective action · platform power · sequential recommender systems · transformer models · music recommendation

Reviews and Discussion

Official Review (Rating: 5)

This paper explores the possibility of boosting the recommendation of a song in an automatic playlist continuation (APC) system through a collective strategy of inserting the song into the training playlists of the APC at a specific position. The paper shows that adopting a strategy that targets low-frequency contexts makes it possible to very significantly boost the exposure of the song in the output of the APC.

Strengths

  • Very interesting idea which is novel in the domain of music recommendation
  • Quite surprising experimental result that is worth sharing (the possibility of boosting a song's exposure with the DirLoF strategy).

Weaknesses

  • Limited scope:
    • The idea is tested in the limited context of automatic playlist continuation and is likely not adaptable to broader applications.
    • Only one APC model was tested, so the results may be specific to this model. It would have been a good idea to test the impact of the collective action on other models (such as the baseline proposed in the reference paper for the APC model). Also, it’s unclear whether the effect would be robust to hyperparameter changes in the APC model.
    • The method is tested in a static context (not in the real world), so the actual impact of the method on usage (for instance, whether a user exposed through the APC to a song boosted by the collective strategy would actually listen to it) is not tested (this would require access to the production system of a music streaming service, though).
  • Very limited insights on the design of the efficient strategy (DirLof) and why it works. The presentation of the strategy is actually quite unclear, while it’s central to the paper.
  • There may be other, simpler baselines that would be worth testing. The paper shows that inserting the song at the end of playlists performs worse than inserting the song randomly, which suggests that inserting it earlier in the playlist sequence may help. So a baseline that always inserts the song at the beginning of the sequence could have decent results (the very first position is likely to be avoided, as it would mean the song would never appear as the target during training of the APC, but early positions that regularly appear as targets during training could be considered).

Questions

  • Could the authors rephrase / clarify the definition of the DirLoF strategy and the rationale behind it? Something very surprising is that “the collective selects the anchor songs s0 with the smallest overall song frequency”. But the smallest song frequencies likely belong to songs appearing only once among playlists, and these songs are then mostly noise in playlists, so it's hard to see why those anchors would help improve the recommendation of the inserted song. Also, statistics on low-frequency songs are likely much more difficult to obtain (the statistics will be much less reliable than for popular songs).
  • It’s very unclear what the distributions over playlists are ($P$, $P_0$, $P^*$). It seems they are almost used as synonyms of the dataset, and the expectations are actually just empirical expectations. I'd like the authors to clarify this point and to explain why it's necessary to introduce such notation.
  • In Table 2, it's unclear how statistics are estimated when there is no training data to compute them from (while there is still a significant amplification).

Limitations

  • As mentioned in the weaknesses, there are other simple baselines that would be worth testing. That would help support the claim that the specially designed DirLoF strategy is actually efficient.
  • Also, the authors should comment on the possible transferability of their results to other APC models and to modifications of the model's hyperparameters.

On the ethical side, the method is presented a bit too much as a way for artists to attack a recommender system, somewhat re-rank their songs, and make money out of it, which has important ethical implications in terms of fairness among artists: as the payment system of most streaming services is based on subscriptions (so a fixed amount of money), artificially boosting an artist comes at the detriment of other artists. I would rather avoid this message of "opportunity for artists" in the paper (and even in the title) and instead turn it into a warning to music streaming platforms that recommender systems may lack robustness to attacks and that they should be aware of it and take action. I think presenting the paper in this second way would solve this ethical issue, but the current message is, to me, ethically borderline.

Minor comments

  • The authors claim that “most large platforms have shifted from relying on collaborative-based models for APC to building deep learning based recommenders …” but 1) the references are only Spotify-based, and 2) the references are research papers, which doesn't mean that the shift actually happened or that platforms no longer use a collaborative-based component in their complex architectures.

  • In equation (1):

    • The authors likely want to specify that the recommended song shouldn't be part of the playlist, i.e., $s' \notin P$ in the argmax.
    • The argmax actually corresponds to the top K elements in terms of similarity, which is quite straightforward. But the sum in the equation makes it not very clear at first read. It could be worth stating this explicitly before the equation (see the sketch after these comments).
  • There are several notations that use the letter h (song embedding, playlist embedding, playlist mapping). I think using different letters may help make things clearer.

  • In Figure 3, it seems like s* is substituted for s_i (there is no s_i in (b)), while just before equation (4) it is said that only insertions are considered.

  • “Existing data poisoning techniques, for example, operate in different settings, not complying with Constraint 1 and typically assuming white-box access to the model or involving test-time profile manipulations.” ⇒ this needs references.

  • “Thus, collective action can make a tremendous difference for these artists: suppose an artist's song is streamed 10,000 times, yielding a revenue for the artist of $40 for royalties of $0.004 per stream [32]; an amplification of 25 would increase this revenue to $1,000.” I understand this is purely illustrative, but the figures won't reflect much of the truth. First, $0.004 per stream is an average that depends on the platform, but also on the kind of registration of the user and on the country of the user, with quite important variations. Second, several platforms (Spotify, Deezer…) are shifting to a payment model where not all streams have the same value; in particular, recommended streams are “discounted”, and some “real” artists may get a boost. Once again, I get that it's for illustrative purposes, but it brings confusion about how the music payment system works and should then be avoided, in my opinion.

  • “Moreover, when considering s* as a relevant recommendation, collective action even enhances the system's performance.” Isn't this completely obvious? If the exposure of s* is boosted (which happens, given the amplification), other recommendations are barely affected (as shown in Figure 6, solid line), and you consider s* a valid recommendation, then for sure recommendation metrics will increase, won't they?
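To make the top-K reading of equation (1) concrete, here is a minimal sketch of scoring by embedding similarity with songs already in the playlist excluded from the argmax. All names (`recommend_top_k`, `song_embs`, `playlist_emb`) are illustrative and not from the paper; the paper's actual scoring function may differ.

```python
import numpy as np

def recommend_top_k(playlist_emb, song_embs, playlist_songs, k=10):
    """Score every song by similarity to the playlist embedding and return
    the k best, excluding songs already in the playlist (s' not in P)."""
    scores = song_embs @ playlist_emb           # one similarity score per song
    scores[list(playlist_songs)] = -np.inf      # enforce the exclusion constraint
    top_k = np.argpartition(scores, -k)[-k:]    # indices of the k largest scores
    return top_k[np.argsort(scores[top_k])[::-1]]  # sorted by descending score

# toy usage: 1,000 songs, 32-dim embeddings, playlist already contains songs 3 and 7
rng = np.random.default_rng(0)
song_embs = rng.normal(size=(1000, 32))
playlist_emb = rng.normal(size=32)
print(recommend_top_k(playlist_emb, song_embs, {3, 7}, k=5))
```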

Author Response

Thank you for the careful reading and the detailed feedback. We have performed additional investigations in response to your suggestions. Let us elaborate on the individual comments below.

Other baselines. We have implemented baselines where the song is placed in earlier positions in the playlist. The resulting amplification is shown in Table 1 in the supplementary PDF. None of the simple baselines is better than random.

Robustness to hyperparameters. We have modified several hyperparameters of the Deezer model (number of attention heads, learning rate, dropout rate, weight decay). The results, presented in Table 3 of the supplementary PDF, show that the DirLoF strategy consistently outperforms the random strategy across all configurations.

These results are not surprising, as the strategy builds on a generalizable statistical intuition rather than specifics of the model architecture. This makes the strategies robust to model configurations, provided the model approximates the conditional probabilities in the training data sufficiently well.

Rationale behind DirLoF. A sequential recommender is trained to predict the K most likely songs to follow a given seed context. The lever that DirLoF exploits is that certain contexts are overrepresented in the playlists the collective controls. Taking the random baseline as a reference, Fig. 4 shows that for $\alpha < 0.001$, randomly inserting $s^*$ is unlikely to lead to recommendations at test time. But by concentrating effort on overrepresented contexts, the collective can meet the threshold more often. To identify overrepresented contexts, DirLoF uses the fact that certain songs with global frequency $< \alpha$ still appear in the collective's playlists. DirLoF targets contexts that end on such overrepresented songs.
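As a minimal sketch of the selection rule just described (not the exact algorithm; see Algorithms 1 and 2 in the supplementary PDF): each controlled playlist receives $s^*$ right after its globally rarest anchor song. All names here are illustrative.

```python
from collections import Counter

def dirlof_insertions(collective_playlists, global_freq, target_song):
    """For each controlled playlist, insert the target song directly after
    the anchor song with the smallest global frequency (DirLoF heuristic)."""
    modified = []
    for playlist in collective_playlists:
        # anchor = song in this playlist that is rarest in the overall data
        anchor = min(playlist, key=lambda s: global_freq.get(s, 0))
        pos = playlist.index(anchor) + 1
        modified.append(playlist[:pos] + [target_song] + playlist[pos:])
    return modified

# toy usage: two controlled playlists and toy global statistics
global_freq = Counter({"a": 900, "b": 450, "c": 3, "d": 120})
playlists = [["a", "c", "b"], ["d", "a"]]
print(dirlof_insertions(playlists, global_freq, "s*"))
# [['a', 'c', 's*', 'b'], ['d', 's*', 'a']]
```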

As you mentioned, DirLoF relies on global frequency estimates of songs in the overall training data, and estimating small frequencies from few data points is hard in general. To address this, we have implemented the strategy with only approximate statistics and find that it is still effective; knowing only 10% of the data is sufficient to achieve decent gains (see Fig. 12 in the Appendix). Even scraping public song statistics that do not match the year of the data (we can only scrape current statistics, e.g., total number of streams) provides some useful signal. This convinced us that it is a worthwhile and feasible approach for small collectives to pursue.
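As a rough illustration of what working with approximate statistics could look like: estimating global song frequencies from the fraction of playlists the collective can observe. The helper name and sampling setup are illustrative only.

```python
import random
from collections import Counter

def estimate_global_freq(observable_playlists, sample_fraction=0.1, seed=0):
    """Estimate global song frequencies from a partial view of the playlists
    (e.g., scraped public playlists covering ~10% of the data)."""
    rng = random.Random(seed)
    sample = [p for p in observable_playlists if rng.random() < sample_fraction]
    return Counter(song for playlist in sample for song in playlist)

# toy usage: estimates from a 10% sample preserve the popularity ordering
playlists = [["a", "b"], ["a", "c"], ["b"], ["a"]] * 250
print(estimate_global_freq(playlists).most_common(3))
```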

Notation. The notation ($P$, $P^*$, $P_0$) serves to frame our work within the original framework of algorithmic collective action by Hardt et al. 2023. There is a dataset $D$ of $N$ playlists sampled from a universe of playlists $P_0$, and a fraction $\alpha$ of these playlists is modified according to the strategy $h$. As a result, the platform does not observe samples from the base distribution, but from a mixture $P$. Training is done using empirical expectations. At test time, the model is evaluated against unseen samples from $P_0$. We will make this clearer in the write-up.

Table 2. We show average recommendations of (anchor) songs at test time and the difference in recommendations ($\Delta R$) with vs. without collective action. These statistics are computed based on results from 5 different train-test fold splits. For each fold, we report the mean and the 95% confidence intervals.

Questions related to framing and fairness:

The focus of algorithmic collective action is to demonstrate that participants can exert control over the algorithm a platform deploys through strategic data reporting. This is a powerful message because it gives users a lever in an otherwise highly imbalanced power relationship. As the title suggests, we show how ‘participants can promote songs through sequence reordering’.

As you point out, how this lever is used can lead to different solutions. The redistribution of recommendations is inherently zero-sum among artists due to the limited budget of recommendations. But it is worth noting that this applies to any updates to a recommendation algorithm, as well as features like ‘Showcase’ on Spotify where one can pay for recommendations.

In an ideal world, this could be used for bottom-up unfairness mitigation in recommender systems; however, a positive effect on overall welfare is by no means guaranteed. But it is not justified to assume that every update a platform implements is beneficial and ethical, or that every update participants promote is harmful. Thus, we think it is valuable to take an approach complementing much of the existing literature and present collective action as a potential opportunity, rather than only a threat, as users often pursue legitimate interests. Understanding how interactions can be designed to promote positive change while suppressing exploitation is a very intriguing question. What our work contributes is empirical evidence that the fully adversarial model is not sufficiently nuanced to address this question.

Collective action increases system performance. This is only obvious once you take the perspective of participants being economic agents pursuing their own legitimate interests. If incentives were completely misaligned such a gain would not be possible.

Example about revenue. The purpose of this example is to illustrate how recommendations relate to monetary value. A study of collective action is incomplete without discussing these incentives, and it is hard to deny that more recommendations imply more revenue. But we will make this point without explicit examples to avoid misunderstanding.

We hope to have convinced the reviewer of the robustness of our proposed strategies and the valuable perspective for the NeurIPS community. In any case, we will dedicate a separate section in the final version to expand on the results in Appendix D.4 and discuss effects on other artists in more detail, making negative externalities and the need for more work very clear. Thank you for the feedback!

Comment

We are glad we could address most of your concerns with the additional experiments and clarifications. Let us follow up on the last one, related to the potential for abuse of collective action strategies. We agree with the reviewer that misuse is a concern and should not be understated. But we still think it is important that the potential for abuse does not prevent the community from working on algorithmic collective action as an interesting and valuable research direction.

That being said, we have revisited the paper with your comments in mind, and we can see how it could benefit from additional discussion and a broader perspective. We are happy to use the additional page in the final version to move the broader impact statement to the main body, as suggested by the ethics review, to acknowledge the limitations more prominently. While we have mentioned the threat that a single individual could control many playlists, we will also be more specific that a popular artist could have an advantage in gathering large collectives. To further mitigate your concern, we will also expand the study of negative externalities on other artists in Section 4.3. We hope this is satisfactory and that you consider voting for acceptance after these changes.

Comment

Thanks for the extra comment. Based on this, I will increase my ratings.

Comment

I'd like to thank the authors for the very detailed answer to my review. Most of my concerns are now addressed. However, I have to disagree on one of the most important ones: presenting the proposed method as a way to empower users to support their artist is, to me, oversimplistic and a bit disconnected from the reality of streaming services. Collective actions (not linked to recommendation) to divert revenue have been widely used in the music streaming industry, first by legitimate artists (see the Sleepify album by Vulfpeck) but then (much more widely) by fraudsters. Both cases are unfair because they trick the payment system, but it's obviously even worse when done by fraudsters. The proposed method also amounts to tricking royalty payments through the recommender system, which raises important fairness issues and is obviously also open to fraudsters (who usually control many accounts); it could also be used by major labels (which have strong marketing power) to divert revenue. I think, then, that the work has critical ethical implications that must be discussed and acknowledged, and that the positioning of the proposed method as an opportunity is problematic. I won't change my rating if this aspect is not considered in the paper.

Official Review (Rating: 4)

The authors propose a strategy for streaming platform users to act collectively in order to promote targeted songs. The promotion efficacy is measured by the targeted songs' boost in recommendation frequency at test time. The strategy is shown to be effective through simulation experiments. Another finding is that the strategy has minimal impact on the performance of the recommendation system as a whole, i.e., it preserves the user experience for non-targeted songs.

Strengths

  • This is a novel idea, presented in an interesting domain with a good amount of related work.
  • The writing of the paper is clear. The motivation is sound.
  • The experiments performed by the authors successfully verify the efficacy of the collective strategy.

Weaknesses

  • Limited scope. See limitations below.
  • Lack of technical novelty and contribution. The evaluation result would make a good report, but I would recommend the authors seek publication at a different conference.

Questions

The context selection methods discussed in Section 3.2 seem pretty arbitrary to me. What reason or motivation made you choose the InClust or DirLoF approach? How would you prove these are guaranteed to work?

What if I said a good approach could be to find the most similar song (embedding similarity higher than x) that is at least y popular, and place the targeted song z positions after it (z could be -1, 1, 2, 3)?

Limitations

The strategy could be effective, but it is built on the assumption that the serving platform has deployed minimal controls against collective behavior. As far as I am aware, various user-side anomaly detection mechanisms are usually deployed in production in streaming services like the one described in the paper. For example, there could be real-time monitoring of such collective behaviors. Such an anomaly detection system could be constructed by building a user-entity graph; in this use case, we could use artists or songs as the entity. When, in a short period of time, there are bursty events related to an entity (users promoting a song in their playlists), it can be an important indicator that something unusual is happening (because this is not a popular song, it does not receive so many promotions on average). A follow-up action in the system could tag the relevant data as "spurious" so it never enters the training data of the transformer model (for continuous training). As a result, the collective action could have a significantly smaller effect when such monitoring is turned on, depending on how the thresholds are set.
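To make this concrete, a hypothetical sketch of such a burst monitor; the names and thresholds are invented for illustration and do not describe any deployed system.

```python
from collections import Counter

def flag_bursty_songs(window_adds, baseline_adds, ratio=10.0, min_adds=50):
    """Flag songs whose playlist additions in the current window exceed
    `ratio` times their per-window baseline (burst heuristic)."""
    flagged = set()
    for song, count in window_adds.items():
        expected = max(baseline_adds.get(song, 0), 1)  # avoid division by zero
        if count >= min_adds and count / expected >= ratio:
            flagged.add(song)  # e.g., exclude these additions from training data
    return flagged

# toy usage: an unpopular song suddenly added to 500 playlists in one window
baseline = Counter({"s*": 2, "hit": 400})
today = Counter({"s*": 500, "hit": 420})
print(flag_bursty_songs(today, baseline))  # {'s*'}
```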

Author Response

Thank you for your feedback. Let us elaborate on your comments below.

Intuition for the proposed strategies. The design of our strategies relies on the idealized assumption that transformer-based models perform next-song prediction by learning to model the conditional probability of songs, given context, in the training data. Getting the song among the K most likely next songs implies a recommendation for similar contexts at test time. With this in mind, it makes sense to select a context $c$ and put the song $s^*$ right after this context to influence the conditional probability $p(s^* \mid c)$. DirLoF and InClust are two different ways to select the contexts $c$ to target. DirLoF exploits the overrepresentation of certain contexts in the playlists of the collective to pick those where it can make a larger difference. InClust targets a set of similar contexts that appear multiple times in the collective's playlists and targets them in a coordinated fashion. These are two principled levers from a statistical perspective. How we implement them is designed to make them practical.

  • DirLoF. DirLoF targets contexts that end on a song that occurs infrequently in the overall training data (typically once among the playlists of the collective). If these contexts have an overall probability smaller than $\alpha$, they are overrepresented. Given the long-tail nature of the song distribution, such songs are not that infrequent. Compared to inserting the song at random positions, the top-K threshold can be met more often this way (see Fig. 4).
  • InClust. InClust targets contexts that precede a song $s_0$ that frequently occurs in the playlists controlled by the collective. Thereby it implicitly targets contexts that are similar from the perspective of the recommender, and close to $\phi(s_0)$ in embedding space, by how the model is trained (a minimal sketch of this selection rule follows below).
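A sketch of the InClust rule under the same caveats as before (illustrative names, not the exact implementation; see the pseudo code in the supplementary PDF): the target song is inserted directly before the song most frequent across the collective's playlists.

```python
from collections import Counter

def inclust_insertions(collective_playlists, target_song):
    """Insert the target song directly before the song that occurs most
    frequently across the collective's playlists (InClust heuristic)."""
    counts = Counter(s for p in collective_playlists for s in p)
    s0, _ = counts.most_common(1)[0]  # most popular song within the collective
    modified = []
    for playlist in collective_playlists:
        if s0 in playlist:
            pos = playlist.index(s0)  # place the target right before s0
            playlist = playlist[:pos] + [target_song] + playlist[pos:]
        modified.append(playlist)
    return modified

# toy usage: "b" is the collective's most common song
playlists = [["a", "b", "c"], ["b", "d"], ["e", "f"]]
print(inclust_insertions(playlists, "s*"))
# [['a', 's*', 'b', 'c'], ['s*', 'b', 'd'], ['e', 'f']]
```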

Alternative strategy. There might be other ways to design strategies, but we offer a strong initial baseline for a widely underexplored question. In contrast to the strategy you proposed, it is crucial that our approaches do not rely on knowledge of the embedding function. The collective does not require access to the model parameters to implement our strategies, nor to train a surrogate model to approximate them. It just needs to gather song statistics to identify the contexts worth targeting.

The fact that the strategies are designed based on a statistical intuition about sequential generation implies that they are not specific to the model architecture. We demonstrate this robustness with additional experiments where we vary the hyperparameters of the model (see Table 3 in the supplementary PDF). With the additional ablation, we hope to have convinced the reviewer of the robustness of the assumption underlying the design of our strategies. It is also in line with our argument that the effectiveness of the strategy decreases if model training is stopped early (see Table 2 in the supplementary PDF).

Theoretical guarantees. End-to-end theoretical guarantees, beyond intuition in an idealized setting, are a lot to ask for, given that related data poisoning strategies, including popular shilling attacks, rarely come with any guarantees in the context of deep-learning-based recommender systems. Instead, we opted for a rigorous empirical evaluation using the example of an industry-scale recommender system that has been deployed in production.

Anomaly detection. We agree that countermeasures could reduce the effectiveness of our strategies. However, they usually come at a cost for the firm. What makes the problem interesting is that it is not a zero-sum game between the platform and the users, in contrast to adversarial attacks. It is possible that agents pursue legitimate interests. In fact, our work is the first to provide empirical evidence that collective action strategies are not necessarily zero-sum. With this in mind – inspecting which use cases the platform would want to protect against is an interesting and open question for future work.

Contribution. We are the first to study the impact of authentic sequence modifications on sequential recommender systems. We show that such strategies are effective and might be a realistic aspect we have to take into account when building future systems. Thus, our work empirically carves out a novel space of questions around machine learning in a broader social-economic context. We provide solid empirical evidence that the typical adversarial threat model is not applicable to algorithmic collective action and call for a more nuanced study of incentives around ML models. This seems relevant for the NeurIPS community and an important contribution. Is there anything else we can do to convince you of this?

Official Review (Rating: 8)

The paper shows that strategic collective action by a small fraction of the population can lead to significant amplification of a particular song in a recommender system. The authors propose two strategies for the collective (for a transformer-based song recommender) that achieve this amplification.

Strengths

  • The setup is original and focuses on strategic collective action by a fraction of users in a recommender system and how this can increase a target song’s reach.
  • The performance loss for the platform is negligible; the collective's strategies are not adversarial (e.g., no fake profiles or artificial perturbations) and are based on a 1-edit distance to the original playlist. The paper shows that recommendations are largely preserved.

Weaknesses

  • The experiments follow the MF-Transformer in [7]; to make the paper self-contained, it would be beneficial to have a description of $\phi(\cdot)$ and of $g(\cdot)$ and the loss function in Section 2.1 or the Appendix.
  • I found the strategies in Sec 3.2 hard to parse, perhaps a figure showing the original playlist and the possible changes a user in the collective can do under the two strategies would be helpful.
  • Minor: I think the notation $h(\cdot)$ is overloaded for the song/playlist embedding in 2.1 and for the strategy mapping which inserts s* into a playlist. Also, Fig. 6 could use different colors for the different strategies.

Questions

  • Can you clarify the intuition behind, and the communication required by the collective for, the two playlist modification strategies in 3.2? InClust seems to place the target song before the most popular songs in the collective's playlists, whereas DirLoF places the target song after a low-popularity song? Both strategies provide amplification in the experiments. How much communication among the collective does each require, and is one modification practically easier?
  • Section 1.1 mentions “Our strategies exploit the properties of the attention function in the transformer model, without requiring knowledge of the model parameters”, can you elaborate?

Limitations

The authors discuss limitations in Appendix A.

Author Response

Thank you for the feedback, we will incorporate your suggestions and adjust the notation. In response to your comment, we also decided to add pseudo code to make the strategies more clear. It can be found in the supplementary PDF.

In the following let us elaborate on your questions:

Our strategies exploit the properties of the attention function in the transformer model, without requiring knowledge of the model parameters. The assumption we operate under when designing the strategies is that the system aims to approximate the conditional probability of the next song, given prior songs within a given context window. We place the target song after a specific context with the goal that the model picks up on these correlations from the training data. This is a probabilistic assumption on the sequence model and it is not specific to the details of the model architecture or the training algorithm used. Thus, it is not surprising that the performance does not change much as hyperparameters of the model are varied (see the new Table 3 in the supplementary PDF). It is worth noting that similar approaches have proven to be very fruitful when designing collective action strategies in classification, where strategies have been designed under a Bayes-optimality assumption (see Ref. [24] in the manuscript).

Intuition for the strategies. Within the context of the last paragraph, our strategies implement two heuristics to select the context $c$ after which to place $s^*$. For comparison, it is useful to have the random baseline in mind: for $\alpha < 0.1$, placing the song $s^*$ after a randomly selected context is not sufficient for it to be among the top K for any of the contexts (it leads to no recommendation at test time, see Fig. 4).

  • The DirLoF strategy aims to exploit contexts that are overrepresented in the data the collective controls to increase the chance of meeting the top-K threshold. To this end, it selects contexts $c$ that end on a low-frequency song (for small collectives, these typically appear only once in the controlled playlists). If these contexts have an overall probability smaller than $\alpha$, it means that they are overrepresented. As a result, targeting these contexts is more likely to result in a recommendation at test time compared to random selection, which is what we see in the experiments.
  • The InClust strategy aims to target, in a coordinated fashion, contexts that appear multiple times among the playlists of the collective. The intuition is that for contexts $c$ that precede a song $s_0$ that is popular within the collective, the context embeddings $g(c)$ form a cluster around the song embedding $\phi(s_0)$, due to the way embeddings are trained. As a result, the collective can achieve large mass for this specific region of the context space by placing $s^*$ before $s_0$. The hope is that this is sufficient to meet the top-K threshold for similar contexts at test time. Here we find that $\alpha > 1\%$ is necessary for this to be effective. The ablation in Figure 5 illustrates how this strategy increases the similarity between $s^*$ and the targeted contexts, leaving other similarity values largely untouched.

Communication. Both strategies require communication among the members of the collective to coordinate on a target song $s^*$ and to coordinate the execution time. In addition, each strategy requires one more statistic to be gathered: the InClust strategy requires participants to pool information about their playlists to identify the songs that occur most frequently in the playlists they control – no additional data needs to be collected. The DirLoF strategy requires global song statistics to get a sense of which songs are overrepresented. These statistics are then shared with the other members of the collective. We envision the aggregation and communication of song counts happening through an app or an online forum. Notably, there is a growing literature on developing tools for coordinating platform participants (see, e.g., [1,2]) which could be used here.

While both strategies are easy to implement, DirLoF comes with some additional overhead for identifying low-frequency songs. These statistics need to be gathered from external sources but can be shared among the participants. For example, approximate statistics based on scraping public playlists and streaming statistics are sufficient for the strategy to be effective (Fig. 12). Beyond this, neither strategy requires access to model weights, the model architecture, or the training algorithm used. And neither strategy requires any computations beyond simple song counts, as they do not use surrogate models.
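As a toy illustration of this pooling step, assuming members exchange plain song-count dictionaries; the aggregation channel (app or forum) is abstracted away, and all names are illustrative.

```python
from collections import Counter

def pool_song_counts(per_member_counts):
    """Merge each member's local song counts into collective statistics.
    In practice this exchange could happen via an app or online forum."""
    pooled = Counter()
    for counts in per_member_counts:
        pooled.update(counts)
    return pooled

# toy usage: three members report counts from the playlists they control
members = [Counter({"a": 3, "b": 1}), Counter({"b": 2}), Counter({"a": 1, "c": 1})]
pooled = pool_song_counts(members)
print(pooled.most_common(1))  # [('a', 4)] -> InClust's in-collective target s0
```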

References:

[1] Do, K., De Los Santos, M., Muller, M., & Savage, S. (2024, May). Designing Gig Worker Sousveillance Tools. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-19).

[2] Imteyaz, K., Flores-Saviaga, C., & Savage, S. (2024). GigSense: An LLM-Infused Tool for Workers' Collective Intelligence. arXiv preprint arXiv:2405.02528.

Comment

Thanks for addressing my questions comprehensively and for the figures for the two strategies, I am raising my score to a strong accept.

As a general note directed to another reviewer, it would be nice if as a community we provide constructive feedback and improvement, and not attacks like "good as a report ... seek publication in a different conference".

Official Review (Rating: 8)

This research work proposes a novel solution to promote songs on music streaming platforms strategically.

Under the following assumptions:

  1. Fans can collaborate to promote a specific song by collectively reordering playlists.
  2. The visibility of a song in a playlist affects its recommendation frequency.
  3. Users are influenced by the position of songs in playlists when making listening choices.
  4. The impact of collective action on song visibility is measurable and significant.

The authors suggest that fans strategically reorder playlists to promote a targeted song, thereby increasing its visibility in the recommender system. By leveraging algorithmic collective action, even small groups of fans can substantially impact the recommendation frequency of the promoted song. This strategy aims to enhance the visibility (capability of being discovered) of songs and artists, which will benefit both fans and musicians in the music streaming industry.

The evaluation focuses on quantifying the amplification of recommendations achieved by strategically placing songs in playlists, using metrics such as recommendation probability and change in the number of recommendations for a song. The evaluation also includes the impact on the recommendations of other songs and the overall performance of the recommender system. The analysis of results reveals that the collective action strategies can lead to a substantial increase in the recommendation frequency of the targeted song, with up to 25x higher recommendation probability compared to average songs.

The main contributions are:

  1. The paper introduces two innovative collective action strategies where participants strategically insert a target song into their playlists to promote an emerging artist. These strategies aim to increase the recommendations of the targeted song at test time, thereby boosting the artist's visibility on the platform.

  2. The research demonstrates that even small collectives, controlling less than 0.01% of the training data, can achieve significant amplification of recommendations by strategically placing songs in playlists. This finding highlights the effectiveness of algorithmic collective action in promoting songs without major disruptions to the user experience.

  3. Preservation of other recommendations: the study reveals that while promoting a specific song through collective action, the recommendations of other songs are largely preserved. This indicates that the proposed strategies can enhance the visibility of targeted songs without significantly compromising the overall recommendation system's performance.

Strengths

  • Its innovation is a significant strength, as it provides a new approach to increasing the visibility of emerging artists on music streaming platforms.
  • Empirical validation and real-world application: the research is empirically validated using an open-source APC model deployed on Deezer, a platform with millions of users.
  • Important result: the study demonstrates that even small collectives can achieve substantial amplification of recommendations by strategically placing songs in playlists.
  • The findings show the potential for diverse artist promotion, which can make use of the platforms fairer and also fight the long-tail problem in recommender systems. It can also help the serendipity effect.

Weaknesses

  • The paper assumes that users are influenced by the position of songs in playlists when making listening choices. This assumption may oversimplify user behavior and overlook other factors that influence song recommendations and user engagement, potentially leading to biased results.

Questions

Even if the problem is not exactly the same, could you relate your work to the one described in this citation: Walid Bendada, Guillaume Salha-Galvan, Thomas Bouabça, and Tristan Cazenave. 2023. A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23). Association for Computing Machinery, New York, NY, USA, 464–474. https://doi.org/10.1145/3539618.3591628 ?

Limitations

The paper does not explicitly discuss possible limitations of the approach to addressing problems of privacy and fairness. However, considering the ethical implications of data manipulation and collective action in recommender systems is crucial for ensuring transparency and equity in algorithmic interventions.

Author Response

Thank you for your feedback and the positive assessment of our work.

To respond to your question, we relate our work to Bendada et al. (2023). The authors describe a transformer-based recommender system for automatic playlist continuation (APC) that Deezer has deployed in production. The model implements the represent-then-aggregate paradigm to achieve better scalability, and the authors demonstrate that the model achieves state-of-the-art performance on the Spotify Million Playlist Dataset. We use their model as a case study for our work. It is one of the few industry-scale APC models that are publicly available, allowing us to retrain it and inspect the effect of playlist modifications on recommendations. In contrast to Bendada et al. (2023), we are interested in the sensitivity of the model to strategic playlist modifications in the training data, and we inspect the recommendations of specific artists. This is a dimension that has not been considered in prior work, which focused on aggregate performance metrics of the model trained on a fixed dataset. We are the first to focus on algorithmic collective action in this context.

Regarding limitations, we will add a dedicated section in the future version to discuss the limitations of collective action and the negative externalities on other artists in more detail. We refer to the response to Reviewer tKsL for an extended discussion. We agree that this is valuable to put our work in context.

Comment

Thank you very much for your reply

Author Response

We would like to thank all reviewers for their insightful feedback. Based on the reviewers' comments, we ran additional experiments and made some updates to the write-up, which we describe in detail in the individual rebuttals.

To support our discussion, we provide additional experiments and illustrations in the supplementary PDF. Figure 1, along with Algorithms 1 and 2, offers a visual explanation of the proposed strategies. Additional empirical results, including ablations related to model hyperparameters and the evaluation of alternative baseline strategies, can be found in Tables 1-3.

Final Decision

The paper considers a novel approach for a collective to promote songs on music streaming platforms based on automatic playlist continuation.

Several issues were raised by the reviewers, including limitations of the proposed techniques in real systems due to anomaly detection, and the potential of being misused by malevolent actors. While these issues are valid, the novelty factor of the paper and the potential of generating fruitful discussion and further research on the topic support the acceptance of the paper.

We strongly advise the authors to revise the paper according to the author-reviewer discussions, with a focus on the limitations and the ethical concerns raised.