PaperHub
Overall score: 6.6/10
Poster · 4 reviewers · Ratings: 3, 4, 4, 3 (min 3, max 4, std 0.5)
ICML 2025

E-LDA: Toward Interpretable LDA Topic Models with Strong Guarantees in Logarithmic Parallel Time

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-07-24
TL;DR

We provide a novel non-gradient-based, combinatorial approach to estimating topic models with strong provable guarantees. We show that this solves key problems in topic modeling for social science applications.

Abstract

Keywords
Topic models; Causal inference; Social Science; Latent Dirichlet Allocation (LDA); Natural language processing; Text.

Reviews and Discussion

Review (Rating: 3)

The paper introduces a novel approach to Latent Dirichlet Allocation (LDA) topic modeling by developing a combinatorial method to efficiently infer topic assignments with strong interpretability guarantees. The proposed method, Exemplar-LDA (E-LDA), departs from traditional gradient-based approaches and instead leverages combinatorial optimization techniques inspired by clustering and data summarization. The authors claim that this method provides near-optimal posterior probability solutions exponentially faster than existing LDA algorithms, ensures interpretability by associating each learned topic with a known keyword, and is compatible with causal inference applications. The experimental results demonstrate that E-LDA achieves better topic coherence and posterior probability compared to traditional LDA and neural topic models.

Questions for Authors

NA

Claims and Evidence

  1. The paper provides provable guarantees for topic inference in LDA, addressing a long-standing challenge in probabilistic topic modeling.
  2. Unlike conventional LDA methods that often produce topics requiring manual interpretation, E-LDA ensures that each topic is formally associated with a known keyword, which significantly improves interpretability.
  3. The experimental results show that E-LDA outperforms previous LDA and neural topic models on topic coherence.

Methods and Evaluation Criteria

  1. In the experiments, how about the performance with larger numbers of topics? Like 50 to 100 topics.
  2. For the experiments, can you also evaluate the inferred topic distributions of documents, $\theta_d$?

Theoretical Claims

I checked the proofs and theoretical claims.

Experimental Design and Analysis

I checked the experimental designs. It lacks some important baselines and evaluations.

  1. The mentioned datasets are of medium size. Can E-LDA still perform well on large-scale datasets? What is its empirical execution time?
  2. In Section 9, the neural topic modeling baselines are a bit old. The authors should compare with some recent ones, like FASTopic(https://arxiv.org/pdf/2405.17978).

Supplementary Material

I reviewed the supplementary material.

Relation to Broader Literature

The contribution in this paper may provide a good alternative baseline for LDA.

Essential References Not Discussed

The reference list is a bit outdated. Consider adding some recent topic modeling survey papers:

[1] A Survey on Neural Topic Models: Methods, Applications, and Challenges
[2] Topic Modeling Algorithms and Applications: A survey
[3] The Evolution of Topic Modeling

Other Strengths and Weaknesses

NA

Other Comments or Suggestions

  1. AVITM was proposed in 2017, but the paper denotes its publication time as 2022.
  2. None of BERTopic, AVITM, and ETM is LLM-based.
  3. Figure 2 only has subcaptions.
Author Response

Thank you for your overall positive review, and for remarking that our algorithm provides "strong provable guarantees" that "address a longstanding challenge in topic modeling", and that it improves interpretability and achieves strong empirical performance.

We recognize that your remaining reservations and reason for a score of 3 instead of 4 or 5 relate to clarifying some aspects of the experiments and adding a new baseline (FASTopic). We address these aspects and add FASTopic to all plots below, and we would appreciate it if you would consider improving your score in light of our response.

Re: "In the experiments, how about the performance with larger numbers of topics? Like 50 to 100 topics."

  • Our experiments already do this, and there is a subtle aspect to clarify here. Specifically, we set all baselines' number of topics $k=100$ in all experiments (except BERTopic, where $k$ is not user-specified). For E-LDA, the overall number of topics that are assigned is not user-specified. Instead, for E-LDA, we specify $\kappa$ (note $\kappa \ne k$, see our paper lines 182-7, right column): "...we seek a MAP solution where documents have varying counts of topics, but the average number of topics linked to each document is bounded by $\kappa$. Thus, choosing a smaller $\kappa$ yields a sparser solution...". For example, if we set $\kappa=5$, then a single document is assigned approximately 5 topics on average, even though there are many topics overall (see the toy sketch below).

  • Fig. 2b on p. 8 then shows that the choice of $\kappa$ has minimal effect on performance. E-LDA obtains solutions of high coherence regardless of whether users set it to compute a sparse summary of their texts (where each document is assigned $\kappa=1$ topic, like BERTopic) or a nuanced and dense one (where each document is assigned $\kappa=10$ topics).
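To make the $\kappa$ constraint concrete, here is a toy numerical sketch (hypothetical counts, not our actual solver): $\kappa$ caps the average number of topics per document through a global budget of $\kappa \cdot |D|$ document-topic links, so individual documents may receive many or few topics.

```python
# Toy illustration of the kappa constraint (hypothetical counts,
# not the E-LDA solver): kappa bounds the *average* number of topics
# per document via a global budget of kappa * |D| assignments.
kappa = 5
docs = ["d1", "d2", "d3", "d4"]
budget = kappa * len(docs)  # 20 document-topic links in total

# A valid solution: per-document topic counts vary freely...
topics_per_doc = {"d1": 2, "d2": 9, "d3": 5, "d4": 4}
assert sum(topics_per_doc.values()) <= budget

# ...but the average stays bounded by kappa.
avg = sum(topics_per_doc.values()) / len(docs)
print(f"average topics per document: {avg} <= kappa = {kappa}")
```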

Re: "Can you evaluate the inferred topic distributions θd\theta_d of documents?"

  • In our use-case, documents' underlying topic distributions $\theta_d$ are nuisance parameters, and we do not estimate them directly (see discussion in Section 3). Instead, we infer the topic assignments $z$ from which each word and document were probably drawn. It is nonstandard to plot topic assignments, yet we recognize your point, and we have begun preparing some additional plots for our camera-ready manuscript.

Re: "The datasets are of medium size. Can E-LDA still perform well on large-scale datasets? What is its empirical execution time?"

  • Here is our performance on an additional larger dataset of 54,595 short texts (DBLP, Pan et al. 2016, mean length of 5 words), which is particularly challenging for some models:

https://imgur.com/a/LO1bUqw

On this larger dataset, E-LDA's efficient design allows it to complete in 2.8 seconds.

  • We discuss execution time briefly in paper lines 371-4 (right column). Our research-grade Python implementation of our non-parallel algorithm takes an average of 2.7 seconds on Experiments Set 1, which is ~4.6 times faster than the highly optimized Java MALLET Gibbs sampler on the same processor.

  • We could achieve additional speedups by offloading bottlenecked operations (the elementwise products in the Update subroutine) to a GPU (see the sketch after this list).

  • For very large data, our parallel algorithm offers a similar approximation guarantee in logarithmic parallel time, which is a further exponential speedup in overall runtime (see Section 6 and Appendices I and J).
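As a rough sketch of the GPU-offload idea mentioned above (hypothetical tensors standing in for the Update subroutine's tables; this is not our actual implementation), the bottlenecked elementwise products map directly onto standard GPU array operations:

```python
# Hedged sketch of offloading elementwise products to a GPU with
# PyTorch. The tensors below are hypothetical stand-ins; the Update
# subroutine's actual data structures are not shown here.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical per-candidate gain and weight tables.
gains = torch.rand(1_000_000, device=device)
weights = torch.rand(1_000_000, device=device)

# The elementwise product executes on the GPU when one is available.
updated = gains * weights
print(updated.device, updated.shape)
```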

Re: Adding a 7th baseline, FASTopic (Wu et al., NeurIPS 2024).

  • Thank you for this suggestion. Here are the plots with FASTopic (in light blue):

https://imgur.com/4GobskW

E-LDA co-occurrence outperforms FASTopic by a factor of 2.83 to 5.84 on Reuters, 3.11 to 4.77 on Congress, and 2.24 to 4.20 on NewsGroups. FASTopic is the best of the baselines on NewsGroups, and it performs similarly to BERTopic on the other datasets.

Re: Reference list is missing recent surveys [1], [2], [3].

  • Thank you for these references! We have incorporated them into our camera-ready draft.

Re: Typo where AVITM's publication date was replaced by BERTopic's in the reference in the paper, and Fig. 2 has only subcaptions.

  • Great catch! Thank you for pointing this out---we have updated our paper accordingly.

Re: our statement that we compare against an LLM-based topic model.

  • We often hear BERTopic referred to as an "LLM-based topic model", presumably because BERTopic operates on sentence embeddings generated by pretrained transformer language models. We used this description, but we share your discomfort with calling BERTopic "LLM-based", since in 2025, BERT is no longer a "large" language model (and it is trained differently than LLMs). We will revise this description to be more precise.

We believe that these clarifications and additional plots/baseline have addressed your remaining reservations, and we would appreciate it if you would update your review accordingly.

Review (Rating: 4)

The authors present a novel non-gradient-based, combinatorial approach for estimating topic model parameters. The method converges to near-optimal posterior probability in logarithmic parallel computation time, significantly faster than known algorithms for LDA. Furthermore, this manuscript demonstrates that candidate topics can be efficiently generated using standard coherence functions and that the document independence required for downstream causal inference frameworks can be guaranteed.

update after rebuttal

Thanks to the authors for their responses, which address my previous questions. Thus, I maintain my positive score on this submission.

Questions for Authors

The manuscript repeatedly emphasizes the interpretability of traditional LDA methods in comparison to neural topic models and LLM-based topic models in the introduction. I would like to request the authors to provide further explanation on this part.

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

There is no obvious problem with the theorem proofs in the submission.

Experimental Design and Analysis

The topic coherence and running speed evaluated in this submission are closely related to the topic modeling domain as well as to the contributions of this study.

Supplementary Material

I've checked all the supplementary materials.

Relation to Broader Literature

The paper intuitively reconstructs the process of topic modeling using greedy algorithms and enhances the interpretability of topic models, providing enlightening insights for related fields.

Essential References Not Discussed

The important literature relevant to the contribution of this paper has been discussed.

Other Strengths and Weaknesses

Strengths: The approach proposed in this paper enhances the LDA model from a novel perspective and offers theoretical guarantees regarding the model’s speed of convergence and its independence in downstream inference applications, demonstrating high originality and inspiration for future work in the field.

Weaknesses:

  • Previous work [1] has found that some topic models exhibit poor stability, meaning that even with similar topic quality, the models may generate vastly different topic words for different random seeds. While the manuscript emphasizes the progress of the method proposed in this paper in terms of interpretability and downstream tasks, the lack of sufficient robustness in the model may diminish the value of these innovations. I believe the authors should have conducted further analysis on this aspect.
  • Some more recent baseline methods should be included for comparison. The baseline models used in this paper, such as ETM, are commonly regarded as simple baselines, whereas in recent years, neural topic models have demonstrated significant improvements in topic quality.
  • The authors should demonstrate the model’s interpretability using some simple examples.

[1] Hoyle, A. M., Goel, P., Sarkar, R., et al. Are Neural Topic Models Broken? Findings of EMNLP, 2022: 5321-5344.

Other Comments or Suggestions

  • Figures 1 and 2 are missing captions.
  • Figure 1 seems a bit cluttered, and it is recommended that the authors reorganize it.
  • There is an extra ‘(for example)’ on line 70.
  • In the manuscript, the authors make extensive use of italics, bold, upper and lower case, and brackets to convey emphasis and additional information. I believe that the excessive use of these marks detracts from the readability of the text and I suggest that the authors could have used a more concise writing style.
Author Response

Thank you for your thoughtful and detailed review, and your comment that "The approach proposed in this paper enhances the LDA model from a novel perspective and offers theoretical guarantees...demonstrating high originality and inspiration for future work in the field."

We recognize that your main question relates to the stability of our approach, as it is well-known that many topic models obtain vastly different results from different random seeds (per the Hoyle et al EMNLP paper).

We are grateful that you raised this important point. When we presented this work a couple of weeks ago, a colleague who is a prominent social scientist studying text analysis commented that he believed the stability of our approach is a high-impact contribution for practical applications, and he noted that we do not emphasize this contribution sufficiently in our writeup. We realize that this deserves greater attention.

Our approach improves stability in three key ways:

1. E-LDA's deterministic optimization leads to perfect stability under the Hoyle et al. definition. Our approach is the first topic model with deterministic, non-gradient-based optimization. Our non-parallel algorithm, FastGreedy-E-LDA, has no random seeds, so re-running the algorithm yields an exactly identical, near-optimal solution. Under the stability metric defined by Hoyle et al. (i.e., pairwise topic distance across runs), our model therefore achieves zero distance, i.e., perfect stability.

2. Stable, monotonic convergence of posterior probability. Due to the submodularity of our objective, FastGreedy-E-LDA also exhibits stable, monotonic convergence towards its near-optimal posterior probability. Specifically, as we show in the plots in Appendix Q Figs. a and b (starting at line 1375), each iteration of FastGreedy-E-LDA improves the posterior probability, and each subsequent improvement is slightly less than the previous improvement (i.e., diminishing returns), yielding a smooth and concave convergence of the posterior objective. Conveniently, this is also true on a per-document basis: each document's posterior probability also converges monotonically.

We contrast our smooth and monotonic convergence with the stochastic convergence of any alternative topic model optimization routine. We also emphasize that alternatives may have some very low-performing runs due to local modes or an unlucky random seed, whereas our approach does not experience these issues since we provably obtain a near-optimal solution. (A toy sketch of properties 1 and 2 appears after point 3 below.)

3. Stable coherence over different solution sizes (different sparsities). Fig 2.b (paper line 401) plots the stability of E-LDA’s topic coherence as we grow its solutions from sparse to dense (i.e., over the course of optimization). We observe that running our algorithm with very different sparsity constraints yields solutions that are quite stable in terms of topic coherence.
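To illustrate properties 1 and 2 with a toy sketch (a generic deterministic greedy on a monotone submodular coverage objective; this illustrates the principle, not FastGreedy-E-LDA itself): with no randomness, repeated runs return identical solutions, and submodularity makes the per-iteration marginal gains non-increasing, which yields the smooth, concave convergence described above.

```python
# Toy sketch (not FastGreedy-E-LDA): deterministic greedy maximization
# of a monotone submodular coverage function. Two properties to note:
# (1) no random seed, so repeated runs are exactly identical;
# (2) marginal gains are non-increasing (diminishing returns).

def coverage(selected, sets):
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy(sets, budget):
    selected, gains = [], []
    for _ in range(budget):
        base = coverage(selected, sets)
        best, best_gain = None, 0
        for i in sorted(sets):  # fixed iteration order -> deterministic ties
            if i not in selected:
                gain = coverage(selected + [i], sets) - base
                if gain > best_gain:
                    best, best_gain = i, gain
        if best is None:
            break
        selected.append(best)
        gains.append(best_gain)
    return selected, gains

sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6, 7}}
assert greedy(sets, 3) == greedy(sets, 3)  # identical across "runs"
selected, gains = greedy(sets, 3)
assert all(a >= b for a, b in zip(gains, gains[1:]))  # diminishing returns
print(selected, gains)  # [3, 0, 1] [4, 2, 1]
```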

---Responses to additional questions/suggestions---

Re: Adding a more recent baseline.

  • Thank you for this suggestion. We note that reviewer ZFSN agrees and suggests FASTopic (Wu et al., NeurIPS 2024). Here are the plots with FASTopic added (in light blue):

https://imgur.com/4GobskW

E-LDA co-occurrence outperforms FASTopic by a factor of 2.83 to 5.84 on Reuters, 3.11 to 4.77 on Congress, and 2.24 to 4.20 on NewsGroups. FASTopic is the best of the baselines on NewsGroups, and it performs similarly to BERTopic on the other datasets.

Re: Figures 1 and 2 are missing captions, and there is an extra ‘(for example)’ on line 70.

  • Great catch---thank you! We've corrected both.

Re: Figure 1 seems a bit cluttered, and it is recommended that the authors reorganize it.

  • Agreed -- it had to be compressed in order to fit the 8-page submission limit. We will revise and improve layout in the camera-ready version, where we have 9 pages.

Re: use of italics, bold, upper and lower case, and brackets.

  • Thank you for this constructive feedback. Our goal was to provide a description of our techniques that could be useful to researchers in three very different fields: NLP, combinatorial optimization, and applied econometrics/social science. We noted that each of these fields has very different expectations about how the technical aspects should be described and which aspects should be highlighted. We agree with you that we can reduce the use of italics and brackets and also streamline some areas, and we are revising accordingly.
Review (Rating: 4)

The paper introduces new algorithms for Latent Dirichlet Allocation (LDA). The algorithms address the problem of inferring the topics assigned to each document in an LDA topic model. The approach is non-gradient-based and combinatorial.

The algorithms converge to near-optimal posterior probability in logarithmic parallel computation time. The approach also provides interpretability guarantees, associating each learned topic with a known keyword. Unlike alternative approaches, this approach can maintain the independence assumptions necessary for downstream causal inference methods that use the learned topic model to study topics as treatments.

The authors argue their approach returns solutions of higher semantic quality than state-of-the-art LDA algorithms, neural topic models, and LLM-based topic models across a range of text datasets and evaluation parameters.

Questions for Authors

See "Claims And Evidence" and "Methods And Evaluation Criteria".

Claims and Evidence

Claims Supported by Evidence:

  • "The proposed algorithm obtains a deterministic worst-case 1−1/e approximation of the constrained log MAP probability objective." This claim is supported by the detailed description of the algorithm, the theorems presented, and the proofs provided in the appendix. The authors also provide experimental results that align with this claim.
  • "The algorithm can learn solutions of higher semantic quality (coherence) than those found by state-of-the-art LDA algorithms, neural topic models, and LLM-based topic models." The authors present results from experiments comparing their algorithm to several baselines, using measures of semantic quality, providing evidence that supports this claim.

Claims With Less Clear Evidence:

  • "The algorithms converge to near-optimal posterior probability in logarithmic parallel computation time (adaptivity) exponentially faster than any known LDA algorithm." While the authors provide a theorem and analysis related to the parallel computation time, the claim of being "exponentially faster" than any known LDA algorithm is a strong statement that would benefit from more explicit comparisons to the computational complexity of a wide range of LDA algorithms. The analysis focuses on the FAST algorithm and its properties but does not provide a clear comparison showing the exponential speedup relative to other LDA algorithms.

  • "The approach also provides interpretability guarantees, such that each learned topic is formally associated with a known keyword." The authors describe how they generate candidate topics based on coherence functions and keywords. While this associates topics with keywords, the degree to which this provides a formal guarantee of interpretability could be debated. The interpretability of topics is somewhat subjective, and while the authors' method provides a more structured way to connect topics and keywords, it might not fully guarantee interpretability in all contexts.

  • "Unlike alternatives, our approach can maintain the independence assumptions necessary to use the learned topic model for downstream causal inference methods that allow researchers to study topics as treatments." The authors discuss the limitations of other LDA approaches for causal inference and how their method addresses some of these limitations. However, the complexities of causal inference and the potential for other factors to influence independence assumptions make this claim a bit unspecified. While the authors' approach may be better than some alternatives, it's difficult to definitively claim that it fully maintains independence assumptions in all cases. It is important to note that still in many realistic social science usecase using this topic model for downstream causal inference can create bias (for example if there are fixed effects).

Methods and Evaluation Criteria

Methods:

  • The authors frame the LDA topic-word assignment problem as a combinatorial optimization problem. The formulation allows the authors to leverage submodularity, which enables approximation guarantees.

  • The authors propose efficient algorithms (FASTGREEDY-E-LDA, FAST) to solve the combinatorial optimization problem. These algorithms incorporate several speedup techniques, like memoization, heap data structures, and lazy updates (see the sketch at the end of this list). The algorithms make sense for the problem, as one of the main challenges with applying submodular optimization to LDA is the computational cost.

  • For interpretability, the authors suggest generating candidate topics using coherence functions. Using standard coherence functions ensures that the generated topics are aligned with human judgment of semantic quality.

  • The authors also modify their approach to handle out-of-sample inference for causal inference. Given the challenges of using topic models in causal inference, the modifications are appropriate to this use case.
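To make the lazy-updates point above concrete, here is a generic lazy-greedy sketch over a toy coverage objective (my illustration of the standard technique; the paper's FASTGREEDY-E-LDA adds further speedups such as memoization): submodularity guarantees that cached marginal gains only shrink, so they act as upper bounds, and an element is re-evaluated only when it reaches the top of a max-heap.

```python
# Generic lazy-greedy sketch on a toy objective (an illustration of the
# standard technique, not the paper's FASTGREEDY-E-LDA). Cached gains
# are valid upper bounds by submodularity, so an element is only
# re-evaluated when it surfaces at the top of the max-heap.
import heapq

def lazy_greedy(elements, marginal_gain, budget):
    selected = []
    # Max-heap via negated gains; 'version' records when a bound was computed.
    heap = [(-marginal_gain([], e), e, 0) for e in elements]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, e, version = heapq.heappop(heap)
        if version == len(selected):
            selected.append(e)  # bound is fresh: greedily take the element
        else:
            # Stale bound: recompute against the current solution and re-push.
            heapq.heappush(heap, (-marginal_gain(selected, e), e, len(selected)))
    return selected

# Toy coverage objective over word sets.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}
def marginal_gain(selected, e):
    covered = set().union(*(sets[s] for s in selected)) if selected else set()
    return len(sets[e] - covered)

print(lazy_greedy(list(sets), marginal_gain, 2))  # ['a', 'c']
```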

Evaluation Criteria:

  • The authors use log posterior probability as one of their evaluation metrics. This is a natural choice, as it directly measures how well the model fits the data under the LDA assumptions.

  • The authors use UMass coherence to measure the semantic quality of the topics. This is a widely used metric that correlates with human judgment (see the computation sketch below). They also examine other properties of the learned topics, like how the model behaves as the number of topics per document increases.

  • The baselines used for comparison are appropriate, as they include traditional LDA algorithms, neural topic models, and LLM-based topic models. This covers a wide range of existing approaches and demonstrates the advantages of the proposed method over different types of competitors.

The datasets used (Reuters, US Congressional speeches, and social media posts) are standard datasets in topic modeling and represent a variety of text types.
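For reference, a minimal sketch of the UMass coherence computation on toy documents (my illustration of the standard metric, not the paper's exact evaluation pipeline):

```python
# Minimal UMass coherence sketch (Mimno et al., 2011) on toy documents;
# an illustration of the standard metric, not the paper's pipeline.
import math

def umass_coherence(top_words, documents):
    """Sum over ordered word pairs of log((D(w_i, w_j) + 1) / D(w_j)),
    where D(.) counts the documents containing all given words."""
    doc_sets = [set(d) for d in documents]
    def count(*words):
        return sum(all(w in s for w in words) for s in doc_sets)
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (count(top_words[i], top_words[j]) + 1) / count(top_words[j])
            )
    return score

docs = [["tax", "budget", "vote"], ["tax", "budget"], ["vote", "speech"]]
print(umass_coherence(["tax", "budget", "vote"], docs))  # ~0.405
```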

Theoretical Claims

Yes, the theoretical claims seem well supported by the accompanying proofs.

Experimental Design and Analysis

I didn't check too carefully, but it looks like a pretty standard experimental design and analysis. See my answer to "Methods And Evaluation Criteria"

Supplementary Material

No, I did not.

Relation to Broader Literature

The paper is very thorough in its discussion of the topic modeling literature.

The authors could do a better job of highlighting the causal inference use case. References to and discussion of how economists and other causality-intensive social sciences use topic models would drastically improve the paper.

Essential References Not Discussed

Much of the text-as-data literature and the causality+NLP literature is missing here. An example of an essential missing reference on the application of topic models for causality is Ash et al.'s "Text Algorithms in Economics".

Other Strengths and Weaknesses

See my answer to "Claims And Evidence".

Other Comments or Suggestions

Author Response

We thank you for your positive review, and we are sincerely grateful for your many detailed and thoughtful comments. We respond below to your specific points about comparing our computational complexity with alternatives, interpretability, causal inference, and the additional references.

Re: the reviewer's comment "The algorithms converge to near-optimal posterior probability in logarithmic parallel computation time (adaptivity) exponentially faster than any known LDA algorithm... the claim ... would benefit from more explicit comparisons to the computational complexity of a wide range of LDA algorithms."

  • Thank you for this suggestion, and we appreciate the opportunity to delineate this contribution. The only mainstream topic models with provable convergence guarantees are the anchor-word variants of Arora et al.'s group (2013, 2016, 2018) and the SVD approach of Anandkumar et al. (2012, 2014). Setting aside their differences in assumptions and objectives, both have polynomial time complexity, and neither has logarithmic adaptivity in parallel. For example, the Arora et al. (2013, 2018) approach computes a co-occurrence matrix, then computes anchor words ($O(|V|^2 + |V|k/\epsilon^2)$) and topic recovery ($O(|V|^2 k + |V| k^3/\epsilon^2)$). Anchor word selection is inherently sequential (each iteration depends on the previous one), so there is no known parallel algorithm with logarithmic adaptivity. Furthermore, this procedure only obtains topics, not document distributions or assignments—those require additional data-dependent complexity (Arora et al. 2016). Similarly, in the Anandkumar approach, SVD is a major bottleneck, as there is no known generic log-adaptive SVD algorithm.

  • The most popular solvers for LDA—VI and Gibbs sampling—have no known polynomial time bounds for convergence/mixing. Even in parallel implementations, both require global synchronization after each iteration, which precludes logarithmic adaptivity.

  • Similarly, BERTopic and neural topic models have no guarantees, and they typically proceed via sequential optimizations (e.g., BERTopic's UMAP step requires sequential computations even in parallel implementations), so there is no known logarithmic parallel approach.

Re: the reviewer's comment "The interpretability of topics is somewhat subjective, and while the authors' method provides a more structured way to connect topics and keywords, it might not fully guarantee interpretability in all contexts."

  • We agree—the term interpretability has been used to refer to many different concepts in the topic modeling literature, and we do not claim that our approach is "interpretable" in a general sense. However, our method conveys a specific property that is novel among topic models, yet familiar and desirable to researchers trained in econometrics. As you correctly note, each topic in our model is formally associated with a keyword, such that the probability mass that the topic assigns to each word in the vocabulary is a known function of that word's co-occurrence with the keyword. In social science applications, where the goal is often parameter estimation, it is desirable to adopt models in which parameters correspond to known functions—for example, in a GLM, a regression coefficient represents the expected change in the dependent variable given a unit change in a feature. We distinguish this property from other topic models, where a topic may assign mass to a word for reasons that are opaque and specific to a (somewhat arbitrary) optimization routine.
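To illustrate the flavor of this property, here is a purely hypothetical toy construction (an assumption for illustration only; it is not E-LDA's actual formula, which is defined in the paper): a topic anchored to a keyword assigns each word mass via a transparent co-occurrence statistic, so every entry is auditable.

```python
# Hypothetical illustration of a keyword-anchored topic: each word's
# probability mass is a transparent function (here, normalized
# document-level co-occurrence with the keyword). This toy
# construction is an assumption for illustration, not E-LDA's formula.
from collections import Counter

def exemplar_topic(keyword, documents):
    cooc = Counter()
    for doc in documents:
        if keyword in doc:
            cooc.update(set(doc))  # document-level co-occurrence counts
    total = sum(cooc.values())
    return {word: c / total for word, c in cooc.items()}

docs = [["tax", "budget", "vote"], ["tax", "budget"], ["vote", "speech"]]
print(exemplar_topic("tax", docs))
# e.g. {'tax': 0.4, 'budget': 0.4, 'vote': 0.2} -- every mass traces
# back to a co-occurrence count with the keyword "tax"
```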

Re: the reviewer's comment "Unlike alternatives, our approach can maintain the independence assumptions necessary to use the learned topic model for downstream causal inference..." ...While the authors' approach may be better than some alternatives, it's difficult to definitively claim that it fully maintains independence assumptions in all cases. .

  • We agree, and we provide additional discussion of how our approach can be used for causal inference in Appendices J, K, L, and M. There is an important nuance here: Unlike alternatives, our approach obtains topic assignments that are well-defined as causal treatments, and that can be computed out-of-sample (with provable guarantees) to maintain document-to-document independence assumptions that are necessary (but not sufficient) for SUTVA. However, in virtually any causal inference setting, including when using our approach, it is possible that independence assumptions are still violated (for example, by unobserved confounders).

Re: the reviewer's comment that "The paper is very thorough in its discussion of the topic modeling literature" but should add the missing Ash et al. review paper and additional text-as-data and causality+NLP citations.

  • Thank you for this helpful suggestion! We have added this reference and we are expanding our literature review accordingly to take advantage of the expanded page limit in the camera-ready version.
Review (Rating: 3)

The paper introduces Exemplar-LDA (E-LDA), a novel combinatorial approach to Latent Dirichlet Allocation (LDA) topic modeling, addressing the NP-hard problem of inferring document-topic assignments. Unlike traditional gradient-based methods, E-LDA uses a non-gradient, submodular optimization framework, achieving near-optimal Maximum A Posteriori (MAP) solutions in logarithmic parallel time. It ensures interpretability by associating topics with keywords and supports causal inference applications. Experiments on datasets like Reuters and Congressional speeches show E-LDA outperforms Gibbs LDA, neural, and LLM-based models in posterior probability and topic coherence, offering faster convergence and higher semantic quality across sparse to dense solutions.

update after rebuttal

After reading the reply and the other comments, I have decided to keep my rating at an acceptance score.

Questions for Authors

How were the neural and LLM-based baselines tuned in the experiments, and why was AVITM's performance notably poor?

Claims and Evidence

Most claims are supported by clear and convincing evidence, blending rigorous theory (submodularity, complexity analysis) with solid empirical results.

Concerns:

Only three datasets (Reuters, Congress, NewsGroups) are tested, all from social science domains. Some short-text datasets should be included in the experiments.

Methods and Evaluation Criteria

The developed methods (combinatorial optimization, interpretable topic generation, and causal inference adaptation) address the problem's challenges well: guarantees, interpretability, and causal compatibility. The experiments also show the model's improvements.

Theoretical Claims

The proofs in the submission seem correct.

Experimental Design and Analysis

The experimental results well support the proposed model.

Supplementary Material

I've read the appendix, but not very carefully.

Relation to Broader Literature

I think this paper is interesting and gives insight to the community.

Essential References Not Discussed

The experiments in this paper are solid.

Other Strengths and Weaknesses

Weaknesses:

  1. While the combination is original, the individual elements (submodular greedy algorithms, FAST parallelization, and coherence metrics) are well-established. The innovation lies in their application to LDA, but this may feel incremental.

  2. It would be better to add more datasets, such as short texts.

Other Comments or Suggestions

None

Author Response

Thank you for your overall positive review, and for your positive comments about our "rigorous theory" and "solid empirical results." We also thank you for your positive comments that the paper "is interesting and gives insight to the community."

We believe we address your remaining reservations here and would appreciate it if you would consider improving your score in light of our response.

The main concern and reason for a score of 3 instead of 4 or 5 seems to be:

  1. Whether our algorithm also performs well on short texts. We note that our Reuters News Wires dataset consists of short texts (mean of 80 words, see Appendix J). Here is our performance on an additional larger dataset of 54,595 very short texts (DBLP, Pan et al. 2016, with a mean length of 5 words):

    https://imgur.com/a/LO1bUqw

  2. Your assessment about impact here: "While the combination is original, the individual elements (submodular greedy algorithms, FAST parallelization, and coherence metrics) are well-established. The innovation lies in their application to LDA, but this may feel incremental."

We strongly believe that this is a high-impact paper. Specifically, we strongly believe that the ICML community will be interested in the first practical algorithms with provable guarantees for a central problem in topic modeling, as well as a novel, non-gradient-based way to solve topic models that is exponentially faster than all known algorithms (the best known guarantees of any kind for LDA are poly-time). Provable guarantees for topic models have long been considered one of the most desirable results in NLP, and our paper describes the first new theoretic guarantees for topic modeling in several years.

While it is true that these building blocks of submodular optimization, low-adaptivity parallel optimization, and coherence are well-established, we emphasize that:

  • The connection between submodularity and LDA was previously unknown, and the analysis is nontrivial;

  • Even given access to our submodular formulation of LDA (eq. 3), it is practically infeasible to obtain solutions using generic fast submodular optimization techniques, such as Lazier-than-Lazy-Greedy or FAST low-adaptive parallelization. Specifically, our nonparallel algorithm obtains a per-iteration complexity that scales logarithmically with the size of the data $|D|$, and obtains its approximation guarantee deterministically (Theorem 2). In contrast:

    • Vanilla Greedy matches our approximation guarantee, but each iteration has complexity that scales polynomially in the size of the data $|D|$ (proof in Appendix F);

    • As we prove in Appendix H, the state-of-the-art generic nonparallel algorithm for submodular maximization, Lazier-than-Lazy-Greedy, has a per-iteration complexity of $O(\ell\,|\Phi|\,|D|\log(1/\epsilon))$, plus sampling complexity. This linear dependence on the data size $|D|$ makes it practically infeasible for most datasets. Appendix H also shows that Lazier-than-Lazy-Greedy remains slower than our algorithm even when accelerated with our techniques. Its approximation guarantee is also worse than ours by an $\epsilon$ factor and only holds in expectation, whereas our guarantees are deterministic.

    • It is also practically infeasible to use low-adaptivity parallel algorithms like FAST for this problem without our techniques, because their practical compute time is dominated by compute-heavy objective function evaluations that scale as a function of the data size $|D|$. Appendices I and J show how to remove this data-size dependency and accelerate them to just $O(\ell)$ or $O(\ell\,|\Phi|)$. This enables fast parallel solutions in practice.

---Responses to additional questions---

Re: How were neural and LLM-based baselines tuned?

  • We provide replication code in the Supplemental Materials and details in Appendix P (line 1353 onward). For ETM, AVITM, and BERTopic, we used the authors' code. We set $k=100$ topics (except for BERTopic, where $k$ is not user-specified), and we set all other parameters to match those in the authors' papers. For example, we set BERTopic to use the all-MiniLM-L6-v2 transformer, per Maarten Grootendorst.

Re: Why is AVITM’s performance notably poor?

  • Our experience across many projects is that AVITM performs very well on some data but poorly on others without extensive custom preprocessing/tuning (see Appendix lines 1357-9). We confirmed that when we swap the AVITM authors' exact preprocessed datasets into our codebase and switch to their metrics, we can replicate their results, so its poor performance on our datasets is due to its instability on some data. We expect it could be improved with custom tuning of batch size, preprocessing, etc., but that is beyond our scope (and we do zero tuning for E-LDA).

We believe that our responses address your remaining questions, and we would sincerely appreciate it if you would consider updating your score.

Final Decision

This paper contributes a novel and theoretically grounded approach to topic modeling, offering improvements in interpretability, inference speed, and potential causal applications. While some experimental limitations exist, such as outdated baselines (to my knowledge, the current best neural topic model using a Dirichlet prior is ECRTM [1], which extends ETM), an insufficient number of datasets, and limited evaluation with a causal inference pipeline, the core methodology is sound, the empirical results are competitive, and the ideas are innovative and timely. With modest revisions and expanded evaluation, this work has strong potential to impact both the topic modeling and broader text-as-data research communities. In particular, the conclusion should not be shifted to the Impact Statement, which could be counted as circumventing the page limit.

[1] Xiaobao Wu, Xinshuai Dong, Thong Thanh Nguyen, and Anh Tuan Luu. 2023. Effective neural topic modeling with embedding clustering regularization. In International Conference on Machine Learning, pages 37335–37357. PMLR.