Faster Maximum Inner Product Search in High Dimensions
We propose a novel, faster algorithm for the Maximum Inner Product Search problem in high dimensions based on multi-armed bandits
Abstract
Reviews and Discussion
The paper studies the maximum inner product search (MIPS) problem in high dimensions. The authors present a multi-armed bandit-based approach, called BanditMIPS, whose idea is to adaptively subsample coordinates, spending more samples on the more promising atoms. Empirically, BanditMIPS can be up to 20 times faster than the prior algorithm.
Strengths
- The MIPS problem is important and relevant. A fast algorithm for this problem is useful.
- The introduction of the multi-armed bandit idea is natural.
- The empirical results look convincing.
Weaknesses
- The format of the paper is incorrect.
- The theoretical result is essentially a standard corollary of UCB-type bounds (the standard bound is recalled after this list).
- The paper only studies the runtime for a single query vector, which is not sufficient in practice since we usually face multiple queries.
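For context, the standard corollary referred to above is, informally, the following successive-elimination / UCB sample-complexity bound (stated here as a sketch; constants and log factors vary across the literature, and the min with $d$ reflects that an arm's inner product can be computed exactly after $d$ coordinate reads):

```latex
% Informal statement, assuming n arms with sigma-sub-Gaussian rewards,
% means mu_i, and gaps Delta_i = mu_{i*} - mu_i. With probability at least
% 1 - delta, successive elimination identifies the best arm using at most
\[
  T \;=\; O\!\Big(\sum_{i \neq i^*} \min\Big\{\, d,\; \frac{\sigma^{2}}{\Delta_i^{2}}\,\log\frac{n}{\delta\,\Delta_i} \Big\}\Big)
\]
% pulls; in the MIPS reduction each pull costs O(1) arithmetic operations.
```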
Questions
- The paper discusses a single query vector and hence mentions that BanditMIPS requires no preprocessing of the data. However, in practice there is usually a long sequence of queries, which justifies some preprocessing, e.g., as in HNSW. A discussion of the two settings may be helpful.
The paper does not follow the ICLR formatting guidelines (it is two-column, etc.) and does not include any references in the main submission. Even setting this aside, I am not sure what the main theorem is saying. Roughly, it states that if some parameter is small enough, then we only need to read a few coordinates for maximum inner product search. But the 'parameter' in question seems to be defined circularly as the number of coordinates that are read.
Strengths
See summary above.
Weaknesses
See summary above.
Questions
See summary above.
This paper studies the Maximum Inner Product Search (MIPS) problem: given a dataset of n vectors in a d-dimensional space and a new query vector, the goal is to find the vector within the dataset that yields the highest inner product with the query. The paper introduces a randomized algorithm inspired by bandit methods to improve the time complexity from $O(nd)$ to one that is independent of $d$, and showcases its effectiveness on four synthetic and real-world datasets.
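For reference, the $O(nd)$ baseline being improved upon is the exhaustive scan over all coordinates of all vectors (a trivial NumPy sketch, not taken from the paper):

```python
import numpy as np

def mips_bruteforce(V, q):
    """Exact MIPS baseline: computes all n inner products, O(n * d) work."""
    scores = V @ q                 # (n,) vector of <v_i, q>
    return int(np.argmax(scores))  # index of the maximizing database vector
```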
Strengths
- The problem studied in this paper is interesting.
- The paper is well-organized and easy to follow.
- There are extensive evaluations of their method over various datasets.
Weaknesses
- The paper assumes that the per-coordinate inner-product estimates follow a sub-Gaussian distribution; this is a strong assumption (a standard statement of sub-Gaussianity is given after this list).
- The presentation of this paper needs major improvement. Some of the figures are very large (Figures 3, 5, and 6), while others are too small to read (Figure 4). Also, the current format of this paper does not align with the ICLR 2024 template, and the references are not included in the submitted 9 pages.
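For reference, the sub-Gaussian assumption mentioned in the first bullet is, in its standard form (stated here for context; the exact form used in the paper may differ):

```latex
% A random variable X with mean mu is sigma-sub-Gaussian if, for all real lambda,
\[
  \mathbb{E}\big[\exp\big(\lambda\,(X-\mu)\big)\big] \;\le\; \exp\!\Big(\frac{\lambda^{2}\sigma^{2}}{2}\Big).
\]
% In the bandit reduction, X is the per-coordinate estimate d * v_{ij} q_j with j
% drawn uniformly from {1,...,d}; bounded coordinates would satisfy this
% (bounded random variables are sub-Gaussian), but heavy-tailed data would not.
```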
Minor Comments:
- Duplicate reference (in the appendix version): "Morozov, S.; and Babenko, A. 2018a. Non-metric Similarity Graphs for Maximum Inner Product Search." vs. "Morozov, S.; and Babenko, A. 2018b. Non-Metric Similarity Graphs for Maximum Inner Product Search. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc."
- "Accuracy:1.0" in figure 5 and figure 6 is not very clear to read and isn't been explained in the caption.
- The subplots in Figure 4 lack polish; note that if the paper is printed out, they will not be large enough to read.
Questions
See the weakness above regarding the sub-Gaussian assumption.
This work considers the problem of maximum inner product search (MIPS): given a static database $v_1, \dots, v_n$ of $n$ points, each in $\mathbb{R}^d$, and a query point $q \in \mathbb{R}^d$, the goal is to find the point in the database with the maximum inner product with the query point. This is a practically important problem, with the trivial solution requiring $O(nd)$ computations. The key question here is whether we can speed up this search, i.e., identify the maximizing point in far fewer than $nd$ operations. In this paper, the authors derive a scheme by reducing this problem to the best arm identification problem in stochastic bandits, essentially treating each point in the static database as an arm, with the mean reward of the arm corresponding to database vector $v_i$ effectively being $\langle v_i, q \rangle / d$. The "stochastic" component comes from the fact that we can obtain a noisy estimate of this quantity in $O(1)$ time by sampling a dimension $j$ uniformly at random (with replacement) and computing the product $v_{ij} q_j$. This then becomes exactly the best arm identification problem, with the total computational complexity of MIPS being the sample complexity of the corresponding best arm identification problem. Following standard results in the BAI literature, the authors prove that MIPS can be solved with high probability in $O\big(\sum_{i \neq i^*} \Delta_i^{-2} \log(n/\delta)\big)$ computations, where $\Delta_i$ is the gap between the largest mean reward and that of arm $i$, a bound that is dimension invariant.
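To make the reduction concrete, the following is a minimal sketch of an elimination-style procedure along the lines described above (Python/NumPy). The function name, batch size, Hoeffding-style confidence constants, and the exact-computation fallback are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def banditmips_sketch(V, q, delta=0.01, batch=32, seed=None):
    """Best-arm-identification sketch for MIPS (illustrative, not the paper's code).

    V: (n, d) database, q: (d,) query. Each row of V is an "arm"; one pull samples
    a coordinate j uniformly and returns d * V[i, j] * q[j], an unbiased estimate
    of <V[i], q>. Arms are eliminated with Hoeffding-style confidence bounds.
    """
    rng = np.random.default_rng(seed)
    n, d = V.shape
    # Assumed bound on |d * V[i, j] * q[j]|; in practice this requires a known
    # sub-Gaussian parameter or bounded coordinates.
    R = d * np.max(np.abs(V)) * np.max(np.abs(q))
    active = np.arange(n)
    sums = np.zeros(n)
    pulls = np.zeros(n, dtype=int)

    while len(active) > 1 and pulls[active[0]] < d:
        js = rng.integers(0, d, size=batch)          # sampled coordinates (with replacement)
        for i in active:
            sums[i] += d * np.sum(V[i, js] * q[js])  # batch of unbiased estimates
            pulls[i] += batch
        means = sums[active] / pulls[active]
        ci = R * np.sqrt(2 * np.log(2 * n * pulls[active[0]] / delta) / pulls[active])
        best_lcb = np.max(means - ci)                # best lower confidence bound
        active = active[means + ci >= best_lcb]      # keep arms that could still win

    # Fall back to exact computation among the surviving arms.
    return int(active[np.argmax(V[active] @ q)])
```

Each pull costs $O(1)$ arithmetic operations, so the total work is governed by how many pulls are needed before all but one arm is eliminated.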
Strengths
None.
Weaknesses
I apologize for the harshness of this review, but I strongly believe it is needed for this submission. There is absolutely nothing technically interesting going on in this paper. The idea of using best arm identification to solve MIPS is very obvious and has already been explored in the work of Liu et al. (AAAI 2019), with the only difference being sampling with versus without replacement. The statement that they achieve dimension-independent bounds is a huge overclaim: the dependence on the inverse squared gaps $\Delta_i^{-2}$ directly scales with the dimension, and one explicitly needs well-separation assumptions to get around this (or must settle for an $\epsilon$-approximate solution, which then directly bounds this quantity by $O(\epsilon^{-2})$). None of these ideas is at all novel, and for this reason I find it extremely difficult to recommend acceptance at any venue, let alone one as competitive as ICLR.
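To make the scaling concern explicit, here is a sketch under the uniform coordinate-sampling model described in the summary above, where the mean reward of arm $i$ is $\langle v_i, q\rangle / d$:

```latex
% Under uniform coordinate sampling, the arm gaps are the d-normalized
% inner-product gaps, so the per-arm sample complexity 1/Delta_i^2 carries an
% explicit d^2 factor:
\[
  \Delta_i \;=\; \frac{\langle v_{i^*}, q\rangle - \langle v_i, q\rangle}{d},
  \qquad
  \frac{1}{\Delta_i^{2}} \;=\; \frac{d^{2}}{\big(\langle v_{i^*}, q\rangle - \langle v_i, q\rangle\big)^{2}}.
\]
% Dimension independence therefore requires the raw inner-product gaps to grow
% linearly with d (a well-separation assumption), or an epsilon-approximate
% guarantee that caps the per-arm cost at O(1/epsilon^2).
```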
Questions
What is novel in this result?