PaperHub

Overall rating: 6.8/10 · Decision: Poster · 4 reviewers (min 4, max 5, std 0.4)
Individual ratings: 5, 4, 4, 4
Confidence: 3.0 · Novelty: 2.8 · Quality: 3.0 · Clarity: 3.5 · Significance: 2.8

NeurIPS 2025

Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

OpenReview | PDF
Submitted: 2025-05-09 · Updated: 2025-10-29

Abstract

The Johnson-Lindenstrauss (JL) lemma is a cornerstone of dimensionality reduction in Euclidean space, but its applicability to non-Euclidean data has remained limited. This paper extends the JL lemma beyond Euclidean geometry to handle general dissimilarity matrices that are prevalent in real-world applications. We present two complementary approaches: First, we show how the JL transform can be applied to vectors in pseudo-Euclidean space with signature $(p,q)$, providing theoretical guarantees that depend on the ratio of the $(p, q)$ norm and Euclidean norm of two vectors, measuring the deviation from Euclidean geometry. Second, we prove that any symmetric hollow dissimilarity matrix can be represented as a matrix of generalized power distances, with an additional parameter representing the uncertainty level within the data. In this representation, applying the JL transform yields multiplicative approximation with a controlled additive error term proportional to the deviation from Euclidean geometry. Our theoretical results provide fine-grained performance analysis based on the degree to which the input data deviates from Euclidean geometry, making practical and meaningful reduction in dimensionality accessible to a wider class of data. We validate our approaches on both synthetic and real-world datasets, demonstrating the effectiveness of extending the JL lemma to non-Euclidean settings.
Keywords
Johnson-Lindenstrauss Lemma, Non-Euclidean Geometry

Reviews and Discussion

Official Review
Rating: 5

This paper formulates the JL lemma for non-Euclidean settings. The authors propose two complementary approaches to achieve this generalization:

  • Pseudo-Euclidean representation, which shows how any symmetric hollow dissimilarity matrix can be represented as distances between vectors with a specific signature $(p, q)$. The JL transform in this space guarantees w.p. a multiplicative factor that depends on the ratio of the squared Euclidean distance and the squared $(p, q)$ distance. This offers a fine-grained performance analysis based on the data's deviation from Euclidean geometry.
  • Generalized power distances, which shows that any symmetric hollow dissimilarity matrix can be expressed as a matrix of generalized power distances. Here, the JL transform yields the multiplicative approximation with an additional additive error term, proportional to a parameter 'r' (radius) that quantifies the uncertainty level within the data (and its deviation from Euclidean geometry). This approach also offers a statistical interpretation using silhouette coefficients of Gaussian distributions.

The theoretical results provide error analyses that depend on parameters characterizing how the input data deviates from Euclidean geometry, allowing for practical and meaningful dimensionality reduction in a wider range of data types.

The authors validate their approaches on both synthetic and real-world datasets, showing their effectiveness and superior performance compared to the classical JL transform on non-Euclidean data.

Strengths and Weaknesses

Strengths

This paper is very well written, and easy to follow. Additionally, the authors provide relevant real-world applications to the proposed methods, e.g. Minkowski space and Casey’s Theorem.

Weakness

The JL transform is, by now, a well-studied topic, with many efficient and scalable implementations. However, the authors chose only trivial/toy datasets to showcase their method. Additionally, Table 2 presents only non-robust metrics (i.e. max and mean) for comparison with classic JL; this is slightly unfair, since the bounds hold only w.p. and the worst-case error might be unbounded.

Questions

  • How do you think the proposed method(s) will compare in Table 2 if we measure a robust statistic of relative error, like the median and/or some higher percentile?
  • There are some existing methods that already use non-Euclidean (pseudo) metrics in high dimensions, such as face recognition and information retrieval. I would appreciate a short discussion on the impact on large(r) scale problems like these.

Limitations

The authors do not dedicate a section to limitations w.r.t. the current SOTA, i.e. classic JL. Notwithstanding, the authors do point to possible future directions and cases where the applicability of the proposed approach is not known. Additionally, they note that a theoretical connection between the two proposed methods is not clear.

Final Justification

This is a very nice paper, which addresses a gap in an important lemma. It was worthy of publication even before the rebuttal, and the authors have addressed my main concerns and questions.

Formatting Issues

No

Author Response

We sincerely thank you for the positive feedback and valuable suggestions! Responses to your concerns are given below; please feel free to ask follow-up questions.

Response to “Table 2 presents only non-robust metrics for comparison with classic JL”

Thanks for pointing this out. We obtain results for the median and 75th percentile of the distortion for all three methods and present them below. The same observations about their performance hold.

| Method | Simplex | Ball | Brain | Breast | Renal | MNIST | CIFAR10 | Email | Facebook | MOOC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| JL Median | 6.36 | 6.68 | 2.69e12 | 1.95e11 | 1.18e12 | 1.78 | 2.29 | 7.00 | 5.08 | 6.13 |
| JL-PE Median | 2.57 | 2.78 | 1284.06 | 174.74 | 3160.58 | 2.11 | 2.04 | 2.06 | 5.83 | 6.03 |
| JL-Power Median | 12.79 | 1.00 | 1.01e7 | 8.20e6 | 7.57e7 | 1.30 | 1.35 | 1.35 | 1.16 | 1.99 |
| JL 75th percentile | 13.85 | 14.51 | 4.69e12 | 2.30e11 | 8.02e12 | 5.78 | 8.09 | 17.70 | 11.08 | 10.45 |
| JL-PE 75th percentile | 5.31 | 5.78 | 38864.33 | 3318.65 | 1.18e6 | 4.18 | 3.96 | 4.01 | 12.55 | 13.09 |
| JL-Power 75th percentile | 15.30 | 1.00 | 1.50e8 | 8.59e7 | 3.85e8 | 1.62 | 1.91 | 1.92 | 1.31 | 2.20 |

Reporting the max and mean follows the convention of a broad class of algorithms designed for metric embedding. In these problems, a point set A in one space is projected to a point set B in another space such that pairwise distances are preserved. The analysis is commonly done for the worst-case distortion and average distortion, corresponding to the max and mean relative error reported in our experiments. We completely agree that a robust metric such as the median would greatly help with a comprehensive evaluation.

Response to “Impact on large scale problems such as facial recognition, which also uses non-Euclidean metrics in high-dimensions.”

Thank you for the thoughtful question. In applications like face recognition and information retrieval, non-Euclidean metrics such as cosine similarity or learned distances are commonly used. However, these approaches typically operate in the original high-dimensional space or rely on data-dependent embeddings learned through supervision.

The non-Euclidean JL, on the other hand, is data-independent. Though our results have extra terms in the bounds, they offer insights on uncertainty ($r$ in JL-power) or on how much a given data pair deviates from Euclidean geometry ($C_{ij}$ in JL-pq). In addition to dimension reduction helping with the efficiency and scalability of those tasks, we also obtain some understanding of the geometry the dataset lies on.

Lastly, we largely agree with you that many valuable questions on ML tasks can be asked about the new JL transforms. We regard these as future directions.

Response to “The authors do not dedicate a section to limitations”

We thank the reviewer for the valuable suggestion. We will add a limitations section to the paper, as below.

We establish two approaches for a non-Euclidean JL transform. However, we do not have a proof characterizing the conditions under which one outperforms the other, nor a connection between the two representations. Another limitation is that we are not aware of any lower bound result complementing the fine-grained JL guarantee from the pseudo-Euclidean representation. Last, our work is mainly theoretical; the practical impact of applying the proposed JL transforms in downstream ML applications is one of the future directions.

Comment

I wish to thank the authors for the comprehensive response, which addressed my concerns.

A small clarification on the large-scale question -
My intention was that even if such problems (like face recognition) operate on non-Euclidean metrics (like cosine), they might still enjoy any boost in performance that JL might offer. If the approximation is tighter, and/or better fits the metric, the boost might be more significant w.r.t. 'standard' JL.

A small note on the limitation -
Thanks for sharing this. Indeed, both in Table 2 and the new table in the rebuttal, one can observe the inconsistency between the two suggested methods; even if one of them always improves distortion, the other might usually still be very close to standard JL.

Last question on the tables -
Did you share any second-order statistics? IOW, how consistent are the improvements in any of the variants?

Comment

Thank you for the thoughtful follow-ups!

Other applications

Thanks for the clarification. Indeed, although there are strong worst-case lower bounds in the literature for applying standard JL to non-Euclidean metric spaces, as our results show, this does not mean there is no hope for JL-type dimension reduction. We believe that such a fine-grained study of dimension reduction performance, for methods that fit well with the specific non-Euclidean metric, has potential for these applications. Exploring more conditions on which types of JL variants work will be an interesting future direction. We'll add this to the paper.

Limitation

That is an excellent observation, and which method performs better is a great question to ask. We do observe that when there are more negative eigenvalues (but not extremely so), JL-PE often outperforms JL-power, and vice versa. However, we do not have a rigorous proof of this connection. We will incorporate this discussion in the paper.

Experiment results

Thank you for raising this point! We present the variance/range of the median distortion over 5 runs below. We will also add this to the other experiments in the paper, including the additional experiments on clustering + JL. We skip the Brain, Breast and Renal datasets because the scale is already too large.

| Method | Simplex | Ball | Brain | Breast | Renal | MNIST | CIFAR10 | Email | Facebook | MOOC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| JL Median | 6.36 ± 0.13 | 6.68 ± 0.06 | 2.69e12 | 1.95e11 | 1.18e12 | 1.78 ± 0.01 | 2.29 ± 0.02 | 7.00 ± 0.11 | 5.08 ± 0.09 | 6.13 ± 0.01 |
| JL-PE Median | 2.57 ± 0.05 | 2.78 ± 0.01 | 1284.06 | 174.74 | 3160.58 | 2.11 ± 0.05 | 2.04 ± 0 | 2.06 ± 0 | 5.83 ± 0.04 | 6.03 ± 0.01 |
| JL-Power Median | 12.79 ± 0.19 | 1.00 ± 0 | 1.01e7 | 8.20e6 | 7.57e7 | 1.30 ± 0 | 1.35 ± 0 | 1.35 ± 0.02 | 1.16 ± 0 | 1.99 ± 0.05 |

We appreciate your input! We believe it really helps toward a better presentation of, and deeper insight into, the current draft.

Comment

I think the note about the eigenvalue is great! Even if there's no rigorous proof, consider sharing this as a note or a Conjecture

Official Review
Rating: 4

The authors present variants of the Johnson-Lindenstrauss lemma for non-metric settings. The first uses a generalization of metric distances by defining an inner product in $\mathbb{R}^{p+q}$ which adds the inner product of the first $p$ dimensions and then subtracts the inner product of the next $q$ dimensions. The second uses a separate generalization of distances termed "power distances", where each point is endowed with a radius and the distance between two points corresponds to the length of the tangent line between their corresponding circles. In both cases, the authors show theoretical results which are essentially consistent with those for the standard JL transform.
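For concreteness, here is a minimal sketch of the signed inner product described above (helper names and code are illustrative, not from the paper):

```python
import numpy as np

# A tiny helper for the signed inner product summarized above; names and code
# are an illustrative sketch, not from the paper.
def inner_pq(u: np.ndarray, v: np.ndarray, p: int, q: int) -> float:
    """<u, v>_{p,q}: add products of the first p coordinates, subtract the next q."""
    assert u.shape == v.shape == (p + q,)
    return float(u[:p] @ v[:p] - u[p:] @ v[p:])

def sq_dist_pq(x: np.ndarray, y: np.ndarray, p: int, q: int) -> float:
    """Squared pseudo-Euclidean 'distance'; unlike the Euclidean case, it can be negative."""
    return inner_pq(x - y, x - y, p, q)
```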

Strengths and Weaknesses

Strengths:

  • I consider both cases interesting with regards to the JL lemma but find the main results to be the evidence that any symmetric hollow matrix can be written as either a pseudo-distance matrix or a power-distance matrix. I have questions regarding one of these results, but if everything is correct then this is a nice fact which would be useful for many applications beyond simply the JL-transform.
  • The paper is very clearly written and quite easy to follow. I have experience in other classical dimensionality reduction algorithms (PCA, MDS, LLE, etc.) and this text introduced the relevant ideas so that I could follow them, despite not being particularly well-versed in JL theory.

Weaknesses:

  • A few steps are presented without sufficient explanation for their validity (specifically in Section 2.1)
  • I am not sure what purpose the long exposition on Ptolemaic identities serves.
  • The experimental results are presented with insufficient care in my opinion. Some small issues include:
    • The text in the plots is tiny
    • It is unclear what the entropic affinity is or why JL-PE suddenly performs better on this metric when it dramatically fails on the other ones.
    • It is not clear how one applies the JL transform to a graph dataset (perhaps I missed this somewhere?)
  • Absence of a limitations section (as encouraged by the guidelines). The limitations discussed in the checklist are quite spurious, and I get the sense the authors were just trying to check a box rather than provide meaningful insight into how their work could be improved upon.

Questions

My primary questions are regarding Section 2.1. I am not convinced by the provided text that one can take an arbitrary dissimilarity matrix and convert it into a set of observations in the given manner. The arguments are much too handwavy and something isn't passing the smell test for me. It would be good to write up a formal lemma or proposition and proof to clean this exposition up.

Furthermore, I believe that the "non-metric" pseudo-euclidean dissimilarities might actually be metric, just in a complex space.

Regarding the centering:

  • In the case of D containing squared Euclidean distances, it is clear that we have the equality $-0.5\,CDC = C G_X C$. However, this is explicitly due to the fact that the distances are induced by a norm and therefore we can distribute $D$.

    • Specifically, for some dataset $\mathbf{X} \in \mathbb{R}^{n \times d}$, let $\mathbf{\alpha}$ be the vector with $\alpha_i = \langle x_i, x_i \rangle$ and $\mathbf{1}$ be the all-ones vector. This lets us write any matrix $D$ of pairwise metric squared distances via $D = \mathbf{1} \mathbf{\alpha}^\top + \mathbf{\alpha} \mathbf{1}^\top - 2 \mathbf{G}_X$. In this case, we get that the centering matrix cancels with the all-ones vectors and we get $-\frac{1}{2} C D C = C G_X C$.
    • However, in the setting considered in the paper, we can make no such assumption on the structure of $D$, since it is not obtained from an inner product space. Thus, I am not convinced that, for non-metric $D$, we can re-write it as a Gram matrix via centering. This then throws the subsequent derivations into question.
  • Moreover... why are the centering matrices necessary at all for the operations as they are laid out in the paper? Ostensibly, since the matrix D is already symmetric, one could immediately compute its singular value decomposition and perform all of the subsequent steps without ever applying the centering. Am I missing something where the centering is actually necessary for the downstream calculations? Something doesn't feel right.

    • Specifically, the paper states "Since $-CDC/2$ is a symmetric matrix, its eigenvalues are real..." But D is also symmetric by definition. So why are we applying the centering?

In summary, it seems the centering is necessary for the theory to obtain a Gram matrix which can be analyzed accordingly. But I am not convinced that this is well-motivated in the non-metric setting. This is my primary concern and resolving it would certainly increase my score.

Regarding the complex space

  • To make sure I'm understanding, the $\Lambda$ matrix in Section 2.1 could be replaced by simply having the $x$ values be complex, right? I.e., we have some negative eigenvalues, so the $XX^T$ which produces those would consist of complex-valued observations.
    • But then this would imply that the dissimilarity matrix being considered is actually metric; it just happens to come from points which live in a complex space. If this line of reasoning is correct, then I feel that the presentation should be amended significantly to mention that the paper presents results for JL transforms in non-metric spaces and in complex metric spaces.

Limitations

As mentioned above, the limitations as presented could be expanded on but it's not a big issue. I think the work could use an explicit limitations section which describes which assumptions were necessary, how these assumptions factor through, and how these assumptions may be avoided in the future.

Final Justification

The authors' rebuttal addressed my concerns and I believe the paper should be accepted. My primary remaining concern is that I am not convinced that this is an immediately necessary result, but it is clean and I believe the neurips community is likely to find it useful.

Formatting Issues

Some typos appear in the paper, like "hallow" instead of "hollow". Would be good to pass these through an LLM before camera-ready to double check that all words used are the correct ones. But no real issues with this.

Author Response

We are sincerely grateful for the valuable insights and thoughtful questions. We address the questions in the responses below. Please feel free to add any follow-up questions.

Response to “Clarification of Section 2.1”

We provide more details related to Section 2.1. Hopefully this helps to clarify. We will add the clarification to the paper in the revised version.

  • The differences between the Gram matrix for Euclidean data and non-Euclidean data.

In the Euclidean setting, the Gram matrix $B = -CDC/2$, with $D$ the matrix of squared Euclidean distances and $C$ the centering matrix, has $B_{ij} = \langle x_i, x_j \rangle$, where $\langle \cdot, \cdot \rangle$ is the standard inner product. The squared Euclidean distance is then induced by this inner product as $d^2(x_i, x_j) = \langle x_i - x_j, x_i - x_j \rangle = \langle x_i, x_i \rangle - 2\langle x_i, x_j \rangle + \langle x_j, x_j \rangle$.

Now, for the pseudo-Euclidean space we replace the standard inner product with the bilinear form $\langle \cdot, \cdot \rangle_{p,q}$, where $\langle u, v \rangle_{p,q} = \sum_{k=1}^{p} u_k v_k - \sum_{k=p+1}^{p+q} u_k v_k$. We keep a similar definition of the (squared) distance: $D(x_i, x_j) = \langle x_i - x_j, x_i - x_j \rangle_{p,q} = \langle x_i, x_i \rangle_{p,q} - 2\langle x_i, x_j \rangle_{p,q} + \langle x_j, x_j \rangle_{p,q}$. The relationship between the squared distance matrix and the Gram matrix still holds in the pseudo-Euclidean space, as will be elaborated next.

  • A proof of the following claim.

Claim: For any hollow symmetric matrix $D$, there is a symmetric bilinear form $\langle \cdot, \cdot \rangle_{p,q}$ for integers $p, q$ and $n$ vectors $v_1, \cdots, v_n$ such that $D_{ij} = \langle v_i - v_j, v_i - v_j \rangle_{p,q}$.

Proof: Define $B = -CDC/2$, where the centering matrix is $C = I - J/n$ with $J$ the matrix with entry $1$ everywhere. Expanding the multiplication, we have $B = -0.5\,(D - DJ/n - JD/n + JDJ/n^2)$. In particular, the $(i, j)$ element $B_{ij}$ is given by $B_{ij} = -0.5\,\big(D_{ij} - \frac{1}{n} \sum_{k} (D_{ik} + D_{kj}) + \frac{1}{n^2} \sum_k \sum_{\ell} D_{k\ell}\big)$. Call this equation (*). Notice that the last term is a constant that is independent of $i, j$. We can also verify that the sum of each row of $B$ is zero, so $B$ has at least one zero eigenvalue with the all-ones vector as the corresponding eigenvector.

Since $B$ is a symmetric matrix, all its eigenvalues are real numbers. Suppose that there are $p$ positive eigenvalues and $q$ negative eigenvalues (we ignore the eigenvalue 0). Define $\Lambda$ to be a diagonal matrix of dimension $(p+q) \times (p+q)$ whose first $p$ diagonal entries are $1$ and whose next $q$ diagonal entries are $-1$. Now we can find $n$ vectors $v_1, \cdots, v_n$, each of dimension $p+q$, such that $B = V^T \Lambda V$, where $V$ has $v_j$ as its $j$-th column. In particular, $B_{ij} = v_i^T \Lambda v_j$. Using the notation of the bilinear form with signature $(p, q)$, we can write $B_{ij} = \langle v_i, v_j \rangle_{p,q}$.

Now we claim that $D_{ij} = (v_i - v_j)^T \Lambda (v_i - v_j) = \langle v_i - v_j, v_i - v_j \rangle_{p,q}$. Re-organizing, this is equivalent to $D_{ij} = B_{ii} + B_{jj} - 2B_{ij}$. To show that, we plug equation (*) into $B_{ii} + B_{jj} - 2B_{ij}$ and verify that it indeed gives $D_{ij}$. Here we use the fact that the last term in (*) is a constant and thus cancels out, and also that $D$ is a symmetric hollow matrix, i.e., $D_{ii} = 0$ and $D_{ij} = D_{ji}$.
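For intuition, here is a minimal numerical check of the claim (a sketch in our own notation, not the authors' code):

```python
import numpy as np

# Minimal numerical check of the claim above (a sketch in our own notation,
# not the authors' code): any hollow symmetric D equals the matrix of pairwise
# (p, q) squared "distances" of vectors recovered from B = -CDC/2.
rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
D = A + A.T
np.fill_diagonal(D, 0.0)             # symmetric hollow dissimilarity matrix

C = np.eye(n) - np.ones((n, n)) / n  # centering matrix C = I - J/n
B = -0.5 * C @ D @ C                 # Gram-like matrix B = -CDC/2

w, U = np.linalg.eigh(B)             # real eigenvalues since B is symmetric
keep = np.abs(w) > 1e-9              # drop the (near-)zero eigenvalues
w, U = w[keep], U[:, keep]
Lam = np.diag(np.sign(w))            # signature matrix with +1 / -1 entries
V = (U * np.sqrt(np.abs(w))).T       # columns v_1, ..., v_n; B = V^T Lam V

D_rec = np.array([[(V[:, i] - V[:, j]) @ Lam @ (V[:, i] - V[:, j])
                   for j in range(n)] for i in range(n)])
print(np.allclose(D_rec, D))         # True: D_ij = <v_i - v_j, v_i - v_j>_{p,q}
```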

Response to questions on the centering matrix.

It is correct that since $D$ is symmetric, we can directly write $D = Z^T \Lambda Z$, with $\Lambda$ again a diagonal matrix of $\pm 1$ entries, where $p$ ($q$) is the number of positive (negative) eigenvalues of $D$. This means $D_{ij} = Z_i^T \Lambda Z_j = \langle Z_i, Z_j \rangle_{p,q}$; that is, $D_{ij}$ takes the form of a "dot product" of $Z_i$ and $Z_j$ under the $(p, q)$ bilinear form. We prefer to keep the format consistent with the Euclidean setting, where $D_{ij}$ is the "dot product" of $X_i - X_j$ with $X_i - X_j$. The centering trick, transforming $D$ into the Gram matrix $B$, achieves this, as shown above. This also means that when all eigenvalues of $B$ are non-negative, we are indeed in the Euclidean setting and the coordinates are exactly the Euclidean coordinates recovering $D$; in that case, the pseudo-Euclidean JL is exactly the standard Euclidean JL.

Response to questions on the complex space.

It is true that the pseudo-Euclidean bilinear form, defined by the matrix $\Lambda$ with entries of $+1$ and $-1$, can be represented algebraically by introducing complex-valued coordinates. Specifically, for a vector $x \in \mathbb{R}^{p,q}$, we can construct a corresponding vector $z \in \mathbb{C}^{p+q}$ whose coordinates corresponding to the $-1$ entries in $\Lambda$ are purely imaginary. The squared pseudo-Euclidean distance is then equivalent to the standard Euclidean squared norm of the difference between these complex vectors, as the imaginary unit $i$ squares to $-1$.

On the other hand, we are not using a complex inner product space. In a complex inner product space the inner product $\langle u, v \rangle$ is defined using the complex conjugate (i.e., $\langle u, v \rangle = u^H v$), which ensures that the induced norm $\|u\|^2 = \langle u, u \rangle = \sum_k |u_k|^2$ is always real and non-negative. In our case, the bilinear form $\langle u, v \rangle_{p,q} = u^T \Lambda v$ does not involve a complex conjugate. As a result, the squared "distance" $D_{ij} = \|x_i - x_j\|_{p,q}^2$ for points in the $(p, q)$-space can be negative. Further, our input dissimilarity measures $D$ do not necessarily satisfy the triangle inequality. Thus we are handling non-metric measures.
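To illustrate the distinction numerically, here is a small hedged sketch (the coordinate convention is assumed by us, not taken from the paper):

```python
import numpy as np

# A small numerical illustration of the point above (notation assumed by us):
# putting the q "negative" coordinates on the imaginary axis reproduces the
# (p, q) squared distance only when differences are squared WITHOUT a complex
# conjugate; the conjugate-based Hermitian norm gives a different value.
p, q = 2, 3
rng = np.random.default_rng(1)
x, y = rng.normal(size=p + q), rng.normal(size=p + q)

d_pq = np.sum((x - y)[:p] ** 2) - np.sum((x - y)[p:] ** 2)

to_complex = lambda v: np.concatenate([v[:p], 1j * v[p:]])
zx, zy = to_complex(x), to_complex(y)
print(np.isclose(d_pq, np.sum((zx - zy) ** 2).real))   # True: bilinear form
print(np.isclose(d_pq, np.sum(np.abs(zx - zy) ** 2)))  # False: Hermitian norm
```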

We hope this helps to clarify. We will also consider adding these discussions to the paper.

Response to “What purpose the long exposition on Ptolemaic identities serves”

We will shorten it to make space for adding more clarification to Section 2.1. The reason we included it is that the Ptolemaic identities serve to introduce the power distance notion; more precisely, our introduction of the generalized power distance was inspired by Casey's theorem on the Ptolemaic identity.

Response to Questions regarding the experiments

  • “The text in the plots is tiny.” Thank you for the observation, we have adopted larger fonts.
  • “Not clear what the entropic affinity is and why JL-PE performs better.” The entropic affinity is only used for the genomics datasets and has been considered a reasonable metric by a line of previous work; here we follow the common practice. We will add its definition to the appendix. It is an excellent question when JL-PE performs better than JL-power. Our observation is that if there are many (but not too extreme) negative eigenvalues, JL-PE usually performs better. The case of the genomics data is exactly due to this. However, this is only an observation without a rigorous proof.
  • “Not clear how to apply the JL transform to a graph dataset.” A graph with the shortest-path distance metric is non-Euclidean and very common in data mining tasks. The input dissimilarity matrix has shape $n \times n$, corresponding to $n$ vertices, and each entry is the shortest-path distance between two vertices. A toy illustration is sketched right after this list.
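A toy sketch of how such an input matrix can be formed (the example graph is ours, for illustration only):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Toy illustration of the graph input described above (the example graph is
# ours): the matrix fed to the JL variants is the n x n matrix of pairwise
# shortest-path distances, which is symmetric and hollow but need not be
# Euclidean in general.
adj = np.array([[0, 1, 0, 0, 1],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [1, 0, 0, 1, 0]], dtype=float)       # a 5-cycle
D = shortest_path(adj, method="D", unweighted=True)  # BFS/Dijkstra distances
print(D)  # hollow symmetric dissimilarity matrix; e.g., D[0, 2] == 2.0
```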

Response to “Absence of a limitations section with elaboration on assumptions”

We will add a section to make it explicit 1) the assumptions made in the paper; 2) limitations of current results; 3) computation and suggestions for practitioners, as below.

The assumption of the current work is that a matrix of non-Euclidean pairwise dissimilarities is given. We establish two approaches for the JL transform in this setting, both requiring $O(n^3)$ time. There are several possible ways to speed up this step in practice, for example by using landmark MDS. Further, if the dimension reduction step is already embedded in some machine learning pipeline (e.g., with a representation learning step), there can be better ways to integrate the non-Euclidean JL step. For example, some work already considers learning a bilinear form (to replace the standard dot product) [57, 58]; one may consider using JL in the pseudo-Euclidean space. Another approach, motivated by the generalized power distance formulation, is to append a proper term $4r^2$ to the input matrix $D$ and later use the power distance (instead of the standard Euclidean distance) with the recovered coordinates/vectors.

One limitation is that we do not have a proof characterizing the conditions under which one approach outperforms the other, nor a connection between the two representations. Another limitation is that we are not aware of any lower bound result complementing the fine-grained JL guarantee from the pseudo-Euclidean representation. Last, our work is mainly theoretical; the practical impact of applying the proposed JL transforms in downstream ML applications is one of the future directions.

Comment

Thank you for the detailed thoughts. I agree on all points.

For the added claim about the hollow symmetric matrix $D$, I think this discussion would be useful in the main text. Indeed, I was missing some of these insights into how one can generalize the centering argument to general hollow symmetric matrices to obtain something akin to Gram matrices from them.

I completely agree regarding the complex space -- I had overlooked the complex conjugate component of it all. There's likely no need to include this in the text as it does not actually apply; I was simply misunderstanding something and wanted to double check.

Thank you, I have updated my score accordingly.

Comment

Thank you for the thoughtful review and swift reply! We will add the discussion on centering matrix in the revised version.

Official Review
Rating: 4

This paper generalizes the Johnson-Lindenstrauss lemma to (slightly) non-Euclidean geometries. We are given as input a dissimilarity matrix D for n points, which need only be symmetric and hollow (diagonal entries 0). The existential result gives points of dimension O(log n / eps^2) such that the pairwise distances equal the entries of D up to multiplicative (1 + eps) and additive 4 * eps * r^2, where r measures how much D deviates from Euclidean distances.

Two different versions are given. First, the desired dimensionality reduction is shown for points in "pseudo-Euclidean space" where, when computing the inner product, some coordinates are negated. Second, it is shown that for any symmetric hollow D, one can find points in "generalized power distance" space whose pairwise distances give the entries of D, and dimensionality reduction is also given for such points.

Strengths and Weaknesses

Strengths

  • The second result is quite general and can work for any dissimilarity matrix D
  • Gives two nice generalizations of the JL lemma to non-Euclidean data
  • Appear effective in a few experiments

Weaknesses

  • The results, as stated, are non-algorithmic; it would be nice to know how efficiently these can be computed
  • It is not clear that these are the correct ways to generalize to non-Euclidean data for ML applications. It is only hinted at what the ML applications are.

Questions

Please write "hollow" instead of "reflexive" on line 49, since that's the word you use in the rest of the paper.

What is the running time of your algorithm for finding the points in theorem 1.3?

Is there evidence that these notions ("pseudo-Euclidean space" and "generalized power distance") are the correct ones to use in ML applications?

Limitations

Yes

Final Justification

This is a nice theoretical paper. The main weakness is that it is unclear whether these notions ("pseudo-Euclidean space" and "generalized power distance") are the correct ones to use in ML applications, but the authors give some data and say they are studying this in future work.

Formatting Issues

None

Author Response

We appreciate your suggestions and questions! Please see our responses below, and feel free to ask follow-up questions.

Response to “The results are non-algorithmic; it would be nice to know how efficiently these can be computed” and “What is the running time of your algorithm for finding the points in theorem 1.3?”

Thank you for raising the issue. We will clarify this in the paper.

For the two proposed JL algorithms for dimension reduction, as presented in Section 4, there are two steps: (1) Obtain the Gram Matrix and find the coordinates by our proposed methods. (2) Perform JL accordingly.

To find coordinates from the Gram matrix, it takes $O(n^3)$ time, since we need to find the eigenvalues of an $n \times n$ matrix; this is the same order of running time as in classical Euclidean settings (such as classical multi-dimensional scaling). For Theorem 1.3, we first find the smallest eigenvalue of the Gram matrix of $D$, use it to amend the input matrix $D$, and then solve for the coordinates. If the coordinates are given, we can directly use JL, which is $O(nk)$ with $k$ as the target dimension. There are several possible ways to speed up the coordinate recovery part in practice, for example by using landmark MDS. Further, if the dimension reduction step is already embedded in some machine learning pipeline (e.g., with a representation learning step), there can be better ways to integrate the non-Euclidean JL step. For example, some work already considers learning a bilinear form (to replace the standard dot product) [57, 58]; one may consider using JL in the pseudo-Euclidean space. Another suggestion is to append a proper term $4r^2$ to the input matrix $D$ and later use the power distance (instead of the standard Euclidean distance) with the recovered coordinates/vectors.
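To make the two-step structure concrete, here is a minimal sketch of step (2) in isolation. It assumes the coordinates have already been recovered in step (1), does not reproduce the signature-specific details of JL-pq or JL-power, and uses a placeholder constant in the target dimension:

```python
import numpy as np

# A minimal sketch of step (2) only, assuming the n x d coordinate matrix V
# has already been recovered in step (1); the signature-specific details of
# JL-pq and JL-power are not reproduced here, and the constant c in the target
# dimension k = O(log n / eps^2) is a placeholder.
def gaussian_jl(V: np.ndarray, eps: float, c: float = 8.0, seed: int = 0) -> np.ndarray:
    n, d = V.shape
    k = int(np.ceil(c * np.log(n) / eps ** 2))  # target dimension
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(d, k)) / np.sqrt(k)    # dense Gaussian JL map
    return V @ R                                # project all n points at once

# Usage sketch: Z = gaussian_jl(V, eps=0.2) once V is recovered from -CDC/2.
```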

Response to “It is not clear that these are the correct ways to generalize to non-Euclidean data for ML applications.” and “Is there evidence that these notions are the correct ones to use in ML applications?”

Thank you for the valuable insight. Downstream ML applications are great directions, and in fact they are part of our current/future work. Algorithms that handle non-Euclidean, non-metric data for many data processing tasks are largely unexplored. We showcase the potential with a clustering example below.

Many real-world tasks perform clustering after JL, which pre-processes the data into a low dimension with the expectation that the clustering quality will not degrade. We evaluate three JL transforms for k-means clustering on 4 datasets. Below we report the k-means cost on the original dataset and on the data projected by the three JL transforms. It can be observed that JL-pq and JL-power give much better results than JL on the Brain and Breast datasets. On the graph datasets, they are at least competitive and sometimes better. We believe this illustrates the potential of the proposed JL transforms to be a better dimension reduction for other downstream learning tasks with non-Euclidean data.

| Dataset | #Clusters (k) | Original | JL | JL-pq | JL-power |
| --- | --- | --- | --- | --- | --- |
| Brain | 5 | 3.92e-3 | 728.00 | 3.13 | 33.26 |
| Breast | 4 | 0.038 | 207.69 | 28.93 | 12.53 |
| Email | 42 | 5614 | 22634 | 21244 | 20764 |
| Facebook | 15 | 20499 | 284142 | 91644 | 303203 |
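A hedged sketch of how such a comparison can be computed (our approximation of the protocol, not the authors' code; array names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hedged sketch of the comparison above (our approximation of the protocol,
# not the authors' code): compute the k-means cost (inertia) on the original
# coordinates and on the JL-projected coordinates.
def kmeans_cost(X: np.ndarray, k: int, seed: int = 0) -> float:
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return float(km.inertia_)  # sum of squared distances to closest centers

# Usage sketch with hypothetical arrays X_orig (original coordinates) and
# X_jl (JL-projected coordinates):
# print(kmeans_cost(X_orig, k=5), kmeans_cost(X_jl, k=5))
```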

We agree that deeper insight and further study into how these notions fit the non-Euclidean setting would be very beneficial. Our work makes one step towards that, with a theoretical worst-case performance analysis as well as an empirical evaluation. It will be very interesting future work to integrate our work with additional downstream tasks and further evaluate the empirical performance.

Response to “Please write "hollow" instead of "reflexive" on line 49.”

Thank you. We have corrected the wording.

Comment

Thank you, that all makes sense, and I maintain my support for the paper.

Official Review
Rating: 4

This submission aims to extend the Johnson-Lindenstrauss (JL) lemma to general non-Euclidean dissimilarity matrices by proposing two approaches: one based on pseudo-Euclidean embeddings and another based on generalized power distances. The authors provide theoretical guarantees and empirical results on both synthetic and real-world datasets.

Strengths and Weaknesses

Strengths:

  • Motivated extension of the JL lemma: The work addresses an important and timely question: extending dimensionality reduction techniques beyond standard Euclidean spaces. The authors provide a structured attempt to generalize JL guarantees to non-metric or non-Euclidean dissimilarities, which are common in practice (e.g., in genomics, information retrieval, etc.).
  • Conceptual contribution: The connection between generalized power distance and data uncertainty is interesting and provides a new lens for interpreting structure in non-Euclidean datasets. The interpretation via eigenvalues of the Gram matrix offers some insight into when JL-like guarantees are meaningful.

Weaknesses:

  • Incremental techniques: While the problem setup is novel, many of the tools (e.g., pseudo-Euclidean embedding via centered Gram matrices, standard JL transforms) are not new. The contribution lies mostly in framing and packaging, rather than in fundamentally new techniques or insights.
  • Heavy computation assumptions: The approach assumes full access to the dissimilarity matrix and requires an SVD of the Gram matrix, which is cubic in time and not scalable to large datasets. This makes the current form more of a theoretical contribution than a practical algorithm.

Questions

Can the authors demonstrate the utility of the proposed JL embeddings beyond preserving pairwise dissimilarities? For example, do these embeddings improve performance in downstream tasks such as clustering, classification, or nearest neighbor search on non-Euclidean data?

Limitations

The impossibility of worst-case dimension reduction guarantees for general non-Euclidean data.

Final Justification

The authors' detailed rebuttal effectively strengthened the paper in two critical areas: (1) the new k-means clustering experiments provide compelling evidence of the proposed embeddings' downstream utility, directly addressing the need for enhanced empirical validation; and (2) the explicit acknowledgment of the work's primarily theoretical nature and the clear identification of scalability as future work appropriately frame the paper's scope and limitations. While scalability remains an open challenge, this is acceptably deferred for future research given the paper's theoretical focus. These positive developments in demonstrating practical relevance and clarifying the work's positioning confirm the initial recommendation. Therefore, I maintain my original score.

Formatting Issues

N/A

Author Response

We sincerely thank you for the valuable feedback. Please find our responses below, and feel free to ask follow-up questions.

Response to “Incremental techniques”

Indeed, the dimension reduction part, after our formulation to obtain the “coordinates”, resembles the classical JL lemma. This is on purpose, as we wish to keep the simplicity and applicability of the JL transform. The part about obtaining the coordinates, so that we can apply JL-style dimension reduction, is the main contribution of the paper, namely when the input is a general hollow symmetric matrix that is no longer Euclidean. We propose two ways (the pseudo-Euclidean framework and the generalized power distance framework) to construct vectors such that applying JL-type random projection still has provable guarantees. Our proofs of these guarantees are new, and the fact that any hollow symmetric matrix can be written as a matrix of generalized power distances may be of independent interest.

Response to “Heavy computation assumptions”

Thank you for raising the concern. The assumption of the setting is that we are given as input a dissimilarity matrix of size $n \times n$, for the purpose of exploring JL-type random projection on non-Euclidean data.

In this setting, running conventional dimension reduction methods (such as classical MDS) to recover coordinates takes $O(n^3)$ time, which matches our computation time. There are several possible ways to speed up this step in practice, for example by using landmark MDS. Further, if the dimension reduction step is already embedded in some machine learning pipeline (e.g., with a representation learning step), there can be better ways to integrate the non-Euclidean JL step. For example, some work already considers learning a bilinear form (to replace the standard dot product) [57, 58]; one may consider using JL in the pseudo-Euclidean space. Another approach, motivated by the generalized power distance formulation, is to append a proper term $4r^2$ to the input matrix $D$ and later use the power distance (instead of the standard Euclidean distance) with the recovered coordinates/vectors. The current paper is mainly theoretical and we consider the above discussion to be possible future work.

Response to “The utility of the proposed JL embeddings beyond preserving pairwise dissimilarities? For example, do these embeddings improve performance in downstream tasks?”

These are great directions, and in fact they are part of our current/future work. Algorithms that handle non-Euclidean, non-metric data for many data processing tasks are largely unexplored. We showcase the potential with a clustering example below.

Clustering is one of the downstream applications of JL, which pre-processes the data into a low dimension with the expectation that the clustering quality will not degrade too much. We evaluate three JL transforms for k-means clustering on 4 datasets. Below we report the k-means cost on the original dataset and on the data projected by the three JL transforms. It can be observed that JL-pq and JL-power give much better results than JL on the Brain and Breast datasets. On the graph datasets, they are at least competitive and sometimes better. This helps to illustrate the potential of the proposed JL transforms for downstream learning tasks with non-Euclidean data.

| Dataset | #Clusters (k) | Original | JL | JL-pq | JL-power |
| --- | --- | --- | --- | --- | --- |
| Brain | 5 | 3.92e-3 | 728.00 | 3.13 | 33.26 |
| Breast | 4 | 0.038 | 207.69 | 28.93 | 12.53 |
| Email | 42 | 5614 | 22634 | 21244 | 20764 |
| Facebook | 15 | 20499 | 284142 | 91644 | 303203 |

Response to “The impossibility of worst-case dimension reduction guarantees for general non-Euclidean data.”

We largely agree with you on this. Indeed, it has been established in prior work [13, 14, 15] that the JL transform has to use a number of dimensions polynomial in $n$ when we consider non-Euclidean norms ($L_1$ or $L_p$). Thus we cannot expect to achieve the same dimension reduction results for our setting (which is more general). On the other hand, our results show performance guarantees that depend on parameters characterizing how the dissimilarities deviate from Euclidean distances. This means that distances that are close to Euclidean can still enjoy good performance.

Thanks again for helping us improve the paper and pointing out future directions!

Comment

Thank you for the detailed rebuttal.

The new experimental results on k-means clustering are particularly helpful in demonstrating the downstream utility of the proposed embeddings. I also appreciate that the authors acknowledged the primarily theoretical nature of the current work and identified scalability as a direction for future work.

Since my initial assessment was already slightly positive, I will maintain my original score.

Comment

Thank you! We will incorporate the clustering results to the paper.

Final Decision

The paper studies dimensionality reduction for a general dissimilarity matrix on n points. The key observation is that any symmetric dissimilarity matrix with 0s on the diagonal can be factorized into distances with a positive dot-product component and a component with a negative sign. The paper shows that dimensionality reduction still works, with an approximation factor depending on the ratio between the Euclidean distance and the dissimilarity; intuitively, this characterizes how close the dissimilarity is to Euclidean distances.

All reviewers are positive about the paper with varying degrees of support. Given the universal support from the reviewers, I recommend accepting the paper.