PaperHub
5.5 / 10
Poster · 4 reviewers
Scores: 2, 3, 3, 4 (min 2, max 4, std 0.7)
ICML 2025

Exponential Family Variational Flow Matching for Tabular Data Generation

Submitted: 2025-01-24 · Updated: 2025-08-16
TL;DR

TabbyFlow extends flow matching to tabular data generation by leveraging exponential family distributions to handle mixed data types efficiently.

Abstract

Keywords
Variational Flow Matching · Exponential Families · Tabular Data

Reviews and Discussion

Review
Rating: 2

The paper introduces the application of Variational Flow Matching to tabular data generation. To extend VFM, the authors propose to represent the variational distribution in VFM as an exponential family. The motivation behind this proposal stems from the heterogeneous nature of tabular data; thus, the claim is that the exponential family is suitable for each data type commonly found in tables, i.e. Gaussian distributions for continuous variables like age or income, categorical distributions for discrete variables like education level, etc.

A nice property of the exponential family is that it is "linear", whereby E[x_1] can be computed in closed form, enabling loss functions for different column types. To compute the loss, the authors employ the concept of Bregman divergence: each exponential family induces its own Bregman divergence, yielding unified handling of heterogeneous tabular datasets.
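As a concrete illustration of the Bregman-divergence view described above (a generic sketch, not the paper's implementation): given a strictly convex function phi, the divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> recovers squared error in the Gaussian case and the KL divergence (cross-entropy up to a constant) in the categorical case.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - grad_phi(q) @ (p - q)

# Gaussian case: phi(x) = ||x||^2 / 2 induces half the squared error.
sq = lambda x: 0.5 * (x @ x)
grad_sq = lambda x: x
p, q = np.array([1.0, 2.0]), np.array([0.0, 1.0])
assert np.isclose(bregman(sq, grad_sq, p, q), 0.5 * np.sum((p - q) ** 2))

# Categorical case: phi = negative entropy induces the KL divergence,
# which reduces to cross-entropy (up to a constant) for one-hot targets.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0
p, q = np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.3, 0.2])
kl = np.sum(p * np.log(p / q))
assert np.isclose(bregman(negent, grad_negent, p, q), kl)
```

The "+1" term in the negative-entropy gradient cancels because both arguments are normalized probability vectors, which is why the general formula collapses to KL.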

Update after rebuttal

Not all of my "experimental" concerns have been addressed with clear evidence in this rebuttal. The authors have only claimed that "it will be included in the final manuscript"; hence, I stand by my decision to reject the paper.

"This is one example but overall, no ablations to be found. For instance, what is the impact of having a pure Gaussian distribution vs. your various distributions parameterized using the EFs etc.?"

"Number of Function Evaluations (NFEs) and Efficiency"

"Privacy Evaluation and MIA".

All of which have not been included in this rebuttal.

Questions for the Authors

My main concerns are in the paper’s empirical findings.

  • The method assumes an interpolation scheme between p_0 and p_1. They chose the simplest (linear). If the data distribution p_1 is complicated, linear interpolation might traverse unrealistic areas of space. These are unanswered questions that could be addressed with synthetic datasets to explore the properties of their design choices.
  • No analysis of privacy preservation. Methods like diffusion and flows can memorize training data; thus, I am hoping to see Membership Inference Attacks on the synthesized data to assess privacy.
  • This is one example but overall, no ablations to be found. For instance, what is the impact of having a pure Gaussian distribution vs. your various distributions parameterized using the EFs etc.?
  • No discussion on training duration, training convergence, sampling NFEs.
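The linear interpolation scheme the first bullet refers to can be sketched in a few lines (a minimal numpy illustration with hypothetical endpoint values, not the authors' code). The straight-line path x_t = (1 - t) x_0 + t x_1 has a constant conditional velocity, which is precisely why the reviewer asks whether it can traverse unrealistic regions when p_1 is multimodal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear (rectified-flow style) interpolation between a noise sample x0 ~ p0
# and a data sample x1 ~ p1: x_t = (1 - t) * x0 + t * x1.
x0 = rng.standard_normal(2)          # prior sample
x1 = np.array([5.0, -3.0])           # hypothetical data point
ts = np.linspace(0.0, 1.0, 11)
path = np.stack([(1 - t) * x0 + t * x1 for t in ts])

# Endpoints are recovered exactly, and the conditional velocity
# dx_t/dt = x1 - x0 is constant along the straight-line path.
assert np.allclose(path[0], x0) and np.allclose(path[-1], x1)
velocity = np.diff(path, axis=0) / np.diff(ts)[:, None]
assert np.allclose(velocity, x1 - x0)
```

The marginal dynamics aggregate many such straight lines, so the learned flow itself need not be straight; the rebuttal below makes this point explicitly.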

While TabbyFlow is the top performer, the margins over the best diffusion model (TabSyn) are relatively small (fractions of a percent), and even smaller if we bring TabDiff into the conversation. Additionally, TabbyFlow seems to underperform across all datasets in MLE.

Claims and Evidence

  • Unified handling of mixed data via exponential families
    • Each feature is modeled by a suitable exponential-family distribution
    • EF-VFM turns the joint generative problem into separate moment-matching problems for each feature
  • Connection to Bregman divergences
    • VFM can be viewed as minimizing a Bregman divergence tailored to each feature’s distribution
    • Claims that it aligns with known relationships (e.g. the Gaussian → squared-error and categorical → cross-entropy correspondences are instances of Bregman divergences)
    • However, ablation studies for these properties are missing.
  • TabbyFlow achieves state-of-the-art (SOTA) results on standard tabular data benchmarks, improving over GAN, VAE, and diffusion-based baselines in both fidelity (realism) and diversity
    • Concerns regarding missing TabDiff results. It’s understood that code is not publicly available but it would be nice if TabDiff’s results from their paper can be included in the tables
    • Additionally, per literature review regarding “Flow Matching in Tabular Data Generation”, it would be nice to acknowledge or even have comparisons to TabUnite (https://openreview.net/forum?id=Zoli4UAQVZ) too since they employ CFM to generate tabular data.
    • Privacy preservation is one of the most crucial aspects of applying tabular generation in the real world, where synthetic data is generated to protect sensitive information. However, no privacy-preserving metrics such as Membership Inference Attacks are computed on the synthetic samples.

Methods and Evaluation Criteria

The methods and evaluation criteria are sound, aligning with TabSyn's work.

Theoretical Claims

The proofs of Propositions 3.1 and 3.2 in the Appendix have been checked; they are mathematically sound and consistent with the established literature cited in the paper.

Experimental Design and Analysis

The experimental design aligns with TabSyn, making it sound. The analyses are straightforward and easy to understand.

Supplementary Material

Supplementary material contains code. Code was not reviewed as it is computationally expensive to run diffusion models.

Relation to Broader Literature

The paper applies a modified form of VFM to accommodate the heterogeneity of tabular data. It also addresses previous tabular generative diffusion models (e.g., STaSy, CoDi, TabDDPM, TabSyn). Additionally, it incorporates literature from variational flow matching, exponential-family statistics, and tabular data modelling.

Missing Essential References

To my knowledge, none beyond the point quoted from above: "Additionally, per the literature review regarding 'Flow Matching in Tabular Data Generation', it would be nice to acknowledge or even have comparisons to TabUnite (https://openreview.net/forum?id=Zoli4UAQVZ) too, since they employ CFM to generate tabular data."

Other Strengths and Weaknesses

Originality

The paper is quite original in its approach. While it builds on known components (flow matching, exponential family), the particular combination – applying VFM to tabular data via exponential-family moment matching – is novel.

Significance

High-quality tabular data generation has important applications such as data augmentation and privacy-preservation. But again, I am concerned that privacy-preservation is not addressed.

Clarity

The paper is clear. It could be better if section titles such as "Connection to Flow Matching" were more informative regarding the tabular-data context.

Other Comments or Suggestions

N/A

Author Response

Dear reviewer FzAE,

We thank you for your effort to review our work. Moreover, we appreciate you mentioning the originality of our exponential-family formulation and the value of Bregman divergence connections. We will reply to the points mentioned in the review here:

  • Regarding TabDiff and TabUnite: while TabDiff's code and data were not publicly available for reproducibility at the time of submission, we now include its reported results (where applicable) and discuss the challenges of direct comparison. We also cite and discuss TabUnite explicitly, as we agree the work is related; we originally excluded it as it was available only as a withdrawn OpenReview submission.

  • Your privacy concerns are well-founded, and we have now addressed this gap by evaluating TabbyFlow using the Distance to Closest Record (DCR) metric. With this metric, we aim to verify that the synthetic records in the generated data are not simple copies of, or the result of adding simple noise to, the real records in the original data. The DCR for a given synthetic record is defined as the minimum distance between it and every original record. We chose this metric as it has been used by TabSyn and TabDiff, thus allowing a fair comparison with those methods on this task. Our method offers competitive protection without sacrificing fidelity; see the table below.

  • As for the linear interpolation, following a comment made to reviewer RH9Z, the linearity assumption on the conditional velocity field is in the endpoint, which means it can be any function linear in x_1; e.g. all diffusion-based models (flow matching, diffusion models, or other models that combine the injection of Gaussian noise with blurring) satisfy this assumption. Moreover, a linear conditional velocity does not imply we learn "linear" dynamics: as seen in many settings, flow matching and diffusion can learn highly complex dynamics. Finally, and connected to the previous answer, the ODE formulation is indeed compatible with other geometries (as done in Riemannian (Variational) FM and Metric FM) and with SDEs (see the VFM paper or Albergo 2023 for the stochastic interpolant formulation). As we notice this is a recurring confusion among the reviewers, we will add a section emphasising this fact in the final version of the paper.

  • Lastly, though we did not make this point clear enough in the first version of the work, TabbyFlow uses significantly fewer NFEs during inference, a point we elaborate on in our response to reviewer NBdr.

Comparison of DCR scores across methods on five datasets.

Method      Adult        Default      Shoppers     Beijing      News
TabDDPM     51.14±0.18   52.15±0.20   63.23±0.25   80.11±2.68   79.31±0.29
TabSyn      50.94±0.17   51.20±0.28   52.90±0.22   50.37±0.13   50.85±0.33
TabDiff     50.10±0.32   51.11±0.36   50.24±0.62   50.50±0.36   51.04±0.32
TabbyFlow   50.32±0.16   50.82±0.27   50.17±0.32   50.94±0.13   50.83±0.29
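For reference, the raw DCR computation described above can be sketched in a few lines (assuming Euclidean distance on preprocessed features; the percentages reported in the table are a score derived from such raw distances, not the distances themselves).

```python
import numpy as np

def dcr(synthetic, real):
    """Distance to Closest Record: for each synthetic row, the minimum
    Euclidean distance to any real row (smaller = closer to training data)."""
    diffs = synthetic[:, None, :] - real[None, :, :]      # (n_syn, n_real, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1)

real = np.array([[0.0, 0.0], [1.0, 1.0]])
synthetic = np.array([[0.1, 0.0], [2.0, 2.0]])
scores = dcr(synthetic, real)
# The first synthetic row sits 0.1 away from a real record;
# an exact copy of a training record would yield a DCR of 0.
assert np.isclose(scores[0], 0.1)
```

In practice, numerical columns are scaled and categorical columns encoded before computing distances, so that all features contribute comparably.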

Thanks again for the time to review our work and the useful comments.

Reviewer Comment

Thank you for your rebuttal. I remain unconvinced regarding the interpolation schemes and my point on "a pure Gaussian distribution vs. your various distributions parameterized using the EFs". These questions have not been answered with ablations/experiments. The same holds for NFEs, training/sampling duration, and training convergence. Lastly, the existing privacy ML literature, such as [1] and [2], has conducted extensive research highlighting the "Inadequacy of Similarity-based Privacy Metrics" such as DCR. Thus, conducting MIAs per the initial review would strengthen your case for privacy preservation.

[1] Ganev, Georgi, et al. "The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against 'Truly Anonymous' Synthetic Datasets"
[2] Ward, Joshua, et al. "Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models"

Author Comment

We thank the reviewer for their follow-up and for highlighting three remaining concerns:

  • The interpretation and implications of our use of exponential family (EF) distributions compared to "pure Gaussians," and the nature of the interpolation scheme.
  • The computational efficiency of our method, specifically regarding the number of function evaluations (NFEs) and training/inference dynamics.
  • The adequacy of our privacy evaluation metrics and the potential use of membership inference attacks (MIAs).

We will address the points below.

1. Interpolation Scheme and Use of Exponential Families

We would like to emphasize that the linear interpolation assumption made in our method is the standard setup in the flow matching literature, where it defines a conditional trajectory between endpoints. While this conditional interpolation is linear, the aggregated dynamics across the entire data distribution can be arbitrarily complex. In fact, all flow matching and diffusion-based models assume such linearity in their conditional paths.

Similarly, the use of a Gaussian distribution—parameterized via EF sufficient statistics—is a standard way to model a distribution over linear trajectories at each time point. As these parameters evolve over time, they can represent highly non-linear and complex generation dynamics. This is precisely what the Variational Flow Matching (VFM) framework formalizes.
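The point above, that the conditional velocity is linear in the endpoint x_1 while the aggregated dynamics need not be linear at all, can be illustrated with the standard linear path of flow matching (a generic sketch, not the authors' code).

```python
import numpy as np

# Conditional velocity for the linear path x_t = (1 - t) x0 + t x1:
# u_t(x | x1) = (x1 - x) / (1 - t), which is linear (affine) in the endpoint x1.
def cond_velocity(x, x1, t):
    return (x1 - x) / (1.0 - t)

x, t = np.array([0.5]), 0.5
a, b = np.array([1.0]), np.array([3.0])
lhs = cond_velocity(x, 0.3 * a + 0.7 * b, t)
rhs = 0.3 * cond_velocity(x, a, t) + 0.7 * cond_velocity(x, b, t)
# Linearity in x1 holds for any convex combination of endpoints.
assert np.allclose(lhs, rhs)
```

The marginal velocity is the posterior expectation of this quantity over x_1 given x_t; because that posterior mean is generally a nonlinear function of x_t, the resulting generative dynamics can be arbitrarily complex even though each conditional path is a straight line.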

To explore the impact of using different EF distributions, we performed an ablation in a continuous setting. We compare the use of Gaussian and exponential distributions for parametrizing the flow. Results from this toy experiment (now included in the Appendix) indicate that alternatives yield reasonable performance as well. We note that for discrete variables, the categorical distribution is essentially the only EF option. In the conclusion, we now outline further research directions involving more complex EFs (e.g. Wishart flows over covariance matrices).

2. Number of Function Evaluations (NFEs) and Efficiency

We now explicitly report the NFEs used by our method compared to diffusion-based baselines. Our model operates with only 100 NFEs, in contrast to the 1000 typically used in diffusion models. This tenfold reduction leads to significantly faster inference. Moreover, we highlight that diffusion model performance degrades substantially when restricted to only 100 NFEs, while our model retains its performance due to the deterministic nature of its integration. We have added this comparison in the Appendix and will also include it in the main table of the final version.
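The NFE accounting above can be made concrete with a deterministic Euler sampler, where the NFE count equals the number of integration steps (a generic sketch on a toy velocity field with a known exact flow, not the authors' sampler).

```python
import numpy as np

def sample_ode(velocity_fn, x0, nfe=100):
    """Deterministic Euler integration of dx/dt = v(x, t) from t=0 to t=1.
    Each step costs exactly one network (here: function) evaluation."""
    x, dt = x0.copy(), 1.0 / nfe
    for i in range(nfe):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy field whose exact flow is known: v(x, t) = -x gives x(1) = x(0) * e^{-1},
# so we can check that 100 steps already track the true solution closely.
v = lambda x, t: -x
x0 = np.array([1.0, -2.0])
x1 = sample_ode(v, x0, nfe=100)
assert np.allclose(x1, x0 * np.exp(-1.0), atol=1e-2)
```

Halving or quartering `nfe` trades accuracy for speed in a controlled way, which is the trade-off the comparison against 1000-step diffusion samplers is about.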

3. Privacy Evaluation and MIA

We understand the reviewer’s concerns regarding the limitations of similarity-based metrics like Distance to Closest Record (DCR). We agree that more rigorous approaches such as membership inference attacks (MIAs) or the Data Plagiarism Index provide stronger guarantees. At the time of rebuttal, we prioritized consistency with prior work (e.g., TabSyn, TabDiff) by using DCR to enable direct comparison.

However, in response to your suggestion, we have now computed the Data Plagiarism Index for our model and added a discussion of its implications in the Appendix. We also plan to implement MIA-based evaluations in the final version, as we agree that they provide a more robust perspective on privacy preservation.
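A minimal distance-based membership inference attack of the kind the reviewer and references [1, 2] discuss can be sketched as follows. This is a deliberately simple attack on toy data; the setup and scoring rule are illustrative assumptions, not the evaluation from the paper.

```python
import numpy as np

def mia_auc(members, non_members, synthetic):
    """Distance-to-synthetic membership attack: score each candidate by the
    negative distance to its nearest synthetic record, then compute the AUC
    of members vs. non-members (0.5 = no leakage, 1.0 = full leakage)."""
    def score(x):
        return -np.sqrt(((synthetic - x) ** 2).sum(axis=1)).min()
    m = np.array([score(x) for x in members])
    n = np.array([score(x) for x in non_members])
    # AUC via pairwise comparison (probability a member outscores a non-member).
    return (m[:, None] > n[None, :]).mean()

rng = np.random.default_rng(1)
members = rng.normal(0.0, 1.0, size=(50, 2))       # training records
synthetic = members + rng.normal(0.0, 0.01, size=(50, 2))  # near-copies: leaky
non_members = rng.normal(0.0, 1.0, size=(50, 2))   # holdout records
assert mia_auc(members, non_members, synthetic) > 0.9  # attack detects copying
```

A generator that memorizes its training set yields an AUC near 1, while a well-generalizing one should stay near 0.5; this is the kind of signal DCR alone can miss.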

We thank the reviewer again for their thoughtful feedback. We hope these additional experiments and clarifications address the remaining concerns.

Review
Rating: 3

In this work, the authors propose a new method called Exponential Family Variational Flow Matching, which adds a variational formulation on top of VFM that lets them leverage a sufficient-statistics/moment-matching procedure to obtain a probabilistic generative modelling framework. The exponential-family perspective enables them to formulate a single problem in which all the different kinds of data seen in tabular data appear as individual cases, enabling their joint modelling rather than treating them individually. They also connect the objective of their formulation to Bregman divergence minimization. The framework is tested on popular tabular datasets, and performance is compared with recent state-of-the-art probabilistic modelling frameworks: CTGAN, TVAE, GOGGLE, TabSyn, etc.

Questions for the Authors

  1. Is it possible to do classification with this approach? For example, one could have an additional column containing binary or multi-class labels. If so, one could compare against Transformer-based approaches on tabular data classification and compare discriminative vs. generative modelling.
  2. Is it a limitation to consider only linear conditional velocity fields?
  3. Does the framework only support an ODE-based formulation? Would non-linearity or an SDE-based formulation of the conditional velocity field make the problem intractable?

Claims and Evidence

The claims sound fine to me; I would have liked to see a clearer discussion of modelling assumptions and limitations, as there were clearly many made.

Methods and Evaluation Criteria

The methods and evaluation criteria used in the paper make sense, and it is good to see results reported for many metrics such as C2ST, alpha-recall, and so on. However, I felt they could have used more downstream tasks, such as missing-value imputation, and compared the results to TabSyn, which is clearly the best of the baseline methods.

Theoretical Claims

The theory, equations, and derivations in the paper looked correct to me. The discussion of connections between Bregman divergences and flow matching objectives is enlightening, and the supporting theorems add value to the paper.

Experimental Design and Analysis

The experimental design looked fine to me. The authors could include a spider plot similar to Fig. 1 of Zhang et al., ICLR 2024.

Supplementary Material

Supplementary material looked fine to me, I only did one quick pass of it.

Relation to Broader Literature

The paper builds on the earlier work on the flow matching objective for tabular data, parameterising with the Transformer architecture of Eijkelboom et al. 2024, and their work is contemporary with diffusion-based approaches such as TabSyn.

Missing Essential References

I think the paper covered most references I could think of after a bit of literature review. Although this work is closely related to generative modelling, some recent work on classification with tabular data could be discussed too, for example Prior-Data Fitted Networks (Müller et al. 2022, TabPFN), where Transformer-based architectures have done well on tabular data.

Other Strengths and Weaknesses

  1. More work/discussion on other downstream tasks for generative modelling, such as missing-value imputation, could improve the paper.
  2. The theoretical discussion of connections between the Bregman divergence and the variational objective for CFM is a solid contribution of this work.
  3. The experimental results could benefit from test datasets with higher dimensionality for benchmarking (the highest was D=46).
  4. The paper looks well written to me, and the story sounds cohesive.
  5. The work does not state its limitations very clearly, e.g. the linearity of the conditional velocity field.
  6. The paper introduces C2ST as an evaluation metric, which I had not seen used earlier.
  7. In the results in Table 3, the authors only bolded their own results, while TabSyn achieves almost identical performance, well within standard-error intervals. Please also bold the TabSyn result in the Average column.

Other Comments or Suggestions

Please state in the caption of Table 3 how many runs were performed.

Written above and below. I will wait for feedback from the other reviewers, who might be more up to date with the contemporary literature than me.

Ethics Review Issues

NA

Author Response

Dear reviewer RH9Z,

First, we would like to express our gratitude for the thorough and extensive feedback on the paper. Our responses to the points raised are as follows:

  • We agree that the theoretical assumptions/limitations should be made more explicit, especially as this seems to be a recurring point of confusion for the reviewers. To this end, we will include an extra section in the paper covering these assumptions and why they are typically satisfied. We also agree that adding figures would be beneficial to explain our work better, and as such we will include them in the final version. We respond to specific points of confusion at the end of this rebuttal as well.

  • We also agree that more downstream tasks could have been included. To this end, also aligning with feedback from other reviews, we included an analysis of a privacy metric, where we show SOTA performance (for a discussion of the privacy metric, please see our response to reviewer FzAE). We hope this addresses the point raised, even though a different task than the one proposed was considered due to time constraints.

  • Next, thank you for pointing out the bold-font typos. These have been fixed in the final version of the paper, and we added a global ranking for all methods for easier comparison. We also added more experimental details.

  • Regarding the questions to the authors: firstly, though it is definitely possible to do the proposed task, we believe it to be out of scope for our work, as it is not part of the common benchmarks as far as we are aware. Second, we want to emphasise that the linearity assumption on the conditional velocity field is in the endpoint, which is not saying the interpolation needs to be a straight line, but that it can be any function linear in x_1; e.g. all diffusion-based models (flow matching, diffusion models, or other models that combine the injection of Gaussian noise with blurring) satisfy this assumption. Moreover, a linear conditional velocity does not imply we learn "linear" dynamics: as seen in many settings, flow matching and diffusion can learn highly complex dynamics. Finally, and connected to the previous answer, the ODE formulation is indeed compatible with other geometries (as done in Riemannian (Variational) FM and Metric FM) and with SDEs (see the VFM paper or Albergo 2023 for the stochastic interpolant formulation).

Thank you one more time for reviewing our work and the useful comments.

Review
Rating: 3

This paper proposes a new method that introduces variational flow matching to table generation. Specifically, it incorporates a family of distributions, the exponential family, for mapping the table data to the prior, which is a more general form than flows starting from the widely used priors.

Questions for the Authors

Please address my concerns on the exact effect of jointly modeling different modalities with VFM, the issues with the experimental design, and the potential typos in the posted results. I will also eagerly update my review after checking the reviews of the other reviewers.

Claims and Evidence

The main claim regarding the performance improvement of this paper should be that using different priors for different data modalities in tables better fits the needs of each modality, especially those not well matched by the Gaussian distribution used in existing methods. Hence, more persuasive evidence should be given that performance on those modalities whose priors are changed in the proposed method is distinctly improved compared to existing methods. This is my main concern with the claims.

Methods and Evaluation Criteria

The proposed method introduces variational flow matching, an updated version of flow matching more appropriate for multimodal problems, to table generation. This makes sense if the method can empirically improve performance on widely used benchmarks. Hence, please refer to the review of the experimental design for issues.

Theoretical Claims

I did not check the correctness of all theoretical claims and assume they are correct. I will refer to the opinions of other reviewers and update my review accordingly.

Experimental Design and Analysis

The experimental design roughly follows the existing literature, whose soundness is widely validated. However,

[1] There seem to be many typos in the tables, where highlights are given to scores that are not the best among all compared methods. For example, in Table 1 (Magic), TabDDPM yields 1.01 while the highlighted TabSyn gets 1.03. There are many similar issues in all tables, raising concerns about the validity of the experiments.

[2] The performance improvement brought by the proposed method seems minor, especially for precision and recall. Additionally, the design differs slightly from the existing literature: this paper separates the Trend and Shape error rates in the comparison, whereas the original TabSyn paper compares the error rates from column-wise density and pair-wise correlation. Could the authors explain the reason for this difference?

Supplementary Material

I did not check the theoretical proofs in the supplementary material and assume the correctness of the results. I am open to other reviewers' opinions and will update my review accordingly on any potential issues.

Relation to Broader Literature

To the best of my knowledge, this is the first flow matching method for table generation.

Missing Essential References

I am not aware of any additional literature that should be included.

其他优缺点

N/A

Other Comments or Suggestions

[1] There seem to be typos in the highlighting of the best performance in the tables. In Table 6 (Default), the best should be TabDDPM, but the highlight is given to TabSyn. In Table 6 (Beijing), GOGGLE has a much lower RMSE than the highlighted TabSyn. Similar issues occur in Table 5 (Magic) and Table 4 (Beijing).

Author Response

Dear reviewer NBdr,

Thank you for carefully examining and giving feedback on our work. We will reply to the points raised one by one:

  • Regarding the typos in our work, that was poor validation on our end; thank you for pointing this out. We made sure that not only are the bold fonts now correct, but we also added a global ranking for each method to make the comparison simpler to follow.

  • You rightly point out that our model only marginally outperforms the current SOTA in some instances. Though this is correct, we did not sufficiently highlight an important benefit of our approach (and many other flow matching approaches) in the manuscript. Our model achieves this marginally better performance while not only being simpler to train, as has been the case between FM and diffusion alternatives in other settings, but especially while requiring significantly fewer NFEs (network function evaluations) than diffusion during inference. This means that inference with TabbyFlow is faster than with the other SOTA models, without compromising on performance. We highlight this fact more clearly in the final version of the paper.

  • Regarding the metrics, you correctly point out that we report the error rates on Shape and Trend while TabSyn uses column-wise density estimation and pair-wise column correlation. We used the terminology from TabDiff, as it was the most recent work on tabular data, where the metrics are referred to as Shape and Trend. Shape corresponds to column-wise density estimation, where one employs the Kolmogorov-Smirnov test for numerical columns and the total variation distance for categorical columns. Trend corresponds to pair-wise column correlation, where we use Pearson correlation for numerical columns and contingency similarity for categorical columns. We have further explained these metrics and the terminology in the experimental section of the paper.
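The Shape metric as just described (Kolmogorov-Smirnov statistic for numerical columns, total variation distance for categorical ones) can be sketched as follows; this is a generic numpy illustration, not the authors' evaluation code.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs, evaluated at all sample points."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def shape_error(real_col, syn_col, categorical=False):
    """Column-wise 'Shape' error: total variation distance for categorical
    columns, KS statistic for numerical ones. In [0, 1]; lower is better."""
    if categorical:
        cats = np.union1d(real_col, syn_col)
        p = np.array([(real_col == c).mean() for c in cats])
        q = np.array([(syn_col == c).mean() for c in cats])
        return 0.5 * np.abs(p - q).sum()
    return ks_statistic(real_col, syn_col)

rng = np.random.default_rng(2)
real = rng.normal(size=1000)
assert shape_error(real, rng.normal(size=1000)) < 0.1            # same distribution
assert shape_error(real, rng.normal(3.0, 1.0, size=1000)) > 0.5  # shifted distribution
```

Averaging this quantity over columns (and expressing it as a percentage) gives the column-wise density-estimation error rate reported in the tables.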

Once again: thank you for your time to review our work and the useful comments provided.

Reviewer Comment

Thank you for the reply. I think my concerns about typos and performance have been well addressed. But my main concern lies in potential cherry-picking in the experiments, in that the metrics may have been carefully selected to maximize the advantage of the proposed method. Admittedly, this is a good paper, so I give it a positive score. But I would like to see the same metrics as TabSyn for the proposed method and TabSyn for a more comprehensive comparison. If so, I will raise my score.

Author Comment

Thank you for your comments regarding the typos and performance, and for the further questions. We will again address them point by point.

First (and we now see the confusion), we want to emphasize that our metrics and TabSyn's metrics are the same. The only difference is the naming, which we took from TabDiff. We have highlighted this difference in naming, such that the paper now explicitly reports the "error rate (%) of column-wise density estimation" (also called Shape) and the "error rate (%) of pair-wise column correlation score" (also called Trend). That is, we would like to emphasize that we did not cherry-pick the metrics; we simply reported the standard metrics for tabular data.

For the final version, we recomputed these metrics for all baselines, including TabSyn, as also provided in the tables in the responses to other reviewers. There is some variance with respect to the results in the TabSyn paper. A similar variance was observed in the TabDiff paper, and they report performance numerically similar to what we obtained. In line with another review, we have added TabDiff's results to the tables; it was initially excluded due to the lack of available code to replicate its results.

We hope this addresses your final open questions.

Review
Rating: 4

They propose TabbyFlow, a variational flow-matching method for generating mixed tabular data. An advantage over previous methods is that the exponential-family version allows modelling mixed data (continuous and categorical) and, contrary to other methods, even other types of data such as Poisson counts. The theoretical aspect of the paper is strong: they derive interesting connections to Bregman divergence. The proposed approach has strong theory and generality and thus could be extended to various types of data not explored in the paper. The evaluation includes multiple metrics on diverse datasets.

Questions for the Authors

already asked

Claims and Evidence

Claims:

  • their method allows modelling over mixed data and more (true)
  • EF-VFM objective and Bregman divergences (true, good theory to support it)
  • state-of-the-art performance on benchmark tabular datasets (true for the methods tested against, but lacking a distributional metric and flow-matching baselines)

Methods and Evaluation Criteria

It is strange that the baseline comparisons use diffusion, VAE, and GANs, but not flow matching. Since their approach is a flow-matching one, they should compare to at least one flow-matching mixed-data generator baseline, which exist in the literature. A few exist: https://arxiv.org/abs/2309.09968, https://openreview.net/pdf?id=Zoli4UAQVZ (although the latter might be too recent to be included; I'm not sure what the ICML rules are about concurrent work).

The metric "error rate" is not explained: which one is it, the KS statistic or TVD? Why not just report both metrics separately? And similarly for Trend: why not show the correlation for numerical pairs and the contingency similarity for categorical pairs, both separately and together? At least having those in the appendix would help to see how methods differ with respect to categorical vs. numeric features.

Please clarify what the alpha-precision and beta-recall metrics are, and not just give an intuitive idea of what they measure.

Add rankings to Tables 1 and 2, since you include them in Tables 3 and 4.

A distributional metric, which is fundamental to the task being solved (tabular data generation), is missing. Assessing performance should be done first and foremost by looking at the distance between the real and fake distributions at the data level (not per-feature, as done in Table 1). For this, the Wasserstein distance or Maximum Mean Discrepancy (MMD) can be used. In https://arxiv.org/abs/2309.09968, a specific preprocessing was used to ensure that the Wasserstein distance works on both categorical and numeric data.
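A simple per-feature version of such a distributional check can be sketched as follows: averaging 1-D Wasserstein distances over preprocessed columns. This is an illustrative simplification; the cited work uses a more careful multivariate construction over the jointly encoded data.

```python
import numpy as np

def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between equal-sized empirical samples:
    the mean absolute gap between sorted values (the optimal 1-D coupling)."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

def mixed_wd(real, syn):
    """Average per-feature W1, assuming columns are already preprocessed
    (numerics scaled, categoricals one-hot encoded) so scales are comparable."""
    return np.mean([wasserstein_1d(real[:, j], syn[:, j])
                    for j in range(real.shape[1])])

rng = np.random.default_rng(3)
real = rng.normal(size=(2000, 3))
close = rng.normal(size=(2000, 3))          # drawn from the same distribution
far = rng.normal(2.0, 1.0, size=(2000, 3))  # shifted distribution
assert mixed_wd(real, close) < mixed_wd(real, far)
```

The per-feature average ignores cross-feature dependencies, which is exactly why the full multivariate distance (or MMD) is the stronger check the reviewer asks for.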

Theoretical Claims

The variational formulation is correct, and convergence is ensured by minimizing the KL divergence.

Experimental Design and Analysis

See "Methods And Evaluation Criteria"

Supplementary Material

Implementation details and data details are correct.

Relation to Broader Literature

Overall, the contributions correctly refer to the relevant prior work. The only thing is that, since the method uses flow matching, the introduction should not just mention diffusion methods but also reference other flow-matching tabular generators (a few exist in the literature).

Missing Essential References

See "Relation To Broader Scientific Literature"

Other Strengths and Weaknesses

.

Other Comments or Suggestions

.

Author Response

Dear reviewer rhRe,

Thank you for your thoughtful and constructive review. We are glad that the theoretical contributions and generality of the method came across clearly. We will reply to the points raised in the review pointwise:

  • We fully agree that including flow-matching baselines is important for a fair evaluation. We have now added results from the flow-based gradient-boosted tree model. We also now explicitly discuss TabUnite, which we initially excluded as it was only available on OpenReview as a withdrawn submission. We agree that it is relevant, and we now cite and briefly discuss it in the related-work section. We are happy to report that TabbyFlow is still on par with SOTA approaches, and achieves this performance with fewer forward evaluations than e.g. the diffusion-based approaches (see the rebuttal to NBdr in case of interest).

  • We also acknowledge that some of the metrics and terminology were unclear in the original submission. In the revised version, we have clarified the definitions of the Trend/Shape error rates and of alpha-precision/beta-recall, both in the main text and the appendix. We used the aggregated values as has previously been done in TabSyn and TabDiff, but we acknowledge it can be useful to disaggregate into numerical vs. categorical and, as such, provide these results too. We observe that our approach performs well on both modalities.

  • You rightly point out the absence of distributional distance metrics, which are fundamental to the problem. In response, we now report the Wasserstein distances between the synthetic dataset and the original data, following recent benchmarks by reporting the distance to both the train set and the test set. These additions confirm the strength of our approach from a distribution-matching perspective.

  • All result tables now include consistent bolding and method-wise ranking across datasets.

Model       WD (train)   WD (test)
TVAE        4.6±0.3      4.9±0.1
CTGAN       7.8±0.2      7.7±0.1
TabDDPM     3.1±0.6      3.9±0.5
TabSyn      2.2±0.4      3.0±0.3
TabDiff     2.4±0.3      2.9±0.2
TabbyFlow   1.7±0.7      2.1±0.4

Table: Wasserstein Distance (WD) between the synthetic dataset and the train/test datasets. Lower values are better.

Thank you once again for the kind words and time to review our work.

Reviewer Comment

Thank you for addressing my comments. This is a good paper.

Author Comment

Thank you for your kind words and for taking the time to review our work. We appreciate your feedback and are glad that you found the paper to be of good quality.

Final Decision

The authors propose a new deep generative model for mixed (i.e. heterogeneous) tabular data. They base their model on variational flow matching (Eijkelboom, et al., 2024) and they handle heterogeneity using exponential families. They draw a nice connection between their work and Bregman divergences. In the experiments, they compare against various other deep generative models for tabular data, for generative modelling and some downstream tasks.

Mixed-data modelling is somewhat overlooked in the deep generative modelling literature, and reviewers generally agreed this paper proposes an elegant solution to the problem, nicely combining exponential families and variational flow matching. The main criticisms of the paper were mostly experimental: reviewers regretted the lack of a distributional metric and the limited scope of the considered downstream tasks. The authors partially addressed these concerns in the rebuttal (given the limited amount of time, they did a remarkable job in my opinion). The lack of ablation studies was also highlighted.

Since I believe the qualities of the paper slightly outweigh these issues, I am choosing the "Weak accept" recommendation. If the paper is accepted, I strongly encourage the authors to strengthen the experiments on downstream tasks (adding, say, missing-data imputation, as suggested by one reviewer), in particular the privacy experiments, and the evaluation metrics (in addition to Wasserstein distances, I think that looking at approximations of the test log-likelihood would be very valuable, since most of these models come with likelihood approximations).

References mentioned in the submission

  • Eijkelboom, et al. Variational flow matching for graph generation. arXiv preprint, 2024