PaperHub
Overall rating: 5.2/10 · Rejected · 5 reviewers
Ratings: 6, 3, 6, 5, 6 (lowest 3, highest 6, std dev 1.2)
Average confidence: 3.8 · Correctness: 2.6 · Contribution: 2.6 · Presentation: 2.0
ICLR 2025

Chinese Inertial GAN for Writing Signal Generation and Recognition

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05

Abstract

Keywords
Inertial Sensors · Handwriting Recognition · Signal Generation · Human-Computer Interaction · Disabled People

Reviews and Discussion

Official Review (Rating: 6)

The paper presents an innovative approach to addressing the limitations in performance of inertial sensor-based systems for Chinese character recognition, which traditionally rely on extensive manual data collection. By introducing a Chinese Inertial Generative Adversarial Network (CI-GAN), the study offers a solution that generates unlimited, high-quality training samples, thereby providing a flexible and efficient data support platform for classification models. This method significantly reduces the dependency on labor-intensive data gathering and enhances the overall performance and feasibility of using inertial sensors for HCI in the context of disability.

Strengths

  • Clear Motivation and Feasible Approach: The paper is driven by a well-defined goal—to improve HCI for disabled individuals using inertial sensors. The proposed solution, a generative adversarial network (GAN) for data generation, is not only innovative but also practically feasible, as evidenced by the experimental results.

  • Innovation and High Performance: By introducing the CGE, FOT, and SRA techniques, the study significantly enhances the recognition accuracy of Chinese characters, with reported performance improvements from 6.7% to 98.4%.

  • Social Impact and Community Contribution: The research addresses significant accessibility issues for disabled individuals and adds substantial value to the community by releasing the first Chinese writing recognition dataset based on inertial sensors, enabling further advancements in the field.

Weaknesses

  • Visualization and Clarity of Diagrams: The diagrams in the paper could be improved for better systematic representation and intuitiveness. Visualizing abstract constraints and regularization techniques more clearly would aid in understanding the complex interactions within the model. The task and symbols need a more detailed definition to improve understanding.
  • Detailed Justification of Model Constraints: The paper could be improved by a more detailed exploration of the motivations and effectiveness of specific constraints such as the Forced Optimal Transport (FOT). A deeper discussion of why aligning the input stroke encoding features with the generated and real signal features, and why using the Wasserstein distance as regularization, can mitigate mode mixing and mode collapse is necessary to validate the approach.
  • Analysis of Robustness Under External Disturbances: The paper lacks a thorough analysis of the system's robustness in the presence of external disturbances. Detailed insights into how these factors affect the system and recommendations for enhancing robustness would strengthen the paper.

These points should be addressed to enhance the overall comprehensibility and impact of the research.

Questions

As shown in Weaknesses.

Comment
  1. For W1: Thank you for your constructive feedback on the visualization and clarity of the diagrams. In response, we have significantly improved Figure 1 to provide a more systematic and intuitive representation of our framework. The updated diagram now explicitly defines the key tasks and symbols, ensuring that the roles of each component, such as CGE, GAN, FOT, and SRA, are visually clear and aligned with their descriptions in the text. Additionally, we have enhanced the representation of abstract concepts like constraints and regularization techniques by incorporating detailed annotations and visual cues. For example, we illustrate how FOT mitigates mode collapse and mode mixing with specific examples, and the semantic alignment enforced by SRA is clearly depicted to highlight its interaction with other components. We sincerely appreciate your suggestion, kindly invite you to review the revised figures, and look forward to your feedback.

  2. For W2: Unlike images, where the quality of generation can often be assessed visually, it is challenging to determine whether generated time-series signals are realistic or semantically correct. This necessitates the use of strong constraints like FOT to ensure the quality, diversity, and semantic accuracy of the generated signals. FOT achieves this by forcibly aligning the glyph encoding features, generated signal features, and real signal features using the Wasserstein distance. This alignment ensures both semantic correctness and motion fidelity in the generated signals, effectively mitigating mode collapse and mode mixing.

    To further address your concern, we have added a new section in the appendix titled Mathematical Explanation of FOT for Preventing Mode Collapse. In this section, we provide a rigorous mathematical derivation to demonstrate how FOT mitigates mode collapse. Briefly, FOT preserves the diversity of generated signals by penalizing incomplete mode coverage and mode mixing. We kindly invite you to review this section, which we believe offers a solid theoretical foundation for the effectiveness of FOT in addressing this critical issue.
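    For readers who want to see the alignment concretely, below is a minimal sketch of a triple Wasserstein-style alignment term in PyTorch. This is our illustration under stated assumptions, not the authors' released code: the sliced approximation of the Wasserstein distance, the function names, the equal weighting of the three terms, and the requirement that all three feature batches share the same size and dimensionality are choices made for brevity.

    ```python
    import torch

    def sliced_wasserstein(x, y, n_proj=64):
        # Approximate the 2-Wasserstein distance between two equally sized
        # feature batches via random one-dimensional projections.
        theta = torch.randn(x.shape[1], n_proj, device=x.device)
        theta = theta / theta.norm(dim=0, keepdim=True)  # unit-length directions
        x_proj, _ = torch.sort(x @ theta, dim=0)         # sort each projected batch
        y_proj, _ = torch.sort(y @ theta, dim=0)
        return ((x_proj - y_proj) ** 2).mean()

    def fot_style_loss(e, h_g, h_t):
        # "Triple consistency": pull the glyph encodings e, generated-signal
        # features h_g, and real-signal features h_t toward one shared
        # distribution; added to the generator objective with some weight.
        return (sliced_wasserstein(e, h_g)
                + sliced_wasserstein(e, h_t)
                + sliced_wasserstein(h_g, h_t))
    ```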

  3. For W3: To address your concern, we conducted additional experiments to thoroughly evaluate the system's robustness under external disturbances, specifically by introducing varying levels of Gaussian noise to the real inertial signals during training. The Gaussian noise was added at proportions of 0.0%, 5.0%, 10.0%, and 20.0% of the original signal's standard deviation to simulate sensor inaccuracies and environmental interference. Under each noise setting, we trained CI-GAN models to generate 15,000 IMU signals. These generated signals were then used to train six classifiers (1DCNN, LSTM, Transformer, RF, XGBoost, and SVM), and their classification accuracy was evaluated using 5-fold cross-validation. The results, presented in the table below, reflect the accuracy of the classifiers under varying noise conditions.

    | Noise ratio | 1DCNN | LSTM | Transformer | RF | XGBoost | SVM |
    |---|---|---|---|---|---|---|
    | 0.0% | 95.7% | 93.9% | 98.4% | 83.5% | 93.1% | 74.6% |
    | 5.0% | 95.2% | 94.1% | 98.0% | 82.9% | 93.3% | 71.8% |
    | 10.0% | 94.5% | 92.3% | 97.1% | 81.7% | 92.6% | 70.7% |
    | 20.0% | 93.9% | 92.5% | 95.9% | 79.8% | 91.0% | 69.4% |

    These results demonstrate that the system maintains high performance even under significant noise levels. While performance slightly decreases with higher noise ratios, the overall degradation is minimal. This robustness is attributed to the combined contributions of Chinese Glyph Encoding (CGE), Forced Optimal Transport (FOT), and Semantic Relevance Alignment (SRA). CGE introduces a regularization term based on Rényi entropy, which is the first embedding targeted at the shape of Chinese characters rather than their meanings, providing rich semantic guidance for generating handwriting signals. FOT establishes a triple-consistency constraint between the input prompt, output signal features, and real signal features, ensuring the authenticity and semantic accuracy of the generated signals and preventing mode collapse and mixing. SRA constrains the consistency between the semantic relationships among multiple outputs and the corresponding input prompts, ensuring that similar inputs correspond to similar outputs (and vice versa), significantly alleviating the hallucination problem of generative models. Together, these components ensure the system's resilience to external disturbances and its capacity to generate realistic and accurate signals under challenging scenarios.
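    As a concrete illustration of this noise protocol, the following sketch scales Gaussian noise to a fraction of each sample's own standard deviation, which is how we read the description above. The function name and the synthetic demo data are our own placeholders; the CI-GAN training itself is not reproduced here.

    ```python
    import numpy as np

    def add_relative_gaussian_noise(signals, ratio, seed=0):
        # signals: (n_samples, n_timesteps, n_channels) array of IMU recordings.
        # Noise std is `ratio` times each sample's own signal std, matching the
        # 0% / 5% / 10% / 20% settings reported in the table above.
        rng = np.random.default_rng(seed)
        sigma = signals.std(axis=(1, 2), keepdims=True)
        return signals + rng.normal(size=signals.shape) * (ratio * sigma)

    # Demo on synthetic stand-in data (real IMU signals are not reproduced here).
    demo = np.random.randn(8, 200, 6)
    for ratio in (0.0, 0.05, 0.10, 0.20):
        noisy = add_relative_gaussian_noise(demo, ratio)
        print(f"noise ratio {ratio:.0%}: mean |perturbation| = {np.abs(noisy - demo).mean():.4f}")
    ```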

We sincerely thank you for your thoughtful and constructive feedback, which has greatly helped us improve the rigor and robustness of our work. Your recognition would mean a great deal to us, and we truly hope that our revisions meet your expectations.

Comment

I hope this message finds you well. In response to your comments, we provided a detailed point-by-point reply, addressing each concern thoroughly. We believe it is crucial to highlight that CI-GAN, as a generative model for inertial sensor handwriting signals, is fundamentally different from classification models and datasets designed for visual or pen-based handwriting recognition. Comparing CI-GAN to such methods or datasets would result in a methodological misalignment. This key distinction underscores the novelty and specificity of our contribution to this field.

Additionally, we clarified that our work fills a critical gap by creating the first IMU dataset of Chinese handwriting and demonstrated CI-GAN's effectiveness through rigorous experiments and comparisons with established augmentation techniques. Given the absence of comparable generative models for IMU signals, our approach represents a foundational step in this area. We hope our replies provide the necessary context to address your concerns comprehensively.

Since submitting our revisions, we have been anxiously awaiting your feedback. This work represents years of dedicated research by our team, and your recognition and support are invaluable to us. We humbly and earnestly request that you kindly review our responses at your convenience. Your endorsement would mean a great deal to us, and we deeply appreciate your time and understanding.

Official Review (Rating: 3)

In this paper, the authors propose a GAN-based method for generating sensor-domain Chinese character data. It mainly consists of three modules: CGE, FOT, and SRA. CGE encodes Chinese characters according to their glyphs. FOT uses a ternary consistency constraint to enforce consistency among the predicted sample, the real sample, and the glyph encoding vector. The SRA module aligns glyph and semantic encodings. The authors collected 4,500 samples of 500 Chinese characters, with 1,500 samples in the training set and 3,000 in the test set. They use the proposed CI-GAN to generate additional training data to augment the original dataset. The validity of the generated data is demonstrated by comparing recognition performance when training with different data quantities, and the effectiveness of the proposed modules is verified by ablation experiments.

Strengths

This paper makes a strong contribution to research in accessible human-computer interaction, focusing on Chinese handwriting recognition for disabled individuals. It addresses an important issue by introducing CI-GAN, a generative model with unique modules—Chinese Glyph Encoding, Forced Optimal Transport, and Semantic Relevance Alignment—that effectively tackle the challenges of data scarcity and segmentation. According to the visualization results of Chinese glyph encodings, the module proposed in this paper is effective in encoding Chinese character shapes. The experimental results in Table 3 and Table 4 prove that the generated data has useful value.

Weaknesses

The proposed method section does not provide sufficient comparisons and analysis. The collected real data and generated data are insufficient, and the quality of the constructed dataset is not high. The experimental section lacks important comparative experiments and analysis, making it difficult to demonstrate the effectiveness of the proposed method. The specific issues are as follows:

  1. Methodology: The authors propose a GAN-based generation method but do not compare its generation quality with other generative approaches, such as diffusion models or other GAN-based methods.

  2. Dataset: The dataset collected is small in scale, with data from only nine individuals and without full coverage of the complete Chinese character set.

    (1) The complexity of different Chinese characters varies greatly, and the authors only show generation and classification results for relatively simple Chinese characters, so it is impossible to evaluate the model's generation performance on complex Chinese characters.

    (2) Writing habits vary greatly among individuals, leading to significant differences in handwriting styles. With data from only nine participants, how can the authors ensure that the generated data quality aligns with real-world scenarios?

  3. Experiments:

    (1) Comparative methods lack citations.

    (2) The algorithm's performance has not been tested on other public datasets. Can the CI-GAN generation effect be verified on other open-source datasets of Chinese character data, such as IAHCC-UCAS2016 and CASIA-OLHWDB (ICDAR 2013 Chinese Handwriting Recognition Competition)?

    (3) The authors did not compare their method with other high-performing algorithms for Chinese character recognition, such as the one mentioned in [1]. As far as I know, [1] achieved a recognition accuracy of 96.78% on the dataset of all Chinese characters in the Level 1 Character Set (IAHCC-UCAS2016) and 97.86% on ICDAR-2013. I suggest the authors compare their method with more state-of-the-art (SOTA) approaches.

    (4) The experiments lack further analysis, such as individual-level performance testing and performance evaluation across characters with different stroke complexities.

These improvements would better support the effectiveness and applicability of the proposed approach.

[1] Gan J, Wang W, Lu K. A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN[J]. Information Sciences, 2019, 478: 375-390.

Questions

Repeat: the same three points (Methodology, Dataset, and Experiments) and reference [1] as listed in Weaknesses above.

Comment

Thank you for your review of our paper. We have distilled your comments into four key points and addressed each one individually.

  1. Review Comment: Comparing CI-GAN with other high-performing classification models, such as in reference [1].

    Response: CI-GAN is a generative model, not a classification model. CI-GAN can generate high-quality inertial measurement unit (IMU) signals for Chinese handwriting recognition, but it cannot classify or recognize handwritten characters. Therefore, comparing a generative model to classification models may be a methodological misunderstanding. We acknowledge that the classification and recognition methods you suggested are excellent, and we will cite these references to highlight their contributions. However, as a generative model, CI-GAN is fundamentally different from classification models, making a direct comparison with them highly impractical.

  2. Review Comment: CI-GAN generation effect can be verified on other open source datasets of Chinese character data, such as IAHCC-UCAS2016, CASIA-OLHWDB, and ICDAR 2013.

    Response: The IAHCC-UCAS2016, CASIA-OLHWDB, and ICDAR 2013 datasets are used for handwriting recognition tasks, based on visual or pen-tip trajectory data. CI-GAN is designed for generating IMU handwriting signals rather than recognizing them, so it would be challenging to evaluate a generative model on a classification dataset.

  3. Review Comment: The dataset collected is small in scale, with data from only nine individuals and without full coverage of the complete Chinese character set.

    Response: We would like to clarify that IMU signal data collection is inherently challenging and considerably more complex than collecting visual data. Handwriting signals are continuous, and therefore each segment corresponding to a specific character must be extracted from the continuous stream of handwriting signals. For images, videos, or pen trajectories, such segmentation is relatively straightforward due to visual cues. However, for IMU signals, which are time-series waveforms, it is extremely difficult to identify the start and end points of each character segment visually, requiring auxiliary optical equipment for precise annotation.

    We invested significant time and effort to obtain the 4,500 handwriting signals presented in this study, creating the first IMU dataset of Chinese handwriting that covers the official set of commonly used characters in China. This effort underscores the value of our CI-GAN model, which eliminates the need for such labor-intensive annotation by directly generating IMU signals for each Chinese character. This dataset sufficiently supports the training of our generative model for the IMU signal generation task, and our experiments have validated the practical effectiveness of CI-GAN on this scale and quality of data.

  4. Review Comment: Comparing the CI-GAN with other generative models.

    Response: It is important to highlight that this study is the first to propose a generative deep learning model for generating IMU handwriting signals. CI-GAN is specifically designed to address the unique challenges of IMU handwriting signal generation, with tailored modules and optimization strategies to suit IMU data. In the field of IMU signal generation research, there is currently no precedent or comparable model, meaning that no readily available generative model exists for direct comparison.

    Therefore, we conducted a thorough comparison of CI-GAN with twelve commonly used data augmentation methods spanning five major categories. This comprehensive evaluation demonstrates the rigor of our approach and the scientific validity of our results. While you suggested that we use Diffusion models or other image-based generative models for comparison, these models were originally designed for image data. Applying them directly to IMU signal generation would involve technical and theoretical misalignment. Adapting such image-based generative models to IMU signal generation would require substantial modifications to both model structure and algorithms, along with re-training and validation on IMU data. This adaptation alone would justify an entirely new research paper, extending far beyond the scope of our current study.

In conclusion, we sincerely hope that our replies address your concerns satisfactorily. Your understanding and recognition of our efforts are of utmost importance to us.

Comment

The CI-GAN is used to generate Chinese IMU signal data, which is difficult to collect. Then, the author demonstrates the quality of the generated data by improving the performance of the classifier.

The author clarifies that IMU signal data collection is inherently challenging and considerably more complex than collecting visual data. Is IMU-based handwriting recognition of practical value? For instance, users need to wear additional sensors, which brings extra burden and inconvenience.

The experiments provided are based on the authors' own dataset. Whether for real or generated samples, both the number of categories (the primary Chinese character set has 3,755 categories, yet the authors only collected 4,500 samples, with 1,500 for training and 3,000 for testing) and the data scale make it difficult to consider this a suitable evaluation environment. Additionally, the classifiers compared (Tables 2, 3, and 4) are not accompanied by relevant references. What is the specific structure of these classifiers, and can they be considered representative methods for evaluation? Are these the best classification methods available?

If the author conducts tests on similar publicly available time-series datasets, the generative method could be objectively assessed. Alternatively, comparing with other generative methods (e.g., [1]) could demonstrate the effectiveness of your approach if other generators perform worse. Furthermore, generating data for an existing public dataset and achieving SOTA results, with clear improvements after adding your generated data through CI-GAN, would validate the effectiveness of your method. Unfortunately, neither of these approaches was seen in your responses or manuscript. Therefore, before addressing these concerns, it is difficult for us to dismiss doubts regarding the significance and value of this work.

[1] Ren M S, Zhang Y M, Wang Q F, et al. Diff-Writer: A Diffusion Model-Based Stylized Online Handwritten Chinese Character Generator[C]//International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2023: 86-100.

Comment
  1. Response for Practicality of IMU-Based Handwriting Recognition: Thank you for your comment. As we've emphasized in the paper and previous responses, IMU-based handwriting systems are portable, lightweight, and resilient to environmental factors such as lighting and occlusions, making them ideal for a wide range of real-world scenarios. They can be easily integrated into wearable devices, providing an intuitive and seamless interaction, especially for users with visual impairments. However, the challenge arises during the dataset creation phase, where we need to segment the real IMU signals to accurately match them with the corresponding writing motions for training classifiers. This is where our CI-GAN comes into play, as it helps generate high-quality inertial signals by learning the relationship between writing motions and sensor data, significantly reducing the complexity of the dataset creation process.

  2. Response for Comparison Methods: It is important to note that the field of inertial sensor signal generation currently lacks established, widely available methods, and many image-based augmentation techniques are not directly applicable to this domain. Despite these challenges, we have adapted and applied over 10 recent, influential data augmentation methods to the field of inertial sensor signal generation, thereby creating a comprehensive comparison. Importantly, we have included the Diff-Writer method, which you recommended, in our comparison. Diff-Writer significantly outperforms all comparison methods except for our CI-GAN, highlighting its strength as a learning-based approach. However, since Diff-Writer was not designed for generating inertial sensor signals, it struggles to fully capture the motion dynamics and semantic fidelity required for this task. As a result, there remains a gap between its performance and that of CI-GAN, which excels in generating accurate and realistic IMU signals by addressing the unique challenges of inertial signal generation.

    Although Diff-Writer generates trajectory point sequences rather than sensor signals, our team has made substantial efforts to adapt and apply it to the task of sensor signal generation. We retrained and modified the model to accommodate our specific requirements, demonstrating the versatility of the Diff-Writer. Additionally, we have cited all the comparative methods and relevant literature, ensuring that our work is positioned within the current state of research. We kindly invite you to review our updated manuscript.

In summary, while publicly available inertial sensor datasets are limited, we have made every effort to demonstrate the effectiveness of CI-GAN through thorough comparisons with existing methods. Our entire team worked tirelessly, without sleep, to adapt the trajectory generation method you recommended to our inertial sensor signal generation task. We are confident that these experiments will address your concerns and demonstrate the significant value of our work. Your feedback and recognition are deeply important to us, and we eagerly await your review.

Comment

After receiving your review on December 29, our entire team immediately began working on the experiments you requested and comparing them with the Diff-Writer method you recommended. Although Diff-Writer is designed to generate handwriting trajectory data, while our task focuses on generating inertial sensor signals, we recognized the difference in data modalities. Despite this, our team worked tirelessly overnight to adapt Diff-Writer to our task, successfully completing the experiments and revising the manuscript accordingly, with additional citations included.

However, we discovered that the ICLR paper update channel closed on December 27, and we did not receive your review until December 29. Unfortunately, this meant that we were unable to incorporate these updates. To facilitate your review of our changes, we have included the key modifications and experimental results below for your consideration. This work represents years of effort from our team, and we sincerely hope to gain your approval.

Considering the character limitations, we have attached some of the modifications below for your review, as well as for the review of other reviewers and the conference chair:

Due to the lack of deep learning-based augmentation methods in the sensor field, we introduced the diffusion model-based approach for generating handwriting trajectory, named Diff-Writer [Ren et al., 2023]. Although this approach generates trajectory point sequences rather than the sensor signals required in our study, its ability to produce high-quality and diverse handwriting data makes it highly valuable. We adapted this method through modifications and retraining, enabling its application to our inertial signal generation task for a meaningful comparison. As shown in Table 3, Diff-Writer significantly outperforms all baseline methods except for our CI-GAN, showcasing its strength as a learning-based approach for generating handwriting data. However, as Diff-Writer was not designed for generating inertial sensor signals, it struggles to fully capture the motion dynamics and semantic fidelity required for this task. Consequently, there remains a considerable gap between its performance and that of our CI-GAN, which achieves superior accuracy across all classifiers by addressing the unique challenges of inertial signal generation.

Table 3. Comparison of Data Augmentation Methods for Inertial Signal Generation

| Data Augmentation Method | 1DCNN | LSTM | Transformer | RF | XGBoost | SVM |
|---|---|---|---|---|---|---|
| Cropping [Yue et al., 2022] | 15.7% | 9.1% | 7.7% | 12.8% | 16.3% | 9.6% |
| Noise Injection [Audibert et al., 2020] | 17.3% | 11.9% | 12.2% | 8.5% | 13.8% | 10.1% |
| Jittering [Flores et al., 2021] | 20.1% | 13.0% | 14.4% | 9.7% | 17.4% | 7.5% |
| APP [Chen et al., 2021] | 22.3% | 13.6% | 19.7% | 19.0% | 25.1% | 16.3% |
| AAFT [Lee et al., 2022] | 32.1% | 20.7% | 25.4% | 27.5% | 35.9% | 19.2% |
| Wavelet [Wang et al., 2024] | 19.9% | 12.1% | 10.6% | 13.8% | 22.6% | 9.5% |
| EMD [Otero et al., 2022] | 24.4% | 17.1% | 20.9% | 17.9% | 23.4% | 12.2% |
| CutMix [Yun et al., 2019] | 21.9% | 14.8% | 15.5% | 14.7% | 18.9% | 13.1% |
| Cutout [Devries et al., 2017] | 25.6% | 16.4% | 16.9% | 18.5% | 27.1% | 16.6% |
| RegMixup [Pinto et al., 2022] | 41.5% | 27.8% | 36.8% | 38.4% | 45.9% | 30.3% |
| cGAN [Douzas et al., 2018] | 18.5% | 14.8% | 15.7% | 12.4% | 20.5% | 8.4% |
| Diff-Writer [Ren et al., 2023] | 71.3% | 65.9% | 78.7% | 58.9% | 62.5% | 53.3% |
| CI-GAN (ours) | 95.7% | 93.9% | 98.4% | 83.5% | 93.1% | 74.6% |

Comment

I hope this message finds you well. The rebuttal period is coming to a close in less than a day, and while three other reviewers have kindly accepted our paper, we are still awaiting your final input. Your feedback is incredibly important to us. In response to your suggestion, even though the method you recommended and our task involve different modalities, our team worked tirelessly, without sleep, to adapt your approach to our specific task. Your feedback is extremely valuable to us, and we would be deeply grateful if you could spare a moment to review our final updates. We are genuinely grateful for your consideration and sincerely hope to hear from you soon. Thank you so much for your understanding and support.

Official Review (Rating: 6)

This paper introduces CI-GAN, a generative adversarial network for Chinese writing recognition using inertial sensors, designed to aid disabled individuals. CI-GAN incorporates Chinese glyph encoding, forced optimal transport, and semantic relevance alignment to generate accurate signals. With these synthetic signals, classifier accuracy improved from 6.7% to 98.4%. The study also releases the first Chinese inertial sensor dataset for writing recognition, advancing accessible human-computer interaction.

Strengths

  1. The research application has significant potential and creativity in addressing accessibility needs for disabled individuals.
  2. The proposed dataset contributes valuable inertial sensor data for Chinese writing, and the research introduces a novel GAN-based method for data augmentation, effectively addressing data scarcity and enhancing handwriting recognition research.
  3. The experimental results show promising improvements in classifier accuracy.

Weaknesses

  1. The concept of inertial data is introduced only in Section 4.2, making it somewhat difficult to understand when mentioned in the earlier parts of the paper. It is recommended to provide a brief introduction to this concept earlier on.
  2. The first point in the summary of contributions mentions that it "provides new tools for the study of the evolution and development of pictograms," which may not be suitable for the contributions summary, as it seems the research does not cover this aspect.
  3. The description of CGE mentioned in Section 3.1 seems to be just an embedding? In my opinion, the current version of the introduction may be somewhat complex.

Questions

  1. According to my understanding, CGE can be divided into two parts: 1. converting one-hot encoding into dense features, and 2. using α-order Rényi entropy regularization in GER. Therefore, in the ablation study in Section 4.4, what specific configuration is being ablated when CGE is removed? Which part of these two components is being eliminated? Additionally, can this ablation experiment validate the effects of the glyph encoding regularization (GER) proposed in Section 3.1?
  2. What is the difference between the pre-trained VAE mentioned in Section 3.2 and CGE in Section 3.1? It seems that both can extract glyph features. Can VAE replace CGE?
  3. What are h_G, h_T, and e in Section 3.2? It seems that e comes from the GAN input, h_G comes from the GAN output, but where does h_T come from during training?

Comment
  1. For W1: Thank you for your insightful suggestion regarding the introduction of the concept of inertial data earlier in the manuscript. In response, we have revised the paper to include a clear and concise explanation of the advantages of inertial sensors and their applications in IMU-based human-computer interaction systems right at the beginning. This sets a strong foundation for understanding the study's context. We kindly invite you to review the updated manuscript and look forward to your valuable feedback.

  2. For W2: We have revised the contributions summary to clarify the scope of our research, ensuring it aligns accurately with the study’s focus and avoids any potential misinterpretation.

  3. For W3: Thank you for your feedback. In response to your suggestion, we have streamlined the description of CGE in Section 3.1 to enhance readability and reduce complexity. At the same time, we want to emphasize that CGE is fundamentally different from a standard embedding. Unlike traditional embeddings that primarily encode semantic meanings, CGE captures the glyph-specific features of Chinese characters, such as shape, structure, and writing strokes, by leveraging the inherent relationship between the character glyph and its writing motion recorded by inertial sensor signals. Additionally, the Rényi entropy-based regularization we designed ensures that the encoding vectors are orthogonal and maximally informative, which not only strengthens the quality of glyph representations but also provides a generalizable mechanism that could benefit other representation learning tasks. This innovative approach goes beyond conventional embeddings, making CGE a key contribution of our framework. We kindly invite you to review the revised section.
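    Because the Rényi entropy term recurs in the discussion below, a compact sketch of one way such a regularizer can be written may help. This is our reading of the description (a matrix-based α-order Rényi entropy over the normalized Gram matrix of the encoding vectors), not the paper's exact formulation; the choice of α and all names are assumptions.

    ```python
    import torch

    def renyi_entropy_reg(E, alpha=2.0, eps=1e-8):
        # E: (n_classes, d) glyph-encoding matrix. Maximizing the alpha-order
        # Renyi entropy of the normalized Gram spectrum pushes the encodings
        # toward mutual orthogonality and maximal distinctiveness.
        E = E / (E.norm(dim=1, keepdim=True) + eps)    # unit-norm rows
        G = (E @ E.t()) / E.shape[0]                   # PSD Gram matrix with trace 1
        lam = torch.linalg.eigvalsh(G).clamp_min(eps)  # probability-like spectrum
        entropy = torch.log((lam ** alpha).sum()) / (1.0 - alpha)
        return -entropy                                # minimize the negative entropy
    ```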

  4. For Q1: In our original ablation experiment, removing CGE meant eliminating both parts. To address your concern, we have conducted an additional ablation experiment where the first part (converting one-hot encoding into dense features) is retained, while only the second part (GER) is removed.

    | Ablation Model | 1DCNN | LSTM | Transformer | RF | XGBoost | SVM |
    |---|---|---|---|---|---|---|
    | No augmentation | 0.87% | 2.6% | 1.7% | 4.9% | 1.2% | 6.7% |
    | w/o all (Base GAN) | 18.5% | 14.8% | 15.7% | 12.4% | 20.5% | 8.4% |
    | w/ OT | 26.4% | 28.6% | 27.3% | 21.0% | 30.9% | 20.9% |
    | w/ FOT | 39.9% | 38.0% | 35.3% | 31.9% | 46.8% | 27.3% |
    | w/ CGE | 54.6% | 51.2% | 47.9% | 38.6% | 57.5% | 34.1% |
    | w/ CGE (w/o GER) | 35.7% | 32.1% | 30.9% | 33.8% | 41.1% | 29.0% |
    | w/ CGE (w/o GER)+SRA | 61.4% | 58.1% | 60.2% | 51.0% | 59.9% | 45.2% |
    | w/ CGE (w/o GER)+FOT | 59.6% | 55.2% | 54.0% | 53.4% | 58.3% | 47.5% |
    | w/ CGE+SRA | 84.9% | 77.4% | 86.8% | 61.4% | 68.9% | 56.1% |
    | w/ CGE+FOT | 80.7% | 80.5% | 80.9% | 57.2% | 70.4% | 59.5% |
    | w/ CGE+FOT+SRA (CI-GAN) | 95.7% | 93.9% | 98.4% | 83.5% | 93.1% | 74.6% |

    The results, now included in the revised manuscript, demonstrate the significant impact of GER on the performance of the framework. Specifically, we observe that retaining the dense feature transformation without GER still improves performance over the baseline GAN, but the lack of regularization results in noticeably lower effectiveness compared to using CGE with GER fully enabled. This confirms the critical role GER plays in enhancing glyph encoding by ensuring orthogonality and maximizing the information entropy of the encoding vectors.

  5. For Q2: The pre-trained VAE and CGE serve fundamentally different roles in the framework and cannot be substituted for one another. The VAE is designed to extract features from inertial sensor signals, focusing on capturing signal-specific characteristics. In contrast, CGE is designed to encode the categorical features of Chinese characters. In essence, the VAE operates on the signal space, learning to represent the temporal and motion characteristics of IMU data, while CGE works in the character space, embedding class-level information that distinguishes one glyph from another.

  6. For Q3: h_T represents the real signal feature, h_G denotes the generated signal feature, and e is the glyph encoding derived from the CGE module, which encodes glyph-related features. During training, h_T is extracted from real IMU signals in the dataset using the pre-trained VAE, providing the ground-truth feature representation for supervising the generator. The Forced Feature Matching (FFM) loss aligns h_T, h_G, and e, ensuring that the generated signals reflect both the motion dynamics of real IMU data and the glyph-specific semantics of the target character.
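    To make this data flow explicit, here is a self-contained toy sketch of where each feature comes from in one training step. Every module below is a simplified stand-in (an embedding table, linear layers, toy shapes), not the paper's architecture, and the projection needed to put e and the signal features in a shared space is only noted in a comment.

    ```python
    import torch
    import torch.nn as nn

    n_classes, d_glyph, d_feat, T, C, B = 500, 64, 128, 200, 6, 16

    cge = nn.Embedding(n_classes, d_glyph)        # stand-in for the CGE module
    generator = nn.Linear(d_glyph + 32, T * C)    # noise + glyph -> flat signal
    vae_encoder = nn.Sequential(nn.Flatten(), nn.Linear(T * C, d_feat))  # VAE stand-in

    char_ids = torch.randint(0, n_classes, (B,))
    x_real = torch.randn(B, T, C)                 # batch of real IMU signals
    z = torch.randn(B, 32)

    e = cge(char_ids)                             # glyph encoding e
    x_gen = generator(torch.cat([e, z], dim=1)).view(B, T, C)
    h_g = vae_encoder(x_gen)                      # generated-signal feature h_G
    with torch.no_grad():                         # the VAE is pre-trained and frozen
        h_t = vae_encoder(x_real)                 # real-signal feature h_T
    # The FFM/FOT term then aligns (e, h_g, h_t); in practice e would first be
    # projected to the same dimensionality as the signal features.
    ```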

Thank you so much for your insightful and constructive feedback. It’s clear that you have a deep understanding of the field, and the detailed suggestions you provided have been incredibly helpful in improving the quality of our work. Your recognition is extremely important to us, and we truly appreciate the thought and effort you’ve put into reviewing our paper.

Official Review (Rating: 5)

The paper proposes CI-GAN, which enhances Chinese writing recognition for disabled users, generating high-quality samples and significantly improving classifier performance.

Strengths

  1. The article is clearly written and easy to understand.

  2. The motivation is clear: translating subtle movements of the user's hand into written text can help disabled people write.

  3. Experiments demonstrate the effectiveness of the proposed method.

Weaknesses

  1. The dataset is relatively small and lacks comprehensive coverage of the Chinese character set, which may not support the generation of more complex Chinese characters.

  2. The proposed method should be tested on other publicly available benchmarks against other SOTA methods, such as IAHCC-UCAS2016 and CASIA-OLHWDB.

Questions

See weaknesses

Comment

Thank you for your recognition of our work, especially your positive feedback on the motivation, experiments, and writing of our paper! Regarding the two weaknesses you mentioned, we provide the following responses:

  1. Weaknesses: The dataset is relatively small and lacks comprehensive coverage of the Chinese character set.

    Response: We would like to clarify that collecting high-quality inertial sensor (IMU) handwriting signals is inherently challenging due to the nature of IMU data. Unlike visual data, IMU signals are continuous time-series waveforms, making it difficult to segment and label individual characters without auxiliary tools like optical tracking devices. Despite these challenges, we successfully collected a dataset of 4,500 IMU signal samples, covering the "List of Commonly Used Chinese Characters" published by the Chinese government.

    While our dataset does not encompass every Chinese character, our model has learned the structural relationships and shape patterns between different characters, as evidenced by the t-SNE visualizations, where characters with similar structures and stroke patterns cluster closely together. As most complex Chinese characters are constructed by combining simple elements, our work represents a foundational step from 0 to 1 in this field.

  2. Weaknesses: The proposed method should be tested on other publicly available benchmarks with other SOTA methods, such as IAHCC-UCAS2016 and CASIA-OLHWDB.

    Response: Our CI-GAN is designed to generate handwriting signals captured by inertial sensors. However, the IAHCC-UCAS2016 and CASIA-OLHWDB datasets were collected with entirely different sensors. The IAHCC-UCAS2016 dataset relies on the Leap Motion, a vision-based optical device that captures hand trajectories in the air, while the CASIA-OLHWDB dataset uses an Anoto digital pen to record pen-tip trajectory data on specialized paper. Given the fundamental differences in data modalities, these datasets are not suitable for evaluating a CI-GAN tailored for inertial sensor signal generation.

    In fact, there is currently no publicly available handwriting dataset based on inertial sensors, highlighting a significant gap in this field. Our work addresses this gap by creating CI-GAN to generate an infinite number of inertial-sensor-based handwriting signals.

Thank you for your insightful comment. We hope we have addressed your concerns.

Comment

I hope this message finds you well. First and foremost, thank you for recognizing the clarity, motivation, and experimental contributions of our paper.

Datasets such as IAHCC-UCAS2016 and CASIA-OLHWDB are based on vision-based and pen-based systems, respectively, and are inherently incompatible with the inertial sensor signals that CI-GAN is specifically designed to address. As there are currently no publicly available datasets for inertial sensor handwriting signals, our work fills this gap by creating a framework that can generate a theoretically unlimited number of high-quality synthetic samples, providing a foundation for further research in this field. Overall, the absence of any publicly available inertial sensor handwriting datasets further underscores the novelty and necessity of our contribution.

We sincerely hope that our detailed responses adequately address your concerns. We humbly request that you kindly review our revisions at your convenience. Thank you so much for your time and understanding.

Comment

I hope this message finds you well. We greatly appreciate your recognition of the motivation, experiments, and clarity of our work, and we have made every effort to thoroughly address the points you raised. As we mentioned in our response, we have provided detailed clarifications on the dataset limitations and the challenges related to testing on public benchmarks. We also added further explanations on the dataset creation process and how we overcame the unique difficulties involved in generating IMU-based handwriting signals. Given the importance of your feedback to the final decision on our manuscript, we kindly request that you review the updates we made based on your suggestions. Your recognition of our efforts would mean a great deal to us, and we are eager to hear your final thoughts. Thank you once again for your time and valuable input.

Comment

I hope this message finds you well. As the rebuttal period is drawing to a close, I’m writing to humbly request your feedback on our revised manuscript. Three of the reviewers have already kindly accepted the paper, and your feedback is truly essential to us. We understand that you are very busy, but if you could spare a moment to review our response, we would be incredibly grateful.

We sincerely hope to receive your thoughts before the rebuttal period ends. Thank you so much for your consideration.

Comment

I’m very sorry to trouble you again, but I’m writing to humbly ask if you could kindly review our responses to your feedback. The rebuttal period is coming to a close in less than a day, and while three other reviewers have kindly accepted our paper, we are still awaiting your final input. Your feedback is incredibly important to us. We are genuinely grateful for your consideration and sincerely hope to hear from you soon. Thank you so much for your understanding and support.

Official Review (Rating: 6)

This paper tackles the challenge of data scarcity in Chinese writing recognition using inertial sensors by proposing the Chinese Inertial Generative Adversarial Network (CI-GAN). CI-GAN includes three innovative modules, Chinese Glyph Encoding (CGE), Forced Optimal Transport (FOT), and Semantic Relevance Alignment (SRA), to generate high-quality inertial signal samples. CGE captures the shape and stroke of Chinese characters, FOT ensures feature consistency to prevent mode collapse, and SRA aligns the semantic relevance of generated signals to their glyph structures. With CI-GAN, the authors establish a flexible data platform for Chinese writing recognition and claim to release the first inertial-sensor-based dataset on GitHub.

Strengths

  1. The introduction of CI-GAN is a novel approach for enhancing data availability in Chinese inertial writing recognition, with modules designed specifically to tackle challenges unique to Chinese characters.
  2. The improvement from 6.7% to 98.4% in classifier performance highlights the potential of CI-GAN-generated data to enhance recognition accuracy, indicating practical benefits for downstream applications.

Weaknesses

  1. In Figure 1, CI-GAN is presented as a framework overview, yet it lacks consistency in terminology, with CGE mislabeled as "GER" and FOT written in full without abbreviation. Additionally, SRA is not visually represented in the figure. This detracts from the clarity of the diagram and makes it harder for readers to grasp the full framework.
  2. The paper’s theoretical foundation could be strengthened. The current theoretical analysis is minimal, with only a few formulas provided. More detailed mathematical explanations, particularly for FOT’s role in preventing mode collapse, would lend greater credibility to the approach.
  3. The ablation studies are somewhat limited, and additional experiments testing more comprehensive combinations of CGE, FOT, and SRA would provide a clearer understanding of each module's contribution. More exhaustive ablation tests would validate the effectiveness of the modules individually and collectively.
  4. The example in Figure 1, intended to illustrate the framework's application for disabled individuals, doesn’t effectively convey this purpose. Including a more relatable example that directly addresses accessibility for disabled users would better align with the stated motivation of the study.

Questions

  1. Can you clarify how Figure 1 relates to accessibility for disabled individuals, as the example seems disconnected?
  2. Could you provide more theoretical details on the FOT component to reinforce its foundation?
  3. Are more exhaustive ablation studies possible to validate the contributions of CGE, FOT, and SRA individually?

Comment
  1. For W1: Thank you for pointing out the inconsistencies and omissions in the original Figure 1. We have unreservedly revised Figure 1 according to your suggestions. Specifically, "CGE" is now correctly labeled instead of "GER," the "SRA" module has been visually represented to provide a complete overview of the framework, and all modules now include both their full names and abbreviations. We kindly invite you to review our updated manuscript.

  2. For W2&Q2: Thank you for your suggestion. We have provided a rigorous mathematical proof demonstrating how FOT imposes strong constraints in the feature space to effectively mitigate the mode collapse problem in GANs. Due to page limitations in the main text, we included this proof in Appendix D. We kindly invite you to review the supplemental mathematical analysis and sincerely hope that this revision addresses your concerns.

  3. For W3&Q3: Thank you for your valuable feedback regarding the ablation studies. In response, we have expanded our experiments to include all possible combinations of CGE, FOT, and SRA, thoroughly exploring their individual and collective contributions. It is worth noting that SRA relies on input semantics and therefore must be used alongside CGE in this framework. As shown in the results, the addition of each module consistently improves performance across all tested base models, regardless of which modules are already included, demonstrating that each module contributes unique and complementary strengths to the framework.

    | Ablation Model | 1DCNN | LSTM | Transformer | RF | XGBoost | SVM |
    |---|---|---|---|---|---|---|
    | No augmentation | 0.87% | 2.6% | 1.7% | 4.9% | 1.2% | 6.7% |
    | w/o all (Base GAN) | 18.5% | 14.8% | 15.7% | 12.4% | 20.5% | 8.4% |
    | w/ OT | 26.4% | 28.6% | 27.3% | 21.0% | 30.9% | 20.9% |
    | w/ FOT | 39.9% | 38.0% | 35.3% | 31.9% | 46.8% | 27.3% |
    | w/ CGE | 54.6% | 51.2% | 47.9% | 38.6% | 57.5% | 34.1% |
    | w/ CGE (w/o GER) | 35.7% | 32.1% | 30.9% | 33.8% | 41.1% | 29.0% |
    | w/ CGE (w/o GER)+SRA | 61.4% | 58.1% | 60.2% | 51.0% | 59.9% | 45.2% |
    | w/ CGE (w/o GER)+FOT | 59.6% | 55.2% | 54.0% | 53.4% | 58.3% | 47.5% |
    | w/ CGE+SRA | 84.9% | 77.4% | 86.8% | 61.4% | 68.9% | 56.1% |
    | w/ CGE+FOT | 80.7% | 80.5% | 80.9% | 57.2% | 70.4% | 59.5% |
    | w/ CGE+FOT+SRA (CI-GAN) | 95.7% | 93.9% | 98.4% | 83.5% | 93.1% | 74.6% |

    We believe these additional experiments provide a clearer understanding of the contributions of CGE, FOT, and SRA. Thank you again for your insightful suggestions. We kindly invite you to review the updated manuscript.

  4. For W4&Q1: We appreciate your suggestion and would like to clarify the positioning of our study and its broader applications. Our work primarily focuses on the development of a robust IMU signal generation algorithm, which can produce a large volume of high-quality inertial signals. These generated signals enable IMU-based human-computer interaction systems, offering significant advantages for accessibility, particularly for individuals with visual impairments. For example, by facilitating natural handwriting interactions, our algorithm has already contributed to the production of devices designed specifically for visually impaired users. That said, aiding disabled individuals is just one of many application scenarios for our algorithm. By enabling the generation of diverse and high-quality IMU handwriting signals, it supports the development of IMU-based systems in education, digital handwriting analysis, and personalized training for handwriting recognition algorithms. These applications demonstrate the algorithm’s potential to revolutionize human-computer interaction by providing high-fidelity motion data.  To better reflect this breadth, and considering the comments from reviewer a2VT, we have revised the manuscript to introduce the advantages of inertial sensors and IMU-based human-computer interaction systems at the very beginning. In this context, we present the example of assisting disabled individuals as a representative case, highlighting it as one key motivation but not the sole focus of our study.

We sincerely thank you for your valuable feedback, and we hope that our revisions have adequately addressed your concerns. Your recognition is truly invaluable to us and we look forward to your response and further insights.

Comment

I hope this message finds you well. First and foremost, we truly appreciate the time and effort you have dedicated to reviewing our manuscript.

The moment we received your comments, our team immediately began working to address every concern with the utmost care. We worked tirelessly, refining the manuscript, conducting additional experiments, strengthening the mathematical foundation, and revising figures, all with the hope that you could see our responses as soon as possible. For us, even a few seconds earlier felt meaningful, as it might allow you to review our efforts sooner. Since submitting our detailed responses and revised manuscript, we have been anxiously awaiting your feedback, and we are confident that our response meets the high standards of this esteemed conference. Your endorsement would mean the world to us, not only as an affirmation of our work but also as a driving force for our continued efforts in this field.

We humbly and earnestly request that you kindly review our revisions at your earliest convenience. Your input is invaluable, and we deeply appreciate your understanding and consideration.

Comment

I appreciate the authors' efforts to address most of my concerns. I have decided to maintain a relatively positive rating.

Comment

We sincerely appreciate your positive recognition of the revisions we have made, and we are grateful for the time and effort you dedicated to reviewing our work. Your support means a great deal to us, and we are pleased that we could address your concerns effectively.

Comment

We sincerely thank the reviewers and the conference chair for their valuable feedback and thoughtful consideration of our paper. First, we want to clarify that collecting handwriting samples of Chinese characters is not easy. During data collection, volunteers wrote different Chinese characters continuously. We had to accurately locate the signal segments corresponding to each character from long signal streams, as shown in Appendix B. However, accurately segmenting and extracting signal segments requires synchronizing optical motion capture equipment and then comparing the inertial signals frame by frame with the optical capture results to find each character segment's starting and ending frames. Therefore, we expended significant time and effort to obtain the 4,500 signal samples in this paper, establishing the first Chinese handwriting recognition dataset based on inertial sensors, which we have partially open-sourced. By contrast, our CI-GAN can directly generate handwriting motion signals according to the input Chinese character, eliminating the complex processes of signal segmentation, extraction, and cleaning, as well as the reliance on optical equipment. We believe it provides an efficient experimental data platform for the field.

Unlike the fields of CV and NLP, many deep learning methods have not yet been applied to the sensor domain. More importantly, unlike image generation, where performance can be judged visually, it is challenging to identify semantics in a waveform by observation and to determine whether the generated signal fluctuations are reasonable, which imposes high requirements on generative model design. Therefore, we had to design multiple forms of guidance and constraints for the generator, resulting in the design of Chinese Glyph Encoding (CGE), Forced Optimal Transport (FOT), and Semantic Relevance Alignment (SRA).

  • CGE introduces a regularization term based on Rényi entropy, which increases the information content of the encoding matrix and the distinctiveness of class encodings, providing a new category representation method that can also be applied to other tasks. As far as we know, this is the first embedding targeted at the shape of Chinese characters rather than their meanings, providing rich semantic guidance for generating handwriting signals.
  • FOT establishes a triple-consistency constraint between the input prompt, output signal features, and real signal features, ensuring the authenticity and semantic accuracy of the generated signals and preventing mode collapse and mixing.
  • SRA constrains the consistency between the semantic relationships among multiple outputs and the corresponding input prompts, ensuring that similar inputs correspond to similar outputs (and vice versa), significantly alleviating the hallucination problem of generative models. Notably, the June 2024 Nature paper "Detecting Hallucination in Large Language Models Using Semantic Entropy," published after we released our paper, shares a similar idea with our proposed SRA. They assess model hallucination by repeatedly inputting the same prompts into generative models and evaluating the consistency of the outputs. Their approach essentially forces the model to produce similar outputs for similar prompts. Our SRA not only achieves this but also ensures that the relationships between prompts are mirrored in the relationships between the outputs. This significantly reduces hallucinations and enhances the model's practicality and stability.

CGE, FOT, and SRA not only guide and constrain the generator but also interact with each other, as shown in Section 3.4. The Chinese glyph encoding not only provides semantic guidance to the generator but also supplies the necessary encoding for FOT and SRA, and it is also supervised in the process. FOT and SRA share the VAE and generated signal features, providing different constraints for the generator, with FOT focusing on improving signal authenticity and enhancing the model's cognition of different categories through the semantic information injected by CGE, thereby mitigating mode collapse and mode mixing. In contrast, SRA ensures consistency between the relationships of multiple outputs and prompts through group-level supervision, which helps alleviate the hallucination problem of generative models.
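As one concrete illustration of the group-level SRA constraint described above, the sketch below matches the pairwise similarity matrix of the input glyph encodings against that of the generated-signal features, so that similar prompts are forced to yield similar outputs and vice versa. This is our interpretation of the description, with cosine similarity and a mean-squared penalty as assumed choices, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def sra_style_loss(e, h_g):
    # e:   (B, d_e) prompt/glyph encodings for a batch of generated samples.
    # h_g: (B, d_h) features of the corresponding generated signals.
    sim_in = F.normalize(e, dim=1) @ F.normalize(e, dim=1).t()       # (B, B) prompt similarities
    sim_out = F.normalize(h_g, dim=1) @ F.normalize(h_g, dim=1).t()  # (B, B) output similarities
    return F.mse_loss(sim_out, sim_in)  # mirror the input relationships in the outputs
```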

In summary, the three modules proposed in CI-GAN are innovative and interlinked, significantly enhancing the performance of GANs in generating inertial sensor signals, as evidenced by numerous comparative and ablation experiments. This method is a typical example of deep learning empowering the sensor domain and has been recognized by the industry and adopted by a medical wearable device manufacturer. It has the potential to become a benchmark for data augmentation in the sensor signal processing field. We sincerely hope we have addressed the concerns of the reviewers, and once again, we thank everyone for their review and suggestions for this paper.

Comment

I hope this message finds you well. The discussion phase has now been ongoing for 10 days, and we have yet to receive any response. This has left us feeling increasingly anxious about the status of our submission. This paper holds immense importance to us, and we have devoted considerable effort to meticulously address every concern and suggestion raised in the initial reviews. Given the contribution of our work in advancing the field of AI for sensors, and its potential impact on improving human-computer interaction for individuals with disabilities, we are eager to receive the reviewers' feedback on our responses.

We sincerely believe that the experimental results and the detailed explanations provided in our response have adequately resolved the issues you pointed out. This work is not only a critical part of our research but also has the potential to make a meaningful contribution to the field.

I humbly and earnestly request you to kindly review our responses and share your feedback at your convenience. Your input is invaluable, and I sincerely hope that our diligent efforts will meet your expectations.

Thank you so much for your understanding and for taking the time to support us in this process.

Comment

I hope this message finds you well. I apologize for reaching out again, but it has now been two weeks since the discussion phase began, and I have not yet received any feedback from the reviewers. This prolonged silence has left us feeling quite anxious about the progress of our submission.

The moment we received the initial reviews, my team and I dedicated ourselves wholeheartedly to addressing every concern and suggestion. We worked tirelessly, even overnight, to conduct additional experiments and revise the manuscript with utmost care. Our only hope was that the reviewers could see our responses as soon as possible, even a few seconds earlier, as each second feels like it might bring this paper closer to a positive resolution.

We humbly and earnestly request you to kindly review our response and revisions, and provide your thoughts at your convenience. Thank you very much for your understanding and for the time and effort you have dedicated to reviewing this work.

AC Meta-Review

The paper introduces a specific Generative Adversarial Network (GAN) for generating inertial sensor-based writing signals of Chinese characters, named CI-GAN. The authors have collected a small-scale dataset consisting of 4500 signal samples from only nine individuals, without full coverage of the complete Chinese character set. Experimental results show that the synthetic signals generated by CI-GAN can significantly improve character recognition accuracy. However, the proposed method focuses on a narrow application domain, specifically the generation of inertial sensor-based writing signals for Chinese characters. This limited scope may be more suitable for specialized conferences or journals focused on pattern recognition or signal processing rather than the broader audience of ICLR. Moreover, the dataset used for training and evaluation is relatively small, thus the experimental results are not sufficiently convincing. Based on these considerations, the decision is not to recommend acceptance at this time.

Additional Comments on Reviewer Discussion

This paper was reviewed by five experts in the field and finally received diverse scores: 6, 3, 6, 5, and 6. The major concerns of the reviewers (FZkk & RmES) are:

  1. the dataset collected is small in scale, comprising data from only nine individuals, without full coverage of the complete Chinese character set;
  2. the proposed method is not compared with other generative approaches, such as diffusion models or other GAN-based methods; and
  3. the questionable practical value of IMU-based handwriting recognition.

The authors didn’t successfully address these concerns during the discussion period. I fully agree with these concerns and, therefore, make the decision to reject the paper.

Final Decision

Reject