The paper claims that prior methods do not learn a shared representation between skills. However, Caluwaerts'23 proposes Locomotion-Transformer, which does learn such a representation and reports results similar to this paper's. A comparison with Locomotion-Transformer would help put the results in context.
The presentation of the method is poor.
- Fig 1 - none of the symbols are defined.
- "Robot-specific encoding" section: are all undefined.
- "The latter is recommended for its simplicity, but the one hot encoding produces the same results". So which one was used in the experiments? Were all experiments performed with both?
- "Terrain encoding" section: the terrain encoding is not defined in this section. Is the terrain encoding the same as ? Also is undefined.
- Page 4 describes the method diagram in text, which would be much easier to follow as a figure. Figure 1 presumably depicts the same information, but it does not help in understanding the method because Sec 3.1 never references it. The reader has to guess the correspondence between the text and the figure. See e.g. Hafner'20 for an example of good presentation.
- Eq 1: Y, U, s are undefined. The LTI function is never mentioned in the rest of the paper. Since all of the symbols are undefined, it is unclear whether this equation is used in the method at all and, if so, where.
- "Training the VAE and Terrain AE" section: the terrain encoder is undefined. Unclear how is produced. As far as I can tell, the VAE loss doesn't depend on the terrain encoder, so it's unclear how the terrain encoder can be trained with those gradients. The terrain decoder is also undefined.
- There is a missing reference in the second line of Sec 2.
- The citation for GECO is wrong. It should be Rezende'18.
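
To make the gradient concern about the terrain encoder concrete, here is a minimal toy sketch. Everything in it is hypothetical: the scalar "weights", the functional forms, and the names `vae_loss`, `joint_loss` and `grad_wrt_terrain` are my own illustration and are not taken from the paper. It only shows that a loss which never reads the terrain encoder's parameters gives them a zero gradient, whereas adding a terrain reconstruction term (one possible fix, assumed here) gives a non-zero one.

```python
# Toy illustration only: names, shapes and loss terms are assumptions,
# not the paper's actual model.

def vae_loss(vae_w, terrain_w, x):
    """Stand-in for a VAE objective that never reads terrain_w."""
    z = vae_w * x              # toy "encoder"
    x_hat = vae_w * z          # toy "decoder"
    recon = (x - x_hat) ** 2   # reconstruction error
    penalty = 0.5 * z ** 2     # quadratic penalty standing in for a KL term
    return recon + penalty

def joint_loss(vae_w, terrain_w, x, h):
    """Same loss plus a terrain reconstruction term that uses terrain_w."""
    h_hat = terrain_w * (terrain_w * h)   # toy terrain encoder + decoder
    return vae_loss(vae_w, terrain_w, x) + (h - h_hat) ** 2

def grad_wrt_terrain(loss_fn, vae_w, terrain_w, *args, eps=1e-6):
    """Central finite-difference gradient with respect to terrain_w."""
    up = loss_fn(vae_w, terrain_w + eps, *args)
    down = loss_fn(vae_w, terrain_w - eps, *args)
    return (up - down) / (2 * eps)

# Gradient of the VAE-only loss w.r.t. the terrain parameter: exactly zero.
g_vae_only = grad_wrt_terrain(vae_loss, 0.7, 0.3, 1.5)
# Gradient once a terrain reconstruction term is added: non-zero.
g_joint = grad_wrt_terrain(joint_loss, 0.7, 0.3, 1.5, 2.0)

print(g_vae_only)
print(g_joint)
```

If the authors' loss has the first structure, the terrain encoder cannot be trained by it. If it has something like the second, the extra term should be written out explicitly in the paper.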

Hafner'20: Danijar Hafner et al. Dream to Control: Learning Behaviors by Latent Imagination, 2020.
Rezende'18: Danilo Jimenez Rezende and Fabio Viola. Taming VAEs, 2018.