What if the observation is extracted features instead of images and has much smaller dimension than latent?

Hi!
I'm not sure whether you still do Q&A support here :blush:, but I'm stuck on a problem that is beyond my math skills. I hope you can help me.

The question is about the loss function of your RSSM, which uses a variational approach. The reconstruction term of the VAE-style loss is the log-likelihood under `p(o_t|s_t)`, the decoder from latent to image. In that setting, the observation (an image) has a much larger dimension than the latent. But when o_t has a much smaller dimension than the latent (for example, 4 values as in CartPole from OpenAI Gym classic_control, with a latent of size 32 to 64), I think `p(o_t|s_t)` could not learn any meaningful distribution. The s_t it is conditioned on was sampled from the variational posterior `q(s_t|a_1:t, o_1:t)`, which has already seen the current observation o_t, so I suspect that s_t could simply learn to store a full copy of o_t, because s_t has far more dimensions than o_t.
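
To make the concern concrete, here is a minimal NumPy sketch of the two per-step terms of a VAE-style objective: the reconstruction log-likelihood under `p(o_t|s_t)` and the KL divergence from the posterior to the prior. It is an illustration only, not code from this repository. The sizes, the unit-variance Gaussian decoder, the standard-normal placeholder prior, and the names `copy_decoder`, `post_mean`, and `post_std` are all assumptions made for the example.

```python
import numpy as np

OBS_DIM, LATENT_DIM = 4, 32  # assumed sizes: CartPole-like observation, larger latent


def gaussian_log_prob(x, mean, std):
    """Log density of x under a diagonal Gaussian, summed over dimensions."""
    var = std ** 2
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var))


def diag_gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL[q || p] between two diagonal Gaussians, summed over dimensions."""
    var_q, var_p = std_q ** 2, std_p ** 2
    per_dim = np.log(std_p / std_q) + (var_q + (mean_q - mean_p) ** 2) / (2 * var_p) - 0.5
    return float(np.sum(per_dim))


def copy_decoder(s):
    """Hypothetical decoder that just reads the observation back out of the latent."""
    return s[:OBS_DIM]


rng = np.random.default_rng(0)
o_t = rng.normal(size=OBS_DIM)

# Hypothetical "copying" posterior: the first OBS_DIM latent dims store o_t
# with a very small standard deviation; the remaining dims match the prior.
post_mean = np.zeros(LATENT_DIM)
post_mean[:OBS_DIM] = o_t
post_std = np.ones(LATENT_DIM)
post_std[:OBS_DIM] = 0.01

# Placeholder prior (assumption): standard normal instead of a learned transition model.
prior_mean = np.zeros(LATENT_DIM)
prior_std = np.ones(LATENT_DIM)

# Reparameterized sample s_t ~ q, then the two loss terms.
s_t = post_mean + post_std * rng.normal(size=LATENT_DIM)
recon = gaussian_log_prob(o_t, copy_decoder(s_t), np.ones(OBS_DIM))
kl = diag_gaussian_kl(post_mean, post_std, prior_mean, prior_std)

print(f"reconstruction log-likelihood: {recon:.3f}")
print(f"KL(posterior || prior):        {kl:.3f}")
```

In this sketch the copying posterior gets a near-maximal reconstruction term, and the only thing that penalizes the copy is the KL term on the dimensions that store o_t. Whether that penalty is enough to keep the latent meaningful in practice is exactly what I am unsure about.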

In this situation (non-image observations with a small dimension), does this VAE-like approach still hold?
Or is there another technique that is more reasonable in this case?
I hope this concern makes sense to you. :confused:



What if the observation is extracted features instead of images and has much smaller dimension than latent? #59

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

What if the observation is extracted features instead of images and has much smaller dimension than latent? #59

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions