Thank you for your work. I have the following questions to discuss with you:
- Why does the loss in Equation 2 of the paper sum the losses over t = 2, ..., T, while the code implementation only samples a single timestep t for each batch (see my first sketch below this list)?

  ```python
  t, weights = self.schedule_sampler.sample(micro.shape[0], dist_util.dev())
  ```

- Why is the MSE loss in Equation 2 of the paper written as ||EMB(w) - f(z_1, 1)||, while the code implements ||EMB(w) - f(z_0, 0)||?

  ```python
  t0_mask = (t == 0)
  t0_loss = mean_flat((x_start_mean - model_out_x_start) ** 2)
  terms["mse"] = th.where(t0_mask, t0_loss, terms["mse"])
  ```

- What is the meaning of `predict_xstart`, and how does the result of `model_out_x_start` differ when it is True versus False (see my second sketch below)?

- Is the rounding process reversible? In other words, after generating text, can I work backwards to get the embedding before rounding, or even go forward to recover the noise at each step of the denoising process (third sketch below)?
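
For the first question, here is a minimal sketch of how I currently understand the single-timestep sampling: drawing one t per example should give an unbiased Monte Carlo estimate of the explicit sum over all timesteps. This is toy code, not the repo's actual implementation; `per_step_loss` is a hypothetical stand-in for the per-timestep loss:

```python
import torch as th

T = 1000  # number of diffusion steps (toy value)

def per_step_loss(t):
    # Hypothetical stand-in for the per-timestep loss L_t.
    return 1.0 / (t.float() + 1.0)

# Exact sum over timesteps, as written in Eq. 2 of the paper.
exact = per_step_loss(th.arange(T)).sum()

# Monte Carlo estimate: sample one t per example (like
# schedule_sampler.sample with a uniform sampler) and rescale by T.
t = th.randint(0, T, (100_000,))
estimate = (T * per_step_loss(t)).mean()

print(exact.item(), estimate.item())  # the two values should be close
```

Is this the intended reading, i.e. the expectation over sampled timesteps stands in for the explicit sum?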
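
For the `predict_xstart` question, this is how I read the two parameterizations, based on the standard DDPM relations rather than the repo's exact code (the function and argument names here are my own):

```python
import torch as th

def to_x_start(model_output, x_t, t, sqrt_alphas_cumprod,
               sqrt_one_minus_alphas_cumprod, predict_xstart):
    """Sketch: how predict_xstart changes what the network output means."""
    if predict_xstart:
        # The network is trained to output x_0 directly, so
        # model_out_x_start is simply the raw model output.
        return model_output
    # Otherwise the network outputs the noise eps, and x_0 is recovered
    # by inverting x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps:
    coef_t = sqrt_alphas_cumprod[t].view(-1, 1)
    coef_eps = sqrt_one_minus_alphas_cumprod[t].view(-1, 1)
    return (x_t - coef_eps * model_output) / coef_t
```

Is that the difference, i.e. only the interpretation of the network output changes, while the training target is x_0 in both cases?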
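
And for the rounding question, my mental model is a nearest-neighbour lookup against the embedding matrix, which would make the step lossy. This is a toy sketch with a random embedding matrix, nothing repo-specific:

```python
import torch as th

emb = th.randn(5000, 16)   # toy vocabulary embedding matrix EMB
x0 = th.randn(4, 16)       # continuous x_0 produced by the sampler

# Rounding: map each vector to the nearest word embedding.
tokens = th.cdist(x0, emb).argmin(dim=-1)

# Going "backwards" only recovers EMB(w), not the pre-rounding x0,
# since many distinct vectors round to the same token.
x0_back = emb[tokens]
```

Is this right, or does the sampler keep enough state to invert the rounding exactly?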