Thank you for your work. I have the following questions to discuss with you:
- Why does the loss in Equation 2 of the paper sum the losses over t = 2, ..., T, while the code implementation only samples a single timestep t for each batch (see my first sketch below this list)?

  ```python
  t, weights = self.schedule_sampler.sample(micro.shape[0], dist_util.dev())
  ```

- Why is the MSE loss in Equation 2 of the paper written as ||EMB(w) - f(z_1, 1)||, while the code implements ||EMB(w) - f(z_0, 0)||?

  ```python
  t0_mask = (t == 0)
  t0_loss = mean_flat((x_start_mean - model_out_x_start) ** 2)
  terms["mse"] = th.where(t0_mask, t0_loss, terms["mse"])
  ```

- What is the meaning of `predict_xstart`, and how does the result of `model_out_x_start` differ when it is True versus False (see my second sketch below)?

- Is the rounding process reversible? In other words, after generating text, can I work backwards to get the embedding before rounding, or even go forward to recover the noise at each step of the denoising process (third sketch below)?
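
For the first question, here is a minimal sketch of how I currently understand the single-timestep sampling: drawing one t per example should give an unbiased Monte Carlo estimate of the explicit sum over all timesteps. This is toy code, not the repo's actual implementation; `per_step_loss` is a hypothetical stand-in for the per-timestep loss:

```python
import torch as th

T = 1000  # number of diffusion steps (toy value)

def per_step_loss(t):
    # Hypothetical stand-in for the per-timestep loss L_t.
    return 1.0 / (t.float() + 1.0)

# Exact sum over timesteps, as written in Eq. 2 of the paper.
exact = per_step_loss(th.arange(T)).sum()

# Monte Carlo estimate: sample one t per example (like
# schedule_sampler.sample with a uniform sampler) and rescale by T.
t = th.randint(0, T, (100_000,))
estimate = (T * per_step_loss(t)).mean()

print(exact.item(), estimate.item())  # the two values should be close
```

Is this the intended reading, i.e. the expectation over sampled timesteps stands in for the explicit sum?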
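
For the `predict_xstart` question, this is how I read the two parameterizations, based on the standard DDPM relations rather than the repo's exact code (the function and argument names here are my own):

```python
import torch as th

def to_x_start(model_output, x_t, t, sqrt_alphas_cumprod,
               sqrt_one_minus_alphas_cumprod, predict_xstart):
    """Sketch: how predict_xstart changes what the network output means."""
    if predict_xstart:
        # The network is trained to output x_0 directly, so
        # model_out_x_start is simply the raw model output.
        return model_output
    # Otherwise the network outputs the noise eps, and x_0 is recovered
    # by inverting x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps:
    coef_t = sqrt_alphas_cumprod[t].view(-1, 1)
    coef_eps = sqrt_one_minus_alphas_cumprod[t].view(-1, 1)
    return (x_t - coef_eps * model_output) / coef_t
```

Is that the difference, i.e. only the interpretation of the network output changes, while the training target is x_0 in both cases?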
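
And for the rounding question, my mental model is a nearest-neighbour lookup against the embedding matrix, which would make the step lossy. This is a toy sketch with a random embedding matrix, nothing repo-specific:

```python
import torch as th

emb = th.randn(5000, 16)   # toy vocabulary embedding matrix EMB
x0 = th.randn(4, 16)       # continuous x_0 produced by the sampler

# Rounding: map each vector to the nearest word embedding.
tokens = th.cdist(x0, emb).argmin(dim=-1)

# Going "backwards" only recovers EMB(w), not the pre-rounding x0,
# since many distinct vectors round to the same token.
x0_back = emb[tokens]
```

Is this right, or does the sampler keep enough state to invert the rounding exactly?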