Some tests fail because the MP-SPDZ execution doesn't produce the ground truth label. Usually this is a property of the model rather than a problem with MP-SPDZ: the model simply misclassifies that sample. So it would be better to test against the result of running inference in TF instead, which checks that the secure computation matches the usual computation.
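
A minimal sketch of what such a check could look like, assuming the per-sample class scores from both runs have already been collected into plain Python lists. The names `argmax`, `mismatches`, `reference`, `secure` and `ground_truth` and all numbers are hypothetical and only for illustration; nothing here calls MP-SPDZ or TF.

```python
from typing import List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score; the first index wins on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


def mismatches(
    secure_scores: Sequence[Sequence[float]],
    reference_scores: Sequence[Sequence[float]],
) -> List[int]:
    """Indices of samples whose secure label differs from the reference label."""
    if len(secure_scores) != len(reference_scores):
        raise ValueError("both runs must cover the same samples")
    return [
        i
        for i, (s, r) in enumerate(zip(secure_scores, reference_scores))
        if argmax(s) != argmax(r)
    ]


# Hypothetical data: class scores per sample from each run.
reference = [[0.10, 0.70, 0.20], [0.60, 0.30, 0.10]]  # plaintext inference
secure = [[0.11, 0.69, 0.20], [0.59, 0.31, 0.10]]     # secure inference
ground_truth = [1, 1]  # the model itself gets sample 1 wrong

# A ground-truth test fails on sample 1 even though the secure run is faithful.
wrong_vs_truth = [i for i, s in enumerate(secure) if argmax(s) != ground_truth[i]]
assert wrong_vs_truth == [1]

# Comparing against the plaintext reference passes, as intended.
assert mismatches(secure, reference) == []
```

Comparing predicted labels rather than raw scores is a design choice in this sketch: it tolerates small numerical differences between the two runs, at the cost of possibly flagging samples where the top two scores are nearly tied.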