AssertionError: No inf checks were recorded for this optimizer in PyTorch's Automatic Mixed Precision

I'm using PyTorch's Automatic Mixed Precision (AMP) feature to train a network with a smaller memory footprint and reduced precision.
At a certain point, some embeddings produced by the network contain NaNs, so I'd like to replace those with 0s in order to perform online hard negative sample mining.

However, after replacing the NaNs in the tensor like this:

tensor[torch.isnan(tensor)] = 0

I get the following error at the next scaler step (scaler.step(optimizer)):

 assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

What's the correct way to zero out NaNs while getting rid of this error?

1 Answer

Could you show us your full code? Generally, it is advisable to just skip the step (batch) if it contains NaNs.
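As a rough illustration of skipping a bad batch, here is a minimal sketch. The model, loss, and the train_step helper are hypothetical stand-ins, since the original training code was not posted. It also assumes that this assertion typically fires when none of the optimizer's parameters has a gradient at the time scaler.step(optimizer) is called, so the sketch only calls scaler.step after a backward pass on the scaled loss.

```python
import torch
from torch import nn

# Hypothetical setup: AMP is only enabled when a GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(4, 2).to(device)  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(x, y):
    """Run one training step; return False if the batch was skipped."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        out = model(x)
        loss = nn.functional.mse_loss(out, y)

    # Skip the whole batch when the loss is NaN or inf. Neither
    # scaler.step nor scaler.update is called in that case.
    if not torch.isfinite(loss):
        return False

    scaler.scale(loss).backward()  # gradients now exist for the optimizer
    scaler.step(optimizer)
    scaler.update()
    return True
```

Returning early before scaler.scale(loss).backward() keeps the scaler state untouched for the skipped batch, so the next call starts from a clean state.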

Also take a look at torch.nan_to_num.
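For reference, a small sketch of how torch.nan_to_num could replace the masked assignment from the question. The helper name zero_nans and the example tensor are made up for illustration. Note that by default torch.nan_to_num also maps +inf and -inf to the largest and smallest finite values of the dtype, so this sketch passes posinf and neginf explicitly on the assumption that infinities should become 0 as well.

```python
import torch


def zero_nans(t: torch.Tensor) -> torch.Tensor:
    # Out-of-place: returns a new tensor and leaves `t` unmodified.
    # NaN, +inf and -inf are all mapped to 0 here (an assumption; drop
    # posinf/neginf to keep the default handling of infinities).
    return torch.nan_to_num(t, nan=0.0, posinf=0.0, neginf=0.0)


emb = torch.tensor([[1.0, float("nan")], [float("inf"), -2.0]])
clean = zero_nans(emb)
```

Because the call is out-of-place, the original tensor is not modified, which may matter if it is still needed for the backward pass.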
