packet_write_wait: Connection to x.x.x.x: Broken pipe

I have been running code on a Google Cloud GPU instance for some days now. Recently, one problem has been occurring frequently and has been a headache for days.

Every so often, this error message appears:

packet_write_wait: Connection to x.x.x.x: Broken pipe

and the program just stops running.

I have tried some suggestions found online, such as sending keepalive messages to the server, but nothing has helped.

I would really appreciate any help!

2 Answers

When connecting to a server using SSH, an idle connection can be terminated if there's no "apparent" activity over the SSH connection. If you start a program over the SSH connection and that program has no terminal input or output activity for a period of time, the server may kill the connection.

As an example, the HAProxy software running on the server may disconnect idle client connections after a preset time has elapsed, say 30 minutes.

If your problem is caused by a seemingly idle SSH session, you can work around it by asking SSH to keep the connection alive with the ServerAliveInterval and ServerAliveCountMax parameters. In some cases, it may be sufficient to set ServerAliveInterval to 30 or 60 seconds and leave ServerAliveCountMax at its default value of 3. Please read the man page to determine how the combination affects the behavior in various cases (idle connections vs. links with connection issues).

ServerAliveInterval

Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server.
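As a rough worked example of how the two settings combine: per the ssh_config man page, ssh disconnects once ServerAliveCountMax consecutive keepalive messages go unanswered, so the time before it gives up on an unresponsive server is approximately the product of the two values. The variable names below are only illustrative.

```shell
# Illustrative values: the interval suggested above and the default count.
interval=30      # ServerAliveInterval, in seconds
count_max=3      # ServerAliveCountMax (default)

# Approximate time before ssh gives up on a server that stops answering.
give_up=$((interval * count_max))
echo "Unresponsive server: ssh disconnects after roughly ${give_up} seconds"
```

A larger ServerAliveCountMax tolerates longer network hiccups, at the cost of taking longer to notice a connection that is really dead.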

From the ssh_config man page:

man ssh_config

ssh(1) obtains configuration data from the following sources in the following order:

1. command-line options
2. user's configuration file (~/.ssh/config)
3. system-wide configuration file (/etc/ssh/ssh_config)

Try man ssh to see how to set the command-line option.

ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=5 user@x.x.x.x
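To avoid typing the options every time, the same settings can go in the user's configuration file (~/.ssh/config). The following is a minimal sketch: the host alias gcp-gpu and the CONFIG variable are assumptions made for illustration, and x.x.x.x stands in for the real server address.

```shell
# Path to the per-user ssh configuration file (override CONFIG to try this elsewhere).
CONFIG="${CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$CONFIG")"

# Append a host entry. "gcp-gpu" is a hypothetical alias; replace x.x.x.x
# with the real address of the server.
cat >> "$CONFIG" <<'EOF'

Host gcp-gpu
    HostName x.x.x.x
    ServerAliveInterval 30
    ServerAliveCountMax 5
EOF

# ssh refuses configuration files that other users can write to.
chmod 600 "$CONFIG"
```

After that, "ssh gcp-gpu" picks up both keepalive settings automatically. Options given on the command line still take precedence over this file.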

I got the same problem because someone plugged a new device into the network and mistakenly gave it the same IP address as the device I was accessing. I identified this by running

arp {IP}

(on Linux) several times and seeing the MAC address change. After the device was removed from the network, I got a stable SSH connection to the host.
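Checking by hand can be automated. The sketch below assumes a Linux machine with the iproute2 "ip" command and polls the neighbour (ARP) table for one address, printing a line whenever the MAC changes. The function names extract_mac and watch_mac, and the address in the usage comment, are made up for illustration.

```shell
# Read `ip neigh show` output on stdin and print the MAC address,
# which is the field following "lladdr".
extract_mac() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Poll the neighbour table for one IP and report when its MAC changes.
# $1 = IP address to watch, $2 = seconds between checks (default 5).
watch_mac() {
    ip_addr="$1"
    interval="${2:-5}"
    last=""
    while true; do
        # Send one ping so the kernel refreshes its ARP entry for the address.
        ping -c 1 -W 1 "$ip_addr" > /dev/null 2>&1
        mac="$(ip neigh show "$ip_addr" | extract_mac)"
        if [ -n "$mac" ]; then
            if [ -n "$last" ] && [ "$mac" != "$last" ]; then
                echo "$(date '+%F %T') MAC for $ip_addr changed: $last -> $mac"
            fi
            last="$mac"
        fi
        sleep "$interval"
    done
}

# Usage (hypothetical address): watch_mac 192.0.2.10 5
```

A MAC address that keeps flipping between two values for the same IP is a strong hint that two devices are claiming that address.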

Another option, if you can't physically find the device, is to black-hole its MAC address on the switch.
