A few notes from an NFS debugging session

We were seeing intermittent NFS failures, particularly after a user sent a large file to the NFS client machine. From that point on, any access to the directory caused the accessing process to hang.

Analysis with tethereal showed that the NFS client was sending many retransmits to the NFS server, which never responded. As it happens, NFS uses UDP by default. Watching both ends of the connection made it clear that packets were being dropped somewhere in between.
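
If a packet capture isn't convenient, the client's own RPC counters give a rough view of the same symptom. The following is a minimal sketch, assuming a Linux client whose /proc/net/rpc/nfs contains a line of the form "rpc <calls> <retransmits> <authrefresh>". The function names and the sample numbers are made up for illustration and are not from the original session.

```python
# Sketch: estimate the RPC retransmit ratio on an NFS client.
# Assumes the Linux layout of /proc/net/rpc/nfs, where the line
# starting with "rpc" holds: calls, retransmits, authrefresh.

def parse_rpc_retransmits(text):
    """Return (calls, retransmits) from the 'rpc' line of the given text."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line found")


def retransmit_ratio(text):
    """Return retransmits / calls, or 0.0 if no calls were made."""
    calls, retrans = parse_rpc_retransmits(text)
    return retrans / calls if calls else 0.0


# Hypothetical sample data, not captured from a real machine.
SAMPLE = "net 0 0 0 0\nrpc 1000 250 1000\n"

# On a real client you would pass in the file contents instead:
#   with open("/proc/net/rpc/nfs") as f:
#       print(retransmit_ratio(f.read()))
```

A ratio that keeps climbing while the server looks idle points at loss on the path rather than at a slow server.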

The solution was to mount NFS over TCP rather than UDP, since we have no control over the intervening network and its (probably numerous) firewalls. To do this, make sure NFS over TCP support is configured into your kernel, then specify the tcp option to mount.
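
After remounting, it is worth confirming which transport each mount ended up with. Here is a rough sketch, assuming the /proc/mounts format (device, mount point, type, options) and that the transport appears either as a bare tcp/udp option or as proto=...; the exact spelling can vary with kernel version. The function name and the sample line, including the user directory, are hypothetical.

```python
# Sketch: report the transport each NFS mount is using, based on
# /proc/mounts-style text. The option spelling is an assumption:
# some kernels show "tcp"/"udp", others show "proto=tcp"/"proto=udp".

def nfs_transports(mounts_text):
    """Map each NFS mount point to the transport named in its options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        transport = "unknown"
        for opt in fields[3].split(","):
            if opt in ("tcp", "udp"):
                transport = opt
            elif opt.startswith("proto="):
                transport = opt.split("=", 1)[1]
        result[fields[1]] = transport
    return result


# Hypothetical sample line, for illustration only.
SAMPLE_MOUNTS = (
    "eisbock.ken.nicta.com.au:/home/alice /home/alice nfs "
    "rw,vers=3,proto=tcp,hard 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)

# On a real client:
#   with open("/proc/mounts") as f:
#       print(nfs_transports(f.read()))
```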

If you're using automount for home directories and such, you might change your auto.home file along these lines:

--- auto.home   2005-07-15 16:46:57.000000000 +1000
+++ auto.home.new       2005-07-15 16:46:45.000000000 +1000
@@ -1 +1 @@
-*      eisbock.ken.nicta.com.au:/home/&
+*      -tcp    eisbock.ken.nicta.com.au:/home/&
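
With more than a handful of map entries, the same change can be scripted. This is only a sketch, assuming simple single-line entries of the form "key [-options] location"; it does not handle multi-mount entries or continuation lines. The helper name add_tcp_option is hypothetical.

```python
# Sketch: add the tcp option to a simple autofs map entry.
# Assumes entries look like "key [-options] location" on one line.

def add_tcp_option(entry):
    """Return the entry with tcp added to (or replacing udp in) its options."""
    fields = entry.split()
    # Leave comments, blank lines, and malformed entries untouched.
    if len(fields) < 2 or fields[0].startswith("#"):
        return entry
    key, rest = fields[0], fields[1:]
    if rest[0].startswith("-"):
        # Existing option list: drop any udp setting, then add tcp.
        options = [o for o in rest[0][1:].split(",")
                   if o and o not in ("udp", "proto=udp")]
        if "tcp" not in options and "proto=tcp" not in options:
            options.append("tcp")
        rest[0] = "-" + ",".join(options)
    else:
        # No option list yet: insert one before the location.
        rest.insert(0, "-tcp")
    return "\t".join([key] + rest)
```

Applied to the original entry above, this yields the same fields as the patched line, separated by tabs. Review the result before installing it, since real maps can contain constructs this sketch ignores.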

The other option would have been to tunnel the NFS connection over SSH, or perhaps a VPN.

In summary: if you're asked to debug an unreliable NFS server, checking for UDP packet loss, or simply switching over to TCP, is a good place to start.