Automounter Tuning
There are two aspects to the automounter that you should pay attention to as a system administrator:
- The retry count on mount attempts.
- The duration of a mount.
Let's start with the retry count. There is an NFS mount option, retry, which changes the number of times a mount is retried before the mount command gives up. You can do:
# mount -o retry=0 filer:/vol/vol0 /mnt
With retry=0, a single CLNT_CALL() attempt is made to reach the NFS server, and the mount fails if that call times out. Why would you want to do such a thing? You probably wouldn't. But if you use the automounter, most likely that's exactly what you are doing: most automounters will make only one or two attempts to mount an NFS file system. That's not a very nice thing if the file system that doesn't get mounted is your home directory as you log in, or your database as your DBMS starts running. The good news is that you can override the automounter's default. Just add the option retry=1000 to your automounter maps, and you'll get much more robust automounting. The simplest approach is to add retry=1000 to the entries in your master automounter map, whether that map is a local file or a table in NIS or LDAP. Note that the retrans option has nothing to do with mount retries. Page 98 of my book talks about the retry and retrans options.
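For example (a sketch; the map and path names here are hypothetical, not from my own setup), a Solaris-style master map entry with the option added might look like:

/home  auto_home  -nosuid,retry=1000

The same retry=1000 can be appended to the option field of any other master map entry, including the -hosts entry for /net.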
Beware, though, of retry= values higher than 2 on some versions of Solaris. My co-author for Managing NFS and NIS, Second Edition, Ricardo Labiaga, who overhauled the automounter in Solaris 2.6, says that before Solaris 2.6 this was a problem, as http://www.sunhelp.org/faq/autofs.html (thanks to my colleague Tom Haynes for the link) points out:
    CAUTION: this can "hold up" other automount requests
    for 15 seconds per retry specified, on some versions of
    Solaris. Do not make this value much larger than 2!!

I found with Solaris 8 that Ricardo is correct; retry=1000 works great. However, I had problems with Solaris 10. I set my master map, /etc/auto_master, to:

/net -hosts -nosuid,nobrowse,retry=10000

I then put one of my NFS servers (mre1.sim) into a break point so that it would not respond. Then I did:

% ls /net/mre1.sim &

As expected, the above hung. Unfortunately, so did:

% ls /net/server2 &
% ls /net/server3 &

and server2 and server3 are live. Setting retry=5 wasn't very satisfying either; it took about a minute for the above to complete. As a workaround, I added "vers=3" to the map options, and things work correctly.
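For reference, the workaround leaves the /net entry looking something like this (a sketch reconstructed from the options above, not a copy of my exact line):

/net -hosts -nosuid,nobrowse,vers=3,retry=10000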
Let's look at the duration. The automounter is also an auto-unmounter. The idea is that when NFS file systems are no longer used, the automounter should unmount them. This is a good thing, because automounter maps change from time to time; if the automounter never unmounted anything, map updates would never be seen by the client. Ideally, the automounter would attempt an unmount only when it knew the file system hadn't been used for some amount of time. However, automounters don't have an interface to know whether any processes currently have files open in the NFS file systems. As a result, the automounter takes a simple-minded approach: it waits some number (N) of seconds, then attempts to unmount a file system, and does this every N seconds. If the file system is in use (busy), the unmount fails.
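If you want to check by hand whether a particular automounted file system is busy right now (and would therefore survive such an unmount attempt), fuser is one way to look; a sketch, with /home/mre standing in as a hypothetical automounted mount point:

# fuser -c /home/mre

On Solaris, fuser -c reports the processes using files on the file system mounted at that mount point; if it prints any process IDs, an unmount attempt will fail.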
It turns out that an unmount attempt on a busy file system can be really bad performance-wise. The attempt will flush all cached data, force all modified but unwritten blocks to be written to the NFS server, and flush all cached metadata (attributes, directories, and the name cache). At the end of all that, if there are still references to the file system, the unmount fails. This means that the processes that were benefiting from caching now take latency hits as their working sets of cached data are rebuilt.
Thus, you will want to consider tuning your automount duration higher. For example, the automount command in Solaris has a -t option to set the duration, overriding the default of 600 seconds. You want to strike a balance between good performance and the benefit of re-synchronizing with automounter map updates. If you change the location of an NFS file system no more than once a month, then setting the timeout to 86,400 seconds (24 hours) is reasonable. If you are changing things every few days, you might find 3600 seconds is short enough; I have many years of experience with -t 3600 and can vouch for it. Chapter 9 of my book goes into a deep discussion of the automounter, including the -t option.
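On Solaris, that looks like this on the command line; on releases that support it you can also make the value persistent by setting AUTOMOUNT_TIMEOUT=3600 in /etc/default/autofs (treat that file and variable name as an assumption if your release differs):

# automount -t 3600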
1 Comment:
This is very useful information. I've been trying to implement autofs over the last few months, but one issue we're seeing on AIX 5.3 and SunOS 5.8 is that it will randomly hang the entire server. Ever seen this before? I'm not totally sure whether autofs is hanging or whether some service that depends on NFS is hanging. I'm starting to wonder if autofs is stable enough for our shop of 400 servers.
-Dan