Tuesday, June 19, 2007

NFSv4.1 Bakeathon and pNFS

Last week I was at Sun Microsystems' campus in Austin, Texas for the NFSv4.1 bakeathon, where various implementors tested NFSv4.1 against each other. The terms of Sun's confidentiality agreement don't allow me to provide details about companies and organizations that attended and how their code did. What I can say is that a total of 7 organizations, including NetApp, brought implementations to Austin, and all implementors had success with interoperability testing.

NFSv4.1 has two big chunks of functionality: sessions and pNFS. Sessions is a new infrastructure that enables exactly-once semantics and trunking. By "exactly once" we mean that NFSv4.1 will be able to guarantee that every operation is executed exactly once. This is important for "non-idempotent" operations: operations that, if executed twice, return different results, for example the file REMOVE operation. Overcoming non-idempotency is necessary for all filesystems, but it is a significant practical problem when the filesystem and the storage are separated by a potentially unreliable communications link, as is the case with NFS. Because sessions is a large piece of infrastructure, several implementors in Austin focused on getting sessions to work.
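
To make the REMOVE example concrete, here is a toy sketch of the replay-cache idea behind exactly-once semantics. It is not the NFSv4.1 wire protocol: the names ToyServer, Slot, and the reply strings are made up for illustration, and the model assumes one session whose slots each remember the last sequence ID and the reply it produced.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    """One session slot (hypothetical): last sequence ID seen and its reply."""
    last_seqid: int = 0
    cached_reply: Optional[str] = None


@dataclass
class ToyServer:
    """Hypothetical server: a set of file names plus one session's slots."""
    files: set = field(default_factory=set)
    slots: dict = field(default_factory=dict)

    def remove(self, slot_id: int, seqid: int, name: str) -> str:
        slot = self.slots.setdefault(slot_id, Slot())
        if seqid == slot.last_seqid and slot.cached_reply is not None:
            # Retransmission of the previous request: replay the cached
            # reply instead of executing REMOVE a second time.
            return slot.cached_reply
        if seqid != slot.last_seqid + 1:
            # Neither a replay nor the next request in order.
            return "ERR_SEQ_MISORDERED"
        # New request: execute it once and remember the reply.
        if name in self.files:
            self.files.remove(name)
            reply = "OK"
        else:
            reply = "ERR_NOENT"
        slot.last_seqid = seqid
        slot.cached_reply = reply
        return reply


server = ToyServer(files={"a.txt"})
first = server.remove(slot_id=0, seqid=1, name="a.txt")  # executed
retry = server.remove(slot_id=0, seqid=1, name="a.txt")  # replayed from cache
later = server.remove(slot_id=0, seqid=2, name="a.txt")  # a genuinely new request
print(first, retry, later)
```

The retry gets the same answer as the original request because the server recognizes the repeated sequence ID. Without the cache, the retransmitted REMOVE would run again and report that the file does not exist, which is exactly the confusion sessions are meant to prevent.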

PNFS is parallel NFS: the striping of regular files across several data servers. NFSv4.1 entertains several types of data servers:
  • Blocks-based, where the pNFS client accesses data via Fibre Channel or iSCSI.
  • Object Storage-based, where the pNFS client accesses data via the OSD protocol.
  • File-based, where the pNFS client accesses data via the NFSv4.1 protocol.
Operations to create and delete files, and to access directories, are always sent to a metadata server, regardless of what type of data server is used to store regular files.
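
As a rough illustration of what striping a regular file across data servers means, the sketch below maps byte ranges to data servers using simple round-robin striping. The function names, the stripe unit, and the round-robin policy are assumptions for the example; real pNFS layouts are handed out by the metadata server and can be more elaborate.

```python
def data_server_for_offset(offset: int, stripe_unit: int, num_servers: int) -> int:
    """Return the index of the data server holding the byte at `offset`,
    assuming plain round-robin striping (an assumption of this sketch)."""
    if offset < 0 or stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; stripe_unit and num_servers must be > 0")
    return (offset // stripe_unit) % num_servers


def split_read(offset: int, length: int, stripe_unit: int, num_servers: int):
    """Split a file byte range into (server_index, file_offset, length) pieces,
    one piece per stripe unit touched."""
    pieces = []
    end = offset + length
    while offset < end:
        # End of the stripe unit that contains the current offset.
        unit_end = (offset // stripe_unit + 1) * stripe_unit
        chunk = min(end, unit_end) - offset
        server = data_server_for_offset(offset, stripe_unit, num_servers)
        pieces.append((server, offset, chunk))
        offset += chunk
    return pieces


# A 10-byte read with a 4-byte stripe unit over 3 data servers.
print(split_read(0, 10, stripe_unit=4, num_servers=3))
# → [(0, 0, 4), (1, 4, 4), (2, 8, 2)]
```

Each piece can be fetched from its data server in parallel, while operations such as creating the file still go to the metadata server.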

At Austin, all three pNFS server/data server flavors were there.

Recently Panasas had a press release or two on pNFS, and several articles were written. From my perspective, the Byte and Switch article is perhaps the most interesting one to use as fodder for the rest of this blog post, because it expresses opinions that are easy to take issue with.

"You could say NFS was invented by Bill Joy at Sun back in 1983, and the thing hasn't had a major performance upgrade in two decades,"

Well, NFSv3 did add asynchronous I/O, and NFSv4.0 added delegations. In addition, NFS/RDMA adds significant performance wins. I consider pNFS to be yet another step in NFS performance improvements, and doubt it will be the last one either.

When the IETF approves the new standard, which is anticipated by year's end, Panasas will have a significant first-mover advantage.

Note that pNFS has three flavors of data servers, so this is not necessarily the case. Panasas is backing the OSD data server, whereas EMC and NetApp are backing the blocks and NAS-based data servers, respectively. Given the amount of storage that is accessible via blocks protocols and NFS, versus object protocols, I would expect some impedance in the market to pNFS over OSD, unless EMC, NetApp, and others have no story for moving on blocks and NFS-based data servers.

The beauty of the files-based data server is that it uses the same protocol as that used to talk to the pNFS metadata server: NFSv4.1. Proponents of other data server protocols might come to appreciate this beauty, and wrap an NFSv4.1 front end onto their data servers.

At least one analyst thinks enterprises don't really need pNFS to improve the performance of clustered systems. "All of the clustered file system NAS vendors have at some fundamental level data coming in over Ethernet that's served by different nodes," says Arun Taneja of the Taneja Group consultancy. "They all do it differently. Panasas does it in a very different way, and I'd call them the odd duck of the group."

But Taneja acknowledges that, if large storage players get behind pNFS, the power of standardization could take over. Then, vendors like BlueArc, Exanet, Isilon, Polyserve (through its alliance with HP), and others would probably look to support it, he says.

While Panasas hopes to widen its appeal through pNFS, another expert says that, for now, pNFS solves problems very specific to HPC environments. "It's not just for big files, it's for multiple, time-sensitive computations," says analyst Mike Karp of Enterprise Management Associates. Where calculations are independent and require lots of instantaneous processing, pNFS could serve a big need.

These analysts are taking an extremely narrow view. "HPC" implies a narrow range of computing such as what United States National Laboratories use. In fact, the need for pNFS is much broader. For example, I constantly get asked about pNFS by people doing analysis of seismic data (i.e. oil exploration). These folks have requirements for I/O to large files that needs to be accelerated. Storage clusters with clustered parallel filesystems will help (e.g. Data ONTAP GX High Performance Option), but to completely eliminate bottlenecks, pNFS is necessary. Even people doing grid computing with small files see the need for pNFS, because a nice byproduct of pNFS is that even if a single file is too small to stripe across data servers, lots of little files can be automatically distributed across lots of data servers, thereby removing hot spots and keeping load balanced.
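
The small-file point can be sketched in a few lines. The placement policy here (hashing the file name to pick a data server) is purely an assumption for illustration; how a real metadata server decides where each file lives is up to the implementation. The function place_file and the file names are hypothetical.

```python
import hashlib
from collections import Counter


def place_file(name: str, num_servers: int) -> int:
    """Pick a data server for a whole (unstriped) small file by hashing its name.
    Hash-based placement is an assumption of this sketch, not a pNFS requirement."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# 1000 small files spread over 4 data servers.
names = [f"job-{i:04d}.out" for i in range(1000)]
counts = Counter(place_file(n, 4) for n in names)
for server in sorted(counts):
    print(f"data server {server}: {counts[server]} files")
```

No single file is striped, yet the clients' I/O still fans out across all the data servers instead of piling onto one.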

Talk is cheap though; what concrete things is NetApp doing in the pNFS space?
  1. In Data ONTAP GX, the High Performance Option is a parallel, clustered version of WAFL that will be the ideal back end filesystem for pNFS. Data ONTAP GX HPO is real and is available now for NFSv3 and CIFS.
  2. NetApp employees are active in finishing the actual standards document. Garth Goodson of NetApp wrote much of the text for the first pNFS draft. Since then, it was integrated into the NFSv4.1 document, and I've done significant editing on it. The NFSv4.1 working group has had formal inspections of pNFS and other parts of NFSv4.1. Right now, Spencer Shepler of Sun is folding in the inspectors' comments on the generic pNFS chapter of NFSv4.1, and I will be folding in the comments for the files-based pNFS chapter. Many of the inspectors for the pNFS chapters were NetApp employees.
  3. NetApp wants to ensure that there is a robust pNFS client for the files-based data server in Linux, and to that end, a team of NetApp developers is working with the Linux NFS developer community. Indeed, NetApp brought that client to the Austin bakeathon last week. To be clear, this is work that complements the pNFS work being done at CITI. Think of CITI as the "owner" for the generic pNFS Linux code, and NetApp as driving the file-based-specific parts of pNFS in Linux.
  4. What might be surprising is that the aforementioned group of NetApp Linux NFS developers is also working on a Linux pNFS data server that is files-based. Why would NetApp care, especially since the Linux NFS server is, in theory, competition for NetApp storage? Because a working client needs something to test against.
  5. #4 brings up the obvious: what about the Data ONTAP pNFS server as something to test against? NetApp is working on an NFSv4.1 server for Data ONTAP and it will have pNFS support in it, including a files-based data server. However, by deliberate choice, work on a production Data ONTAP NFSv4.1 server (note, NetApp has already demonstrated a prototype pNFS server for Data ONTAP in order to prove the viability of the technology) started after work on the Linux pNFS client and server. The rationale is simple: releasing a Data ONTAP pNFS server into the market well before there is a production quality Linux client it can interoperate with just causes frustration on the part of customers and sales teams.
I have not mentioned release names or actual timing here because (1) roadmaps are never fixed, and (2) even if they were, I'm not allowed to discuss such things. Instead, I'm summarizing activities that are deducible by anyone who reads blogs or mailing lists (Linux and IETF).

Thursday, June 07, 2007

NAS Conference Web Site

The NAS conference (aka the NFS Industry Conference) was a tradition Sun started in the 1980s during the beginnings of NFS but stopped holding by the early 1990s. In the mid 1990s it was brought back. In the 2000s it got much bigger and was expanded to include CIFS. A year or two ago Sun and SNIA agreed to give SNIA the conference. Anyway, in the last month or so, the nasconf.com domain briefly expired, and so all the presentations from the 2000s were offline. After some whining on my part, I'm happy to report nasconf.com is back up.

A Database on NetApp Storage blog

Sanjay Gulabani is a performance engineer at NetApp who focuses on databases using NetApp storage. He's recently started a blog to discuss ideas and issues on this topic. I expect he'll write a great deal about using databases over NFS, and Oracle over NFS in particular.

NFSv4.1 Bakeathon in Austin next week

I'm going to Austin, TX next week to attend the NFSv4.1 interoperability testing event at Sun's facility. PNFS (parallel NFS) will be tested by several companies. I'll talk more about pNFS after the testing event. Feel free to post some questions now, and I'll follow up next week.

My reason for going is to get feedback on the NFSv4.1 draft specification, which I've been editing along with Spencer Shepler of Sun and Dave Noveck of NetApp. Back to my editing.

Data ONTAP GX paper summarized in ;login:

The June 2007 issue of ;login: has a summary from Avishay Traeger (of cs.sunysb.edu) of the GX paper I co-authored for the 2007 USENIX FAST Conference.