From bhalevy@panasas.com Mon Dec 15 23:01:00 2003
X-Sender: bhalevy@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 65678 invoked from network); 16 Dec 2003 07:01:00 -0000
Received: from unknown (66.218.66.216) by m7.grp.scd.yahoo.com with QMQP; 16 Dec 2003 07:01:00 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 16 Dec 2003 07:00:59 -0000
Received: from yang ([172.17.19.46]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1CV6; Tue, 16 Dec 2003 02:00:52 -0500
Date: Tue, 16 Dec 2003 02:01:20 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0002_01C3C378.7FB8B890"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
X-eGroups-Remote-IP: 65.194.124.178
From: "Benny Halevy"
Subject: FW: NEPS-REQS: getting started
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

-----Original Message-----
From: Garth Gibson [mailto:garth@panasas.com]
Sent: Wednesday, December 10, 2003 22:27
To: Craig Everhart; John Muth; Brian Pawlowski; David Pease; Julian Satran; Spencer Shepler; Gary Grider; Brent Welch; Benny Halevy; Jon Haswell; Dean Hildebrand; Peter Honeyman; Jim Carlson; Garth Gibson; Andy Adamson; Tyce McLarty; Peter Corbett; David Black
Cc: Garth Gibson
Subject: NEPS-REQS: getting started

So we are the requirements/problem statement subgroup of the NFS extension for parallel storage effort. Our job is to create the paper trail justification for adding something to NFS and provide a conceptual framework by which to identify possible solutions.
In the beginning this document is used to justify in the IETF process that there are problems that people take seriously that cannot be handled well in the scope of NFS today and that should be.

I asked around for examples to help us construct this document and I was pointed at the problem statement used to start the RDMA over IP effort (attached below). I was told that this was a particularly well done problem statement, and that we should not necessarily work this hard before giving the IETF something to look at.

ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-ietf-rddp-problem-statement-02.txt

RDDP Abstract:

This draft addresses an IP-based solution to the problem of high system costs due to network I/O copying in end-hosts at high speeds. The problem is due to the high cost of memory bandwidth, and it can be substantially improved using "copy avoidance." The high overhead has limited the use of TCP/IP in interconnection networks especially where high bandwidth, low latency and/or low overhead of end-system data movement are required by the hosted application.

So I suppose we could start with

pNFS Abstract:

This draft addresses an NFS-based solution to the problem of high system costs due to store-and-forward copying of storage data from storage devices through a file server mount point to high-speed end-hosts that also have connectivity to source storage devices. The problem is due to the high cost of funneling large storage bandwidths through NFS on single IP addresses, and it can be substantially improved using "out-of-band access." The high cost of high-bandwidth NFS servers has limited the use of NFS in data centers especially where high storage bandwidths are required and numerous storage serving devices are already networked together.

A pNFS table of contents might be:

1. Introduction
2. The high cost of high bandwidth storage through NFS
2.1 Out-of-band access decreases bandwidth requirements in central file servers
3.
Application level routing of storage data packets is the root cause of the problem
4. Storage bandwidth bottlenecks are problematic for many key file system applications
5. Out-of-band access techniques
5.1 A conceptual framework: pNFS delegated maps for distributing files over SBC, OSD and NFS storage subsystems
6. Security considerations
7. Acknowledgements
8. Informative references

Please have a look at the RDDP problem statement draft and comment on my simplistic strategy of monkey-see-monkey-do :-)

garth

Begin forwarded message:

> From: Garth Gibson
> Date: Wed Dec 10, 2003 9:34:58 PM Canada/Eastern
> To: Andy Adamson, David Black, Don Cameron, Jim Carlson,
>   Peter Corbett, Craig Everhart, Steve Fridella, Garth Gibson,
>   Gary Grider, Benny Halevy, Jon Haswell, Dean Hildebrand,
>   Peter Honeyman, Xiaoye Jiang, Mike Kazar, Tyce McLarty, John Muth,
>   Dave Noveck, Brian Pawlowski, David Pease, Julian Satran,
>   Spencer Shepler, Brent Welch
> Subject: NFS Extensions for Parallel Storage, subgroup membership
>
> Folks,
>
> Thanks for a great workshop last Thursday!
>
> Materials presented that day are online:
> http://www.citi.umich.edu/NEPS/agenda.html
>
> Below are the workshop followup subgroup memberships as they are now.
> I think I heard Peter say that he would construct auto-managed email
> lists, which from the additions I've received this week, I have
> already decided would be great. Please Peter! Names like neps-all,
> neps-reqs, neps-ops, neps-sbc, neps-osd, neps-nfs would be great.
>
> Our goals, to reprise, are to sketch a set of requirements for NFS
> Extensions for Parallel Storage, or pNFS extensions, sketch a set of
> NFS operation extensions (possibly including alternatives), sketch a
> set of metadata definitions (possibly including alternatives) for
> out-of-band data access over fixed block (SBC) SCSI protocols, object
> (OSD) SCSI protocols and file (NFS) ONCRPC protocols.
>
> We want to do this quickly, over the next few months, and to take it
> into the IETF NFS process as a set of suggestions and strawman
> protocols. The current plan is that at that point those of us that
> follow through with this will do it in the IETF NFS working group. In
> order to convince the IETF and the NFS working group that we have
> important, useful and viable ideas, we are taking a little time to
> pull together starting material.
>
> The timelines discussed at the end of the workshop "heir of the dog"
> session were:
> - get workshop notes put together and out in December (Peter and Garth)
> - 0th draft of a requirements/problem statement internet draft by mid
>   January
> - IETF submission of an internet draft by first week of Feb, so it can
>   be part of the March IETF meeting and used as evidence for inclusion
>   of extensions for parallel storage into the NFS working group charter
> - one or more documents (not necessarily fully agreeing) from each
>   subgroup into the IETF NFS email discussion for early to mid March
> - a face-to-face followup workshop, open to the IETF NFS group at the
>   FAST 2004 conference, in San Francisco Mar 31 - Apr 2, at which all
>   further plans are proposed, argued and ratified (e.g. shall we be
>   absorbed into the IETF NFS group)
>
> To help move this along, we have asked one person in each subgroup to
> push, prod and pull ideas and words out of us. Please help these
> sacrificial volunteers by contributing text, criticizing
> constructively with alternative text, and finding the time to read
> materials.
>
> These are volunteers in an unofficial process. We have no rules to be
> applied by arbitration, no membership to take votes from. If this
> consensus process, or these people, are not working out, then I
> suggest grass roots alternatives be suggested and explored as a group.
> Let's not get bogged down in process this early :-)
>
> But there are always going to be logistical and procedural issues that
> we need to deal with as a group. The suggestion at the workshop was
> that these multi-subgroup issues be taken into the requirements group.
> For example, I suggest that "scope" issues -- what we include and
> what we exclude from our agenda -- be dealt with in the requirements
> group, where we would need to add/delete requirements for each
> distinct aspect of our scope.
>
> I'm sure I'm way over the line giving this much direction :-) so I'll
> leave it to the subgroups to decide mechanisms for progress. For
> example, weekly conference calls, document exchange formats,
> editorship delegation and/or rotation, agreement achieving processes,
> ....
>
> And with that I'll go off and get to work on suggesting what our
> problem statement needs to say.
>
> garth
> 412-805-9878 (cell)
>
> -------------------------------------------------------
>
> pNFS requirements: Garth Gibson
> -----------------
> Andy Adamson
> David Black
> Jim Carlson
> Peter Corbett
> Craig Everhart
> Garth Gibson
> Gary Grider
> Benny Halevy
> Jon Haswell
> Dean Hildebrand
> Peter Honeyman
> Tyce McLarty
> John Muth
> Brian Pawlowski
> David Pease
> Julian Satran
> Spencer Shepler
> Brent Welch

                                                  Allyn Romanow (Cisco)
Internet-Draft                                          Jeff Mogul (HP)
Expires: December 2003                              Tom Talpey (NetApp)
                                             Stephen Bailey (Sandburst)

                    RDMA over IP Problem Statement
                 draft-ietf-rddp-problem-statement-02

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This draft addresses an IP-based solution to the problem of high system costs due to network I/O copying in end-hosts at high speeds. The problem is due to the high cost of memory bandwidth, and it can be substantially improved using "copy avoidance." The high overhead has limited the use of TCP/IP in interconnection networks especially where high bandwidth, low latency and/or low overhead of end-system data movement are required by the hosted application.

Romanow, et al           Expires December 2003                 [Page 1]
Internet-Draft        RDMA Over IP Problem Statement          June 2003

Table Of Contents

1.   Introduction
2.   The high cost of data movement operations in network I/O
2.1. Copy avoidance improves processing overhead
3.   Memory bandwidth is the root cause of the problem
4.   High copy overhead is problematic for many key Internet applications
5.   Copy Avoidance Techniques
5.1. A Conceptual Framework: DDP and RDMA
6.   Security Considerations
7.   Acknowledgements
     Informative References
     Authors' Addresses
     Full Copyright Statement

1. Introduction

This draft considers the problem of high host processing overhead associated with network I/O that occurs under high speed conditions.
This problem is often referred to as the "I/O bottleneck" [CT90]. More specifically, the source of high overhead that is of interest here is data movement operations - copying. This issue is not to be confused with TCP offload, which is not addressed here. High speed refers to conditions where the network link speed is high relative to the bandwidths of the host CPU and memory. With today's computer systems, 1 Gbits/s and over is considered high speed.

High costs associated with copying are an issue primarily for large scale systems. Although smaller systems such as rack-mounted PCs and small workstations would benefit from a reduction in copying overhead, the benefit to smaller machines will be primarily in the next few years as they scale in the amount of bandwidth they handle. Today it is large system machines with high bandwidth feeds, usually multiprocessors and clusters, that are adversely affected by copying overhead. Examples of such machines include all varieties of servers: database servers, storage servers, application servers for transaction processing, for e-commerce, and web serving, content distribution, video distribution, backups, data mining and decision support, and scientific computing.

Note that such servers almost exclusively service many concurrent sessions (transport connections), which, in aggregate, are responsible for > 1 Gbits/s of communication. Nonetheless, the cost of copying overhead for a particular load is the same whether from few or many sessions.

The I/O bottleneck, and the role of data movement operations, have been widely studied in research and industry over the last approximately 14 years, and we draw freely on these results.
Historically, the I/O bottleneck has received attention whenever new networking technology has substantially increased line rates - 100 Mbits/s FDDI and Fast Ethernet, 155 Mbits/s ATM, 1 Gbits/s Ethernet. In earlier speed transitions, the availability of memory bandwidth allowed the I/O bottleneck issue to be deferred. Now however, this is no longer the case. While the I/O problem is significant at 1 Gbits/s, it is the introduction of 10 Gbits/s Ethernet which is motivating an upsurge of activity in industry and research [DAFS, IB, VI, CGZ01, Ma02, MAF+02].

Because of high overhead of end-host processing in current implementations, the TCP/IP protocol stack is not used for high speed transfer. Instead, special purpose network fabrics, using a technology generally known as remote direct memory access (RDMA), have been developed and are widely used. RDMA is a set of mechanisms that allow the network adapter, under control of the application, to steer data directly into and out of application buffers. Examples of such interconnection fabrics include Fibre Channel [FIBRE] for block storage transfer, Virtual Interface Architecture [VI] for database clusters, Infiniband [IB], Compaq Servernet [SRVNET], Quadrics [QUAD] for System Area Networks. These link level technologies limit application scaling in both distance and size, meaning that the number of nodes cannot be arbitrarily large.

This problem statement substantiates the claim that in network I/O processing, high overhead results from data movement operations, specifically copying; and that copy avoidance significantly decreases the processing overhead. It describes when and why the high processing overheads occur, explains why the overhead is problematic, and points out which applications are most affected.

In addition, this document introduces an architectural approach to solving the problem, which is developed in detail in [BT02].
It also discusses how the proposed technology may introduce security concerns and how they should be addressed.

2. The high cost of data movement operations in network I/O

A wealth of data from research and industry shows that copying is responsible for substantial amounts of processing overhead. It further shows that even in carefully implemented systems, eliminating copies significantly reduces the overhead, as referenced below.

Clark et al. [CJRS89] in 1989 showed that TCP [Po81] overhead processing is attributable to both operating system costs such as interrupts, context switches, process management, buffer management, timer management, and to the costs associated with processing individual bytes, specifically computing the checksum and moving data in memory. They found moving data in memory is the more important of the costs, and their experiments show that memory bandwidth is the greatest source of limitation. In the data presented [CJRS89], 64% of the measured microsecond overhead was attributable to data touching operations, and 48% was accounted for by copying. The study measured Berkeley TCP on a Sun-3/60 using 1460 Byte Ethernet packets.

In a well-implemented system, copying can occur between the network interface and the kernel, and between the kernel and application buffers - two copies, each of which is two memory bus crossings - for read and write. Although in certain circumstances it is possible to do better, usually two copies are required on receive.

Subsequent work has consistently shown the same phenomenon as the earlier Clark study. A number of studies report results that data-touching operations, checksumming and data movement, dominate the processing costs for messages longer than 128 Bytes [BS96, CGY01, Ch96, CJRS89, DAPP93, KP96]. For smaller sized messages, per-packet overheads dominate [KP96, CGY01].
The percentage of overhead due to data-touching operations increases with packet size, since time spent on per-byte operations scales linearly with message size [KP96]. For example, Chu [Ch96] reported substantial per-byte latency costs as a percentage of total networking software costs for an MTU size packet on SPARCstation/20 running memory-to-memory TCP tests over networks with 3 different MTU sizes. The percentages of total software costs attributable to per-byte operations were:

      1500 Byte Ethernet    18-25%
      4352 Byte FDDI        35-50%
      9180 Byte ATM         55-65%

Although many studies report results for data-touching operations including checksumming and data movement together, much work has focused just on copying [BS96, B99, Ch96, TK95]. For example, [KP96] reports results that separate processing times for checksum from data movement operations. For the 1500 Byte Ethernet size, 20% of total processing overhead time is attributable to copying. The study used 2 DECstations 5000/200 connected by an FDDI network. (In this study checksum accounts for 30% of the processing time.)

2.1. Copy avoidance improves processing overhead

A number of studies show that eliminating copies substantially reduces overhead. For example, results from copy-avoidance in the IO-Lite system [PDZ99], which aimed at improving web server performance, show a throughput increase of 43% over an optimized web server, and 137% improvement over an Apache server. The system was implemented in a 4.4BSD derived UNIX kernel, and the experiments used a server system based on a 333MHz Pentium II PC connected to a switched 100 Mbits/s Fast Ethernet.

There are many other examples where elimination of copying using a variety of different approaches showed significant improvement in system performance [CFF+94, DP93, EBBV95, KSZ95, TK95, Wa97].
We will discuss the results of one of these studies in detail in order to clarify the significant degree of improvement produced by copy avoidance [Ch02].

Recent work by Chase et al. [CGY01], measuring CPU utilization, shows that avoiding copies reduces CPU time spent on data access from 24% to 15% at 370 Mbits/s for a 32 KBytes MTU using an AlphaStation XP1000 and a Myrinet adapter [BCF+95]. This is an absolute improvement of 9% due to copy avoidance.

The total CPU utilization was 35%, with data access accounting for 24%. Thus the relative importance of reducing copies is 26% (9 of the 35 percentage points). At 370 Mbits/s, the system is not very heavily loaded. The relative improvement in achievable bandwidth is 34% (utilization falls from 35% to 26%, and 35/26 is about 1.34). This is the improvement we would see if copy avoidance were added when the machine was saturated by network I/O.

Note that improvement from the optimization becomes more important if the overhead it targets is a larger share of the total cost. This is what happens if other sources of overhead, such as checksumming, are eliminated. In [CGY01], after removing checksum overhead, copy avoidance reduces CPU utilization from 26% to 10%. This is a 16% absolute reduction, a 61% relative reduction, and a 160% relative improvement in achievable bandwidth.

In fact, today's network interface hardware commonly offloads the checksum, which removes the other source of per-byte overhead. It also coalesces interrupts to reduce per-packet costs. Thus, today copying costs account for a relatively larger part of CPU utilization than previously, and therefore relatively more benefit is to be gained in reducing them. (Of course this argument would be specious if the amount of overhead were insignificant, but it has been shown to be substantial.)

3.
Memory bandwidth is the root cause of the problem

Data movement operations are expensive because memory bandwidth is scarce relative to network bandwidth and CPU bandwidth [PAC+97]. This trend existed in the past and is expected to continue into the future [HP97, STREAM], especially in large multiprocessor systems.

With copies crossing the bus twice per copy, network processing overhead is high whenever network bandwidth is large in comparison to CPU and memory bandwidths. Generally with today's end-systems, the effects are observable at network speeds over 1 Gbits/s.

A common question is whether an increase in CPU processing power alleviates the problem of high processing costs of network I/O. The answer is no, it is the memory bandwidth that is the issue. Faster CPUs do not help if the CPU spends most of its time waiting for memory [CGY01].

The widening gap between microprocessor performance and memory performance has long been a widely recognized and well-understood problem [PAC+97]. Hennessy [HP97] shows microprocessor performance grew from 1980-1998 at 60% per year, while the access time to DRAM improved at 10% per year, giving rise to an increasing "processor-memory performance gap".

Another source of relevant data is the STREAM Benchmark Reference Information website which provides information on the STREAM benchmark [STREAM]. The benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MBytes/s) and the corresponding computation rate for simple vector kernels measured in MFLOPS. The website tracks information on sustainable memory bandwidth for hundreds of machines and all major vendors.

Results show measured system performance statistics. Processing performance from 1985-2001 increased at 50% per year on average, and sustainable memory bandwidth from 1975 to 2001 increased at 35% per year on average over all the systems measured.
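
To make the compounding effect of these growth rates concrete, here is a minimal sketch. It assumes both quantities start equal and grow at a constant annual rate, which is a simplification made here and not something the sources claim. The function name gap_after and the 10-year horizon are illustrative choices; only the percentages and the 1980-1998 span come from the text.

```python
def gap_after(years, fast_rate, slow_rate):
    """Ratio of a fast-growing to a slow-growing quantity after `years`.

    Assumes (for illustration only) that both start equal and compound
    annually at constant rates.
    """
    return ((1.0 + fast_rate) / (1.0 + slow_rate)) ** years


# [HP97] figures: microprocessor performance +60%/year versus
# DRAM access time improving 10%/year, over 1980-1998 (18 years).
hp97_gap = gap_after(18, 0.60, 0.10)

# [STREAM] figures: processing +50%/year versus sustainable
# memory bandwidth +35%/year, after one year and after ten.
stream_yearly = gap_after(1, 0.50, 0.35)
stream_decade = gap_after(10, 0.50, 0.35)

print(f"HP97 gap after 18 years:   {hp97_gap:.0f}x")
print(f"STREAM gap after 1 year:   {stream_yearly:.3f}x")
print(f"STREAM gap after 10 years: {stream_decade:.2f}x")
```

Even the smaller STREAM differential compounds to a gap of well over 2x within a decade under these assumptions, which is the sense in which the processor-memory gap keeps widening.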
A similar 15% per year lead of processing bandwidth over memory bandwidth shows up in another statistic, machine balance [Mc95], a measure of the relative rate of CPU to memory bandwidth (FLOPS/cycle) / (sustained memory ops/cycle) [STREAM].

Network bandwidth has been increasing about 10-fold roughly every 8 years, which is a 40% per year growth rate.

A typical example illustrates that the memory bandwidth compares unfavorably with link speed. The STREAM benchmark shows that a modern uniprocessor PC, for example the 1.2 GHz Athlon in 2001, will move the data 3 times in doing a receive operation - 1 for the network interface to deposit the data in memory, and 2 for the CPU to copy the data. With 1 GBytes/s of memory bandwidth, meaning one read or one write, the machine could handle approximately 2.67 Gbits/s of network bandwidth, one third the copy bandwidth. But this assumes 100% utilization, which is not possible, and more importantly the machine would be totally consumed! (A rule of thumb for databases is that 20% of the machine should be required to service I/O, leaving 80% for the database application. And, the less the better.)

In 2001, 1 Gbits/s links were common. An application server may typically have two 1 Gbits/s connections - one connection backend to a storage server and one front-end, say for serving HTTP [FGM+99]. Thus the communications could use 2 Gbits/s. In our typical example, the machine could handle 2.7 Gbits/s at its theoretical maximum while doing nothing else. This means that the machine basically could not keep up with the communication demands in 2001; with the relative growth trends the situation only gets worse.

4.
High copy overhead is problematic for many key Internet applications

If a significant portion of resources on an application machine is consumed in network I/O rather than in application processing, it makes it difficult for the application to scale - to handle more clients, to offer more services.

Several years ago the most affected applications were streaming multimedia, parallel file systems and supercomputing on clusters [BS96]. In addition, today the applications that suffer from copying overhead are more central in Internet computing - they store, manage, and distribute the information of the Internet and the enterprise. They include database applications doing transaction processing, e-commerce, web serving, decision support, content distribution, video distribution, and backups. Clusters are typically used for this category of application, since they have advantages of availability and scalability.

Today these applications, which provide and manage Internet and corporate information, are typically run in data centers that are organized into three logical tiers. One tier is typically a set of web servers connecting to the WAN. The second tier is a set of application servers that run the specific applications usually on more powerful machines, and the third tier is backend databases. Physically, the first two tiers - web server and application server - are usually combined [Pi01]. For example an e-commerce server communicates with a database server and with a customer site, or a content distribution server connects to a server farm, or an OLTP server connects to a database and a customer site.

When network I/O uses too much memory bandwidth, performance on network paths between tiers can suffer. (There might also be performance issues on SAN paths used either by the database tier or the application tier.)
The high overhead from network-related memory copies diverts system resources from other application processing. It also can create bottlenecks that limit total system performance.

There is a large and growing number of these application servers distributed throughout the Internet. In 1999 approximately 3.4 million server units were shipped, in 2000, 3.9 million units, and the estimated annual growth rate for 2000-2004 was 17 percent [Ne00, Pa01].

There is high motivation to maximize the processing capacity of each CPU, as scaling by adding CPUs one way or another has drawbacks. For example, adding CPUs to a multiprocessor will not necessarily help, as a multiprocessor improves performance only when the memory bus has additional bandwidth to spare. Clustering can add additional complexity to handling the applications.

In order to scale a cluster or multiprocessor system, one must proportionately scale the interconnect bandwidth. Interconnect bandwidth governs the performance of communication-intensive parallel applications; if this (often expressed in terms of "bisection bandwidth") is too low, adding additional processors cannot improve system throughput. Interconnect latency can also limit the performance of applications that frequently share data between processors.

So, excessive overheads on network paths in a "scalable" system both can require the use of more processors than optimal, and can reduce the marginal utility of those additional processors.

Copy avoidance scales a machine upwards by removing at least two-thirds the bus bandwidth load from the "very best" 1-copy (on receive) implementations, and removes at least 80% of the bandwidth overhead from the 2-copy implementations.

An example showing poor performance with copies and improved scaling with copy avoidance is illustrative. The IO-Lite work [PDZ99] shows higher server throughput servicing more clients using a zero-copy system.
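
The two-thirds and 80% figures quoted above for copy avoidance can be reproduced by counting memory bus crossings. The sketch below assumes, following the earlier sections, that on receive the network interface deposits the data in memory once and that each CPU copy crosses the memory bus twice (one read, one write). The function name bus_load_removed is a hypothetical helper introduced here for illustration.

```python
from fractions import Fraction


def bus_load_removed(copies):
    """Fraction of receive-path bus crossings eliminated by zero-copy.

    Assumption (from the text): the network interface writes the data
    to memory once, and every CPU copy adds two more bus crossings.
    """
    crossings_with_copies = 1 + 2 * copies  # NIC deposit + copies
    crossings_zero_copy = 1                 # NIC deposit only
    saved = crossings_with_copies - crossings_zero_copy
    return Fraction(saved, crossings_with_copies)


one_copy = bus_load_removed(1)  # "very best" 1-copy receive path
two_copy = bus_load_removed(2)  # usual 2-copy receive path

print(one_copy, two_copy)  # prints: 2/3 4/5
```

Under this counting a 1-copy path makes 3 crossings and a 2-copy path makes 5, so removing the copies eliminates 2/3 and 4/5 (80%) of the bus load respectively.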
In an experiment designed to mimic real world web conditions by simulating the effect of TCP WAN connections on the server, the performance of 3 servers was compared. One server was Apache, another an optimized server called Flash, and the third the Flash server running IO-Lite, called Flash-Lite with zero copy. The measurement was of throughput in requests/second as a function of the number of slow background clients that could be served. As the table shows, Flash-Lite has better throughput, especially as the number of clients increases.

              Apache           Flash       Flash-Lite
              ------           -----       ----------
#Clients   Thruput reqs/s     Thruput       Thruput

    0           520             610           890
   16           390             490           890
   32           360             490           850
   64           360             490           890
  128           310             450           880
  256           310             440           820

Traditional Web servers (which mostly send data and can keep most of their content in the file cache) are not the worst case for copy overhead. Web proxies (which often receive as much data as they send) and complex Web servers based on SANs or multi-tier systems will suffer more from copy overheads than in the example above.

5. Copy Avoidance Techniques

There has been extensive research investigation and industry experience with two main alternative approaches to eliminating data movement overhead, often along with improving other Operating System processing costs. In one approach, hardware and/or software changes within a single host reduce processing costs. In another approach, memory-to-memory networking [MAF+02], hosts communicate via information that allows them to reduce processing costs.

The single host approaches range from new hardware and software architectures [KSZ95, Wa97, DWB+93] to new or modified software systems [BP96, Ch96, TK95, DP93, PDZ99].
In the approach based on using a networking protocol to exchange information, the network adapter, under control of the application, places data directly into and out of application buffers, reducing the need for data movement. Commonly this approach is called RDMA, Remote Direct Memory Access.

As discussed below, research and industry experience has shown that copy avoidance techniques within the receiver processing path alone have proven to be problematic. The research special purpose host adapter systems had good performance and can be seen as precursors for the commercial RDMA-based NICs [KSZ95, DWB+93]. In software, many implementations have successfully achieved zero-copy transmit, but few have accomplished zero-copy receive. And those that have done so make strict alignment and no-touch requirements on the application, greatly reducing the portability and usefulness of the implementation.

In contrast, experience has proven satisfactory with memory-to-memory systems that permit RDMA - performance has been good and there have not been system or networking difficulties. RDMA is a single solution. Once implemented, it can be used with any OS and machine architecture, and it does not need to be revised when either of these changes.

In early work, one goal of the software approaches was to show that TCP could go faster with appropriate OS support [CJRS89, CFF+94]. While this goal was achieved, further investigation and experience showed that, though possible to craft software solutions, specific system optimizations have been complex, fragile, extremely interdependent with other system parameters in complex ways, and often of only marginal improvement [CFF+94, CGY01, Ch96, DAPP93, KSZ95, PDZ99]. The network I/O system interacts with other aspects of the Operating System such as machine architecture and file I/O, and disk I/O [Br99, Ch96, DP93].
For example, the Solaris Zero-Copy TCP work [Ch96], which relies on page remapping, shows that the results are highly interdependent with other systems, such as the file system, and that the particular optimizations are specific to particular architectures, meaning that the optimizations must be re-crafted for each variation in architecture [Ch96].

A number of research projects and industry products have been based on the memory-to-memory approach to copy avoidance. These include U-Net [EBBV95], SHRIMP [BLA+94], Hamlyn [BJM+96], InfiniBand [IB], and Winsock Direct [Pi01].

Several memory-to-memory systems have been widely used and have generally been found to be robust, to have good performance, and to be relatively simple to implement. These include VI [VI], Myrinet [BCF+95], Quadrics [QUAD], and Compaq/Tandem Servernet [SRVNET]. Networks based on these memory-to-memory architectures have been used widely in scientific applications and in data centers for block storage, file system access, and transaction processing.

By exporting direct memory access "across the wire", applications may direct the network stack to manage all data directly from application buffers. A large and growing class of applications has already emerged which takes advantage of such capabilities, including all the major databases, as well as file systems such as DAFS [DAFS] and network protocols such as Sockets Direct [SDP].

5.1. A Conceptual Framework: DDP and RDMA

An RDMA solution can be usefully viewed as comprising two distinct components: "direct data placement (DDP)" and "remote direct memory access (RDMA) semantics". They are distinct in purpose and also in practice; they may be implemented as separate protocols.

The more fundamental of the two is the direct data placement facility.
This is the means by which memory is exposed to the remote peer in an appropriate fashion, and the means by which the peer may access it, for instance reading and writing.

The RDMA control functions are semantically layered atop direct data placement. Included are operations that provide "control" features, such as connection and termination, and the ordering of operations and signaling their completions. A "send" facility is provided.

While the functions (and potentially protocols) are distinct, historically both aspects taken together have been referred to as "RDMA". The facilities of direct data placement are useful in and of themselves, and may be employed by other upper layer protocols to facilitate data transfer. Therefore, it is often useful to refer to DDP as the data placement functionality and RDMA as the control aspect.

[BT02] develops an architecture for DDP and RDMA, and is a companion draft to this problem statement.

6. Security Considerations

Solutions to the problem of reducing copying overhead in high bandwidth transfers via one or more protocols may introduce new security concerns. Any proposed solution must be analyzed for security threats and any such threats addressed. Potential security weaknesses due to resource issues that might lead to denial-of-service attacks, overwrites and other concurrent operations, the ordering of completions as required by the RDMA protocol, the granularity of transfer, and any other identified threats need to be examined and described, and an adequate solution to them found.

Layered atop Internet transport protocols, the RDMA protocols will gain leverage from and must permit integration with Internet security standards, such as IPSec and TLS [IPSEC, TLS]. A thorough analysis of the degree to which these protocols address potential threats is required.
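The concerns listed above (overwrites, and access to memory that has been exposed to a peer) are easier to reason about with a concrete model of what a direct data placement facility, in the sense of Section 5.1, would have to check before touching local memory. The following is a minimal Python sketch under stated assumptions: the names Region, PlacementTable, expose, revoke, place and fetch are hypothetical, and nothing here is taken from the DDP/RDMA architecture in [BT02]. It only illustrates explicit, application-controlled exposure of memory regions with permission and bounds checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A memory region the application has chosen to expose (hypothetical)."""
    base: int        # starting offset within the local buffer
    length: int      # number of bytes exposed
    readable: bool
    writable: bool


class PlacementTable:
    """Hypothetical registry of exposed regions, keyed by an opaque tag."""

    def __init__(self, memory):
        self.memory = memory      # a bytearray owned by the application
        self.regions = {}

    def expose(self, tag, region):
        """Application grants a peer access to one region."""
        end = region.base + region.length
        if region.base < 0 or region.length < 0 or end > len(self.memory):
            raise ValueError("region outside local buffer")
        self.regions[tag] = region

    def revoke(self, tag):
        """Application withdraws consent; later accesses are refused."""
        self.regions.pop(tag, None)

    def _check(self, tag, offset, size, want_write):
        region = self.regions.get(tag)
        if region is None:
            raise PermissionError("region not exposed")
        if want_write and not region.writable:
            raise PermissionError("region is not writable")
        if not want_write and not region.readable:
            raise PermissionError("region is not readable")
        if offset < 0 or size < 0 or offset + size > region.length:
            raise PermissionError("access outside exposed region")
        return region.base + offset

    def place(self, tag, offset, payload):
        """Peer-initiated write, placed directly into the exposed region."""
        start = self._check(tag, offset, len(payload), want_write=True)
        self.memory[start:start + len(payload)] = payload

    def fetch(self, tag, offset, size):
        """Peer-initiated read from the exposed region."""
        start = self._check(tag, offset, size, want_write=False)
        return bytes(self.memory[start:start + size])
```

In this sketch every peer access names a tag, and an access that is unauthorized, out of bounds, or made after revoke() is refused rather than applied. A real protocol would also have to define the error semantics and the ordering of completions, which this sketch leaves out.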
Security for an RDMA design requires more than just securing the communication channel. While it is necessary to be able to guarantee channel properties such as privacy, integrity, and authentication, these properties cannot defend against all attacks from properly authenticated peers, which might be malicious, compromised, or buggy. For example, an RDMA peer should not be able to read or write memory regions without prior consent.

Further, it must not be possible to evade consistency checks at the recipient. The RDMA design must allow the recipient to rely on its consistent memory contents by controlling peer access to memory regions explicitly, and must disallow peer access to regions when not authorized.

The RDMA protocols must ensure that regions addressable by RDMA peers be under strict application control. Remote access to local memory by a network peer introduces a number of potential security concerns. This becomes particularly important in the Internet context, where such access can be exported globally.

The RDMA protocols carry in part what is essentially user information, explicitly including addressing information and operation type (read or write), and implicitly including protection and attributes. As such, the protocol requires checking of these higher level aspects in addition to the basic formation of messages. The semantics associated with each class of error must be clearly defined, and the expected action to be taken on mismatch must be specified. In some cases, this will result in a catastrophic error on the RDMA association; in others, a local or remote error may be signalled. Certain of these errors may require consideration of abstract local semantics, which must be carefully specified so as to provide useful behavior while not constraining the implementation.

7. Acknowledgements

Jeff Chase generously provided many useful insights and information. Thanks to Jim Pinkerton for many helpful discussions.

8. Informative References

[BCF+95] N.
J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per-second local-area network", IEEE Micro, February 1995

[BJM+96] G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes, "An implementation of the Hamlyn send-managed interface architecture", in Proceedings of the Second Symposium on Operating Systems Design and Implementation, USENIX Assoc., October 1996

[BLA+94] M. A. Blumrich, K. Li, R. Alpert, C. Dubnicki, E. W. Felten, "A virtual memory mapped network interface for the SHRIMP multicomputer", in Proceedings of the 21st Annual Symposium on Computer Architecture, April 1994, pp. 142-153

[Br99] J. C. Brustoloni, "Interoperation of copy avoidance in network and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542

[BS96] J. C. Brustoloni, P. Steenkiste, "Effects of buffering semantics on I/O performance", Proceedings OSDI'96, USENIX, Seattle, WA October 1996, pp. 277-291

RFC Editor note: Replace following architecture draft-ietf- name, status and date with appropriate reference when assigned.

[BT02] S. Bailey, T. Talpey, "The Architecture of Direct Data Placement (DDP) And Remote Direct Memory Access (RDMA) On Internet Protocols", Internet Draft Work in Progress, draft-ietf-rddp-arch-02, June 2003

[CFF+94] C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A. Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High-performance TCP/IP and UDP/IP networking in DEC OSF/1 for Alpha AXP", Proceedings of the 3rd IEEE Symposium on High Performance Distributed Computing, August 1994, pp. 36-42

[CGY01] J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system optimizations for high-speed TCP", IEEE Communications Magazine, Volume: 39, Issue: 4, April 2001, pp 68-74.
http://www.cs.duke.edu/ari/publications/end-system.{ps,pdf}

[Ch96] H.K. Chu, "Zero-copy TCP in Solaris", Proc. of the USENIX 1996 Annual Technical Conference, San Diego, CA, January 1996

[Ch02] Jeffrey Chase, Personal communication

[CJRS89] D. D. Clark, V. Jacobson, J. Romkey, H. Salwen, "An analysis of TCP processing overhead", IEEE Communications Magazine, Volume: 27, Issue: 6, June 1989, pp 23-29

[CT90] D. D. Clark, D. Tennenhouse, "Architectural considerations for a new generation of protocols", Proceedings of the ACM SIGCOMM Conference, 1990

[DAFS] DAFS Collaborative, "Direct Access File System Specification v1.0", September 2001, available from http://www.dafscollaborative.org

[DAPP93] P. Druschel, M. B. Abbott, M. A. Pagels, L. L. Peterson, "Network subsystem design", IEEE Network, July 1993, pp. 8-17

[DP93] P. Druschel, L. L. Peterson, "Fbufs: a high-bandwidth cross-domain transfer facility", Proceedings of the 14th ACM Symposium on Operating Systems Principles, December 1993

[DWB+93] C. Dalton, G. Watson, D. Banks, C. Calamvokis, A. Edwards, J. Lumley, "Afterburner: architectural support for high-performance protocols", Technical Report, HP Laboratories Bristol, HPL-93-46, July 1993

[EBBV95] T. von Eicken, A. Basu, V. Buch, and W. Vogels, "U-Net: A user-level network interface for parallel and distributed computing", Proc. of the 15th ACM Symposium on Operating Systems Principles, Copper Mountain, Colorado, December 3-6, 1995

[FGM+99] R. Fielding, J. Gettys, J. Mogul, F. Frystyk, L. Masinter, P. Leach, T.
Berners-Lee, "Hypertext Transfer Protocol - HTTP/1.1", RFC 2616, June 1999

[FIBRE] ANSI Technical Committee T10, "Fibre Channel Protocol (FCP)" (and as revised and updated), ANSI X3.269:1996 [R2001], committee draft available from http://www.t10.org/drafts.htm#FibreChannel

[HP97] J. L. Hennessy, D. A. Patterson, Computer Organization and Design, 2nd Edition, San Francisco: Morgan Kaufmann Publishers, 1997

[IB] InfiniBand Trade Association, "InfiniBand Architecture Specification, Volumes 1 and 2", Release 1.1, November 2002, available from http://www.infinibandta.org/specs

[KP96] J. Kay, J. Pasquale, "Profiling and reducing processing overheads in TCP/IP", IEEE/ACM Transactions on Networking, Vol 4, No. 6, pp. 817-828, December 1996

[KSZ95] K. Kleinpaste, P. Steenkiste, B. Zill, "Software support for outboard buffering and checksumming", SIGCOMM'95

[Ma02] K. Magoutis, "Design and Implementation of a Direct Access File System (DAFS) Kernel Server for FreeBSD", in Proceedings of USENIX BSDCon 2002 Conference, San Francisco, CA, February 11-14, 2002.

[MAF+02] K. Magoutis, S. Addetia, A. Fedorova, M. I. Seltzer, J. S. Chase, D. Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber, "Structure and Performance of the Direct Access File System (DAFS)", accepted for publication at the 2002 USENIX Annual Technical Conference, Monterey, CA, June 9-14, 2002.

[Mc95] J. D. McCalpin, "A Survey of memory bandwidth and machine balance in current high performance computers", IEEE TCCA Newsletter, December 1995

[Ne00] A. Newman, "IDC report paints conflicted picture of server market circa 2004", ServerWatch, July 24, 2000 http://serverwatch.internet.com/news/2000_07_24_a.html

[Pa01] M.
Pastore, "Server shipments for 2000 surpass those in 1999", ServerWatch, February 7, 2001 http://serverwatch.internet.com/news/2001_02_07_a.html

[PAC+97] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, K. Yelick, "A case for intelligent RAM: IRAM", IEEE Micro, April 1997

[PDZ99] V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O buffering and caching system", Proc. of the 3rd Symposium on Operating Systems Design and Implementation, New Orleans, LA, February 1999

[Pi01] J. Pinkerton, "Winsock Direct: The Value of System Area Networks", May 2001, available from http://www.microsoft.com/windows2000/techinfo/howitworks/communications/winsock.asp

[Po81] J. Postel, "Transmission Control Protocol - DARPA Internet Program Protocol Specification", RFC 793, September 1981

[QUAD] Quadrics Ltd., Quadrics QSNet product information, available from http://www.quadrics.com/website/pages/02qsn.html

[SDP] InfiniBand Trade Association, "Sockets Direct Protocol v1.0", Annex A of InfiniBand Architecture Specification Volume 1, Release 1.1, November 2002, available from http://www.infinibandta.org/specs

[SRVNET] R. Horst, "TNet: A reliable system area network", IEEE Micro, pp. 37-45, February 1995

[STREAM] J. D. McCalpin, The STREAM Benchmark Reference Information, http://www.cs.virginia.edu/stream/

[TK95] M. N. Thadani, Y. A. Khalidi, "An efficient zero-copy I/O framework for UNIX", Technical Report, SMLI TR-95-39, May 1995

[VI] Compaq Computer Corp., Intel Corporation and Microsoft Corporation, "Virtual Interface Architecture Specification Version 1.0", December 1997, available from http://www.vidf.org/info/04standards.html

[Wa97] J. R. Walsh, "DART: Fast application-level networking via data-copy avoidance", IEEE Network, July/August 1997, pp.
28-38

Authors' Addresses

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street
   Andover, MA 01810 USA

   Phone: +1 978 689 1614
   Email: steph@sandburst.com

   Jeffrey C. Mogul
   Western Research Laboratory
   Hewlett-Packard Company
   1501 Page Mill Road, MS 1251
   Palo Alto, CA 94304 USA

   Phone: +1 650 857 2206 (email preferred)
   Email: JeffMogul@acm.org

   Allyn Romanow
   Cisco Systems, Inc.
   170 W. Tasman Drive
   San Jose, CA 95134 USA

   Phone: +1 408 525 8836
   Email: allyn@cisco.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA

   Phone: +1 781 768 5329
   Email: thomas.talpey@netapp.com

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Romanow, et al Expires December 2003 [Page 18] From bhalevy@panasas.com Mon Dec 15 23:02:52 2003 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 63766 invoked from network); 16 Dec 2003 07:02:51 -0000 Received: from unknown (66.218.66.166) by m18.grp.scd.yahoo.com with QMQP; 16 Dec 2003 07:02:51 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 16 Dec 2003 07:02:49 -0000 Received: from yang ([172.17.19.46]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1CWB; Tue, 16 Dec 2003 02:02:32 -0500 To: Date: Tue, 16 Dec 2003 02:03:00 -0500 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0007_01C3C378.BB3217E0" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-eGroups-Remote-IP: 65.194.124.178 From: "Benny Halevy" Subject: FW: Re: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy -----Original Message----- From: Gary Grider [mailto:ggrider@lanl.gov] Sent: Saturday, December 13, 2003 00:02 To: Garth Gibson; Craig Everhart; John Muth; Brian Pawlowski; David Pease; Julian Satran; Spencer Shepler; Brent Welch; Benny Halevy; Jon Haswell; Dean Hildebrand; Peter Honeyman; Jim Carlson; Garth Gibson; Andy Adamson; Tyce McLarty; Peter Corbett; David Black Cc: Garth Gibson Subject: Re: NEPS-REQS: getting started I 
decided to toss out a very quick and dirty draft with a lot of parts missing. Nothing sacred, just thoughts as they occurred to me partially organized. I put it in Word so I could get formatting, TOC, etc. I am attaching a Word and PDF. I would be happy to put this on a web site for us if you want. I also would be happy to centralize the edits and re-post it on the web etc. Thanks Gary At 10:26 PM 12/10/2003 -0500, Garth Gibson wrote: >So we are the requirements/problem statement subgroup of the NFS >extension for parallel storage effort. > >Our job is to create the paper trail justification for adding something >to NFS and provide a conceptual framework by which to identify possible >solutions. > >In the beginning this document is used to justify in the IETF process >that there are problems that people take seriously that cannot be >handled well in the scope of NFS today and that should be. > >I asked around for examples to help us construct this document and I >was pointed at the problem statement used to start the RDMA over IP >effort (attached below). I was told that this was a particularly well >done problem statement, and that we should not necessarily work this >hard before giving the IETF something to look at. > >ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-ietf-rddp- >problem-statement-02.txt > >RDDP Abstract: This draft addresses an IP-based solution to the problem >of high system costs due to network I/O copying in end-hosts at high >speeds. The problem is due to the high cost of memory bandwidth, and >it can be substantially improved using "copy avoidance." The high >overhead has limited the use of TCP/IP in interconnection networks >especially where high bandwidth, low latency and/or low overhead of >end-system data movement are required by the hosted application. 
> >So I suppose we could start with > >pNFS Abstract: This draft addresses an NFS-based solution to the >problem of high system costs due to store-and-forward copying of >storage data from storage devices through a file server mount point to >high-speed end-hosts that also have connectivity to source storage >devices. The problem is due to the high cost of funneling large >storage bandwidths through NFS on single IP addresses, and it can be >substantially improved using "out-of-band access." The high cost of >high-bandwidth NFS servers has limited the use of NFS in data centers >especially where high storage bandwidths are required and numerous >storage serving devices are already networked together. > >A pNFS table of contents might be: > >1. Introduction >2. The high cost of high bandwidth storage through NFS >2.1 Out-of-band access decreases bandwidth requirements in central file >servers >3. Application level routing of storage data packets is the root cause >of the problem >4. Storage bandwidth bottlenecks are problematic for many key file >system applications >5. Out-of-band access techniques >5.1 A conceptual framework: pNFS delegated maps for distributing files >over SBC, OSD and NFS storage subsystems >6. Security considerations >7. Acknowledgements >8. 
Informative references > >Please have a look at the RDDP problem statement draft and comment on >my simplistic strategy of monkey-see-monkey-do :-) > >garth > > > >Begin forwarded message: > >>From: Garth Gibson >>Date: Wed Dec 10, 2003 9:34:58 PM Canada/Eastern >>To: Andy Adamson, David Black, >>Don Cameron, Jim >>Carlson, Peter Corbett, Craig >>Everhart, Steve Fridella, >>Garth Gibson, >>Gary Grider, Benny Halevy, >>Jon Haswell, Dean Hildebrand, >>Peter Honeyman, >>Xiaoye Jiang, Mike Kazar, >>Tyce McLarty, John Muth, >>Dave Noveck, Brian Pawlowski, >>David Pease, >>Julian Satran, Spencer Shepler, >>Brent Welch >>Subject: NFS Extensions for Parallel Storage, subgroup membership >> >>Folks, >> >>Thanks for a great workshop last Thursday! >> >>Materials presented that day are online: >>http://www.citi.umich.edu/NEPS/agenda.html >> >>Below are the workshop followup subgroup memberships as they are now. >>I think I heard Peter say that he would construct auto-managed email >>lists, which from the additions I've received this week, I have >>already decided would be great. Please Peter! Names like neps-all, >>neps-reqs, neps-ops, neps-sbc, neps-osd, neps-nfs would be great. >> >>Our goals, to reprise, are to sketch a set of requirements for NFS >>Extensions for Parallel Storage, or pNFS extensions, sketch a set of >>NFS operation extensions (possibly including alternatives), sketch a >>set of metadata definitions (possibly including alternatives) for >>out-of-band data access over fixed block (SBC) SCSI protocols, object >>(OSD) SCSI protocols and file (NFS) ONCRPC protocols. >> >>We want to do this quickly, over the next few months, and to take it >>into the IETF NFS process as a set of suggestions and strawman >>protocols. The current plan is that at that point those of us that >>follow through with this will do it in the IETF NFS working group.
In >>order to convince the IETF and the NFS working group that we have >>important, useful and viable ideas, we are taking a little time to >>pull together starting material. >> >>The timelines discussed at the end of the workshop "heir of the dog" >>session were: >>- get workshop notes put together and out in December (Peter and Garth) >>- 0th draft of a requirements/problem statement internet draft by mid >>January >>- IETF submission of an internet draft by first week of Feb, so it can >>be part of the March IETF meeting and used as evidence for inclusion >>of extensions for parallel storage into the NFS working group charter >>- one or more documents (not necessarily fully agreeing) from each >>subgroup into the IETF NFS email discussion for early to mid March >>- a face-to-face followup workshop, open to the IETF NFS group at the >>FAST 2004 conference, in San Francisco Mar 31 - Apr 2, at which all >>further plans are proposed, argued and ratified (e.g. shall we be >>absorbed into the IETF NFS group) >> >>To help move this along, we have asked one person in each subgroup to >>push, prod and pull ideas and words out of us. Please help these >>sacrificial volunteers by contributing text, criticizing >>constructively with alternative text, and finding the time to read >>materials. >> >>These are volunteers in an unofficial process. We have no rules to be >>applied by arbitration, no membership to take votes from. If this >>consensus process, or these people, are not working out, then I >>suggest grass roots alternatives be suggested and explored as a group. >> Let's not get bogged down in process this early :-) >> >>But there are always going to be logistical and procedural issues that >>we need to deal with as a group. The suggestion at the workshop was >>that these multi-subgroup issues be taken into the requirements group.
>> For example, I suggest that "scope" issues -- what we include and >>what we exclude from our agenda -- be dealt with in the requirements >>group, where we would need to add/delete requirements for each >>distinct aspect of our scope. >> >>I'm sure I'm way over the line giving this much direction :-) so I'll >>leave it to the subgroups to decide mechanisms for progress. For >>example, weekly conference calls, document exchange formats, >>editorship delegation and/or rotation, agreement achieving processes, >>.... >> >>And with that I'll go off and get to work on suggesting what our >>problem statement needs to say. >> >>garth >>412-805-9878 (cell) >> >>------------------------------------------------------- >> >>pNFS requirements: Garth Gibson >>----------------- >>Andy Adamson >>David Black >>Jim Carlson >>Peter Corbett >>Craig Everhart >>Garth Gibson >>Gary Grider >>Benny Halevy >>Jon Haswell >>Dean Hildebrand >>Peter Honeyman >>Tyce McLarty >>John Muth >>Brian Pawlowski >>David Pease >>Julian Satran >>Spencer Shepler >>Brent Welch > > Attachment (not stored) draft-ietf-pNFS-problem-statement.pdf Type: application/pdf Attachment (not stored) draft-ietf-pNFS-problem-statement.doc Type: application/msword From bhalevy@panasas.com Mon Dec 15 23:06:28 2003 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 27808 invoked from network); 16 Dec 2003 07:06:28 -0000 Received: from unknown (66.218.66.216) by m10.grp.scd.yahoo.com with QMQP; 16 Dec 2003 07:06:28 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 16 Dec 2003 07:06:28 -0000 Received: from yang ([172.17.19.46]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1CW2; Tue, 16 Dec 2003 02:05:59 -0500 To: Date: Tue, 16 Dec 2003 02:06:27 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit 
X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-eGroups-Remote-IP: 65.194.124.178 From: "Benny Halevy" Subject: FW: Re: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy -----Original Message----- From: Tyce McLarty [mailto:mclarty3@llnl.gov] Sent: Monday, December 15, 2003 13:49 To: Gary Grider; Garth Gibson; Craig Everhart; John Muth; Brian Pawlowski; David Pease; Julian Satran; Spencer Shepler; Brent Welch; Benny Halevy; Jon Haswell; Dean Hildebrand; Peter Honeyman; Jim Carlson; Garth Gibson; Andy Adamson; Peter Corbett; David Black Cc: Garth Gibson Subject: Re: NEPS-REQS: getting started I've been wondering how important it is to cast the "problem" as one of cost, rather than as the ability to do things that cannot be done today with added benefits in cost reduction. I liked the list that Garth put up at the workshop: Scalable bandwidth; Scalable capacity; Load balancing; Capacity balancing; plus the big winner - a standardized client. So the Introduction would be basically two paragraphs with (in either order): 1. proposal to extend NFSv4 to allow parallel out-of-band client access to data separate from metadata operations. 2. why it's important to do using the reasons outlined above. My question is - How close do we need to model the RDMA problem statement? Is cost the best/only justification or can we use new & needed capability plus value added? I think Gary has slanted his additions this direction, but it seems like we should all agree on some basic principles before we get too deep in word-smithing. Thanks, Tyce At 10:02 PM 12/12/2003 -0700, Gary Grider wrote: >I decided to toss out a very quick and dirty draft with a lot of parts >missing. >Nothing sacred, just thoughts as they occurred to me partially organized. > >I put it in Word so I could get formatting, TOC, etc.
> >I am attaching a Word and PDF. > >I would be happy to put this on a web site for us if you want. I also >would be happy to >centralize the edits and re-post it on the web etc. > >Thanks >Gary From bhalevy@panasas.com Mon Dec 15 23:11:59 2003 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 45118 invoked from network); 16 Dec 2003 07:11:56 -0000 Received: from unknown (66.218.66.167) by m9.grp.scd.yahoo.com with QMQP; 16 Dec 2003 07:11:56 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 16 Dec 2003 07:11:58 -0000 Received: from yang ([172.17.19.46]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1CXS; Tue, 16 Dec 2003 02:11:56 -0500 To: Date: Tue, 16 Dec 2003 02:12:23 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-eGroups-Remote-IP: 65.194.124.178 From: "Benny Halevy" Subject: FW: (Garth Gibson) NFS Extensions for Parallel Storage, subgroup membership X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy -----Original Message----- From: Garth Gibson [mailto:garth@panasas.com] Sent: Thursday, December 11, 2003 01:54 To: Andy Adamson; David Black; Don Cameron; Jim Carlson; Peter Corbett; Craig Everhart; Steve Fridella; Garth Gibson; Gary Grider; Benny Halevy; Jon Haswell; Dean Hildebrand; Peter Honeyman; Xiaoye Jiang; Mike Kazar; Tyce McLarty; John Muth; Dave Noveck; Brian Pawlowski; David Pease; Julian Satran; Spencer Shepler; Brent Welch Cc: Garth Gibson Subject: NFS Extensions for Parallel Storage, subgroup membership Folks, Thanks for a great workshop last Thursday!
Materials presented that day are online: http://www.citi.umich.edu/NEPS/agenda.html Below are the workshop followup subgroup memberships as they are now. I think I heard Peter say that he would construct auto-managed email lists, which from the additions I've received this week, I have already decided would be great. Please Peter! Names like neps-all, neps-reqs, neps-ops, neps-sbc, neps-osd, neps-nfs would be great. Our goals, to reprise, are to sketch a set of requirements for NFS Extensions for Parallel Storage, or pNFS extensions, sketch a set of NFS operation extensions (possibly including alternatives), sketch a set of metadata definitions (possibly including alternatives) for out-of-band data access over fixed block (SBC) SCSI protocols, object (OSD) SCSI protocols and file (NFS) ONCRPC protocols. We want to do this quickly, over the next few months, and to take it into the IETF NFS process as a set of suggestions and strawman protocols. The current plan is that at that point those of us that follow through with this will do it in the IETF NFS working group. In order to convince the IETF and the NFS working group that we have important, useful and viable ideas, we are taking a little time to pull together starting material.
The timelines discussed at the end of the workshop "heir of the dog" session were: - get workshop notes put together and out in December (Peter and Garth) - 0th draft of a requirements/problem statement internet draft by mid January - IETF submission of an internet draft by first week of Feb, so it can be part of the March IETF meeting and used as evidence for inclusion of extensions for parallel storage into the NFS working group charter - one or more documents (not necessarily fully agreeing) from each subgroup into the IETF NFS email discussion for early to mid March - a face-to-face followup workshop, open to the IETF NFS group at the FAST 2004 conference, in San Francisco Mar 31 - Apr 2, at which all further plans are proposed, argued and ratified (e.g. shall we be absorbed into the IETF NFS group) To help move this along, we have asked one person in each subgroup to push, prod and pull ideas and words out of us. Please help these sacrificial volunteers by contributing text, criticizing constructively with alternative text, and finding the time to read materials. These are volunteers in an unofficial process. We have no rules to be applied by arbitration, no membership to take votes from. If this consensus process, or these people, are not working out, then I suggest grass roots alternatives be suggested and explored as a group. Let's not get bogged down in process this early :-) But there are always going to be logistical and procedural issues that we need to deal with as a group. The suggestion at the workshop was that these multi-subgroup issues be taken into the requirements group. For example, I suggest that "scope" issues -- what we include and what we exclude from our agenda -- be dealt with in the requirements group, where we would need to add/delete requirements for each distinct aspect of our scope. I'm sure I'm way over the line giving this much direction :-) so I'll leave it to the subgroups to decide mechanisms for progress.
For example, weekly conference calls, document exchange formats, editorship delegation and/or rotation, agreement achieving processes, ....

And with that I'll go off and get to work on suggesting what our problem statement needs to say.

garth
412-805-9878 (cell)

-------------------------------------------------------

pNFS requirements: Garth Gibson
-----------------
Andy Adamson
David Black
Jim Carlson
Peter Corbett
Craig Everhart
Garth Gibson
Gary Grider
Benny Halevy
Jon Haswell
Dean Hildebrand
Peter Honeyman
Tyce McLarty
John Muth
Brian Pawlowski
David Pease
Julian Satran
Spencer Shepler
Brent Welch

NFSv4 ops for pNFS: Peter Honeyman
------------------
Andy Adamson
David Black
Peter Corbett
Craig Everhart
Garth Gibson
Benny Halevy
Jon Haswell
Dean Hildebrand
Peter Honeyman
Xiaoye Jiang
John Muth
Dave Noveck
Brian Pawlowski
Julian Satran
Spencer Shepler
Brent Welch

SBC metadata for pNFS: David Black
---------------------
Andy Adamson
David Black
Jim Carlson
Craig Everhart
Steve Fridella
Garth Gibson
Xiaoye Jiang
Mike Kazar
John Muth
David Pease
Julian Satran
Spencer Shepler

OSD metadata for pNFS: Brent Welch
---------------------
Andy Adamson
Don Cameron
Peter Corbett
Garth Gibson
Benny Halevy
John Muth
Julian Satran
Spencer Shepler
Brent Welch

NFS metadata for pNFS: Peter Corbett
---------------------
Andy Adamson
Peter Corbett
Craig Everhart
Garth Gibson
Jon Haswell
Dean Hildebrand
Peter Honeyman
Xiaoye Jiang
John Muth
Julian Satran
Spencer Shepler

From pnfs-reqs@yahoogroups.com Mon Dec 15 23:51:51 2003
Return-Path: Received: (qmail 39098 invoked from network); 16 Dec 2003 07:51:50 -0000 Received: from unknown (66.218.66.216) by m12.grp.scd.yahoo.com with QMQP; 16 Dec 2003 07:51:50 -0000 Received: from unknown (HELO n6.grp.scd.yahoo.com) (66.218.66.90) by mta1.grp.scd.yahoo.com with SMTP; 16 Dec 2003 07:51:50 -0000 X-eGroups-Return: notify@yahoogroups.com Received: from [66.218.67.252] by n6.grp.scd.yahoo.com with NNFMP; 16 Dec 2003 07:51:44 -0000 Date: 16 Dec
2003 07:51:43 -0000 Message-ID: <1071561103.2719.47454.w73@yahoogroups.com> X-eGroups-Application: files X-Yahoo-Group-Post: system From: pnfs-reqs@yahoogroups.com To: pnfs-reqs@yahoogroups.com Subject: New file uploaded to pnfs-reqs MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 66.218.66.90 Hello, This email message is a notification to let you know that a file has been uploaded to the Files area of the pnfs-reqs group. File : /draft-ietf-pNFS-problem-statement.doc Uploaded by : benny_halevy Description : Gary Grider's draft 2003-12-13 You can access this file at the URL http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement.doc To learn more about file sharing for your group, please visit http://help.yahoo.com/help/us/groups/files Regards, benny_halevy From garth@panasas.com Wed Dec 17 21:34:01 2003 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 58554 invoked from network); 18 Dec 2003 05:34:01 -0000 Received: from unknown (66.218.66.218) by m3.grp.scd.yahoo.com with QMQP; 18 Dec 2003 05:34:01 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 18 Dec 2003 05:34:00 -0000 Received: from panasas.com ([172.17.133.207]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1NBZ; Thu, 18 Dec 2003 00:33:58 -0500 Date: Thu, 18 Dec 2003 00:34:04 -0500 Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v553) Cc: Garth Gibson To: pnfs-reqs@yahoogroups.com Content-Transfer-Encoding: 7bit Message-Id: X-Mailer: Apple Mail (2.553) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Tyce, [I've emailed this through the Yahoo group Benny set up, http://groups.yahoo.com/group/pnfs-reqs. 
I will forward it to the folks that have not yet joined this Yahoo group after I get it sent back to me :-)]

The RDDP problem statement is similar and dissimilar to what we are doing. It is similar in that it is about higher performance, which always turns out to be cost-performance. It is dissimilar in that it was fighting an uphill battle to get RDMA into the IETF, while we are looking at no preconceived support or opposition in the IETF (that I am aware of). And it is dissimilar in that what we are proposing helps in the manageability of federated systems, which is not really a performance issue.

I followed the RDDP example closely because it was easy -- our arguments on strictly bandwidth are at least as strong, in my opinion. And because I am not certain how to predict the IETF management's reaction to a manageability argument. And the standardized client code argument, although very important to some of us, seemed outside my notion of the IETF scope.

Perhaps those with more experience selling ideas to the IETF could educate us? Should we focus on a small number of the most easily demonstrated problems or fill the problem statement out with all the problems we can contribute to solving?

garth

On Monday, December 15, 2003, at 01:49 PM, Tyce McLarty wrote:
> I've been wondering how important it is to cast the "problem" as one
> of cost, rather than as the ability to do things that cannot be done
> today with added benefits in cost reduction.
>
> I liked the list that Garth put up at the workshop:
>
> Scalable bandwidth
> Scalable capacity
> Load balancing
> capacity balancing
>
> plus the big winner - a standardized client.
>
> So the Introduction would be basically two paragraphs with (in either
> order):
> 1. proposal to extend NFSv4 to allow parallel out-of-band client
> access to data separate from metadata operations.
> 2. why it's important to do using the reasons outlined above.
> > My question is - How close do we need to model the RDMA problem > statement? Is cost the best/only justification or can we use new & > needed capability plus value added? > > I think Gary has slanted his additions this direction, but seems like > we should all agree on some basic principles before we get too deep in > word-smithing. > > Thanks, > Tyce > > At 10:02 PM 12/12/2003 -0700, Gary Grider wrote: > >> I decided to toss out a very quick and dirty draft with a lot of >> parts missing. >> Nothing sacred, just thoughts as they occurred to me partially >> organized. >> >> I put it in Word so I could get formatting, TOC, etc. >> >> I am attaching a Word and PDF. >> >> I would be happy to put this on a web site for us if you want. I >> also would be happy to >> centralize the edits and re-post it on the web etc. >> >> Thanks >> Gary From garth@panasas.com Wed Dec 17 21:42:23 2003 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 93406 invoked from network); 18 Dec 2003 05:42:23 -0000 Received: from unknown (66.218.66.167) by m11.grp.scd.yahoo.com with QMQP; 18 Dec 2003 05:42:23 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 18 Dec 2003 05:42:22 -0000 Received: from panasas.com ([172.17.133.207]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1NCX; Thu, 18 Dec 2003 00:42:19 -0500 Date: Thu, 18 Dec 2003 00:42:22 -0500 Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v553) Cc: Garth Gibson To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Content-Transfer-Encoding: 7bit Message-Id: X-Mailer: Apple Mail (2.553) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Summary 1: 12/18/03 pNFS-ops and pNFS-reqs folks, Following on the 
conversation that has been going on in the pNFS-ops list since Brent put out his notes on the heir-of-the-dog meeting of Fri Dec 5, I have tried below to summarize what I see as broad issues. Your additions, corrections or directions are requested.

One theme I see evolving quickly is the differing opinions of the driving requirements and how these drive differing opinions of implementation issues in the NFSv4 operations discussion. I have tried to identify which issues are more about requirements than about "how" to achieve a requirement in NFSv4. This is not intended to be a power play, by taking the topic out of the reach of anyone. It is more to clarify which topics we need to resolve by defining our scope and share with the folks that are only on the requirements email list. I imagine the resolution to these requirements-related issues will be more customer oriented and feature set driven.

Topics:
0.0 Defining Requirements
1.0 Minimalism
1.1 Proxying
1.2 Cache consistency
1.3 Delegation promotion & reacquisition
1.4 Layout delegations
1.5 Concurrent write
1.6 Map revocation
1.7 Separability
1.8 NTFS application semantics

----------------------------------------
[0.0 Defining Requirements]: What is the scope of what the requirements subgroup is doing, and how is it related to the ops subgroup discussions?

I am beginning to see a significant difference between a "problem statement" document and a "requirements" document. I believe that in a problem statement we can make a strong case for a set of properties and applications that are currently underserved in NFSv4, and a direction that could in one or more steps resolve some or all of the problem. By contrast, I am coming to see the detailed requirements as a compendium of the most contentious and impactful issues, how they were argued and what resolution was accepted.
I can see the problem statement getting done before we have sorted out all the hard problems, or even run into all of them, so it is a good document for establishing our interests in the IETF. But I suspect that the requirements document stays open well into agreement on the specification issues.

For comparison, the first NFSv4 document was called "Design Considerations" (RFC 2624):

  This document is to cover the "limitations and deficiencies of NFS version 3". This document will also be used as a mechanism to focus discussion and avenues of investigation as the definition of NFS version 4 progresses. Therefore, the contents of this document cover the general functional/feature areas that are anticipated for NFS version 4.

I propose that what we have started into in the requirements subgroup is the problem statement, and that we should be careful to not let it get bogged down in the longer term requirements resolutions.

----------------------------------------
[1.0 Minimalism]: How much additional functionality do we sacrifice to limit the changes we seek in NFSv4?

On one hand, some have said that getting to one true file system, with the high performance and the manageability of federated systems that might come with out-of-band access, is worth not matching *every* feature of all existing out-of-band file systems with this first set of extensions to NFSv4. That we should bite off what we can do quickly, correctly, with a clear incremental value to NFSv4, and roadmap more aggressive changes that could bog us down, or introduce so much complexity that interoperability becomes elusive. And that we should be mindful of the reception we may get from the IETF NFS working group if we *appear* to use out-of-band as an excuse to ask for a brace of changes in other aspects of NFSv4.

On the other hand, the other out-of-band file systems that are inspiring the evolution of NFSv4 have customers that may not accept any backward steps in an evolution to NFSv4.
This could create the need to develop, carry and differentiate all the diverse one-off out-of-band file systems plus a new out-of-band NFSv4. Some think it makes more sense to go far enough with this first NFSv4 to simplify the marketplace by making it reasonable for various vendors to deprecate/end-of-life/begin to wean from their proprietary offering. While it is certainly conceivable that we could be designing a roadmap of solutions in detail from the start, communication among standards bodies is hard enough without the challenge of designing specs for both with and without a requirement.

This is a central issue in defining the requirements for out-of-band NFSv4, or at least for defining the scope of the first set of extensions.

----------------------------------------
[1.1 Proxying]: Operations/work that can only be done out-of-band vs alternative access through the NFSv4 server for all operations/work

On one hand, some suggest that a set of out-of-band clients should not have to also have a data path through the NFSv4 metadata server. One reason is that customers may not tolerate the large variability in performance between out-of-band (when the going is good) and in-band (when the server chooses not to grant or to take away a delegation) accesses. Another reason, and I paraphrase someone else here, is that it is possible to construct out-of-band metadata servers that do not have access to the data servers except through the clients -- I encourage the source of this scenario to replace my paraphrasing with a correct use case, because I find it odd to design for file servers that do not have access to the data servers.

On the other hand, others have suggested that any access or work that a client can do out-of-band should be possible with one or more commands applied to the metadata server's data path.
This has been proposed for coping with recalled delegations, including concurrent writing by multiple clients; retry after client access errors, provided adequate idempotency of out-of-band operations; and many alternative implementations of out-of-band clients, including legacy clients that use out-of-band never or rarely.

I think this is a topic that should be argued one way or the other in the requirements document. Use cases and examples in other systems would be best.

----------------------------------------
[1.2 Cache consistency]: NFSv4 delegations are not about client cache consistency; does out-of-band access require stronger cache consistency than NFSv4 provides?

NFSv4 cache consistency is a client function, based on testing file attributes on open and close. While a client holds a delegation, its users can close and reopen a file without recourse to the server, so inside a delegation a client's cache contents for that file must be valid and up to date. However, a client cannot mandate getting a delegation on open, it must give up a delegation (approximately) immediately if it is recalled, and it has no way to reacquire a delegation on an open file after that delegation has been recalled. So we must not confuse delegations with strong cache consistency.

Many of the various proprietary out-of-band file systems have much stronger client cache consistency, involving more different types and interactions of cache callbacks. Some of these differences may have been motivated by desire for differentiation, some by apps underserved by NFS cache consistency semantics, and some by the long standing designer belief that stronger semantics are theoretically better.

The question we must resolve, and argue in the requirements document, is whether out-of-band access within only the NFSv4 cache consistency and delegations is sufficient and, if not, why not and how much more must/should be added before such a product is valuable.
I think that application use cases should be discussed. And I caution us that most of us are the converted, coming to NFSv4 from one of these proprietary file systems, so gaining agreement amongst ourselves easily is not a good predictor of the challenge of gaining the agreement of the NFS standards working group.

----------------------------------------
[1.3 Delegation promotion & reacquisition]: must/should NFSv4 offer mechanisms for clients to possess a delegation more than once per open?

Delegations in NFSv4 are new, and came with significant concern about lots of complexity for not much performance, as they may do as little as avoid the client waiting for one round trip to the server on open. So, as described above with respect to cache consistency, the limitations on delegations can mean great difficulties for clients having performance requirements calling for out-of-band access mostly, or exclusively.

So we have begun to propose mechanisms for clients to be more aggressive about seeking, obtaining, reobtaining after a recall, and even waiting for a signal that a denied delegation is now available. This could lead to discussions of transitioning from a write delegation to a read delegation, rather than no delegation, when a second delegation is requested.

We all know, or can imagine, plenty of mechanisms for this type of logic -- after all, it is not far from what some systems do for cache consistency. But all of this comes with complexity, that threat to interoperability, and chips away at minimalism. I would suggest that we capture use cases to drive requirements for controversial steps down this path.
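As a rough illustration of the promotion and reacquisition behaviour discussed in 1.3, the Python sketch below models a server-side table that downgrades a write delegation to a read delegation when a second client asks to read, and queues a denied write request until the file is free. The class and method names (DelegationTable, request, release) are hypothetical, and none of this is existing NFSv4 behaviour; it is only meant to show how much server state even a simple policy of this kind adds.

```python
class DelegationTable:
    """Hypothetical per-file delegation state kept by a server (not NFSv4 as specified)."""

    def __init__(self):
        self.holders = {}   # client id -> "read" or "write"
        self.waiters = []   # clients whose write request was denied, in arrival order

    def request(self, client, kind):
        """Return the delegation kind granted, or None if the client must wait."""
        others = {c: k for c, k in self.holders.items() if c != client}
        if kind == "read":
            # Downgrade any writer to a reader instead of leaving it with nothing.
            for c, k in others.items():
                if k == "write":
                    self.holders[c] = "read"
            self.holders[client] = "read"
            return "read"
        # A write delegation is exclusive in this sketch.
        if others:
            self.waiters.append(client)
            return None
        self.holders[client] = "write"
        return "write"

    def release(self, client):
        """Drop a delegation; return a waiting client to signal, if one can now be served."""
        self.holders.pop(client, None)
        if not self.holders and self.waiters:
            nxt = self.waiters.pop(0)
            self.holders[nxt] = "write"
            return nxt
        return None
```

Even this toy version needs a waiter queue and a callback to the promoted client, which is the kind of added state the minimalism argument in 1.0 worries about.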
----------------------------------------
[1.4 Layout delegations]: can/should layout metadata "ride" on NFSv4 delegations or are new "layout" delegations needed?

If the delegations currently provided by NFSv4 are insufficient, for reasons of cache consistency or the need to be able to reacquire a delegation in order to ensure that performance degradations can be limited, then some are suggesting that rather than proposing to change the semantics of the current delegations, we add new delegations tailored to the purpose, so-called layout delegations. This is consistent with the advice we heard Dec 4 that it is much easier, and more welcomed, to add new things to NFSv4 than to change what is already there.

Assuming that in response to requirements arguments, we find the existing NFSv4 delegations insufficient, then I think this topic is an implementation issue for the NFSv4 operations subgroup. But I for one would like to err on the side of fewer NFSv4 changes and slightly weaker semantics, where possible.

----------------------------------------
[1.5 Concurrent write]: write delegations now are held by exactly one client, if any; should/must NFS support multiple clients holding concurrent layout delegations?

One specifically excluded use case for out-of-band access is concurrent write, actually concurrent read and write, or write and write, by different clients. This is normally associated with expensive client cache consistency algorithms, but for our purposes here, the issue is managing the ordering, grouping/atomicity, and failure recovery of changes on data servers, not updating/invalidating the contents of client caches. It is certainly feasible to address out-of-band concurrent writing to data servers without addressing client cache consistency, if we so choose.

I believe three folks with experience with different existing file systems referred to databases as the use case for needing concurrent write.
I believe out-of-band concurrent write is an important use case to call out carefully, because an ambitious implementation of it could lead to a lot of state-maintaining messaging.

Some have said that allowing multiple clients to hold the same lock is a current need in NFSv4, and that a solution to this can provide the infrastructure for concurrent delegation of layout maps for read and overwrite (when growing the size of the file is not needed). This seems like a good operations discussion topic.

----------------------------------------
[1.6 Map revocation]: can/must the NFS server be able to revoke a client's use of a map, and enforce no future use (fence off the map)?

NFSv4 delegations allow a broken or malicious client no additional power to damage the stored file system because state changes must go through the server. But a delegated layout map that is held and used by a broken or malicious client after the delegation has been recalled could damage the stored file system in a way that the server, by not being on the data path, has no obvious way to protect against. So there has been a call for the ability for the server to fence out a client or enforce the revocation of a client's access to a specific file or filesystem.

At first glance all three data server technologies, blocks, objects and files, have some solution (blocks: lun masking/acls or SAN zoning; objects: capability revocation, key replacement; files: component file acls, volatile file handles). The scope and cost of each of these mechanisms may be dramatically different. Some would say that this is going to end up being a differentiating property of the choice of underlying data server. For example, many would say that in systems that allow out-of-band block access, the client machines must be trustworthy to respect the delegation recall message (and lease timeouts). Others would object to this weakening of the NFS server integrity. I also see this as a requirements argument.
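To make the fencing requirement in 1.6 a little more concrete, here is a minimal Python sketch of version-based revocation, loosely modelled on the "capability revocation" option listed above for objects. It is an illustration under assumptions, not the OSD mechanism or any proposed pNFS operation: the classes DataServer and MetadataServer, their methods, and the single-holder policy are all hypothetical. The point it shows is that the data server, not the client, enforces the revocation, so a client that ignores a recall cannot keep writing.

```python
class DataServer:
    """Hypothetical data server that checks a capability version on every write."""

    def __init__(self):
        self.cap_version = {}   # object id -> version currently accepted
        self.blocks = {}        # object id -> last payload written

    def set_cap_version(self, obj, version):
        self.cap_version[obj] = version

    def write(self, obj, version, payload):
        # Reject any request carrying a version the metadata server has retired.
        if self.cap_version.get(obj) != version:
            raise PermissionError(f"stale capability for {obj}")
        self.blocks[obj] = payload


class MetadataServer:
    """Hypothetical metadata server that grants and revokes access (one holder at a time)."""

    def __init__(self, data_server):
        self.data_server = data_server
        self.latest = {}        # object id -> most recent version issued

    def _bump(self, obj):
        version = self.latest.get(obj, 0) + 1
        self.latest[obj] = version
        self.data_server.set_cap_version(obj, version)
        return version

    def grant(self, obj):
        # Issuing a new version also invalidates any earlier grant in this sketch.
        return self._bump(obj)

    def revoke(self, obj):
        # Fence off outstanding capabilities without needing the client's cooperation.
        self._bump(obj)
```

A block-based analogue would have to rely on LUN masking or zoning instead, which is where the differences in scope and cost mentioned above come from.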
----------------------------------------
[1.7 Separability]: Independence vs co-dependence of layout metadata access and NFSv4

On one hand, simple "an address per block/object/file" maps could be represented as an array of NFSv4 attributes, manipulated using existing NFSv4 attribute accessing commands, so as to reduce the amount of change to NFSv4.

On the other hand, particularly for block maps of large files composed of extents, simple array indexing may be cumbersome and much bulkier than necessary.

And also on the other hand, some suggest that it is desirable for the metadata access protocol to be separate from NFSv4 attribute access, so that the same metadata access protocol might be reusable under other file services.

I think this topic would benefit from proposed metadata formats, particularly the SBC (block) maps.

----------------------------------------
[1.8 NTFS application semantics]: applications coded to NTFS semantics are different from those coded to POSIX and UNIX semantics

NFS originated as an exported file system, whose semantics were defined by the underlying local filesystem on the file server. But since that local filesystem has almost always been UNIX or UNIX like, customers have come to think of NFS semantics as a well defined thing, not far from UNIX semantics (but with a customary list of POSIX exceptions).

The semantics NTFS presents to applications using its storage are different in significant ways. Some of us see an evolution to better support for clients trying to support NTFS well to be very desirable. Others see chasing this as more than the NFS group as a whole is likely to bite off.

This, and any other issues about wire protocol support for important semantics needed by different application file system interfaces (middleware exploited API extensions in databases or parallel programming systems such as MPI-IO) are also requirements topics.

End summary 1.
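The size argument in 1.7 (one address per block versus extents) can be illustrated with a small Python sketch. The Extent fields and the helper names per_block_map and lookup are hypothetical and do not correspond to any proposed SBC map format; the sketch only compares how many entries each representation needs for the same file, assuming the extents cover the file contiguously from block 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file_block: int     # first file block covered by this extent
    device_id: int      # hypothetical identifier of the storage device
    device_block: int   # first block on that device
    length: int         # number of contiguous blocks


def per_block_map(extents):
    """Expand extents into one (device, block) entry per file block.

    Assumes the extents cover the file contiguously starting at block 0.
    """
    table = []
    for e in sorted(extents, key=lambda e: e.file_block):
        for i in range(e.length):
            table.append((e.device_id, e.device_block + i))
    return table


def lookup(extents, file_block):
    """Resolve a file block directly from the extent list."""
    for e in extents:
        if e.file_block <= file_block < e.file_block + e.length:
            return (e.device_id, e.device_block + (file_block - e.file_block))
    raise KeyError(file_block)


# Two extents describe 3072 blocks; the flat array needs 3072 entries.
extents = [Extent(0, 1, 5000, 1024), Extent(1024, 2, 0, 2048)]
flat = per_block_map(extents)
```

Both forms answer the same lookups, but the flat array grows with file size while the extent list grows only with fragmentation, which is why an attribute-array encoding may be bulky for large block-mapped files.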
From bhalevy@panasas.com Wed Dec 17 22:13:53 2003 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 96391 invoked from network); 18 Dec 2003 06:13:52 -0000 Received: from unknown (66.218.66.217) by m5.grp.scd.yahoo.com with QMQP; 18 Dec 2003 06:13:52 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 18 Dec 2003 06:13:52 -0000 Received: from yang ([172.17.19.55]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1NF3; Thu, 18 Dec 2003 01:13:50 -0500 To: Cc: Date: Thu, 18 Dec 2003 01:13:41 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal In-reply-to: X-eGroups-Remote-IP: 65.194.124.178 From: "Benny Halevy" Subject: RE: [pnfs-reqs] Re: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy Garth, In case you guys want to broaden the problem statement... There are a couple of arguments I believe may be appealing to the IETF: 1. Interoperability. Several of the existing non monolithic file systems mentioned use proprietary protocols carried over Internet protocols. Standardizing their access protocols within NFS will allow interoperability between heterogeneous client hosts and heterogeneous server systems. The standardized client argument may fall into the interoperability category from the IETF point of view. 2. Taking advantage of IP SANs With the introduction of iSCSI, block and object based storage systems become accessible over IP based networks. 
NEPS takes advantage of this paradigm by allowing clients direct (yet moderated and secure) access to networked storage and therefore it enhances the value proposition of IP SANs.

Benny

> -----Original Message-----
> From: Garth Gibson [mailto:garth@Panasas.Com]
> Sent: Thursday, December 18, 2003 00:34
> To: pnfs-reqs@yahoogroups.com
> Cc: Garth Gibson
> Subject: [pnfs-reqs] Re: NEPS-REQS: getting started
>
>
> Tyce,
>
> [I've emailed this through the Yahoo group Benny set up,
> http://groups.yahoo.com/group/pnfs-reqs. I will forward it to the
> folks that have not yet joined this Yahoo group after I get it sent
> back to me :-)]
>
> The RDDP problem statement is similar and dissimilar to what we are
> doing. It is similar in that it is about higher performance, which
> always turns out to be cost-performance. It is dissimilar in that it
> was fighting an uphill battle to get RDMA into the IETF, while we are
> looking at no preconceived support or opposition in the IETF (that I am
> aware of). And it is dissimilar in that what we are proposing helps in
> the manageability of federated systems, which is not really a
> performance issue.
>
> I followed the RDDP example closely because it was easy -- our
> arguments on strictly bandwidth are at least as strong, in my opinion.
> And because I am not certain how to predict the IETF management's
> reaction to a manageability argument. And the standardized client code
> argument, although very important to some of us, seemed outside my notion
> of the IETF scope.
>
> Perhaps those with more experience selling ideas to the IETF could
> educate us? Should we focus on a small number of the most easily
> demonstrated problems or fill the problem statement out with all the
> problems we can contribute to solving?
> > garth > > > On Monday, December 15, 2003, at 01:49 PM, Tyce McLarty wrote: > > I've been wondering how important it is too cast the "problem" as one > > of cost, rather than as the ability to do things that cannot be done > > today with added benefits in cost reduction. > > > > I liked the list that Garth put up at the workshop: > > > > Scalable bandwidth > > Scalable capacity > > Load balancing > > capacity balancing > > > > plus the big winner - a standardized client. > > > > So the Introduction would be basically two paragraphs with (in either > > order): > > 1. proposal to extend NFSv4 to allow parallel out-of-band client > > access to data separate from metadata operations. > > 2. why it's important to do using the reasons outlined above. > > > > My question is - How close do we need to model the RDMA problem > > statement? Is cost the best/only justification or can we use new & > > needed capability plus value added? > > > > I think Gary has slanted his additions this direction, but seems like > > we should all agree on some basic principles before we get too deep in > > word-smithing. > > > > Thanks, > > Tyce > > > > At 10:02 PM 12/12/2003 -0700, Gary Grider wrote: > > > >> I decided to toss out a very quick and dirty draft with a lot of > >> parts missing. > >> Nothing sacred, just thoughts as they occurred to me partially > >> organized. > >> > >> I put it in Word so I could get formatting, TOC, etc. > >> > >> I am attaching a Word and PDF. > >> > >> I would be happy to put this on a web site for us if you want. I > >> also would be happy to > >> centralize the edits and re-post it on the web etc. > >> > >> Thanks > >> Gary > > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > > > Yahoo! Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > >

From garth@panasas.com Thu Dec 18 14:37:55 2003
Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 89807 invoked from network); 18 Dec 2003 22:37:54 -0000 Received: from unknown (66.218.66.216) by m13.grp.scd.yahoo.com with QMQP; 18 Dec 2003 22:37:54 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 18 Dec 2003 22:37:54 -0000 Received: from panasas.com ([172.17.133.207]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY1RMV; Thu, 18 Dec 2003 17:37:52 -0500 Date: Thu, 18 Dec 2003 17:37:50 -0500 Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v553) To: pNFS Operations , pNFS Requirements Content-Transfer-Encoding: 7bit In-Reply-To: Message-Id: X-Mailer: Apple Mail (2.553) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Thanks Dave. I agree. Let's refine the proxying issues: Legacy, strict, functional and recovery proxying.

[1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the full NFS-v4.0 or NFS-v4.1 protocol.

I think Dave has given the case for this strongly. I do not see any case against this.

-------------------------------------------
[1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to execute exactly the wire packet that an NFS-v4.x client might have sent to an SBC/OSD/NFS data server?

This captures the notion that a metadata server must also be a store-and-forward proxy for every data server it manages. It requires that NFS-v4.x servers implement SCSI SBC over FC, if their data servers implement it; and the same for objects and files. This only makes sense to me for NFS data servers.
And it is not what I intended in my prior summary, although it is a relevant question. I would say that pNFS requirements should not require Strict Proxying.

-------------------------------------------
[1.1.2 Functional proxying]: a file transformation achievable by an NFS-v4.x client using a set of data server operations must be equivalently achievable using a (probably different) set of NFS-v4.x server operations

This is the topic I intended to address in the last email. I believe Dave is arguing that even with metadata servers that do not have access to their data servers, the vendor of such a metadata server can construct a proprietary protocol for the metadata server to (strict) proxy data server accesses through clients that do have data server access. I am not comfortable making up a counter to this, so I exhort those that want a metadata server without data server access to speak up if they disagree.

> On one hand, some suggest that a set of out-of-band clients should not
> have to also have a data path through the NFSv4 metadata server. One
> reason is that customers may not tolerate the large variability in
> performance between out-of-band (when the going is good) and in-band
> (when the server chooses not to grant or to take away a delegation)
> accesses. Another reason, and I paraphrase someone else here, is that
> it is possible to construct out-of-band metadata servers that do not
> have access to the data servers except through the clients -- I
> encourage the source of this scenario to replace my paraphrasing with
> a correct use case, because I find it odd to design for file servers
> that do not have access to the data servers.
>
> On the other hand, others have suggested that any access or work that
> a client can do out-of-band should be possible with one or more
> commands applied to the metadata server's data path.
This has been > proposed for coping with recalled delegations, including concurrent > writing by multiple clients; retry after client access errors, > provided adequate idempotency of out-of-band operations; and many > alternative implementations of out-of-band clients, including legacy > clients that use out-of-band never or rarely. > > I think this is a topic that should be argued one way or the other in > the requirements document. Use cases and examples in other systems > would be best. ------------------------------------------- [1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x client using a set of data server operations, but interrupted before completion, must be equivalently completable using a (probably different) set of NFS-v4.x server operations Some have suggested that having this property will greatly simplify the amount of spec that is devoted to out-of-band error recovery. Others have commented that a simple way to achieve this would be to require that all operations on data servers should be idempotent. ------------------------------------------- garth On Thursday, December 18, 2003, at 12:21 PM, Noveck, Dave wrote: > Good summary. > > I want to address the "proxying" issue. > >> [1.1 Proxying]: Operations/work that can only be done out-of-band vs >> alternative access through the NFSv4 server for all operations/work > > If you are talking about operations in the extension (let's call it > NFS-v4.x), that are not in the previous minor version (let's assume > that is nfs-v4.1), then you have a choice of whether these are > supported > for access through the server, or only for access by the client with > the > data server. Let's call this the issue of proxying in the strict > sense. > > There is another issue that people are calling "proxying" but is really > logically distinct. That is the issue of access by the previous minor > version, e.g. nfs-v4.0 or nfs-v4.1. 
Those versions have no concept of > separate data servers and they need to be able to work. End of story. > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > have a minor version without proxying. You don't have a minor version > at all. I believe the working group is never going to accept that. > Even if I'm wrong and you can get the working group to accept that, > it is going to be very contentious and thus take up a lot of time. > Anybody, who really wants to go down this path should seriously > consider > the trade-off between supporting something they find objectionable and > getting a standard a lot later, if at all. > >> On one hand, some suggest that a set of out-of-band clients should not >> have to also have a data path through the NFSv4 metadata server. One >> reason is that customers may not tolerate the large variability in >> performance between out-of-band (when the going is good) and in-band >> (when the server chooses not to grant or to take away a delegation) >> accesses. > > Then such customers will use clients that access things out-of-band > whenever possible, and servers that never refuse to give out layout > delegations. You have a number of quality-of-implementations issues > for v4.x clients and servers. If a particular client only supports > access via v4.0, then performance will suck, and the working group > will understand that, but it won't accept not being able to use > v4.0 at all. The customer is going to be motivated to upgrade his > clients for those that need high-performance access, but he may be > OK with some clients using v4.0 for a long time, depending on the > particular performance those clients need. (And some will want v2/v3 > access but that is a matter that the working group has no say about). 
> >> Another reason, and I paraphrase someone else here, is that >> it is possible to construct out-of-band metadata servers that do not >> have access to the data servers except through the clients -- I >> encourage the source of this scenario to replace my paraphrasing with >> a >> correct use case, because I find it odd to design for file servers >> that >> do not have access to the data servers. > > So let's grant that it is possible (and we'll pass over the issue of > whether it is desirable, and in fact so desirable that one is willing > to > not get a standard and or get it much later). > > So we have a metadata server and it, for whatever reason, does not have > access to the data servers. However, by hypothesis, there are machines > (e.g. clients), that can communicate with both. So, if one has such an > architecture, then one can take such a machine, give it a > communication path > to the meta-data server and the data server and have the meta-data > server > transfer v4.0 READ requests to it, let it read the data from the data > server and send it back to the meta-data server who send it back to the > original requestor. Is that a very good solution? No. Is it likely > to be performant? No. Will it satisfy any particular customer? I > don't > know and that is the implementer's business decision. Will it satisfy > the hypothetical customer who doesn't care about v4.0 access? Clearly. > Will it satisfy the v4 working group? Yes, because they are not in the > business of telling you how performant v4.0 access has got to be. > >> On the other hand, others have suggested that any access or work that >> a >> client can do out-of-band should be possible with one or more commands >> applied to the metadata server's data path. 
This has been proposed >> for >> coping with recalled delegations, including concurrent writing by >> multiple clients; retry after client access errors, provided adequate >> idempotency of out-of-band operations; and many alternative >> implementations of out-of-band clients, including legacy clients that >> use out-of-band never or rarely. > > This effort is going to take a while, but if we manage it correctly, it > is not going to take so long that v3 clients are going to be rare > things, > and they have to be supported. But v3 clients are not an issue for the > working group. V4.0 clients are and they will be around and you will > have to support them, and I believe the working group is not going to > be disposed to cut you a lot of slack on this issue (and I don't see > why it should). > >> I think this is a topic that should be argued one way or the other in >> the requirements document. Use cases and examples in other systems >> would be best. > > I think the requirement should be that this work should be done as a > set of extensions to nfs-v4 delivered as a v4 minor version. If there > is some feature/requirement that conflicts with that model (and it is a > pretty flexible one), then you have to think long and hard before > deciding > that that requirement is more important than this basic deivery > vehicle, > because it seems to me that it is, in almost all respects, the ideal > way > to make this sort of technology available for widespread use. > > > > > > > To unsubscribe from this group, send an email to: > pnfs-ops-unsubscribe@yahoogroups.com > > > > Yahoo! Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-ops/ > > To unsubscribe from this group, send an email to: > pnfs-ops-unsubscribe@yahoogroups.com > > Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > From julian_satran@il.ibm.com Mon Dec 22 02:26:02 2003 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 86299 invoked from network); 22 Dec 2003 10:26:01 -0000 Received: from unknown (66.218.66.218) by m11.grp.scd.yahoo.com with QMQP; 22 Dec 2003 10:26:01 -0000 Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta3.grp.scd.yahoo.com with SMTP; 22 Dec 2003 10:26:00 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180] (may be forged)) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id hBMAPxn0031456; Mon, 22 Dec 2003 10:25:59 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id hBMAPwG4256428; Mon, 22 Dec 2003 11:25:58 +0100 In-Reply-To: To: pnfs-ops@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 22 Dec 2003 12:25:57 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 22/12/2003 12:25:58, Serialize complete at 22/12/2003 12:25:58 Content-Type: multipart/alternative; boundary="=_alternative 00394B9DC2256E04_=" X-eGroups-Remote-IP: 195.212.29.152 From: Julian Satran Subject: RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=64714603 Since I raised the issue of the metadata server not having access to all it's data servers (or at least not with adequate bandwidth) I feel compelled to say that Dave's arguments about supporting 4.0 are compelling enough to make it mandatory. The open issue is if it is legal for a "compliant server" to have serving data disabled by a local administrative function (the old "must implement but may use"). 
Otherwise an organization that wants to discourage use of data serving through the metadata server has very little it can do to enforce policy in a way that will not affect other clients (it may do serve poorly but this still affects other clients). Julo
From bhalevy@panasas.com Mon Dec 22 11:42:01 2003 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 94145 invoked from network); 22 Dec 2003 19:41:59 -0000 Received: from unknown (66.218.66.166) by m6.grp.scd.yahoo.com with QMQP; 22 Dec 2003 19:41:59 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 22 Dec 2003 19:41:59 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Mon, 22 Dec 2003 14:41:57 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38733@PIKES.panasas.com> To: "'julian_satran@il.ibm.com'" , "'pnfs-ops@yahoogroups.com'" Cc: "'pnfs-reqs@yahoogroups.com'" Date: Mon, 22 Dec 2003 14:41:53 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-ops] delegation arguments summary X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy > > * layout delegation revocation (and enforcement of) > > This issue is orthogonal. We dicussed volatile file handles, OSD > > capabilities, and SAN LUN mapping techniques. > > > > Almost orthogonal.
There is a subtle problem of sharing layout delegations if one of the clients is doing writes or appends.

This falls under CW (concurrent write) sharing since there is one or more writers. By saying "this issue is orthogonal" I meant that the mechanism for revoking the layout delegation is orthogonal to whether we need a complete new set of delegations or extend the current model.

I agree that when the layout changes due to writes, appends, or for any other reason the server has to recall layout delegations, at least from those clients that requested layout for a region that's about to be changed. Hopefully, all clients behave nicely and their delegations do not have to be revoked. You want to revoke the layout delegation from unresponsive clients since allowing them to use the stale layout may end up with data corruption.

Speaking of append, I always thought it'd be really nice to have an NFS APPEND operation... This seems like something we can propose right away on nfsv4@ietf.org. How do people on this list feel about that?

A use case I encountered is a customer that uses a shared file as a log and has multiple nodes in the cluster appending to that file with some coordination (right now, NFSv3 + NLM). They don't care about ordering of the appended records and they even accept records written more than once to the file, but they do care about the consistency of each record so writers can't just silently overwrite each other.

> The issue is furthermore complicated by the "sparse" layout that we all want to support (do we?)

Can you please turn the details knob on "sparse" layout and maybe give a concrete example where this layout makes the proposed model fall short?

> > layout delegation:
> > - returned on READ_IND, WRITE_IND, LAYOUT_DELEG_ASK
> >
> > Covers only layout (aggregation header, map, handles/caps).
> > Optional, recallable, revocable.
> > Assures the client that the layout information it has will not change.
> > But the layout information may change even in the most trivial single writer case and definitely in RW cases. Correct, when the layout is about to be changed (a writer calls COMMIT_IND) or when there is a write-write conflict (two clients call WRITE_IND for overlapping regions) some or all layout delegations must be recalled. > > WRITE yes client can safely cache read and write data, > > serve opens, and locks locally and can perform > > out-of-band or server reads and writes. > At least this requires mapping updates for block storage. > For those souls that want strict local-FS semantics (UNIX) cache and map invalidations can be a side-effect of the byte-range locking mechanism. This sounds like something that falls into the distributed cache coherency realm - meaning multiple clients have a CW data delegation and a layout delegation. My assumption was that in this case the logical block map changes rarely when the clients are writing in place, otherwise they should fall back to writing through the server. Having an efficient distributed cache coherency mechanism in NFS seems to me like a stretch but it's worth a discussion to see if block based SAN filesystems can or can't live without it. 
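The recall rule described in this message (a write to a region forces recall of layout delegations held by other clients for overlapping regions) can be illustrated with a minimal sketch. It assumes byte-range layout delegations tracked per client; the names `LayoutDelegation`, `overlaps` and `delegations_to_recall` are hypothetical and do not come from the thread or from any NFSv4 draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutDelegation:
    """Hypothetical record of a layout delegation held by one client."""
    client: str
    offset: int  # first byte covered by the delegation
    length: int  # number of bytes covered


def overlaps(a_off: int, a_len: int, b_off: int, b_len: int) -> bool:
    """True when the half-open byte ranges [a_off, a_off + a_len) and
    [b_off, b_off + b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def delegations_to_recall(held, writer, offset, length):
    """Return the delegations of clients other than `writer` that overlap
    the region about to change (assumed recall policy, not a spec)."""
    return [
        d for d in held
        if d.client != writer and overlaps(d.offset, d.length, offset, length)
    ]


held = [
    LayoutDelegation("client-a", 0, 4096),
    LayoutDelegation("client-b", 4096, 4096),
    LayoutDelegation("client-c", 2048, 4096),
]

# client-a is about to change bytes [0, 4096): only client-c overlaps.
recalled = delegations_to_recall(held, "client-a", 0, 4096)
print([d.client for d in recalled])
```

In this sketch, writers that stay in non-overlapping regions never trigger a recall, which is the case the thread treats as the one that must scale.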
Benny From ggrider@lanl.gov Mon Dec 22 11:53:58 2003 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 3731 invoked from network); 22 Dec 2003 19:53:57 -0000 Received: from unknown (66.218.66.166) by m11.grp.scd.yahoo.com with QMQP; 22 Dec 2003 19:53:57 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta5.grp.scd.yahoo.com with SMTP; 22 Dec 2003 19:53:57 -0000 Received: from mailrelay3.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBMJrufK001673; Mon, 22 Dec 2003 12:53:56 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay3.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBMJrtIt031106; Mon, 22 Dec 2003 12:53:55 -0700 Received: from cthulu.lanl.gov (vpn-client-189.lanl.gov [128.165.253.189]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBMJrqFR016230; Mon, 22 Dec 2003 12:53:53 -0700 Message-Id: <5.2.0.9.2.20031222125146.018b3cc0@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Mon, 22 Dec 2003 12:53:51 -0700 To: pnfs-reqs@yahoogroups.com, "'julian_satran@il.ibm.com'" , "'pnfs-ops@yahoogroups.com'" Cc: "'pnfs-reqs@yahoogroups.com'" In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D38733@PIKES.panasas.com > Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_15088946==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] RE: [pnfs-ops] delegation arguments summary X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs At 02:41 PM 12/22/2003 -0500, Halevy, Benny wrote: > > > * layout delegation revocation (and enforcement of) > > > This issue is orthogonal. We dicussed volatile file handles, OSD > > > capabilities, and SAN LUN mapping techniques. > > > > > > > Almost orthogonal. 
There is a subtle problem of sharing layout delegations if one of clientts is doing writes or appends. > > This falls under CW (concurrent write) sharing since there is one or more writers. > By saying "this issue is orthogonal" I meant that the mechanism for revoking the > layout delegation is orthogonal to whether we need a complete new set of > delegations or extend the current model. > > I agree that when the layout changes due to writes, appends, or for any other > reason the server has to recall layout delegations, at least from those clients > that requested layout for region that's about to be the changed. Hopefully, > all clients behave nicely and their delegations do not have to be revoked. > You want to revoke the layout delegation from unresponsive clients since allowing > them to use the stale layout may end up with data corruption. > > Speaking of append, I always thought it'd be really nice to have an NFS APPEND > operation... This seems like something we can propose right away on nfsv4@ietf.org > How does people on this list feel about that? > > A use case I encountered is a customer that use a shared file as a log and have > multiple nodes in the cluster appending to that file with some coordination > (right now, NFSv3 + NLM). They don't care about ordering of the appended records > and they even accept records written more than once to the file, but they do care > about the consistency of each record so writers can't just silently overwrite > each other. > > > The issue is furthermore complicated by the "sparse" layout that we all want to support (do we?) > > Can you please turn the details knob on "sparse" layout and maybe give a > concrete example where this layout make the proposed model fall short? > > > > layout delegation: > > > - returned on READ_IND, WRITE_IND, LAYOUT_DELEG_ASK > > > > > > Covers only layout (aggregation header, map, handles/caps). > > > Optional, recallable, revocable. 
> > > Assures the client that the layout information it has will not change. > > > > But the layout information may change even in the most trivial single writer case and definitely in RW cases. > > Correct, when the layout is about to be changed (a writer calls COMMIT_IND) > or when there is a write-write conflict (two clients call WRITE_IND for > overlapping regions) some or all layout delegations must be recalled. > > > > WRITE yes client can safely cache read and write data, > > > serve opens, and locks locally and can perform > > > out-of-band or server reads and writes. > > At least this requires mapping updates for block storage. > > For those souls that want strict local-FS semantics (UNIX) cache and map invalidations can be a side-effect of the byte-range locking mechanism. > > This sounds like something that falls into the distributed cache coherency > realm - meaning multiple clients have a CW data delegation and a layout delegation. > My assumption was that in this case the logical block map changes > rarely when the clients are writing in place, otherwise they should fall back to > writing through the server. As long as there is a way to get concurrent write to scale with reasonable behavior, like non overlapped regions and any other reasonable promises. I suppose we need to pin down what those reasonable promises are. Gary > Having an efficient distributed cache coherency > mechanism in NFS seems to me like a stretch but it's worth a discussion to see > if block based SAN filesystems can or can't live without it. > > Benny > > > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > > > > Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. 
From black_david@emc.com Tue Dec 23 12:33:03 2003 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 82796 invoked from network); 23 Dec 2003 20:33:03 -0000 Received: from unknown (66.218.66.216) by m20.grp.scd.yahoo.com with QMQP; 23 Dec 2003 20:33:03 -0000 Received: from unknown (HELO mxic2.corp.emc.com) (128.221.12.9) by mta1.grp.scd.yahoo.com with SMTP; 23 Dec 2003 20:33:02 -0000 Received: by mxic2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Tue, 23 Dec 2003 15:33:02 -0500 Message-ID: To: pnfs-reqs@yahoogroups.com Date: Tue, 23 Dec 2003 15:32:59 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/mixed; boundary="----_=_NextPart_000_01C3C993.F4B068F2" X-eGroups-Remote-IP: 128.221.12.9 From: black_david@emc.com Subject: RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237

Garth Gibson wrote:

> > The RDDP problem statement is similar and dissimilar to what we are
> > doing. It is similar in that it is about higher performance, which
> > always turns out to be cost-performance. It is dissimilar in that it
> > was fighting an uphill battle to get RDMA into the IETF, while we are
> > looking at no preconceived support or opposition in the IETF (that I
> > am aware of). And it is dissimilar in that what we are proposing
> > helps in the manageability of federated systems, which is not really a
> > performance issue.
> >
> > I followed the RDDP example closely because it was easy -- our
> > arguments on strictly bandwidth are at least as strong, in my opinion.
> > And because I am not certain how to predict the IETF management's
> > reaction to a manageability argument. And the standardized client
> > code argument, although very important to some of us, seemed outside my
> > notion of the IETF scope.
> >
> > Perhaps those with more experience selling ideas to the IETF could
> > educate us?
Should we focus on a small number of the most easily > > demonstrated problems or fill the problem statement out with all the > > problems we can contribute to solving? Having been heavily involved in getting both IPS and RDDP work underway in the IETF, I have a few observations: - A problem statement draft is a good thing to have, but the folks in charge of the IETF are looking for a concise summary of what the problem is, how to go about solving it, and **why** the IETF should solve it. The latter is of particular importance, as I'll explain shortly. - I've attached a slide deck that I used for RDDP at the Spring 2002 IETF BOF on this topic. This sort of "elevator pitch" style coverage of the topics is needed in addition to the more in-depth academic approach that is in the RDDP problem statement. - Goals and battles need to be chosen carefully. One of the things that delayed RDDP work is that the RDDP proponents were absolutely convinced that they needed to change TCP, and hence decided to go to battle with the IETF Transport community which was equally convinced that TCP should not be changed. In 20/20 hindsight, this was a mistake, as the IETF Transport community turned out to be correct that TCP does not require normative changes for RDDP. - Nonetheless, there is somewhat of an "uphill battle" to be engaged, as Beepy and/or Spencer described in Ann Arbor - the IETF has grown to a potentially unwieldy size, and as a consequence has developed a healthy institutional bias against new work. As a result, it is necessary to have good reasons not only for why work should be done, but also why it should be done in the IETF. The fact that we want to extend an existing IETF protocol (NFSv4) in a way that can take advantage of another (iSCSI) provides at least two reasons. Beyond this, there is value in drawing on the IETF's network expertise in areas such as security. 
- A draft WG statement/scope of work is very important at an early stage, including not only what we want to do, but what we do *not* want to do. I tend to view the latter as more important, as a shared view of what will not be worked on is a significant sign that a technical community has coalesced around a common effort and goals. For example, there are fairly strong statements about work that is out of scope in both the IPS and RDDP charters, and as a WG chair, I've found those statements useful from time to time ... I hope this helps, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- Attachment (not stored) ROI-Problem-Scenario-0302.ppt Type: application/vnd.ms-powerpoint From black_david@emc.com Tue Dec 23 13:34:43 2003 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 93561 invoked from network); 23 Dec 2003 21:34:41 -0000 Received: from unknown (66.218.66.218) by m2.grp.scd.yahoo.com with QMQP; 23 Dec 2003 21:34:41 -0000 Received: from unknown (HELO mxic2.corp.emc.com) (128.221.12.9) by mta3.grp.scd.yahoo.com with SMTP; 23 Dec 2003 21:34:41 -0000 Received: by mxic2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Tue, 23 Dec 2003 16:34:40 -0500 Message-ID: To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Date: Tue, 23 Dec 2003 16:34:32 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.12.9 From: black_david@emc.com Subject: Re: pNFS Discussion Summary 1: Caching and Delegations X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 I've split this commentary on Garth's issues into two categories. This is about caching, delegations, and layout info.
> > [1.2 Cache consistency]: NFSv4 delegations are not about client cache > > consistency; does out-of-band access require stronger cache > > consistency than NFSv4 provides With a little care in defining the protocol extensions, this issue can be left to server implementers, unless one wants to take the (silly, IMHO) position that the protocol should be incapable of providing stronger cache consistency. HighRoad uses the same FMP protocol to provide both NFS-style close-to-open consistency for NFS clients and the stronger forms of consistency required by CIFS - as long as the server knows what clients have which access rights to what blocks, cache consistency strength comes down to server implementation decisions about what outstanding access rights conflict with a new request. We've actually built server prototypes that provide stronger consistency for NFS without change to either the FMP protocol or clients, but the shipped product only provides NFS-style consistency for NFS. > > [1.3 Delegation promotion & reacquisition]: must/should NFSv4 offer > > mechanisms for clients to possess a delegations more than once per open > > > > Delegations in NFSv4 are new, and came with significant concern about > > lots of complexity for not much performance, as they may do as little > > as avoid the client waiting for one round trip to the server on open. > > So, as described above with respect to cache consistency, the > > limitations on delegations can mean great difficulties for clients > > having performance requirements calling for out-of-band access mostly, > > or exclusively. Yes, and this is a strong reason for separating "layout" delegations from the existing "data" delegations, IMHO. 
Consider a web or video server that is caching file opens for performance reasons - if updating the content underneath the server makes it impossible to get the direct access ("layout") delegations back, the result is that one has to shut down and restart all the servers after the content update in order to restore performance. The sysadmin responsible for this annoying work will want to tar-and-feather the system designers who made it necessary (that would be us if we get this wrong ...). > > [1.4 Layout delegations]: can/should layout metadata "ride" on NFSv4 > > delegations or are new "layout" delegations needed New "layout" delegations are needed for clean separation of functionality, and so that "layout" delegations can be designed for direct access requirements. See [1.3] above. > > [1.5 Concurrent write]: write delegations now are held by exactly one > > client, if any; should/must NFS support multiple clients holding > > concurrent layout delegations. I understand the value of this to the self-coordinating HPC applications, but would like to see this functionality specified (assuming it is specified) as a cleanly separable option, as I think the desire to self-coordinate a shared write delegation will be limited to a small number of application spaces, like HPC. I also note Gary's comment that it's sufficient for parallel write to work in the non-overlapping case, which does not require any new concurrent write delegation as long as each client can hold an exclusive write delegation for its range. Thanks, --David ---------------------------------------------------- David L. 
Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From black_david@emc.com Tue Dec 23 13:35:57 2003 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 44163 invoked from network); 23 Dec 2003 21:35:56 -0000 Received: from unknown (66.218.66.216) by m17.grp.scd.yahoo.com with QMQP; 23 Dec 2003 21:35:56 -0000 Received: from unknown (HELO mxic2.corp.emc.com) (128.221.12.9) by mta1.grp.scd.yahoo.com with SMTP; 23 Dec 2003 21:35:55 -0000 Received: by mxic2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Tue, 23 Dec 2003 16:35:54 -0500 Message-ID: To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Date: Tue, 23 Dec 2003 16:35:52 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.12.9 From: black_david@emc.com Subject: Re: pNFS Discussion Summary 1: Functionality X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 I've split this commentary on Garth's issues into two categories. This is about a couple of topics that I would classify as desirable, but not mandatory functionality. > > [1.6 Map revocation]: can/must the NFS server be able to revoke a > > client's use of a map, and enforce no future use (fence off the map) [... snip ...] > > Some would say that this is going to end up being a differentiating > > property of the choice of underlying data server. For example, many > > would say that in systems that allow out-of-band block access, the > > client machines must be trustworthy to respect the delegation recall > > message (and lease timeouts). Others would object to this weakening > > of the NFS server integrity.
I tend to take the former position, as if one cannot fence off client access, not allowing access to untrustworthy clients becomes a fallback. In the block world, while mechanisms exist to fence off access, standard means of invoking them are somewhat immature. > > [1.8 NTFS application semantics]: applications coded to NTFS semantics > > are different from those coded to POSIX and UNIX semantics IMHO, this is an orthogonal tarpit we should stay out of. I strongly believe that trying to extend NFSv4 so it can be just as good as CIFS for applications coded to Windows APIs should be someone else's problem. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From black_david@emc.com Tue Dec 23 14:00:01 2003 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 67601 invoked from network); 23 Dec 2003 21:59:57 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 23 Dec 2003 21:59:57 -0000 Received: from unknown (HELO MAHO3MSX2.corp.emc.com) (128.221.11.32) by mta4.grp.scd.yahoo.com with SMTP; 23 Dec 2003 22:00:00 -0000 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Tue, 23 Dec 2003 16:59:59 -0500 Message-ID: To: pnfs-ops@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com Date: Tue, 23 Dec 2003 16:59:55 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.11.32 From: black_david@emc.com Subject: Avoiding Delegation Recall X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 Dave Noveck writes: > I've been wondering whether we could avoid the recall in many cases in which the > layout is changing. 
I know this sounds like I've lost my mind (so what else is > new?) but hear me out. > > The idea is the layout delegation gives you the ability to rely on mapped areas > but not holes or areas past the eof, and that correspondingly converting an area > from a hole to being mapped should not necessitate recall of the layout delegation. > This would force some complexity in the case in which we're about to read from one > of the data servers and found something unmapped but it would mean that layout > delegations would not need to be recalled in many common cases. That depends on the consistency model. For NFS-level consistency, I believe that returning zeroes for a hole that another client has filled is allowed by the consistency model, but this "negative caching" behavior is not exactly common. If one wants to be able to support stronger consistency, one must be able to recall the (non-)layout delegation after the hole fill in order to force the other client to see the newly written data. Writing data that moves EOF is similar, but there are some subtleties in that EOF changes are not identical to cache consistency. I think (and hope in the case of EOF) that all of this falls under my previous comment that with a little attention to detail in specification of the protocol, we can make cache consistency solely a server implementation decision (implementer picks model, protocol can support all the interesting ones). I strongly prefer that approach because I believe consistency model debates to be an attractive tarpit ... Thanks, --David ---------------------------------------------------- David L.
Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From julian_satran@il.ibm.com Fri Dec 26 01:36:19 2003 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 67407 invoked from network); 26 Dec 2003 09:36:18 -0000 Received: from unknown (66.218.66.218) by m17.grp.scd.yahoo.com with QMQP; 26 Dec 2003 09:36:18 -0000 Received: from unknown (HELO mtagate2.de.ibm.com) (195.212.29.151) by mta3.grp.scd.yahoo.com with SMTP; 26 Dec 2003 09:36:17 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180] (may be forged)) by mtagate2.de.ibm.com (8.12.10/8.12.10) with ESMTP id hBQ9aGHf096778; Fri, 26 Dec 2003 09:36:16 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id hBQ9aF7Y254838; Fri, 26 Dec 2003 10:36:15 +0100 In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D38733@PIKES.panasas.com> To: pnfs-reqs@yahoogroups.com Cc: "'pnfs-ops@yahoogroups.com'" , "'pnfs-reqs@yahoogroups.com'" MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Fri, 26 Dec 2003 11:36:12 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 26/12/2003 11:36:15, Serialize complete at 26/12/2003 11:36:15 Content-Type: multipart/alternative; boundary="=_alternative 002D4385C2256E08_=" X-eGroups-Remote-IP: 195.212.29.151 From: Julian Satran Subject: Re: [pnfs-reqs] RE: [pnfs-ops] delegation arguments summary X-Yahoo-Group-Post: member; u=64714603 Benny and all, "Halevy, Benny" wrote on 22/12/2003 21:41:53: > > > > * layout delegation revocation (and enforcement of) > > > This issue is orthogonal.
We discussed volatile file handles, OSD > > > capabilities, and SAN LUN mapping techniques. > > > > > > > Almost orthogonal. There is a subtle problem of sharing layout > delegations if one of the clients is doing writes or appends. > > This falls under CW (concurrent write) sharing since there is one or > more writers. > By saying "this issue is orthogonal" I meant that the mechanism for revoking the > layout delegation is orthogonal to whether we need a complete new set of > delegations or extend the current model. > > I agree that when the layout changes due to writes, appends, or for any other > reason the server has to recall layout delegations, at least from those clients > that requested layout for the region that's about to be changed. Hopefully, > all clients behave nicely and their delegations do not have to be revoked. > You want to revoke the layout delegation from unresponsive clients > since allowing > them to use the stale layout may end up with data corruption. > > Speaking of append, I always thought it'd be really nice to have an NFS APPEND > operation... This seems like something we can propose right away on > nfsv4@ietf.org > How do people on this list feel about that? > I agree that supporting append is important: databases and message queuing frequently use logs, and so do many simple commercial applications. > A use case I encountered is a customer that uses a shared file as a log and has > multiple nodes in the cluster appending to that file with some coordination > (right now, NFSv3 + NLM). They don't care about ordering of the > appended records > and they even accept records written more than once to the file, but > they do care > about the consistency of each record so writers can't just silently overwrite > each other. > > > The issue is furthermore complicated by the "sparse" layout that we > all want to support (do we?)
> > Can you please turn the details knob on "sparse" layout and maybe give a > concrete example where this layout makes the proposed model fall short? > If you consider very large files, sparsely populated and used by a well-coordinated set of applications, it makes more sense to have mapping information use and caching coordinated. The longer I think about it the more it looks as if mapping and caching information are not distinct pieces of information, and we had better try to treat them as such. > > > layout delegation: > > > - returned on READ_IND, WRITE_IND, LAYOUT_DELEG_ASK > > > > > > Covers only layout (aggregation header, map, handles/caps). > > > Optional, recallable, revocable. > > > Assures the client that the layout information it has will not change. > > > > But the layout information may change even in the most trivial > single writer case and definitely in RW cases. > > Correct, when the layout is about to be changed (a writer calls COMMIT_IND) > or when there is a write-write conflict (two clients call WRITE_IND for > overlapping regions) some or all layout delegations must be recalled. > > > > WRITE yes client can safely cache read and write data, > > > serve opens, and locks locally and can perform > > > out-of-band or server reads and writes. > > At least this requires mapping updates for block storage. > > For those souls that want strict local-FS semantics (UNIX) cache > and map invalidations can be a side-effect of the byte-range locking mechanism. > > This sounds like something that falls into the distributed cache coherency > realm - meaning multiple clients have a CW data delegation and a > layout delegation. > My assumption was that in this case the logical block map changes > rarely when the clients are writing in place, otherwise they should fall back to > writing through the server.
Having an efficient distributed cache coherency > mechanism in NFS seems to me like a stretch but it's worth a discussion to see > if block based SAN filesystems can or can't live without it. > I think that if we work towards common structures for mapping and caching we might end up letting the implementer or user decide about the consistency level he wants and support all. We certainly can't afford to ignore those that require consistency beyond the close-to-open level conventionally associated with NFS, especially when there are distributed or cluster file-systems whose customers use it today (GPFS, SAN-FS). > Benny From julian_satran@il.ibm.com Fri Dec 26 01:36:27 2003 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 74476 invoked from network); 26 Dec 2003 09:36:24 -0000 Received: from unknown (66.218.66.166) by m12.grp.scd.yahoo.com with QMQP; 26 Dec 2003 09:36:24 -0000 Received: from unknown (HELO mtagate1.de.ibm.com) (195.212.29.150) by mta5.grp.scd.yahoo.com with SMTP; 26 Dec 2003 09:36:23 -0000 Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196] (may be forged)) by mtagate1.de.ibm.com (8.12.10/8.12.10) with ESMTP id hBQ9aJjB127026; Fri, 26 Dec 2003 09:36:19 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id hBQ9aHO5258208; Fri, 26 Dec 2003 10:36:18 +0100 In-Reply-To: <5.2.0.9.2.20031222125146.018b3cc0@cic-mail.lanl.gov> To: Gary Grider Cc:
"'pnfs-ops@yahoogroups.com'" , pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Fri, 26 Dec 2003 11:36:14 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 26/12/2003 11:36:18, Serialize complete at 26/12/2003 11:36:18 Content-Type: multipart/alternative; boundary="=_alternative 002DC7EBC2256E08_=" X-eGroups-Remote-IP: 195.212.29.150 From: Julian Satran Subject: Re: [pnfs-reqs] RE: [pnfs-ops] delegation arguments summary X-Yahoo-Group-Post: member; u=64714603 I agree with Gary that handling efficiently the "good-path" (e.g., concurrent writers with non-overlapping regions, or single writer with readers needing only close-to-open consistency) is essential. To me it looks as all those could be better handled if we could approach mapping and caching concurrently. Regards, Julo Gary Grider 22/12/2003 21:53 To pnfs-reqs@yahoogroups.com, Julian Satran/Haifa/IBM@IBMIL, "'pnfs-ops@yahoogroups.com'" cc "'pnfs-reqs@yahoogroups.com'" Subject Re: [pnfs-reqs] RE: [pnfs-ops] delegation arguments summary At 02:41 PM 12/22/2003 -0500, Halevy, Benny wrote: > > * layout delegation revocation (and enforcement of) > > This issue is orthogonal. We dicussed volatile file handles, OSD > > capabilities, and SAN LUN mapping techniques. > > > > Almost orthogonal. There is a subtle problem of sharing layout delegations if one of clientts is doing writes or appends. This falls under CW (concurrent write) sharing since there is one or more writers. By saying "this issue is orthogonal" I meant that the mechanism for revoking the layout delegation is orthogonal to whether we need a complete new set of delegations or extend the current model. I agree that when the layout changes due to writes, appends, or for any other reason the server has to recall layout delegations, at least from those clients that requested layout for region that's about to be the changed. 
Hopefully, all clients behave nicely and their delegations do not have to be revoked. You want to revoke the layout delegation from unresponsive clients since allowing them to use the stale layout may end up with data corruption. Speaking of append, I always thought it'd be really nice to have an NFS APPEND operation... This seems like something we can propose right away on nfsv4@ietf.org How do people on this list feel about that? A use case I encountered is a customer that uses a shared file as a log and has multiple nodes in the cluster appending to that file with some coordination (right now, NFSv3 + NLM). They don't care about ordering of the appended records and they even accept records written more than once to the file, but they do care about the consistency of each record so writers can't just silently overwrite each other. > The issue is furthermore complicated by the "sparse" layout that we all want to support (do we?) Can you please turn the details knob on "sparse" layout and maybe give a concrete example where this layout makes the proposed model fall short? > > layout delegation: > > - returned on READ_IND, WRITE_IND, LAYOUT_DELEG_ASK > > > > Covers only layout (aggregation header, map, handles/caps). > > Optional, recallable, revocable. > > Assures the client that the layout information it has will not change. > > But the layout information may change even in the most trivial single writer case and definitely in RW cases. Correct, when the layout is about to be changed (a writer calls COMMIT_IND) or when there is a write-write conflict (two clients call WRITE_IND for overlapping regions) some or all layout delegations must be recalled. > > WRITE yes client can safely cache read and write data, > > serve opens, and locks locally and can perform > > out-of-band or server reads and writes. > At least this requires mapping updates for block storage.
> For those souls that want strict local-FS semantics (UNIX) cache and map invalidations can be a side-effect of the byte-range locking mechanism. This sounds like something that falls into the distributed cache coherency realm - meaning multiple clients have a CW data delegation and a layout delegation. My assumption was that in this case the logical block map changes rarely when the clients are writing in place, otherwise they should fall back to writing through the server. As long as there is a way to get concurrent write to scale with reasonable behavior, like non overlapped regions and any other reasonable promises. I suppose we need to pin down what those reasonable promises are. Gary Having an efficient distributed cache coherency mechanism in NFS seems to me like a stretch but it's worth a discussion to see if block based SAN filesystems can or can't live without it. Benny
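[Editorial sketch, not part of the original thread.] The recall rule discussed above (non-overlapping writers proceed undisturbed; a WRITE_IND that overlaps another client's layout forces a recall) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: `Layout`, `LayoutTable`, and `write_ind` are hypothetical names invented for illustration, the recall is assumed to succeed immediately, and nothing here reflects an actual NFSv4 or pNFS interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Layout:
    """A layout delegation held by one client for one byte range (hypothetical)."""
    client: str
    offset: int  # first byte covered
    length: int  # number of bytes covered

    def overlaps(self, offset: int, length: int) -> bool:
        # Two half-open ranges [a, a+n) and [b, b+m) intersect iff each starts
        # before the other ends.
        return self.offset < offset + length and offset < self.offset + self.length


@dataclass
class LayoutTable:
    """Server-side record of outstanding layout delegations (hypothetical)."""
    held: list = field(default_factory=list)

    def write_ind(self, client: str, offset: int, length: int) -> list:
        """Grant a write layout; return the delegations that had to be recalled.

        Assumption: every recalled client returns its layout promptly, so the
        sketch simply drops the conflicting entries. A real server would also
        need revocation or fencing for unresponsive clients.
        """
        to_recall = [l for l in self.held
                     if l.client != client and l.overlaps(offset, length)]
        self.held = [l for l in self.held if l not in to_recall]
        self.held.append(Layout(client, offset, length))
        return to_recall


table = LayoutTable()
table.write_ind("a", 0, 100)              # no conflict, nothing recalled
table.write_ind("b", 100, 100)            # adjacent range, still no conflict
recalled = table.write_ind("c", 50, 100)  # overlaps both "a" and "b"
```

In this sketch the first two writers never trigger a recall, which is the "good path" Gary and Julian ask to keep cheap; only the third, overlapping request pays for coordination.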
From julian_satran@il.ibm.com Fri Dec 26 01:36:45 2003 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 68436 invoked from network); 26 Dec 2003 09:36:44 -0000 Received: from unknown (66.218.66.217) by m17.grp.scd.yahoo.com with QMQP; 26 Dec 2003 09:36:44 -0000 Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta2.grp.scd.yahoo.com with SMTP; 26 Dec 2003 09:36:43 -0000 Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196] (may be forged)) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id hBQ9agn0122204; Fri, 26 Dec 2003 09:36:42 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id hBQ9afO5284158; Fri, 26 Dec 2003 10:36:42 +0100 In-Reply-To: To: pnfs-ops@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Fri, 26 Dec 2003 11:36:39 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 26/12/2003 11:36:41, Serialize complete at 26/12/2003 11:36:41 Content-Type: multipart/alternative; boundary="=_alternative 0033FFCAC2256E08_=" X-eGroups-Remote-IP: 195.212.29.152 From: Julian Satran Subject: Re: [pnfs-ops] Re: pNFS Discussion Summary 1: Caching and Delegations X-Yahoo-Group-Post: member; u=64714603 David & all, black_david@emc.com wrote on 23/12/2003 23:34:32: > I've split this commentary on Garth's issues into two categories. > This is about caching, delegations, and layout info. 
> > > > [1.2 Cache consistency]: NFSv4 delegations are not about client cache > > > consistency; does out-of-band access require stronger cache > > > consistency than NFSv4 provides > > With a little care in defining the protocol extensions, this issue > can be left to server implementers, unless one wants to take the > (silly, IMHO) position that the protocol should be incapable of > providing stronger cache consistency. > I agree. As a statement of direction we should say that the protocol should be capable of providing all level of consistency close-to-open or UNIX and it should be a client/server - needed/implemented decision on what to use. The issue we may want to discuss is what must be provided as a minimum in a compliant client/server. > HighRoad uses the same FMP protocol to provide both NFS-style > close-to-open consistency for NFS clients and the stronger forms > of consistency required by CIFS - as long as the server knows what > clients have which access rights to what blocks, cache consistency > strength comes down to server implementation decisions about > what outstanding access rights conflict with a new request. We've > actually built server prototypes that provide stronger consistency > for NFS without change to either the FMP protocol or clients, but > the shipped product only provides NFS-style consistency for NFS. > > > > [1.3 Delegation promotion & reacquisition]: must/should NFSv4 offer > > > mechanisms for clients to possess a delegations more than once per open > > > > > > Delegations in NFSv4 are new, and came with significant concern about > > > lots of complexity for not much performance, as they may do as little > > > as avoid the client waiting for one round trip to the server on open. > > > So, as described above with respect to cache consistency, the > > > limitations on delegations can mean great difficulties for clients > > > having performance requirements calling for out-of-band access mostly, > > > or exclusively. 
> > Yes, and this is a strong reason for separating "layout" delegations from > the existing "data" delegations, IMHO. Consider a web or video server > that is caching file opens for performance reasons - if updating the > content underneath the server makes it impossible to get the direct > access ("layout") delegations back, the result is that one has to shut > down and restart all the servers after the content update in order to > restore performance. The sysadmin responsible for this annoying > work will want to tar-and-feather the system designers who made > it necessary (that would be us if we get this wrong ...). > I have my doubts that this makes sense as I could not find a case in which those are not strongly related and doing them separately will force us into considering a myriad of invalid combinations and failure modes. The only good argument for doing them separately is that they are easier to implement and understand separately but this might be misleading (it may increase substantially the exception handling). This is why I would refrain from suggesting this as a requirement now. > > > [1.4 Layout delegations]: can/should layout metadata "ride" on NFSv4 > > > delegations or are new "layout" delegations needed > > New "layout" delegations are needed for clean separation of functionality, > and so that "layout" delegations can be designed for direct access > requirements. See [1.3] above. > > > > [1.5 Concurrent write]: write delegations now are held by exactly one > > > client, if any; should/must NFS support multiple clients holding > > > concurrent layout delegations. > > I understand the value of this to the self-coordinating HPC applications, > but would like to see this functionality specified (assuming it is > specified) as a cleanly separable option, as I think the desire to > self-coordinate a shared write delegation will be limited to a small > number of application spaces, like HPC. 
I also note Gary's comment > that it's sufficient for parallel write to work in the non-overlapping > case, which does not require any new concurrent write delegation as > long as each client can hold an exclusive write delegation for its range. > > Thanks, > --David > ---------------------------------------------------- > David L. Black, Senior Technologist > EMC Corporation, 176 South St., Hopkinton, MA 01748 > +1 (508) 293-7953 FAX: +1 (508) 293-7786 > black_david@emc.com Mobile: +1 (978) 394-7754 > ---------------------------------------------------- > > To unsubscribe from this group, send an email to: > pnfs-ops-unsubscribe@yahoogroups.com > > > > Yahoo! Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-ops/ > > To unsubscribe from this group, send an email to: > pnfs-ops-unsubscribe@yahoogroups.com > > Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > > From julian_satran@il.ibm.com Mon Dec 29 02:11:03 2003 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 82439 invoked from network); 29 Dec 2003 10:11:01 -0000 Received: from unknown (66.218.66.218) by m14.grp.scd.yahoo.com with QMQP; 29 Dec 2003 10:11:01 -0000 Received: from unknown (HELO mtagate7.de.ibm.com) (195.212.29.156) by mta3.grp.scd.yahoo.com with SMTP; 29 Dec 2003 10:11:00 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180] (may be forged)) by mtagate7.de.ibm.com (8.12.10/8.12.10) with ESMTP id hBTAAqwj127908; Mon, 29 Dec 2003 10:10:52 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id hBTAApcZ253454; Mon, 29 Dec 2003 11:10:51 +0100 In-Reply-To: To: pnfs-ops@yahoogroups.com Cc: Garth Gibson , pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 
2003 Message-ID: Date: Mon, 29 Dec 2003 12:10:49 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 29/12/2003 12:10:51, Serialize complete at 29/12/2003 12:10:51 Content-Type: multipart/alternative; boundary="=_alternative 0037E92AC2256E0B_=" X-eGroups-Remote-IP: 195.212.29.156 From: Julian Satran Subject: Re: [pnfs-ops] [minimalism] pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=64714603 Garth and All, Garth Gibson wrote on 18/12/2003 07:42:22: > ---------------------------------------- > > [1.0 Minimalism]: How much additional functionality do we sacrifice to > limit the changes we seek in NFSv4? > > On one hand, some have said that getting to one true file system, with > the high performance and the manageability of federated systems that > might come with out-of-band access, is worth not matching *every* > feature of all existing out-of-band file systems with this first set of > extensions to NFSv4. That we should bite off what we can do quickly, > correctly, with a clear incremental value to NFSv4, and roadmap more > aggressive changes that could bog us down, or introduce so much > complexity that interoperability becomes elusive. And that we should > be mindful of the reception we may get from the IETF NFS working group > if we *appear* to use out-of-band as an excuse to ask for a brace of > changes in other aspects of NFSv4. > > On the other hand, the other out-of-band file systems that are > inspiring the evolution of NFSv4 have customers that may not accept any > backward sets in an evolution to NFSv4. This could create the need to > develop, carry and differentiate all the diverse one-off out-of-band > files systems plus a new out-of-band NFSv4. Some think it makes more > sense to go far enough with this first NFSv4 to simplify the > marketplace by making it reasonable for various vendors to > deprecate/end-of-life/begin to wean from their proprietary offering. 
> > While it is certainly conceivable that we could be designing a roadmap > of solutions in detail from the start, communication among standards > bodies is hard enough without the challenge of designing specs for both > with and without a requirement. > > This is a central issue in defining the requirements for out-of-band > NFSv4, or at least for defining the scope of the first set of > extensions. > > ---------------------------------------- > I am afraid that this text makes achieving compliance with existing out-of-band filesystems sound more complex than it might be. I see several items that we should strive to keep even in a minimalist set of requirements: * attribute set rich enough to enable expressing the attributes of the major local-filesystems (Unix brands and Windows) * access control that accommodates the access control mechanisms of the major local-filesystems and some of the popular distributed file-systems (AFS?) * coherency mechanisms that enable vendors to optionally implement the two major flavors of coherent file access: o completely coherent o close-to-open coherent None of those seems to me to involve major departures from NFSv4.
Julo From andros@citi.umich.edu Mon Dec 29 12:11:13 2003 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 99698 invoked from network); 29 Dec 2003 20:11:10 -0000 Received: from unknown (66.218.66.216) by m5.grp.scd.yahoo.com with QMQP; 29 Dec 2003 20:11:10 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta1.grp.scd.yahoo.com with SMTP; 29 Dec 2003 20:11:10 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id A223D20806; Mon, 29 Dec 2003 15:11:09 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-ops@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com, andros@citi.umich.edu Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 29 Dec 2003 15:11:09 -0500 Message-Id: <20031229201109.A223D20806@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=169434965 the discussion of byte-range delegations and cache consistency provoked this thought: why not use existing mandatory byte-range locking? the client opens a file, requests a (mandatory) lock on the region of the file it's interested in. the resultant lock stateid is passed as an argument to the READ/WRITE_IND request. we can require a mandatory lock stateid prior to handing out layout maps for direct i/o. the layout map is 'good' only for as long as the byte-range lock. the mandatory lock protects the layout, so no need for layout delegations. mandatory locking also allows the client to cache and operate locally on the locked data region with cache consistency guarantees. we already have the byte-range locking code written. so how far does this get us? does it make sense to start with the locking code instead of the delegation as far as extensions?
-->Andy

From ggrider@lanl.gov Mon Dec 29 20:10:53 2003 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 30244 invoked from network); 30 Dec 2003 04:10:50 -0000 Received: from unknown (66.218.66.166) by m11.grp.scd.yahoo.com with QMQP; 30 Dec 2003 04:10:50 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta5.grp.scd.yahoo.com with SMTP; 30 Dec 2003 04:10:49 -0000 Received: from mailrelay2.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBU4AmfK005317; Mon, 29 Dec 2003 21:10:49 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay2.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBU4AmtS026425; Mon, 29 Dec 2003 21:10:48 -0700 Received: from cthulu.lanl.gov (vpn-client-136.lanl.gov [128.165.253.136]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id hBU4AkFR002434; Mon, 29 Dec 2003 21:10:46 -0700 Message-Id: <5.2.0.9.2.20031229210957.018956f0@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Mon, 29 Dec 2003 21:10:44 -0700 To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com, andros@citi.umich.edu In-Reply-To: <20031229201109.A223D20806@citi.umich.edu> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_3295098==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs

As long as there is a way to ask for higher level coordination so byte range locks are not mandatory, default is ok.
Thanks
Gary
From mclarty3@llnl.gov Tue Dec 30 09:16:21 2003 Return-Path: X-Sender: mclarty3@llnl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 28819 invoked from network); 30 Dec 2003 17:16:19 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 30 Dec 2003 17:16:19 -0000 Received: from unknown (HELO smtp-3.llnl.gov) (128.115.41.83) by mta4.grp.scd.yahoo.com with SMTP; 30 Dec 2003 17:16:19 -0000 Received: from poptop.llnl.gov (localhost [127.0.0.1]) by smtp-3.llnl.gov (8.12.3p2-20030917/8.12.3/LLNL evision: 1.13 $) with ESMTP id hBUHGH9S025152; Tue, 30 Dec 2003 09:16:17 -0800 (PST) Received: from POLARBEAR.llnl.gov ([134.9.18.59] verified) by poptop.llnl.gov (CommuniGate Pro SMTP 4.0.6) with ESMTP id 33235490; Tue, 30 Dec 2003 09:16:17 -0800 Message-Id: <5.0.0.25.2.20031230083936.02fba428@poptop.llnl.gov> X-Sender: e002801@poptop.llnl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.0 Date: Tue, 30 Dec 2003 09:16:16 -0800 To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Cc: andros@citi.umich.edu In-Reply-To: <5.2.0.9.2.20031229210957.018956f0@cic-mail.lanl.gov> References: <20031229201109.A223D20806@citi.umich.edu> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-eGroups-Remote-IP: 128.115.41.83 From: Tyce McLarty Subject: Re: [pnfs-reqs] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=169320772

I'm sure I do not understand all the subtleties of byte-range delegations vs. byte-range locking, but I think the essential ingredient we are after is the ability to use some coordination across thousands of clients, like Gary says. A single process in an HPC application frequently needs to access many discontiguous byte-ranges, but the coordinated group of clients will access a large contiguous byte-range. I think it was this example that led us to the idea of layout delegations to begin with.
The key is to keep thinking in terms of many parallel clients, not a single one.

Thanks,
Tyce

From dnoveck@netapp.com Tue Dec 30 12:08:08 2003 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 26134 invoked from network); 30 Dec 2003 20:08:07 -0000 Received: from unknown (66.218.66.167) by m12.grp.scd.yahoo.com with QMQP; 30 Dec 2003 20:08:07 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 30 Dec 2003 20:08:07 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id hBUK86Kw003014; Tue, 30 Dec 2003 12:08:06 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id hBUK86pr015864; Tue, 30 Dec 2003 12:08:06 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3CF10.A12BC55E" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Tue, 30 Dec 2003 12:08:02 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 Thread-Index: AcPIdgEt2ftUYTH2RMiubXEU2ZfttQGc/84A To: Cc: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck

It seems legal to me but I'm guessing that there are others that would think differently. I tend to think that it is not a good idea, though. There are going to be operations which, by their nature, are better done through the metadata server. A two-byte write which spans multiple data servers is an example. Another is append-writes, which have been mentioned (by whom I don't remember just now) as a desirable v4 extension, assuming the data to be written is of reasonable size.
In each case, we may create appropriate caching/locking primitives to allow the operation to be done without making any request of the metadata server that is officially denominated an "IO" request. But can you really argue that this will be the best way for the client to do such operations? And does it really make sense to force clients to invest the effort in terms of the code do such operations doing the IO with the data server only, when the performance benefit of that is going to be small, or zero, or negative? You may wind up making as many requests of the meta-data server with the data-server-only approach. It's just that they won't be IO operations (but instead locking and, in the case of append, getattr operations). In complicated protocols (and v4 is a complicated protocol and is getting more complicated), there are going to be multiple ways of doing the same thing, which are going to differ in their performance characteristics. An organization can be reasonably concerned about clients making the wrong choice, just as it is concerned about clients that are making excessive resource demands for other reasons. There are two issues that I am worried about in taking such a drastic approach as simply refusing to support a valid piece of the protocol, even if that choice is made by the server administrator. The first is that determining the better choice depends on a lot of variables and that a simple formula governing an option (e.g. "IO through the metadata server is bad") is unlikely to completely match reality. The second is that I-don't-like-your-IO-request-so-you-lose is kind of a blunt instrument to deal with the problem. If you have identified some set of bad client practices, you can find the clients doing them, report the appropriate statistics, even, if the issue is critical, artificially give such clients (or specific requests) bad performance in a way that doesn't hurt other clients (unless they are waiting for the first set to do something. 
Sigh!), by just delaying processing of their requests by a millisecond or two. That should be enough to preserve metadata-server bandwidth for more worthwhile purposes. If that's insufficiently discouraging, you can raise the delay. If you start rejecting requests because you would have done it differently, even if you are correct, you are on the road to creating your own sub-protocol, which is why this kind of thing is worrying, even if legal.

-----Original Message----- From: Julian Satran [mailto:julian_satran@il.ibm.com] Sent: Monday, December 22, 2003 5:26 AM To: pnfs-ops@yahoogroups.com Cc: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com Subject: RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03

Since I raised the issue of the metadata server not having access to all its data servers (or at least not with adequate bandwidth) I feel compelled to say that Dave's arguments about supporting 4.0 are compelling enough to make it mandatory. The open issue is if it is legal for a "compliant server" to have serving data disabled by a local administrative function (the old "must implement but may use"). Otherwise an organization that wants to discourage use of data serving through the metadata server has very little it can do to enforce policy in a way that will not affect other clients (it may serve poorly, but this still affects other clients).

Julo

"Noveck, Dave" 18/12/2003 19:21 Please respond to pnfs-ops@yahoogroups.com To , cc Subject RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03

Good summary. I want to address the "proxying" issue.
> [1.1 Proxying]: Operations/work that can only be done out-of-band vs > alternative access through the NFSv4 server for all operations/work If you are talking about operations in the extension (let's call it NFS-v4.x), that are not in the previous minor version (let's assume that is nfs-v4.1), then you have a choice of whether these are supported for access through the server, or only for access by the client with the data server. Let's call this the issue of proxying in the strict sense. There is another issue that people are calling "proxying" but is really logically distinct. That is the issue of access by the previous minor version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of separate data servers and they need to be able to work. End of story. If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not have a minor version without proxying. You don't have a minor version at all. I believe the working group is never going to accept that. Even if I'm wrong and you can get the working group to accept that, it is going to be very contentious and thus take up a lot of time. Anybody, who really wants to go down this path should seriously consider the trade-off between supporting something they find objectionable and getting a standard a lot later, if at all. > On one hand, some suggest that a set of out-of-band clients should not > have to also have a data path through the NFSv4 metadata server. One > reason is that customers may not tolerate the large variability in > performance between out-of-band (when the going is good) and in-band > (when the server chooses not to grant or to take away a delegation) > accesses. Then such customers will use clients that access things out-of-band whenever possible, and servers that never refuse to give out layout delegations. You have a number of quality-of-implementations issues for v4.x clients and servers. 
If a particular client only supports access via v4.0, then performance will suck, and the working group will understand that, but it won't accept not being able to use v4.0 at all. The customer is going to be motivated to upgrade his clients for those that need high-performance access, but he may be OK with some clients using v4.0 for a long time, depending on the particular performance those clients need. (And some will want v2/v3 access but that is a matter that the working group has no say about). > Another reason, and I paraphrase someone else here, is that > it is possible to construct out-of-band metadata servers that do not > have access to the data servers except through the clients -- I > encourage the source of this scenario to replace my paraphrasing with a > correct use case, because I find it odd to design for file servers that > do not have access to the data servers. So let's grant that it is possible (and we'll pass over the issue of whether it is desirable, and in fact so desirable that one is willing to not get a standard and or get it much later). So we have a metadata server and it, for whatever reason, does not have access to the data servers. However, by hypothesis, there are machines (e.g. clients), that can communicate with both. So, if one has such an architecture, then one can take such a machine, give it a communication path to the meta-data server and the data server and have the meta-data server transfer v4.0 READ requests to it, let it read the data from the data server and send it back to the meta-data server who send it back to the original requestor. Is that a very good solution? No. Is it likely to be performant? No. Will it satisfy any particular customer? I don't know and that is the implementer's business decision. Will it satisfy the hypothetical customer who doesn't care about v4.0 access? Clearly. Will it satisfy the v4 working group? 
Yes, because they are not in the business of telling you how performant v4.0 access has got to be.

> On the other hand, others have suggested that any access or work that a
> client can do out-of-band should be possible with one or more commands
> applied to the metadata server's data path. This has been proposed for
> coping with recalled delegations, including concurrent writing by
> multiple clients; retry after client access errors, provided adequate
> idempotency of out-of-band operations; and many alternative
> implementations of out-of-band clients, including legacy clients that
> use out-of-band never or rarely.

This effort is going to take a while, but if we manage it correctly, it is not going to take so long that v3 clients are going to be rare things, and they have to be supported. But v3 clients are not an issue for the working group. V4.0 clients are and they will be around and you will have to support them, and I believe the working group is not going to be disposed to cut you a lot of slack on this issue (and I don't see why it should).

> I think this is a topic that should be argued one way or the other in
> the requirements document. Use cases and examples in other systems
> would be best.

I think the requirement should be that this work should be done as a set of extensions to nfs-v4 delivered as a v4 minor version. If there is some feature/requirement that conflicts with that model (and it is a pretty flexible one), then you have to think long and hard before deciding that that requirement is more important than this basic delivery vehicle, because it seems to me that it is, in almost all respects, the ideal way to make this sort of technology available for widespread use.

From black_david@emc.com Fri Jan 02 08:45:29 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 73456 invoked from network); 2 Jan 2004 16:45:24 -0000 Received: from unknown (66.218.66.166) by m18.grp.scd.yahoo.com with QMQP; 2 Jan 2004 16:45:24 -0000 Received: from unknown (HELO mxic2.corp.emc.com) (128.221.12.9) by mta5.grp.scd.yahoo.com with SMTP; 2 Jan 2004 16:45:23 -0000 Received: by mxic2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Fri, 2 Jan 2004 11:45:23 -0500 Message-ID: To: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com, pnfs-sbc@yahoogroups.com Date: Fri, 2 Jan 2004 11:45:21 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.12.9 From: black_david@emc.com Subject: Two Functionality issues X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237

In starting to look at design issues for block metadata, I've run across a couple of issues around functionality to be supported that could use wider discussion. This is based on an initial review of the EMC High Road FMP protocol and the IBM StorageTank SAN.FS protocol. I've tried to just describe the issues here without taking a position.

[4] Functionality

SAN.FS extents come with both read and write extent mappings and block usage bitmaps.
The separate read and write mappings allow for clients to participate in copy-on- write functionality - IIRC, Craig has described this. Issue [4.1]: Should protocol include support for client participation in copy-on-write? A motivation for the separate arrays of block usage bits" appears to be allowing clients to turn file data into holes (e.g., AIX fclear system call). Issue [4.2]: Is the ability to turn valid data into a file "hole" (e.g., AIX fclear) at the client important to support? FMP does not support separate read mappings or usage bitmaps, and hence is not capable of involving clients in copy-on-write or allowing a client to turn valid data into a file "hole". Comments? Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From dnoveck@netapp.com Mon Jan 05 08:00:09 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 73569 invoked from network); 5 Jan 2004 16:00:05 -0000 Received: from unknown (66.218.66.167) by m3.grp.scd.yahoo.com with QMQP; 5 Jan 2004 16:00:05 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 5 Jan 2004 16:00:05 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i05G04Kw001147; Mon, 5 Jan 2004 08:00:04 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i05G00SX005409; Mon, 5 Jan 2004 08:00:04 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 
Date: Mon, 5 Jan 2004 07:59:50 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Two Functionality issues Thread-Index: AcPRT9ZJdxsdz5loSIqJV3CopbvMEAB0UrSg To: , , X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] Two Functionality issues X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck With regard to issue 4.2, the fclear operation, I don't have a position on whether this is important to do but I am pretty sure that if we do this, it should not be by means of something limited to the block metadata. If people need this, we should do this via an ordinary v4.x operation, which I'll call FCLEAR for now. The operation of turning a written area into a hole has three visible consequences: 1) The written data within the targeted area vanishes and is replaced by zeros as seen by ordinary v4.0 clients and also in pnfs environments where the metadata format is file or object oriented. 2) Mod a whole bunch of server policy stuff (snapshots, etc.) the disk space previously used is made available (No real guarantees but clients may want to do this to make space available and in many environments they will be able reliably to get the results they desire). 3) The SAN metadata will show the targeted area as a hole. So I would argue that, given that this has visible consequences for all sorts of clients it should be done in a common way, even though the most definitive manifestation of the function is via the SAN metadata. Consider a client implementing the fclear function. Even though a test program might depend on 3), real applications that want this functionality are going to be most interested in 1) and 2). If this function were implemented only through the SAN metadata, what is the client to do to give the application the expected behavior? You can get 1) expensively by writing lots of zeros, but for 2) you are stuck. 
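The difference between consequences 1) and 2) can be made concrete with a small in-memory model. This is only a sketch under stated assumptions: `SparseFile`, its `fclear` method, and the 4-byte block size are hypothetical names chosen for illustration, and nothing here is an NFSv4, FMP, or SAN.FS interface. The model tracks which blocks are allocated, so that writing zeros and clearing a range can be compared.

```python
BLOCK = 4  # hypothetical block size in bytes, kept tiny for illustration


class SparseFile:
    """Toy file: a dict of allocated blocks; a missing block is a hole."""

    def __init__(self):
        self.blocks = {}  # block index -> bytearray(BLOCK)

    def write(self, offset, data):
        # Writing always allocates the blocks it touches, even for zeros.
        for i, byte in enumerate(data):
            idx, pos = divmod(offset + i, BLOCK)
            self.blocks.setdefault(idx, bytearray(BLOCK))[pos] = byte

    def read(self, offset, length):
        # Holes read back as zeros, exactly like explicitly written zeros.
        out = bytearray()
        for i in range(offset, offset + length):
            idx, pos = divmod(i, BLOCK)
            block = self.blocks.get(idx)
            out.append(block[pos] if block is not None else 0)
        return bytes(out)

    def fclear(self, offset, length):
        # Hypothetical FCLEAR: zero the range, then release whole blocks.
        end = offset + length
        for i in range(offset, end):
            idx, pos = divmod(i, BLOCK)
            if idx in self.blocks:
                self.blocks[idx][pos] = 0
        first_full = -(-offset // BLOCK)  # round up to a block boundary
        for idx in range(first_full, end // BLOCK):
            self.blocks.pop(idx, None)

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK


zeroed = SparseFile()
zeroed.write(0, b"x" * 16)
zeroed.write(4, bytes(8))       # emulate fclear by writing zeros

punched = SparseFile()
punched.write(0, b"x" * 16)
punched.fclear(4, 8)            # real hole: blocks 1 and 2 are released
```

In this model both files read back identically (consequence 1), but only `punched` gives space back (consequence 2), which is the part a client cannot emulate with plain writes.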
The result is that even applications that don't explicitly or implicitly depend on 3) are burdened by the fact that fclear support is not universally available. We want to have a single protocol and not three protocols. So I think this means that functionality should only be restricted to a single form of metadata if the consequences of that functionality can only be seen through that form of metadata, which isn't the case here.

From andros@citi.umich.edu Mon Jan 05 10:27:41 2004 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 26235 invoked from network); 5 Jan 2004 18:27:40 -0000 Received: from unknown (66.218.66.217) by m18.grp.scd.yahoo.com with QMQP; 5 Jan 2004 18:27:40 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta2.grp.scd.yahoo.com with SMTP; 5 Jan 2004 18:27:39 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id 07168207D3; Mon, 5 Jan 2004 13:27:38 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-ops@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com, andros@citi.umich.edu In-reply-to: Your message of "Mon, 29 Dec 2003 13:00:02 PST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 05 Jan 2004 13:27:37 -0500 Message-Id: <20040105182738.07168207D3@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: Re: [pnfs-ops] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=169434965

> Andy Adamson wrote:
> > the discussion of byte-range delegations and cache consistency provoked this
> > thought: why not use existing mandatory byte-range locking?
> >
> > the client opens a file, requests a (mandatory) lock on the region of the file
> > it's interested in. the resultant lock stateid is passed as an argument to the
> > READ/WRITE_IND request. we can require a mandatory lock stateid prior to
> > handing out layout maps for direct i/o. the layout map is 'good' only for as
> > long as the byte-range lock.
> > One problem is that there is no way for the client to specify that he wants > a mandatory (as opposed to advisory) byte-range lock, he just asks for one > and the server gives him the type of byte-range that server is giving out > for that fs. So, if you did that, applications that relied on the semantics > (or lack of semantics) of advisory byte-range locks would break. From 3530 5.11.5. Mode Attribute ...... Note that in UNIX, if a file has the MODE4_SGID bit set and no MODE4_XGRP bit set, then READ and WRITE must use mandatory file locking. so for unix, there is a way to specify mandatory vrs advisory locking. since this is also in 3530: 8. File Locking and Share Reservations ..... These mechanisms can implement policy ranging from advisory only locking to full mandatory locking. adding a flag to a LOCK/T/U to indicate mandatory locking vrs advisory is within reason. > Another issue is that while you say "'good' only for as long as the byte-range > lock", the results of doing this are that the layout map and the data will > be fixed for at least as long as the byte-range lock exists, i.e. sometimes > too long. If I'm going to be reading directly from the data server, then > I want the layout to stay constant for a long time, or at least I don't want > to be forced to repeatedly get locks for small areas of the layout. The > obvious (and desirable) thing for me to do is to get a shared lock for the > whole file so the layout cannot change, but if we combine changes of layout > and changes of data under a single sort of lock, mandatory byte-range locks > in this case, we have stopped anybody writing in the file for a very long > time, i.e. essentially forever since my lease will normally be continually > renewed. 
>
> When you combine a guarantee that the layout will not change with a guarantee
> that the data will not change, in such a way that they can't be separated,
> you artificially increase the amount of conflicts, in many cases to an
> unacceptable level.

perhaps i'm missing something, but isn't it the case that the layout of the data and the ability to access the data are totally bound together? the layout changes due to writes and appends (other??). if the layout changes, stale layout maps are not only no longer any good, they can lead to data corruption. it seems to me that the guarantee that the layout won't change is bound to the guarantee that the data won't change. i can't think of any conflicts such as you mention - could you give some examples?

> When you have a delegation model, the problem is
> excessive recalls, while when you have a locking model the problem is that
> some applications will slow to a crawl/halt.
>
> > the mandatory lock protects the layout, so no need for layout delegations.
> > mandatory locking also allows the client to cache and operate locally on the
> > locked data region with cache consistency guarantees.
>
> If you are going to be doing some local operation, then short-term mandatory
> byte-range locking can help you. If you need to do a lock/fetch/update/write/unlock
> cycle on a record, this is the ticket (and in v4 lock/fetch and write/unlock
> can be COMPOUND's :-). The record you hold while updating can be considered
> cached for that brief period. If, however, you are caching data generally,
> i.e. for a period outside the range of a short operation sequence, you are
> going to need something that is delegation-like, in that if I have the
> cached data and want to keep it until there is some reason to get rid of
> it, i.e. it is LRU'd out or there is a conflict, then I have to have some
> way of finding out that there is a conflict. Delegations do that via a
> recall and one can imagine it being done other ways.
> But the mandatory
> lock model is that I have a lock because I need it and so there is no
> provision to tell me that someone else has a conflict. The logic is that
> he will wait until I give the lock up, and waiting for the cached data to
> be LRU'd is going to be too long in most cases.
>

i agree that the ability for the server to recall is a required feature. i'm simply suggesting that mandatory locking may have more features in common with what we need for pnfs than delegations, and that we could extend the existing mandatory byte-range locking model with fewer changes than extending the existing delegation model. so, how about extending the mandatory locking model with a recall mechanism?

> > we already have the byte-range locking code written.
>
> I only have advisory byte-range locking code written. Who has v4 mandatory
> byte-range locking implemented?
>
> > so how far does this get
> > us? does it make sense to start with the locking code instead of the
> > delegation as far as extensions?
>
> I think if we define some form of byte-range delegations (at least for data
> and maybe for layout as well), there is going to be lots of code sharing with
> an existing mandatory byte-range locking implementation. The data structures
> and many of the interfaces are going to be the same.

and this is really why i brought this up. a new lock type that has the features we desire (e.g. a recall mechanism) makes sense to me.

> The difference is going
> to be what you do about conflicts. Instead of saying to the second claimant,
> "You snoozed so you lose", in some cases you have to be prepared to recall the
> delegation so that, for example, an otherwise unexceptionable write can proceed.
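The two conflict policies contrasted in this exchange (a plain mandatory lock makes the second claimant lose, a delegation-like grant is recalled so the second claimant can proceed) can be sketched with one shared byte-range table, which is roughly the code sharing suggested above. This is a minimal sketch, not a protocol design: `Grant`, `RangeTable`, `acquire`, and `on_recall` are hypothetical names, and the sketch assumes the holder gives a recalled range back immediately.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Grant:
    owner: str
    start: int
    end: int                      # exclusive end of the byte range
    exclusive: bool
    # None means a plain lock; a callable makes the grant recallable.
    on_recall: Optional[Callable[["Grant"], None]] = None


@dataclass
class RangeTable:
    grants: List[Grant] = field(default_factory=list)

    def _conflicts(self, g, owner, start, end, exclusive):
        overlaps = g.start < end and start < g.end
        return g.owner != owner and overlaps and (g.exclusive or exclusive)

    def acquire(self, owner, start, end, exclusive, on_recall=None):
        conflicts = [g for g in self.grants
                     if self._conflicts(g, owner, start, end, exclusive)]
        # Lock model: any non-recallable conflict means the claimant loses.
        if any(g.on_recall is None for g in conflicts):
            return False
        # Delegation model: notify each holder, then take the range back.
        for g in conflicts:
            g.on_recall(g)
            self.grants.remove(g)
        self.grants.append(Grant(owner, start, end, exclusive, on_recall))
        return True


recalled = []
table = RangeTable()
# Client A holds a recallable shared grant on the first 100 bytes.
got_a = table.acquire("A", 0, 100, exclusive=False,
                      on_recall=lambda g: recalled.append(g.owner))
# Client B wants to write bytes 50..59: A is recalled and B proceeds.
got_b = table.acquire("B", 50, 60, exclusive=True)
# Client C conflicts with B's plain lock, so C is simply refused.
got_c = table.acquire("C", 55, 58, exclusive=True)
```

The data structure and the overlap test are identical in both cases; only the branch taken on conflict differs, which is the point made at the end of the message above.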
From dnoveck@netapp.com Mon Jan 05 12:11:32 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 69316 invoked from network); 5 Jan 2004 20:11:30 -0000 Received: from unknown (66.218.66.172) by m5.grp.scd.yahoo.com with QMQP; 5 Jan 2004 20:11:30 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 5 Jan 2004 20:11:30 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i05KBUKw005747; Mon, 5 Jan 2004 12:11:30 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i05KBUSR010661; Mon, 5 Jan 2004 12:11:30 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 5 Jan 2004 12:11:23 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range locking Thread-Index: AcPTuZy/JW5xIH2uTcip4yVASdheYgACMxhQ To: , X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck Andy Adamson wrote: [Dave Noveck wrote]: > > When you combine a guarantee that the layout will not change with a guarantee > > that the data will not change, in such a way that they can't be separated, > > you artificially increase the amount of conflicts, in many cases to an > > unacceptable level. > perhaps i'm missing something, but isn't it the case the layout of the data > and the abilty to access the data are totally bound together? the layout > changes due to writes and appends (other??). 
It is (almost) always the case that if the layout changes, it is because of some data being written. The exceptions are so few that they can easily be dealt with by considering that a fictitious write which just happened to overwrite the same data has occurred (e.g. the data server scans its disks and finds a bad spot making it advisable to move data stored there somewhere else, thus changing the layout, even though there was no application-level write).

However, the problem I see with your "totally bound together" formulation is in the other direction. There are many, many cases in which the data changes and the layout does not change, and they are important from a performance point of view. In the SAN case, whenever a file is modified by overwriting, the data changes but the layout does not. In the pnfs cases in which the distribution is by files or objects, the layout changes even less. The layout is normally established once ("this file is striped among the following 64 data servers at 256K per stripe") and that hardly ever changes. The protocol has to allow for the possibility that there is a change (e.g. the administrator wants to add more data servers) but as a practical matter the clients can go on their merry way using the layout information they got when the file was first accessed.

> if the layout changes, stale
> layout maps are not only no longer any good, they can lead to data corruption.
> it seems to me that the guarantee that the layout won't change is bound to the
> guarantee that the data won't change. i can't think of any conflicts such as
> you mention - could you give some examples?

Thousands of Linux nodes in an application cluster are merrily reading and writing, not changing the layout. The application is careful to not cache inappropriately (and it knows how the file is used so it is reasonable that it might do that), so callbacks will not be needed for cache invalidating.
The problem is, you want these nodes to get the layout information and use it and not be bothered when the layout *isn't* changing (and when they are bothered, the people doing the writes are bothered too, since they have to wait for the delegation recalls from large numbers of clients). However, since it is possible that the layout will change, the clients, since they have layout info, will be notified when it changes. Since it is changing infrequently (almost never) this is fine. But it isn't fine if, whenever the data changes, you act as if the layout is changing.

> i agree that the ability for the server to recall is a required feature. i'm
> simply suggesting that mandatory locking may have more features in common with
> what we need for pnfs than delegations, and that we could extend the existing
> mandatory byte-range locking model with fewer changes than extending the
> existing delegation model.

I think you are reading too much into my words. When I call such a thing a delegation, I don't mean that it is very much like the delegations that exist in v4.0 today. I mean simply that it is an optionally-granted recallable lock. It makes sense in v4.x to do such a thing with a new OP (as I don't think you can add parameters to existing ops) but GET_RANGE_DELEG is going to look a whole lot more like the existing LOCK op than it does anything related to current delegations in v4.0.

> so, how about extending the mandatory locking model with a recall mechanism?

I'd call the result a "range delegation". The issue I have is the ability to lock (i.e. get a delegation for) the layout for a given region without getting recalled when the data changes. I don't see a need for the reverse (i.e. a lock on the data without getting recalled when the layout changes).
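The distinction drawn above (an overwrite changes the data but not the layout, so layout holders should not be recalled by writes) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the StripedLayout and FileState names are hypothetical, striping is assumed to be simple round-robin, and "256K per stripe" is read as a 256 KiB stripe unit per data server.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StripedLayout:
    servers: List[str]   # data servers, in stripe order
    stripe_unit: int     # bytes written to one server before moving on

    def locate(self, offset: int) -> Tuple[str, int]:
        """Map a file offset to (data server, offset within that server's part)."""
        stripe_no = offset // self.stripe_unit
        server = self.servers[stripe_no % len(self.servers)]
        local = ((stripe_no // len(self.servers)) * self.stripe_unit
                 + offset % self.stripe_unit)
        return server, local


@dataclass
class FileState:
    layout: StripedLayout
    layout_holders: Set[str] = field(default_factory=set)
    # client -> (start, end) byte range it holds a data delegation on
    data_holders: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    recalls: List[Tuple[str, str]] = field(default_factory=list)

    def write(self, writer: str, start: int, length: int) -> None:
        # An overwrite conflicts only with overlapping *data* delegations;
        # layout holders are left alone because the layout is unchanged.
        end = start + length
        for client, (s, e) in list(self.data_holders.items()):
            if client != writer and s < end and start < e:
                self.recalls.append(("data", client))
                del self.data_holders[client]

    def restripe(self, new_layout: StripedLayout) -> None:
        # The rare case: the layout really changes, so every layout
        # holder is recalled before the new layout takes effect.
        for client in sorted(self.layout_holders):
            self.recalls.append(("layout", client))
        self.layout_holders.clear()
        self.layout = new_layout
```

With the 64-server example, StripedLayout([f"ds{i}" for i in range(64)], 256 * 1024).locate(offset) gives the same answer for as long as the file is not restriped, which is why clients can keep using the layout they obtained at first access no matter how many writes occur.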
When we get to the detailed specification, we'll see if it turns out better for these (the data lock/delegation and the layout lock/delegation) to be conceptually independent or assembled into a hierarchy in which the don't-change-the-data-or-layout lock/delegation is stronger than the don't-change-the-layout lock/delegation.

-----Original Message-----
From: William A.(Andy) Adamson [mailto:andros@citi.umich.edu]
Sent: Monday, January 05, 2004 1:28 PM
To: pnfs-ops@yahoogroups.com
Cc: pnfs-reqs@yahoogroups.com; andros@citi.umich.edu
Subject: [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range locking

> Andy Adamson wrote:
> > the discussion of byte-range delegations and cache consistency provoked this
> > thought: why not use existing mandatory byte-range locking?
>
> > the client opens a file, requests a (mandatory) lock on the region of the file
> > it's interested in. the resultant lock stateid is passed as an argument to the
> > READ/WRITE_IND request. we can require a mandatory lock stateid prior to
> > handing out layout maps for direct i/o. the layout map is 'good' only for as
> > long as the byte-range lock.
>
> One problem is that there is no way for the client to specify that he wants
> a mandatory (as opposed to advisory) byte-range lock, he just asks for one
> and the server gives him the type of byte-range lock that the server is giving out
> for that fs. So, if you did that, applications that relied on the semantics
> (or lack of semantics) of advisory byte-range locks would break.

From 3530

5.11.5. Mode Attribute
......
Note that in UNIX, if a file has the MODE4_SGID bit set and no MODE4_XGRP bit set, then READ and WRITE must use mandatory file locking.

so for unix, there is a way to specify mandatory vs. advisory locking. since this is also in 3530:

8. File Locking and Share Reservations
.....
These mechanisms can implement policy ranging from advisory only locking to full mandatory locking.
adding a flag to a LOCK/T/U to indicate mandatory locking vs. advisory is within reason.

> Another issue is that while you say "'good' only for as long as the byte-range
> lock", the results of doing this are that the layout map and the data will
> be fixed for at least as long as the byte-range lock exists, i.e. sometimes
> too long. If I'm going to be reading directly from the data server, then
> I want the layout to stay constant for a long time, or at least I don't want
> to be forced to repeatedly get locks for small areas of the layout. The
> obvious (and desirable) thing for me to do is to get a shared lock for the
> whole file so the layout cannot change, but if we combine changes of layout
> and changes of data under a single sort of lock, mandatory byte-range locks
> in this case, we have stopped anybody writing in the file for a very long
> time, i.e. essentially forever since my lease will normally be continually
> renewed.
>
> When you combine a guarantee that the layout will not change with a guarantee
> that the data will not change, in such a way that they can't be separated,
> you artificially increase the amount of conflicts, in many cases to an
> unacceptable level.

perhaps i'm missing something, but isn't it the case that the layout of the data and the ability to access the data are totally bound together? the layout changes due to writes and appends (other??). if the layout changes, stale layout maps are not only no longer any good, they can lead to data corruption. it seems to me that the guarantee that the layout won't change is bound to the guarantee that the data won't change. i can't think of any conflicts such as you mention - could you give some examples?

> When you have a delegation model, the problem is
> excessive recalls, while when you have a locking model the problem is that
> some applications will slow to a crawl/halt.
>
> > the mandatory lock protects the layout, so no need for layout delegations.
> > mandatory locking also allows the client to cache and operate locally on the
> > locked data region with cache consistency guarantees.
>
> If you are going to be doing some local operation, then short-term mandatory
> byte-range locking can help you. If you need to do a lock/fetch/update/write/unlock
> cycle on a record, this is the ticket (and in v4 lock/fetch and write/unlock
> can be COMPOUNDs :-). The record you hold while updating can be considered
> cached for that brief period. If, however, you are caching data generally,
> i.e. for a period outside the range of a short operation sequence, you are
> going to need something that is delegation-like, in that if I have the
> cached data and want to keep it until there is some reason to get rid of
> it, i.e. it is LRU'd out or there is a conflict, then I have to have some
> way of finding out that there is a conflict. Delegations do that via a
> recall and one can imagine it being done other ways. But the mandatory
> lock model is that I have a lock because I need it and so there is no
> provision to tell me that someone else has a conflict. The logic is that
> he will wait until I give the lock up, and waiting for the cached data to
> be LRU'd is going to be too long in most cases.
>

i agree that the ability for the server to recall is a required feature. i'm simply suggesting that mandatory locking may have more features in common with what we need for pnfs than delegations, and that we could extend the existing mandatory byte-range locking model with fewer changes than extending the existing delegation model. so, how about extending the mandatory locking model with a recall mechanism?

>
> > we already have the byte-range locking code written.
>
> I only have advisory byte-range locking code written. Who has v4 mandatory
> byte-range locking implemented?
>
> > so how far does this get
> > us? does it make sense to start with the locking code instead of the
> > delegation as far as extensions?
> > I think if we define some form of byte-range delegations (at least for data > and maybe for layout as well), there is going to be lots of code sharing with > an existing mandatory byte-range locking implementation. The data structures > and many of the interfaces are going to be the same. and this is really why i brought this up. a new lock type that has the features we desire (e.g. a recall mechanism) makes sense to me. > The difference is going > to be what you do about conflicts. Instead of saying to the second claimant, > "You snoozed so you lose", in some cases you have to be prepared to recall the > delegation so that, for example, an otherwise unexceptionable write can proceed. Yahoo! Groups Links To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/ To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com Your use of Yahoo! Groups is subject to: http://docs.yahoo.com/info/terms/ From black_david@emc.com Mon Jan 05 16:39:00 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 13162 invoked from network); 6 Jan 2004 00:38:58 -0000 Received: from unknown (66.218.66.216) by m15.grp.scd.yahoo.com with QMQP; 6 Jan 2004 00:38:58 -0000 Received: from unknown (HELO MAHO3MSX2.corp.emc.com) (128.221.11.32) by mta1.grp.scd.yahoo.com with SMTP; 6 Jan 2004 00:38:58 -0000 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Mon, 5 Jan 2004 19:38:57 -0500 Message-ID: To: pnfs-reqs@yahoogroups.com, pnfs-ops@yahoogroups.com Date: Mon, 5 Jan 2004 19:38:57 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.11.32 From: black_david@emc.com Subject: RE: [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range l ocking X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 > > Andy Adamson wrote: > > > the discussion of byte-range delegations and cache 
consistancy provoked this > > > thought: why not use existing mandatory byte-range locking? The "existing" locking cannot be reused - it has to be a new type of locking that might share some operations with the existing locking, i.e., > and this is really why i brought this up. a new lock type that has the > features we desire (e.g. a recall mechanism) makes sense to me. Keep in mind that what's required is significantly more than locking. For an example, take a look at FMP_Flush in the uploaded FMP spec to see the things that may need to be done when releasing a write lock. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From garth@panasas.com Tue Jan 06 20:31:21 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 60427 invoked from network); 7 Jan 2004 04:31:19 -0000 Received: from unknown (66.218.66.218) by m3.grp.scd.yahoo.com with QMQP; 7 Jan 2004 04:31:19 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 7 Jan 2004 04:31:19 -0000 Received: from [172.17.19.50] ([172.17.19.50]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYGWFA; Tue, 6 Jan 2004 23:31:18 -0500 Mime-Version: 1.0 (Apple Message framework v609) Content-Transfer-Encoding: 7bit Message-Id: <54AEC7B6-40CA-11D8-B7B5-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Tue, 6 Jan 2004 23:31:15 -0500 X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Announcing a weekly pNFS requirements concall X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson While our mailing lists 
are seeing a good flow of good comments, the timeline we have set for ourselves, to give the IETF something in early Feb, is short. So I've set up a weekly conference call for an hour, for all that can make it. Notes from these calls will go out to the Yahoo group reflector for those that can't make it. Beginning this Friday, Jan 9, 12-1pm EST, hosted by Panasas. Contact garth gibson if you would like to participate and do not know the dial in numbers. Thanks garth From julian_satran@il.ibm.com Fri Jan 09 23:04:02 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 17499 invoked from network); 10 Jan 2004 07:04:01 -0000 Received: from unknown (66.218.66.167) by m14.grp.scd.yahoo.com with QMQP; 10 Jan 2004 07:04:01 -0000 Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta6.grp.scd.yahoo.com with SMTP; 10 Jan 2004 07:04:00 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180]) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0A73vHI118250; Sat, 10 Jan 2004 07:03:57 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0A73tKG278370; Sat, 10 Jan 2004 08:03:56 +0100 In-Reply-To: <20031229201109.A223D20806@citi.umich.edu> To: pnfs-ops@yahoogroups.com Cc: andros@citi.umich.edu, pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Sat, 10 Jan 2004 09:03:54 +0200 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 10/01/2004 09:03:56, Serialize complete at 10/01/2004 09:03:56 Content-Type: multipart/alternative; boundary="=_alternative 0023E6B2C2256E17_=" X-eGroups-Remote-IP: 195.212.29.152 From: Julian Satran Subject: Re: [pnfs-ops] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; 
u=64714603

It looks to me, at least, like a valid option. I think that the argument against it has to do with revocation and data access. With NFS, data accesses go through the NFS server, and a server that has revoked a lock will not let the client access data. With the new scheme revocation has to be explicit, as the client is accessing data on its own. Delegation expresses this "new reality" better. But perhaps for layout what is needed is a combination of lock-stateID and delegation.

Julo

"William A.(Andy) Adamson"
29/12/2003 22:11
Please respond to pnfs-ops@yahoogroups.com
To pnfs-ops@yahoogroups.com
cc pnfs-reqs@yahoogroups.com, andros@citi.umich.edu
Subject [pnfs-ops] why not use mandatory byte-range locking

the discussion of byte-range delegations and cache consistency provoked this thought: why not use existing mandatory byte-range locking?

the client opens a file, requests a (mandatory) lock on the region of the file it's interested in. the resultant lock stateid is passed as an argument to the READ/WRITE_IND request. we can require a mandatory lock stateid prior to handing out layout maps for direct i/o. the layout map is 'good' only for as long as the byte-range lock.

the mandatory lock protects the layout, so no need for layout delegations. mandatory locking also allows the client to cache and operate locally on the locked data region with cache consistency guarantees.

we already have the byte-range locking code written. so how far does this get us? does it make sense to start with the locking code instead of the delegation as far as extensions?

-->Andy

From garth@panasas.com Wed Jan 14 16:16:12 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 5528 invoked from network); 15 Jan 2004 00:16:11 -0000 Received: from unknown (66.218.66.218) by m4.grp.scd.yahoo.com with QMQP; 15 Jan 2004 00:16:11 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 15 Jan 2004 00:16:11 -0000 Received: from [172.17.133.59] ([172.17.133.59]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYHVN9; Wed, 14 Jan 2004 19:16:09 -0500 Mime-Version: 1.0 (Apple Message framework v609) In-Reply-To: References: Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <037B3245-46F0-11D8-AC67-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Date: Wed, 14 Jan 2004 16:16:07 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] Announcing a weekly pNFS requirements concall X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Yes. As agreed in the last call, we moved the meeting time to Thursday 11-12 EST, 8-9 PST at the same number. Contact me if you do not have the number.

Sorry for the late reminder. I'll get the notes from the last meeting out this afternoon.

garth

On Jan 14, 2004, at 6:45 AM, Julian Satran wrote:

>
> Do we have a call this week? Julo

> From: Garth Gibson
> Date: January 6, 2004 8:31:15 PM PST
> To: pnfs-reqs@yahoogroups.com
> Subject: [pnfs-reqs] Announcing a weekly pNFS requirements concall
> Reply-To: pnfs-reqs@yahoogroups.com
>
> While our mailing lists are seeing a good flow of good comments, the
> timeline we have set for ourselves, to give the IETF something in early
> Feb, is short. So I've set up a weekly conference call for an hour,
> for all that can make it.
> Notes from these calls will go out to the
> Yahoo group reflector for those that can't make it.
>
> Beginning this Friday, Jan 9, 12-1pm EST, hosted by Panasas. Contact
> garth gibson if you would like to participate and do not know the dial
> in numbers.
>
> Thanks
> garth

From garth@panasas.com Wed Jan 14 23:19:17 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 35625 invoked from network); 15 Jan 2004 07:19:16 -0000 Received: from unknown (66.218.66.167) by m5.grp.scd.yahoo.com with QMQP; 15 Jan 2004 07:19:16 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 15 Jan 2004 07:19:16 -0000 Received: from [172.17.133.59] ([172.17.133.59]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYHWJ2; Thu, 15 Jan 2004 02:19:11 -0500 Mime-Version: 1.0 (Apple Message framework v609) Content-Transfer-Encoding: 7bit Message-Id: <1B419D86-472B-11D8-AC67-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed To: pnfs-reqs@yahoogroups.com Date: Wed, 14 Jan 2004 23:19:07 -0800 X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: pNFS requirements concall 2004-01-09 12-1 EST notes X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

pNFS requirements concall 2004-01-09 12-1 EST notes

Participants
----------------------------------------------------
David Black, EMC
Tyce McLarty, LLNL
Dave Noveck, Tom Talpey, Peter Corbett, NetApp
Julian Satran, IBM
Andy Adamson, CITI
Garth Gibson, Benny Halevy, Panasas

Garth chaired, Benny took notes.

Logistics
----------------------------------------------------
A) This meeting time, Fri 12-1 EST is during Julian's weekend in Israel. We agreed to move to Thursday 11-12 EST beginning Thurs Jan 14.
B) The next face-to-face meeting is proposed to be Wed Mar 31 8:30-12 in the morning, immediately before the USENIX FAST conference (www.usenix.org/events/fast04) in the same hotel (Grand Hyatt, 345 Stockton Street, San Francisco, CA 94108, 415.398.1234, 1.800.633.7313, http://grandsanfrancisco.hyatt.com/property/index.jhtml). We are seeking USENIX help for setting this up (Peter Honeyman is asking USENIX for help). FAST starts at 2pm Wed. Its sister conference, NSDI (Networked Systems Design and Implementation), is being held in the same hotel Mon morning until Wed noon. It is proposed to hold this meeting as a BOF, open and advertised in one or both conferences.

Requirements group action items
----------------------------------------------------
Garth asks all contributors to strive to include use cases, application areas and other enduser oriented justification in all requirements deliverables.

1) Problem statement Informational Internet Draft as a vehicle to communicate to IETF
- Timeline: we would like our topic to be considered for an agenda at Seoul Feb 29 - March 5 59th IETF meeting
- Deadline: IETF deadline is approximately Feb 7; working backwards and allowing time for communication and errors, we plan to set a within-the-group deadline for last comments on the document end of day Jan 30
- Purpose: to explain the problem we seek to fix, why it should be fixed and why it should be done in the IETF
- Example: "RDMA over IP Problem Statement", draft-ietf-rddp-problem-statement-02, by Allyn Romanow, Jeff Mogul, Tom Talpey, Stephen Bailey with help from Jeff Chase and Jim Pinkerton (ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-ietf-rddp-problem-statement-02.txt)
- Audience: skill set like ours, background more varied, expect to be persuaded; although the academic citation list of the example is much more than we need, we should add about three to five pages of good content to the boilerplate internet draft document structure

See discussion below.
2) Elevator pitch for general external communication
- Purpose: ensure that the members of our community attending the Seoul IETF are equipped with the essentials (at least David Black, Tom Talpey and Julian Satran are attending Seoul)
- Deadline: Feb 19, first day of Connectathon 2004 (Feb 19-26, 180 Park Ave., San Jose, CA 95113, www.connectathon.org), which I guess creates another opportunity for a (subgroup?) face-to-face

Garth proposed, subset from his NEPS position paper: "striping all the way to the clients; providing scalable bandwidth, scalable capacity, load balancing and capacity balancing for federated servers and consolidated storage." Additionally, there are a number of products from a good range of companies offering proprietary solutions in this area, some of which employ/extend/supplant the IETF's NFS and iSCSI, so an IETF effort building on or working within NFS and iSCSI seems very natural and compelling.

Some comments:
Black: striping is not inherent, prefers "direct access"
Garth: Direct access is a loaded term for NFS and IETF: for us it is moving data from multiple servers to one client without proxying it through one server network endpoint, while direct access for DAFS and RDMA has to do with eliminating copies in the memory system of any client or server.
Satran: you can give up load balancing and capacity balancing since these are means and not ends.
Black: don't want to focus on one sentence, multiple sentences may prove better.
Julian: what about security?
Black: would not bring it here in the elevator, but in the problem statement section on "why IETF"
Multiple folks: Scalable capacity and scalable bandwidth are the core ideas.
With respect to "federated storage and consolidated storage", federated and consolidated are loaded words
Black: propose something more general as scalable storage systems
Multiple: given the many uses of scalable, probably just "storage systems"

3) Slide deck for face-to-face external communications
- Purpose: if members of our community attending the Seoul IETF have a chance to present to the NFS (or other) working groups, we should equip them with a presentation of the problem statement document; this will also be useful at the FAST BOF
- Example: ROI-Problem-Scenario-0302.ppt, presented by David Black at the second IETF BOF on RDMA as a synopsis of the corresponding problem statement, particularly because the first BOF had not achieved its goals
- Deadline: Feb 19, same as elevator pitch

4) Draft requirements document
- Purpose: a working document to capture and justify the group's decisions on what to do and what not to do
- Timeline: not clear yet

Most of the discussions going on in the pNFS-reqs mailing list are addressing issues that belong in this document. This is great, but should not be confused with the problem statement. The problem statement is for external communication, justifying the effort to standardize something, summarizing the commonality achieved at the NEPS workshop; while the requirements document is for resolution of issues, and may not be complete until the standard draft is pretty much fleshed out.

Discussion on Problem Statement
----------------------------------------------------
Garth: Let's start with comments on the beginnings of a problem statement that Garth (Dec 10) and Gary Grider (Dec 12) contributed to this group.

Garth: Here is what I put out to start the conversation, based on nearly copying the RDDP problem statement abstract and table of contents.
> A possible pNFS problem statement abstract:
> This draft addresses an NFS-based solution to the problem of high
> system costs due to store-and-forward copying of storage data from
> storage devices through a file server mount point to high-speed
> end-hosts that also have connectivity to source storage devices. The
> problem is due to the high cost of funneling large storage bandwidths
> through NFS on single IP addresses, and it can be substantially
> improved using "out-of-band access." The high cost of high-bandwidth
> NFS servers has limited the use of NFS in data centers especially
> where high storage bandwidths are required and numerous storage
> serving devices are already networked together.
>
> A pNFS table of contents might be:
> 1. Introduction
> 2. The high cost of high bandwidth storage through NFS
> 2.1 Out-of-band access decreases bandwidth requirements in central
> file servers
> 3. Application level routing of storage data packets is the root cause
> of the problem
> 4. Storage bandwidth bottlenecks are problematic for many key file
> system applications
> 5. Out-of-band access techniques
> 5.1 A conceptual framework: pNFS delegated maps for distributing files
> over SBC, OSD and NFS storage subsystems
> 6. Security considerations
> 7. Acknowledgements
> 8. Informative references

Garth: I started with the RDDP problem statement. RDDP affected the design of servers and therefore affected their costs. So they pitched a cost problem with the design of communication protocol going forward in time.
Garth: "Store and forward copying through the ip address that you mounted" problem is a cost problem.
Talpey: I wouldn't lead with cost. RDDP is about system overhead.
Garth: by analogy the system overhead we are avoiding is forwarding all the data packets through a single IP address, a single server endpoint (NFS mount)
Tom: elevator pitch for RDDP: data copy costs cycles and bus bandwidth, avoiding data copy scales servers.
In our case we have a bottleneck moving the data through one point (Tom recommends avoiding the term "single IP address")
Garth: RDDP references Moore's law. For us client and server machines both follow Moore's law, but the rate of growth of the number of clients making demands on the servers is causing demand to exceed server bandwidth.
Tom: be careful with mentioning Moore's law
Corbett: missing one point of "the clients are focusing on a narrow part of the dataset"
Garth: The cluster phenomena drives the demand way ahead of the server supply. [people liked the term "cluster phenomena"]
Tom: the fundamental thing is scaling access to a single object.
Garth: the market for that may be too small.
Tom: it was said that "RDDP is good only for databases" so, I agree, be careful about narrowing down the scope of applicability.
Talpey: there are existing solutions like trunking the NFS protocol to achieve scalable bandwidth to multiple files but you still have the single server issue. the real problem is achieving scalable bandwidth to a single file.
Garth: this seems too narrow; as Corbett's NEPS presentation argues, the same problem exists for "close" collections of files like a single directory even if no file is itself spread over multiple servers -- SNIA talks about virtualizing a file (spreading the parts of one file) and virtualizing a file system (spreading the parts of one volume) while preserving the manageability implied by one file server
Tom: "out of band" is a bad label because it already means certain things.
Garth: I didn't use "direct access" because of prior and different definition by DAFS -- We need a new word for what we're proposing.
Multiple: separation of data and control is good -- maybe some variant of this gives us the words we need: separated data path, parallel data path From Brian.Pawlowski@netapp.com Thu Jan 15 03:55:52 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 3528 invoked from network); 15 Jan 2004 11:55:50 -0000 Received: from unknown (66.218.66.172) by m14.grp.scd.yahoo.com with QMQP; 15 Jan 2004 11:55:50 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 15 Jan 2004 11:55:50 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0FBtnKw023061 for ; Thu, 15 Jan 2004 03:55:49 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0FBtnSR002121 for ; Thu, 15 Jan 2004 03:55:49 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.6+Sun/8.11.6) id i0FBtnl04249 for pnfs-reqs@yahoogroups.com; Thu, 15 Jan 2004 03:55:49 -0800 (PST) Message-Id: <200401151155.i0FBtnl04249@tooting-fe.eng.netapp.com> In-Reply-To: <1B419D86-472B-11D8-AC67-000A95A94F04@panasas.com> from Garth Gibson at "Jan 14, 4 11:19:07 pm" To: pnfs-reqs@yahoogroups.com Date: Thu, 15 Jan 2004 03:55:48 -0800 (PST) X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: Re: [pnfs-reqs] pNFS requirements concall 2004-01-09 12-1 EST notes X-Yahoo-Group-Post: member; u=169504717 ADVERTISEMENT > A) This meeting time, Fri 12-1 EST is during Julian's weekend in > Israel. We agreed to move to Thursday 11-12 EST beginning Thurs Jan > 14. You meant the other Thursday Jan 14. 
From julian_satran@il.ibm.com Mon Jan 19 00:48:33 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 72665 invoked from network); 19 Jan 2004 08:48:31 -0000 Received: from unknown (66.218.66.172) by m17.grp.scd.yahoo.com with QMQP; 19 Jan 2004 08:48:31 -0000 Received: from unknown (HELO mtagate5.de.ibm.com) (195.212.29.154) by mta4.grp.scd.yahoo.com with SMTP; 19 Jan 2004 08:48:30 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180]) by mtagate5.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0J8mSe2117726; Mon, 19 Jan 2004 08:48:28 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0J8mRJY181632; Mon, 19 Jan 2004 09:48:27 +0100 In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: pNFS Operations , pNFS Requirements MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 19 Jan 2004 00:48:22 -0800 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 19/01/2004 10:48:27, Serialize complete at 19/01/2004 10:48:27 Content-Type: multipart/alternative; boundary="=_alternative 0061A2A0C2256E1F_=" X-eGroups-Remote-IP: 195.212.29.154 From: Julian Satran Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran Garth Gibson wrote on 19/12/2003 00:37:50: > Thanks Dave. I agree. Lets refine the proxying issues: Legacy, > strict, functional and recovery proxying. > > [1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the > full NFS-v4.0 or NFS-v4.1 protocol. > > I think Dave has given the case for this strongly. I do not see any > case against this. 
>
> -------------------------------------------
>
> [1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to
> execute exactly the wire packet that an NFS-v4.x client might have sent
> to a SBC/OSD/NFS data server?
>
> This captures the notion that a metadata server must also be a
> store-and-forward proxy for every data server it manages. It requires
> NFS-v4.x servers implement SCSI SBC over FC, if their data servers
> implement it; and the same for objects and files.
>
> This only makes sense to me for NFS data servers. And it is not what I
> intended in my prior summary, although it is a relevant question. I
> would say that pNFS requirements should not require Strict Proxying.
>

Agree

> -------------------------------------------
>
> [1.1.2 Functional proxying]: a file transformation achievable by an
> NFS-v4.x client using a set of data server operations must be
> equivalently achievable using a (probably different) set of NFS-v4.x
> server operations
>
> This is the topic I intended to address in the last email. I believe
> Dave is arguing that even with metadata servers that do not have access
> to their data servers, the vendor of such a metadata server can
> construct a proprietary protocol for the metadata server to (strict)
> proxy data server accesses through clients that do have data server
> access. I am not comfortable making up a counter to this, so I exhort
> those that want a metadata server without data server access to speak
> up if they disagree.
>
> > On one hand, some suggest that a set of out-of-band clients should not
> > have to also have a data path through the NFSv4 metadata server. One
> > reason is that customers may not tolerate the large variability in
> > performance between out-of-band (when the going is good) and in-band
> > (when the server chooses not to grant or to take away a delegation)
> > accesses.
> > Another reason, and I paraphrase someone else here, is that
> > it is possible to construct out-of-band metadata servers that do not
> > have access to the data servers except through the clients -- I
> > encourage the source of this scenario to replace my paraphrasing with
> > a correct use case, because I find it odd to design for file servers
> > that do not have access to the data servers.
> >
> > On the other hand, others have suggested that any access or work that
> > a client can do out-of-band should be possible with one or more
> > commands applied to the metadata server's data path. This has been
> > proposed for coping with recalled delegations, including concurrent
> > writing by multiple clients; retry after client access errors,
> > provided adequate idempotency of out-of-band operations; and many
> > alternative implementations of out-of-band clients, including legacy
> > clients that use out-of-band never or rarely.
> >
> > I think this is a topic that should be argued one way or the other in
> > the requirements document. Use cases and examples in other systems
> > would be best.
>

I guess that proxying through a client should be recommended but not mandated. We might then want to find out how to do it while respecting the restrictions that removed the metadata server from the path.

> -------------------------------------------
>
> [1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x
> client using a set of data server operations, but interrupted before
> completion, must be equivalently completable using a (probably
> different) set of NFS-v4.x server operations
>
> Some have suggested that having this property will greatly simplify the
> amount of spec that is devoted to out-of-band error recovery. Others
> have commented that a simple way to achieve this would be to require
> that all operations on data servers should be idempotent.
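The idempotency point can be illustrated with a minimal sketch. It assumes a toy in-memory data server; `DataServer`, `write_at`, `append` and `replay` are hypothetical names used only for illustration, not part of any NFS, SBC or OSD interface.

```python
class DataServer:
    """Toy in-memory stand-in for one object on a data server."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, payload):
        """Idempotent: repeating the call leaves the same bytes in place."""
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload

    def append(self, payload):
        """Not idempotent: a replayed call duplicates the payload."""
        self.data.extend(payload)


def replay(op, *args):
    """Model recovery: the operation is issued again because nobody knows
    whether the interrupted first attempt took effect."""
    op(*args)
    op(*args)
```

Replaying `write_at` after an interruption is harmless, so whoever completes the work (the client or the metadata server) need not know how far the first attempt got. Replaying `append` is not, which is why a blanket idempotency requirement would simplify recovery.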
> > ------------------------------------------- > > garth > > > On Thursday, December 18, 2003, at 12:21 PM, Noveck, Dave wrote: > > > Good summary. > > > > I want to address the "proxying" issue. > > > >> [1.1 Proxying]: Operations/work that can only be done out-of-band vs > >> alternative access through the NFSv4 server for all operations/work > > > > If you are talking about operations in the extension (let's call it > > NFS-v4.x), that are not in the previous minor version (let's assume > > that is nfs-v4.1), then you have a choice of whether these are > > supported > > for access through the server, or only for access by the client with > > the > > data server. Let's call this the issue of proxying in the strict > > sense. > > > > There is another issue that people are calling "proxying" but is really > > logically distinct. That is the issue of access by the previous minor > > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > > separate data servers and they need to be able to work. End of story. > > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > > have a minor version without proxying. You don't have a minor version > > at all. I believe the working group is never going to accept that. > > Even if I'm wrong and you can get the working group to accept that, > > it is going to be very contentious and thus take up a lot of time. > > Anybody, who really wants to go down this path should seriously > > consider > > the trade-off between supporting something they find objectionable and > > getting a standard a lot later, if at all. > > > >> On one hand, some suggest that a set of out-of-band clients should not > >> have to also have a data path through the NFSv4 metadata server. One > >> reason is that customers may not tolerate the large variability in > >> performance between out-of-band (when the going is good) and in-band > >> (when the server chooses not to grant or to take away a delegation) > >> accesses. 
> > > > Then such customers will use clients that access things out-of-band > > whenever possible, and servers that never refuse to give out layout > > delegations. You have a number of quality-of-implementations issues > > for v4.x clients and servers. If a particular client only supports > > access via v4.0, then performance will suck, and the working group > > will understand that, but it won't accept not being able to use > > v4.0 at all. The customer is going to be motivated to upgrade his > > clients for those that need high-performance access, but he may be > > OK with some clients using v4.0 for a long time, depending on the > > particular performance those clients need. (And some will want v2/v3 > > access but that is a matter that the working group has no say about). > > > >> Another reason, and I paraphrase someone else here, is that > >> it is possible to construct out-of-band metadata servers that do not > >> have access to the data servers except through the clients -- I > >> encourage the source of this scenario to replace my paraphrasing with > >> a > >> correct use case, because I find it odd to design for file servers > >> that > >> do not have access to the data servers. > > > > So let's grant that it is possible (and we'll pass over the issue of > > whether it is desirable, and in fact so desirable that one is willing > > to > > not get a standard and or get it much later). > > > > So we have a metadata server and it, for whatever reason, does not have > > access to the data servers. However, by hypothesis, there are machines > > (e.g. clients), that can communicate with both. So, if one has such an > > architecture, then one can take such a machine, give it a > > communication path > > to the meta-data server and the data server and have the meta-data > > server > > transfer v4.0 READ requests to it, let it read the data from the data > > server and send it back to the meta-data server who send it back to the > > original requestor. 
Is that a very good solution? No. Is it likely > > to be performant? No. Will it satisfy any particular customer? I > > don't > > know and that is the implementer's business decision. Will it satisfy > > the hypothetical customer who doesn't care about v4.0 access? Clearly. > > Will it satisfy the v4 working group? Yes, because they are not in the > > business of telling you how performant v4.0 access has got to be. > > > >> On the other hand, others have suggested that any access or work that > >> a > >> client can do out-of-band should be possible with one or more commands > >> applied to the metadata server's data path. This has been proposed > >> for > >> coping with recalled delegations, including concurrent writing by > >> multiple clients; retry after client access errors, provided adequate > >> idempotency of out-of-band operations; and many alternative > >> implementations of out-of-band clients, including legacy clients that > >> use out-of-band never or rarely. > > > > This effort is going to take a while, but if we manage it correctly, it > > is not going to take so long that v3 clients are going to be rare > > things, > > and they have to be supported. But v3 clients are not an issue for the > > working group. V4.0 clients are and they will be around and you will > > have to support them, and I believe the working group is not going to > > be disposed to cut you a lot of slack on this issue (and I don't see > > why it should). > > > >> I think this is a topic that should be argued one way or the other in > >> the requirements document. Use cases and examples in other systems > >> would be best. > > > > I think the requirement should be that this work should be done as a > > set of extensions to nfs-v4 delivered as a v4 minor version. 
If there > > is some feature/requirement that conflicts with that model (and it is a > > pretty flexible one), then you have to think long and hard before > > deciding > > that that requirement is more important than this basic deivery > > vehicle, > > because it seems to me that it is, in almost all respects, the ideal > > way > > to make this sort of technology available for widespread use.
From julian_satran@il.ibm.com Mon Jan 19 00:48:44 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 73823 invoked from network); 19 Jan 2004 08:48:44 -0000 Received: from unknown (66.218.66.217) by m13.grp.scd.yahoo.com with QMQP; 19 Jan 2004 08:48:44 -0000 Received: from unknown (HELO mtagate7.de.ibm.com) (195.212.29.156) by mta2.grp.scd.yahoo.com with SMTP; 19 Jan 2004 08:48:42 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180]) by mtagate7.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0J8mVRm094584; Mon, 19 Jan 2004 08:48:31 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0J8mTJY231014; Mon, 19 Jan 2004 09:48:30 +0100 In-Reply-To: <20040105182738.07168207D3@citi.umich.edu> To: pnfs-reqs@yahoogroups.com Cc: andros@citi.umich.edu, pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 19 Jan 2004 00:48:25 -0800 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 19/01/2004 10:48:29, Serialize complete at 19/01/2004 10:48:29 Content-Type: multipart/alternative; boundary="=_alternative 006221DBC2256E1F_=" X-eGroups-Remote-IP: 195.212.29.156 From: Julian Satran Subject: Re: [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range locking X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran I think that Andy has strong arguments.
Julo

"William A.(Andy) Adamson"
05/01/2004 20:27
Please respond to pnfs-reqs
To pnfs-ops@yahoogroups.com
cc pnfs-reqs@yahoogroups.com, andros@citi.umich.edu
Subject [pnfs-reqs] Re: [pnfs-ops] why not use mandatory byte-range locking

> Andy Adamson wrote:
> > the discussion of byte-range delegations and cache consistency provoked this
> > thought: why not use existing mandatory byte-range locking?
>
> > the client opens a file, requests a (mandatory) lock on the region of the file
> > it's interested in. the resultant lock stateid is passed as an argument to the
> > READ/WRITE_IND request. we can require a mandatory lock stateid prior to
> > handing out layout maps for direct i/o. the layout map is 'good' only for as
> > long as the byte-range lock.
>
> One problem is that there is no way for the client to specify that he wants
> a mandatory (as opposed to advisory) byte-range lock, he just asks for one
> and the server gives him the type of byte-range that server is giving out
> for that fs. So, if you did that, applications that relied on the semantics
> (or lack of semantics) of advisory byte-range locks would break.

From RFC 3530:

5.11.5. Mode Attribute
......
Note that in UNIX, if a file has the MODE4_SGID bit set and no MODE4_XGRP bit set, then READ and WRITE must use mandatory file locking.

so for unix, there is a way to specify mandatory vs. advisory locking. since this is also in 3530:

8. File Locking and Share Reservations
.....
These mechanisms can implement policy ranging from advisory only locking to full mandatory locking.

adding a flag to a LOCK/T/U to indicate mandatory vs. advisory locking is within reason.

> Another issue is that while you say "'good' only for as long as the byte-range
> lock", the results of doing this are that the layout map and the data will
> be fixed for at least as long as the byte-range lock exists, i.e. sometimes
> too long.
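The UNIX mode-bit rule quoted from RFC 3530 above can be written as a small check. This is a minimal sketch: the two constants carry the values RFC 3530 assigns to MODE4_SGID and MODE4_XGRP, while the helper name `uses_mandatory_locking` is made up for illustration.

```python
MODE4_SGID = 0x400  # set-group-ID on execution (value from RFC 3530)
MODE4_XGRP = 0x008  # execute permission for group (value from RFC 3530)


def uses_mandatory_locking(mode):
    """UNIX convention: SGID set and group-execute clear means mandatory locking."""
    return bool(mode & MODE4_SGID) and not (mode & MODE4_XGRP)
```

Note that this is a per-file property set through the mode attribute, not something the client can request on an individual LOCK call, which is the gap the proposed flag would fill.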
> If I'm going to be reading directly from the data server, then
> I want the layout to stay constant for a long time, or at least I don't want
> to be forced to repeatedly get locks for small areas of the layout. The
> obvious (and desirable) thing for me to do is to get a shared lock for the
> whole file so the layout cannot change, but if we combine changes of layout
> and changes of data under a single sort of lock, mandatory byte-range locks
> in this case, we have stopped anybody writing in the file for a very long
> time, i.e. essentially forever since my lease will normally be continually
> renewed.
>
> When you combine a guarantee that the layout will not change with a guarantee
> that the data will not change, in such a way that they can't be separated,
> you artificially increase the amount of conflicts, in many cases to an
> unacceptable level.

perhaps i'm missing something, but isn't it the case that the layout of the data and the ability to access the data are totally bound together? the layout changes due to writes and appends (other??). if the layout changes, stale layout maps are not only no longer any good, they can lead to data corruption. it seems to me that the guarantee that the layout won't change is bound to the guarantee that the data won't change. i can't think of any conflicts such as you mention - could you give some examples?

> When you have a delegation model, the problem is
> excessive recalls, while when you have a locking model the problem is that
> some applications will slow to a crawl/halt.
>
> > the mandatory lock protects the layout, so no need for layout delegations.
> > mandatory locking also allows the client to cache and operate locally on the
> > locked data region with cache consistency guarantees.
>
> If you are going to be doing some local operation, then short-term mandatory
> byte-range locking can help you.
> If you need to do a lock/fetch/update/write/unlock
> cycle on a record, this is the ticket (and in v4 lock/fetch and write/unlock
> can be COMPOUND's :-). The record you hold while updating can be considered
> cached for that brief period. If, however, you are caching data generally,
> i.e. for a period outside the range of a short operation sequence, you are
> going to need something that is delegation-like, in that if I have the
> cached data and want to keep it until there is some reason to get rid of
> it, i.e. it is LRU'd out or there is a conflict, then I have to have some
> way of finding out that there is a conflict. Delegations do that via a
> recall and one can imagine it being done other ways. But the mandatory
> lock model is that I have a lock because I need it and so there is no
> provision to tell me that someone else has a conflict. The logic is that
> he will wait until I give the lock up, and waiting for the cached data to
> be LRU'd is going to be too long in most cases.
>

i agree that the ability for the server to recall is a required feature. i'm simply suggesting that mandatory locking may have more features in common with what we need for pnfs than delegations, and that we could extend the existing mandatory byte-range locking model with fewer changes than extending the existing delegation model. so, how about extending the mandatory locking model with a recall mechanism?

>
> > we already have the byte-range locking code written.
>
> I only have advisory byte-range locking code written. Who has v4 mandatory
> byte-range locking implemented?
>
> > so how far does this get
> > us? does it make sense to start with the locking code instead of the
> > delegation as far as extensions?
>
> I think if we define some form of byte-range delegations (at least for data
> and maybe for layout as well), there is going to be lots of code sharing with
> an existing mandatory byte-range locking implementation.
The data structures > and many of the interfaces are going to be the same. and this is really why i brought this up. a new lock type that has the features we desire (e.g. a recall mechanism) makes sense to me. > The difference is going > to be what you do about conflicts. Instead of saying to the second claimant, > "You snoozed so you lose", in some cases you have to be prepared to recall the > delegation so that, for example, an otherwise unexceptionable write can proceed.
From julian_satran@il.ibm.com Mon Jan 19 00:49:47 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 8402 invoked from network); 19 Jan 2004 08:49:44 -0000 Received: from unknown (66.218.66.172) by m1.grp.scd.yahoo.com with QMQP; 19 Jan 2004 08:49:44 -0000 Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta4.grp.scd.yahoo.com with SMTP; 19 Jan 2004 08:49:43 -0000 Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196]) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0J8mPHI114860; Mon, 19 Jan 2004 08:48:25 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0J8mNmx112370; Mon, 19 Jan 2004 09:48:24 +0100 In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26,
2003 Message-ID: Date: Mon, 19 Jan 2004 00:48:19 -0800 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 19/01/2004 10:48:23, Serialize complete at 19/01/2004 10:48:23 Content-Type: multipart/alternative; boundary="=_alternative 0060D9BEC2256E1F_=" X-eGroups-Remote-IP: 195.212.29.152 From: Julian Satran Subject: Re: [pnfs-reqs] RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran "Noveck, Dave" wrote on 30/12/2003 22:08:02: > It seems legal to me but I'm guessing that there are others that would > think differently. > > I tend to think that it is not a good idea, though. There are going > to be operations which, by their nature, are better done through the > metadata server. A two-byte write which spans multiple data servers > is an example. Another is append-writes, which have been mentioned > (by whom I don't remember just now) as a desirable v4 extension, > assuming the data to be written is of reasonable size. In each case, > we may create appropriate caching/locking primitives to allow the > operation to be done without making any request of the metadata server > that is officially denominated an "IO" request. But can you really > argue that this will be the best way for the client to do such > operations? And does it really make sense to force clients to invest > the effort in terms of the code do such operations doing the IO with > the data server only, when the performance benefit of that is going to > be small, or zero, or negative? You may wind up making as many > requests of the meta-data server with the data-server-only approach. > It's just that they won't be IO operations (but instead locking and, > in the case of append, getattr operations). > This can be argued both ways. For applications that share little and build files by append (all transaction loggers) doing them on the client is a distinct advantage. 
And so it is for object storage that supports append.

> In complicated protocols (and v4 is a complicated protocol and is
> getting more complicated), there are going to be multiple ways of
> doing the same thing, which are going to differ in their performance
> characteristics. An organization can be reasonably concerned about
> clients making the wrong choice, just as it is concerned about clients
> that are making excessive resource demands for other reasons. There
> are two issues that I am worried about in taking such a drastic
> approach as simply refusing to support a valid piece of the protocol,
> even if that choice is made by the server administrator. The first is
> that determining the better choice depends on a lot of variables and
> that a simple formula governing an option (e.g. "IO through the
> metadata server is bad") is unlikely to completely match reality. The
> second is that I-don't-like-your-IO-request-so-you-lose is kind of a
> blunt instrument to deal with the problem.
>

I don't think this is a big issue or that the scenario I describe will be widely used, but with Object Storage you may not have (or may not need very often) a channel between the metadata server and the data servers. This partial access scheme may also be maintained in block environments or federated filers for various reasons (security may be one - you don't trust your administrator with all the data).

> If you have identified some set of bad client practices, you can find
> the clients doing them, report the appropriate statistics, even, if
> the issue is critical, artificially give such clients (or specific
> requests) bad performance in a way that doesn't hurt other clients
> (unless they are waiting for the first set to do something. Sigh!),
> by just delaying processing of their requests by a millisecond or two.
> That should be enough to preserve metadata-server bandwidth for more
> worthwhile purposes. If that's insufficiently discouraging, you can
> raise the delay.
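The delay-rather-than-reject idea in the paragraph above could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Throttle` class, its method names, the client ids and the 2 ms starting delay are hypothetical and do not describe any existing server.

```python
import time


class Throttle:
    """Delay, rather than reject, requests from clients flagged as misbehaving."""

    def __init__(self, delay_s=0.002, sleep=time.sleep):
        self.delay_s = delay_s  # assumed starting penalty of 2 ms
        self.flagged = set()    # ids of clients showing the bad practice
        self.sleep = sleep      # injectable so the policy can be tested

    def flag(self, client_id):
        self.flagged.add(client_id)

    def raise_delay(self, factor=2.0):
        """Make the penalty more discouraging."""
        self.delay_s *= factor

    def handle(self, client_id, request, serve):
        """Serve every valid request; flagged clients pay the delay first."""
        if client_id in self.flagged:
            self.sleep(self.delay_s)
        return serve(request)
```

The point of the design is that every valid request still succeeds, so the server stays protocol-compliant, and only the flagged clients see the penalty.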
> If you start rejecting requests because you would
> have done it differently, even if you are correct, you are on the road
> to creating your own sub-protocol, which is why this kind of thing is
> worrying, even if legal.
>
>
> -----Original Message-----
> From: Julian Satran [mailto:julian_satran@il.ibm.com]
> Sent: Monday, December 22, 2003 5:26 AM
> To: pnfs-ops@yahoogroups.com
> Cc: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com
> Subject: RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03
>
> Since I raised the issue of the metadata server not having access to
> all its data servers (or at least not with adequate bandwidth) I feel
> compelled to say that Dave's arguments about supporting 4.0 are
> compelling enough to make it mandatory. The open issue is if it is
> legal for a "compliant server" to have serving data disabled by a
> local administrative function (the old "must implement but may use").
> Otherwise an organization that wants to discourage use of data serving
> through the metadata server has very little it can do to enforce
> policy in a way that will not affect other clients (it may serve
> poorly but this still affects other clients).
>
> Julo
>
>
> "Noveck, Dave"
> 18/12/2003 19:21
>
> Please respond to
> pnfs-ops@yahoogroups.com
>
> To
>
> ,
>
> cc
>
> Subject
>
> RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03
>
>
>
>
> Good summary.
>
> I want to address the "proxying" issue.
>
> > [1.1 Proxying]: Operations/work that can only be done out-of-band vs
> > alternative access through the NFSv4 server for all operations/work
>
> If you are talking about operations in the extension (let's call it
> NFS-v4.x), that are not in the previous minor version (let's assume
> that is nfs-v4.1), then you have a choice of whether these are supported
> for access through the server, or only for access by the client with the
> data server. Let's call this the issue of proxying in the strict sense.
> > There is another issue that people are calling "proxying" but is really > logically distinct. That is the issue of access by the previous minor > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > separate data servers and they need to be able to work. End of story. > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > have a minor version without proxying. You don't have a minor version > at all. I believe the working group is never going to accept that. > Even if I'm wrong and you can get the working group to accept that, > it is going to be very contentious and thus take up a lot of time. > Anybody, who really wants to go down this path should seriously consider > the trade-off between supporting something they find objectionable and > getting a standard a lot later, if at all. > > > On one hand, some suggest that a set of out-of-band clients should not > > have to also have a data path through the NFSv4 metadata server. One > > reason is that customers may not tolerate the large variability in > > performance between out-of-band (when the going is good) and in-band > > (when the server chooses not to grant or to take away a delegation) > > accesses. > > Then such customers will use clients that access things out-of-band > whenever possible, and servers that never refuse to give out layout > delegations. You have a number of quality-of-implementations issues > for v4.x clients and servers. If a particular client only supports > access via v4.0, then performance will suck, and the working group > will understand that, but it won't accept not being able to use > v4.0 at all. The customer is going to be motivated to upgrade his > clients for those that need high-performance access, but he may be > OK with some clients using v4.0 for a long time, depending on the > particular performance those clients need. (And some will want v2/v3 > access but that is a matter that the working group has no say about). 
> > > Another reason, and I paraphrase someone else here, is that > > it is possible to construct out-of-band metadata servers that do not > > have access to the data servers except through the clients -- I > > encourage the source of this scenario to replace my paraphrasing with a > > correct use case, because I find it odd to design for file servers that > > do not have access to the data servers. > > So let's grant that it is possible (and we'll pass over the issue of > whether it is desirable, and in fact so desirable that one is willing to > not get a standard and or get it much later). > > So we have a metadata server and it, for whatever reason, does not have > access to the data servers. However, by hypothesis, there are machines > (e.g. clients), that can communicate with both. So, if one has such an > architecture, then one can take such a machine, give it a communication path > to the meta-data server and the data server and have the meta-data server > transfer v4.0 READ requests to it, let it read the data from the data > server and send it back to the meta-data server who send it back to the > original requestor. Is that a very good solution? No. Is it likely > to be performant? No. Will it satisfy any particular customer? I don't > know and that is the implementer's business decision. Will it satisfy > the hypothetical customer who doesn't care about v4.0 access? Clearly. > Will it satisfy the v4 working group? Yes, because they are not in the > business of telling you how performant v4.0 access has got to be. > > > On the other hand, others have suggested that any access or work that a > > client can do out-of-band should be possible with one or more commands > > applied to the metadata server's data path. 
This has been proposed for > > coping with recalled delegations, including concurrent writing by > > multiple clients; retry after client access errors, provided adequate > > idempotency of out-of-band operations; and many alternative > > implementations of out-of-band clients, including legacy clients that > > use out-of-band never or rarely. > > This effort is going to take a while, but if we manage it correctly, it > is not going to take so long that v3 clients are going to be rare things, > and they have to be supported. But v3 clients are not an issue for the > working group. V4.0 clients are and they will be around and you will > have to support them, and I believe the working group is not going to > be disposed to cut you a lot of slack on this issue (and I don't see > why it should). > > > I think this is a topic that should be argued one way or the other in > > the requirements document. Use cases and examples in other systems > > would be best. > > I think the requirement should be that this work should be done as a > set of extensions to nfs-v4 delivered as a v4 minor version. If there > is some feature/requirement that conflicts with that model (and it is a > pretty flexible one), then you have to think long and hard before deciding > that that requirement is more important than this basic deivery vehicle, > because it seems to me that it is, in almost all respects, the ideal way > to make this sort of technology available for widespread use.
> Yahoo! Groups Links > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. From pcorbett@netapp.com Mon Jan 19 11:52:20 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 13972 invoked from network); 19 Jan 2004 19:52:19 -0000 Received: from unknown (66.218.66.218) by m19.grp.scd.yahoo.com with QMQP; 19 Jan 2004 19:52:19 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 19 Jan 2004 19:52:19 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0JJprKw007257 for ; Mon, 19 Jan 2004 11:51:53 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0JJprpr011490 for ; Mon, 19 Jan 2004 11:51:53 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3DEC5.AEA79DEC" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 19 Jan 2004 11:51:51 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Thread-Index: AcPeaQdFnqO5JJSFTaWLR7tAnrV/QgAXHQqQ To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004 Julian, For some reason, your messages always come to me in a microscopic font that I'm finding harder and harder to read. I don't know if this is the case for all recipients, or it is peculiar to my client. I am using Outlook. I've never seen it on mail from anybody else. Peter -----Original Message----- From: Julian Satran [mailto:julian_satran@il.ibm.com] Sent: Monday, January 19, 2004 3:48 AM To: pnfs-reqs@yahoogroups.com Cc: pNFS Operations; pNFS Requirements Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Garth Gibson wrote on 19/12/2003 00:37:50: > Thanks Dave. I agree. Lets refine the proxying issues: Legacy, > strict, functional and recovery proxying. > > [1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the > full NFS-v4.0 or NFS-v4.1 protocol. > > I think Dave has given the case for this strongly. I do not see any > case against this. > > ------------------------------------------- > > [1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to > execute exactly the wire packet that an NFS-v4.x client might have sent > to a SBC/OSD/NFS data server? > > This captures the notion that a metadata server must also be a > store-and-forward proxy for every data server it manages. It requires > NFS-v4.x servers implement SCSI SBC over FC, if their data servers > implement it; and the same for objects and files. > > This only makes sense to me for NFS data servers. And it is not what I > intended in my prior summary, although it is a relevant question. I > would say that pNFS requirements not require Strict Proxying. 
> Agree > ------------------------------------------- > > [1.1.2 Functional proxying]: a file transformation achievable by an > NFS-v4.x client using a set of data server operations must be a > equivalently achievable using a (probably different) set of NFS-v4.x > server operations > > This is the topic I intended to address in the last email. I believe > Dave is arguing that even with metadata servers that do not have access > to their data servers, the vendor of such a metadata server can > construct a proprietary protocol for the metadata server to (strict) > proxy data server accesses through clients that do have data server > access. I am not comfortable making up a counter to this, so I exhort > those that want a metadata server without data server access to speak > up if they disagree. > > > On one hand, some suggest that a set of out-of-band clients should not > > have to also have a data path through the NFSv4 metadata server. One > > reason is that customers may not tolerate the large variability in > > performance between out-of-band (when the going is good) and in-band > > (when the server chooses not to grant or to take away a delegation) > > accesses. Another reason, and I paraphrase someone else here, is that > > it is possible to construct out-of-band metadata servers that do not > > have access to the data servers except through the clients -- I > > encourage the source of this scenario to replace my paraphrasing with > > a correct use case, because I find it odd to design for file servers > > that do not have access to the data servers. > > > > On the other hand, others have suggested that any access or work that > > a client can do out-of-band should be possible with one or more > > commands applied to the metadata server's data path. 
This has been > > proposed for coping with recalled delegations, including concurrent > > writing by multiple clients; retry after client access errors, > > provided adequate idempotency of out-of-band operations; and many > > alternative implementations of out-of-band clients, including legacy > > clients that use out-of-band never or rarely. > > > > I think this is a topic that should be argued one way or the other in > > the requirements document. Use cases and examples in other systems > > would be best. > I guess that proxying through a client should be recommended but not mandated. We might then want to find how to do it while respecting the restrictions that removed the metadata server from the path. > ------------------------------------------- > > [1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x > client using a set of data server operations, but interrupted before > completion, must be equivalently completable using a (probably > different) set of NFS-v4.x server operations > > Some have suggested that having this property will greatly simplify the > amount of spec that is devoted to out-of-band error recovery. Others > have commented that a simple way to achieve this would be to require > that all operations on data servers should be idempotent. > > ------------------------------------------- > > garth > > > On Thursday, December 18, 2003, at 12:21 PM, Noveck, Dave wrote: > > > Good summary. > > > > I want to address the "proxying" issue. > > > >> [1.1 Proxying]: Operations/work that can only be done out-of-band vs > >> alternative access through the NFSv4 server for all operations/work > > > > If you are talking about operations in the extension (let's call it > > NFS-v4.x), that are not in the previous minor version (let's assume > > that is nfs-v4.1), then you have a choice of whether these are > > supported > > for access through the server, or only for access by the client with > > the > > data server.
Let's call this the issue of proxying in the strict > > sense. > > > > There is another issue that people are calling "proxying" but is really > > logically distinct. That is the issue of access by the previous minor > > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > > separate data servers and they need to be able to work. End of story. > > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > > have a minor version without proxying. You don't have a minor version > > at all. I believe the working group is never going to accept that. > > Even if I'm wrong and you can get the working group to accept that, > > it is going to be very contentious and thus take up a lot of time. > > Anybody, who really wants to go down this path should seriously > > consider > > the trade-off between supporting something they find objectionable and > > getting a standard a lot later, if at all. > > > >> On one hand, some suggest that a set of out-of-band clients should not > >> have to also have a data path through the NFSv4 metadata server. One > >> reason is that customers may not tolerate the large variability in > >> performance between out-of-band (when the going is good) and in-band > >> (when the server chooses not to grant or to take away a delegation) > >> accesses. > > > > Then such customers will use clients that access things out-of-band > > whenever possible, and servers that never refuse to give out layout > > delegations. You have a number of quality-of-implementations issues > > for v4.x clients and servers. If a particular client only supports > > access via v4.0, then performance will suck, and the working group > > will understand that, but it won't accept not being able to use > > v4.0 at all. 
The customer is going to be motivated to upgrade his > > clients for those that need high-performance access, but he may be > > OK with some clients using v4.0 for a long time, depending on the > > particular performance those clients need. (And some will want v2/v3 > > access but that is a matter that the working group has no say about). > > > >> Another reason, and I paraphrase someone else here, is that > >> it is possible to construct out-of-band metadata servers that do not > >> have access to the data servers except through the clients -- I > >> encourage the source of this scenario to replace my paraphrasing with > >> a > >> correct use case, because I find it odd to design for file servers > >> that > >> do not have access to the data servers. > > > > So let's grant that it is possible (and we'll pass over the issue of > > whether it is desirable, and in fact so desirable that one is willing > > to > > not get a standard and or get it much later). > > > > So we have a metadata server and it, for whatever reason, does not have > > access to the data servers. However, by hypothesis, there are machines > > (e.g. clients), that can communicate with both. So, if one has such an > > architecture, then one can take such a machine, give it a > > communication path > > to the meta-data server and the data server and have the meta-data > > server > > transfer v4.0 READ requests to it, let it read the data from the data > > server and send it back to the meta-data server who send it back to the > > original requestor. Is that a very good solution? No. Is it likely > > to be performant? No. Will it satisfy any particular customer? I > > don't > > know and that is the implementer's business decision. Will it satisfy > > the hypothetical customer who doesn't care about v4.0 access? Clearly. > > Will it satisfy the v4 working group? Yes, because they are not in the > > business of telling you how performant v4.0 access has got to be. 
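The relay arrangement described in the paragraph above can be sketched roughly as follows. This is only an illustrative model, not anything proposed on the list: the class names (DataServer, RelayMachine, MetadataServer) and the methods read, relay_read and v40_read are hypothetical, and plain in-memory calls stand in for whatever proprietary protocol a vendor would actually use between the metadata server and the relay.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Holds file bytes; reachable only from the storage network (assumed)."""
    files: dict = field(default_factory=dict)

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


@dataclass
class RelayMachine:
    """Hypothetical machine (e.g. a client) that can reach both servers."""
    data_server: DataServer

    def relay_read(self, name, offset, count):
        # Fetch the bytes from the data server on the metadata server's behalf.
        return self.data_server.read(name, offset, count)


@dataclass
class MetadataServer:
    """Has no direct data-server access; forwards legacy READs to a relay."""
    relay: RelayMachine

    def v40_read(self, name, offset, count):
        # The v4.0 client sees an ordinary READ; the extra hops are hidden.
        return self.relay.relay_read(name, offset, count)


ds = DataServer(files={"report.txt": b"parallel NFS requirements"})
mds = MetadataServer(relay=RelayMachine(data_server=ds))
print(mds.v40_read("report.txt", 9, 3))  # b'NFS'
```

Every legacy READ in this model crosses two extra hops (metadata server to relay, relay to data server), which is why the text expects such access to work but not to be performant.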
> > > >> On the other hand, others have suggested that any access or work that > >> a > >> client can do out-of-band should be possible with one or more commands > >> applied to the metadata server's data path. This has been proposed > >> for > >> coping with recalled delegations, including concurrent writing by > >> multiple clients; retry after client access errors, provided adequate > >> idempotency of out-of-band operations; and many alternative > >> implementations of out-of-band clients, including legacy clients that > >> use out-of-band never or rarely. > > > > This effort is going to take a while, but if we manage it correctly, it > > is not going to take so long that v3 clients are going to be rare > > things, > > and they have to be supported. But v3 clients are not an issue for the > > working group. V4.0 clients are and they will be around and you will > > have to support them, and I believe the working group is not going to > > be disposed to cut you a lot of slack on this issue (and I don't see > > why it should). > > > >> I think this is a topic that should be argued one way or the other in > >> the requirements document. Use cases and examples in other systems > >> would be best. > > > > I think the requirement should be that this work should be done as a > > set of extensions to nfs-v4 delivered as a v4 minor version. If there > > is some feature/requirement that conflicts with that model (and it is a > > pretty flexible one), then you have to think long and hard before > > deciding > > that that requirement is more important than this basic deivery > > vehicle, > > because it seems to me that it is, in almost all respects, the ideal > > way > > to make this sort of technology available for widespread use.
Yahoo! Groups Links * To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/ * To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. From dnoveck@netapp.com Mon Jan 19 13:16:06 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 45217 invoked from network); 19 Jan 2004 21:16:04 -0000 Received: from unknown (66.218.66.216) by m13.grp.scd.yahoo.com with QMQP; 19 Jan 2004 21:16:04 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta1.grp.scd.yahoo.com with SMTP; 19 Jan 2004 21:16:04 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0JLFdKw019639 for ; Mon, 19 Jan 2004 13:15:39 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0JLFcSR029819 for ; Mon, 19 Jan 2004 13:15:38 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3DED1.62082C70" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 19 Jan 2004 13:15:37
-0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Thread-Index: AcPeaQdFnqO5JJSFTaWLR7tAnrV/QgAXHQqQAAKPEdA= To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck That font is quite annoying. Interesting fact: when I was replying to one of Julian's recent messages, I copied and pasted some of this message, anticipating that I would have to change it to a reasonable-size font, but what happened when I pasted it was that the part where Julian had quoted me, which was in the microscopic font, came out in what looked like a normal-sized Courier font, while the stuff that Julian had written himself was still in the microscopic font. The other thing was that Outlook, when you put the cursor somewhere, normally changes the font indication up top so you can find out what font something is, but not here. It always said Courier 10pt, even in Julian's text, which was clearly smaller than that. And now the weirdest part!! If I put the cursor in a paragraph that I originally wrote (and Julian incorporated with >'s and looks like it is a reasonable size), as soon as I type a single character, the whole paragraph instantly switches to Julian's microscopic font! And no, it doesn't go back when I delete that character, but if I cut and paste that paragraph it does go back to a reasonable size. -----Original Message----- From: Corbett, Peter Sent: Monday, January 19, 2004 2:52 PM To: pnfs-reqs@yahoogroups.com Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Julian, For some reason, your messages always come to me in a microscopic font that I'm finding harder and harder to read.
I don't know if this is the case for all recipients, or it is peculiar to my client. I am using Outlook. I've never seen it on mail from anybody else. Peter -----Original Message----- From: Julian Satran [mailto:julian_satran@il.ibm.com] Sent: Monday, January 19, 2004 3:48 AM To: pnfs-reqs@yahoogroups.com Cc: pNFS Operations; pNFS Requirements Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Garth Gibson wrote on 19/12/2003 00:37:50: > Thanks Dave. I agree. Lets refine the proxying issues: Legacy, > strict, functional and recovery proxying. > > [1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the > full NFS-v4.0 or NFS-v4.1 protocol. > > I think Dave has given the case for this strongly. I do not see any > case against this. > > ------------------------------------------- > > [1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to > execute exactly the wire packet that an NFS-v4.x client might have sent > to a SBC/OSD/NFS data server? > > This captures the notion that a metadata server must also be a > store-and-forward proxy for every data server it manages. It requires > NFS-v4.x servers implement SCSI SBC over FC, if their data servers > implement it; and the same for objects and files. > > This only makes sense to me for NFS data servers. And it is not what I > intended in my prior summary, although it is a relevant question. I > would say that pNFS requirements not require Strict Proxying. > Agree > ------------------------------------------- > > [1.1.2 Functional proxying]: a file transformation achievable by an > NFS-v4.x client using a set of data server operations must be a > equivalently achievable using a (probably different) set of NFS-v4.x > server operations > > This is the topic I intended to address in the last email. 
I believe > Dave is arguing that even with metadata servers that do not have access > to their data servers, the vendor of such a metadata server can > construct a proprietary protocol for the metadata server to (strict) > proxy data server accesses through clients that do have data server > access. I am not comfortable making up a counter to this, so I exhort > those that want a metadata server without data server access to speak > up if they disagree. > > > On one hand, some suggest that a set of out-of-band clients should not > > have to also have a data path through the NFSv4 metadata server. One > > reason is that customers may not tolerate the large variability in > > performance between out-of-band (when the going is good) and in-band > > (when the server chooses not to grant or to take away a delegation) > > accesses. Another reason, and I paraphrase someone else here, is that > > it is possible to construct out-of-band metadata servers that do not > > have access to the data servers except through the clients -- I > > encourage the source of this scenario to replace my paraphrasing with > > a correct use case, because I find it odd to design for file servers > > that do not have access to the data servers. > > > > On the other hand, others have suggested that any access or work that > > a client can do out-of-band should be possible with one or more > > commands applied to the metadata server's data path. This has been > > proposed for coping with recalled delegations, including concurrent > > writing by multiple clients; retry after client access errors, > > provided adequate idempotency of out-of-band operations; and many > > alternative implementations of out-of-band clients, including legacy > > clients that use out-of-band never or rarely. > > > > I think this is a topic that should be argued one way or the other in > > the requirements document. Use cases and examples in other systems > > would be best. 
> I guess that proxying through a client should be recommended but not mandated. We might then want to find how to do it while respecting the restrictions that removed the metadata server from the path. > ------------------------------------------- > > [1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x > client using a set of data server operations, but interrupted before > completion, must be equivalently completable using a (probably > different) set of NFS-v4.x server operations > > Some have suggested that having this property will greatly simplify the > amount of spec that is devoted to out-of-band error recovery. Others > have commented that a simple way to achieve this would be to require > that all operations on data servers should be idempotent. > > ------------------------------------------- > > garth > > > On Thursday, December 18, 2003, at 12:21 PM, Noveck, Dave wrote: > > > Good summary. > > > > I want to address the "proxying" issue. > > > >> [1.1 Proxying]: Operations/work that can only be done out-of-band vs > >> alternative access through the NFSv4 server for all operations/work > > > > If you are talking about operations in the extension (let's call it > > NFS-v4.x), that are not in the previous minor version (let's assume > > that is nfs-v4.1), then you have a choice of whether these are > > supported > > for access through the server, or only for access by the client with > > the > > data server. Let's call this the issue of proxying in the strict > > sense. > > > > There is another issue that people are calling "proxying" but is really > > logically distinct. That is the issue of access by the previous minor > > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > > separate data servers and they need to be able to work. End of story. > > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > > have a minor version without proxying. You don't have a minor version > > at all.
I believe the working group is never going to accept that. > > Even if I'm wrong and you can get the working group to accept that, > > it is going to be very contentious and thus take up a lot of time. > > Anybody, who really wants to go down this path should seriously > > consider > > the trade-off between supporting something they find objectionable and > > getting a standard a lot later, if at all. > > > >> On one hand, some suggest that a set of out-of-band clients should not > >> have to also have a data path through the NFSv4 metadata server. One > >> reason is that customers may not tolerate the large variability in > >> performance between out-of-band (when the going is good) and in-band > >> (when the server chooses not to grant or to take away a delegation) > >> accesses. > > > > Then such customers will use clients that access things out-of-band > > whenever possible, and servers that never refuse to give out layout > > delegations. You have a number of quality-of-implementations issues > > for v4.x clients and servers. If a particular client only supports > > access via v4.0, then performance will suck, and the working group > > will understand that, but it won't accept not being able to use > > v4.0 at all. The customer is going to be motivated to upgrade his > > clients for those that need high-performance access, but he may be > > OK with some clients using v4.0 for a long time, depending on the > > particular performance those clients need. (And some will want v2/v3 > > access but that is a matter that the working group has no say about). 
> > > >> Another reason, and I paraphrase someone else here, is that > >> it is possible to construct out-of-band metadata servers that do not > >> have access to the data servers except through the clients -- I > >> encourage the source of this scenario to replace my paraphrasing with > >> a > >> correct use case, because I find it odd to design for file servers > >> that > >> do not have access to the data servers. > > > > So let's grant that it is possible (and we'll pass over the issue of > > whether it is desirable, and in fact so desirable that one is willing > > to > > not get a standard and or get it much later). > > > > So we have a metadata server and it, for whatever reason, does not have > > access to the data servers. However, by hypothesis, there are machines > > (e.g. clients), that can communicate with both. So, if one has such an > > architecture, then one can take such a machine, give it a > > communication path > > to the meta-data server and the data server and have the meta-data > > server > > transfer v4.0 READ requests to it, let it read the data from the data > > server and send it back to the meta-data server who send it back to the > > original requestor. Is that a very good solution? No. Is it likely > > to be performant? No. Will it satisfy any particular customer? I > > don't > > know and that is the implementer's business decision. Will it satisfy > > the hypothetical customer who doesn't care about v4.0 access? Clearly. > > Will it satisfy the v4 working group? Yes, because they are not in the > > business of telling you how performant v4.0 access has got to be. > > > >> On the other hand, others have suggested that any access or work that > >> a > >> client can do out-of-band should be possible with one or more commands > >> applied to the metadata server's data path. 
This has been proposed > >> for > >> coping with recalled delegations, including concurrent writing by > >> multiple clients; retry after client access errors, provided adequate > >> idempotency of out-of-band operations; and many alternative > >> implementations of out-of-band clients, including legacy clients that > >> use out-of-band never or rarely. > > > > This effort is going to take a while, but if we manage it correctly, it > > is not going to take so long that v3 clients are going to be rare > > things, > > and they have to be supported. But v3 clients are not an issue for the > > working group. V4.0 clients are and they will be around and you will > > have to support them, and I believe the working group is not going to > > be disposed to cut you a lot of slack on this issue (and I don't see > > why it should). > > > >> I think this is a topic that should be argued one way or the other in > >> the requirements document. Use cases and examples in other systems > >> would be best. > > > > I think the requirement should be that this work should be done as a > > set of extensions to nfs-v4 delivered as a v4 minor version. If there > > is some feature/requirement that conflicts with that model (and it is a > > pretty flexible one), then you have to think long and hard before > > deciding > > that that requirement is more important than this basic deivery > > vehicle, > > because it seems to me that it is, in almost all respects, the ideal > > way > > to make this sort of technology available for widespread use.
Yahoo! Groups Links * To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/ * To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. From julian_satran@il.ibm.com Mon Jan 19 14:11:59 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 80152 invoked from network); 19 Jan 2004 22:11:59 -0000 Received: from unknown (66.218.66.167) by m5.grp.scd.yahoo.com with QMQP; 19 Jan 2004 22:11:59 -0000 Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta6.grp.scd.yahoo.com with SMTP; 19 Jan 2004 22:11:58 -0000 Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196]) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0JMBuHI127844 for ; Mon, 19 Jan 2004 22:11:56 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0JMBtmx249174 for ; Mon, 19 Jan 2004 23:11:56 +0100 In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003
Message-ID: Date: Mon, 19 Jan 2004 14:11:52 -0800 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 20/01/2004 00:11:56, Serialize complete at 20/01/2004 00:11:56 Content-Type: text/plain; charset="US-ASCII" X-eGroups-Remote-IP: 195.212.29.152 From: Julian Satran Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran My humble apologies to all. I've changed my internet mail options to text only (instead of HTML and text). I hope it works better. Interestingly enough, I use Mozilla (Thunderbird) for my private mail and never experienced this behavior. It must be a Lotus-Notes-vs.-Outlook war! Julo "Noveck, Dave" 19/01/2004 13:15 Please respond to pnfs-reqs To cc Subject RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying That font is quite annoying. Interesting fact: when I was replying to one of Julian's recent messages, I copied and pasted some of this message, anticipating that I would have to change it to a reasonable-size font. But when I pasted it, the part where Julian had quoted me, which had been in the microscopic font, appeared in what looked like a normal-sized Courier font, while the text that Julian had written himself was still in the microscopic font. The other thing is that Outlook, when you put the cursor somewhere, normally changes the font indication up top so you can find out what font something is, but not here. It always said Courier 10pt, even in Julian's text, which was clearly smaller than that. And now the weirdest part!! If I put the cursor in a paragraph that I originally wrote (and that Julian incorporated with >'s, and which looks like it is a reasonable size), as soon as I type a single character, the whole paragraph instantly switches to Julian's microscopic font! 
And no, it doesn't go back when I delete that character, but if I cut and paste that paragraph it does go back to a reasonable size. -----Original Message----- From: Corbett, Peter Sent: Monday, January 19, 2004 2:52 PM To: pnfs-reqs@yahoogroups.com Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Julian, For some reason, your messages always come to me in a microscopic font that I'm finding harder and harder to read. I don't know if this is the case for all recipients, or if it is peculiar to my client. I am using Outlook. I've never seen it on mail from anybody else. Peter -----Original Message----- From: Julian Satran [mailto:julian_satran@il.ibm.com] Sent: Monday, January 19, 2004 3:48 AM To: pnfs-reqs@yahoogroups.com Cc: pNFS Operations; pNFS Requirements Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Garth Gibson wrote on 19/12/2003 00:37:50: > Thanks Dave. I agree. Let's refine the proxying issues: Legacy, > strict, functional and recovery proxying. > > [1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the > full NFS-v4.0 or NFS-v4.1 protocol. > > I think Dave has given the case for this strongly. I do not see any > case against this. > > ------------------------------------------- > > [1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to > execute exactly the wire packet that an NFS-v4.x client might have sent > to an SBC/OSD/NFS data server? > > This captures the notion that a metadata server must also be a > store-and-forward proxy for every data server it manages. It requires that > NFS-v4.x servers implement SCSI SBC over FC, if their data servers > implement it; and the same for objects and files. > > This only makes sense to me for NFS data servers. And it is not what I > intended in my prior summary, although it is a relevant question. I > would say that pNFS requirements should not require Strict Proxying. 
> Agree > ------------------------------------------- > > [1.1.2 Functional proxying]: a file transformation achievable by an > NFS-v4.x client using a set of data server operations must be > equivalently achievable using a (probably different) set of NFS-v4.x > server operations > > This is the topic I intended to address in the last email. I believe > Dave is arguing that even with metadata servers that do not have access > to their data servers, the vendor of such a metadata server can > construct a proprietary protocol for the metadata server to (strict) > proxy data server accesses through clients that do have data server > access. I am not comfortable making up a counter to this, so I exhort > those that want a metadata server without data server access to speak > up if they disagree. > > > On one hand, some suggest that a set of out-of-band clients should not > > have to also have a data path through the NFSv4 metadata server. One > > reason is that customers may not tolerate the large variability in > > performance between out-of-band (when the going is good) and in-band > > (when the server chooses not to grant or to take away a delegation) > > accesses. Another reason, and I paraphrase someone else here, is that > > it is possible to construct out-of-band metadata servers that do not > > have access to the data servers except through the clients -- I > > encourage the source of this scenario to replace my paraphrasing with > > a correct use case, because I find it odd to design for file servers > > that do not have access to the data servers. > > > > On the other hand, others have suggested that any access or work that > > a client can do out-of-band should be possible with one or more > > commands applied to the metadata server's data path. 
This has been > > proposed for coping with recalled delegations, including concurrent > > writing by multiple clients; retry after client access errors, > > provided adequate idempotency of out-of-band operations; and many > > alternative implementations of out-of-band clients, including legacy > > clients that use out-of-band never or rarely. > > > > I think this is a topic that should be argued one way or the other in > > the requirements document. Use cases and examples in other systems > > would be best. > I guess that proxying through a client should be recommended but not mandated. We might then want to find out how to do it while respecting the restrictions that removed the metadata server from the path. > ------------------------------------------- > > [1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x > client using a set of data server operations, but interrupted before > completion, must be equivalently completable using a (probably > different) set of NFS-v4.x server operations > > Some have suggested that having this property will greatly reduce the > amount of spec that is devoted to out-of-band error recovery. Others > have commented that a simple way to achieve this would be to require > that all operations on data servers be idempotent. > > ------------------------------------------- > > garth > > > On Thursday, December 18, 2003, at 12:21 PM, Noveck, Dave wrote: > > > Good summary. > > > > I want to address the "proxying" issue. > > > >> [1.1 Proxying]: Operations/work that can only be done out-of-band vs > >> alternative access through the NFSv4 server for all operations/work > > > > If you are talking about operations in the extension (let's call it > > NFS-v4.x), that are not in the previous minor version (let's assume > > that is nfs-v4.1), then you have a choice of whether these are supported > > for access through the server, or only for access by the client with the > > data server. 
Let's call this the issue of proxying in the strict > > sense. > > > > There is another issue that people are calling "proxying" but is really > > logically distinct. That is the issue of access by the previous minor > > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > > separate data servers and they need to be able to work. End of story. > > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > > have a minor version without proxying. You don't have a minor version > > at all. I believe the working group is never going to accept that. > > Even if I'm wrong and you can get the working group to accept that, > > it is going to be very contentious and thus take up a lot of time. > > Anybody, who really wants to go down this path should seriously > > consider > > the trade-off between supporting something they find objectionable and > > getting a standard a lot later, if at all. > > > >> On one hand, some suggest that a set of out-of-band clients should not > >> have to also have a data path through the NFSv4 metadata server. One > >> reason is that customers may not tolerate the large variability in > >> performance between out-of-band (when the going is good) and in-band > >> (when the server chooses not to grant or to take away a delegation) > >> accesses. > > > > Then such customers will use clients that access things out-of-band > > whenever possible, and servers that never refuse to give out layout > > delegations. You have a number of quality-of-implementations issues > > for v4.x clients and servers. If a particular client only supports > > access via v4.0, then performance will suck, and the working group > > will understand that, but it won't accept not being able to use > > v4.0 at all. 
The customer is going to be motivated to upgrade his > > clients for those that need high-performance access, but he may be > > OK with some clients using v4.0 for a long time, depending on the > > particular performance those clients need. (And some will want v2/v3 > > access but that is a matter that the working group has no say about). > > > >> Another reason, and I paraphrase someone else here, is that > >> it is possible to construct out-of-band metadata servers that do not > >> have access to the data servers except through the clients -- I > >> encourage the source of this scenario to replace my paraphrasing with a > >> correct use case, because I find it odd to design for file servers that > >> do not have access to the data servers. > > > > So let's grant that it is possible (and we'll pass over the issue of > > whether it is desirable, and in fact so desirable that one is willing to > > not get a standard and/or get it much later). > > > > So we have a metadata server and it, for whatever reason, does not have > > access to the data servers. However, by hypothesis, there are machines > > (e.g. clients), that can communicate with both. So, if one has such an > > architecture, then one can take such a machine, give it a communication path > > to the meta-data server and the data server and have the meta-data server > > transfer v4.0 READ requests to it, let it read the data from the data > > server and send it back to the meta-data server, which sends it back to the > > original requestor. Is that a very good solution? No. Is it likely > > to be performant? No. Will it satisfy any particular customer? I don't > > know and that is the implementer's business decision. Will it satisfy > > the hypothetical customer who doesn't care about v4.0 access? Clearly. > > Will it satisfy the v4 working group? Yes, because they are not in the > > business of telling you how performant v4.0 access has got to be. 
> > > >> On the other hand, others have suggested that any access or work that a > >> client can do out-of-band should be possible with one or more commands > >> applied to the metadata server's data path. This has been proposed for > >> coping with recalled delegations, including concurrent writing by > >> multiple clients; retry after client access errors, provided adequate > >> idempotency of out-of-band operations; and many alternative > >> implementations of out-of-band clients, including legacy clients that > >> use out-of-band never or rarely. > > > > This effort is going to take a while, but if we manage it correctly, it > > is not going to take so long that v3 clients are going to be rare things, > > and they have to be supported. But v3 clients are not an issue for the > > working group. V4.0 clients are and they will be around and you will > > have to support them, and I believe the working group is not going to > > be disposed to cut you a lot of slack on this issue (and I don't see > > why it should). > > > >> I think this is a topic that should be argued one way or the other in > >> the requirements document. Use cases and examples in other systems > >> would be best. > > > > I think the requirement should be that this work should be done as a > > set of extensions to nfs-v4 delivered as a v4 minor version. If there > > is some feature/requirement that conflicts with that model (and it is a > > pretty flexible one), then you have to think long and hard before deciding > > that that requirement is more important than this basic delivery vehicle, > > because it seems to me that it is, in almost all respects, the ideal way > > to make this sort of technology available for widespread use. 
From dhildebz@eecs.umich.edu Mon Jan 19 14:13:10 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 71211 invoked from network); 19 Jan 2004 22:13:10 -0000 Received: from unknown (66.218.66.216) by m12.grp.scd.yahoo.com with QMQP; 19 Jan 2004 22:13:10 -0000 Received: from unknown (HELO willow.eecs.umich.edu) (141.213.4.14) by mta1.grp.scd.yahoo.com with SMTP; 19 Jan 2004 22:13:10 -0000 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by willow.eecs.umich.edu (8.12.10/8.12.9) with ESMTP id i0JMD8vE011664 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 19 Jan 2004 17:13:09 -0500 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.12.10/8.12.9/Submit) with ESMTP id i0JMD8wL011661; Mon, 19 Jan 2004 17:13:08 -0500 X-Authentication-Warning: willow.eecs.umich.edu: dhildebz owned process doing -bs Date: Mon, 19 Jan 2004 17:13:08 -0500 (EST) To: pNFS Requirements Cc: pNFS Operations In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-eGroups-Remote-IP: 141.213.4.14 From: Dean Hildebrand Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus > > [1.1.2 Functional proxying]: a file transformation achievable by an > > NFS-v4.x client using a set of data server operations must be > > equivalently achievable using a (probably different) set of NFS-v4.x > > server operations > > > > This is the topic I intended to address in the last email. I believe > > Dave is arguing that even with metadata servers that do not have access > > to their data servers, the vendor of such a metadata server can > > construct a proprietary protocol for the metadata server to (strict) > > proxy data server accesses through clients that do have data server > > access. 
I am not comfortable making up a counter to this, so I exhort > > those that want a metadata server without data server access to speak > > up if they disagree. > > > > > On one hand, some suggest that a set of out-of-band clients should not > > > have to also have a data path through the NFSv4 metadata server. One > > > reason is that customers may not tolerate the large variability in > > > performance between out-of-band (when the going is good) and in-band > > > (when the server chooses not to grant or to take away a delegation) > > > accesses. Another reason, and I paraphrase someone else here, is that > > > it is possible to construct out-of-band metadata servers that do not > > > have access to the data servers except through the clients -- I > > > encourage the source of this scenario to replace my paraphrasing with > > > a correct use case, because I find it odd to design for file servers > > > that do not have access to the data servers. > > > > > > On the other hand, others have suggested that any access or work that > > > a client can do out-of-band should be possible with one or more > > > commands applied to the metadata server's data path. This has been > > > proposed for coping with recalled delegations, including concurrent > > > writing by multiple clients; retry after client access errors, > > > provided adequate idempotency of out-of-band operations; and many > > > alternative implementations of out-of-band clients, including legacy > > > clients that use out-of-band never or rarely. > > > > > > I think this is a topic that should be argued one way or the other in > > > the requirements document. Use cases and examples in other systems > > > would be best. > > > > I guess that proxying through a client should be recommended but not > mandated. > We might then want to find out how to do it while respecting the restrictions > that removed the metadata server from the path. 
I think relying on clients to do anything correctly is against the inherent nature of NFS. Clients in NFS are transient and cannot be trusted to do anything correctly. Therefore, the metadata server should find its own way to write data to the data servers without relying on clients. If proxying through a client is optional, it still seems orthogonal to the behavior of existing installations and the spirit of NFS. Maybe there is a valid use case someone could describe? Dean From dnoveck@netapp.com Mon Jan 19 15:23:15 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 7197 invoked from network); 19 Jan 2004 23:23:13 -0000 Received: from unknown (66.218.66.216) by m20.grp.scd.yahoo.com with QMQP; 19 Jan 2004 23:23:13 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta1.grp.scd.yahoo.com with SMTP; 19 Jan 2004 23:23:13 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0JNNDKw008007; Mon, 19 Jan 2004 15:23:13 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0JNN8ST005106; Mon, 19 Jan 2004 15:23:12 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 19 Jan 2004 15:23:08 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Thread-Index: AcPe2XB/4kBYFENqR3OWwRD2koQeGAAByR0A To: Cc: "pNFS Operations" X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: 
davidnoveck Dean Hildebrand wrote: > I think relying on clients to do anything correctly is against the > inherent nature of NFS. Clients in NFS are transient and cannot be > trusted to do anything correctly. Therefore, the metadata server should > find its own way to write data to the data servers without relying on > clients. If proxying through a client is optional, it still seems > orthogonal to the behavior of existing installations and the spirit of > NFS. Maybe there is a valid use case someone could describe? I'm now totally confused. Before we talk about use cases for "proxying through a client", I'd like to understand what it is. My understanding is that when this discussion started, a number of people were referring to a client writing data by sending a write to the meta-data server (aka the NFS server) as "proxying", because, if your view is that the proper/best/ideal way of doing data transfer operations is to obtain mapping information and then do a write to the data server (i.e. other NFS server or object data server or SAN-connected disk), then the direct NFS write can be seen as the meta-data server acting as the client's proxy. Is my understanding correct? No matter how you come down on the question of the desirability of that, I don't think there is any way to argue that doing a write by sending an NFS write request to an NFS server is against the inherent nature of NFS. Nor does it ask the client to do anything correctly that it hasn't been doing all along. At some point the phrase "proxying through the client" was used and I realize I don't know what is meant by it. It doesn't seem to match the "proxying" that was being discussed originally. How would the client be a proxy for (presumably) the server? What am I missing? 
From bhalevy@panasas.com Mon Jan 19 15:31:01 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 48321 invoked from network); 19 Jan 2004 23:31:01 -0000 Received: from unknown (66.218.66.166) by m5.grp.scd.yahoo.com with QMQP; 19 Jan 2004 23:31:01 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 19 Jan 2004 23:31:00 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Mon, 19 Jan 2004 18:30:58 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D3879D@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" Cc: pNFS Operations Date: Mon, 19 Jan 2004 18:30:56 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy Dave Noveck wrote: >At some point the phrase "proxying through the client" was used and I >realize I don't know what is meant by it. It doesn't seem to match >the "proxying" that was being discussed originally. How would the >client be a proxy for (presumably) the server? What am I missing? I think it was you who suggested (maybe in a rhetorical way) that when the metadata server is not capable of accessing the storage it manages it should still be able to perform I/O using a client. Maybe this created the "proxying through the client" idea... 
Benny From dnoveck@netapp.com Mon Jan 19 15:32:39 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 75759 invoked from network); 19 Jan 2004 23:32:37 -0000 Received: from unknown (66.218.66.167) by m1.grp.scd.yahoo.com with QMQP; 19 Jan 2004 23:32:37 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 19 Jan 2004 23:32:37 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0JNWaKw009351; Mon, 19 Jan 2004 15:32:37 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0JNWaSR007663; Mon, 19 Jan 2004 15:32:36 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3DEE4.8419B3E0" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 19 Jan 2004 15:32:34 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 Thread-Index: AcPeaTkQQwHlD+iITeOyPo50R1swWAAZbAXA To: , X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck Julian Satran wrote: > "Noveck, Dave" wrote on 30/12/2003 22:08:02: > > It seems legal to me but I'm 
guessing that there are others that would
> > think differently.
> >
> > I tend to think that it is not a good idea, though. There are going
> > to be operations which, by their nature, are better done through the
> > metadata server. A two-byte write which spans multiple data servers
> > is an example. Another is append-writes, which have been mentioned
> > (by whom I don't remember just now) as a desirable v4 extension,
> > assuming the data to be written is of reasonable size. In each case,
> > we may create appropriate caching/locking primitives to allow the
> > operation to be done without making any request of the metadata server
> > that is officially denominated an "IO" request. But can you really
> > argue that this will be the best way for the client to do such
> > operations? And does it really make sense to force clients to invest
> > the effort in terms of the code to do such operations doing the IO with
> > the data server only, when the performance benefit of that is going to
> > be small, or zero, or negative? You may wind up making as many
> > requests of the meta-data server with the data-server-only approach.
> > It's just that they won't be IO operations (but instead locking and,
> > in the case of append, getattr operations).

> This can be argued both ways. For applications that share little and
> build files by append (all transaction loggers) doing them on the client
> is a distinct advantage.

If you are talking about a situation in which there is little sharing,
then the clients are going to have exclusive delegations for the files
that they are appending to, in which case the append write feature is not
really being used. The client is best advised to simply gather up its
writes and write the whole file at once (or the part that it wrote until
its delegation is recalled). But in either of those cases, it knows the
EOF of the file and can write to a specific offset and thus is not
depending on the fact that you have a write-append feature.
Clients that are doing those writes in this situation may indeed wind up
being more efficient writing to the data server, but in other situations,
where there is sharing, things will be different.

This can be argued any number of ways depending on details of the
implementation and the applications. The original concept here was that
someone can decide in advance that using the data server is better than
using the metadata server to the point that use of the metadata server to
do IO can be *prohibited*. The fact that this is a complicated issue seems
to me to argue strongly against that sort of approach.

> And so it is for object storage that supports append.

This will depend on the details of the protocol as it evolves. I had
expected that EOF would be something that is managed by the metadata
server. As you point out, with object storage, it can be managed by the
data server (and you could probably do the same with the parallel-file
option). This gets to an important issue for this effort that we will be
coming back to again and again: how much advantage to take of specific
features of some data storage methods that are not shared by all. I don't
see any general principle that is going to work for this all the time. We
are going to have to decide on a case-by-case basis.

-----Original Message-----
From: Julian Satran [mailto:julian_satran@il.ibm.com]
Sent: Monday, January 19, 2004 3:48 AM
To: pnfs-reqs@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com
Subject: Re: [pnfs-reqs] RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03

"Noveck, Dave" wrote on 30/12/2003 22:08:02:

> It seems legal to me but I'm guessing that there are others that would
> think differently.
>
> I tend to think that it is not a good idea, though. There are going
> to be operations which, by their nature, are better done through the
> metadata server. A two-byte write which spans multiple data servers
> is an example.
Another is append-writes, which have been mentioned > (by whom I don't remember just now) as a desirable v4 extension, > assuming the data to be written is of reasonable size. In each case, > we may create appropriate caching/locking primitives to allow the > operation to be done without making any request of the metadata server > that is officially denominated an "IO" request. But can you really > argue that this will be the best way for the client to do such > operations? And does it really make sense to force clients to invest > the effort in terms of the code do such operations doing the IO with > the data server only, when the performance benefit of that is going to > be small, or zero, or negative? You may wind up making as many > requests of the meta-data server with the data-server-only approach. > It's just that they won't be IO operations (but instead locking and, > in the case of append, getattr operations). > This can be argued both ways. For applications that share little and build files by append (all transaction loggers) doing them on the client is a distinct advantage. And so iit is for object storage that supports append. > In complicated protocols (and v4 is a complicated protocol and is > getting more complicated), there are going to be multiple ways of > doing the same thing, which are going to differ in their performance > characteristics. An organization can be reasonably concerned about > clients making the wrong choice, just as it is concerned about clients > that are making excessive resource demands for other reasons. There > are two issues that I am worried about in taking such a drastic > approach as simply refusing to support a valid piece of the protocol, > even if that choice is made by the server administrator. The first is > that determining the better choice depends on a lot of variables and > that a simple formula governing an option (e.g. "IO through the > metadata server is bad") is unlikely to completely match reality. 
The > second is that I-don't-like-your-IO-request-so-you-lose is kind of a > blunt instrument to deal with the problem. > I don't think this is a big issue or that the scenario I describe will be widely used but with Object Storage you may not have (or need to very often) a channel between the metadata server and the data servers. This partial access scheme may be maintained also in block environments or federated filers for various reasons (security may be one - you don't trust your administrator with all the data). > If you have identified some set of bad client practices, you can find > the clients doing them, report the appropriate statistics, even, if > the issue is critical, artificially give such clients (or specific > requests) bad performance in a way that doesn't hurt other clients > (unless they are waiting for the first set to do something. Sigh!), > by just delaying processing of their requests by millisecond or two. > That should be enough to preserve metadata-server bandwidth for more > worthwhile purposes. If that's insufficiently discouraging, you can > raise the delay. If you start rejecting requests because you would > have done it differently, even if you are correct, you are on the road > to creating your own sub-protocol, which is why this kind of thing is > worrying, even if legal. > > > -----Original Message----- > From: Julian Satran [mailto:julian_satran@il.ibm.com] > Sent: Monday, December 22, 2003 5:26 AM > To: pnfs-ops@yahoogroups.com > Cc: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com > Subject: RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 > > Since I raised the issue of the metadata server not having access to > all it's data servers (or at least not with adequate bandwidth) I feel > compelled to say that Dave's arguments about supporting 4.0 are > compelling enough to make it mandatory. 
The open issue is if it is > legal for a "compliant server" to have serving data disabled by a > local administrative function (the old "must implement but may use"). > Otherwise an organization that wants to discourage use of data serving > through the metadata server has very little it can do to enforce > policy in a way that will not affect other clients (it may do serve > poorly but this still affects other clients). > > Julo > > > "Noveck, Dave" > 18/12/2003 19:21 > > Please respond to > pnfs-ops@yahoogroups.com > > To > > , > > cc > > Subject > > RE: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03 > > > > > Good summary. > > I want to address the "proxying" issue. > > > [1.1 Proxying]: Operations/work that can only be done out-of-band vs > > alternative access through the NFSv4 server for all operations/work > > If you are talking about operations in the extension (let's call it > NFS-v4.x), that are not in the previous minor version (let's assume > that is nfs-v4.1), then you have a choice of whether these are supported > for access through the server, or only for access by the client with the > data server. Let's call this the issue of proxying in the strict sense. > > There is another issue that people are calling "proxying" but is really > logically distinct. That is the issue of access by the previous minor > version, e.g. nfs-v4.0 or nfs-v4.1. Those versions have no concept of > separate data servers and they need to be able to work. End of story. > If you can't read files stored in nfs-v4.x with nfs-v4.0, you do not > have a minor version without proxying. You don't have a minor version > at all. I believe the working group is never going to accept that. > Even if I'm wrong and you can get the working group to accept that, > it is going to be very contentious and thus take up a lot of time. 
> Anybody, who really wants to go down this path should seriously consider > the trade-off between supporting something they find objectionable and > getting a standard a lot later, if at all. > > > On one hand, some suggest that a set of out-of-band clients should not > > have to also have a data path through the NFSv4 metadata server. One > > reason is that customers may not tolerate the large variability in > > performance between out-of-band (when the going is good) and in-band > > (when the server chooses not to grant or to take away a delegation) > > accesses. > > Then such customers will use clients that access things out-of-band > whenever possible, and servers that never refuse to give out layout > delegations. You have a number of quality-of-implementations issues > for v4.x clients and servers. If a particular client only supports > access via v4.0, then performance will suck, and the working group > will understand that, but it won't accept not being able to use > v4.0 at all. The customer is going to be motivated to upgrade his > clients for those that need high-performance access, but he may be > OK with some clients using v4.0 for a long time, depending on the > particular performance those clients need. (And some will want v2/v3 > access but that is a matter that the working group has no say about). > > > Another reason, and I paraphrase someone else here, is that > > it is possible to construct out-of-band metadata servers that do not > > have access to the data servers except through the clients -- I > > encourage the source of this scenario to replace my paraphrasing with a > > correct use case, because I find it odd to design for file servers that > > do not have access to the data servers. > > So let's grant that it is possible (and we'll pass over the issue of > whether it is desirable, and in fact so desirable that one is willing to > not get a standard and or get it much later). 
> > So we have a metadata server and it, for whatever reason, does not have > access to the data servers. However, by hypothesis, there are machines > (e.g. clients), that can communicate with both. So, if one has such an > architecture, then one can take such a machine, give it a communication path > to the meta-data server and the data server and have the meta-data server > transfer v4.0 READ requests to it, let it read the data from the data > server and send it back to the meta-data server who send it back to the > original requestor. Is that a very good solution? No. Is it likely > to be performant? No. Will it satisfy any particular customer? I don't > know and that is the implementer's business decision. Will it satisfy > the hypothetical customer who doesn't care about v4.0 access? Clearly. > Will it satisfy the v4 working group? Yes, because they are not in the > business of telling you how performant v4.0 access has got to be. > > > On the other hand, others have suggested that any access or work that a > > client can do out-of-band should be possible with one or more commands > > applied to the metadata server's data path. This has been proposed for > > coping with recalled delegations, including concurrent writing by > > multiple clients; retry after client access errors, provided adequate > > idempotency of out-of-band operations; and many alternative > > implementations of out-of-band clients, including legacy clients that > > use out-of-band never or rarely. > > This effort is going to take a while, but if we manage it correctly, it > is not going to take so long that v3 clients are going to be rare things, > and they have to be supported. But v3 clients are not an issue for the > working group. V4.0 clients are and they will be around and you will > have to support them, and I believe the working group is not going to > be disposed to cut you a lot of slack on this issue (and I don't see > why it should). 
>
> > I think this is a topic that should be argued one way or the other in
> > the requirements document. Use cases and examples in other systems
> > would be best.
>
> I think the requirement should be that this work should be done as a
> set of extensions to nfs-v4 delivered as a v4 minor version. If there
> is some feature/requirement that conflicts with that model (and it is a
> pretty flexible one), then you have to think long and hard before deciding
> that that requirement is more important than this basic delivery vehicle,
> because it seems to me that it is, in almost all respects, the ideal way
> to make this sort of technology available for widespread use.
From garth@panasas.com Mon Jan 19 16:15:32 2004
Return-Path:
X-Sender: garth@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 70221 invoked from network); 20 Jan 2004 00:15:28 -0000
Received: from unknown (66.218.66.217) by m20.grp.scd.yahoo.com with QMQP; 20 Jan 2004 00:15:28 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 20 Jan 2004 00:15:26 -0000
Received: from [172.17.2.81] ([172.17.2.81]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYKMAZ; Mon, 19 Jan 2004 19:15:24 -0500
In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D3879D@PIKES.panasas.com>
References: <30489F1321F5C343ACF6872B2CF7942A05D3879D@PIKES.panasas.com>
Mime-Version: 1.0 (Apple Message framework v609)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-Id:
Content-Transfer-Encoding: 7bit
Cc: pNFS Operations
Date: Mon, 19 Jan 2004 19:15:22 -0500
To: pnfs-reqs@yahoogroups.com
X-Mailer: Apple Mail (2.609)
X-eGroups-Remote-IP: 65.194.124.178
From: Garth Gibson
Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
X-Yahoo-Group-Post: member; u=169457820
X-Yahoo-Profile: garth_a_gibson

I think there are multiple issues being reviewed here.

First, what I called Functional Proxying is what Dave Noveck understands
:-) It is a safety valve for clients that don't want to, can't, or think
it is slower to directly access the leaf storage server. In this case a
client can always accomplish any legal file system state transformation
using NFSv4 operations on the metadata server.

For the other scenario, I believe it was Julian Satran who mentioned that
IBM has considered situations where the metadata server does not have good
access, if any, to the storage devices directly. For example, FC disks
connected to all clients with FC NICs, and a metadata server connected to
clients via Ethernet and not connected to FC at all.
In this case, the metadata server wants all, or almost all, accesses to be
direct from client to storage. It was in this context that it was
mentioned that a client could proxy a command from a metadata server to
storage.

> Since I raised the issue of the metadata server not having access to
> all its data servers (or at least not with adequate bandwidth) I feel
> compelled to say that Dave's arguments about supporting 4.0 are
> compelling enough to make it mandatory. The open issue is if it is
> legal for a "compliant server" to have serving data disabled by a
> local administrative function (the old "must implement but may use").
> Otherwise an organization that wants to discourage use of data serving
> through the metadata server has very little it can do to enforce
> policy in a way that will not affect other clients (it may serve
> poorly but this still affects other clients).
> Julo

> I guess that proxying through a client should be recommended but not
> mandated.
> We might then want to find how to do it while respecting the restrictions
> that removed the metadata server from the path.
> Julo

My guess is that Dean is worried that a metadata server needs tighter
control over storage than can be achieved by asking a client to do work on
its behalf, in a trust model where clients are not trusted at the same
level as servers. Man-in-the-middle security attacks come to mind very
easily.

Channeling for Julian, while this is a valid issue for clear text commands
sent to untrusted clients, object storage has used command-level digital
signatures to ensure that untrusted clients can't tamper with commands,
but denial of service remains a threat.

My own take on this would be to say that the "client proxy" should be
separated from the untrusted clients and pulled into the server trust
domain, making it a logical node in the server's box. In this case the
client proxying is an implementation artifact and we need not concern
ourselves with it.
Dave, Dean, Julian, please correct me if I am not representing your position correctly. garth ============================================================== On Jan 19, 2004, at 6:30 PM, Halevy, Benny wrote: > Dave Noveck wrote: >> At some point the phrase "proxying through the client" was used and I >> realize I don't know what is meant by it. It doesn't seem to match >> the "proxying" that was being discussed originally. How would the >> client be a proxy for (presumably) the server? What am I missing? > > I think it was you who suggested (maybe in a rhetorical way) that > when the metadata server is not capable of accessing the storage > it manages it should still be able to perform I/O using a client. > Maybe this created the "proxying through the client" idea... > > Benny > >> -----Original Message----- >> From: Noveck, Dave [mailto:dnoveck@netapp.com] >> Sent: Monday, January 19, 2004 6:23 PM >> To: pnfs-reqs@yahoogroups.com >> Cc: pNFS Operations >> Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >> 12/18/03: subtopic: proxying >> >> >> Dean Hildebrand wrote: >>> I think relying on clients to do anything correctly is against the >>> inherent nature of NFS. Clients in NFS are transient and cannot be >>> trusted to do anything correctly. Therefore, the metadata server >>> should >>> find its own way to write data to the data servers without relying on >>> clients. If proxying through a client is optional, it still seems >>> orthogonal to the behavior of existing installations and the spirit >>> of >>> NFS. Maybe there is a valid use case someone could describe? >> >> I'm now totally confused. Before we talk about use cases for >> "proxying >> through a client", I'd like to understand what it is. 
>> >> My understanding is that when this discussion started, a number of >> people >> were referring to a client writing data by sending a write to the >> meta- >> data server (aka the NFS server) as "proxying", because, if your view >> is >> that the proper/best/ideal way of doing data transfer operations is to >> obtain mapping information and then do a write to the data server >> (i.e. >> other NFS server or object data server or SAN-connected disk), then >> the direct NFS write can be seen as the meta-data server acting as the >> client's proxy. Is my understanding correct? >> >> No matter how you come down on the quesion of the desirability of >> that, >> I don't think there any way to argue that doing a write by sending an >> NFS write request to an NFS server is against the inherent nature of >> NFS. Nor does it ask the client do anything correctly that it hasn't >> been doing all along. >> >> At some point the phrase "proxying through the client" was used and I >> realize I don't know what is meant by it. It doesn't seem to match >> the "proxying" that was being discussed originally. How would the >> client be a proxy for (presumably) the server? What am I missing? >> >> -----Original Message----- >> From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu] >> Sent: Monday, January 19, 2004 5:13 PM >> To: pNFS Requirements >> Cc: pNFS Operations >> Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >> 12/18/03: subtopic: proxying >> >> >>>> [1.1.2 Functional proxying]: a file transformation achievable by an >>>> NFS-v4.x client using a set of data server operations must be a >>>> equivalently achievable using a (probably different) set of NFS-v4.x >>>> server operations >>>> >>>> This is the topic I intended to address in the last email. 
I believe >>>> Dave is arguing that even with metadata servers that do not have >>>> access >>>> to their data servers, the vendor of such a metadata server can >>>> construct a proprietary protocol for the metadata server to (strict) >>>> proxy data server accesses through clients that do have data server >>>> access. I am not comfortable making up a counter to this, so I >>>> exhort >>>> those that want a metadata server without data server access to >>>> speak >>>> up if they disagree. >>>> >>>>> On one hand, some suggest that a set of out-of-band clients should >>>>> not >>> >>>>> have to also have a data path through the NFSv4 metadata server. >>>>> One >>>>> reason is that customers may not tolerate the large variability in >>>>> performance between out-of-band (when the going is good) and >>>>> in-band >>>>> (when the server chooses not to grant or to take away a delegation) >>>>> accesses. Another reason, and I paraphrase someone else here, is >>>>> that >>> >>>>> it is possible to construct out-of-band metadata servers that do >>>>> not >>>>> have access to the data servers except through the clients -- I >>>>> encourage the source of this scenario to replace my paraphrasing >>>>> with >>>>> a correct use case, because I find it odd to design for file >>>>> servers >>>>> that do not have access to the data servers. >>>>> >>>>> On the other hand, others have suggested that any access or work >>>>> that >>>>> a client can do out-of-band should be possible with one or more >>>>> commands applied to the metadata server's data path. This has been >>>>> proposed for coping with recalled delegations, including concurrent >>>>> writing by multiple clients; retry after client access errors, >>>>> provided adequate idempotency of out-of-band operations; and many >>>>> alternative implementations of out-of-band clients, including >>>>> legacy >>>>> clients that use out-of-band never or rarely. 
>>>>> >>>>> I think this is a topic that should be argued one way or the other >>>>> in >>>>> the requirements document. Use cases and examples in other systems >>>>> would be best. >>>> >>> >>> I guess that proxying through a client should be recomended but not >>> mandated. >>> We might the want to find how to do it while respecting restrictions >>> removed the metadata server from the path. >> >> I think relying on clients to do anything correctly is against the >> inherent nature of NFS. Clients in NFS are transient and cannot be >> trusted to do anything correctly. Therefore, the metadata >> server should >> find its own way to write data to the data servers without relying on >> clients. If proxying through a client is optional, it still seems >> orthogonal to the behavior of existing installations and the spirit of >> NFS. Maybe there is a valid use case someone could describe? >> >> Dean >> From dnoveck@netapp.com Tue Jan 20 03:19:31 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 21140 invoked from network); 20 Jan 2004 11:19:30 -0000 Received: from unknown (66.218.66.218) by m11.grp.scd.yahoo.com with QMQP; 20 Jan 2004 11:19:30 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 20 Jan 2004 11:19:30 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0KBJPKw001618; Tue, 20 Jan 2004 03:19:25 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0KBJPSR013716; Tue, 20 Jan 2004 03:19:25 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Tue, 20 Jan 2004 03:19:20 -0800 Message-ID: X-MS-Has-Attach: 
X-MS-TNEF-Correlator:
Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
Thread-Index: AcPe5E3+ne+ylrX4RNKlmGvjXmcMYwAXzstw
To: ,
X-eGroups-Remote-IP: 198.95.226.53
X-eGroups-From: "Noveck, Dave"
From: "Noveck, Dave"
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
X-Yahoo-Group-Post: member; u=44152831
X-Yahoo-Profile: davidnoveck

You may be right about the origin of this but I did not suggest that *NFS
clients* take part when IO was done in this way.

When considering a situation in which there was no direct connection
between the meta-data server and the data server, I did note that there
was a large set of machines that had a connection to both, making it
possible/easy to provide a connection between meta-data server and data
server, albeit indirect.

While it is true that that large class of machines can have NFS clients
running on them (and many will), I don't think it is a good idea to place
the burden of effecting this communication (to help servers without a
direct communications path) on the clients. This is as opposed to a server
using the same hardware that a client would use to effect an indirect
communication path, which seems quite reasonable to me, but does not
affect the client-server protocol.

In addition to the reasons that Dean cites for finding this troublesome,
let me add one more. Suppose we have IO from a v4.0 client, necessitating
access by the meta-data server to the data server. If that function were
imposed as a requirement on v4.x clients, then how do you deal with the
case in which no v4.x clients are functioning? Previous V4 minor versions
should just work and making them dependent on v4.x clients is not going to
fly. The server has to support v4.0 and can use the same hardware as
clients and much of the same software, but effecting the necessary
communication is part of the server's responsibility.
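The division of responsibility argued for in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names DataServer, Relay, MetadataServer and handle_v40_read are hypothetical and come neither from any NFSv4 specification nor from this thread. The sketch shows a metadata server satisfying an in-band v4.0 READ over a direct path to the data server when it has one, and otherwise through a relay node that belongs to the server side, so that no v4.x client is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# All names below are hypothetical, chosen for illustration only.


@dataclass
class DataServer:
    """Toy data server: maps a byte offset to the data stored there."""
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read(self, offset: int) -> bytes:
        return self.blocks.get(offset, b"")


@dataclass
class Relay:
    """Node on the server side that can reach the data server.

    It may run on the same kind of hardware as a client, but it is not
    an NFS client and is not visible in the client-server protocol.
    """
    data_server: DataServer

    def read(self, offset: int) -> bytes:
        return self.data_server.read(offset)


@dataclass
class MetadataServer:
    direct: Optional[DataServer] = None  # direct path, if one exists
    relay: Optional[Relay] = None        # indirect path owned by the server

    def handle_v40_read(self, offset: int) -> bytes:
        """Serve an in-band v4.0 READ without relying on any v4.x client."""
        if self.direct is not None:
            return self.direct.read(offset)
        if self.relay is not None:
            return self.relay.read(offset)
        raise RuntimeError("metadata server has no path to the data server")
```

Because the relay is part of the server side in this sketch, a v4.0 READ still succeeds when no v4.x client is running, which is the case raised in the last paragraph of the message.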
-----Original Message----- From: Halevy, Benny [mailto:bhalevy@panasas.com] Sent: Monday, January 19, 2004 6:31 PM To: 'pnfs-reqs@yahoogroups.com' Cc: pNFS Operations Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying Dave Noveck wrote: >At some point the phrase "proxying through the client" was used and I >realize I don't know what is meant by it. It doesn't seem to match >the "proxying" that was being discussed originally. How would the >client be a proxy for (presumably) the server? What am I missing? I think it was you who suggested (maybe in a rhetorical way) that when the metadata server is not capable of accessing the storage it manages it should still be able to perform I/O using a client. Maybe this created the "proxying through the client" idea... Benny >-----Original Message----- >From: Noveck, Dave [mailto:dnoveck@netapp.com] >Sent: Monday, January 19, 2004 6:23 PM >To: pnfs-reqs@yahoogroups.com >Cc: pNFS Operations >Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >12/18/03: subtopic: proxying > > >Dean Hildebrand wrote: >> I think relying on clients to do anything correctly is against the >> inherent nature of NFS. Clients in NFS are transient and cannot be >> trusted to do anything correctly. Therefore, the metadata >server should >> find its own way to write data to the data servers without relying on >> clients. If proxying through a client is optional, it still seems >> orthogonal to the behavior of existing installations and the >spirit of >> NFS. Maybe there is a valid use case someone could describe? > >I'm now totally confused. Before we talk about use cases for "proxying >through a client", I'd like to understand what it is. 
>
>My understanding is that when this discussion started, a number of people
>were referring to a client writing data by sending a write to the
>meta-data server (aka the NFS server) as "proxying", because, if your view is
>that the proper/best/ideal way of doing data transfer operations is to
>obtain mapping information and then do a write to the data server (i.e.
>other NFS server or object data server or SAN-connected disk), then
>the direct NFS write can be seen as the meta-data server acting as the
>client's proxy. Is my understanding correct?
>
>No matter how you come down on the question of the desirability of that,
>I don't think there is any way to argue that doing a write by sending an
>NFS write request to an NFS server is against the inherent nature of
>NFS. Nor does it ask the client to do anything correctly that it hasn't
>been doing all along.
>
>At some point the phrase "proxying through the client" was used and I
>realize I don't know what is meant by it. It doesn't seem to match
>the "proxying" that was being discussed originally. How would the
>client be a proxy for (presumably) the server? What am I missing?
>
>-----Original Message-----
>From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu]
>Sent: Monday, January 19, 2004 5:13 PM
>To: pNFS Requirements
>Cc: pNFS Operations
>Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>
>> > [1.1.2 Functional proxying]: a file transformation achievable by an
>> > NFS-v4.x client using a set of data server operations must be
>> > equivalently achievable using a (probably different) set of NFS-v4.x
>> > server operations
>> >
>> > This is the topic I intended to address in the last email.
>> > I believe Dave is arguing that even with metadata servers that do not have access
>> > to their data servers, the vendor of such a metadata server can
>> > construct a proprietary protocol for the metadata server to (strict)
>> > proxy data server accesses through clients that do have data server
>> > access. I am not comfortable making up a counter to this, so I exhort
>> > those that want a metadata server without data server access to speak
>> > up if they disagree.
>> >
>> > > On one hand, some suggest that a set of out-of-band clients should not
>> > > have to also have a data path through the NFSv4 metadata server. One
>> > > reason is that customers may not tolerate the large variability in
>> > > performance between out-of-band (when the going is good) and in-band
>> > > (when the server chooses not to grant or to take away a delegation)
>> > > accesses. Another reason, and I paraphrase someone else here, is that
>> > > it is possible to construct out-of-band metadata servers that do not
>> > > have access to the data servers except through the clients -- I
>> > > encourage the source of this scenario to replace my paraphrasing with
>> > > a correct use case, because I find it odd to design for file servers
>> > > that do not have access to the data servers.
>> > >
>> > > On the other hand, others have suggested that any access or work that
>> > > a client can do out-of-band should be possible with one or more
>> > > commands applied to the metadata server's data path. This has been
>> > > proposed for coping with recalled delegations, including concurrent
>> > > writing by multiple clients; retry after client access errors,
>> > > provided adequate idempotency of out-of-band operations; and many
>> > > alternative implementations of out-of-band clients, including legacy
>> > > clients that use out-of-band never or rarely.
>> > >
>> > > I think this is a topic that should be argued one way or the other in
>> > > the requirements document. Use cases and examples in other systems
>> > > would be best.
>> >
>>
>> I guess that proxying through a client should be recommended but not
>> mandated.
>> We might then want to find how to do it while respecting restrictions
>> that removed the metadata server from the path.
>
>I think relying on clients to do anything correctly is against the
>inherent nature of NFS. Clients in NFS are transient and cannot be
>trusted to do anything correctly. Therefore, the metadata server should
>find its own way to write data to the data servers without relying on
>clients. If proxying through a client is optional, it still seems
>orthogonal to the behavior of existing installations and the spirit of
>NFS. Maybe there is a valid use case someone could describe?
>
>Dean

Yahoo! Groups Links
To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-ops/
To unsubscribe from this group, send an email to: pnfs-ops-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to: http://docs.yahoo.com/info/terms/

From bhalevy@panasas.com Tue Jan 20 08:00:59 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 44018 invoked from network); 20 Jan 2004 16:00:58 -0000 Received: from unknown (66.218.66.172) by m4.grp.scd.yahoo.com with QMQP; 20 Jan 2004 16:00:58 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 20 Jan 2004 16:00:58 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Tue, 20 Jan 2004 11:00:56 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D3879F@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" , pnfs-ops@yahoogroups.com Date: Tue, 20 Jan 2004 11:00:55 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy

Dave,

I completely agree with your assertions below. One more reason not to provide support in the NFS protocol for such servers is to guarantee interoperability with simple NFSv4.x clients that do not support out-of-band I/O or some optional extensions, e.g. write sharing (if we spec. it). Without the ability to read and write via the NFS server, sharing a file that's being written by one or more writers needs complete support for write sharing by all clients as well as the server.

I suggest we mention this issue in the problem statement document and explain why we want to leave it open for the server implementation to solve and don't want to solve it within the NFS protocol.
Benny

>-----Original Message-----
>From: Noveck, Dave [mailto:dnoveck@netapp.com]
>Sent: Tuesday, January 20, 2004 6:19 AM
>To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com
>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>
>You may be right about the origin of this but I did not suggest that
>*NFS clients* take part when IO was done in this way. When
>considering a situation in which there was no direct connection
>between the meta-data server and the data server, I did note
>that there was a large set of machines that had a connection to
>both, making it possible/easy to provide a connection between
>meta-data server and data server, albeit indirect.
>
>While it is true that that large class of machines can have NFS
>clients running on them (and many will), I don't think it is a
>good idea to place the burden of effecting this communication
>(to help servers without a direct communications path) on the
>clients. This is as opposed to a server using the same hardware
>that a client would use to effect an indirect communication
>path, which seems quite reasonable to me, but does not affect
>the client-server protocol.
>
>In addition to the reasons that Dean cites for finding this
>troublesome, let me add one more. Suppose we have IO from
>a v4.0 client, necessitating access by the meta-data server
>to the data server. If that function were imposed as a
>requirement on v4.x clients, then how do you deal with the
>case in which no v4.x clients are functioning? Previous V4
>minor versions should just work and making them dependent
>on v4.x clients is not going to fly. The server has to
>support v4.0 and can use the same hardware as clients and
>much of the same software, but effecting the necessary
>communication is part of the server's responsibility.

From Thomas.Talpey@netapp.com Tue Jan 20 09:08:11 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 83125 invoked from network); 20 Jan 2004 17:08:03 -0000 Received: from unknown (66.218.66.166) by m20.grp.scd.yahoo.com with QMQP; 20 Jan 2004 17:08:02 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 20 Jan 2004 17:08:00 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0KH7xKw019662; Tue, 20 Jan 2004 09:07:59 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0KH7xpr015087; Tue, 20 Jan 2004 09:07:59 -0800 (PST) Received: from tmt.netapp.com ([10.97.1.30]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 20 Jan 2004 12:07:53 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3DF77.F0C43A80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Tue, 20 Jan 2004 09:07:43 -0800 Message-ID: <5.2.1.1.2.20040120115557.01f84da8@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: 
proxying Thread-Index: AcPfd/FIyXKQ2cP9TcmTvuQUbzySOg== To: Cc: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: proxying X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu

At 07:15 PM 1/19/2004, Garth Gibson wrote:
>First, what I called Functional Proxying is what Dave Noveck
>understands :-) It is a safety valve for clients that don't want to,
>can't, or think it is slower to directly access the leaf storage
>server.

I think we should avoid making up a new term for something that NFS has been all about since day minus-one. And definitely let's not call it a safety valve. It's just regular old NFS. Right?

>For the other scenario, I believe it was Julian Satran who mentioned
>that IBM has considered situations where the metadata server does not
>have good access, if any, to the storage devices directly. For
>example, FC disks connected to all clients with FC NICs, and a metadata
>server connected to clients via Ethernet and not connected to FC at
>all. In this case, the metadata wants all, or almost all, accesses to
>be direct from client to storage.

So, this is important and sets the tone for where complexity resides. The issue is not so much whether the client chooses to perform a direct transfer, but whether it is forced to. This is important, and isn't a protocol issue, it's an implementation (or perhaps better, "deployment") choice.

It's not going to be a popular proposal if we dwell on this. I view the document as centering around what happens when clients negotiate the advanced version, not how they fall back. Requirements, not implementation.

Tom.
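The distinction between a client that chooses a direct transfer and one that is forced into it can be made concrete with a small decision table. The sketch below is illustrative only: `Path` and `choose_path` are hypothetical names, the three boolean inputs are assumptions made for the example, and nothing here corresponds to an actual NFS operation or negotiation step.

```python
from enum import Enum
from typing import Optional


class Path(Enum):
    DIRECT = "direct to storage"
    IN_BAND = "regular NFS through the server"


def choose_path(client_can_go_direct: bool,
                client_prefers_direct: bool,
                server_serves_in_band: bool) -> Optional[Path]:
    """Pick a data path, or None when the client has no way to do I/O.

    When the server serves in-band I/O, a capable client merely *chooses*
    between the two paths. When it does not (e.g. a metadata server with no
    connection to the storage), a capable client is *forced* to go direct,
    and a client without direct access is left with nothing.
    """
    if client_can_go_direct and (client_prefers_direct or not server_serves_in_band):
        return Path.DIRECT
    if server_serves_in_band:
        return Path.IN_BAND
    return None


# A client without out-of-band support against a server that serves in-band I/O.
legacy = choose_path(False, False, True)
# The same client against a deployment whose server cannot reach the storage.
stranded = choose_path(False, False, False)
```

Only the last case is a problem, and in this sketch it is decided entirely by the `server_serves_in_band` input, that is, by how the server is deployed rather than by anything the client negotiates.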
From julian_satran@il.ibm.com Tue Jan 20 16:54:29 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 75408 invoked from network); 21 Jan 2004 00:54:28 -0000 Received: from unknown (66.218.66.218) by m14.grp.scd.yahoo.com with QMQP; 21 Jan 2004 00:54:28 -0000 Received: from unknown (HELO mtagate2.de.ibm.com) (195.212.29.151) by mta3.grp.scd.yahoo.com with SMTP; 21 Jan 2004 00:54:27 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180]) by mtagate2.de.ibm.com (8.12.10/8.12.10) with ESMTP id i0L0sPRT089196; Wed, 21 Jan 2004 00:54:25 GMT Received: from d10ml001.telaviv.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0L0sNI3159660; Wed, 21 Jan 2004 01:54:25 +0100 In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D3879F@PIKES.panasas.com> To: pnfs-ops@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, "'pnfs-reqs@yahoogroups.com'" MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Tue, 20 Jan 2004 16:54:23 -0800 X-MIMETrack: Serialize by Router on D10ML001/10/M/IBM(Release 6.0.2CF2|July 23, 2003) at 21/01/2004 02:54:25, Serialize complete at 21/01/2004 02:54:25 Content-Type: text/plain; charset="US-ASCII" X-eGroups-Remote-IP: 195.212.29.151 From: Julian Satran Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: proxying X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran

Benny,

Even "simple" NFSV4 clients support referrals (Moved). So a metadata server may refer those requests to another server that has access to data. The trouble I have is having to mandate this on all users or making an "optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT supported-here" as a legal error (and that is the position I am taking).
Julo "Halevy, Benny" 20/01/2004 08:00 Please respond to pnfs-ops To "'pnfs-reqs@yahoogroups.com'" , pnfs-ops@yahoogroups.com cc Subject RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: proxying Dave, I completely agree with your assertions below. One more reason not to provide support in the NFS protocol for such servers is to guarantee interoperability with simple MFSv4.x clients that do not support out-of-band I/O or some optional extensions, e.g. write sharing (if we spec. it). Without the ability to read and write via the NFS server, sharing a file that's being written by one or more writers needs complete support for write sharing by all clients as well as the server. I suggest we mention this issue in the problem statement document and explain why we want to leave it open for the server implementation to solve and don't want to solve it within the NFS protocol. Benny >-----Original Message----- >From: Noveck, Dave [mailto:dnoveck@netapp.com] >Sent: Tuesday, January 20, 2004 6:19 AM >To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com >Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >12/18/03: subtopic: proxying > > >You may be right about the origin of this but I did not suggest that >*NFS clients* take part when IO was done in this way. When >considering a situation in which there was no direct connection >between the meta-data server and the data server, I did note >that there was a large set of machines that had a connection to >both, making it possible/easy to provide a connection between >meta-data server and data server, albeit indirect. > >While it is true that that large class of machines can have NFS >clients running on them (and many will), I don't think it is a >good idea to place the burden of effecting this communication >(to help servers without a direct communications path) on the >clients. 
This is as opposed to a server using the same hardware >that a client would use to effect an indirect communication >path, which seems quite reasonable to me, but does not affect >the client-server protocol. > >In addition to the reasons that Dean cites for finding this >troublesome, let me add one more. Suppose we have IO from >a v4.0 client, necessitating access by the meta-data server >to the data server. If that function were imposed as a >requirement on v4.x clients, then how do you deal with the >case in which no v4.x clients are functioning? Previous V4 >minor versions should just work and making them dependent >on v4.x clients is not going to fly. The server has to >support v4.0 and can use the same hardware as clients and >much of the same software, but effecting the necessary >communication is part of the server's responsibility. > >-----Original Message----- >From: Halevy, Benny [mailto:bhalevy@panasas.com] >Sent: Monday, January 19, 2004 6:31 PM >To: 'pnfs-reqs@yahoogroups.com' >Cc: pNFS Operations >Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >12/18/03: subtopic: proxying > > >Dave Noveck wrote: >>At some point the phrase "proxying through the client" was used and I >>realize I don't know what is meant by it. It doesn't seem to match >>the "proxying" that was being discussed originally. How would the >>client be a proxy for (presumably) the server? What am I missing? > >I think it was you who suggested (maybe in a rhetorical way) that >when the metadata server is not capable of accessing the storage >it manages it should still be able to perform I/O using a client. >Maybe this created the "proxying through the client" idea... 
> >Benny > >>-----Original Message----- >>From: Noveck, Dave [mailto:dnoveck@netapp.com] >>Sent: Monday, January 19, 2004 6:23 PM >>To: pnfs-reqs@yahoogroups.com >>Cc: pNFS Operations >>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>12/18/03: subtopic: proxying >> >> >>Dean Hildebrand wrote: >>> I think relying on clients to do anything correctly is against the >>> inherent nature of NFS. Clients in NFS are transient and cannot be >>> trusted to do anything correctly. Therefore, the metadata >>server should >>> find its own way to write data to the data servers without >relying on >>> clients. If proxying through a client is optional, it still seems >>> orthogonal to the behavior of existing installations and the >>spirit of >>> NFS. Maybe there is a valid use case someone could describe? >> >>I'm now totally confused. Before we talk about use cases for >"proxying >>through a client", I'd like to understand what it is. >> >>My understanding is that when this discussion started, a >>number of people >>were referring to a client writing data by sending a write to >the meta- >>data server (aka the NFS server) as "proxying", because, if >>your view is >>that the proper/best/ideal way of doing data transfer operations is to >>obtain mapping information and then do a write to the data >server (i.e. >>other NFS server or object data server or SAN-connected disk), then >>the direct NFS write can be seen as the meta-data server acting as the >>client's proxy. Is my understanding correct? >> >>No matter how you come down on the quesion of the >desirability of that, >>I don't think there any way to argue that doing a write by sending an >>NFS write request to an NFS server is against the inherent nature of >>NFS. Nor does it ask the client do anything correctly that it hasn't >>been doing all along. >> >>At some point the phrase "proxying through the client" was used and I >>realize I don't know what is meant by it. 
It doesn't seem to match >>the "proxying" that was being discussed originally. How would the >>client be a proxy for (presumably) the server? What am I missing? >> >>-----Original Message----- >>From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu] >>Sent: Monday, January 19, 2004 5:13 PM >>To: pNFS Requirements >>Cc: pNFS Operations >>Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>12/18/03: subtopic: proxying >> >> >>> > [1.1.2 Functional proxying]: a file transformation >>achievable by an >>> > NFS-v4.x client using a set of data server operations must be a >>> > equivalently achievable using a (probably different) set >>of NFS-v4.x >>> > server operations >>> > >>> > This is the topic I intended to address in the last email. >> I believe >>> > Dave is arguing that even with metadata servers that do >>not have access >>> > to their data servers, the vendor of such a metadata server can >>> > construct a proprietary protocol for the metadata server >>to (strict) >>> > proxy data server accesses through clients that do have >>data server >>> > access. I am not comfortable making up a counter to this, >>so I exhort >>> > those that want a metadata server without data server >>access to speak >>> > up if they disagree. >>> > >>> > > On one hand, some suggest that a set of out-of-band >>clients should not >>> >>> > > have to also have a data path through the NFSv4 metadata >>server. One >>> > > reason is that customers may not tolerate the large >>variability in >>> > > performance between out-of-band (when the going is good) >>and in-band >>> > > (when the server chooses not to grant or to take away a >>delegation) >>> > > accesses. 
Another reason, and I paraphrase someone else >>here, is that >>> >>> > > it is possible to construct out-of-band metadata servers >>that do not >>> > > have access to the data servers except through the clients -- I >>> > > encourage the source of this scenario to replace my >>paraphrasing with >>> > > a correct use case, because I find it odd to design for >>file servers >>> > > that do not have access to the data servers. >>> > > >>> > > On the other hand, others have suggested that any access >>or work that >>> > > a client can do out-of-band should be possible with one or more >>> > > commands applied to the metadata server's data path. >>This has been >>> > > proposed for coping with recalled delegations, including >>concurrent >>> > > writing by multiple clients; retry after client access errors, >>> > > provided adequate idempotency of out-of-band operations; >>and many >>> > > alternative implementations of out-of-band clients, >>including legacy >>> > > clients that use out-of-band never or rarely. >>> > > >>> > > I think this is a topic that should be argued one way or >>the other in >>> > > the requirements document. Use cases and examples in >>other systems >>> > > would be best. >>> > >>> >>> I guess that proxying through a client should be recomended but not >>> mandated. >>> We might the want to find how to do it while respecting >restrictions >>> removed the metadata server from the path. >> >>I think relying on clients to do anything correctly is against the >>inherent nature of NFS. Clients in NFS are transient and cannot be >>trusted to do anything correctly. Therefore, the metadata >>server should >>find its own way to write data to the data servers without relying on >>clients. If proxying through a client is optional, it still seems >>orthogonal to the behavior of existing installations and the spirit of >>NFS. Maybe there is a valid use case someone could describe? >> >>Dean >> >> >> >> >> >>Yahoo! 
Groups Links >> >>To visit your group on the web, go to: >> http://groups.yahoo.com/group/pnfs-reqs/ >> >>To unsubscribe from this group, send an email to: >> pnfs-reqs-unsubscribe@yahoogroups.com >> >>Your use of Yahoo! Groups is subject to: >> http://docs.yahoo.com/info/terms/ >> >> >> >> >> >>------------------------ Yahoo! Groups Sponsor >>---------------------~--> >>Upgrade to 128-bit SSL Security! >>http://us.click.yahoo.com/qZ0LdD/yjVHAA/TtwFAA/W6uqlB/TM >>--------------------------------------------------------------- >>------~-> >> >>Yahoo! Groups Links >> >>To visit your group on the web, go to: >> http://groups.yahoo.com/group/pnfs-reqs/ >> >>To unsubscribe from this group, send an email to: >> pnfs-reqs-unsubscribe@yahoogroups.com >> >>Your use of Yahoo! Groups is subject to: >> http://docs.yahoo.com/info/terms/ >> >> > > > >Yahoo! Groups Links > >To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-ops/ > >To unsubscribe from this group, send an email to: > pnfs-ops-unsubscribe@yahoogroups.com > >Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > > > > > >------------------------ Yahoo! Groups Sponsor >---------------------~--> >Upgrade to 128-bit SSL Security! >http://us.click.yahoo.com/qZ0LdD/yjVHAA/TtwFAA/W6uqlB/TM >--------------------------------------------------------------- >------~-> > >Yahoo! Groups Links > >To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > >To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > >Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > > Yahoo! Groups Links To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-ops/ To unsubscribe from this group, send an email to: pnfs-ops-unsubscribe@yahoogroups.com Your use of Yahoo! 
Groups is subject to: http://docs.yahoo.com/info/terms/

From dnoveck@netapp.com Wed Jan 21 05:32:00 2004
From: "Noveck, Dave"
Date: Wed, 21 Jan 2004 05:31:41 -0800
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

> Even "simple" NFSV4 clients support referrals (Moved).

Unfortunately, the "simple" clients that actually exist today don't :-(, but I suppose we can assume that time will correct that problem.

> So a metadata server may refer those requests to another server
> that has access to data.

But that's referral of an entire filesystem (everything sharing a given fsid value) to another nfsv4 server (i.e. a metadata server).
You can't (in v4.0) refer requests for a single file or separately refer data IO requests and those that involve metadata.

> The trouble I have is having to mandate this on all users or making an
> "optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT
> supported-here" as a legal error (and that is the position I am taking).

It can't be a legal error in v4.0.

-----Original Message-----
From: Julian Satran [mailto:julian_satran@il.ibm.com]
Sent: Tuesday, January 20, 2004 7:54 PM
To: pnfs-ops@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com; 'pnfs-reqs@yahoogroups.com'
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

Benny,

Even "simple" NFSV4 clients support referrals (Moved). So a metadata server may refer those requests to another server that has access to data.
The trouble I have is having to mandate this on all users or making an "optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT supported-here" as a legal error (and that is the position I am taking).

Julo

"Halevy, Benny"
20/01/2004 08:00
Please respond to pnfs-ops
To: "'pnfs-reqs@yahoogroups.com'", pnfs-ops@yahoogroups.com
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

Dave, I completely agree with your assertions below. One more reason not to provide support in the NFS protocol for such servers is to guarantee interoperability with simple NFSv4.x clients that do not support out-of-band I/O or some optional extensions, e.g. write sharing (if we spec. it). Without the ability to read and write via the NFS server, sharing a file that's being written by one or more writers needs complete support for write sharing by all clients as well as the server.

I suggest we mention this issue in the problem statement document and explain why we want to leave it open for the server implementation to solve and don't want to solve it within the NFS protocol.
Benny

>-----Original Message-----
>From: Noveck, Dave [mailto:dnoveck@netapp.com]
>Sent: Tuesday, January 20, 2004 6:19 AM
>To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com
>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>
>You may be right about the origin of this but I did not suggest that *NFS clients* take part when IO was done in this way. When considering a situation in which there was no direct connection between the meta-data server and the data server, I did note that there was a large set of machines that had a connection to both, making it possible/easy to provide a connection between meta-data server and data server, albeit indirect.
>
>While it is true that that large class of machines can have NFS clients running on them (and many will), I don't think it is a good idea to place the burden of effecting this communication (to help servers without a direct communications path) on the clients. This is as opposed to a server using the same hardware that a client would use to effect an indirect communication path, which seems quite reasonable to me, but does not affect the client-server protocol.
>
>In addition to the reasons that Dean cites for finding this troublesome, let me add one more. Suppose we have IO from a v4.0 client, necessitating access by the meta-data server to the data server. If that function were imposed as a requirement on v4.x clients, then how do you deal with the case in which no v4.x clients are functioning? Previous V4 minor versions should just work and making them dependent on v4.x clients is not going to fly. The server has to support v4.0 and can use the same hardware as clients and much of the same software, but effecting the necessary communication is part of the server's responsibility.
>
>-----Original Message-----
>From: Halevy, Benny [mailto:bhalevy@panasas.com]
>Sent: Monday, January 19, 2004 6:31 PM
>To: 'pnfs-reqs@yahoogroups.com'
>Cc: pNFS Operations
>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>
>Dave Noveck wrote:
>>At some point the phrase "proxying through the client" was used and I realize I don't know what is meant by it. It doesn't seem to match the "proxying" that was being discussed originally. How would the client be a proxy for (presumably) the server? What am I missing?
>
>I think it was you who suggested (maybe in a rhetorical way) that when the metadata server is not capable of accessing the storage it manages it should still be able to perform I/O using a client. Maybe this created the "proxying through the client" idea...
>
>Benny
>
>>-----Original Message-----
>>From: Noveck, Dave [mailto:dnoveck@netapp.com]
>>Sent: Monday, January 19, 2004 6:23 PM
>>To: pnfs-reqs@yahoogroups.com
>>Cc: pNFS Operations
>>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>>
>>Dean Hildebrand wrote:
>>> I think relying on clients to do anything correctly is against the inherent nature of NFS. Clients in NFS are transient and cannot be trusted to do anything correctly. Therefore, the metadata server should find its own way to write data to the data servers without relying on clients. If proxying through a client is optional, it still seems orthogonal to the behavior of existing installations and the spirit of NFS. Maybe there is a valid use case someone could describe?
>>
>>I'm now totally confused. Before we talk about use cases for "proxying through a client", I'd like to understand what it is.
>>
>>My understanding is that when this discussion started, a number of people were referring to a client writing data by sending a write to the meta-data server (aka the NFS server) as "proxying", because, if your view is that the proper/best/ideal way of doing data transfer operations is to obtain mapping information and then do a write to the data server (i.e. other NFS server or object data server or SAN-connected disk), then the direct NFS write can be seen as the meta-data server acting as the client's proxy. Is my understanding correct?
>>
>>No matter how you come down on the question of the desirability of that, I don't think there is any way to argue that doing a write by sending an NFS write request to an NFS server is against the inherent nature of NFS. Nor does it ask the client to do anything correctly that it hasn't been doing all along.
>>
>>At some point the phrase "proxying through the client" was used and I realize I don't know what is meant by it. It doesn't seem to match the "proxying" that was being discussed originally. How would the client be a proxy for (presumably) the server? What am I missing?
>>
>>-----Original Message-----
>>From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu]
>>Sent: Monday, January 19, 2004 5:13 PM
>>To: pNFS Requirements
>>Cc: pNFS Operations
>>Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying
>>
>>> > [1.1.2 Functional proxying]: a file transformation achievable by an NFS-v4.x client using a set of data server operations must be equivalently achievable using a (probably different) set of NFS-v4.x server operations
>>> >
>>> > This is the topic I intended to address in the last email.
>>> > I believe Dave is arguing that even with metadata servers that do not have access to their data servers, the vendor of such a metadata server can construct a proprietary protocol for the metadata server to (strict) proxy data server accesses through clients that do have data server access. I am not comfortable making up a counter to this, so I exhort those that want a metadata server without data server access to speak up if they disagree.
>>> >
>>> > > On one hand, some suggest that a set of out-of-band clients should not have to also have a data path through the NFSv4 metadata server. One reason is that customers may not tolerate the large variability in performance between out-of-band (when the going is good) and in-band (when the server chooses not to grant or to take away a delegation) accesses. Another reason, and I paraphrase someone else here, is that it is possible to construct out-of-band metadata servers that do not have access to the data servers except through the clients -- I encourage the source of this scenario to replace my paraphrasing with a correct use case, because I find it odd to design for file servers that do not have access to the data servers.
>>> > >
>>> > > On the other hand, others have suggested that any access or work that a client can do out-of-band should be possible with one or more commands applied to the metadata server's data path. This has been proposed for coping with recalled delegations, including concurrent writing by multiple clients; retry after client access errors, provided adequate idempotency of out-of-band operations; and many alternative implementations of out-of-band clients, including legacy clients that use out-of-band never or rarely.
>>> > >
>>> > > I think this is a topic that should be argued one way or the other in the requirements document. Use cases and examples in other systems would be best.
>>>
>>> I guess that proxying through a client should be recommended but not mandated.
>>> We might then want to find how to do it while respecting the restrictions that removed the metadata server from the path.
>>
>>I think relying on clients to do anything correctly is against the inherent nature of NFS. Clients in NFS are transient and cannot be trusted to do anything correctly. Therefore, the metadata server should find its own way to write data to the data servers without relying on clients. If proxying through a client is optional, it still seems orthogonal to the behavior of existing installations and the spirit of NFS. Maybe there is a valid use case someone could describe?
>>
>>Dean
>>
>>Yahoo! Groups Links
>>
>>To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/
>>To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com
>>Your use of Yahoo!
Groups is subject to: http://docs.yahoo.com/info/terms/

Yahoo! Groups Links

To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/
To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com
Your use of Yahoo!
Groups is subject to: http://docs.yahoo.com/info/terms/

From julian_satran@il.ibm.com Wed Jan 21 17:58:27 2004
From: Julian Satran
To: pnfs-ops@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com
Date: Wed, 21 Jan 2004 17:56:11 -0800
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

Dave,

I assume that a minor version can incorporate new function and new errors. I see the point related to your first objection (selective about moved) but we can work through it patiently. And it is no bad thing if, for some FS, data requests force that server to appear as moved for any client that ever uses a data request from the original metadata server.
We will then have a metadata server + boxes that serve old NFSv4 clients only (and/or for specific FSs to which the metadata server has no data access).

Julo

"Noveck, Dave"
21/01/2004 05:31
Please respond to pnfs-ops
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

> Even "simple" NFSV4 clients support referrals (Moved).

Unfortunately, the "simple" clients that actually exist today don't :-(, but I suppose we can assume that time will correct that problem.

> So a metadata server may refer those requests to another server
> that has access to data.

But that's referral of an entire filesystem (everything sharing a given fsid value) to another nfsv4 server (i.e. a metadata server). You can't (in v4.0) refer requests for a single file or separately refer data IO requests and those that involve metadata.

> The trouble I have is having to mandate this on all users or making an
> "optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT
> supported-here" as a legal error (and that is the position I am taking).

It can't be a legal error in v4.0.

-----Original Message-----
From: Julian Satran [mailto:julian_satran@il.ibm.com]
Sent: Tuesday, January 20, 2004 7:54 PM
To: pnfs-ops@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com; 'pnfs-reqs@yahoogroups.com'
Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/03: subtopic: proxying

Benny,

Even "simple" NFSV4 clients support referrals (Moved). So a metadata server may refer those requests to another server that has access to data.
The trouble I have is having to mandate this on all users or making an "optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT supported-here" as a legal error (and that is the position I am taking).
Julo

Yahoo!
Groups Links

To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-ops/
To unsubscribe from this group, send an email to: pnfs-ops-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to: http://docs.yahoo.com/info/terms/

From pnfs-reqs@yahoogroups.com Thu Jan 22 04:29:30 2004
From: pnfs-reqs@yahoogroups.com
To: pnfs-reqs@yahoogroups.com
Date: 22 Jan 2004 12:29:25 -0000
Message-ID: <1074774565.648.17240.w2@yahoogroups.com>
Subject: New file uploaded to pnfs-reqs

Hello,

This email message is a notification to let you know that a file has been uploaded to the Files area of the pnfs-reqs group.
File : /draft-ietf-pNFS-problem-statement-v2.doc
Uploaded by : garth_a_gibson
Description : v0.2 of pNFS problem statement

You can access this file at the URL
http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc

To learn more about file sharing for your group, please visit
http://help.yahoo.com/help/us/groups/files

Regards,
garth_a_gibson

From garth@panasas.com Thu Jan 22 04:38:24 2004
From: Garth Gibson
To: pnfs-reqs@yahoogroups.com
Date: Thu, 22 Jan 2004 04:38:13 -0800
Subject: uploaded a draft of a pNFS problem statement

Based on the feedback in the last two concalls, I have taken a shot at a pNFS problem statement. The notion of a bottleneck, driven by the dramatic increase in bandwidth demand coming from clusters, and the desire to continue to allow filesystems and namespaces to be big and not specialized to data distribution, are central.

I didn't do any work in the application space sections -- and I have not put in any citations yet -- sorry.

Tom was right -- even with this thin version it comes out at 8 pages.
I am happy to take comments and produce revisions, or to turn over the document to anyone who wants to make a pass through it. Talk to you in a couple of hours at the 11-12 EST concall. garth From Thomas.Talpey@netapp.com Thu Jan 22 05:09:26 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 70873 invoked from network); 22 Jan 2004 13:09:25 -0000 Received: from unknown (66.218.66.172) by m10.grp.scd.yahoo.com with QMQP; 22 Jan 2004 13:09:25 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 22 Jan 2004 13:09:25 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0MD9PKw022756; Thu, 22 Jan 2004 05:09:25 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0MD9PSR024389; Thu, 22 Jan 2004 05:09:25 -0800 (PST) Received: from tmt.netapp.com ([10.97.1.30]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Thu, 22 Jan 2004 08:09:19 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3E0E8.F1C8A980" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Thu, 22 Jan 2004 05:09:07 -0800 Message-ID: <5.2.1.1.2.20040122080409.00c246d0@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: proxying Thread-Index: AcPg6PIwOdtHwUfvTAO5uqZggOHrXg== To: Cc: , X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: subtopic: proxying X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu At 07:54 PM 1/20/2004, Julian Satran wrote: >Benny, > >Even "simple" NFSV4 clients support referrals (Moved). 
So a metadata >server may refer those requests to another server that has access to data. Dave Noveck already pointed out that in fact most clients don't yet support this, and that the referral is for a filesystem, not a file. But even apart from that, why wouldn't the referral happen the other way? That is, why wouldn't a client connect to an NFSv4 server, the two would determine that data bypass is desired and supported, and the server would redirect the client to the metadata server? This is a much cleaner and simpler upward migration story, and doesn't break NFSv4. In any case, I still argue that this kind of interaction should not be explored in the requirements document, except to point out that it's a requirement to support "stock" NFSv4 without client modifications. Tom. >The trouble I have is having to mandate this on all users or making an >"optional to use" feature and leave the "ERROR-DATA-ACCESS-NOT >supported-here" as a legal error (and that is the position I am taking). > >Julo > > > > > >"Halevy, Benny" >20/01/2004 08:00 >Please respond to >pnfs-ops > > >To >"'pnfs-reqs@yahoogroups.com'" , >pnfs-ops@yahoogroups.com >cc > >Subject >RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: 12/18/0 3: >subtopic: proxying > > > > > > >Dave, I completely agree with your assertions below. >One more reason not to provide support in the NFS protocol >for such servers is to guarantee interoperability with simple >NFSv4.x clients that do not support out-of-band I/O or >some optional extensions, e.g. write sharing (if we spec. >it). Without the ability to read and write via the NFS >server, sharing a file that's being written by one or more >writers needs complete support for write sharing by all >clients as well as the server. > >I suggest we mention this issue in the problem statement >document and explain why we want to leave it open for the >server implementation to solve and don't want to solve >it within the NFS protocol.
> >Benny > >>-----Original Message----- >>From: Noveck, Dave [mailto:dnoveck@netapp.com] >>Sent: Tuesday, January 20, 2004 6:19 AM >>To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com >>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>12/18/03: subtopic: proxying >> >> >>You may be right about the origin of this but I did not suggest that >>*NFS clients* take part when IO was done in this way. When >>considering a situation in which there was no direct connection >>between the meta-data server and the data server, I did note >>that there was a large set of machines that had a connection to >>both, making it possible/easy to provide a connection between >>meta-data server and data server, albeit indirect. >> >>While it is true that that large class of machines can have NFS >>clients running on them (and many will), I don't think it is a >>good idea to place the burden of effecting this communication >>(to help servers without a direct communications path) on the >>clients. This is as opposed to a server using the same hardware >>that a client would use to effect an indirect communication >>path, which seems quite reasonable to me, but does not affect >>the client-server protocol. >> >>In addition to the reasons that Dean cites for finding this >>troublesome, let me add one more. Suppose we have IO from >>a v4.0 client, necessitating access by the meta-data server >>to the data server. If that function were imposed as a >>requirement on v4.x clients, then how do you deal with the >>case in which no v4.x clients are functioning? Previous V4 >>minor versions should just work and making them dependent >>on v4.x clients is not going to fly. The server has to >>support v4.0 and can use the same hardware as clients and >>much of the same software, but effecting the necessary >>communication is part of the server's responsibility. 
>> >>-----Original Message----- >>From: Halevy, Benny [mailto:bhalevy@panasas.com] >>Sent: Monday, January 19, 2004 6:31 PM >>To: 'pnfs-reqs@yahoogroups.com' >>Cc: pNFS Operations >>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>12/18/03: subtopic: proxying >> >> >>Dave Noveck wrote: >>>At some point the phrase "proxying through the client" was used and I >>>realize I don't know what is meant by it. It doesn't seem to match >>>the "proxying" that was being discussed originally. How would the >>>client be a proxy for (presumably) the server? What am I missing? >> >>I think it was you who suggested (maybe in a rhetorical way) that >>when the metadata server is not capable of accessing the storage >>it manages it should still be able to perform I/O using a client. >>Maybe this created the "proxying through the client" idea... >> >>Benny >> >>>-----Original Message----- >>>From: Noveck, Dave [mailto:dnoveck@netapp.com] >>>Sent: Monday, January 19, 2004 6:23 PM >>>To: pnfs-reqs@yahoogroups.com >>>Cc: pNFS Operations >>>Subject: RE: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>>12/18/03: subtopic: proxying >>> >>> >>>Dean Hildebrand wrote: >>>> I think relying on clients to do anything correctly is against the >>>> inherent nature of NFS. Clients in NFS are transient and cannot be >>>> trusted to do anything correctly. Therefore, the metadata >>>server should >>>> find its own way to write data to the data servers without >>relying on >>>> clients. If proxying through a client is optional, it still seems >>>> orthogonal to the behavior of existing installations and the >>>spirit of >>>> NFS. Maybe there is a valid use case someone could describe? >>> >>>I'm now totally confused. Before we talk about use cases for >>"proxying >>>through a client", I'd like to understand what it is. 
>>> >>>My understanding is that when this discussion started, a >>>number of people >>>were referring to a client writing data by sending a write to >>the meta-data server (aka the NFS server) as "proxying", because, if >>>your view is >>>that the proper/best/ideal way of doing data transfer operations is to >>>obtain mapping information and then do a write to the data >>server (i.e. >>>other NFS server or object data server or SAN-connected disk), then >>>the direct NFS write can be seen as the meta-data server acting as the >>>client's proxy. Is my understanding correct? >>> >>>No matter how you come down on the question of the >>desirability of that, >>>I don't think there is any way to argue that doing a write by sending an >>>NFS write request to an NFS server is against the inherent nature of >>>NFS. Nor does it ask the client to do anything correctly that it hasn't >>>been doing all along. >>> >>>At some point the phrase "proxying through the client" was used and I >>>realize I don't know what is meant by it. It doesn't seem to match >>>the "proxying" that was being discussed originally. How would the >>>client be a proxy for (presumably) the server? What am I missing? >>> >>>-----Original Message----- >>>From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu] >>>Sent: Monday, January 19, 2004 5:13 PM >>>To: pNFS Requirements >>>Cc: pNFS Operations >>>Subject: Re: [pnfs-reqs] Re: [pnfs-ops] pNFS Discussion Summary 1: >>>12/18/03: subtopic: proxying >>> >>> >>>> > [1.1.2 Functional proxying]: a file transformation >>>achievable by an >>>> > NFS-v4.x client using a set of data server operations must be >>>> > equivalently achievable using a (probably different) set >>>of NFS-v4.x >>>> > server operations >>>> > >>>> > This is the topic I intended to address in the last email.
>>> I believe >>>> > Dave is arguing that even with metadata servers that do >>>not have access >>>> > to their data servers, the vendor of such a metadata server can >>>> > construct a proprietary protocol for the metadata server >>>to (strict) >>>> > proxy data server accesses through clients that do have >>>data server >>>> > access. I am not comfortable making up a counter to this, >>>so I exhort >>>> > those that want a metadata server without data server >>>access to speak >>>> > up if they disagree. >>>> > >>>> > > On one hand, some suggest that a set of out-of-band >>>clients should not >>>> >>>> > > have to also have a data path through the NFSv4 metadata >>>server. One >>>> > > reason is that customers may not tolerate the large >>>variability in >>>> > > performance between out-of-band (when the going is good) >>>and in-band >>>> > > (when the server chooses not to grant or to take away a >>>delegation) >>>> > > accesses. Another reason, and I paraphrase someone else >>>here, is that >>>> >>>> > > it is possible to construct out-of-band metadata servers >>>that do not >>>> > > have access to the data servers except through the clients -- I >>>> > > encourage the source of this scenario to replace my >>>paraphrasing with >>>> > > a correct use case, because I find it odd to design for >>>file servers >>>> > > that do not have access to the data servers. >>>> > > >>>> > > On the other hand, others have suggested that any access >>>or work that >>>> > > a client can do out-of-band should be possible with one or more >>>> > > commands applied to the metadata server's data path. >>>This has been >>>> > > proposed for coping with recalled delegations, including >>>concurrent >>>> > > writing by multiple clients; retry after client access errors, >>>> > > provided adequate idempotency of out-of-band operations; >>>and many >>>> > > alternative implementations of out-of-band clients, >>>including legacy >>>> > > clients that use out-of-band never or rarely. 
>>>> > > >>>> > > I think this is a topic that should be argued one way or >>>the other in >>>> > > the requirements document. Use cases and examples in >>>other systems >>>> > > would be best. >>>> > >>>> >>>> I guess that proxying through a client should be recommended but not >>>> mandated. >>>> We might then want to find how to do it while respecting >>restrictions >>>> removed the metadata server from the path. >>> >>>I think relying on clients to do anything correctly is against the >>>inherent nature of NFS. Clients in NFS are transient and cannot be >>>trusted to do anything correctly. Therefore, the metadata >>>server should >>>find its own way to write data to the data servers without relying on >>>clients. If proxying through a client is optional, it still seems >>>orthogonal to the behavior of existing installations and the spirit of >>>NFS. Maybe there is a valid use case someone could describe? >>> >>>Dean >>> > >Yahoo! Groups Links > >To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > >To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > >Your use of Yahoo!
Groups is subject to: > http://docs.yahoo.com/info/terms/ From ggrider@lanl.gov Thu Jan 22 07:41:17 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 43525 invoked from network); 22 Jan 2004 15:41:10 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 22 Jan 2004 15:41:10 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta4.grp.scd.yahoo.com with SMTP; 22 Jan 2004 15:41:12 -0000 Received: from mailrelay1.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0MFfBdj019460 for ; Thu, 22 Jan 2004 08:41:12 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay1.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0MFfB82003006 for ; Thu, 22 Jan 2004 08:41:11 -0700 Received: from cthulu.lanl.gov (cthulu.lanl.gov [128.165.115.129]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0MFfAYi001296; Thu, 22 Jan 2004 08:41:10 -0700 Message-Id: <5.2.0.9.2.20040122084042.015476c8@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 22 Jan 2004 08:41:36 -0700 To: pnfs-reqs@yahoogroups.com, pnfs-reqs@yahoogroups.com In-Reply-To: Mime-Version: 1.0 Content-Type: multipart/related; type="multipart/alternative"; boundary="=====================_349192==.REL" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] uploaded a draft of a pNFS problem statement X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs I get This page is currently unavailable Unfortunately, we are unable to process your request at this time. We apologize for the inconvenience. Please try again later. 
when I tried http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc Thanks Gary At 04:38 AM 1/22/2004 -0800, Garth Gibson wrote: > Based on the feedback in the last two concalls, I have taken a shot at > a pNFS problem statement. The notion of a bottleneck, driven by the > dramatic increase in bandwidth demand coming from clusters, and the > desire to continue to allow filesystems and namespaces to be big and > not specialized to data distribution, are central. I didn't do any > work in the application space sections -- and I have not put in any > citations yet -- sorry. Tom was right -- even with this thin version > it comes out at 8 pages. > > I am happy to take comments and produce revisions, or to turn over the > document to anyone who wants to make a pass through it. > > Talk to you in a couple of hours at the 11-12 EST concall. > > garth
From garth@panasas.com Thu Jan 22 07:51:59 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 37100 invoked from network); 22 Jan 2004 15:51:56 -0000 Received: from unknown (66.218.66.217) by m1.grp.scd.yahoo.com with QMQP; 22 Jan 2004 15:51:56 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 22 Jan 2004 15:51:55 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYLF8S; Thu, 22 Jan 2004 10:51:51 -0500 Mime-Version: 1.0 (Apple Message framework v609) In-Reply-To: <1074774565.645.17240.w2@yahoogroups.com> References: <1074774565.645.17240.w2@yahoogroups.com> Content-Type: multipart/mixed; boundary=Apple-Mail-9--157255413 Message-Id: Date: Thu, 22 Jan 2004 07:51:48 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: New file uploaded to pnfs-reqs X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Clicking on the URL in this message worked for me. Clicking on the file in the web browser view of the file list did not. So I'll send the file directly :-) On Jan 22, 2004, at 4:29 AM, Yahoo! Groups Notification wrote: > > Hello, > > This email message is a notification to let you know that > a file has been uploaded to the Files area of your pnfs-reqs > group. > > File : /draft-ietf-pNFS-problem-statement-v2.doc > Uploaded by : garth_a_gibson > Description : v0.2 of pNFS problem statement > > You can access the file at the URL > > http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc > > Your group is currently configured to send you email > notification whenever a member uploads a file. To turn off > notification, visit > > http://groups.yahoo.com/group/pnfs-reqs/join > > Thank you for choosing Yahoo!
Groups as your email group > service for the pnfs-reqs group. > > Regards, > > Yahoo! Groups Customer Care > > Your use of Yahoo! Groups is subject to > http://docs.yahoo.com/info/terms/ Attachment (not stored) draft-ietf-pNFS-problem-statement-v2.doc Type: application/applefile Attachment (not stored) draft-ietf-pNFS-problem-statement-v2.doc Type: application/msword From garth@panasas.com Thu Jan 22 08:31:29 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 64677 invoked from network); 22 Jan 2004 16:31:28 -0000 Received: from unknown (66.218.66.167) by m5.grp.scd.yahoo.com with QMQP; 22 Jan 2004 16:31:28 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 22 Jan 2004 16:31:28 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYLGFS; Thu, 22 Jan 2004 11:31:27 -0500 Mime-Version: 1.0 (Apple Message framework v609) Content-Transfer-Encoding: 7bit Message-Id: <6B29A655-4CF8-11D8-B71A-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Thu, 22 Jan 2004 08:31:24 -0800 X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: problem statement feedback concall: Mon Jan 26 8am PST, 11am EST X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Peter Corbett is making a pass through the problem statement tomorrow and we are meeting again for further discussions Monday Jan 26 8am PST, 11am EST at the same conference call dialin numbers we have been using. I believe this time may be inconvenient for some and would be willing to schedule other concalls next week, as needed.
garth From craigev@us.ibm.com Thu Jan 22 09:48:03 2004 Return-Path: X-Sender: craigev@us.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 10650 invoked from network); 22 Jan 2004 17:48:00 -0000 Received: from unknown (66.218.66.216) by m11.grp.scd.yahoo.com with QMQP; 22 Jan 2004 17:48:00 -0000 Received: from unknown (HELO e34.co.us.ibm.com) (32.97.110.132) by mta1.grp.scd.yahoo.com with SMTP; 22 Jan 2004 17:48:00 -0000 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e34.co.us.ibm.com (8.12.10/8.12.2) with ESMTP id i0MHlc6t361972; Thu, 22 Jan 2004 12:47:48 -0500 Received: from d03nm130.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.193.82]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i0MHlREO092548; Thu, 22 Jan 2004 10:47:27 -0700 To: pnfs-sbc@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com, pnfs-sbc@yahoogroups.com X-Mailer: Lotus Notes Release 5.0.11 July 24, 2002 Message-ID: Date: Thu, 22 Jan 2004 12:47:23 -0500 X-MIMETrack: Serialize by Router on D03NM130/03/M/IBM(Release 6.0.2CF2|July 23, 2003) at 01/22/2004 10:47:26 MIME-Version: 1.0 Content-type: multipart/related; Boundary="0__=0ABBE4B0DFF2ABD28f9e8a93df938690918c0ABBE4B0DFF2ABD2" X-eGroups-Remote-IP: 32.97.110.132 From: Craig Everhart Subject: Re: [pnfs-sbc] Two Functionality issues X-Yahoo-Group-Post: member; u=67958684 [Catching up slowly, slowly.] Issue 4.1. Yes, I presented separate read and write mappings at CITI to allow clients to participate in a style of copy-on-write processing. (The concept is straight out of the Tank protocol spec.) There are some simple compression techniques that could be used, since for most virtual offsets, at most one block address is defined. It's only while a block is in the middle of an (uncommitted) copy-on-write operation that both would be defined. 
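[Editorial sketch, not part of Craig's message.] The separate read and write mappings described above can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions: the names `BlockMapping`, `encode`, and `decode`, the tag strings, and the tuple layout are all hypothetical illustrations and are not taken from the Tank protocol spec or any pNFS draft. It only shows the compression idea that one address suffices for most offsets, with two addresses carried only while a copy-on-write is uncommitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BlockMapping:
    """Hypothetical per-virtual-offset mapping (illustration only)."""
    read_addr: Optional[int] = None   # block address a client may read from
    write_addr: Optional[int] = None  # block address a client may write to

    def in_copy_on_write(self) -> bool:
        # Both addresses are defined only while an uncommitted
        # copy-on-write operation is in progress for this block.
        return self.read_addr is not None and self.write_addr is not None


def encode(m: BlockMapping) -> Tuple:
    """Compact encoding: carry two addresses only in the copy-on-write case."""
    if m.in_copy_on_write():
        return ("RW", m.read_addr, m.write_addr)
    if m.read_addr is not None:
        return ("R", m.read_addr)
    if m.write_addr is not None:
        return ("W", m.write_addr)
    return ("NONE",)


def decode(t: Tuple) -> BlockMapping:
    """Inverse of encode()."""
    tag = t[0]
    if tag == "RW":
        return BlockMapping(read_addr=t[1], write_addr=t[2])
    if tag == "R":
        return BlockMapping(read_addr=t[1])
    if tag == "W":
        return BlockMapping(write_addr=t[1])
    return BlockMapping()
```

Under these assumptions, the common case costs one tag plus one address per virtual offset, and only blocks in mid-copy pay for the second address.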
But I feel that the ability to make the distinction between read and write mappings, as well as the ability sometimes to offer both a read and a write mapping for a block, offers important functionality. Issue 4.2: I agree with Dave Noveck that the functionality is likely useful more broadly than in SBC-mode out-of-band access. Craig Craig Everhart +1 919 543 2169 (tie 441 2169) black_david@emc.com 01/02/2004 11:45 AM Please respond to pnfs-sbc To: pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com, pnfs-sbc@yahoogroups.com cc: Subject: [pnfs-sbc] Two Functionality issues In starting to look at design issues for block metadata, I've run across a couple of issues around functionality to be supported that could use wider discussion. This is based on an initial review of the EMC High Road FMP protocol and the IBM StorageTank SAN.FS protocol. I've tried to just describe the issues here without taking a position. [4] Functionality SAN.FS extents come with both read and write extent mappings and block usage bitmaps. The separate read and write mappings allow for clients to participate in copy-on-write functionality - IIRC, Craig has described this. Issue [4.1]: Should the protocol include support for client participation in copy-on-write? A motivation for the separate arrays of "block usage bits" appears to be allowing clients to turn file data into holes (e.g., AIX fclear system call). Issue [4.2]: Is the ability to turn valid data into a file "hole" (e.g., AIX fclear) at the client important to support? FMP does not support separate read mappings or usage bitmaps, and hence is not capable of involving clients in copy-on-write or allowing a client to turn valid data into a file "hole". Comments? Thanks, --David ---------------------------------------------------- David L.
Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- Attachment (not stored) pic25436.gif Type: image/gif From pcorbett@netapp.com Sun Jan 25 15:08:58 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 61063 invoked from network); 25 Jan 2004 23:08:58 -0000 Received: from unknown (66.218.66.218) by m4.grp.scd.yahoo.com with QMQP; 25 Jan 2004 23:08:58 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 25 Jan 2004 23:08:58 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0PN8vKw022469 for ; Sun, 25 Jan 2004 15:08:57 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0PN8vDi021318 for ; Sun, 25 Jan 2004 15:08:57 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C3E398.33E2214A" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Sun, 25 Jan 2004 15:08:54 -0800 Message-ID: X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: New file uploaded to pnfs-reqs Thread-Index: AcPg/94CRT+QaWyhRKGEv16f3S+5fACl+54Q To:
X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: RE: [pnfs-reqs] Re: New file uploaded to pnfs-reqs X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004 Here is my set of revisions. I did not have quite as much time to work on this as I had hoped to, and it still needs quite a bit of work. Please critique it aggressively. I'm not sure I'll be able to make the call tomorrow, but I'll try to dial in for at least the first part of it. Thanks, Peter -----Original Message----- From: Garth Gibson [mailto:garth@panasas.com] Sent: Thursday, January 22, 2004 10:52 AM To: pnfs-reqs@yahoogroups.com Subject: [pnfs-reqs] Re: New file uploaded to pnfs-reqs Clicking on the URL in this message worked for me. Clicking on the file in the web browser view of the file list did not. So I'll send the file directly :-) On Jan 22, 2004, at 4:29 AM, Yahoo! Groups Notification wrote: > > Hello, > > This email message is a notification to let you know that > a file has been uploaded to the Files area of your pnfs-reqs group. > > File : /draft-ietf-pNFS-problem-statement-v2.doc > Uploaded by : garth_a_gibson > Description : v0.2 of pNFS problem statement > > You can access the file at the URL > > http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc > > Your group is currently configured to send you email notification > whenever a member uploads a file. To turn off notification, visit > > http://groups.yahoo.com/group/pnfs-reqs/join > > Thank you for choosing Yahoo! Groups as your email group service for > the pnfs-reqs group. > > Regards, > > Yahoo! Groups Customer Care > > Your use of Yahoo! Groups is subject to > http://docs.yahoo.com/info/terms/ Yahoo! Groups Links To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/ To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com Your use of Yahoo!
Groups is subject to: http://docs.yahoo.com/info/terms/ Attachment (not stored) draft-ietf-pNFS-problem-statement-v3.doc Type: application/msword From Thomas.Talpey@netapp.com Mon Jan 26 04:12:43 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 73709 invoked from network); 26 Jan 2004 12:12:43 -0000 Received: from unknown (66.218.66.218) by m12.grp.scd.yahoo.com with QMQP; 26 Jan 2004 12:12:43 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 26 Jan 2004 12:12:43 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0QCCgKw009103 for ; Mon, 26 Jan 2004 04:12:42 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0QCCfDi011215 for ; Mon, 26 Jan 2004 04:12:41 -0800 (PST) Received: from tmt.netapp.com ([10.97.1.33]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 26 Jan 2004 07:12:39 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3E405.B0E0D580" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Mon, 26 Jan 2004 04:12:20 -0800 Message-ID: <5.2.1.1.2.20040126070910.00bf4328@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: New file uploaded to pnfs-reqs Thread-Index: AcPkBbF5FIUKYsaPTPuUjgeSaM2FaQ== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: RE: [pnfs-reqs] Re: New file uploaded to pnfs-reqs X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu I did notice something important in the filename - it cannot be called "draft-ietf-"something at this point. Only official workgroup documents may be titled that way. 
The norm for an individual (or group) submission is to name it with the principal author, subject and revision, such as draft-someone-pnfs-problem-statement-00.txt. The document submission would be rejected otherwise. Tom. At 06:08 PM 1/25/2004, Corbett, Peter wrote: >Here is my set of revisions. I did not have quite as much time to work >on this as I had hoped to, and it still needs quite a bit of work. >Please critique it aggressively. I'm not sure I'll be able to make the >call tomorrow, but I'll try to dial in for at least the first part of >it. >Thanks, >Peter > > >-----Original Message----- >From: Garth Gibson [mailto:garth@panasas.com] >Sent: Thursday, January 22, 2004 10:52 AM >To: pnfs-reqs@yahoogroups.com >Subject: [pnfs-reqs] Re: New file uploaded to pnfs-reqs > > >Clicking on the URL in this message worked for me. Clicking on the >file in the web browser view of the file list did not. > >So I'll send the file directly :-) > >On Jan 22, 2004, at 4:29 AM, Yahoo! Groups Notification wrote: > >> >> Hello, >> >> This email message is a notification to let you know that >> a file has been uploaded to the Files area of your pnfs-reqs group. >> >> File : /draft-ietf-pNFS-problem-statement-v2.doc >> Uploaded by : garth_a_gibson >> Description : v0.2 of pNFS problem statement >> >> You can access the file at the URL >> >> http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc >> >> Your group is currently configured to send you email notification >> whenever a member uploads a file. To turn off notification, visit >> >> http://groups.yahoo.com/group/pnfs-reqs/join >> >> Thank you for choosing Yahoo! Groups as your email group service for >> the pnfs-reqs group. >> >> Regards, >> >> Yahoo! Groups Customer Care >> >> Your use of Yahoo! Groups is subject to >> http://docs.yahoo.com/info/terms/ > > > > >Yahoo!
Groups Links > >To visit your group on the web, go to: >http://groups.yahoo.com/group/pnfs-reqs/ > >To unsubscribe from this group, send an email to: >pnfs-reqs-unsubscribe@yahoogroups.com > >Your use of Yahoo! Groups is subject to: >http://docs.yahoo.com/info/terms/ > > > > >Yahoo! Groups Links > >To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > >To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > >Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > > From dnoveck@netapp.com Mon Jan 26 12:24:49 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 53563 invoked from network); 26 Jan 2004 20:24:49 -0000 Received: from unknown (66.218.66.172) by m8.grp.scd.yahoo.com with QMQP; 26 Jan 2004 20:24:49 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 26 Jan 2004 20:24:49 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0QKOmKw019872 for ; Mon, 26 Jan 2004 12:24:48 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0QKOmRh008134 for ; Mon, 26 Jan 2004 12:24:48 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Mon, 26 Jan 2004 12:24:42 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Re: New file uploaded to pnfs-reqs Thread-Index: AcPg/94CRT+QaWyhRKGEv16f3S+5fACl+54QACeO4/A= To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] Re: New file uploaded to pnfs-reqs X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: 
davidnoveck

As we discussed at the call today, here is my suggestion for text to replace the last two paragraphs of the introduction in the current document. I basically added a bridge paragraph to introduce the concept of separating data and control, swapped the order of the two existing paragraphs, and made some minor adjustments in wording.

One way of increasing the bandwidth provided for access through a single file system is to enable access to be provided, in a coherent fashion, through multiple endpoints. Separation of control and data flows provides a straightforward framework to accomplish this, by allowing data transfer to proceed in parallel from many clients to many data storage endpoints. Control and file management operations, inherently difficult to parallelize, remain the province of the single NFS server, while the offloading of data transfer operations serves to provide the requisite bandwidth scalability. Data transfer may proceed using NFS or other protocols suitable for the purpose such as iSCSI.

Today the file system marketplace offers a number of proprietary alternatives to NFS servers that provide separated control and data flow. Examples include EMC High Road and IBM TotalStorage SAN FS. The lack of interoperability between these proprietary approaches hinders their adoption. An approach that solves the bandwidth problem using NFS is most desirable. By standardizing the key architectural features of separated control and data flows, a range of competitive and interoperable implementations can be provided. Moreover, the industry's large investment in NFS would be protected. Without such an NFS-based solution to the bandwidth bottleneck, other file access approaches will compete with NFS (and probably with each other), causing a range of interoperability difficulties and compromising the benefits provided by a standard file access protocol.
An approach that separates control and data flow and provides for data access through other protocols has additional benefits. Even though NFS is widely used as a network file system protocol, most of the world's data resides in data stores that are not accessible through NFS. Much of this data is stored in Storage Area Networks, accessible by Fibre Channel Protocol, or increasingly, by iSCSI. Storage Area Networks do not have the simple management capability that comes from a file system, which associates data with named objects in a hierarchical namespace. Such capabilities can be provided by NFS, while leveraging the existing SAN data access infrastructure, all within a common architectural framework.

-----Original Message-----
From: Corbett, Peter
Sent: Sunday, January 25, 2004 6:09 PM
To: pnfs-reqs@yahoogroups.com
Subject: RE: [pnfs-reqs] Re: New file uploaded to pnfs-reqs

Here is my set of revisions. I did not have quite as much time to work on this as I had hoped to, and it still needs quite a bit of work. Please critique it aggressively. I'm not sure I'll be able to make the call tomorrow, but I'll try to dial in for at least the first part of it.

Thanks,
Peter

-----Original Message-----
From: Garth Gibson [mailto:garth@panasas.com]
Sent: Thursday, January 22, 2004 10:52 AM
To: pnfs-reqs@yahoogroups.com
Subject: [pnfs-reqs] Re: New file uploaded to pnfs-reqs

Clicking on the URL in this message worked for me. Clicking on the file in the web browser view of the file list did not.

So I'll send the file directly :-)

On Jan 22, 2004, at 4:29 AM, Yahoo! Groups Notification wrote:
>
> Hello,
>
> This email message is a notification to let you know that
> a file has been uploaded to the Files area of your pnfs-reqs group.
>
> File : /draft-ietf-pNFS-problem-statement-v2.doc
> Uploaded by : garth_a_gibson
> Description : v0.2 of pNFS problem statement
>
> You can access the file at the URL
>
> http://groups.yahoo.com/group/pnfs-reqs/files/draft-ietf-pNFS-problem-statement-v2.doc

Yahoo! Groups Links
To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/
To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com
Your use of Yahoo!
Groups is subject to: http://docs.yahoo.com/info/terms/

From garth@panasas.com Mon Jan 26 16:56:01 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 69141 invoked from network); 27 Jan 2004 00:56:00 -0000 Received: from unknown (66.218.66.166) by m1.grp.scd.yahoo.com with QMQP; 27 Jan 2004 00:56:00 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 27 Jan 2004 00:56:00 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYLZL4; Mon, 26 Jan 2004 19:55:58 -0500 Mime-Version: 1.0 (Apple Message framework v609) Content-Transfer-Encoding: quoted-printable Message-Id: <8EE09C50-5063-11D8-A540-000A95A94F04@panasas.com> Content-Type: text/plain; charset=WINDOWS-1252; format=flowed To: pnfs-reqs@yahoogroups.com Date: Mon, 26 Jan 2004 16:55:53 -0800 X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: here are some citations that may work for the problem statement X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Requirements for Bandwidth ========

SGS File System RFP, DOE NNCA and DOD NSA, April 25, 2001.

Knott, T., "Computing colossus," BP Frontiers magazine, Issue 6, April 2003, http://www.bp.com/frontiers.

striping over file servers =========

John H. Hartman and John K. Ousterhout, "The Zebra Striped Network File System," ACM Transactions on Computer Systems 13, 3, August 1995, 279-310.

CMU NASD ===================

Gibson, G. A., et al., A Cost-Effective, High-Bandwidth Storage Architecture, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1998.

Amiri, K., Gibson, G.A., Golding, R., "Highly Concurrent Shared Storage," Int. Conf. on Distributed Computing Systems (ICDCS00), April 2000.

Panasas ======================

Garth A. Gibson, Brent B. Welch, David F. Nagle, Bruce C. Moxon, "Object Storage: Scalable Bandwidth for HPC Clusters," Proc. of the ClusterWorld Conference and Expo, June 23-26, 2003, San Jose, CA, www.clusterworld.com.

IBM Objects ===================

Azagury, A., Dreizin, V., Factor, M., Henis, E., Naor, D., Rinetzky, N., Satran, J., Tavory, A., Yerushalmi, L., Towards an Object Store, IBM Storage Systems Technology Workshop, November 2002.

Rodeh, O., Schonfeld, U., Teperman, A., zFS - A Scalable distributed File System using Object Disks, IBM Storage Systems Technology Workshop, November 2002.

Miller, E. L., Freeman, W. E., Long, D. E., Reed, B. C., "Strong Security for Network Attached Storage," USENIX Conference on File and Storage Technologies (FAST), 2002.

Other object-like solutions =========

Lee, E., Thekkath, C., Petal: Distributed virtual disks, ACM 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1996.

Lustre: A Scalable, High Performance File System, Cluster File System, Inc., 2003, http://www.lustre.org/docs.html.

Other products =================

SAN FS
Sanergy
GPFS
High Road
Sistina
CXFS
QFS -- Harriet Coverston, "Enabling Advanced Data Management with Sun StorEdge(TM) QFS/SAM-FS 4.0", Twentieth IEEE / Eleventh NASA Goddard Conference on Mass Storage Systems and Technologies, April 2003.

DAFS ======

MSST coming up

File System Workload Analysis For Large Scientific Computing Applications, Feng Wang, Qin Xin, Bo Hong, Ethan L. Miller, Darrell D. E. Long, Scott A. Brandt, University of California, Santa Cruz, Tyce T.
McLarty, Lawrence Livermore National Laboratory From pcorbett@netapp.com Tue Jan 27 06:24:33 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 62706 invoked from network); 27 Jan 2004 14:16:28 -0000 Received: from unknown (66.218.66.172) by m11.grp.scd.yahoo.com with QMQP; 27 Jan 2004 14:16:28 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 27 Jan 2004 14:16:28 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0REG3Kw014476 for ; Tue, 27 Jan 2004 06:16:03 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0REG3Rh021660 for ; Tue, 27 Jan 2004 06:16:03 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C3E4E0.160A787C" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Tue, 27 Jan 2004 06:15:59 -0800 Message-ID: X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: new version of problem statement Thread-Index: AcPk4BMGlQX8HwJSRrO/v8ehVEFedQ== To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: new version of problem statement X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004 Here is a new version of the problem statement. There are still some gaps, especially in the application section. I incorporated the paragraph Dave wrote in the introduction. I made a large number of local changes, and a few broader changes, moving a few paragraphs around, and deleting some repetitive content. I think it is getting better. It is still making the same point over and over again. And it is repeating the same point, making it several times. Garth, I didn't add your references. Can you do that? 
Also, I couldn't track down the spelling of Benny's last name for the Ack section. Garth, you will also need to add your address info. I am going to pass the token now, as I don't think I'll have any more time to work on this before I leave on vacation Friday. Please forward comments to the group. Thanks, Peter Attachment (not stored) draft-ietf-pNFS-problem-statement-v3.doc Type: application/msword From dnoveck@netapp.com Tue Jan 27 08:01:01 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 19614 invoked from network); 27 Jan 2004 16:00:57 -0000 Received: from unknown (66.218.66.216) by m15.grp.scd.yahoo.com with QMQP; 27 Jan 2004 16:00:57 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta1.grp.scd.yahoo.com with SMTP; 27 Jan 2004 16:00:57 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0RFjlKw000958 for ; Tue, 27 Jan 2004 07:45:47 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0RFjkDi001953 for ; Tue, 27 Jan 2004 07:45:46 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3E4EC.9F2B86AF" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Tue, 27 Jan 2004 07:45:43 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: new version of problem statement Thread-Index: AcPk4BMGlQX8HwJSRrO/v8ehVEFedQAC3a3g To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck I just noticed that although Peter incorporated my paragraphs in the Introduction, the two paragraphs that they were intended to replace are still
there as well.

Garth, when you make the next pass, could you delete the fourth and fifth paragraphs of the introduction?

One other issue is that I still think that the last and penultimate paragraphs of the introduction are better swapped. What do other people think about this?

-----Original Message-----
From: Corbett, Peter
Sent: Tuesday, January 27, 2004 9:16 AM
To: pnfs-reqs@yahoogroups.com
Subject: [pnfs-reqs] new version of problem statement

Here is a new version of the problem statement. There are still some gaps, especially in the application section. I incorporated the paragraph Dave wrote in the introduction. I made a large number of local changes, and a few broader changes, moving a few paragraphs around, and deleting some repetitive content. I think it is getting better. It is still making the same point over and over again. And it is repeating the same point, making it several times.

Garth, I didn't add your references. Can you do that? Also, I couldn't track down the spelling of Benny's last name for the Ack section. Garth, you will also need to add your address info.

I am going to pass the token now, as I don't think I'll have any more time to work on this before I leave on vacation Friday. Please forward comments to the group.

Thanks,
Peter
From bhalevy@panasas.com Tue Jan 27 14:18:06 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 35112 invoked from network); 27 Jan 2004 17:50:20 -0000 Received: from unknown (66.218.66.218) by m4.grp.scd.yahoo.com with QMQP; 27 Jan 2004 17:50:20 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 27 Jan 2004 17:50:19 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Tue, 27 Jan 2004 12:50:05 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D387D5@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" Date: Tue, 27 Jan 2004 12:50:05 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3E4FD.FEDE34A0" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy What you suggest makes sense. I'd move the last sentence of the "proprietary systems" paragraph to the grand finale since it is pretty much repeated in the other paragraph. That will make these two paragraphs look like this: Today the file system marketplace offers a number of proprietary systems that provide separated control and data flow. Examples include EMC High Road and IBM TotalStorage SAN FS. The lack of interoperability between these proprietary systems hinders their adoption. An approach that solves the bandwidth problem using NFS is desirable. By standardizing the key architectural features of separated control and data flows, a range of competitive and interoperable implementations can be provided. Such an approach has additional benefits. Even though NFS is widely used as a network file system protocol, most of the world's data resides in data stores that are not accessible through NFS. 
Much of this data is stored in Storage Area Networks, accessible by Fibre Channel Protocol, or increasingly, by iSCSI. Storage Area Networks do not have the simple management capability that comes from a file system, which associates data with named objects in a hierarchical namespace. Such capabilities can be provided by NFS, while leveraging the existing SAN data access infrastructure, all within a common architectural framework protecting the industry's large investment both in NFS and in SAN storage infrastructure. -----Original Message----- From: Noveck, Dave [mailto:dnoveck@netapp.com] Sent: Tuesday, January 27, 2004 10:46 AM To: pnfs-reqs@yahoogroups.com Subject: RE: [pnfs-reqs] new version of problem statement I just noticed that although Peter incorporated my paragraphs in the Introduction, the two paragraphs that they were intended to replace are still there as well. Garth, when you make the next pass, could you delete the fourth and fifth paragraphs of the introduction. One other issue is that I still think that the last and penultimate paragraphs of the introduction are better swapped. What do other people think about this? -----Original Message----- From: Corbett, Peter Sent: Tuesday, January 27, 2004 9:16 AM To: pnfs-reqs@yahoogroups.com Subject: [pnfs-reqs] new version of problem statement Here is a new version of the problem statement. There are still some gaps, especially in the application section. I incorporated the paragraph Dave wrote in the introduction. I made a large number of local changes, and a few broader changes, moving a few paragraphs around, and deleting some repetitive content. I think it is getting better. It is still making the same point over and over again. And it is repeating the same point, making it several times. Garth, I didn't add your references. Can you do that? Also, I couldn't track down the spelling of Benny's last name for the Ack section. Garth, you will also need to add your address info. 
I am going to pass the token now, as I don't think I'll have any more time to work on this before I leave on vacation Friday. Please forward comments to the group. Thanks, Peter From ggrider@lanl.gov Tue Jan 27 17:23:09 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 71483 invoked from network); 28 Jan 2004 01:21:50 -0000 Received: from unknown (66.218.66.217) by m9.grp.scd.yahoo.com with QMQP; 28 Jan 2004 01:21:50 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta2.grp.scd.yahoo.com with SMTP; 28 Jan 2004 01:21:53 -0000 Received: from mailrelay2.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0S1K6hE030602 for ; Tue, 27 Jan 2004 18:20:06 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay2.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0S1K5sI012576; Tue, 27 Jan 2004 18:20:06 -0700 Received: from cthulu.lanl.gov (vpn-client-131.lanl.gov [128.165.253.131]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i0S1JtYi024530; Tue, 27 Jan 2004 18:19:59 -0700 Message-Id: <5.2.0.9.2.20040127181650.01609868@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Tue, 27 Jan 2004 18:19:52 -0700 To: pnfs-reqs@yahoogroups.com, "'pnfs-reqs@yahoogroups.com'" Cc: garth Gibson In-Reply-To:
<30489F1321F5C343ACF6872B2CF7942A05D387D5@PIKES.panasas.com > Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_16215676==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: RE: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs

I worked on the cluster applications section a bit. Here is where I am:

------------------------------------------------------------------------

Clustered Applications

A large number of clustered applications in many industry verticals require bandwidth scaling of a single file system beyond what is possible with a single NFS server/network endpoint. Industries with these applications include: those that use high-performance computation in science and engineering, such as universities, government laboratories, automotive, and aerospace; those that do large-scale data analysis, such as seismic, genomics, government intelligence, and business intelligence; and those that create and use data for viewing and interaction, such as rendering, video production, video distribution, gaming, web serving, and archiving. Many different I/O models are used in these industries, but all require bandwidths that extend to tens of gigabytes/sec, sometimes to/from a single file, sometimes to multiple files in the same directory, and sometimes from multiple files in different directories. With clustered computing becoming the prevalent way to address these applications' needs, it will always be relatively easy for bandwidth needs to scale well beyond a single NFS server/network endpoint. The problem this proposal addresses will therefore not go away with time; in fact, as scaling clusters to larger processor counts gets easier, the problem will get worse.
In addition to the above data-intensive, cluster-oriented applications, there has been increasing use of NFS file servers as the storage subsystem for databases. The traditional alternative has been to store databases in raw storage partitions, either on locally attached disks, or more commonly, in a Fibre Channel attached Storage Area Network. An advantage of the file-based approach is that it allows easier management of the data, especially in environments where there are very large numbers of database tables. However, the bandwidth achievable by the database servers to the file server is limited. In a SAN-based environment, individual tables stored on individual devices can become hotspots. Tables can be distributed across a number of SAN devices connected to a number of Fibre Channels, to increase bandwidth. However, this introduces a significant degree of complexity in determining the best layout.

This proposal addresses the issue of limited bandwidth from an NFS server by parallelizing data access to a single file system across a number of data servers. This allows increased bandwidth, comparable to that achieved from SAN storage. At the same time, it provides the benefits that accrue from using file-based storage. The parallelization of the file data can be done in such a way that the bandwidth achievable is robust across a wide variety of workloads. This can be accomplished without a large administrative burden.

There is no shortage of applications that stretch standard single-endpoint NFS file servers. NFS is well poised to provide a standards-based solution for these applications, one that users and sites can deploy confidently.

------------------------------------------------------------------------

Hope this helps a bit. If I missed an area you would like some words on, please let me know; I could probably get to it tonight, I hope.
Thanks Gary At 12:50 PM 1/27/2004 -0500, Halevy, Benny wrote: > What you suggest makes sense. I'd move the last sentence of the "proprietary systems" > paragraph to the grand finale since it is pretty much repeated in the other paragraph. > > That will make these two paragraphs look like this: > > > Today the file system marketplace offers a number of proprietary systems that provide separated control and data flow. Examples include EMC High Road and IBM TotalStorage SAN FS. The lack of interoperability between these proprietary systems hinders their adoption. An approach that solves the bandwidth problem using NFS is desirable. By standardizing the key architectural features of separated control and data flows, a range of competitive and interoperable implementations can be provided. > > > > Such an approach has additional benefits. Even though NFS is widely used as a network file system protocol, most of the world's data resides in data stores that are not accessible through NFS. Much of this data is stored in Storage Area Networks, accessible by Fibre Channel Protocol, or increasingly, by iSCSI. Storage Area Networks do not have the simple management capability that comes from a file system, which associates data with named objects in a hierarchical namespace. Such capabilities can be provided by NFS, while leveraging the existing SAN data access infrastructure, all within a common architectural framework protecting the industry's large investment both in NFS and in SAN storage infrastructure. > > -----Original Message----- > From: Noveck, Dave [mailto:dnoveck@netapp.com] > Sent: Tuesday, January 27, 2004 10:46 AM > To: pnfs-reqs@yahoogroups.com > Subject: RE: [pnfs-reqs] new version of problem statement > > I just noticed that although Peter incorporated my paragraphs in the Introduction, the two paragraphs that they were intended to replace are still there as well. 
>
> Garth, when you make the next pass, could you delete the fourth and fifth paragraphs of the introduction?
>
> One other issue is that I still think that the last and penultimate paragraphs of the introduction are better swapped. What do other people think about this?
> -----Original Message-----
> From: Corbett, Peter
> Sent: Tuesday, January 27, 2004 9:16 AM
> To: pnfs-reqs@yahoogroups.com
> Subject: [pnfs-reqs] new version of problem statement
>
> Here is a new version of the problem statement. There are still some gaps, especially in the application section. I incorporated the paragraph Dave wrote in the introduction. I made a large number of local changes, and a few broader changes, moving a few paragraphs around, and deleting some repetitive content. I think it is getting better. It is still making the same point over and over again. And it is repeating the same point, making it several times.
>
> Garth, I didn't add your references. Can you do that? Also, I couldn't track down the spelling of Benny's last name for the Ack section. Garth, you will also need to add your address info.
>
> I am going to pass the token now, as I don't think I'll have any more time to work on this before I leave on vacation Friday. Please forward comments to the group.
>
> Thanks,
> Peter

From garth@panasas.com Thu Jan 29 06:05:31 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 77144 invoked from network); 29 Jan 2004 14:05:30 -0000 Received: from unknown (66.218.66.167) by m5.grp.scd.yahoo.com with QMQP; 29 Jan 2004 14:05:30 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 29 Jan 2004 14:05:30 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYMHA4; Thu, 29 Jan 2004 09:03:28 -0500 Mime-Version: 1.0 (Apple Message framework v609) In-Reply-To: <5.2.0.9.2.20040127181650.01609868@cic-mail.lanl.gov> References: <5.2.0.9.2.20040127181650.01609868@cic-mail.lanl.gov> Content-Type: multipart/mixed; boundary=Apple-Mail-3-441032590 Message-Id: Date: Thu, 29 Jan 2004 06:03:16 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Here is my Wed pass -- most of the work was in the applications and citations sections, but I also did quite a bit in the introduction and a little bit in other places.

Weekly concall is at 11am EST today.
garth Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/applefile Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/msword From Brian.Pawlowski@netapp.com Thu Jan 29 06:27:08 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 33797 invoked from network); 29 Jan 2004 14:26:58 -0000 Received: from unknown (66.218.66.167) by m20.grp.scd.yahoo.com with QMQP; 29 Jan 2004 14:26:58 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 29 Jan 2004 14:26:55 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0TEQhKw015463 for ; Thu, 29 Jan 2004 06:26:43 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0TEQhDi004514 for ; Thu, 29 Jan 2004 06:26:43 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.6+Sun/8.11.6) id i0TEQh717920 for pnfs-reqs@yahoogroups.com; Thu, 29 Jan 2004 06:26:43 -0800 (PST) Message-Id: <200401291426.i0TEQh717920@tooting-fe.eng.netapp.com> In-Reply-To: from Garth Gibson at "Dec 18, 3 05:37:50 pm" To: pnfs-reqs@yahoogroups.com Date: Thu, 29 Jan 2004 06:26:43 -0800 (PST) X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: How am I identified in pnfs-reqs and -ops mail lists? X-Yahoo-Group-Post: member; u=169504717 Can't get into archives - it's getting cranky. My e-mail address is beepy@netapp.com Did you enter me in some other way? 
beepy From Brian.Pawlowski@netapp.com Thu Jan 29 06:35:10 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 52196 invoked from network); 29 Jan 2004 14:35:09 -0000 Received: from unknown (66.218.66.166) by m12.grp.scd.yahoo.com with QMQP; 29 Jan 2004 14:35:09 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 29 Jan 2004 14:35:09 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i0TEVKKw016076 for ; Thu, 29 Jan 2004 06:31:20 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i0TEVKRh012071 for ; Thu, 29 Jan 2004 06:31:20 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.6+Sun/8.11.6) id i0TEVJa18450; Thu, 29 Jan 2004 06:31:19 -0800 (PST) Message-Id: <200401291431.i0TEVJa18450@tooting-fe.eng.netapp.com> In-Reply-To: <200401291426.i0TEQh717920@tooting-fe.eng.netapp.com> from Brian Pawlowski at "Jan 29, 4 06:26:43 am" To: pnfs-reqs@yahoogroups.com Date: Thu, 29 Jan 2004 06:31:19 -0800 (PST) Cc: pnfs-reqs@yahoogroups.com X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: Re: [pnfs-reqs] How am I identified in pnfs-reqs and -ops mail lists? X-Yahoo-Group-Post: member; u=169504717 Great - meant to send that to Garth - sorry:-) > Can't get into archives - it's getting cranky. > > My e-mail address is beepy@netapp.com > > Did you enter me in some other way? > > beepy > > > > > Yahoo! 
Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > > From garth@panasas.com Thu Jan 29 08:16:52 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 80097 invoked from network); 29 Jan 2004 16:16:51 -0000 Received: from unknown (66.218.66.172) by m15.grp.scd.yahoo.com with QMQP; 29 Jan 2004 16:16:51 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 29 Jan 2004 16:16:43 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYMHPB; Thu, 29 Jan 2004 11:16:06 -0500 In-Reply-To: References: Mime-Version: 1.0 (Apple Message framework v609) Content-Type: multipart/mixed; boundary=Apple-Mail-10-448991504 Message-Id: <6AA27F46-5276-11D8-A5D8-000A95A94F04@panasas.com> Cc: Peter Corbett Date: Thu, 29 Jan 2004 08:15:55 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.609) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Here is a PDF version. garth On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: > Here is my Wed pass -- most of the work was in the applications and > citations sections, but I also did quite a bit in the introduction and > a little bit in other places. > > Weekly concall is at 11am EST today. > > garth > > Yahoo! Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > > Attachment (not stored) draft-gibson-prob-st-00..pdf Type: application/pdf From dhildebz@eecs.umich.edu Sun Feb 01 19:52:38 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 19117 invoked from network); 2 Feb 2004 03:52:34 -0000 Received: from unknown (66.218.66.217) by m16.grp.scd.yahoo.com with QMQP; 2 Feb 2004 03:52:34 -0000 Received: from unknown (HELO willow.eecs.umich.edu) (141.213.4.14) by mta2.grp.scd.yahoo.com with SMTP; 2 Feb 2004 03:52:34 -0000 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by willow.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i123qLae013573 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 1 Feb 2004 22:52:22 -0500 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.12.11/8.12.11/Submit) with ESMTP id i123qLUu013570; Sun, 1 Feb 2004 22:52:21 -0500 X-Authentication-Warning: willow.eecs.umich.edu: dhildebz owned process doing -bs Date: Sun, 1 Feb 2004 22:52:21 -0500 (EST) To: pnfs-reqs@yahoogroups.com Cc: Peter Corbett In-Reply-To: <6AA27F46-5276-11D8-A5D8-000A95A94F04@panasas.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE X-eGroups-Remote-IP: 141.213.4.14 From: Dean Hildebrand Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus A couple comments: (1) Page 3, paragraph 2 > Storage Area Networks routinely provide much higher data bandwidths than >do NFS file servers. Unfortunately, the simple array of blocks interface >into Storage Area Networks does not lend itself to sharing data among the >clients in a cluster. NFS file service, with its hierarchical namespace >of separately controlled files, offers simpler and more cost-effective >management. 
One might conclude that users must chose between high >bandwidth and data sharing. I'm wondering if the concept of 'data sharing' is obvious here. I mean, maybe it should be expanded to make it clear it is talking about file consistency and such. (2) Page 6, 2nd paragraph (Eliminating the bottleneck) There is no mention here of NFSv4 state information. I believe the fact that NFSv4 state information prevents exporting the same file via multiple NFSv4 servers (as was done in v3) should be mentioned. Dean On Thu, 29 Jan 2004, Garth Gibson wrote: > Here is a PDF version. > garth > > On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: > > > Here is my Wed pass -- most of the work was in the applications and > > citations sections, but I also did quite a bit in the introduction and > > a little bit in other places. > > > > Weekly concall is at 11am EST today. > > > > garth
From garth@panasas.com Tue Feb 03 08:30:00 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 66954 invoked from network); 3 Feb 2004 16:29:46 -0000 Received: from unknown (66.218.66.172) by m13.grp.scd.yahoo.com with QMQP; 3 Feb 2004 16:29:46 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 3 Feb 2004 16:29:42 -0000 Received: from [172.17.3.217] ([172.17.3.217]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYM9F5; Tue, 3 Feb 2004 11:29:34 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: References: Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Message-Id: <231E5860-5666-11D8-8D65-000A95A94F04@panasas.com> Content-Transfer-Encoding: quoted-printable Date: Tue, 3 Feb 2004 11:29:28 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Dean, Thanks! I will clarify data sharing in this paragraph. In my experience these two words "data sharing" are the most common way the industry distinguishes any file system from a pure logical volume, so I don't think there is much risk of confusion. As to NFSv4 versus NFSv3, exporting a writable filesystem through multiple server addresses is rarely done in v3, though a few vendors do this. Those same vendors, at least, are very likely to export a writable filesystem through v4 over multiple servers. The key thing as I see it is that neither v3 nor v4 clients have a clue how to use more than one server address to spread out their work. 
I'd be happy to add that v4's additional statefulness adds to the complexity of exporting the same filesystem through multiple servers, though as I just argued, I think it is tangential to the main point. garth On Feb 1, 2004, at 10:52 PM, Dean Hildebrand wrote: > A couple comments: > > (1) > Page 3, paragraph 2 >> Storage Area Networks routinely provide much higher data bandwidths >> than >> do NFS file servers. Unfortunately, the simple array of blocks >> interface >> into Storage Area Networks does not lend itself to sharing data among >> the >> clients in a cluster. NFS file service, with its hierarchical >> namespace >> of separately controlled files, offers simpler and more cost-effective >> management. One might conclude that users must chose between high >> bandwidth and data sharing. > I'm wondering if the concept of 'data sharing' is obvious here. I > mean, > maybe it should be expanded to make it clear it is talking about file > consistency and such. > > Page 6, 2nd paragraph (Eliminating the bottleneck) > There is no mention here of NFSv4 state information. I beleive the > fact > that NFSv4 state information prevents exporting the same file via > multiple > NFSv4 servers (as was done in v3) should be mentioned. > > Dean > > On Thu, 29 Jan 2004, Garth Gibson wrote: > >> Here is a PDF version. >> garth >> >> On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: >> >>> Here is my Wed pass -- most of the work was in the applications and >>> citations sections, but I also did quite a bit in the introduction >>> and >>> a little bit in other places. >>> >>> Weekly concall is at 11am EST today. >>> >>> garth >>> >>> Yahoo! Groups Links >>> >>> To visit your group on the web, go to: >>> http://groups.yahoo.com/group/pnfs-reqs/ >>> >>> To unsubscribe from this group, send an email to: >>> pnfs-reqs-unsubscribe@yahoogroups.com >>> >>> Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > From garth@panasas.com Tue Feb 03 08:58:31 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 64240 invoked from network); 3 Feb 2004 16:57:51 -0000 Received: from unknown (66.218.66.166) by m4.grp.scd.yahoo.com with QMQP; 3 Feb 2004 16:57:51 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 3 Feb 2004 16:57:51 -0000 Received: from [172.17.2.81] ([172.17.2.81]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYM9KV; Tue, 3 Feb 2004 11:57:45 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: <138EA0C4-566A-11D8-8D65-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Tue, 3 Feb 2004 11:57:40 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: call today not needed X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson We had tentatively scheduled a pNFS concall today at 3pm EST if needed. I heard from Spencer this morning and he thought the current draft suited its purpose (still working to talk to Brian). And the only comment on this mailing list was from Dean (thanks Dean!) -- I'll add clarifications for his comments today. 
garth From dhildebz@eecs.umich.edu Tue Feb 03 09:51:22 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 78745 invoked from network); 3 Feb 2004 17:51:22 -0000 Received: from unknown (66.218.66.167) by m6.grp.scd.yahoo.com with QMQP; 3 Feb 2004 17:51:22 -0000 Received: from unknown (HELO willow.eecs.umich.edu) (141.213.4.14) by mta6.grp.scd.yahoo.com with SMTP; 3 Feb 2004 17:51:21 -0000 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by willow.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i13HnYkG006983 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 3 Feb 2004 12:49:35 -0500 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.12.11/8.12.11/Submit) with ESMTP id i13HnYVV006980 for ; Tue, 3 Feb 2004 12:49:34 -0500 X-Authentication-Warning: willow.eecs.umich.edu: dhildebz owned process doing -bs Date: Tue, 3 Feb 2004 12:49:34 -0500 (EST) To: pnfs-reqs@yahoogroups.com In-Reply-To: <231E5860-5666-11D8-8D65-000A95A94F04@panasas.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE X-eGroups-Remote-IP: 141.213.4.14 From: Dean Hildebrand Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus I think you are right that the problem statement is not directly concerned with the statefulness of NFSv4. Adding something about it, or even adding what you said about spreading the work over the multiple servers might be useful. I also noticed some spelling/double word things that I can send along if useful. Dean On Tue, 3 Feb 2004, Garth Gibson wrote: > Dean, > > Thanks! > > I will clarify data sharing in this paragraph. 
In my experience these > two words "data sharing" are the most common way the industry > distinguishes any file system from a pure logical volume, so I don't > think there is much risk of confusion. > > As to NFSv4 versus NFSv3, exporting a writable filesystem through > multiple server addresses is rarely done in v3, though a few vendors do > this. Those same vendors, at least, are very likely to export a > writable filesystem through v4 over multiple servers, The key thing as > I see it is that neither v3 nor v4 clients have a clue how to use more > than one server address to spread out their work. > > I'd be happy to add that v4's additional statefulness adds to the > complexity of exporting the same filesystem through multiple servers, > though as I just argued, I think it is tangential to the main point. > > garth > > > On Feb 1, 2004, at 10:52 PM, Dean Hildebrand wrote: > > > A couple comments: > > > > (1) > > Page 3, paragraph 2 > >> Storage Area Networks routinely provide much higher data bandwidths > >> than > >> do NFS file servers. Unfortunately, the simple array of blocks > >> interface > >> into Storage Area Networks does not lend itself to sharing data among > >> the > >> clients in a cluster. NFS file service, with its hierarchical > >> namespace > >> of separately controlled files, offers simpler and more cost-effective > >> management. One might conclude that users must chose between high > >> bandwidth and data sharing. > > I'm wondering if the concept of 'data sharing' is obvious here. I > > mean, > > maybe it should be expanded to make it clear it is talking about file > > consistency and such. > > > > Page 6, 2nd paragraph (Eliminating the bottleneck) > > There is no mention here of NFSv4 state information. I beleive the > > fact > > that NFSv4 state information prevents exporting the same file via > > multiple > > NFSv4 servers (as was done in v3) should be mentioned. 
> > > > Dean > > > > On Thu, 29 Jan 2004, Garth Gibson wrote: > > > >> Here is a PDF version. > >> garth > >> > >> On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: > >> > >>> Here is my Wed pass -- most of the work was in the applications and > >>> citations sections, but I also did quite a bit in the introduction > >>> and > >>> a little bit in other places. > >>> > >>> Weekly concall is at 11am EST today. > >>> > >>> garth > >>> > >>> Yahoo! Groups Links > >>> > >>> To visit your group on the web, go to: > >>> http://groups.yahoo.com/group/pnfs-reqs/ > >>> > >>> To unsubscribe from this group, send an email to: > >>> pnfs-reqs-unsubscribe@yahoogroups.com > >>> > >>> Your use of Yahoo! Groups is subject to: > >>> http://docs.yahoo.com/info/terms/ > >>> > >>> > >> > >> > >> > >> ______________________________________________________________________ > >> __________ > >> Yahoo! Groups Links > >> * To visit your group on the web, go to: > >> http://groups.yahoo.com/group/pnfs-reqs/ > >> > >> * To unsubscribe from this group, send an email to: > >> pnfs-reqs-unsubscribe@yahoogroups.com > >> > >> * Your use of Yahoo! Groups is subject to the Yahoo! Terms of > >> Service. > >> > >> > > > > > > > > > > ------------------------ Yahoo! Groups Sponsor > > ---------------------~--> > > Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > > Printer at MyInks.com. Free s/h on orders $50 or more to the US & > > Canada. > > http://www.c1tracking.com/l.asp?cid=5511 > > http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > > --------------------------------------------------------------------- > > ~-> > > > > Yahoo! Groups Links > > > > To visit your group on the web, go to: > > http://groups.yahoo.com/group/pnfs-reqs/ > > > > To unsubscribe from this group, send an email to: > > pnfs-reqs-unsubscribe@yahoogroups.com > > > > Your use of Yahoo! 
Groups is subject to: > > http://docs.yahoo.com/info/terms/ > > > > > ________________________________________________________________________________ > Yahoo! Groups Links > * To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > * To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. > > From garth@panasas.com Tue Feb 03 10:07:25 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 74656 invoked from network); 3 Feb 2004 18:07:22 -0000 Received: from unknown (66.218.66.166) by m14.grp.scd.yahoo.com with QMQP; 3 Feb 2004 18:07:22 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 3 Feb 2004 18:07:21 -0000 Received: from [172.17.2.81] ([172.17.2.81]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYM9Y4; Tue, 3 Feb 2004 13:06:33 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: References: Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: quoted-printable Date: Tue, 3 Feb 2004 13:06:27 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Please do send me all spelling/grammar corrections. I want to finish this asap :-) garth On Feb 3, 2004, at 12:49 PM, Dean Hildebrand wrote: > I think you are right that the problem statement is not directly > concerned > with the statefulness of NFSv4. Adding something about it, or even > adding > what you said about spreading the work over the multiple servers might > be > useful. 
> > I also noticed some spelling/double word things that I can send along > if > useful. > Dean > > On Tue, 3 Feb 2004, Garth Gibson wrote: > >> Dean, >> >> Thanks! >> >> I will clarify data sharing in this paragraph. In my experience >> these >> two words "data sharing" are the most common way the industry >> distinguishes any file system from a pure logical volume, so I don't >> think there is much risk of confusion. >> >> As to NFSv4 versus NFSv3, exporting a writable filesystem through >> multiple server addresses is rarely done in v3, though a few vendors >> do >> this. Those same vendors, at least, are very likely to export a >> writable filesystem through v4 over multiple servers, The key thing >> as >> I see it is that neither v3 nor v4 clients have a clue how to use >> more >> than one server address to spread out their work. >> >> I'd be happy to add that v4's additional statefulness adds to the >> complexity of exporting the same filesystem through multiple servers, >> though as I just argued, I think it is tangential to the main point. >> >> garth >> >> >> On Feb 1, 2004, at 10:52 PM, Dean Hildebrand wrote: >> >>> A couple comments: >>> >>> (1) >>> Page 3, paragraph 2 >>>> Storage Area Networks routinely provide much higher data bandwidths >>>> than >>>> do NFS file servers. Unfortunately, the simple array of blocks >>>> interface >>>> into Storage Area Networks does not lend itself to sharing data >>>> among >>>> the >>>> clients in a cluster. NFS file service, with its hierarchical >>>> namespace >>>> of separately controlled files, offers simpler and more >>>> cost-effective >>>> management. One might conclude that users must chose between high >>>> bandwidth and data sharing. >>> I'm wondering if the concept of 'data sharing' is obvious here. I >>> mean, >>> maybe it should be expanded to make it clear it is talking about file >>> consistency and such. 
>>> >>> Page 6, 2nd paragraph (Eliminating the bottleneck) >>> There is no mention here of NFSv4 state information. I beleive the >>> fact >>> that NFSv4 state information prevents exporting the same file via >>> multiple >>> NFSv4 servers (as was done in v3) should be mentioned. >>> >>> Dean >>> >>> On Thu, 29 Jan 2004, Garth Gibson wrote: >>> >>>> Here is a PDF version. >>>> garth >>>> >>>> On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: >>>> >>>>> Here is my Wed pass -- most of the work was in the applications and >>>>> citations sections, but I also did quite a bit in the introduction >>>>> and >>>>> a little bit in other places. >>>>> >>>>> Weekly concall is at 11am EST today. >>>>> >>>>> garth >>>>> >>>>> Yahoo! Groups Links >>>>> >>>>> To visit your group on the web, go to: >>>>> http://groups.yahoo.com/group/pnfs-reqs/ >>>>> >>>>> To unsubscribe from this group, send an email to: >>>>> pnfs-reqs-unsubscribe@yahoogroups.com >>>>> >>>>> Your use of Yahoo! Groups is subject to: >>>>> http://docs.yahoo.com/info/terms/ >>>>> >>>>> >>>> >>>> >>>> >>>> ____________________________________________________________________ >>>> __ >>>> __________ >>>> Yahoo! Groups Links >>>> * To visit your group on the web, go to: >>>> http://groups.yahoo.com/group/pnfs-reqs/ >>>> >>>> * To unsubscribe from this group, send an email to: >>>> pnfs-reqs-unsubscribe@yahoogroups.com >>>> >>>> * Your use of Yahoo! Groups is subject to the Yahoo! Terms of >>>> Service. >>>> >>>> >>> >>> >>> >>> >>> ------------------------ Yahoo! Groups Sponsor >>> ---------------------~--> >>> Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or >>> Lexmark >>> Printer at MyInks.com. Free s/h on orders $50 or more to the US & >>> Canada. >>> http://www.c1tracking.com/l.asp?cid=5511 >>> http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM >>> --------------------------------------------------------------------- >>> ~-> >>> >>> Yahoo! 
Groups Links >>> >>> To visit your group on the web, go to: >>> http://groups.yahoo.com/group/pnfs-reqs/ >>> >>> To unsubscribe from this group, send an email to: >>> pnfs-reqs-unsubscribe@yahoogroups.com >>> >>> Your use of Yahoo! Groups is subject to: >>> http://docs.yahoo.com/info/terms/ >>> >> >> >> ______________________________________________________________________ >> __________ >> Yahoo! Groups Links >> * To visit your group on the web, go to: >> http://groups.yahoo.com/group/pnfs-reqs/ >> >> * To unsubscribe from this group, send an email to: >> pnfs-reqs-unsubscribe@yahoogroups.com >> >> * Your use of Yahoo! Groups is subject to the Yahoo! Terms of >> Service. >> >> > > > > > Yahoo! Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > From dhildebz@eecs.umich.edu Tue Feb 03 11:23:37 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 19030 invoked from network); 3 Feb 2004 19:23:34 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 3 Feb 2004 19:23:34 -0000 Received: from unknown (HELO smtp.eecs.umich.edu) (141.213.4.43) by mta4.grp.scd.yahoo.com with SMTP; 3 Feb 2004 19:23:34 -0000 Received: from oemcomputer (dh152.citi.umich.edu [141.211.133.152]) (authenticated bits=0) by smtp.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i13JNTNm024614 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO) for ; Tue, 3 Feb 2004 14:23:29 -0500 Message-ID: <001401c3ea8a$ec1c7420$9885d38d@oemcomputer> To: References: Date: Tue, 3 Feb 2004 14:21:23 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0010_01C3EA61.002A3800" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced 
By Microsoft MimeOLE V6.00.2800.1165 X-Scanned-By: MIMEDefang 2.39 X-eGroups-Remote-IP: 141.213.4.43 From: "Dean Hildebrand" Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus Here is a copy of the Word doc with changes to spelling, grammar and such. Feel free to use or ignore any changes. If you select View->Markup in Word you should be able to see what I did. Dean ----- Original Message ----- From: Garth Gibson To: pnfs-reqs@yahoogroups.com Sent: Tuesday, February 03, 2004 1:06 PM Subject: Re: [pnfs-reqs] new version of problem statement Please do send me all spelling/grammar corrections. I want to finish this asap :-) garth On Feb 3, 2004, at 12:49 PM, Dean Hildebrand wrote: > I think you are right that the problem statement is not directly > concerned > with the statefulness of NFSv4. Adding something about it, or even > adding > what you said about spreading the work over the multiple servers might > be > useful. > > I also noticed some spelling/double word things that I can send along > if > useful. > Dean > > On Tue, 3 Feb 2004, Garth Gibson wrote: > >> Dean, >> >> Thanks! >> >> I will clarify data sharing in this paragraph. In my experience >> these >> two words "data sharing" are the most common way the industry >> distinguishes any file system from a pure logical volume, so I don't >> think there is much risk of confusion. >> >> As to NFSv4 versus NFSv3, exporting a writable filesystem through >> multiple server addresses is rarely done in v3, though a few vendors >> do >> this. Those same vendors, at least, are very likely to export a >> writable filesystem through v4 over multiple servers, The key thing >> as >> I see it is that neither v3 nor v4 clients have a clue how to use >> more >> than one server address to spread out their work. 
>> >> I'd be happy to add that v4's additional statefulness adds to the >> complexity of exporting the same filesystem through multiple servers, >> though as I just argued, I think it is tangential to the main point. >> >> garth >> >> >> On Feb 1, 2004, at 10:52 PM, Dean Hildebrand wrote: >> >>> A couple comments: >>> >>> (1) >>> Page 3, paragraph 2 >>>> Storage Area Networks routinely provide much higher data bandwidths >>>> than >>>> do NFS file servers. Unfortunately, the simple array of blocks >>>> interface >>>> into Storage Area Networks does not lend itself to sharing data >>>> among >>>> the >>>> clients in a cluster. NFS file service, with its hierarchical >>>> namespace >>>> of separately controlled files, offers simpler and more >>>> cost-effective >>>> management. One might conclude that users must chose between high >>>> bandwidth and data sharing. >>> I'm wondering if the concept of 'data sharing' is obvious here. I >>> mean, >>> maybe it should be expanded to make it clear it is talking about file >>> consistency and such. >>> >>> Page 6, 2nd paragraph (Eliminating the bottleneck) >>> There is no mention here of NFSv4 state information. I beleive the >>> fact >>> that NFSv4 state information prevents exporting the same file via >>> multiple >>> NFSv4 servers (as was done in v3) should be mentioned. >>> >>> Dean >>> >>> On Thu, 29 Jan 2004, Garth Gibson wrote: >>> >>>> Here is a PDF version. >>>> garth >>>> >>>> On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: >>>> >>>>> Here is my Wed pass -- most of the work was in the applications and >>>>> citations sections, but I also did quite a bit in the introduction >>>>> and >>>>> a little bit in other places. >>>>> >>>>> Weekly concall is at 11am EST today. >>>>> >>>>> garth >>>>> >>>>> Yahoo! 
Groups Links > > To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > > To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > > Your use of Yahoo! Groups is subject to: > http://docs.yahoo.com/info/terms/ > Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/msword From dhildebz@eecs.umich.edu Tue Feb 03 11:24:03 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 847 invoked from network); 3 Feb 2004 19:24:01 -0000 Received: from unknown (66.218.66.172) by m6.grp.scd.yahoo.com with QMQP; 3 Feb 2004 19:24:01 -0000 Received: from unknown (HELO smtp.eecs.umich.edu) (141.213.4.43) by mta4.grp.scd.yahoo.com with SMTP; 3 Feb 2004 19:24:00 -0000 Received: from oemcomputer (dh152.citi.umich.edu [141.211.133.152]) (authenticated bits=0) by smtp.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i13JMw1v024533 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO) for ; Tue, 3 Feb 2004 14:22:58 -0500 Message-ID: <000801c3ea8a$d9ec6300$9885d38d@oemcomputer> To: References: Date: Tue, 3 Feb 2004 14:20:52 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0004_01C3EA60.EDD27AA0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-Scanned-By: MIMEDefang 2.39 X-eGroups-Remote-IP: 141.213.4.43 From: "Dean Hildebrand" Subject: Re: [pnfs-reqs] new version of problem statement X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus
Here is a copy of the Word doc with changes to spelling, grammar and such. Feel free to use or ignore any changes. If you View->Markup in Word you should be able to see what I did. Dean ----- Original Message ----- From: Garth Gibson To: pnfs-reqs@yahoogroups.com Sent: Tuesday, February 03, 2004 1:06 PM Subject: Re: [pnfs-reqs] new version of problem statement Please do send me all spelling/grammar corrections. I want to finish this asap :-) garth On Feb 3, 2004, at 12:49 PM, Dean Hildebrand wrote: > I think you are right that the problem statement is not directly > concerned > with the statefulness of NFSv4. Adding something about it, or even > adding > what you said about spreading the work over the multiple servers might > be > useful. > > I also noticed some spelling/double word things that I can send along > if > useful. > Dean > > On Tue, 3 Feb 2004, Garth Gibson wrote: > >> Dean, >> >> Thanks! >> >> I will clarify data sharing in this paragraph. In my experience >> these >> two words "data sharing" are the most common way the industry >> distinguishes any file system from a pure logical volume, so I don't >> think there is much risk of confusion. >> >> As to NFSv4 versus NFSv3, exporting a writable filesystem through >> multiple server addresses is rarely done in v3, though a few vendors >> do >> this. Those same vendors, at least, are very likely to export a >> writable filesystem through v4 over multiple servers. The key thing >> as >> I see it is that neither v3 nor v4 clients have a clue how to use >> more >> than one server address to spread out their work. >> >> I'd be happy to add that v4's additional statefulness adds to the >> complexity of exporting the same filesystem through multiple servers, >> though as I just argued, I think it is tangential to the main point. 
>> >> garth >> >> >> On Feb 1, 2004, at 10:52 PM, Dean Hildebrand wrote: >> >>> A couple comments: >>> >>> (1) >>> Page 3, paragraph 2 >>>> Storage Area Networks routinely provide much higher data bandwidths >>>> than >>>> do NFS file servers. Unfortunately, the simple array of blocks >>>> interface >>>> into Storage Area Networks does not lend itself to sharing data >>>> among >>>> the >>>> clients in a cluster. NFS file service, with its hierarchical >>>> namespace >>>> of separately controlled files, offers simpler and more >>>> cost-effective >>>> management. One might conclude that users must chose between high >>>> bandwidth and data sharing. >>> I'm wondering if the concept of 'data sharing' is obvious here. I >>> mean, >>> maybe it should be expanded to make it clear it is talking about file >>> consistency and such. >>> >>> Page 6, 2nd paragraph (Eliminating the bottleneck) >>> There is no mention here of NFSv4 state information. I beleive the >>> fact >>> that NFSv4 state information prevents exporting the same file via >>> multiple >>> NFSv4 servers (as was done in v3) should be mentioned. >>> >>> Dean >>> >>> On Thu, 29 Jan 2004, Garth Gibson wrote: >>> >>>> Here is a PDF version. >>>> garth >>>> >>>> On Jan 29, 2004, at 6:03 AM, Garth Gibson wrote: >>>> >>>>> Here is my Wed pass -- most of the work was in the applications and >>>>> citations sections, but I also did quite a bit in the introduction >>>>> and >>>>> a little bit in other places. >>>>> >>>>> Weekly concall is at 11am EST today. >>>>> >>>>> garth >>>>> >>>>> Yahoo! Groups Links >>>>> >>>>> To visit your group on the web, go to: >>>>> http://groups.yahoo.com/group/pnfs-reqs/ >>>>> >>>>> To unsubscribe from this group, send an email to: >>>>> pnfs-reqs-unsubscribe@yahoogroups.com >>>>> >>>>> Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > ------------------------------------------------------------------------------ Yahoo! Groups Links a.. To visit your group on the web, go to: http://groups.yahoo.com/group/pnfs-reqs/ b.. To unsubscribe from this group, send an email to: pnfs-reqs-unsubscribe@yahoogroups.com c.. Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/msword From Thomas.Talpey@netapp.com Thu Feb 05 08:34:36 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 64988 invoked from network); 5 Feb 2004 16:34:34 -0000 Received: from unknown (66.218.66.166) by m15.grp.scd.yahoo.com with QMQP; 5 Feb 2004 16:34:34 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 5 Feb 2004 16:34:34 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i15GY4Kw028307 for ; Thu, 5 Feb 2004 08:34:04 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i15GXnS5024214 for ; Thu, 5 Feb 2004 08:34:03 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.31]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Thu, 5 Feb 2004 08:15:44 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3EBEA.290B8000" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Thu, 5 Feb 2004 05:15:29 -0800 Message-ID: <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Do we have a concall today? Thread-Index: AcPr6imDHfvfdLxESLynAqqz6p5XGQ== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Do we have a concall today? 
X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu X-eGroups-Rocket-Track: -10 ; IPCR=n-w0,n100,g0 Are we on for a concall today (in a couple of hours) I assume? The final submission deadline is Monday 9am Eastern, and we need to wrap up the edits and send it pronto. Tom. From garth@panasas.com Thu Feb 05 08:42:27 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 60107 invoked from network); 5 Feb 2004 16:42:26 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 5 Feb 2004 16:42:26 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 5 Feb 2004 16:42:26 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYNK52; Thu, 5 Feb 2004 11:42:22 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> References: <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <41A40A7E-57FA-11D8-8D65-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Date: Thu, 5 Feb 2004 11:42:16 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] Do we have a concall today? X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson X-eGroups-Rocket-Track: -10 We had a concall, Tom and me, on getting this done. I am doing the conversion to ASCII today. Thanks to Tom I have instructions, but as I look at it, my confidence is not that high. If anyone has a magic bullet, please speak up. garth On Feb 5, 2004, at 8:15 AM, Talpey, Thomas wrote: > Are we on for a concall today (in a couple of hours) I assume? 
> > The final submission deadline is Monday 9am Eastern, and we need > to wrap up the edits and send it pronto. > > Tom. From ggrider@lanl.gov Thu Feb 05 09:45:28 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 28495 invoked from network); 5 Feb 2004 17:45:18 -0000 Received: from unknown (66.218.66.167) by m20.grp.scd.yahoo.com with QMQP; 5 Feb 2004 17:45:18 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta6.grp.scd.yahoo.com with SMTP; 5 Feb 2004 17:45:14 -0000 Received: from mailrelay1.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i15Hj2HR023161 for ; Thu, 5 Feb 2004 10:45:03 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay1.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i15Hj2rl030787 for ; Thu, 5 Feb 2004 10:45:02 -0700 Received: from cthulu.lanl.gov (cthulu.lanl.gov [128.165.115.129]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i15Hj2Yi025881 for ; Thu, 5 Feb 2004 10:45:02 -0700 Message-Id: <5.2.0.9.2.20040205104444.0154f488@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 05 Feb 2004 10:45:02 -0700 To: pnfs-reqs@yahoogroups.com In-Reply-To: <41A40A7E-57FA-11D8-8D65-000A95A94F04@panasas.com> References: <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_8417043==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] Do we have a concall today? X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs X-eGroups-Rocket-Track: 1: 100 ; IPCR=n-w0,n100,g0 ; SERVER=66.218.86.251 Your confidence is not high on? 
Thanks Gary At 11:42 AM 2/5/2004 -0500, you wrote: > We had a concall, Tom and me, on getting this done. > > I am doing the conversion to ASCI today. Thanks to Tom I have > instructions, but as I look at it, my confidence is not that high. If > anyone has a magic bullet, please speak up. > > garth > > On Feb 5, 2004, at 8:15 AM, Talpey, Thomas wrote: > > > Are we on for a concall today (in a couple of hours) I assume? > > > > The final submission deadline is Monday 9am Eastern, and we need > > to wrap up the edits and send it pronto. > > > > Tom. > > > Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. From garth@panasas.com Thu Feb 05 09:51:58 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 33445 invoked from network); 5 Feb 2004 17:51:57 -0000 Received: from unknown (66.218.66.167) by m13.grp.scd.yahoo.com with QMQP; 5 Feb 2004 17:51:56 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 5 Feb 2004 17:51:55 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYNLFF; Thu, 5 Feb 2004 12:51:18 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <5.2.0.9.2.20040205104444.0154f488@cic-mail.lanl.gov> References: <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> <5.2.1.1.2.20040205080928.035852f8@silver.nane.netapp.com> <5.2.0.9.2.20040205104444.0154f488@cic-mail.lanl.gov> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Message-Id: Content-Transfer-Encoding: quoted-printable Date: Thu, 5 Feb 2004 12:51:12 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 
65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] Do we have a concall today? X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson X-eGroups-Rocket-Track: 1: 100 ; SERVER=66.218.86.249 Converting Word to IETF-compliant ASCII :-) On Feb 5, 2004, at 12:45 PM, Gary Grider wrote: > Your confidence is not high on? > > Thanks > Gary > > At 11:42 AM 2/5/2004 -0500, you wrote: > > We had a concall, Tom and me, on getting this done. > > I am doing the conversion to ASCII today. Thanks to Tom I have > instructions, but as I look at it, my confidence is not that high. If > anyone has a magic bullet, please speak up. > > garth From garth@panasas.com Thu Feb 05 13:48:03 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 18326 invoked from network); 5 Feb 2004 21:48:01 -0000 Received: from unknown (66.218.66.172) by m12.grp.scd.yahoo.com with QMQP; 5 Feb 2004 21:48:01 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 5 Feb 2004 21:47:59 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYNMS9; Thu, 5 Feb 2004 16:46:40 -0500 Mime-Version: 1.0 (Apple Message framework v612) To: pnfs-reqs@yahoogroups.com Message-Id: Content-Type: multipart/mixed; boundary=Apple-Mail-51-1073630512 Date: Thu, 5 Feb 2004 16:46:34 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: working on the Word to IETF conversion X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson X-eGroups-Rocket-Track: 1: 100 ; SERVER=66.218.86.252 Here are two Word files. The first is the draft after all content editing. The second is my attempt to reset margins and indents in order to use print to generic/text-only file to get an IETF ASCII document. I'm not doing well. Still working at it. Please help. 
Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/applefile Attachment (not stored) draft-gibson-prob-st-00.doc Type: application/msword Attachment (not stored) draft-gibson-prob-st-00-1.doc Type: application/applefile Attachment (not stored) draft-gibson-prob-st-00-1.doc Type: application/msword From bhalevy@panasas.com Thu Feb 05 13:56:05 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 14661 invoked from network); 5 Feb 2004 21:55:25 -0000 Received: from unknown (66.218.66.218) by m20.grp.scd.yahoo.com with QMQP; 5 Feb 2004 21:55:25 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 5 Feb 2004 21:55:21 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Thu, 5 Feb 2004 16:54:54 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38832@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" Date: Thu, 5 Feb 2004 16:54:41 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] working on the Word to IETF conversion X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy X-eGroups-Rocket-Track: 1: 100 ; SERVER=66.218.86.251 Garth, I'll take a stab at it if no one else (with better experience with submitting I-Ds) volunteers... I'll need to do some homework going over http://www.ietf.org/ietf/1id-guidelines.txt, http://www.ietf.org/rfc/rfc2026.txt, and http://www.ietf.org/ID-nits.html Benny > -----Original Message----- > From: Garth Gibson [mailto:garth@Panasas.Com] > Sent: Thursday, February 05, 2004 4:47 PM > To: pnfs-reqs@yahoogroups.com > Subject: [pnfs-reqs] working on the Word to IETF conversion > > > Here are two Word files. The first is the draft after all content > editing. 
The second is my attempt to reset margins and indents in > order to use print to generic/text-only file to get an IETF ASCI > document. I'm not doing well. Still working at it. Please help. > > > > ------------------------ Yahoo! Groups Sponsor > ---------------------~--> > Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > Printer at MyInks.com. Free s/h on orders $50 or more to the > US & Canada. > http://www.c1tracking.com/l.asp?cid=5511 > http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > -------------------------------------------------------------- > -------~-> > > > Yahoo! Groups Links > > > > > From black_david@emc.com Thu Feb 05 22:03:14 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 6396 invoked from network); 6 Feb 2004 06:03:14 -0000 Received: from unknown (66.218.66.217) by m11.grp.scd.yahoo.com with QMQP; 6 Feb 2004 06:03:14 -0000 Received: from unknown (HELO srexchimc2.eng.emc.com) (168.159.100.11) by mta2.grp.scd.yahoo.com with SMTP; 6 Feb 2004 06:03:13 -0000 Received: from MAHO3MSX2.corp.emc.com (maho3msx2.isus.emc.com [128.221.11.32]) by srexchimc2.eng.emc.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id DWRHDBH1; Fri, 6 Feb 2004 01:02:22 -0500 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Fri, 6 Feb 2004 01:02:21 -0500 Message-ID: X-Sybari-Trust: 2b481d31 b1a25add bdf41840 0000013d To: pnfs-reqs@yahoogroups.com Date: Fri, 6 Feb 2004 01:02:20 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 168.159.100.11 From: black_david@emc.com Subject: RE: [pnfs-reqs] working on the Word to IETF conversion X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 X-eGroups-Rocket-Track: 1: 100 ; SERVER=66.218.86.248 ADVERTISEMENT Garth, > Here are two Word files. The first is the draft after all content > editing. 
The second is my attempt to reset margins and indents in > order to use print to generic/text-only file to get an IETF ASCI > document. I'm not doing well. Still working at it. Please help. You're fighting a losing battle - MS Word has entirely too many tricks up its sleeve. Fortunately, it has been beaten into submission by experts in the past, who have left instructions on how to do it in RFC 3285 (http://www.ietf.org/rfc/rfc3285.txt). You want to get the MS Word template from one of the locations provided in that RFC, cut and paste your entire content as unformatted text into a new file based on that template. It is crucial to use only the RFC text styles in that template - there should be no text in *any* other style (e.g., Normal, Heading) when you're done). Then follow the instructions in the RFC to print to a file via a Text-only printer ("Save As" won't work, even when saving as a text file, as it allows Word too much latitude to play games) and run the CRLF utility over the resulting text file before submitting. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- > -----Original Message----- > From: Garth Gibson [mailto:garth@panasas.com] > Sent: Thursday, February 05, 2004 4:47 PM > To: pnfs-reqs@yahoogroups.com > Subject: [pnfs-reqs] working on the Word to IETF conversion > > > Here are two Word files. The first is the draft after all content > editing. The second is my attempt to reset margins and indents in > order to use print to generic/text-only file to get an IETF ASCI > document. I'm not doing well. Still working at it. Please help. > > > > ------------------------ Yahoo! 
Groups Sponsor > ---------------------~--> > Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > Printer at MyInks.com. Free s/h on orders $50 or more to the > US & Canada. > http://www.c1tracking.com/l.asp?cid=5511 > http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > -------------------------------------------------------------- > -------~-> > > > Yahoo! Groups Links > > > > > From julian_satran@il.ibm.com Fri Feb 06 01:29:54 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 78483 invoked from network); 6 Feb 2004 09:29:53 -0000 Received: from unknown (66.218.66.172) by m12.grp.scd.yahoo.com with QMQP; 6 Feb 2004 09:29:53 -0000 Received: from unknown (HELO mtagate3.uk.ibm.com) (195.212.29.136) by mta4.grp.scd.yahoo.com with SMTP; 6 Feb 2004 09:29:52 -0000 Received: from d06nrmr1407.portsmouth.uk.ibm.com (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185]) by mtagate3.uk.ibm.com (8.12.10/8.12.10) with ESMTP id i169TiMf108972 for ; Fri, 6 Feb 2004 09:29:44 GMT Received: from d12ml102.megacenter.de.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by d06nrmr1407.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i169ThHh176618 for ; Fri, 6 Feb 2004 09:29:44 GMT In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Fri, 6 Feb 2004 11:29:42 +0200 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 06/02/2004 11:29:43, Serialize complete at 06/02/2004 11:29:43 Content-Type: text/plain; charset="US-ASCII" X-eGroups-Remote-IP: 195.212.29.136 From: Julian Satran Subject: RE: [pnfs-reqs] working on the Word to IETF conversion X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran Garth, Go for the file I sent you. It should have all you need (including numbering not available in the RFC). 
If it does not work switch to framemaker. It will take you around an hour if you know your way and two-three if you don't. Regards, Julo black_david@emc.com 06/02/2004 08:02 Please respond to pnfs-reqs To pnfs-reqs@yahoogroups.com cc Subject RE: [pnfs-reqs] working on the Word to IETF conversion Garth, > Here are two Word files. The first is the draft after all content > editing. The second is my attempt to reset margins and indents in > order to use print to generic/text-only file to get an IETF ASCI > document. I'm not doing well. Still working at it. Please help. You're fighting a losing battle - MS Word has entirely too many tricks up its sleeve. Fortunately, it has been beaten into submission by experts in the past, who have left instructions on how to do it in RFC 3285 (http://www.ietf.org/rfc/rfc3285.txt). You want to get the MS Word template from one of the locations provided in that RFC, cut and paste your entire content as unformatted text into a new file based on that template. It is crucial to use only the RFC text styles in that template - there should be no text in *any* other style (e.g., Normal, Heading) when you're done). Then follow the instructions in the RFC to print to a file via a Text-only printer ("Save As" won't work, even when saving as a text file, as it allows Word too much latitude to play games) and run the CRLF utility over the resulting text file before submitting. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- > -----Original Message----- > From: Garth Gibson [mailto:garth@panasas.com] > Sent: Thursday, February 05, 2004 4:47 PM > To: pnfs-reqs@yahoogroups.com > Subject: [pnfs-reqs] working on the Word to IETF conversion > > > Here are two Word files. 
The first is the draft after all content > editing. The second is my attempt to reset margins and indents in > order to use print to generic/text-only file to get an IETF ASCI > document. I'm not doing well. Still working at it. Please help. > > > > ------------------------ Yahoo! Groups Sponsor > ---------------------~--> > Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > Printer at MyInks.com. Free s/h on orders $50 or more to the > US & Canada. > http://www.c1tracking.com/l.asp?cid=5511 > http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > -------------------------------------------------------------- > -------~-> > > > Yahoo! Groups Links > > > > > Yahoo! Groups Links From julian_satran@il.ibm.com Fri Feb 06 01:30:43 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 34778 invoked from network); 6 Feb 2004 09:30:39 -0000 Received: from unknown (66.218.66.217) by m14.grp.scd.yahoo.com with QMQP; 6 Feb 2004 09:30:38 -0000 Received: from unknown (HELO mtagate3.uk.ibm.com) (195.212.29.136) by mta2.grp.scd.yahoo.com with SMTP; 6 Feb 2004 09:30:32 -0000 Received: from d06nrmr1407.portsmouth.uk.ibm.com (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185]) by mtagate3.uk.ibm.com (8.12.10/8.12.10) with ESMTP id i169TiMf126760 for ; Fri, 6 Feb 2004 09:29:44 GMT Received: from d12ml102.megacenter.de.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by d06nrmr1407.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i169ThHg176618 for ; Fri, 6 Feb 2004 09:29:43 GMT In-Reply-To: To: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Fri, 6 Feb 2004 11:29:19 +0200 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 06/02/2004 11:29:42 Content-Type: multipart/mixed; boundary="=_mixed 002E9A2BC2256E32_=" X-eGroups-Remote-IP: 195.212.29.136 From: Julian Satran Subject: Re: 
[pnfs-reqs] working on the Word to IETF conversion X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran Garth, I send you a draft in word that has a set of word formats named RFCxxx. Use it as a template and change all your formats to it. Do not change it. It has all the fonts in fixed pitch and the template if you use only the RFC named styles is working (was with Word 2000). Then you should be safe with the two conversion tools (print to a generic printer and use crlf.exe). If you still have trouble I can send you the Framemaker template and instructions on how to use it (far better). Regards, Julo Garth Gibson 05/02/2004 23:46 Please respond to pnfs-reqs To pnfs-reqs@yahoogroups.com cc Subject [pnfs-reqs] working on the Word to IETF conversion Here are two Word files. The first is the draft after all content editing. The second is my attempt to reset margins and indents in order to use print to generic/text-only file to get an IETF ASCI document. I'm not doing well. Still working at it. Please help. Yahoo! 
Groups Links Attachment (not stored) draft-gibson-prob-st-00.doc.hqx Type: application/mac-binhex40 From garth@panasas.com Fri Feb 06 10:37:03 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 72863 invoked from network); 6 Feb 2004 18:36:59 -0000 Received: from unknown (66.218.66.172) by m16.grp.scd.yahoo.com with QMQP; 6 Feb 2004 18:36:59 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 6 Feb 2004 18:36:57 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYNQ7H; Fri, 6 Feb 2004 13:35:43 -0500 Mime-Version: 1.0 (Apple Message framework v612) To: pnfs-reqs@yahoogroups.com Message-Id: <4138B0BA-58D3-11D8-825E-000A95A94F04@panasas.com> Content-Type: multipart/mixed; boundary=Apple-Mail-6--998911455 Date: Fri, 6 Feb 2004 13:35:36 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Have ASCII, approaching submission X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Okay, so with help from many of you we may have an ASCII internet draft. Attached is the file that may be it. Next, the file name. According to "Guidelines to Authors of Internet-Drafts, Last modified September 5, 2002", we need to get the file name from IETF. Assuming that this is not an NFS document, not yet anyway, we are suggesting the name "draft-gibson-pnfs-problem-statement-00.txt". Unless I hear otherwise, I will submit a request for a name, giving this name as a suggestion. garth ----------- text from the Guidelines section ------ For those authors submitting updates to existing Internet-Drafts, the choice of the file name is easily determined (up the version by 1). 
For new documents, either suggest one or send a message to "internet-drafts@ietf.org" with the document title, noting if it is a product of a working group (and the name of the group), and an abstract. The file name to be assigned will be included in a response. Simply add the filename text to the document (ASCII and PostScript versions) and submit the Internet-Draft. If the document is a new one (i.e. starting with revision -00.txt) and is submitted as a working group document, the IETF secretariat will ask the chair(s) of the wg the permission to publish it as a working group document. To expedite the process, authors are encouraged to send the document to internet-drafts@ietf.org and at the same time cc: to the chair(s) of the working group. If the document is accepted as a working group document, then it will have the draft-ietf- file name and will be announced on the working group mailing list by the IETF Secretariat. If the document is not accepted as a working group document, it will be processed as an individual submission, where the filename will be draft--....txt. NOTE: Revision numbers are based on the filename (as in first, second, or third version of this document). If there is a filename change, the version number starts over at -00. Put another way, the prior version number will NOT be incremented when an Internet-Draft filename has changed. ALL FILES BEGIN at -00 Before each IETF meeting, a deadline is announced for submitting documents ahead of time to be published for the meeting. For new documents, the deadline is even sooner (one week). There is no accepted delay. If you send at the very last minute, it is possible that it will arrive too late because of congestion of your mail server queues. If it is received too late, it will not be published on time for the IETF meeting. Note that if a filename is suggested, but not used, the document will have to be resubmitted with the actual file name. 
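The revision rules quoted from the Guidelines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes a file name is "draft-", then dash-separated lowercase components, a two-digit revision, and ".txt". The helper names `parse_draft_name` and `next_revision` are hypothetical and do not belong to any IETF tool.

```python
import re

# Assumed shape of an Internet-Draft file name (not taken from an IETF tool):
# "draft-" + dash-separated lowercase components + "-NN.txt"
DRAFT_NAME = re.compile(r"^(draft-[a-z0-9]+(?:-[a-z0-9]+)*)-(\d{2})\.txt$")


def parse_draft_name(filename):
    """Split a draft file name into (base name, revision number)."""
    match = DRAFT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Internet-Draft file name: {filename!r}")
    return match.group(1), int(match.group(2))


def next_revision(filename, new_base=None):
    """Return the file name for the next submission.

    An update to an existing draft bumps the revision by one. If the base
    name changes, the revision starts over at -00, as the Guidelines state.
    """
    base, revision = parse_draft_name(filename)
    if new_base is not None and new_base != base:
        return f"{new_base}-00.txt"
    if revision >= 99:
        raise ValueError("two-digit revision number exhausted")
    return f"{base}-{revision + 1:02d}.txt"
```

For example, `next_revision("draft-gibson-pnfs-problem-statement-00.txt")` gives the -01 name, while passing a different `new_base` (any renamed draft) restarts the numbering at -00.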
Begin forwarded message: > From: "Benny Halevy" > Date: February 6, 2004 3:11:02 AM EST > To: "Garth Gibson" > Cc: "Benny Halevy" > Subject: RE: crlf.exe > > Garth, I followed David Black's instructions. > Files attached. > > Benny > Internet Draft Garth Gibson Expires: August 2004 Panasas Inc. & CMU Peter Corbett Network Appliance, Inc. Document: draft-gibson-pnfs-problem-statement-00.txt February 2004 pNFS Problem Statement Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Copyright Notice Copyright (C) The Internet Society (2003). All Rights Reserved. Gibson et al Expires - August 2004 [Page 1] Internet Draft pNFS Problem Statement February 2004 Abstract This draft considers the problem of limited bandwidth to NFS servers. The bandwidth limitation exists because an NFS server has limited network, CPU, memory and disk I/O resources. Yet, access to any one file system through the NFSv4 protocol requires that a single server be accessed. While NFSv4 allows file system migration, it does not provide a mechanism that supports multiple servers simultaneously exporting a single writable file system.
This problem has become aggravated in recent years with the advent of very cheap and easily expanded clusters of application servers that are also NFS clients. The aggregate bandwidth demands of such clustered clients, typically working on a shared data set preferentially stored in a single file system, can increase much more quickly than the bandwidth of any server. The proposed solution is to provide for the parallelization of file services, by enhancing NFSv4 in a minor version.

Table of Contents

1. Introduction
2. Bandwidth Scaling in Clusters
3. Clustered Applications
4. Existing File Systems for Clusters
5. Eliminating the Bottleneck
6. Separated control and data access techniques
7. Security Considerations
8. Informative References
9. Acknowledgments
10. Author's Addresses
11. Full Copyright Statement

1. Introduction

The storage I/O bandwidth requirements of clients are rapidly outstripping the ability of network file servers to supply them. Increasingly, this problem is being encountered in installations running the NFS protocol. The problem can be solved by increasing the server bandwidth. This draft suggests that an effort be mounted to enable NFS file service to scale with its clusters of clients. The proposed approach is to increase the aggregate bandwidth possible to a single file system by parallelizing the file service, resulting in multiple network connections to multiple server endpoints participating in the transfer of requested data.
This should be achievable within the framework of NFS, possibly in a minor version of the NFSv4 protocol. In many application areas, single system servers are rapidly being replaced by clusters of inexpensive commodity computers. As clustering technology has improved, the barriers to running application codes on very large clusters have been lowered. Examples of application areas that are seeing the rapid adoption of scalable client clusters are data intensive applications such as genomics, seismic processing, data mining, content and video distribution, and high performance computing. The aggregate storage I/O requirements of a cluster can scale proportionally to the number of computers in the cluster. It is not unusual for clusters today to make bandwidth demands that far outstrip the capabilities of traditional file servers. A natural solution to this problem is to enable file service to scale as well, by increasing the number of server nodes that are able to service a single file system to a cluster of clients. Scalable bandwidth can be claimed by simply adding multiple independent servers to the network. Unfortunately, this leaves to file system users the task of spreading data across these independent servers. Because the data processed by a given data-intensive application is usually logically associated, users routinely co-locate this data in a single file system, directory or even a single file. The NFSv4 protocol currently requires that all the data in a single file system be accessible through a single exported network endpoint, constraining access to be through a single NFS server. A better way of increasing the bandwidth to a single file system is to enable access to be provided through multiple endpoints in a coordinated or coherent fashion.
Separation of control and data flows provides a straightforward framework to accomplish this, by allowing transfers of data to proceed in parallel from many clients to many data storage endpoints. Control and file management operations, inherently more difficult to parallelize, can remain the province of a single NFS server, inheriting the simple management of today's NFS file service, while offloading data transfer operations allows bandwidth scalability. Data transfer may be done using NFS or other protocols, such as iSCSI. While NFS is a widely used network file system protocol, most of the world's data resides in data stores that are not accessible through NFS. Much of this data is stored in Storage Area Networks, accessible by SCSI's Fibre Channel Protocol (FCP), or increasingly, by iSCSI. Storage Area Networks routinely provide much higher data bandwidths than do NFS file servers. Unfortunately, the simple array of blocks interface into Storage Area Networks does not lend itself to controlling multiple clients that are simultaneously reading and writing the blocks of the same or different files, a workload usually referred to as data sharing. NFS file service, with its hierarchical namespace of separately controlled files, offers simpler and more cost-effective management. One might conclude that users must choose between high bandwidth and data sharing. Not only is this conclusion false, but it should also be possible to allow data stored in SAN devices, FCP or iSCSI, to be accessed under the control of an NFS server. Such an approach protects the industry's large investment in NFS, since the bandwidth bottleneck no longer needs to drive users to adopt a proprietary alternative solution, and leverages SAN storage infrastructures, all within a common architectural framework. 2.
Bandwidth Scaling in Clusters When applied to data-intensive applications, clusters can generate unprecedented demand for storage bandwidth. At present, each node in the cluster is likely to be a dual processor, with each processor running at multiple GHz, with gigabytes of DRAM. Depending on the specific application, each node is capable of sustaining a demand of 10s to 100s of MB/s of data from storage. In addition, the number of nodes in a cluster is commonly in the 100s, with many instances of 1000s to 10,000s of nodes. The result is that storage systems may be called upon to provide an aggregate bandwidth of GB/s ranging upwards toward TB/s. The performance of a single NFS server has been improving, but it is not able to keep pace with cluster demand. Directly connected storage devices behind an NFS server have given way to disk arrays and networked disk arrays, making it now possible for an NFS server to directly access 100s to 1000s of disk drives whose aggregate capacity reaches upwards to PBs and whose raw bandwidths range upwards to 10s of GB/s. An NFS server is interposed between the scalable storage subsystem and the scalable client cluster. Multiple NIC endpoints help network bandwidth keep up with DRAM bandwidth. However, the rate of improvement of NFS server performance is not faster than the rate of improvement in each client node. As long as an NFS file system is associated with a single client-side network endpoint, the aggregate capabilities of a single NFS server to move data between storage networks and client networks will not be able to keep pace with the aggregate demand of clustered clients and large disk subsystems. 3. Clustered Applications Large datasets and high bandwidth processing of large datasets are increasingly common in a wide variety of applications. 
As most computer users can affirm, the size of everyday presentations, pictures and programs seems to grow continuously, and in fact average file size does grow with time [Ousterhout85, Baker91]. Simple copying, viewing, archiving and sharing of even this baseline use of growing files in day-to-day business and personal computing drives up the bandwidth demand on servers. Some applications, however, make much larger demands on file and file system capacity and bandwidth. Databases of DNA sequences, used in bioinformatics search, range up to tens of GBs and are often in use by all cluster users at the same time [NIH03]. These huge files may experience bursts of many concurrent clients loading the whole file independently. Bioinformatics is an example of extensive search in science application. Extensive search is much broader than science. Wall Street has taken to collecting long-term transaction record histories. Looking for patterns of unbilled transactions, fraud or predictable market trends is a growing financial opportunity [Agarwal95, Senator95]. Security and authentication are driving a need for image search, such as face recognition [Flickner95]. Databasing the faces of approved or suspected individuals and searching through many camera feeds involves huge data and bandwidths. Traditional database indexing in these high dimension data structures often fails to avoid full database scans of these huge files [Berchtold97]. With huge storage repositories and fast computers, huge sensor capture is increasingly used in many applications. Consumer digital photography fits this model, with photo touch-up and slide show generation tools driving bandwidth, although much more demanding applications are not unusual. Medical test imagery is being captured at very high resolution and tools are being developed for automatic preliminary diagnosis, for example [Afework98].
In the science world, even larger datasets are captured from satellites, telescopes, and atom-smashers, for example [Greiman97]. Preliminary processing of a sky survey suggests that thousand node clusters may sustain GB/s storage bandwidths [Gray03]. Seismic trace data, often measured in helicopter loads, commands large clusters for days to months [Knott03]. At the high end of science application, accurate physical simulation, its visualization and fault-tolerance checkpointing, has been estimated to need 10 GB/s bandwidth and 100 TB of capacity for every thousand nodes in a cluster [SGPFS01]. Most of these applications make heavy use of shared data across many clients, users and applications, have limited budgets available to fund aggressive computational goals, and have technical or scientific users with strong preferences for file systems and no patience for tuning storage. NFS file service, appropriately scaled up in capacity and bandwidth, is highly desired. In addition to these search, sensor and science applications, traditional database applications are increasingly employing NFS servers. These applications often have hotspot tables, leading to high bandwidth storage demands. Yet SAN-based solutions are sometimes harder to manage than NFS based solutions, especially in databases with a large number of tables. NFS servers with scalable bandwidth would accelerate the adoption of NFS for database applications. These examples suggest that there is no shortage of applications frustrated by the limitations of a single network endpoint on a single NFS server exporting a single file system or single huge file. 4. Existing File Systems for Clusters The server bottleneck has induced various vendors to develop proprietary alternatives to NFS.
Known variously as asymmetric, out-of-band, clustered or SAN file systems, these proprietary alternatives exploit the scalability of storage networks by attaching all nodes in the client cluster to the storage network. Then, by reorganizing client and server code functionality to separate data traffic from control traffic, client nodes are able to access storage devices directly rather than requesting all data from the same single network endpoint in the file server that handles control traffic. Most proprietary alternative solutions have been tailored to storage area networks based on the fixed-sized block SCSI storage device command set and its Fibrechannel SCSI transport. Examples in this class include EMC's High Road (www.emc.com); IBM's TotalStorage SAN FS, SANergy and GPFS (www.ibm.com); Sistina/Redhat's GFS (www.redhat.com); SGI's CXFS (www.sgi.com); Veritas' SANPoint Direct and CFS (www.veritas.com); and Sun's QFS (www.sun.com). The Fibrechannel SCSI transport used in these systems may soon be replaceable by a TCP/IP SCSI transport, iSCSI, enabling these proprietary alternatives to operate on the same equipment and IETF protocols commonly used by NFS servers. While fixed-sized block SCSI storage devices are used in most file systems with separated data and control paths, this is not the only alternative available today. SCSI's newly emerging command set, the Object Storage Device (OSD) command set, transmits variable length storage objects over SCSI transports [T10-03]. Panasas' ActiveScale storage cluster employs a proto-OSD command set over iSCSI on its separated data path (www.panasas.com). IBM's research is also demonstrating a variant of their TotalStorage SAN FS employing proto-OSD commands [Azagury02]. Even more distinctive is Zforce's File Switch technology (www.zforce.com).
Zforce virtualizes a CIFS file server, spreading the contents of a file share over many backend CIFS storage servers, and places their control path functionality inside a network switch in order to have some of the properties of both separated and non-separated data and control paths. However, striping files over multiple file-based storage servers is not a new concept. Berkeley's Zebra file system, the successor to the log-based file system developed for RAID storage, had a separated data and control path with file protocols to both [Hartman95]. 5. Eliminating the Bottleneck The restriction of a single network endpoint results from the way NFS associates file servers and file systems. Essentially, each client machine "mounts" each exported file system; these mount operations bind a network endpoint to all files in the exported file system, instructing the client to address that network endpoint with all requests associated with all files in that file system. Mechanisms intended primarily for failover have been established for giving clients a list of network endpoints associated with a given file system. Multiple NFS servers can be used instead of a single NFS server, and many cluster administrators, programmers and end-users have experimented with this alternative. The principal compromise involved in exploiting multiple NFS servers is that a single file or single file system is decomposed into multiple files or file systems, respectively. For instance, a single file can be decomposed into many files, each located in a part of the namespace that is exported by a different NFS server; or the files of a single directory can be linked to files in directories located in file systems exported by different NFS servers.
Because this decomposition is done without NFS server support, the work of decomposing and recomposing and the implications of the decomposition on capacity and load balancing, backup consistency, error recovery, and namespace management all fall to the customer. Moreover, the additional statefulness of NFSv4 makes correct semantics for files decomposed over multiple services without NFS support much more complex. Such extra work and extra problems are usually referred to as storage management costs, and are blamed for causing a high total cost of ownership for storage. Preserving the relative ease of use of NFS storage systems requires solutions to the bandwidth bottleneck that do not decompose files and directories in the file subtree namespace. A solution to this problem should continue to use the existing single network endpoint for control traffic, including namespace manipulations. Decompositions of individual files and file systems over multiple network endpoints can be provided via the separated data paths, without separating the control and metadata paths. 6. Separated control and data access techniques Separating storage data flow from file system control flow effectively moves the bottleneck away from the single endpoint of an NFS server and distributes it across the bisectional bandwidth of the storage network between the cluster nodes and storage devices. Since switch bandwidths of upwards of terabits per second are available today, this bottleneck is at least two orders of magnitude better than that of an NFS server network endpoint. In an architecture that separates the storage data path from the NFS control path there are choices of protocol for the data path. One straightforward answer is to extend the NFS protocol so it can be used on both control and separated data paths.
Another straightforward answer is to capture the existing market's dominant separated data path, fixed-sized block SCSI storage. A third alternative is the emerging object storage SCSI command set, OSD, which is appearing in new products with separate data and control paths. A solution that accommodates all of these approaches provides the broadest applicability for NFS. Specifically, NFS extensions should make minimal assumptions about the storage data server access protocol. The clients in such an extended NFS system should be compatible with the current NFSv4 protocol, and should be compatible with earlier versions of NFS as well. A solution should be capable of providing both asymmetric data access, with the data path connected via NFS or other protocols and transports, and symmetric parallel access to servers that run NFS on each server node. Specifically, it is desirable to enable NFS to manage asymmetric access to storage attached via iSCSI and Fibre Channel/SCSI storage area networks. As previously discussed, the root cause of the NFS server bottleneck is the binding between one network endpoint and all the files in a file system. NFS extensions can allow the association of additional network endpoints with specific files. These associations could be represented layout maps [Gibson98]. NFS clients could be extended to have the ability to retrieve and use these layout maps. NFSv4 provides an excellent foundation for this. We may be able to extend the current notion of file delegations to include the ability to retrieve and utilize a file layout map. A number of ideas have been proposed for storing, accessing, and acting upon layout information stored by NFS servers to allow separate access to file data over separate data paths. Data access can be supported over multiple protocols, including NFSv4, iSCSI, and OSD. 7.
Security Considerations Bandwidth scaling solutions that employ separation of control and data paths will introduce new security concerns. For example, the data access methods will require authentication and access control mechanisms that are consistent with the primary mechanisms on the NFSv4 control paths. Object storage employs revocable cryptographic restrictions on each object, which can be created and revoked in the control path. With iSCSI access methods, iSCSI security capabilities are available, but do not contain NFS access control. Fibre Channel based SCSI access methods have less sophisticated security than iSCSI. These access methods typically use private networks to provide security. Any proposed solution must be analyzed for security threats and any such threats must be addressed. The IETF and the NFS working group have significant expertise in this area. 8. Informative References [Afework98] A. Afework, M. Beynon, F. Bustamonte, A. Demarzo, R. Ferriera, R. Miller, M. Silberman, J. Saltz, A. Sussman, H. Tang, "Digital dynamic telepathology - the virtual microscope," Proc. of the AMIA'98 Fall Symposium 1998. [Agarwal95] Agrawal, R. and Srikant, R. "Fast Algorithms for Mining Association Rules" VLDB, September 1995. [Azagury02] Azagury, A., Dreizin, V., Factor, M., Henis, E., Naor, D., Rinetzky, N., Satran, J., Tavory, A., Yerushalmi, L, "Towards an Object Store," IBM Storage Systems Technology Workshop, November 2002. [Baker91] Baker, M.G., Hartman, J.H., Kupfer, M.D., Shirriff, K.W. and Ousterhout, J.K. "Measurements of a Distributed File System" SOSP, October 1991. [Berchtold97] Berchtold, S., Boehm, C., Keim, D.A. and Kriegel, H. "A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space" ACM PODS, May 1997. [Fayyad98] Fayyad, U.
"Taming the Giants and the Monsters: Mining Large Databases for Nuggets of Knowledge" Database Programming and Design, March 1998. [Flickner95] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D. and Yanker, P. "Query by Image and Video Content: the QBIC System" IEEE Computer, September 1995. [Gibson98] Gibson, G. A., et al., "A Cost-Effective, High-Bandwidth Storage Architecture," International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1998. [Gray03] Jim Gray, "Distributed Computing Economics," Technical Report MSR-TR-2003-24, March 2003. [Greiman97] Greiman, W., W. E. Johnston, C. McParland, D. Olson, B. Tierney, C. Tull, "High-Speed Distributed Data Handling for HENP," Computing in High Energy Physics, April, 1997. Berlin, Germany. [Hartman95] John H. Hartman and John K. Ousterhout, "The Zebra Striped Network File System," ACM Transactions on Computer Systems 13, 3, August 1995. [Knott03] Knott, T., "Computing colossus," BP Frontiers magazine, Issue 6, April 2003, http://www.bp.com/frontiers. [NIH03] "Easy Large-Scale Bioinformatics on the NIH Biowulf Supercluster," http://biowulf.nih.gov/easy.html, 2003. [Ousterhout85] Ousterhout, J.K., DaCosta, H., Harrison, D., Kunze, J.A., Kupfer, M. and Thompson, J.G. "A Trace-Driven Analysis of the UNIX 4.2 BSD File System" SOSP, December 1985. [Senator95] Senator, T.E., Goldberg, H.G., Wooten, J., Cottini, M.A., Khan, A.F.U., Klinger, C.D., Llamas, W.M., Marrone, M.P. and Wong, R.W.H. "The Financial Crimes Enforcement Network AI System (FAIS): Identifying potential money laundering from reports of large cash transactions" AI Magazine 16 (4), Winter 1995. [SGPFS01] SGS File System RFP, DOE NNCA and DOD NSA, April 25, 2001.
[T10-03] Draft OSD Standard, T10 Committee, Storage Networking Industry Association (SNIA), ftp://www.t10.org/ftp/t10/drafts/osd/osd-r08.pdf 9. Acknowledgments David Black, Gary Grider, Benny Halevy, Dean Hildebrand, Dave Noveck, Julian Satran, Tom Talpey, and Brent Welch contributed to the development of this problem statement. 10. Author's Addresses Garth Gibson Panasas Inc, and Carnegie Mellon University 1501 Reedsdale Street Pittsburgh, PA 15233 USA Phone: +1 412 323 3500 Email: ggibson@panasas.com Peter Corbett Network Appliance Inc. 375 Totten Pond Road Waltham, MA 02451 USA Phone: +1 781 768 5343 Email: peter@pcorbett.net 11. Full Copyright Statement Copyright (C) The Internet Society (2004). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. From ggrider@lanl.gov Fri Feb 06 10:43:54 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 47383 invoked from network); 6 Feb 2004 18:43:54 -0000 Received: from unknown (66.218.66.216) by m4.grp.scd.yahoo.com with QMQP; 6 Feb 2004 18:43:54 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta1.grp.scd.yahoo.com with SMTP; 6 Feb 2004 18:43:53 -0000 Received: from mailrelay3.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i16IgpHR017493 for ; Fri, 6 Feb 2004 11:42:51 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay3.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i16IgoeI012727 for ; Fri, 6 Feb 2004 11:42:50 -0700 Received: from cthulu.lanl.gov (vpn-client-187.lanl.gov [128.165.253.187]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i16IgmYi006488; Fri, 6 Feb 2004 11:42:48 -0700 Message-Id: <5.2.0.9.2.20040206114221.01582600@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Fri, 06 Feb 2004 11:42:45 -0700 To: pnfs-reqs@yahoogroups.com, pnfs-reqs@yahoogroups.com In-Reply-To: <4138B0BA-58D3-11D8-825E-000A95A94F04@panasas.com> Mime-Version: 1.0 Content-Type: multipart/related; type="multipart/alternative"; boundary="=====================_9928526==.REL" X-Scanned-By: MIMEDefang 2.35
X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] Have ASCII, approaching submission X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs Thanks Garth for the enormous effort. Thanks to all that helped. Gary
From Thomas.Talpey@netapp.com Fri Feb 06 10:54:54 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 65270 invoked from network); 6 Feb 2004 18:54:53 -0000 Received: from unknown (66.218.66.218) by m4.grp.scd.yahoo.com with QMQP; 6 Feb 2004 18:54:53 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 6 Feb 2004 18:54:53 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i16IsRKw010580 for ; Fri, 6 Feb 2004 10:54:28 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i16IsRRj003290 for ; Fri, 6 Feb 2004 10:54:27 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.32]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 6 Feb 2004 13:54:26 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3ECE2.A4505D00" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Fri, 6 Feb 2004 10:53:43 -0800 Message-ID: <5.2.1.1.2.20040206134617.00c3afd0@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] Have ASCII, approaching submission Thread-Index: AcPs4qTyDQAiLTf+S/uat3b2ahLQIA== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] Have ASCII, approaching submission X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu At 01:35 PM 2/6/2004, Garth Gibson wrote: >Okay, so with help from many of you we may have an ASCI internet draft. > Attached is the file that may be it. > >Next, the file name. According to "Guidelines to Authors of >Internet-Drafts, Last modified September 5, 2002", says that we need to >get the file name from IETF. 
Assuming that this is not an NFS >document, not yet anyway, we are suggesting the name >"draft-gibson-pnfs-problem-statement-00.txt". > >Unless I hear otherwise, I will submit a request for a name, giving >this name as a suggestion. I have only one comment - the 1st IETF copyright is the 2003 boilerplate. It needs to be 2004! Interestingly, the second appearance, at the end of the document, is fine - just the first one. I recommend fixing the .txt and not re-formatting. :-) You don't need to submit a request for a name, this is an "individual" submission. Your suggested title is the correct form and will be fine, you can go ahead and send it to internet-drafts@ietf.org with the necessary e-cover letter. You'll get an automated response and unless you hear otherwise, it will appear in a few days. Congrats! Tom. From garth@panasas.com Fri Feb 06 12:47:03 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 6062 invoked from network); 6 Feb 2004 20:47:00 -0000 Received: from unknown (66.218.66.172) by m6.grp.scd.yahoo.com with QMQP; 6 Feb 2004 20:47:00 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 6 Feb 2004 20:47:00 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSYNRZZ; Fri, 6 Feb 2004 15:45:37 -0500 Mime-Version: 1.0 (Apple Message framework v612) To: pnfs-reqs@yahoogroups.com Message-Id: <65468135-58E5-11D8-825E-000A95A94F04@panasas.com> Content-Type: multipart/mixed; boundary=Apple-Mail-14--991120022 Date: Fri, 6 Feb 2004 15:45:27 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Fwd: submitting "draft-gibson-pnfs-problem-statement-00.txt" an informational internet draft X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson It's submitted -- attached is the text that was finally sent 
(Tom found a 2003 error and Craig found a missing "as") Begin forwarded message: > From: ietfauto@ietf.org (Internet Draft Submission Manager) > Date: February 6, 2004 3:39:52 PM EST > To: garth@panasas.com > Subject: Re: submitting "draft-gibson-pnfs-problem-statement-00.txt" > an informational internet draft > Subject: Autoreply from Internet Draft Submission Manager > Reply-To: dinaras@ietf.org > > Greetings: > > This message is being sent to acknowledge receipt of your > Internet-Draft > submission or message to internet-drafts@ietf.org. > If you submitted an Internet-Draft, then it will be posted > on the Internet-Drafts page of the IETF Web site, and an I-D > Action message will be sent to the IETF Announcement List. > > Please note that all Internet-Drafts offered for publication > as RFCs must conform to the requirements specified in ID Nits > (http://www.ietf.org/ID-nits.html) or they will be returned > to the author(s) for revision. Therefore, the IETF Secretariat > strongly recommends that you address all of the issues raised > in this document before submitting a request to publish your > Internet-Draft to the IESG. > > The IETF Secretariat > Internet Draft Garth Gibson Expires: August 2004 Panasas Inc. & CMU Peter Corbett Network Appliance, Inc. Document: draft-gibson-pnfs-problem-statement-00.txt February 2004 pNFS Problem Statement Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." 
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Copyright Notice Copyright (C) The Internet Society (2004). All Rights Reserved. Gibson et al Expires - August 2004 [Page 1] Internet Draft pNFS Problem Statement February 2004 Abstract This draft considers the problem of limited bandwidth to NFS servers. The bandwidth limitation exists because an NFS server has limited network, CPU, memory and disk I/O resources. Yet, access to any one file system through the NFSv4 protocol requires that a single server be accessed. While NFSv4 allows file system migration, it does not provide a mechanism that supports multiple servers simultaneously exporting a single writable file system. This problem has become aggravated in recent years with the advent of very cheap and easily expanded clusters of application servers that are also NFS clients. The aggregate bandwidth demands of such clustered clients, typically working on a shared data set preferentially stored in a single file system, can increase much more quickly than the bandwidth of any server. The proposed solution is to provide for the parallelization of file services, by enhancing NFSv4 in a minor version. Table of Contents 1. Introduction...................................................2 2. Bandwidth Scaling in Clusters..................................4 3. Clustered Applications.........................................4 4. Existing File Systems for Clusters.............................6 5. Eliminating the Bottleneck.....................................7 6. Separated control and data access techniques...................8 7. Security Considerations........................................9 8. Informative References.........................................9 9. Acknowledgments...............................................11 10. 
Author's Addresses...........................................11 11. Full Copyright Statement.....................................11 1. Introduction The storage I/O bandwidth requirements of clients are rapidly outstripping the ability of network file servers to supply them. Increasingly, this problem is being encountered in installations running the NFS protocol. The problem can be solved by increasing the server bandwidth. This draft suggests that an effort be mounted to enable NFS file service to scale with its clusters of clients. The proposed approach is to increase the aggregate bandwidth possible to a single file system by parallelizing the file service, resulting in multiple network connections to multiple server endpoints participating in the transfer of requested data. This should be achievable within the framework of NFS, possibly in a minor version of the NFSv4 protocol. In many application areas, single system servers are rapidly being replaced by clusters of inexpensive commodity computers. As clustering technology has improved, the barriers to running application codes on very large clusters have been lowered. Examples of application areas that are seeing the rapid adoption of scalable client clusters are data intensive applications such as genomics, seismic processing, data mining, content and video distribution, and high performance computing. The aggregate storage I/O requirements of a cluster can scale proportionally to the number of computers in the cluster. It is not unusual for clusters today to make bandwidth demands that far outstrip the capabilities of traditional file servers. A natural solution to this problem is to enable file service to scale as well, by increasing the number of server nodes that are able to service a single file system to a cluster of clients. 
Scalable bandwidth can be claimed by simply adding multiple independent servers to the network. Unfortunately, this leaves to file system users the task of spreading data across these independent servers. Because the data processed by a given data-intensive application is usually logically associated, users routinely co-locate this data in a single file system, directory or even a single file. The NFSv4 protocol currently requires that all the data in a single file system be accessible through a single exported network endpoint, constraining access to be through a single NFS server. A better way of increasing the bandwidth to a single file system is to enable access to be provided through multiple endpoints in a coordinated or coherent fashion. Separation of control and data flows provides a straightforward framework to accomplish this, by allowing transfers of data to proceed in parallel from many clients to many data storage endpoints. Control and file management operations, inherently more difficult to parallelize, can remain the province of a single NFS server, inheriting the simple management of today's NFS file service, while offloading data transfer operations allows bandwidth scalability. Data transfer may be done using NFS or other protocols, such as iSCSI. While NFS is a widely used network file system protocol, most of the world's data resides in data stores that are not accessible through NFS. Much of this data is stored in Storage Area Networks, accessible by SCSI's Fibre Channel Protocol (FCP), or increasingly, by iSCSI. Storage Area Networks routinely provide much higher data bandwidths than do NFS file servers. 
Unfortunately, the simple array of blocks interface into Storage Area Networks does not lend itself to controlling multiple clients that are simultaneously reading and writing the blocks of the same or different files, a workload usually referred to as data sharing. NFS file service, with its hierarchical namespace of separately controlled files, offers simpler and more cost-effective management. One might conclude that users must choose between high bandwidth and data sharing. Not only is this conclusion false, but it should also be possible to allow data stored in SAN devices, FCP or iSCSI, to be accessed under the control of an NFS server. Such an approach protects the industry's large investment in NFS, since the bandwidth bottleneck no longer needs to drive users to adopt a proprietary alternative solution, and leverages SAN storage infrastructures, all within a common architectural framework. 2. Bandwidth Scaling in Clusters When applied to data-intensive applications, clusters can generate unprecedented demand for storage bandwidth. At present, each node in the cluster is likely to be a dual processor, with each processor running at multiple GHz, with gigabytes of DRAM. Depending on the specific application, each node is capable of sustaining a demand of 10s to 100s of MB/s of data from storage. In addition, the number of nodes in a cluster is commonly in the 100s, with many instances of 1000s to 10,000s of nodes. The result is that storage systems may be called upon to provide an aggregate bandwidth of GB/s ranging upwards toward TB/s. The performance of a single NFS server has been improving, but it is not able to keep pace with cluster demand. 
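The aggregate figures quoted above follow from multiplying the node count by the per-node demand. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and a few illustrative pairings of the ranges given in the text; the function names are hypothetical and not part of the draft:

```python
# Back-of-the-envelope aggregate storage demand for a cluster.
# The node counts and per-node rates are taken from the ranges in the text;
# which values are paired together here is an illustrative assumption.
MB = 10**6  # decimal megabyte, assumed


def aggregate_demand_bytes_per_s(nodes: int, per_node_mb_per_s: int) -> int:
    """Total demand if every node sustains the same per-node rate."""
    return nodes * per_node_mb_per_s * MB


def human(bytes_per_s: int) -> str:
    """Format a rate using the largest decimal unit that fits."""
    for unit, scale in (("TB/s", 10**12), ("GB/s", 10**9), ("MB/s", 10**6)):
        if bytes_per_s >= scale:
            return f"{bytes_per_s / scale:g} {unit}"
    return f"{bytes_per_s} B/s"


for nodes, per_node in ((100, 10), (1000, 100), (10000, 100)):
    total = aggregate_demand_bytes_per_s(nodes, per_node)
    print(f"{nodes} nodes x {per_node} MB/s = {human(total)}")
```

Under these assumptions, a 100-node cluster at 10 MB/s per node already asks for 1 GB/s, and 10,000 nodes at 100 MB/s per node reach 1 TB/s, which matches the range stated in the text.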
Directly connected storage devices behind an NFS server have given way to disk arrays and networked disk arrays, making it now possible for an NFS server to directly access 100s to 1000s of disk drives whose aggregate capacity reaches upwards to PBs and whose raw bandwidths range upwards to 10s of GB/s. An NFS server is interposed between the scalable storage subsystem and the scalable client cluster. Multiple NIC endpoints help network bandwidth keep up with DRAM bandwidth. However, the rate of improvement of NFS server performance is not faster than the rate of improvement in each client node. As long as an NFS file system is associated with a single client-side network endpoint, the aggregate capabilities of a single NFS server to move data between storage networks and client networks will not be able to keep pace with the aggregate demand of clustered clients and large disk subsystems. 3. Clustered Applications Large datasets and high bandwidth processing of large datasets are increasingly common in a wide variety of applications. As most computer users can affirm, the size of everyday presentations, pictures and programs seems to grow continuously, and in fact average file size does grow with time [Ousterhout85, Baker91]. Simple copying, viewing, archiving and sharing of even this baseline use of growing files in day-to-day business and personal computing drives up the bandwidth demand on servers. Some applications, however, make much larger demands on file and file system capacity and bandwidth. Databases of DNA sequences, used in bioinformatics search, range up to tens of GBs and are often in use by all cluster users at the same time [NIH03]. These huge files may experience bursts of many concurrent clients loading the whole file independently. Bioinformatics is an example of extensive search in science application. Extensive search is much broader than science. 
Wall Street has taken to collecting long-term transaction record histories. Looking for patterns of unbilled transactions, fraud or predictable market trends is a growing financial opportunity [Agarwal95, Senator95]. Security and authentication are driving a need for image search, such as face recognition [Flickner95]. Databasing the faces of approved or suspected individuals and searching through many camera feeds involves huge data and bandwidths. Traditional database indexing in these high dimension data structures often fails to avoid full database scans of these huge files [Berchtold97]. With huge storage repositories and fast computers, huge sensor capture is increasingly used in many applications. Consumer digital photography fits this model, with photo touch-up and slide show generation tools driving bandwidth, although much more demanding applications are not unusual. Medical test imagery is being captured at very high resolution and tools are being developed for automatic preliminary diagnosis, for example [Afework98]. In the science world, even larger datasets are captured from satellites, telescopes, and atom-smashers, for example [Greiman97]. Preliminary processing of a sky survey suggests that thousand node clusters may sustain GB/s storage bandwidths [Gray03]. Seismic trace data, often measured in helicopter loads, commands large clusters for days to months [Knott03]. At the high end of science application, accurate physical simulation, its visualization and fault-tolerance checkpointing, has been estimated to need 10 GB/s bandwidth and 100 TB of capacity for every thousand nodes in a cluster [SGPFS01]. 
Most of these applications make heavy use of shared data across many clients, users and applications, have limited budgets available to fund aggressive computational goals, and have technical or scientific users with strong preferences for file systems and no patience for tuning storage. NFS file service, appropriately scaled up in capacity and bandwidth, is highly desired. In addition to these search, sensor and science applications, traditional database applications are increasingly employing NFS servers. These applications often have hotspot tables, leading to high bandwidth storage demands. Yet SAN-based solutions are sometimes harder to manage than NFS-based solutions, especially in databases with a large number of tables. NFS servers with scalable bandwidth would accelerate the adoption of NFS for database applications. These examples suggest that there is no shortage of applications frustrated by the limitations of a single network endpoint on a single NFS server exporting a single file system or single huge file. 4. Existing File Systems for Clusters The server bottleneck has induced various vendors to develop proprietary alternatives to NFS. Known variously as asymmetric, out-of-band, clustered or SAN file systems, these proprietary alternatives exploit the scalability of storage networks by attaching all nodes in the client cluster to the storage network. Then, by reorganizing client and server code functionality to separate data traffic from control traffic, client nodes are able to access storage devices directly rather than requesting all data from the same single network endpoint in the file server that handles control traffic. Most proprietary alternative solutions have been tailored to storage area networks based on the fixed-sized block SCSI storage device command set and its Fibre Channel SCSI transport. 
Examples in this class include EMC's High Road (www.emc.com); IBM's TotalStorage SAN FS, SANergy and GPFS (www.ibm.com); Sistina/Redhat's GFS (www.redhat.com); SGI's CXFS (www.sgi.com); Veritas' SANPoint Direct and CFS (www.veritas.com); and Sun's QFS (www.sun.com). The Fibre Channel SCSI transport used in these systems may soon be replaceable by a TCP/IP SCSI transport, iSCSI, enabling these proprietary alternatives to operate on the same equipment and IETF protocols commonly used by NFS servers. While fixed-sized block SCSI storage devices are used in most file systems with separated data and control paths, this is not the only alternative available today. SCSI's newly emerging command set, the Object Storage Device (OSD) command set, transmits variable length storage objects over SCSI transports [T10-03]. Panasas' ActiveScale storage cluster employs a proto-OSD command set over iSCSI on its separated data path (www.panasas.com). IBM's research is also demonstrating a variant of their TotalStorage SAN FS employing proto-OSD commands [Azagury02]. Even more distinctive is Zforce's File Switch technology (www.zforce.com). Zforce virtualizes a CIFS file server spreading the contents of a file share over many backend CIFS storage servers and places their control path functionality inside a network switch in order to have some of the properties of both separated and non-separated data and control paths. However, striping files over multiple file-based storage servers is not a new concept. Berkeley's Zebra file system, the successor to the log-based file system developed for RAID storage, had a separated data and control path with file protocols to both [Hartman95]. 5. Eliminating the Bottleneck The restriction of a single network endpoint results from the way NFS associates file servers and file systems. 
Essentially, each client machine "mounts" each exported file system; these mount operations bind a network endpoint to all files in the exported file system, instructing the client to address that network endpoint with all requests associated with all files in that file system. Mechanisms intended primarily for failover have been established for giving clients a list of network endpoints associated with a given file system. Multiple NFS servers can be used instead of a single NFS server, and many cluster administrators, programmers and end-users have experimented with this alternative. The principal compromise involved in exploiting multiple NFS servers is that a single file or single file system is decomposed into multiple files or file systems, respectively. For instance, a single file can be decomposed into many files, each located in a part of the namespace that is exported by a different NFS server; or the files of a single directory can be linked to files in directories located in file systems exported by different NFS servers. Because this decomposition is done without NFS server support, the work of decomposing and recomposing and the implications of the decomposition on capacity and load balancing, backup consistency, error recovery, and namespace management all fall to the customer. Moreover, the additional statefulness of NFSv4 makes correct semantics for files decomposed over multiple services without NFS support much more complex. Such extra work and extra problems are usually referred to as storage management costs, and are blamed for causing a high total cost of ownership for storage. Preserving the relative ease of use of NFS storage systems requires solutions to the bandwidth bottleneck that do not decompose files and directories in the file subtree namespace. 
A solution to this problem should continue to use the existing single network endpoint for control traffic, including namespace manipulations. Decompositions of individual files and file systems over multiple network endpoints can be provided via the separated data paths, without separating the control and metadata paths. 6. Separated control and data access techniques Separating storage data flow from file system control flow effectively moves the bottleneck away from the single endpoint of an NFS server and distributes it across the bisectional bandwidth of the storage network between the cluster nodes and storage devices. Since switch bandwidths upwards of terabits per second are available today, this bottleneck is at least two orders of magnitude better than that of an NFS server network endpoint. In an architecture that separates the storage data path from the NFS control path there are choices of protocol for the data path. One straightforward answer is to extend the NFS protocol so that it can be used on both control and separated data paths. Another straightforward answer is to capture the existing market's dominant separated data path, fixed-sized block SCSI storage. A third alternative is the emerging object storage SCSI command set, OSD, which is appearing in new products with separate data and control paths. A solution that accommodates all of these approaches provides the broadest applicability for NFS. Specifically, NFS extensions should make minimal assumptions about the storage data server access protocol. The clients in such an extended NFS system should be compatible with the current NFSv4 protocol, and should be compatible with earlier versions of NFS as well. A solution should be capable of providing both asymmetric data access, with the data path connected via NFS or other protocols and transports, and symmetric parallel access to servers that run NFS on each server node. 
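To make the idea of a separated data path concrete, here is a minimal sketch of how a client might decide which data endpoint holds a given byte of a file. It assumes simple round-robin striping with a fixed stripe unit; the draft does not prescribe any particular layout, and the class name, field names and endpoint labels below are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StripedLayout:
    """Hypothetical round-robin striping of one file over data endpoints."""

    endpoints: Tuple[str, ...]  # illustrative data-server names, not real hosts
    stripe_unit: int            # bytes stored on one endpoint before moving on

    def locate(self, offset: int) -> Tuple[str, int]:
        """Map a file offset to (endpoint, offset within that endpoint's data)."""
        stripe = offset // self.stripe_unit           # which stripe unit overall
        index = stripe % len(self.endpoints)          # endpoint holding that unit
        rounds = stripe // len(self.endpoints)        # full passes completed so far
        local = rounds * self.stripe_unit + offset % self.stripe_unit
        return self.endpoints[index], local


layout = StripedLayout(endpoints=("ds0", "ds1", "ds2"), stripe_unit=64 * 1024)
for off in (0, 64 * 1024, 3 * 64 * 1024 + 10):
    print(off, layout.locate(off))
```

In this sketch the control path (opening the file, obtaining the layout) would still go to the single NFS server, while reads and writes go directly to whichever endpoint `locate` returns, so transfers to different stripe units can proceed in parallel.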
Specifically, it is desirable to enable NFS to manage asymmetric access to storage attached via iSCSI and Fibre Channel/SCSI storage area networks. As previously discussed, the root cause of the NFS server bottleneck is the binding between one network endpoint and all the files in a file system. NFS extensions can allow the association of additional network endpoints with specific files. These associations could be represented as layout maps [Gibson98]. NFS clients could be extended to have the ability to retrieve and use these layout maps. NFSv4 provides an excellent foundation for this. We may be able to extend the current notion of file delegations to include the ability to retrieve and utilize a file layout map. A number of ideas have been proposed for storing, accessing, and acting upon layout information stored by NFS servers to allow separate access to file data over separate data paths. Data access can be supported over multiple protocols, including NFSv4, iSCSI, and OSD. 7. Security Considerations Bandwidth scaling solutions that employ separation of control and data paths will introduce new security concerns. For example, the data access methods will require authentication and access control mechanisms that are consistent with the primary mechanisms on the NFSv4 control paths. Object storage employs revocable cryptographic restrictions on each object, which can be created and revoked in the control path. With iSCSI access methods, iSCSI security capabilities are available, but do not contain NFS access control. Fibre Channel based SCSI access methods have less sophisticated security than iSCSI. These access methods typically use private networks to provide security. Any proposed solution must be analyzed for security threats and any such threats must be addressed. The IETF and the NFS working group have significant expertise in this area. 8. 
Informative References [Afework98] A. Afework, M. Beynon, F. Bustamonte, A. Demarzo, R. Ferriera, R. Miller, M. Silberman, J. Saltz, A. Sussman, H. Tang, "Digital dynamic telepathology - the virtual microscope," Proc. of the AMIA'98 Fall Symposium 1998. [Agarwal95] Agrawal, R. and Srikant, R. "Fast Algorithms for Mining Association Rules" VLDB, September 1995. [Azagury02] Azagury, A., Dreizin, V., Factor, M., Henis, E., Naor, D., Rinetzky, N., Satran, J., Tavory, A., Yerushalmi, L., "Towards an Object Store," IBM Storage Systems Technology Workshop, November 2002. [Baker91] Baker, M.G., Hartman, J.H., Kupfer, M.D., Shirriff, K.W. and Ousterhout, J.K. "Measurements of a Distributed File System" SOSP, October 1991. [Berchtold97] Berchtold, S., Boehm, C., Keim, D.A. and Kriegel, H. "A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space" ACM PODS, May 1997. [Fayyad98] Fayyad, U. "Taming the Giants and the Monsters: Mining Large Databases for Nuggets of Knowledge" Database Programming and Design, March 1998. [Flickner95] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D. and Yanker, P. "Query by Image and Video Content: the QBIC System" IEEE Computer, September 1995. [Gibson98] Gibson, G. A., et al., "A Cost-Effective, High-Bandwidth Storage Architecture," International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1998. [Gray03] Jim Gray, "Distributed Computing Economics," Technical Report MSR-TR-2003-24, March 2003. [Greiman97] Greiman, W., W. E. Johnston, C. McParland, D. Olson, B. Tierney, C. Tull, "High-Speed Distributed Data Handling for HENP," Computing in High Energy Physics, April, 1997. Berlin, Germany. [Hartman95] John H. Hartman and John K. 
Ousterhout, "The Zebra Striped Network File System," ACM Transactions on Computer Systems 13, 3, August 1995. [Knott03] Knott, T., "Computing colossus," BP Frontiers magazine, Issue 6, April 2003, http://www.bp.com/frontiers. [NIH03] "Easy Large-Scale Bioinformatics on the NIH Biowulf Supercluster," http://biowulf.nih.gov/easy.html, 2003. [Ousterhout85] Ousterhout, J.K., DaCosta, H., Harrison, D., Kunze, J.A., Kupfer, M. and Thompson, J.G. "A Trace-Driven Analysis of the UNIX 4.2 BSD File System" SOSP, December 1985. [Senator95] Senator, T.E., Goldberg, H.G., Wooten, J., Cottini, M.A., Khan, A.F.U., Klinger, C.D., Llamas, W.M., Marrone, M.P. and Wong, R.W.H. "The Financial Crimes Enforcement Network AI System (FAIS): Identifying potential money laundering from reports of large cash transactions" AI Magazine 16 (4), Winter 1995. [SGPFS01] SGS File System RFP, DOE NNCA and DOD NSA, April 25, 2001. [T10-03] Draft OSD Standard, T10 Committee, Storage Networking Industry Association (SNIA), ftp://www.t10.org/ftp/t10/drafts/osd/osd-r08.pdf 9. Acknowledgments David Black, Gary Grider, Benny Halevy, Dean Hildebrand, Dave Noveck, Julian Satran, Tom Talpey, and Brent Welch contributed to the development of this problem statement. 10. Author's Addresses Garth Gibson Panasas Inc, and Carnegie Mellon University 1501 Reedsdale Street Pittsburgh, PA 15233 USA Phone: +1 412 323 3500 Email: ggibson@panasas.com Peter Corbett Network Appliance Inc. 375 Totten Pond Road Waltham, MA 02451 USA Phone: +1 781 768 5343 Email: peter@pcorbett.net 11. Full Copyright Statement Copyright (C) The Internet Society (2004). All Rights Reserved. 
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
Gibson et al Expires - August 2004 [Page 12] From Thomas.Talpey@netapp.com Tue Feb 10 09:27:36 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 11044 invoked from network); 10 Feb 2004 17:27:33 -0000 Received: from unknown (66.218.66.166) by m13.grp.scd.yahoo.com with QMQP; 10 Feb 2004 17:27:33 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 10 Feb 2004 17:27:32 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1AHQXKw020644; Tue, 10 Feb 2004 09:26:33 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1AHQWBr025977; Tue, 10 Feb 2004 09:26:32 -0800 (PST) Received: from tmt.netapp.com ([10.97.1.34]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Tue, 10 Feb 2004 12:26:21 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3EFFA.FFDC5C80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Tue, 10 Feb 2004 09:25:25 -0800 Message-ID: <5.2.1.1.2.20040210122105.00c43e08@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: I-D ACTION:draft-gibson-pnfs-problem-statement-00.txt Thread-Index: AcPv+wA+PZxVNM4nRKmu+s4AMQ1+KQ== To: , "Garth Gibson" X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Fwd: I-D ACTION:draft-gibson-pnfs-problem-statement-00.txt X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Garth, if you haven't already, I think it would make sense for you to forward this to the nfsv4 wg alias, with the suggestion that it be discussed at the upcoming Korea meeting and the list. What's the status on putting together a presentation? Tom. 
> ---------- Forwarded Message ---------- >Subject: I-D ACTION:draft-gibson-pnfs-problem-statement-00.txt >Date: Mon, 9 Feb 2004 16:15:13 -0500 >From: >Reply-To: > >A New Internet-Draft is available from the on-line Internet-Drafts directories. > > > Title : pNFS Problem Statement > Author(s) : G. Gibson > Filename : draft-gibson-pnfs-problem-statement-00.txt > Pages : 12 > Date : 2004-2-9 > >This draft considers the problem of limited bandwidth to NFS servers. > The bandwidth limitation exists because an NFS server has limited > network, CPU, memory and disk I/O resources. Yet, access to any one > file system through the NFSv4 protocol requires that a single server > be accessed. While NFSv4 allows file system migration, it does not > provide a mechanism that supports multiple servers simultaneously > exporting a single writable file system. > > This problem has become aggravated in recent years with the advent of > very cheap and easily expanded clusters of application servers that > are also NFS clients. The aggregate bandwidth demands of such > clustered clients, typically working on a shared data set > preferentially stored in a single file system, can increase much more > quickly than the bandwidth of any server. The proposed solution is > to provide for the parallelization of file services, by enhancing > NFSv4 in a minor version. > >A URL for this Internet-Draft is: >http://www.ietf.org/internet-drafts/draft-gibson-pnfs-problem-statement-00.txt > >To remove yourself from the IETF Announcement list, send a message to >ietf-announce-request with the word unsubscribe in the body of the message. > >Internet-Drafts are also available by anonymous FTP. Login with the username >"anonymous" and a password of your e-mail address. After logging in, >type "cd internet-drafts" and then > "get draft-gibson-pnfs-problem-statement-00.txt". 
> >A list of Internet-Drafts directories can be found in >http://www.ietf.org/shadow.html >or ftp://ftp.ietf.org/ietf/1shadow-sites.txt > > >Internet-Drafts can also be obtained by e-mail. > >Send a message to: > mailserv@ietf.org. >In the body type: > "FILE /internet-drafts/draft-gibson-pnfs-problem-statement-00.txt". > >NOTE: The mail server at ietf.org can return the document in > MIME-encoded form by using the "mpack" utility. To use this > feature, insert the command "ENCODING mime" before the "FILE" > command. To decode the response(s), you will need "munpack" or > a MIME-compliant mail reader. Different MIME-compliant mail readers > exhibit different behavior, especially when dealing with > "multipart" MIME messages (i.e. documents which have been split > up into multiple messages), so check your local documentation on > how to manipulate these messages. > > >Below is the data which will enable a MIME compliant mail reader >implementation to automatically retrieve the ASCII version of the >Internet-Draft. 
> > > ---------- End of Forwarded Message ---------- From garth@panasas.com Wed Feb 11 16:11:10 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 92622 invoked from network); 12 Feb 2004 00:11:09 -0000 Received: from unknown (66.218.66.172) by m8.grp.scd.yahoo.com with QMQP; 12 Feb 2004 00:11:09 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 12 Feb 2004 00:11:08 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY3LX3; Wed, 11 Feb 2004 19:11:05 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Wed, 11 Feb 2004 16:10:54 -0800 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: concall tomorrow X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson We'll hold a concall tomorrow 11am EST. Agenda items can include: - we need to convert the problem statement into a presentation (our target was Feb 19) - we need to identify who is giving the presentation at Seoul, if our topic is given time - should we give it at Connectathon next week, and if so, who will give it (who is going?) 
- its time to get back to the original roles of the mailing lists: - a draft of a requirements doc - a draft of the operations we suggest for NFSv4 extension - a draft of the wire format of layout metadata for SBC (FC/SCSI) backends - a draft of the wire format of layout metadata for OSD backends - a draft of the wire format of layout metadata for NFS backends - get some milestones and dates beside the above garth From garth@panasas.com Wed Feb 11 16:16:57 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 57531 invoked from network); 12 Feb 2004 00:16:56 -0000 Received: from unknown (66.218.66.167) by m20.grp.scd.yahoo.com with QMQP; 12 Feb 2004 00:16:56 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 12 Feb 2004 00:16:55 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY3LY1; Wed, 11 Feb 2004 19:16:52 -0500 Mime-Version: 1.0 (Apple Message framework v612) To: pnfs-reqs@yahoogroups.com Message-Id: Content-Type: multipart/mixed; boundary=Apple-Mail-3--546444835 Date: Wed, 11 Feb 2004 16:16:42 -0800 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Reminder -- David's note on the presentation we need for IETF, and his example. Begin forwarded message: > From: black_david@emc.com > Date: December 23, 2003 12:32:59 PM PST > To: pnfs-reqs@yahoogroups.com > Subject: [pnfs-reqs] RE: NEPS-REQS: getting started > Reply-To: pnfs-reqs@yahoogroups.com > > Garth Gibson wrote: > >>> The RDDP problem statement is similar and dissimilar to what we are >>> doing. It is similar in that it is about higher performance, which >>> always turns out to be cost-performance. 
It is dissimilar in that it >>> was fighting an uphill battle to get RDMA into the IETF, while we are >>> looking at no preconceived support or opposition in the IETF (that I >>> am aware of). And it is dissimilar in that what we are proposing >>> helps in the manageability of federated systems, which is not really >>> a >>> performance issue. >>> >>> I followed the RDDP example closely because it was easy -- our >>> arguments on strictly bandwidth are at least as strong, in my >>> opinion. >>> And because I am not certain how to predict the IETF management's >>> reaction to a manageability argument. And the standardized client >>> code argument, although very import to some of us, seemed outside my >>> notion of the IETF scope. >>> >>> Perhaps those with more experience selling ideas to the IETF could >>> educate us? Should we focus on a small number of the most easily >>> demonstrated problems or fill the problem statement out with all the >>> problems we can contribute to solving? > > Having been heavily involved in getting both IPS and RDDP work underway > in the IETF, I have a few observations: > > - A problem statement draft is a good thing to have, but the folks in > charge of the IETF are looking for a concise summary of what the > problem is, how to go about solving it, and **why** the IETF should > solve it. The latter is of particular importance, as I'll > explain shortly. > - I've attached a slide deck that I used for RDDP at the Spring 2002 > IETF BOF on this topic. This sort of "elevator pitch" style > coverage of the topics is needed in addition to the more in-depth > academic approach that is in the RDDP problem statement. > - Goals and battles need to be chosen carefully. One of the things > that delayed RDDP work is that the RDDP proponents were > absolutely > convinced that they needed to change TCP, and hence decided to go > to battle with the IETF Transport community which was equally > convinced that TCP should not be changed. 
In 20/20 hindsight, > this was a mistake, as the IETF Transport community turned out > to be correct that TCP does not require normative changes for RDDP. > - Nonetheless, there is somewhat of an "uphill battle" to be engaged, > as > Beepy and/or Spencer described in Ann Arbor - the IETF has grown to > a potentially unwieldy size, and as a consequence has developed a > healthy institutional bias against new work. As a result, it is > necessary to have good reasons not only for why work should be > done, but also why it should be done in the IETF. The fact that > we want to extend an existing IETF protocol (NFSv4) in a way that > can take advantage of another (iSCSI) provides at least two reasons. > Beyond this, there is value in drawing on the IETF's network > expertise > in areas such as security. > - A draft WG statement/scope of work is very important at an early > stage, > including not only what we want to do, but what we do *not* want to > do. I tend to view the latter as more important, as a shared view > of what will not be worked on is a significant sign that a technical > community has coalesced around a common effort and goals. For > example, > there are fairly strong statements about work that is out of scope > in > both the IPS and RDDP charters, and as a WG chair, I've found those > statements useful from time to time ... > > I hope this helps, > --David > ---------------------------------------------------- > David L. Black, Senior Technologist > EMC Corporation, 176 South St., Hopkinton, MA 01748 > +1 (508) 293-7953 FAX: +1 (508) 293-7786 > black_david@emc.com Mobile: +1 (978) 394-7754 > ----------------------------------------------------
Attachment (not stored) ROI-Problem-Scenario-0302.ppt Type: application/vnd.ms-powerpoint From ggrider@lanl.gov Wed Feb 11 21:55:17 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 3577 invoked from network); 12 Feb 2004 05:55:16 -0000 Received: from unknown (66.218.66.172) by m13.grp.scd.yahoo.com with QMQP; 12 Feb 2004 05:55:16 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta4.grp.scd.yahoo.com with SMTP; 12 Feb 2004 05:55:15 -0000 Received: from mailrelay3.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1C5tEHR023258 for ; Wed, 11 Feb 2004 22:55:14 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay3.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1C5tEeI025221 for ; Wed, 11 Feb 2004 22:55:14 -0700 Received: from cthulu.lanl.gov (vpn-client-141.lanl.gov [128.165.253.141]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1C5t2Yi014519 for ; Wed, 11 Feb 2004 22:55:12 -0700 Message-Id: <5.2.0.9.2.20040211225331.015c28f0@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Wed, 11 Feb 2004 22:55:03 -0700 To: pnfs-reqs@yahoogroups.com In-Reply-To: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=====================_9859727==_" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25
From: Gary Grider Subject: Re: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs elevator pitch Gary Attachment (not stored) pNFS-elevator-pitch.ppt Type: application/octet-stream From pcorbett@netapp.com Thu Feb 12 07:22:44 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 66471 invoked from network); 12 Feb 2004 15:22:43 -0000 Received: from unknown (66.218.66.218) by m8.grp.scd.yahoo.com with QMQP; 12 Feb 2004 15:22:43 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 12 Feb 2004 15:22:43 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1CFMIJC026157 for ; Thu, 12 Feb 2004 07:22:18 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1CFMIiJ004972 for ; Thu, 12 Feb 2004 07:22:18 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Thu, 12 Feb 2004 07:22:13 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started Thread-Index: AcPxLMx2H5E+soocRACCb74eZ9kJkQATki0A To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: RE: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004 The slides mention server bypass twice, but don't talk about parallel access. Our problem statement is much more focussed on parallel access. 
We are trying to define a standard for parallel access, whether that is to nfs based data servers, object servers, virtualized SAN or non-virtualized SAN devices. So, I think we need to be clearer when talking about bypassing the server that we are really talking about direct access to a parallel data store from clustered clients, with a shared NFS server acting as a metadata server. To me, that is the key point that applies across the entire solution space, whereas direct access to devices is a data access technique in part of the solution space. -----Original Message----- From: Gary Grider [mailto:ggrider@lanl.gov] Sent: Thursday, February 12, 2004 12:55 AM To: pnfs-reqs@yahoogroups.com Subject: Re: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started elevator pitch Gary Yahoo! Groups Links From margaret.susairaj@oracle.com Thu Feb 12 07:33:55 2004 Return-Path: X-Sender: Margaret.Susairaj@oracle.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 94736 invoked from network); 12 Feb 2004 15:33:53 -0000 Received: from unknown (66.218.66.217) by m20.grp.scd.yahoo.com with QMQP; 12 Feb 2004 15:33:53 -0000 Received: from unknown (HELO agminet02.oracle.com) (141.146.126.229) by mta2.grp.scd.yahoo.com with SMTP; 12 Feb 2004 15:33:53 -0000 Received: from rgmgw4.us.oracle.com (rgmgw4.us.oracle.com [138.1.191.13]) by agminet02.oracle.com (Switch-3.1.2/Switch-3.1.0) with ESMTP id i1CFVEcq006415 for ; Thu, 12 Feb 2004 07:31:52 -0800 Received: from rgmgw4.us.oracle.com (localhost [127.0.0.1]) by rgmgw4.us.oracle.com (Switch-2.1.5/Switch-2.1.0) with ESMTP id i1CFVDb23530 for ; Thu, 12 Feb 2004 08:31:13 -0700 (MST) Received: from oracle.com (dhcp-amer-vpn-gw2-east-141-144-81-15.vpn.oracle.com [141.144.81.15]) by rgmgw4.us.oracle.com (Switch-2.1.5/Switch-2.1.0) with ESMTP id i1CFVDb23508 for ; Thu, 12 Feb 2004 08:31:13 -0700 (MST) Message-ID: <402B9E6C.FD2926D8@oracle.com> Date: Thu, 12 Feb 2004 07:40:28 -0800 Organization: Oracle Corporation X-Mailer: Mozilla 4.7 [en] 
(WinNT; I) X-Accept-Language: en MIME-Version: 1.0 To: pnfs-reqs@yahoogroups.com References: Content-Type: multipart/mixed; boundary="------------ABD85AA3E0FD053C29F098BA" X-Brightmail-Tracker: AAAAAQAAAAI= X-White-List-Member: TRUE X-eGroups-Remote-IP: 141.146.126.229 From: Margaret Susairaj Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=175561634 X-Yahoo-Profile: msusaira Garth, I joined this group recently. What is the number I can call to attend the concall? Regards, Margaret Garth Gibson wrote: > We'll hold a concall tomorrow 11am EST. > > Agenda items can include: > > - we need to convert the problem statement into a presentation (our > target was Feb 19) > > - we need to identify who is giving the presentation at Seoul, if our > topic is given time > > - should we give it at Connectathon next week, and if so, who will give > it (who is going?) > > - its time to get back to the original roles of the mailing lists: > - a draft of a requirements doc > - a draft of the operations we suggest for NFSv4 extension > - a draft of the wire format of layout metadata for SBC (FC/SCSI) > backends > - a draft of the wire format of layout metadata for OSD backends > - a draft of the wire format of layout metadata for NFS backends > > - get some milestones and dates beside the above > > garth From bhalevy@panasas.com Thu Feb 12 07:58:28 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 65082 invoked from network); 12 Feb 2004 15:58:24 -0000 Received: from unknown (66.218.66.217) by m16.grp.scd.yahoo.com with QMQP; 12 Feb 2004 15:58:24 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 12 Feb 2004 15:58:23 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Thu, 12 Feb 2004 10:58:07 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38863@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'"
Date: Thu, 12 Feb 2004 10:58:07 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3F181.01463780" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy 1-800-387-6159 conf. id #5370035 Benny -----Original Message----- From: Margaret Susairaj [mailto:margaret.susairaj@oracle.com] Sent: Thursday, February 12, 2004 10:40 AM To: pnfs-reqs@yahoogroups.com Subject: Re: [pnfs-reqs] concall tomorrow Garth, I joined this group recently. What is the number I can call to attend the concall? Regards, Margaret Garth Gibson wrote: > We'll hold a concall tomorrow 11am EST. > > Agenda items can include: > > - we need to convert the problem statement into a presentation (our > target was Feb 19) > > - we need to identify who is giving the presentation at Seoul, if our > topic is given time > > - should we give it at Connectathon next week, and if so, who will give > it (who is going?) 
> > - its time to get back to the original roles of the mailing lists: > - a draft of a requirements doc > - a draft of the operations we suggest for NFSv4 extension > - a draft of the wire format of layout metadata for SBC (FC/SCSI) > backends > - a draft of the wire format of layout metadata for OSD backends > - a draft of the wire format of layout metadata for NFS backends > > - get some milestones and dates beside the above > > garth From bhalevy@panasas.com Thu Feb 12 09:04:58 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@groups.yahoo.com Received: (qmail 50691 invoked from network); 12 Feb 2004 17:04:56 -0000 Received: from unknown (66.218.66.218) by m6.grp.scd.yahoo.com with QMQP; 12 Feb 2004 17:04:56 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 12 Feb 2004 17:04:56 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Thu, 12 Feb 2004 12:04:41 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38869@PIKES.panasas.com> To: "'pnfs-reqs@groups.yahoo.com'" Date: Thu, 12 Feb 2004 12:04:40 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: access to pnfs-* mail archives X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy Following up on our conference call from today, people asked what are all the pnfs-* lists and how to access their e-mail archives. The groups are pnfs-reqs where we discussed the problem statement and will get deeper into the requirements draft. pnfs-ops where we discuss the common extensions. and pnfs-nfs, pnfs-obj, and pnfs-sbc where we intend to discuss the specifics of each flavor such as layout format, addressing scheme, security details, etc. We have limited access to the groups' email archives for members only. 
If you are interested in the archive but not in getting each posting in (soft) real time for that list the membership configuration for "Group Messages Delivery" allows you to select either "Individual Emails" (default. you get it all), "Special Notices", "Daily Digest", or "No email". All the groups can be accessed via http://groups.yahoo.com/group/ Benny -- Benny Halevy Software Architect, Panasas Inc. Delivering the premier storage system for scalable Linux clusters http://www.panasas.com bhalevy@panasas.com tel: 412-323-6437 cell: 412-580-2520 From bwelch@panasas.com Thu Feb 12 09:20:33 2004 Return-Path: X-Sender: welch@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 65152 invoked from network); 12 Feb 2004 17:20:32 -0000 Received: from unknown (66.218.66.217) by m19.grp.scd.yahoo.com with QMQP; 12 Feb 2004 17:20:32 -0000 Received: from unknown (HELO medlicott.panasas.com) (63.80.58.202) by mta2.grp.scd.yahoo.com with SMTP; 12 Feb 2004 17:20:32 -0000 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id i1CHKVU02851 for ; Thu, 12 Feb 2004 09:20:31 -0800 Message-Id: <200402121720.i1CHKVU02851@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.6.3 04/02/2003 with nmh-1.0.4 To: pnfs-reqs@yahoogroups.com In-reply-to: References: Comments: In-reply-to Garth Gibson message dated "Wed, 11 Feb 2004 16:10:54 -0800." X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o From: Brent Welch Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169551413 X-Yahoo-Profile: brent_welch_1960 Here are my notes from the call today. 
Present were Brent Welch, Benny Halevy, Peter Corbett, Dave Noveck, Tom Talpey, David Black >>>Garth Gibson said: > - we need to convert the problem statement into a presentation (our > target was Feb 19) We will work on this via email today and tomorrow, as next week is full of travel and vacation days for several of us. Look for another email from me with a sketch talk outline (cannon fodder). > - we need to identify who is giving the presentation at Seoul, if our > topic is given time Tom will give the talk. BP has been informed we want the slot, which will probably be 10 minutes. There are 2 hours for NFSv4, so it seems there will be ample room. We haven't gotten confirmation yet. > - should we give it at Connectathon next week, and if so, who will give > it (who is going?) Tom will be there Monday and part of Tuesday, and will try to give an informal talk - the talk schedule is currently full except for overflow time on Wednesday. To prime the pump, Tom is going to send the problem statement (or a pointer) to the nfsv4 working group mailing list. We'll use the same material as the IETF talk (heavy overlap), but the focus of the two talks will be different. The IETF talk will be more about why the IETF should be interested; the Connectathon talk, about why this is technically cool. > - it's time to get back to the original roles of the mailing lists: > - a draft of a requirements doc > - a draft of the operations we suggest for NFSv4 extension > - a draft of the wire format of layout metadata for SBC (FC/SCSI) > backends > - a draft of the wire format of layout metadata for OSD backends > - a draft of the wire format of layout metadata for NFS backends > > - get some milestones and dates beside the above We seemed mostly interested in focusing on the requirements doc next, although we should at least have a sketch in place for the ops and the metadata formats for the block/object/file mechanisms. There was also brief discussion about the mailing lists.
All the traffic is on this the pnfs-reqs list right now, with some older traffic on the pnfs-ops group. Eventually we expect most stuff to transition to the nfsv4 list, but not until we get a more official charter. -- Brent Welch Software Architect, Panasas Inc Delivering the premier storage system for scalable Linux clusters www.panasas.com welch@panasas.com From ggrider@lanl.gov Thu Feb 12 15:31:38 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 16743 invoked from network); 12 Feb 2004 23:31:38 -0000 Received: from unknown (66.218.66.218) by m12.grp.scd.yahoo.com with QMQP; 12 Feb 2004 23:31:38 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta3.grp.scd.yahoo.com with SMTP; 12 Feb 2004 23:31:37 -0000 Received: from mailrelay3.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1CNVaHR032738 for ; Thu, 12 Feb 2004 16:31:36 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay3.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1CNVaeI021291 for ; Thu, 12 Feb 2004 16:31:36 -0700 Received: from cthulu.lanl.gov (vpn-client-224.lanl.gov [128.165.253.224]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1CNVYYi004263 for ; Thu, 12 Feb 2004 16:31:35 -0700 Message-Id: <5.2.0.9.2.20040212162218.0160a6c8@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 12 Feb 2004 16:31:34 -0700 To: pnfs-reqs@yahoogroups.com In-Reply-To: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_26921190==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: > We'll hold a concall tomorrow 11am EST. 
> > Agenda items can include: > > - we need to convert the problem statement into a presentation (our > target was Feb 19) > > - we need to identify who is giving the presentation at Seoul, if our > topic is given time > > - should we give it at Connectathon next week, and if so, who will give > it (who is going?) > > - it's time to get back to the original roles of the mailing lists: > - a draft of a requirements doc > - a draft of the operations we suggest for NFSv4 extension > - a draft of the wire format of layout metadata for SBC (FC/SCSI) > backends > - a draft of the wire format of layout metadata for OSD backends > - a draft of the wire format of layout metadata for NFS backends Ok, I have a big problem with the last three statements. I know I have not been following this stuff as closely as I should have, but why can't the maps be sent from the server in an agnostic way? Why does the NFS server have to understand any of what is in the map? Why isn't there a plug-in on the server side to get the map info for the server to send to the client, and why isn't there a plug-in on the client side to pass the map down to? What if the world decides to add another way to do I/O besides Block, Object, and NFS? What happens if the Object model evolves? Does NFS have to change to stay in sync with it? Shouldn't all this stuff be as opaque as it can be? You do need locks for making sure the map doesn't change. I am confused why we gave up on trying to make this agnostic. Sorry for the question, but I have this built-in alarm that goes off when I see any numbers besides 0, 1, and N (not 3). Thanks Gary > - get some milestones and dates beside the above > > garth
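A minimal sketch of the plug-in arrangement asked about in the message above, assuming the map travels as an opaque blob with a small type tag. Every name here (OpaqueMap, register_plugin, server_forward, client_receive), the flavor strings, and the version field are hypothetical illustrations, not taken from any pNFS draft or implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class OpaqueMap:
    """A layout map as the NFS server would see it: a tag plus a blob."""
    flavor: str    # hypothetical type tag, e.g. "sbc", "osd", "nfs"
    version: int   # hypothetical format version for that flavor
    body: bytes    # flavor-specific contents; opaque to the server


# Hypothetical client-side registry: flavor tag -> plug-in that decodes the blob.
PLUGINS: Dict[str, Callable[[OpaqueMap], str]] = {}


def register_plugin(flavor: str, handler: Callable[[OpaqueMap], str]) -> None:
    """Install the client-side plug-in for one back-end flavor."""
    PLUGINS[flavor] = handler


def server_forward(layout: OpaqueMap) -> OpaqueMap:
    """The server relays the map unchanged and never parses `body`."""
    return layout


def client_receive(layout: OpaqueMap) -> str:
    """The client hands the blob to whichever plug-in claims its flavor."""
    handler = PLUGINS.get(layout.flavor)
    if handler is None:
        raise LookupError(f"no plug-in registered for flavor {layout.flavor!r}")
    return handler(layout)


# A new back end only needs a new plug-in; the relay code above is untouched.
register_plugin("osd", lambda m: f"osd v{m.version}: {len(m.body)} bytes")
print(client_receive(server_forward(OpaqueMap("osd", 1, b"\x00\x01\x02"))))
```

In this sketch, interoperability between independent clients would still depend on the contents of body being documented somewhere for each flavor; the sketch only shows that the relay path itself need not know them.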
From garth@panasas.com Thu Feb 12 16:17:25 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 12811 invoked from network); 13 Feb 2004 00:17:18 -0000 Received: from unknown (66.218.66.172) by m13.grp.scd.yahoo.com with QMQP; 13 Feb 2004 00:17:18 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 13 Feb 2004 00:17:18 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY3RG1; Thu, 12 Feb 2004 19:17:16 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <5.2.0.9.2.20040212162218.0160a6c8@cic-mail.lanl.gov> References: <5.2.0.9.2.20040212162218.0160a6c8@cic-mail.lanl.gov> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Message-Id: Content-Transfer-Encoding: quoted-printable Date: Thu, 12 Feb 2004 16:17:08 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: > At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: >> - its time to get back to the original roles of the mailing lists: >> - a draft of a requirements doc >> - a draft of the operations we suggest for NFSv4 extension >> - a draft of the wire format of layout metadata for SBC >> (FC/SCSI) backends >> - a draft of the wire format of layout metadata for OSD >> backends >> - a draft of the wire format of layout metadata for NFS >> backends > > Ok, I have a big problem with the last three statements. I know I > have not been following > this stuff as closely as I should have, but why cant the maps be sent > to from the server > in an agnostic way? Why does the NFS server have to understand any > of what is in the > map? 
Why isn't there a plug-in on the server side to get the map info > for the server to > send to the client, and why isn't there a plug-in on the client side > to pass the map down to? The NFS server does not have to understand what is in the map, other than enough to know what file the layout delegation and map pertain to for recalling that delegation. However, for any chance of interoperability, the map formats must be documented. And since each backend server flavor has addressing characteristics that will be visible in the map, the documented map formats will be specific to the backend flavors. The theory we started with was that the base NFSv4 extensions would describe opaque "maps". And that we would propose separate internet drafts for a wire format for each flavor of map. Hence the NFSv4 extensions are backend protocol agnostic, yet the client implementations can be interoperable. > What if the world decides to add another way to do I/O besides Block, > Object, and NFS? > What happens if the Object model evolves? Does NFS have to change to > stay in > sync with it? Shouldn't all this stuff be as opaque as it can be? > You do need locks > for making sure the map doesn't change. A new backend protocol would cause a new map flavor, and a new document for that map flavor's wire format. The challenge for us is to make the typing and sizing of the opaque map flexible enough to allow said extension. The fallback, if the new backend is very much different from the scope of SBC, OSD and NFS, would be to further extend NFSv4. I hope we do not have to do that. > I am confused why we gave up on trying to make this agnostic. > Sorry for the question, but I have this built-in alarm that goes off > when I see > any numbers besides 0, 1, and N (not 3).
This is intended to be a structure for 1, 2 and 3, with an inductive step enabling any positive whole number to be induced :-) > Thanks > Gary garth From ggrider@lanl.gov Thu Feb 12 16:44:30 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 3162 invoked from network); 13 Feb 2004 00:44:27 -0000 Received: from unknown (66.218.66.167) by m5.grp.scd.yahoo.com with QMQP; 13 Feb 2004 00:44:27 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta6.grp.scd.yahoo.com with SMTP; 13 Feb 2004 00:44:26 -0000 Received: from mailrelay2.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D0iQHR004985 for ; Thu, 12 Feb 2004 17:44:26 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay2.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D0iPq1026837 for ; Thu, 12 Feb 2004 17:44:25 -0700 Received: from cthulu.lanl.gov (vpn-client-224.lanl.gov [128.165.253.224]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D0iNYi010035 for ; Thu, 12 Feb 2004 17:44:24 -0700 Message-Id: <5.2.0.9.2.20040212173055.015c5670@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 12 Feb 2004 17:44:21 -0700 To: pnfs-reqs@yahoogroups.com In-Reply-To: References: <5.2.0.9.2.20040212162218.0160a6c8@cic-mail.lanl.gov> <5.2.0.9.2.20040212162218.0160a6c8@cic-mail.lanl.gov> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="=====================_31290192==.ALT" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs At 04:17 PM 2/12/2004 -0800, Garth Gibson wrote: > On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: > > At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: > >> - its time to get back to the original roles of 
the mailing lists: > >> - a draft of a requirements doc > >> - a draft of the operations we suggest for NFSv4 extension > >> - a draft of the wire format of layout metadata for SBC > >> (FC/SCSI) backends > >> - a draft of the wire format of layout metadata for OSD > >> backends > >> - a draft of the wire format of layout metadata for NFS > >> backends > > > > Ok, I have a big problem with the last three statements. I know I > > have not been following > > this stuff as closely as I should have, but why cant the maps be sent > > to from the server > > in an agnostic way? Why does the NFS server have to understand any > > of what is in the > > map? Why isnt there a plug in on the server side to get the map info > > for the server to > > send to the client, and why isnt there a plug in on the client side > > to pass the map down to? > > The NFS server does not have to understand what is in the map, other > than enough to know what file the layout delegation and map pertain to > for recalling that delegation. > > However, for any chance of interoperability, the map formats must be > documented. agree, but does IETF care about this, its just information that NFS can pass on, seems like if a consortia of T10 folks want a common set of plugins, thats great, if SBC folks want a common plug in, thats great, but why should NFS/IETF care about this? > And since each backend server flavor has addressing > characteristics that will be visible in the map, the documented map > formats will be specific to the backend flavors. > > The theory we started with was that the base NFSv4 extensions would > describe opaque "maps". And that we would propose separate internet > drafts for a wire format for each flavor of map. Why do we need a different wire format? Isnt it just a blob of data that is passed through some normal NFS protocol mechanism? Why does it need a wire format? I agree it needs a published format of the stream of data. Am I reading more into "wire" than I should. 
If we do a separate IETF process for each new format, wont it be hard to keep up. I agree we need to have a "type" of back end and maybe a version or something, but what is in the map could be of no concern to the IETF, couldnt it? > Hence the NFSv4 > extensions are backend protocol agnostic, yet the client > implementations can be interoperable. > > > What if the world decides to add another way to do I/O besides Block, > > Object, and NFS? > > What happens if the Object model evolves, does NFS have to change to > > stay in > > sync with it? Shouldnt all this stuff be as opaque as it can be? > > You do need locks > > for making sure the map doesnt change. > > A new backend protocol would cause a new map flavor, and a new document > for that map flavor's wire format. Do we want to get an IETF action every time we need to add a new flavor? How easy is this process going to be? How easy is it going to be to change. Sounds like it will be very much harder than just changing out your plugins. Are we linking several different communities to work in lock step? Is that good or bad? Thanks Gary > The challenge for us is to make the typing and sizing of the opaque map > flexible enough to allow said extension. The fall back, if the new > backend is very much different from the scope of SBC, OSD and NFS, > would be do further extend NFSv4. I hope we do not have to do that. > > > I am confused why we gave up on trying to make this agnostic. > > Sorry for the question, but I have this built in alarm, that goes off > > when I see > > any numbers besides 0,1, and N (not 3). > > This is intended to be a structure for 1, 2 and 3, with an inductive > step enabling any positive whole number to be induced :-) > > > Thanks > > Gary > > garth > > Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * > * Your use of Yahoo! 
Groups is subject to the Yahoo! Terms of Service.

From bhalevy@panasas.com Thu Feb 12 16:52:15 2004
Return-Path:
X-Sender: bhalevy@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 46343 invoked from network); 13 Feb 2004 00:52:15 -0000
Received: from unknown (66.218.66.216) by m14.grp.scd.yahoo.com with QMQP; 13 Feb 2004 00:52:15 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 13 Feb 2004 00:52:14 -0000
Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Thu, 12 Feb 2004 19:52:13 -0500
Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38874@PIKES.panasas.com>
To: "'pnfs-reqs@yahoogroups.com'"
Date: Thu, 12 Feb 2004 19:52:12 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-eGroups-Remote-IP: 65.194.124.178
From: "Halevy, Benny"
Subject: RE: [pnfs-reqs] concall tomorrow
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

I agree with everything Garth says below.

Just as an example of how an extensible mechanism was spec'ed before in the NFS community, you can take the RPC protocol security flavors. RPC (rfc1831) specifies how RPC security flavors are encoded in the RPC protocol. It specifies the basic flavors and the wire format of the opaque_auth "frame", which contains a type (the auth_flavor) and an opaque body. It says:

    enum auth_flavor {
        AUTH_NONE  = 0,
        AUTH_SYS   = 1,
        AUTH_SHORT = 2
        /* and more to be defined */
    };

    struct opaque_auth {
        auth_flavor flavor;
        opaque body<400>;
    };

...

The interpretation and semantics of the data contained within the authentication fields is specified by individual, independent authentication protocol specifications. (Section 9 defines the various authentication protocols.)

...

9.
AUTHENTICATION PROTOCOLS

As previously stated, authentication parameters are opaque, but open-ended to the rest of the RPC protocol. This section defines two standard "flavors" of authentication. Implementors are free to invent new authentication types, with the same rules of flavor number assignment as there is for program number assignment. The "flavor" of a credential or verifier refers to the value of the "flavor" field in the opaque_auth structure. Flavor numbers, like RPC program numbers, are also administered centrally, and developers may assign new flavor numbers by applying through electronic mail to "rpc@sun.com". Credentials and verifiers are represented as variable length opaque data (the "body" field in the opaque_auth structure).

In this document, two flavors of authentication are described. Of these, Null authentication (described in the next subsection) is mandatory - it must be available in all implementations. System authentication is described in Appendix A.

It then defines the NULL auth flavor and, later in Appendix A, the SYS auth flavor. Later, more auth flavors were added, and more recently NFSv4 mandated a new auth flavor, RPCSEC_GSS, which is built on GSS-API and is defined in separate RFCs (rfc2203 for RPCSEC_GSS, rfc2743 for GSS-API) that are referred to by the NFSv4 RFC.

The transport protocols, RPC and NFS, do not define the wire format of all security flavors but provide enough metadata in the spec for clients and servers to interoperate.
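To make the RPC precedent concrete, here is a minimal sketch of how an opaque_auth value could be framed on the wire. It assumes standard XDR rules (a 4-byte big-endian enum, then a 4-byte length, then the body padded with zeros to a multiple of 4 bytes). The function names encode_opaque_auth and decode_opaque_auth are made up for illustration and are not taken from any RPC library. The point it shows is that a receiver can step over the body of a flavor it has never heard of, because the framing alone says how long the body is.

```python
import struct

# Flavor numbers quoted from rfc1831 above.
AUTH_NONE, AUTH_SYS, AUTH_SHORT = 0, 1, 2
MAX_BODY = 400  # from "opaque body<400>"


def encode_opaque_auth(flavor: int, body: bytes = b"") -> bytes:
    """Frame (flavor, body) as XDR: enum, length, body, zero padding."""
    if len(body) > MAX_BODY:
        raise ValueError("opaque_auth body is limited to 400 bytes")
    pad = (-len(body)) % 4  # XDR pads opaque data to a 4-byte boundary
    return struct.pack(">iI", flavor, len(body)) + body + b"\x00" * pad


def decode_opaque_auth(buf: bytes, offset: int = 0):
    """Return (flavor, body, next_offset) without interpreting the body."""
    flavor, length = struct.unpack_from(">iI", buf, offset)
    if length > MAX_BODY:
        raise ValueError("opaque_auth body is limited to 400 bytes")
    start = offset + 8
    end = start + length
    if end > len(buf):
        raise ValueError("truncated opaque_auth")
    next_offset = end + ((-length) % 4)  # skip the padding as well
    return flavor, buf[start:end], next_offset


if __name__ == "__main__":
    # 99 is a hypothetical flavor number this decoder knows nothing about.
    wire = encode_opaque_auth(99, b"abcde")
    print(decode_opaque_auth(wire))
```

The decoder never looks inside the body. Interpreting it is left to whichever specification owns that flavor number, which is the same split being proposed for layout maps.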
Benny >-----Original Message----- >From: Garth Gibson [mailto:garth@Panasas.Com] >Sent: Thursday, February 12, 2004 7:17 PM >To: pnfs-reqs@yahoogroups.com >Subject: Re: [pnfs-reqs] concall tomorrow > > >On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: >> At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: >>> - its time to get back to the original roles of the mailing lists: >>> - a draft of a requirements doc >>> - a draft of the operations we suggest for NFSv4 extension >>> - a draft of the wire format of layout metadata for SBC >>> (FC/SCSI) backends >>> - a draft of the wire format of layout metadata for OSD >>> backends >>> - a draft of the wire format of layout metadata for NFS >>> backends >> >> Ok, I have a big problem with the last three statements. I know I >> have not been following >> this stuff as closely as I should have, but why cant the >maps be sent >> to from the server >> in an agnostic way? Why does the NFS server have to understand any >> of what is in the >> map? Why isnt there a plug in on the server side to get >the map info >> for the server to >> send to the client, and why isnt there a plug in on the client side >> to pass the map down to? > >The NFS server does not have to understand what is in the map, other >than enough to know what file the layout delegation and map pertain to >for recalling that delegation. > >However, for any chance of interoperability, the map formats must be >documented. And since each backend server flavor has addressing >characteristics that will be visible in the map, the documented map >formats will be specific to the backend flavors. > >The theory we started with was that the base NFSv4 extensions would >describe opaque "maps". And that we would propose separate internet >drafts for a wire format for each flavor of map. Hence the NFSv4 >extensions are backend protocol agnostic, yet the client >implementations can be interoperable. 
> >> What if the world decides to add another way to do I/O >besides Block, >> Object, and NFS? >> What happens if the Object model evolves, does NFS have to >change to >> stay in >> sync with it? Shouldnt all this stuff be as opaque as it can be? >> You do need locks >> for making sure the map doesnt change. > >A new backend protocol would cause a new map flavor, and a new >document >for that map flavor's wire format. > >The challenge for us is to make the typing and sizing of the >opaque map >flexible enough to allow said extension. The fall back, if the new >backend is very much different from the scope of SBC, OSD and NFS, >would be do further extend NFSv4. I hope we do not have to do that. > >> I am confused why we gave up on trying to make this agnostic. >> Sorry for the question, but I have this built in alarm, >that goes off >> when I see >> any numbers besides 0,1, and N (not 3). > >This is intended to be a structure for 1, 2 and 3, with an inductive >step enabling any positive whole number to be induced :-) > >> Thanks >> Gary > >garth > > > >Yahoo! 
Groups Links
>
>
>
>
>

From bhalevy@panasas.com Thu Feb 12 17:03:27 2004
Return-Path:
X-Sender: bhalevy@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 61264 invoked from network); 13 Feb 2004 01:03:25 -0000
Received: from unknown (66.218.66.167) by m3.grp.scd.yahoo.com with QMQP; 13 Feb 2004 01:03:25 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 13 Feb 2004 01:03:25 -0000
Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Thu, 12 Feb 2004 20:03:24 -0500
Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38875@PIKES.panasas.com>
To: "'pnfs-reqs@yahoogroups.com'"
Date: Thu, 12 Feb 2004 20:03:23 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3F1CD.2DA4CA70"
X-eGroups-Remote-IP: 65.194.124.178
From: "Halevy, Benny"
Subject: RE: [pnfs-reqs] concall tomorrow
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

> If we do a separate IETF process for each new format, wont it
> be hard to keep up.

I don't see that as necessarily a bad thing...

> I agree we need to have a "type" of back end and maybe a version
> or something, but what is in the map could be of no concern to the IETF, couldnt it?

The IETF is concerned about interoperability. I believe we should be concerned with standardizing the wire format of the layout maps. Do we *have* to do that within the IETF? There could be other options, but since you have to refer to some external standard to talk about interoperability, it'll be difficult to refer to non-existing standards that are being standardized outside of the IETF.

T10-OSD is one such external standard that is substantial enough that it can be referred to.

> Do we want to get an IETF action every time we need to add a new flavor?

Probably, in order to extend the vector of available flavors.
This could be done within NFSv4's minor versioning model. > How easy is this process going to be? How easy is it going to be to change. > Sounds like it will be very much harder than just changing out your plugins. Again, there's a trade-off between versatility and interoperability. > Are we linking several different communities to work in lock > step? Is that good or bad? The hierarchical standard model is intended to allow each flavor to make progress at its own pace. Benny -----Original Message----- From: Gary Grider [mailto:ggrider@lanl.gov] Sent: Thursday, February 12, 2004 7:44 PM To: pnfs-reqs@yahoogroups.com Subject: Re: [pnfs-reqs] concall tomorrow At 04:17 PM 2/12/2004 -0800, Garth Gibson wrote: > On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: > > At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: > >> - its time to get back to the original roles of the mailing lists: > >> - a draft of a requirements doc > >> - a draft of the operations we suggest for NFSv4 extension > >> - a draft of the wire format of layout metadata for SBC > >> (FC/SCSI) backends > >> - a draft of the wire format of layout metadata for OSD > >> backends > >> - a draft of the wire format of layout metadata for NFS > >> backends > > > > Ok, I have a big problem with the last three statements. I know I > > have not been following > > this stuff as closely as I should have, but why cant the maps be sent > > to from the server > > in an agnostic way? Why does the NFS server have to understand any > > of what is in the > > map? Why isnt there a plug in on the server side to get the map info > > for the server to > > send to the client, and why isnt there a plug in on the client side > > to pass the map down to? > > The NFS server does not have to understand what is in the map, other > than enough to know what file the layout delegation and map pertain to > for recalling that delegation. > > However, for any chance of interoperability, the map formats must be > documented. 
agree, but does IETF care about this, its just information that NFS can pass on, seems like if a consortia of T10 folks want a common set of plugins, thats great, if SBC folks want a common plug in, thats great, but why should NFS/IETF care about this? > And since each backend server flavor has addressing > characteristics that will be visible in the map, the documented map > formats will be specific to the backend flavors. > > The theory we started with was that the base NFSv4 extensions would > describe opaque "maps". And that we would propose separate internet > drafts for a wire format for each flavor of map. Why do we need a different wire format? Isnt it just a blob of data that is passed through some normal NFS protocol mechanism? Why does it need a wire format? I agree it needs a published format of the stream of data. Am I reading more into "wire" than I should. If we do a separate IETF process for each new format, wont it be hard to keep up. I agree we need to have a "type" of back end and maybe a version or something, but what is in the map could be of no concern to the IETF, couldnt it? > Hence the NFSv4 > extensions are backend protocol agnostic, yet the client > implementations can be interoperable. > > > What if the world decides to add another way to do I/O besides Block, > > Object, and NFS? > > What happens if the Object model evolves, does NFS have to change to > > stay in > > sync with it? Shouldnt all this stuff be as opaque as it can be? > > You do need locks > > for making sure the map doesnt change. > > A new backend protocol would cause a new map flavor, and a new document > for that map flavor's wire format. Do we want to get an IETF action every time we need to add a new flavor? How easy is this process going to be? How easy is it going to be to change. Sounds like it will be very much harder than just changing out your plugins. Are we linking several different communities to work in lock step? Is that good or bad? 
Thanks Gary > The challenge for us is to make the typing and sizing of the opaque map > flexible enough to allow said extension. The fall back, if the new > backend is very much different from the scope of SBC, OSD and NFS, > would be do further extend NFSv4. I hope we do not have to do that. > > > I am confused why we gave up on trying to make this agnostic. > > Sorry for the question, but I have this built in alarm, that goes off > > when I see > > any numbers besides 0,1, and N (not 3). > > This is intended to be a structure for 1, 2 and 3, with an inductive > step enabling any positive whole number to be induced :-) > > > Thanks > > Gary > > garth > > Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. From ggrider@lanl.gov Thu Feb 12 17:19:21 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 24129 invoked from network); 13 Feb 2004 01:19:20 -0000 Received: from unknown (66.218.66.166) by m7.grp.scd.yahoo.com with QMQP; 13 Feb 2004 01:19:20 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta5.grp.scd.yahoo.com with SMTP; 13 Feb 2004 01:19:19 -0000 Received: from mailrelay3.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D1JJHR007154 for ; Thu, 12 Feb 2004 18:19:19 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay3.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D1JIeI007886 for ; Thu, 12 Feb 2004 18:19:18 -0700 Received: from cthulu.lanl.gov (vpn-client-224.lanl.gov [128.165.253.224]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D1JFYi012624 for ; Thu, 12 Feb 2004 18:19:16 -0700 Message-Id: 
<5.2.0.9.2.20040212181154.01621d60@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 12 Feb 2004 18:19:15 -0700 To: pnfs-reqs@yahoogroups.com In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D38875@PIKES.panasas.com > Mime-Version: 1.0 Content-Type: multipart/related; type="multipart/alternative"; boundary="=====================_33382571==.REL" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: RE: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs So I think having a header that has the info that is needed for interop is fine and having a payload that is opaque is fine. So explain why we need three different "wire formats" for these three different methods? Why arent these things just 3 different types within the same wire format? Or is that what is envisioned, and I am just hung up on the terminology, and what you are saying is that we need to 1) define the header for interop type, version, length perhaps etc. and 2) define a) opaque info for SBC type = 1 version = 1 b) opaque info for Object type = 2 version = 1 c) opaque info for NFS back ends type = 3 version = 1 and if someone wants to write an IETF request to have type = 4 added they can write a draft and get it approved and document the opaque info format? Thanks Gary At 08:03 PM 2/12/2004 -0500, Halevy, Benny wrote: > > If we do a separate IETF process for each new format, wont it > > be hard to keep up. > > I don't see that as necessarily a bad thing... > > > I agree we need to have a "type" of back end and maybe a version > > or something, but what is in the map could be of no concern to the IETF, couldnt it? > The IETF is concerned about interoperability. I believe we should be concerned with standardizing > the wire format of the layout maps. Do we *have* to do that within the IETF? 
There > could be other options but since you have to refer to some external standard to talk about > interoperability it'll be difficult to refer to non-existing standards that are being standardized > outside of the IETF. > > T10-OSD is one such external standard that substantial enough so it can be referred to. > > > Do we want to get an IETF action every time we need to add a new flavor? > > Probably, in order to extend the vector of available flavors. This could be done within > NFSv4's minor versioning model. > > > How easy is this process going to be? How easy is it going to be to change. > > Sounds like it will be very much harder than just changing out your plugins. > > Again, there's a trade-off between versatility and interoperability. > > > Are we linking several different communities to work in lock > > step? Is that good or bad? > > The hierarchical standard model is intended to allow each flavor to make progress at its own pace. > > Benny > > -----Original Message----- > From: Gary Grider [mailto:ggrider@lanl.gov] > Sent: Thursday, February 12, 2004 7:44 PM > To: pnfs-reqs@yahoogroups.com > Subject: Re: [pnfs-reqs] concall tomorrow > > At 04:17 PM 2/12/2004 -0800, Garth Gibson wrote: > >> On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: >> > At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: >> >> - its time to get back to the original roles of the mailing lists: >> >> - a draft of a requirements doc >> >> - a draft of the operations we suggest for NFSv4 extension >> >> - a draft of the wire format of layout metadata for SBC >> >> (FC/SCSI) backends >> >> - a draft of the wire format of layout metadata for OSD >> >> backends >> >> - a draft of the wire format of layout metadata for NFS >> >> backends >> > >> > Ok, I have a big problem with the last three statements. I know I >> > have not been following >> > this stuff as closely as I should have, but why cant the maps be sent >> > to from the server >> > in an agnostic way? 
Why does the NFS server have to understand any >> > of what is in the >> > map? Why isnt there a plug in on the server side to get the map info >> > for the server to >> > send to the client, and why isnt there a plug in on the client side >> > to pass the map down to? >> >> The NFS server does not have to understand what is in the map, other >> than enough to know what file the layout delegation and map pertain to >> for recalling that delegation. >> >> However, for any chance of interoperability, the map formats must be >> documented. > > > agree, but does IETF care about this, its just information that NFS can pass on, > seems like if a consortia of T10 folks want a common set of plugins, thats great, > if SBC folks want a common plug in, thats great, but why should NFS/IETF care > about this? > >> And since each backend server flavor has addressing >> characteristics that will be visible in the map, the documented map >> formats will be specific to the backend flavors. >> >> The theory we started with was that the base NFSv4 extensions would >> describe opaque "maps". And that we would propose separate internet >> drafts for a wire format for each flavor of map. > > > Why do we need a different wire format? Isnt it just a blob of data that is passed > through some normal NFS protocol mechanism? Why does it need a wire format? > I agree it needs a published format of the stream of data. Am I reading more into > "wire" than I should. If we do a separate IETF process for each new format, wont it > be hard to keep up. I agree we need to have a "type" of back end and maybe a version > or something, but what is in the map could be of no concern to the IETF, couldnt it? > >> Hence the NFSv4 >> extensions are backend protocol agnostic, yet the client >> implementations can be interoperable. >> >> > What if the world decides to add another way to do I/O besides Block, >> > Object, and NFS? 
>> > What happens if the Object model evolves, does NFS have to change to >> > stay in >> > sync with it? Shouldnt all this stuff be as opaque as it can be? >> > You do need locks >> > for making sure the map doesnt change. >> >> A new backend protocol would cause a new map flavor, and a new document >> for that map flavor's wire format. > > > Do we want to get an IETF action every time we need to add a new flavor? > How easy is this process going to be? How easy is it going to be to change. > Sounds like it will be very much harder than just changing out your plugins. > Are we linking several different communities to work in lock > step? Is that good or bad? > > > Thanks > Gary > > >> The challenge for us is to make the typing and sizing of the opaque map >> flexible enough to allow said extension. The fall back, if the new >> backend is very much different from the scope of SBC, OSD and NFS, >> would be do further extend NFSv4. I hope we do not have to do that. >> >> > I am confused why we gave up on trying to make this agnostic. >> > Sorry for the question, but I have this built in alarm, that goes off >> > when I see >> > any numbers besides 0,1, and N (not 3). >> >> This is intended to be a structure for 1, 2 and 3, with an inductive >> step enabling any positive whole number to be induced :-) >> >> > Thanks >> > Gary >> >> garth >> >> Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. Yahoo! Groups Links * To visit your group on the web, go to: * http://groups.yahoo.com/group/pnfs-reqs/ * * To unsubscribe from this group, send an email to: * pnfs-reqs-unsubscribe@yahoogroups.com * * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
From Thomas.Talpey@netapp.com Thu Feb 12 17:22:21 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 8672 invoked from network); 13 Feb 2004 01:22:21 -0000 Received: from unknown (66.218.66.218) by m11.grp.scd.yahoo.com with QMQP; 13 Feb 2004 01:22:21 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 13 Feb 2004 01:22:21 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1D1MKJC029608 for ; Thu, 12 Feb 2004 17:22:20 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1D1MKiH015003 for ; Thu, 12 Feb 2004 17:22:20 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.30]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Thu, 12 Feb 2004 20:22:14 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3F1CF.CF99E700" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Thu, 12 Feb 2004 17:22:04 -0800 Message-ID: <5.2.1.1.2.20040212202018.01e2aea0@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] concall tomorrow Thread-Index: AcPxz9A7m6fy2BsXR2KPKZkslTsDvQ== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu At 07:17 PM 2/12/2004, Garth Gibson wrote: >The NFS server does not have to understand what is in the map, other >than enough to know what file the layout delegation and map pertain to >for recalling that delegation. What map? Did we go from problem statement to protocol definition overnight? Confused, Tom. 
From ggrider@lanl.gov Thu Feb 12 20:44:07 2004 Return-Path: X-Sender: ggrider@lanl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 30018 invoked from network); 13 Feb 2004 04:44:06 -0000 Received: from unknown (66.218.66.217) by m18.grp.scd.yahoo.com with QMQP; 13 Feb 2004 04:44:06 -0000 Received: from unknown (HELO mailwasher-b.lanl.gov) (192.16.0.25) by mta2.grp.scd.yahoo.com with SMTP; 13 Feb 2004 04:44:05 -0000 Received: from mailrelay1.lanl.gov (localhost.localdomain [127.0.0.1]) by mailwasher-b.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D4i5HR020583 for ; Thu, 12 Feb 2004 21:44:05 -0700 Received: from cic-mail.lanl.gov (localhost.localdomain [127.0.0.1]) by mailrelay1.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D4i4rl018790 for ; Thu, 12 Feb 2004 21:44:04 -0700 Received: from cthulu.lanl.gov (vpn-client-160.lanl.gov [128.165.253.160]) by cic-mail.lanl.gov (8.12.10/8.12.10/(ccn-5)) with ESMTP id i1D4hvYk027660; Thu, 12 Feb 2004 21:44:00 -0700 Message-Id: <5.2.0.9.2.20040212213826.038a2208@cic-mail.lanl.gov> X-Sender: ggrider@cic-mail.lanl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 12 Feb 2004 21:41:14 -0700 To: pnfs-reqs@yahoogroups.com, In-Reply-To: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=====================_45664431==_" X-Scanned-By: MIMEDefang 2.35 X-eGroups-Remote-IP: 192.16.0.25 From: Gary Grider Subject: RE: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169341185 X-Yahoo-Profile: ggriderpnfs I agree. I added some parallel stuff. Thanks Gary At 07:22 AM 2/12/2004 -0800, Corbett, Peter wrote: > The slides mention server bypass twice, but don't talk about parallel > access. Our problem statement is much more focussed on parallel access. > We are trying to define a standard for parallel access, whether that is > to nfs based data servers, object servers, virtualized SAN or > non-virtualized SAN devices. 
So, I think we need to be clearer when > talking about bypassing the server that we are really talking about > direct access to a parallel data store from clustered clients, with a > shared NFS server acting as a metadata server. To me, that is the key > point that applies across the entire solution space, whereas direct > access to devices is a data access technique in part of the solution > space. > > -----Original Message----- > From: Gary Grider [mailto:ggrider@lanl.gov] > Sent: Thursday, February 12, 2004 12:55 AM > To: pnfs-reqs@yahoogroups.com > Subject: Re: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started > > > elevator pitch > > Gary > > > Yahoo! Groups Links > > > > > > Yahoo! Groups Links > > * To visit your group on the web, go to: > * http://groups.yahoo.com/group/pnfs-reqs/ > * > * To unsubscribe from this group, send an email to: > * pnfs-reqs-unsubscribe@yahoogroups.com > * > * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. Attachment (not stored) pNFS-elevator-pitch.ppt Type: application/octet-stream From bhalevy@panasas.com Thu Feb 12 21:41:40 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 48383 invoked from network); 13 Feb 2004 05:41:39 -0000 Received: from unknown (66.218.66.218) by m14.grp.scd.yahoo.com with QMQP; 13 Feb 2004 05:41:38 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 13 Feb 2004 05:41:38 -0000 Received: from yang ([172.17.19.58]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SVSY3R0N; Fri, 13 Feb 2004 00:41:35 -0500 To: Date: Fri, 13 Feb 2004 00:41:27 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) In-reply-to: <5.2.0.9.2.20040212181154.01621d60@cic-mail.lanl.gov> 
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Importance: Normal
X-eGroups-Remote-IP: 65.194.124.178
From: "Benny Halevy"
Subject: RE: [pnfs-reqs] concall tomorrow
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

> Or is that what is envisioned, and I am just hung up on the
> terminology, and what you are saying is that we need to
> 1) define the header for interop
> type, version, length perhaps etc.
> and
> 2) define
> a) opaque info for SBC type = 1 version = 1
> b) opaque info for Object type = 2 version = 1
> c) opaque info for NFS back ends type = 3 version = 1
>

If I understand what you think I think then yes, this is pretty much the way I see it :)

"define opaque info" is a little ambiguous. I believe we should define the wire format of each flavor to a level necessary to demonstrate interoperability. Some encapsulated data structures such as device addresses, file handles or capabilities can be defined "by reference" to other standards, but the overall framing protocol must be well defined so that it is parseable by any *independent* implementation of an NFSv4.p client. The client shouldn't require any knowledge about the underlying server file system internals and it shouldn't require any external configuration to help it parse the opaque structures. From the standard's types and versions alone, it needs to be able to understand whatever it needs to know about the flavor's "opaque info" in order to use it.

> and if someone wants to write an IETF request to have type = 4 added
> they can write a draft and get it approved and document the opaque info format?

Yes, and there should be a process in place to register new types in a central repository (likely to be IANA, see below).
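As a rough illustration of the header-plus-opaque-payload framing discussed above, here is a minimal sketch. Everything in it is an assumption made for the example: the three 32-bit big-endian header fields (type, version, length), the constant names, and the registry functions are hypothetical, and nothing here is a proposed wire format. The type numbers 1, 2 and 3 simply follow Gary's numbering for SBC, Object and NFS back ends.

```python
import struct
from typing import Callable, Dict, Tuple

# Hypothetical common header: type, version, body length.
HEADER = struct.Struct(">III")

# Type numbers as in Gary's list; the names are made up for this sketch.
LAYOUT_SBC, LAYOUT_OSD, LAYOUT_NFS = 1, 2, 3

Parser = Callable[[bytes], object]
_registry: Dict[Tuple[int, int], Parser] = {}


def register_flavor(ltype: int, version: int, parser: Parser) -> None:
    """Client side: plug in a parser for one (type, version) pair."""
    _registry[(ltype, version)] = parser


def pack_layout(ltype: int, version: int, body: bytes) -> bytes:
    """Prefix an opaque flavor-specific body with the common header."""
    return HEADER.pack(ltype, version, len(body)) + body


def read_header(buf: bytes) -> Tuple[int, int, int]:
    """All a pass-through server needs: it never parses the body."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated layout header")
    return HEADER.unpack_from(buf)


def unpack_layout(buf: bytes):
    """Client side: dispatch the body to the registered flavor parser."""
    ltype, version, length = read_header(buf)
    body = buf[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated layout body")
    parser = _registry.get((ltype, version))
    if parser is None:
        raise LookupError(f"no parser for type={ltype} version={version}")
    return ltype, version, parser(body)


if __name__ == "__main__":
    # A later "type = 4" flavor only needs a new registration.
    register_flavor(4, 1, lambda body: body.decode("ascii"))
    print(unpack_layout(pack_layout(4, 1, b"hello")))
```

In this sketch, adding a flavor touches only the registry, while the header and the pass-through path stay unchanged. Whether a real design would carry a per-flavor version in the header is one of the open questions in this thread.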
So far I think this discussion exposed a couple of requirements:

1) Interoperability
To conform with RFC 2026, the extensions we propose must provide enough details to eventually satisfy "The requirement for at least two independent and interoperable implementations".

2) Extensibility
The NFSv4 protocol extensions should allow new pNFS flavors to be added. Some (or all) flavors may have a requirement for extensibility in their own wire format.

IANA considerations
pNFS flavors should be registered with IANA. We need to define the process requirements and add them to the NFSv4 IANA requirements section. For example, see RFC 3530 section 17.2, ONC RPC Network Identifiers (netids):
... the registration of new Network Identifiers will require the publication of an Information RFC with similar detail as listed above for the Network Identifier itself and corresponding Universal Address.

Benny

-----Original Message-----
From: Gary Grider [mailto:ggrider@lanl.gov]
Sent: Thursday, February 12, 2004 20:19
To: pnfs-reqs@yahoogroups.com
Subject: RE: [pnfs-reqs] concall tomorrow

So I think having a header that has the info that is needed for interop is fine and having a payload that is opaque is fine.

So explain why we need three different "wire formats" for these three different methods? Why aren't these things just 3 different types within the same wire format? Or is that what is envisioned, and I am just hung up on the terminology, and what you are saying is that we need to

1) define the header for interop
   type, version, length perhaps etc.
and
2) define
   a) opaque info for SBC type = 1 version = 1
   b) opaque info for Object type = 2 version = 1
   c) opaque info for NFS back ends type = 3 version = 1

and if someone wants to write an IETF request to have type = 4 added they can write a draft and get it approved and document the opaque info format?

Thanks
Gary

At 08:03 PM 2/12/2004 -0500, Halevy, Benny wrote:
> If we do a separate IETF process for each new format, wont it
> be hard to keep up.
I don't see that as necessarily a bad thing...

> I agree we need to have a "type" of back end and maybe a version
> or something, but what is in the map could be of no concern to the IETF, couldnt it?

The IETF is concerned about interoperability. I believe we should be concerned with standardizing the wire format of the layout maps. Do we *have* to do that within the IETF? There could be other options, but since you have to refer to some external standard to talk about interoperability, it'll be difficult to refer to non-existing standards that are being standardized outside of the IETF.

T10-OSD is one such external standard that is substantial enough to be referred to.

> Do we want to get an IETF action every time we need to add a new flavor?

Probably, in order to extend the vector of available flavors. This could be done within NFSv4's minor versioning model.

> How easy is this process going to be? How easy is it going to be to change.
> Sounds like it will be very much harder than just changing out your plugins.

Again, there's a trade-off between versatility and interoperability.

> Are we linking several different communities to work in lock
> step? Is that good or bad?

The hierarchical standard model is intended to allow each flavor to make progress at its own pace.
Benny -----Original Message----- From: Gary Grider [mailto:ggrider@lanl.gov] Sent: Thursday, February 12, 2004 7:44 PM To: pnfs-reqs@yahoogroups.com Subject: Re: [pnfs-reqs] concall tomorrow At 04:17 PM 2/12/2004 -0800, Garth Gibson wrote: On Feb 12, 2004, at 3:31 PM, Gary Grider wrote: > At 04:10 PM 2/11/2004 -0800, Garth Gibson wrote: >> - its time to get back to the original roles of the mailing lists: >> - a draft of a requirements doc >> - a draft of the operations we suggest for NFSv4 extension >> - a draft of the wire format of layout metadata for SBC >> (FC/SCSI) backends >> - a draft of the wire format of layout metadata for OSD >> backends >> - a draft of the wire format of layout metadata for NFS >> backends > > Ok, I have a big problem with the last three statements. I know I > have not been following > this stuff as closely as I should have, but why cant the maps be sent > to from the server > in an agnostic way? Why does the NFS server have to understand any > of what is in the > map? Why isnt there a plug in on the server side to get the map info > for the server to > send to the client, and why isnt there a plug in on the client side > to pass the map down to? The NFS server does not have to understand what is in the map, other than enough to know what file the layout delegation and map pertain to for recalling that delegation. However, for any chance of interoperability, the map formats must be documented. agree, but does IETF care about this, its just information that NFS can pass on, seems like if a consortia of T10 folks want a common set of plugins, thats great, if SBC folks want a common plug in, thats great, but why should NFS/IETF care about this? And since each backend server flavor has addressing characteristics that will be visible in the map, the documented map formats will be specific to the backend flavors. The theory we started with was that the base NFSv4 extensions would describe opaque "maps". 
And that we would propose separate internet drafts for a wire format for each flavor of map.

Why do we need a different wire format? Isn't it just a blob of data that is passed through some normal NFS protocol mechanism? Why does it need a wire format? I agree it needs a published format of the stream of data. Am I reading more into "wire" than I should? If we do a separate IETF process for each new format, won't it be hard to keep up? I agree we need to have a "type" of back end and maybe a version or something, but what is in the map could be of no concern to the IETF, couldn't it?

Hence the NFSv4 extensions are backend protocol agnostic, yet the client implementations can be interoperable.

> What if the world decides to add another way to do I/O besides Block,
> Object, and NFS?
> What happens if the Object model evolves, does NFS have to change to
> stay in sync with it? Shouldnt all this stuff be as opaque as it can be?
> You do need locks for making sure the map doesnt change.

A new backend protocol would cause a new map flavor, and a new document for that map flavor's wire format.

Do we want to get an IETF action every time we need to add a new flavor? How easy is this process going to be? How easy is it going to be to change? Sounds like it will be very much harder than just changing out your plugins. Are we linking several different communities to work in lock step? Is that good or bad?

Thanks
Gary

The challenge for us is to make the typing and sizing of the opaque map flexible enough to allow said extension. The fallback, if the new backend is very much different from the scope of SBC, OSD and NFS, would be to further extend NFSv4. I hope we do not have to do that.

> I am confused why we gave up on trying to make this agnostic.
> Sorry for the question, but I have this built in alarm, that goes off
> when I see any numbers besides 0,1, and N (not 3).
This is intended to be a structure for 1, 2 and 3, with an inductive step enabling any positive whole number to be induced :-)

> Thanks
> Gary

garth

From pcorbett@netapp.com Fri Feb 13 06:13:10 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 52136 invoked from network); 13 Feb 2004 14:13:06 -0000 Received: from unknown (66.218.66.172) by m14.grp.scd.yahoo.com with QMQP; 13 Feb 2004 14:13:06 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 13 Feb 2004 14:13:06 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1DED5JC022822 for ; Fri, 13 Feb 2004 06:13:05 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1DED5iH002891 for ; Fri, 13 Feb 2004 06:13:05 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/related; type="multipart/alternative"; boundary="----_=_NextPart_001_01C3F23B.797167B2" X-MimeOLE: Produced By
Microsoft Exchange V6.0.6249.0 Date: Fri, 13 Feb 2004 06:12:55 -0800 Message-ID: X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] concall tomorrow Thread-Index: AcPxz2tga1ZicUe6RVmgDK9LWOKhBQAa63FA To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: RE: [pnfs-reqs] concall tomorrow X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004

That's right. I think we'll do most of the work in the ops group, and the 3 transport mechanism groups will work out the specific details of the descriptors. There will be some back and forth to make sure we get the ops set right.

-----Original Message-----
From: Gary Grider [mailto:ggrider@lanl.gov]
Sent: Thursday, February 12, 2004 8:19 PM
To: pnfs-reqs@yahoogroups.com
Subject: RE: [pnfs-reqs] concall tomorrow

So I think having a header that has the info that is needed for interop is fine and having a payload that is opaque is fine.

So explain why we need three different "wire formats" for these three different methods? Why arent these things just 3 different types within the same wire format? Or is that what is envisioned, and I am just hung up on the terminology, and what you are saying is that we need to

1) define the header for interop
   type, version, length perhaps etc.
and
2) define
   a) opaque info for SBC type = 1 version = 1
   b) opaque info for Object type = 2 version = 1
   c) opaque info for NFS back ends type = 3 version = 1

and if someone wants to write an IETF request to have type = 4 added they can write a draft and get it approved and document the opaque info format?

Thanks
Gary

From julian_satran@il.ibm.com Mon Feb 16 08:53:48 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 83072 invoked from network); 16 Feb 2004 16:53:48 -0000 Received: from unknown (66.218.66.167) by m16.grp.scd.yahoo.com with QMQP; 16 Feb 2004 16:53:48 -0000 Received: from unknown (HELO mtagate3.uk.ibm.com) (195.212.29.136) by mta6.grp.scd.yahoo.com with SMTP; 16 Feb 2004 16:53:47 -0000 Received: from d06nrmr1407.portsmouth.uk.ibm.com (d06nrmr1407.portsmouth.uk.ibm.com [9.149.38.185]) by mtagate3.uk.ibm.com (8.12.10/8.12.10) with ESMTP id i1GGrkMf044458 for ; Mon, 16 Feb 2004 16:53:46 GMT Received: from d12ml102.megacenter.de.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by d06nrmr1407.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i1GGrjFK184550 for ; Mon, 16 Feb 2004 16:53:46 GMT In-Reply-To: <5.2.0.9.2.20040212181154.01621d60@cic-mail.lanl.gov> To: pnfs-reqs@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 16 Feb 2004 18:55:39 +0200 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 16/02/2004 18:55:41, Serialize complete at 16/02/2004 18:55:41 Content-Type: text/plain; charset="US-ASCII" X-eGroups-Remote-IP: 195.212.29.136 From: Julian Satran Subject: RE: [pnfs-reqs] plugins vs. wire protocol - a false dilemma X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran

I followed this thread with interest. I agree with Garth that wire format is required for interoperability - and an opaque format is good only for communication between entities of the same kind.
That is required and having more than one is not necessarily a bad thing.

However, Gary has a good point: with many different formats, server and client implementers may feel somewhat uneasy (and may delay implementations).

The security area, which Benny used as a benchmark, solved this by standardizing an API (GSS-API) in addition to the wire protocol (but beware: the IETF does not like to standardize APIs unless they are presented as a semantic definition of the endpoint operation).

Perhaps the right thing to do is define both an API for the plug-in Gary suggested and the wire protocol (that enables interoperability).

And BTW - using a meaningful subject line is part of mailing list etiquette - I almost dropped the whole thread knowing that I can't make the call.

Regards,
Julo

From black_david@emc.com Mon Feb 16 14:43:45 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 82548 invoked from network); 16 Feb 2004 22:43:44 -0000 Received: from unknown (66.218.66.167) by m19.grp.scd.yahoo.com with QMQP; 16 Feb 2004 22:43:44 -0000 Received: from unknown (HELO srexchimc2.eng.emc.com) (168.159.100.11) by mta6.grp.scd.yahoo.com with SMTP; 16 Feb 2004 22:43:44 -0000 Received: from maho3msx2.corp.emc.com ([128.221.11.32]) by srexchimc2.eng.emc.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id 1ZQJFTF2; Mon, 16 Feb 2004 17:43:43 -0500 Received: by maho3msx2.isus.emc.com with Internet Mail Service (5.5.2653.19) id <1YVFBR8B>; Mon, 16 Feb 2004 17:43:42 -0500 Message-ID: X-Sybari-Trust: 858f5b34 1d8c424f c070781c 0000013d To: pnfs-reqs@yahoogroups.com Date: Mon, 16 Feb 2004 17:43:36 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 168.159.100.11 From: black_david@emc.com Subject: Data formats & Interoperability (was: concall tomorrow) X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237

Gary
Grider wrote: So I think having a header that has the info that is needed for interop is fine and having a payload that is opaque is fine. So explain why we need three different "wire formats" for these three different methods? Why arent these things just 3 different types within the same wire format? Or is that what is envisioned, and I am just hung up on the terminology, and what you are saying is that we need to 1) define the header for interop type, version, length perhaps etc. and 2) define a) opaque info for SBC type = 1 version = 1 b) opaque info for Object type = 2 version = 1 c) opaque info for NFS back ends type = 3 version = 1 and if someone wants to write an IETF request to have type = 4 added they can write a draft and get it approved and document the opaque info format? That's correct, and there are lots of IETF examples of protocols that are structured in this fashion. There is one issue that may cause some difficulty - mandatory requirements for interoperability. IETF requires that two implementations of the same protocol be capable of interoperating when implementers have made different choices among optional to implement features. This interoperation can be dependent on suitable configuration of the implementations. For example, there are mandatory-to-implement cryptographic algorithm requirements for protocols like IPsec and TLS to ensure that any two implementations can interoperate even if they've implemented different sets of cryptographic algorithms. In that case the "mandatory to implement" cryptographic algorithms will have been implemented by both, and will result in interoperation if they are chosen by both sides, although there's no requirement that they be offered or selected in negotiation. Requiring any one of the three metadata types Gary lists above to always be implemented is going to cause problems for some of the envisioned implementations. 
I think the best bet for the pNFS extensions will be to define "none" (i.e., vanilla NFSv4) as the "mandatory to implement" interoperable mode, with all pNFS extensions being optional among mutually consenting clients and servers. This may also settle an earlier issue about whether metadata-only servers should not be supported - they can't fall back to NFSv4 as the "mandatory to implement" interoperable mode, and hence will get us into a tarpit over which metadata type must be "mandatory to implement" to ensure interoperability. IMHO, the best bet is to "just say no" - pNFS is an NFSv4 extension, therefore any implementation of pNFS is required to implement vanilla NFSv4 without any pNFS extensions in addition to pNFS-extended NFSv4. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From bhalevy@panasas.com Mon Feb 16 15:01:17 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 69170 invoked from network); 16 Feb 2004 23:01:14 -0000 Received: from unknown (66.218.66.218) by m3.grp.scd.yahoo.com with QMQP; 16 Feb 2004 23:01:14 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 16 Feb 2004 23:01:14 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Mon, 16 Feb 2004 18:01:12 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38883@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" Date: Mon, 16 Feb 2004 18:01:06 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] Data formats & Interoperability (was: concall tom orrow) 
X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy

>I think the best bet for the pNFS extensions will be to define "none"
>(i.e., vanilla NFSv4) as the "mandatory to implement" interoperable mode,
>with all pNFS extensions being optional among mutually consenting clients
>and servers. This may also settle an earlier issue about whether
>metadata-only servers should not be supported - they can't fall back to
>NFSv4 as the "mandatory to implement" interoperable mode, and hence
>will get us into a tarpit over which metadata type must be "mandatory
>to implement" to ensure interoperability. IMHO, the best bet is to
>"just say no" - pNFS is an NFSv4 extension, therefore any implementation
>of pNFS is required to implement vanilla NFSv4 without any pNFS extensions
>in addition to pNFS-extended NFSv4.

I think that makes a lot of sense and goes well with NFSv4's minor versioning rules.

Benny

From black_david@emc.com Mon Feb 16 15:04:45 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 88331 invoked from network); 16 Feb 2004 23:04:42 -0000 Received: from unknown (66.218.66.172) by m13.grp.scd.yahoo.com with QMQP; 16 Feb 2004 23:04:42 -0000 Received: from unknown (HELO mercury.eng.emc.com) (168.159.100.12) by mta4.grp.scd.yahoo.com with SMTP; 16 Feb 2004 23:04:41 -0000 Received: from mxic2.corp.emc.com ([128.221.12.9]) by mercury.eng.emc.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2656.59) id 1ZQHJRNX; Mon, 16 Feb 2004 18:04:40 -0500 Received: by mxic2.corp.emc.com with Internet Mail Service (5.5.2653.19) id <1YVBKVS1>; Mon, 16 Feb 2004 18:03:37 -0500 Message-ID: X-Sybari-Trust: e925cc80 1d8c424f 321aa0c1 0000013d To: pnfs-reqs@yahoogroups.com Date: Mon, 16 Feb 2004 18:04:33 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 168.159.100.12 From: black_david@emc.com Subject: RE: [pnfs-reqs] plugins vs. wire protocol - a false dilemma X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237

Julian,

The opaque format (and API) approach for interoperability works when the opaqueness is to things that are above the level of the protocol involved. For example, NFSv4 has nothing to say about the contents of the files it provides access to - they're opaque and rightly so.

The problem we have here is that pNFS is at the functional level of NFSv4. If pNFS can't be configured to work (e.g., no metadata type is common to client and server) and NFSv4 is not supported by both ends, the result is no file access, no matter how the ends of the connection are configured. Needless to say, this would be wrong for implementations claiming to meet the requirements of an IETF spec.
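The failure case described above, where client and server share no metadata type, and the proposed remedy of falling back to vanilla NFSv4 can be illustrated with a small sketch. It is hypothetical throughout: the thread defines no negotiation procedure, and the function name `choose_access_mode`, the string labels for flavors, and the idea of a client preference order are assumptions made for the example.

```python
# Hypothetical label for the always-available mode; the thread only
# describes it as vanilla NFSv4 without pNFS extensions.
NFSV4_FALLBACK = "nfsv4"


def choose_access_mode(client_flavors, server_flavors):
    """Return the first client-preferred flavor the server also offers.

    Falls back to plain NFSv4 when the two ends share no layout flavor,
    so file access still works, only without parallel data paths.
    """
    offered = set(server_flavors)
    for flavor in client_flavors:  # client list is in preference order
        if flavor in offered:
            return flavor
    return NFSV4_FALLBACK
```

With plain NFSv4 as the mode every implementation must support, the function always returns something usable, which is the interoperability property being argued for; without it, the final `return` would have to signal "no file access".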
As I noted in my previous message, requiring vanilla NFSv4 without pNFS extensions as the interoperable mode is probably the best way forward here. I don't think defining a plug-in API will help with this interoperability issue, although describing the functional interface between the generic pNFS client and the storage-specific functionality is probably still a good thing to do.

Also, attempting to anticipate a further issue from Gary - the IETF requirement is functionality only, not performance. In other words, NFSv4 just has to work, it doesn't have to be fast (e.g., if all the clients in a pNFS system suddenly fall back on NFSv4 for some reason, it's ok w/IETF if the results are slow, although Gary might be very unhappy if one of his systems ever did this).

Thanks,
--David

From garth@panasas.com Fri Feb 20 16:22:40 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 57576 invoked from network); 21 Feb 2004 00:22:36 -0000 Received: from unknown (66.218.66.217) by m18.grp.scd.yahoo.com with QMQP; 21 Feb 2004 00:22:36 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 21 Feb 2004 00:22:36 -0000 Received: from [172.17.2.81] ([172.17.2.81]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FK58QLDY; Fri, 20 Feb 2004 19:22:34 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <5.2.0.9.2.20040212213826.038a2208@cic-mail.lanl.gov> References: <5.2.0.9.2.20040212213826.038a2208@cic-mail.lanl.gov> Content-Type: multipart/mixed; boundary=Apple-Mail-1-231497768 Message-Id: <064537AE-6404-11D8-A7AF-000A95A94F04@panasas.com> Date: Fri, 20 Feb 2004 19:22:25 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Based on feedback from Brent's concall 8 days ago, here is my cut at Gary's proposal for a short problem statement introduction presentation.

garth

On Feb 12, 2004, at 11:41 PM, Gary Grider wrote:
> I agree.
>
> I added some parallel stuff.
>
> Thanks
> Gary
>
> At 07:22 AM 2/12/2004 -0800, Corbett, Peter wrote:
>
> The slides mention server bypass twice, but don't talk about parallel
> access. Our problem statement is much more focussed on parallel
> access.
> We are trying to define a standard for parallel access, whether that is
> to nfs based data servers, object servers, virtualized SAN or
> non-virtualized SAN devices.
So, I think we need to be clearer when > talking about bypassing the server that we are really talking about > direct access to a parallel data store from clustered clients, with a > shared NFS server acting as a metadata server. To me, that is the key > point that applies across the entire solution space, whereas direct > access to devices is a data access technique in part of the solution > space. > > -----Original Message----- > From: Gary Grider [ mailto:ggrider@lanl.gov > ] Sent: Thursday, February 12, 2004 12:55 AM > To: pnfs-reqs@yahoogroups.com > Subject: Re: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started > > > elevator pitch > > Gary > Attachment (not stored) pNFS-intro.ppt Type: application/vnd.ms-powerpoint From dnoveck@netapp.com Sun Feb 22 10:22:12 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 79037 invoked from network); 22 Feb 2004 18:22:11 -0000 Received: from unknown (66.218.66.167) by m6.grp.scd.yahoo.com with QMQP; 22 Feb 2004 18:22:11 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 22 Feb 2004 18:22:11 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1MIMBJC011267 for ; Sun, 22 Feb 2004 10:22:11 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1MIMBDU023705 for ; Sun, 22 Feb 2004 10:22:11 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Sun, 22 Feb 2004 10:22:02 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] RE: NEPS-REQS: getting started Thread-Index: AcP4ENaVWrKnCLT9S1ivd9w4C4AJJQBXI02Q To: X-eGroups-Remote-IP: 198.95.226.53 
X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck I have some suggestions on slides 5 and 6. I would drop the line about delegations from this slide. Unless you come to this from the sorts of discussions we have been having (and thus aren't the critical part of the audience), this is really not going to be understandable. One problem that we have in presenting this is that if we explain the situation, we wind up having to explain that we think we pretty much know how to do this already and just need the IETF to bless our choice (I'm exaggerating but only some), and that isn't likely to go down very well with a lot of people. I'd express the last sub-bullet in this section as something like: NFSv4 minor version model a good way to provide incremental extensions which doesn't say that we know pretty much what these are (but it doesn't say we don't :-) As to the last section of slide 6, I'd revise to be something like the following, again to reduce the we-know-how-to-do-this tone. Much interest in exploring how v4 could be extended to solve this Extension of delegations to provide "layout" information to clients Clients use layout information to do IO and avoid single-server bottleneck NFS, SCSI Block, SCSI Object layout formats all discussed Support for multiple formats looks desirable (and doable). -----Original Message----- From: Garth Gibson [mailto:garth@panasas.com] Sent: Friday, February 20, 2004 7:22 PM To: pnfs-reqs@yahoogroups.com Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started Based on feedback from Brent's concall 8 days ago, here is my cut at Gary's proposal for a short problem statement introduction presentation. garth On Feb 12, 2004, at 11:41 PM, Gary Grider wrote: > I agree. > > I added some parallel stuff. 
> > Thanks > Gary > > At 07:22 AM 2/12/2004 -0800, Corbett, Peter wrote: > > The slides mention server bypass twice, but don't talk about parallel > access. Our problem statement is much more focussed on parallel > access. > We are trying to define a standard for parallel access, whether that is > to nfs based data servers, object servers, virtualized SAN or > non-virtualized SAN devices. So, I think we need to be clearer when > talking about bypassing the server that we are really talking about > direct access to a parallel data store from clustered clients, with a > shared NFS server acting as a metadata server. To me, that is the key > point that applies across the entire solution space, whereas direct > access to devices is a data access technique in part of the solution > space. > > -----Original Message----- > From: Gary Grider [ mailto:ggrider@lanl.gov > ] Sent: Thursday, February 12, 2004 12:55 AM > To: pnfs-reqs@yahoogroups.com > Subject: Re: Fwd: [pnfs-reqs] RE: NEPS-REQS: getting started > > > elevator pitch > > Gary > Yahoo! 
Groups Links From garth@panasas.com Sun Feb 22 15:44:29 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 14930 invoked from network); 22 Feb 2004 23:44:28 -0000 Received: from unknown (66.218.66.166) by m18.grp.scd.yahoo.com with QMQP; 22 Feb 2004 23:44:28 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 22 Feb 2004 23:44:27 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FLP29YY1; Sun, 22 Feb 2004 18:44:26 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: References: Content-Type: multipart/mixed; boundary=Apple-Mail-3-402012520 Message-Id: <08FB86C8-6591-11D8-9D79-000A95A94F04@panasas.com> Date: Sun, 22 Feb 2004 18:44:20 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson I understand the sensitivity of IETF to having solutions presented instead of problems. Here is a revision following the recommendations below, with some word choices of my own. Specifically, I'm reluctant to say nothing about how NFSv4 is better for fixing this than NFSv3; that is, the definition of NFSv4 creates the opportunity for "direct" or "out-of-band" access. 
Page 5: the two lines in question, - NFSv4, relative to NFSv3, has enhanced client side optimizations - NFSv4 minor extensions may suffice for incremental functionality Page 6: last section: Much interest in exploring NFSv4 extensions to meet scalability needs - Extend NFSv4 delegations to provide layout information to clients - Clients use layout to directly access storage, avoiding single-server bottleneck - NFS, SCSI Block, and SCSI Object layout formats all discussed - Support for multiple layout formats desirable (and looks doable) Dave, how is this? garth On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: > I have some suggestions on slides 5 and 6. > > I would drop the line about delegations from this slide. Unless > you come to this from the sorts of discussions we have been having > (and thus aren't the critical part of the audience), this is > really not going to be understandable. One problem that we have > in presenting this is that if we explain the situation, we wind up > having to explain that we think we pretty much know how to do this > already and just need the IETF to bless our choice (I'm exaggerating > but only some), and that isn't likely to go down very well with a > lot of people. > > I'd express the last sub-bullet in this section as something like: > > NFSv4 minor version model a good way to provide incremental > extensions > > which doesn't say that we know pretty much what these are (but it > doesn't say we don't :-) > > As to the last section of slide 6, I'd revise to be something like > the following, again to reduce the we-know-how-to-do-this tone. > > Much interest in exploring how v4 could be extended to solve this > > Extension of delegations to provide "layout" information to > clients > > Clients use layout information to do IO and avoid > single-server bottleneck > > NFS, SCSI Block, SCSI Object layout formats all discussed > > Support for multiple formats looks desirable (and doable). 
> > -----Original Message----- > From: Garth Gibson [mailto:garth@panasas.com] > Sent: Friday, February 20, 2004 7:22 PM > To: pnfs-reqs@yahoogroups.com > Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started > > > Based on feedback from Brent's concall 8 days ago, here is my cut at > Gary's proposal for a short problem statement introduction > presentation. > > garth > Attachment (not stored) pNFS-intro-2-22.ppt Type: application/vnd.ms-powerpoint From dnoveck@netapp.com Sun Feb 22 16:29:36 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 26422 invoked from network); 23 Feb 2004 00:29:34 -0000 Received: from unknown (66.218.66.167) by m13.grp.scd.yahoo.com with QMQP; 23 Feb 2004 00:29:34 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 23 Feb 2004 00:29:34 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1N0TTJC016163 for ; Sun, 22 Feb 2004 16:29:29 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1N0TTB4000373 for ; Sun, 22 Feb 2004 16:29:29 -0800 (PST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Sun, 22 Feb 2004 16:29:18 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] RE: NEPS-REQS: getting started Thread-Index: AcP5ndac4dbytrq3RvKJaomlNrNv9AAA7Y1g To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck > I understand the sensitivity of IETF to having solutions presented > instead of problems. 
> Here is a revision following the recommendations below, with some word
> choices of my own. Specifically, I'm reluctant to say nothing about
> how NFSv4 is better for fixing this than NFSv3; that is, the definition
> of NFSv4 creates the opportunity for "direct" or "out-of-band" access

> Page 5: the two lines in question,
> - NFSv4, relative to NFSv3, has enhanced client side optimizations

I can see that the idea that this is just an extension of an already
established v4 trend is worth mentioning. My preference would be not to
mention v3 explicitly, but maybe that's one of those things that we'll
never agree on. No big deal. How about:

NFSv4 already provides mechanisms for clients to act autonomously when
circumstances allow it.

> - NFSv4 minor extensions may suffice for incremental functionality

"minor extensions" sounds like you are talking about the size of the
extensions so I would somehow use "minor version" or "minor versioning"
to indicate that v4 already has made provisions for extensions like this
(even without knowing what they would be :-)

> Page 6: last section:
>
> Much interest in exploring NFSv4 extensions to meet scalability needs
> - Extend NFSv4 delegations to provide layout information to clients

I wouldn't have the quotes around 'delegations'. Those should be an
understood idea in the NFSv4 context.

> - Clients use layout to directly access storage, avoiding
> single-server bottleneck
> - NFS, SCSI Block, and SCSI Object layout formats all discussed
> - Support for multiple layout formats desirable (and looks doable)

> Dave, how is this?

Looks OK to me.
From Thomas.Talpey@netapp.com Sun Feb 22 16:49:04 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 86191 invoked from network); 23 Feb 2004 00:49:00 -0000 Received: from unknown (66.218.66.218) by m20.grp.scd.yahoo.com with QMQP; 23 Feb 2004 00:49:00 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 23 Feb 2004 00:49:00 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1N0n0JC017966 for ; Sun, 22 Feb 2004 16:49:00 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1N0n0DU011074 for ; Sun, 22 Feb 2004 16:49:00 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.32]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Sun, 22 Feb 2004 19:48:51 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3F9A6.CDD9DB80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Sun, 22 Feb 2004 16:48:40 -0800 Message-ID: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] RE: NEPS-REQS: getting started Thread-Index: AcP5ps6EfXbzcs3NTDaL2WopU4HhCw== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu What is "EDA" (slide 3)? Spell it out, whatever it is! :-) I wouldn't include "FCIP" on slide 5. Does anyone use it, especially for storage? I think there should be another slide after slide 2, which drills down at least a little into the ideas that have been discussed.
It's a good place to establish the context of the presentation - as it is there isn't really any "proposal". I'm thinking maybe a single bullet for some or all of the whitepapers at NEPS? Tom. At 06:44 PM 2/22/2004, Garth Gibson wrote: >I understand the sensitivity of IETF to having solutions presented >instead of problems. > >Here is a revision following the recommendations below, with some word >choices of my own. Specifically, I'm reluctant to say nothing about >how NFSv4 is better for fixing this than NFSv3; that is, the definition >of NFSv4 creates the opportunity for "direct" or "out-of-band" access. > >Page 5: the two lines in question, > >- NFSv4, relative to NFSv3, has enhanced client side optimizations >- NFSv4 minor extensions may suffice for incremental functionality > >Page 6: last section: > >Much interest in exploring NFSv4 extensions to meet scalability needs >- Extend NFSv4 delegations to provide layout information to clients >- Clients use layout to directly access storage, avoiding >single-server bottleneck >- NFS, SCSI Block, and SCSI Object layout formats all discussed >- Support for multiple layout formats desirable (and looks doable) > >Dave, how is this? > >garth > > >On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: > >> I have some suggestions on slides 5 and 6. >> >> I would drop the line about delegations from this slide. Unless >> you come to this from the sorts of discussions we have been having >> (and thus aren't the critical part of the audience), this is >> really not going to be understandable. One problem that we have >> in presenting this is that if we explain the situation, we wind up >> having to explain that we think we pretty much know how to do this >> already and just need the IETF to bless our choice (I'm exaggerating >> but only some), and that isn't likely to go down very well with a >> lot of people. 
>> >> I'd express the last sub-bullet in this section as something like: >> >> NFSv4 minor version model a good way to provide incremental >> extensions >> >> which doesn't say that we know pretty much what these are (but it >> doesn't say we don't :-) >> >> As to the last section of slide 6, I'd revise to be something like >> the following, again to reduce the we-know-how-to-do-this tone. >> >> Much interest in exploring how v4 could be extended to solve this >> >> Extension of delegations to provide "layout" information to >> clients >> >> Clients use layout information to do IO and avoid >> single-server bottleneck >> >> NFS, SCSI Block, SCSI Object layout formats all discussed >> >> Support for multiple formats looks desirable (and doable). >> >> -----Original Message----- >> From: Garth Gibson [mailto:garth@panasas.com] >> Sent: Friday, February 20, 2004 7:22 PM >> To: pnfs-reqs@yahoogroups.com >> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started >> >> >> Based on feedback from Brent's concall 8 days ago, here is my cut at >> Gary's proposal for a short problem statement introduction >> presentation. >> >> garth >> > > > >------------------------ Yahoo! Groups Sponsor ---------------------~--> >Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark >Printer at MyInks.com. Free s/h on orders $50 or more to the US & Canada. >http://www.c1tracking.com/l.asp?cid=5511 >http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM >---------------------------------------------------------------------~-> > > >Yahoo! Groups Links > ><*> To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > ><*> To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > ><*> Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > > > From garth@panasas.com Sun Feb 22 17:52:39 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 88387 invoked from network); 23 Feb 2004 01:52:38 -0000 Received: from unknown (66.218.66.167) by m16.grp.scd.yahoo.com with QMQP; 23 Feb 2004 01:52:38 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 23 Feb 2004 01:52:37 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FLP29Z15; Sun, 22 Feb 2004 20:52:35 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> References: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> Content-Type: multipart/mixed; boundary=Apple-Mail-5-409702016 Message-Id: Date: Sun, 22 Feb 2004 20:52:29 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Okay, I think I'm in tune with all of Dave Noveck's comments and the first two of these comments from Tom. The third comment, calling for more description of the ideas that were in the NEPS workshop, has me confused. I think Dave has been encouraging that the proposal have less in the way of what the IETF should do to solve this problem, and I read this request as suggesting that we have more in the way of proposals for the solution. I'm certain we can do either, and I think I'd like to hear a little more direction from the group. Do we stay with this mostly solution free problem presentation, or do we add more about layout delegations and other NEPS proposed ideas? garth On Feb 22, 2004, at 7:48 PM, Talpey, Thomas wrote: > What is "EDA" (slide 3)? 
Spell it out, whatever it is! :-) > > I wouldn't include "FCIP" on slide 5. Does anyone use it, > especially for storage? > > I think there should be another slide after slide 2, which > drills down at least a little into the ideas that have been > discussed. It's a good place to establish the context of > the presentation - as it is there isn't really any "proposal". > I'm thinking maybe a single bullet for some or all of the > whitepapers at NEPS? > > Tom. > > At 06:44 PM 2/22/2004, Garth Gibson wrote: >> I understand the sensitivity of IETF to having solutions presented >> instead of problems. >> >> Here is a revision following the recommendations below, with some word >> choices of my own. Specifically, I'm reluctant to say nothing about >> how NFSv4 is better for fixing this than NFSv3; that is, the >> definition > >> of NFSv4 creates the opportunity for "direct" or "out-of-band" access. >> >> Page 5: the two lines in question, >> >> - NFSv4, relative to NFSv3, has enhanced client side optimizations >> - NFSv4 minor extensions may suffice for incremental functionality >> >> Page 6: last section: >> >> Much interest in exploring NFSv4 extensions to meet scalability needs >> - Extend NFSv4 "delegations" to provide "layout" information to >> clients > >> - Clients use "layout" to directly access storage, avoiding >> single-server bottleneck >> - NFS, SCSI Block, and SCSI Object "layout" formats all discussed >> - Support for multiple "layout" formats desirable (and looks doable) >> >> Dave, how is this? >> >> garth >> >> >> On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: >> >>> I have some suggestions on slides 5 and 6. >>> >>> I would drop the line about delegations from this slide. Unless >>> you come to this from the sorts of discussions we have been having >>> (and thus aren't the critical part of the audience), this is >>> really not going to be understandable. 
One problem that we have >>> in presenting this is that if we explain the situation, we wind up >>> having to explain that we think we pretty much know how to do this >>> already and just need the IETF to bless our choice (I'm exaggerating >>> but only some), and that isn't likely to go down very well with a >>> lot of people. >>> >>> I'd express the last sub-bullet in this section as something like: >>> >>> NFSv4 minor version model a good way to provide incremental >>> extensions >>> >>> which doesn't say that we know pretty much what these are (but it >>> doesn't say we don't :-) >>> >>> As to the last section of slide 6, I'd revise to be something like >>> the following, again to reduce the we-know-how-to-do-this tone. >>> >>> Much interest in exploring how v4 could be extended to solve >>> this >>> >>> Extension of delegations to provide "layout" information >>> to clients >>> >>> Clients use layout information to do IO and avoid >>> single-server bottleneck >>> >>> NFS, SCSI Block, SCSI Object layout formats all discussed >>> >>> Support for multiple formats looks desirable (and doable). >>> >>> -----Original Message----- >>> From: Garth Gibson [ mailto:garth@panasas.com >>> ] >>> Sent: Friday, February 20, 2004 7:22 PM >>> To: pnfs-reqs@yahoogroups.com >>> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started >>> >>> >>> Based on feedback from Brent's concall 8 days ago, here is my cut at >>> Gary's proposal for a short problem statement introduction >>> presentation. 
>>> >>> garth Attachment (not stored) pNFS-intro-2-22.2.ppt Type: application/vnd.ms-powerpoint From julian_satran@il.ibm.com Mon Feb 23 08:25:56 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 50186 invoked from network); 23 Feb 2004 16:25:42 -0000 Received: from unknown (66.218.66.167) by m16.grp.scd.yahoo.com with QMQP; 23 Feb 2004 16:25:42 -0000 Received: from unknown (HELO mtagate7.uk.ibm.com) (195.212.29.140) by mta6.grp.scd.yahoo.com with SMTP; 23 Feb 2004 16:25:41 -0000 Received: from d06nrmr1307.portsmouth.uk.ibm.com (d06nrmr1307.portsmouth.uk.ibm.com [9.149.38.129]) by mtagate7.uk.ibm.com (8.12.10/8.12.10) with ESMTP id i1NGPc4n121918 for ; Mon, 23 Feb 2004 16:25:39 GMT Received: from d12ml102.megacenter.de.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by d06nrmr1307.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i1NGPbtS240322 for ; Mon, 23 Feb 2004 16:25:38 GMT In-Reply-To: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> To: pnfs-reqs@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 23 Feb 2004 18:27:36 +0200 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 23/02/2004 18:27:37, Serialize complete at 23/02/2004 18:27:37 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: base64 X-eGroups-Remote-IP: 195.212.29.140 From: Julian Satran Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran "Talpey, Thomas" wrote on 23/02/2004 02:48:40: > What is "EDA" (slide 3)? Spell it out, whatever it is! :-) > I wouldn't include "FCIP" on slide 5. Does anyone use it, > especially for storage? And I would include NFS-RDMA - as an NFS extension example that IETF is doing. 
> I think there should be another slide after slide 2, which > drills down at least a little into the ideas that have been > discussed. It's a good place to establish the context of > the presentation - as it is there isn't really any "proposal". > I'm thinking maybe a single bullet for some or all of the > whitepapers at NEPS? > Tom. > At 06:44 PM 2/22/2004, Garth Gibson wrote: > >I understand the sensitivity of IETF to having solutions presented > >instead of problems. > > > >Here is a revision following the recommendations below, with some word > >choices of my own. Specifically, I'm reluctant to say nothing about > >how NFSv4 is better for fixing this than NFSv3; that is, the definition > >of NFSv4 creates the opportunity for "direct" or "out-of-band" access. > > > >Page 5: the two lines in question, > > > >- NFSv4, relative to NFSv3, has enhanced client side optimizations > >- NFSv4 minor extensions may suffice for incremental functionality > > > >Page 6: last section: > > > >Much interest in exploring NFSv4 extensions to meet scalability needs > >- Extend NFSv4 “delegations” to provide “layout” information to clients > >- Clients use “layout” to directly access storage, avoiding > >single-server bottleneck > >- NFS, SCSI Block, and SCSI Object “layout” formats all discussed > >- Support for multiple “layout” formats desirable (and looks doable) > > > >Dave, how is this? > > > >garth > > > > > >On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: > > > >> I have some suggestions on slides 5 and 6. > >> > >> I would drop the line about delegations from this slide. Unless > >> you come to this from the sorts of discussions we have been having > >> (and thus aren't the critical part of the audience), this is > >> really not going to be understandable. 
One problem that we have > >> in presenting this is that if we explain the situation, we wind up > >> having to explain that we think we pretty much know how to do this > >> already and just need the IETF to bless our choice (I'm exaggerating > >> but only some), and that isn't likely to go down very well with a > >> lot of people. > >> > >> I'd express the last sub-bullet in this section as something like: > >> > >> NFSv4 minor version model a good way to provide incremental > >> extensions > >> > >> which doesn't say that we know pretty much what these are (but it > >> doesn't say we don't :-) > >> > >> As to the last section of slide 6, I'd revise to be something like > >> the following, again to reduce the we-know-how-to-do-this tone. > >> > >> Much interest in exploring how v4 could be extended to solve this > >> > >> Extension of delegations to provide "layout" information to > >> clients > >> > >> Clients use layout information to do IO and avoid > >> single-server bottleneck > >> > >> NFS, SCSI Block, SCSI Object layout formats all discussed > >> > >> Support for multiple formats looks desirable (and doable). > >> > >> -----Original Message----- > >> From: Garth Gibson [mailto:garth@panasas.com] > >> Sent: Friday, February 20, 2004 7:22 PM > >> To: pnfs-reqs@yahoogroups.com > >> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started > >> > >> > >> Based on feedback from Brent's concall 8 days ago, here is my cut at > >> Gary's proposal for a short problem statement introduction > >> presentation. > >> > >> garth > >> > > > > > > > >------------------------ Yahoo! Groups Sponsor ---------------------~--> > >Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > >Printer at MyInks.com. Free s/h on orders $50 or more to the US & Canada. > >http://www.c1tracking.com/l.asp?cid=5511 > >http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > >---------------------------------------------------------------------~-> > > > > > >Yahoo! 
Groups Links From julian_satran@il.ibm.com Mon Feb 23 08:26:01 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 50171 invoked from network); 23 Feb 2004 16:25:42 -0000 Received: from unknown (66.218.66.216) by m16.grp.scd.yahoo.com with QMQP; 23 Feb 2004 16:25:42 -0000 Received: from unknown (HELO mtagate5.uk.ibm.com) (195.212.29.138) by mta1.grp.scd.yahoo.com with SMTP; 23 Feb 2004 16:25:40 -0000 Received: from d06nrmr1307.portsmouth.uk.ibm.com (d06nrmr1307.portsmouth.uk.ibm.com [9.149.38.129]) by mtagate5.uk.ibm.com (8.12.10/8.12.10) with ESMTP id i1NGPcWH026470 for ; Mon, 23 Feb 2004 16:25:38 GMT Received: from d12ml102.megacenter.de.ibm.com (d06av02.portsmouth.uk.ibm.com [9.149.37.228]) by d06nrmr1307.portsmouth.uk.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i1NGPbtT240322 for ; Mon, 23 Feb 2004 16:25:38 GMT In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5 September 26, 2003 Message-ID: Date: Mon, 23 Feb 2004 18:27:36 +0200 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 23/02/2004 18:27:38 Content-Type: multipart/mixed; boundary="=_mixed 00567A40C2256E43_=" X-eGroups-Remote-IP: 195.212.29.138 From: Julian Satran Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran The only thing fair to say (I think) is that there are initial ideas. Otherwise the community may not like that you intend to push what they'll perceive as a done deal.
So it is fair to mention things like delegations - but not beyond "solutions could be based on".

Julo

Garth Gibson
23/02/2004 03:52
Please respond to pnfs-reqs
To pnfs-reqs@yahoogroups.com
cc
Subject Re: [pnfs-reqs] RE: NEPS-REQS: getting started

Okay, I think I'm in tune with all of Dave Noveck's comments and the first two of these comments from Tom.

The third comment, calling for more description of the ideas that were in the NEPS workshop, has me confused. I think Dave has been encouraging that the proposal have less in the way of what the IETF should do to solve this problem, and I read this request as suggesting that we have more in the way of proposals for the solution.

I'm certain we can do either, and I think I'd like to hear a little more direction from the group.

Do we stay with this mostly solution-free problem presentation, or do we add more about layout delegations and other NEPS proposed ideas?

garth

On Feb 22, 2004, at 7:48 PM, Talpey, Thomas wrote:
> What is "EDA" (slide 3)? Spell it out, whatever it is! :-)
>
> I wouldn't include "FCIP" on slide 5. Does anyone use it,
> especially for storage?
>
> I think there should be another slide after slide 2, which
> drills down at least a little into the ideas that have been
> discussed. It's a good place to establish the context of
> the presentation - as it is there isn't really any "proposal".
> I'm thinking maybe a single bullet for some or all of the
> whitepapers at NEPS?
>
> Tom.
>
> At 06:44 PM 2/22/2004, Garth Gibson wrote:
>> I understand the sensitivity of IETF to having solutions presented
>> instead of problems.
>>
>> Here is a revision following the recommendations below, with some word
>> choices of my own. Specifically, I'm reluctant to say nothing about
>> how NFSv4 is better for fixing this than NFSv3; that is, the definition
>> of NFSv4 creates the opportunity for "direct" or "out-of-band" access.
>> >> Page 5: the two lines in question, >> >> - NFSv4, relative to NFSv3, has enhanced client side optimizations >> - NFSv4 minor extensions may suffice for incremental functionality >> >> Page 6: last section: >> >> Much interest in exploring NFSv4 extensions to meet scalability needs >> - Extend NFSv4 "delegations" to provide "layout" information to >> clients > >> - Clients use "layout" to directly access storage, avoiding >> single-server bottleneck >> - NFS, SCSI Block, and SCSI Object "layout" formats all discussed >> - Support for multiple "layout" formats desirable (and looks doable) >> >> Dave, how is this? >> >> garth >> >> >> On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: >> >>> I have some suggestions on slides 5 and 6. >>> >>> I would drop the line about delegations from this slide. Unless >>> you come to this from the sorts of discussions we have been having >>> (and thus aren't the critical part of the audience), this is >>> really not going to be understandable. One problem that we have >>> in presenting this is that if we explain the situation, we wind up >>> having to explain that we think we pretty much know how to do this >>> already and just need the IETF to bless our choice (I'm exaggerating >>> but only some), and that isn't likely to go down very well with a >>> lot of people. >>> >>> I'd express the last sub-bullet in this section as something like: >>> >>> NFSv4 minor version model a good way to provide incremental >>> extensions >>> >>> which doesn't say that we know pretty much what these are (but it >>> doesn't say we don't :-) >>> >>> As to the last section of slide 6, I'd revise to be something like >>> the following, again to reduce the we-know-how-to-do-this tone. 
>>> >>> Much interest in exploring how v4 could be extended to solve >>> this >>> >>> Extension of delegations to provide "layout" information >>> to clients >>> >>> Clients use layout information to do IO and avoid >>> single-server bottleneck >>> >>> NFS, SCSI Block, SCSI Object layout formats all discussed >>> >>> Support for multiple formats looks desirable (and doable). >>> >>> -----Original Message----- >>> From: Garth Gibson [ mailto:garth@panasas.com >>> ] >>> Sent: Friday, February 20, 2004 7:22 PM >>> To: pnfs-reqs@yahoogroups.com >>> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started >>> >>> >>> Based on feedback from Brent's concall 8 days ago, here is my cut at >>> Gary's proposal for a short problem statement introduction >>> presentation. >>> >>> garth Yahoo! Groups Links Attachment (not stored) pNFS-intro-2-22.2.ppt Type: application/octet-stream From mclarty3@llnl.gov Mon Feb 23 11:15:50 2004 Return-Path: X-Sender: mclarty3@llnl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 52894 invoked from network); 23 Feb 2004 19:15:47 -0000 Received: from unknown (66.218.66.172) by m12.grp.scd.yahoo.com with QMQP; 23 Feb 2004 19:15:47 -0000 Received: from unknown (HELO smtp-4.llnl.gov) (128.115.41.84) by mta4.grp.scd.yahoo.com with SMTP; 23 Feb 2004 19:15:47 -0000 Received: from poptop.llnl.gov (localhost [127.0.0.1]) by smtp-4.llnl.gov (8.12.3p2-20030917/8.12.3/LLNL evision: 1.13 $) with ESMTP id i1NJETNU015799; Mon, 23 Feb 2004 11:15:46 -0800 (PST) Received: from POLARBEAR.llnl.gov ([134.9.18.59] verified) by poptop.llnl.gov (CommuniGate Pro SMTP 4.0.6) with ESMTP id 36870990; Mon, 23 Feb 2004 11:15:23 -0800 Message-Id: <5.0.0.25.2.20040223110657.027281a0@poptop.llnl.gov> X-Sender: e002801@poptop.llnl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.0 Date: Mon, 23 Feb 2004 11:15:22 -0800 To: pnfs-reqs@yahoogroups.com, pnfs-reqs@yahoogroups.com In-Reply-To: References: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> 
<6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-eGroups-Remote-IP: 128.115.41.84
From: Tyce McLarty
Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started
X-Yahoo-Group-Post: member; u=169320772
X-Yahoo-Profile: mclarty3

At 08:52 PM 2/22/2004 -0500, Garth Gibson wrote:
>Okay, I think I'm in tune with all of Dave Noveck's comments and the
>first two of these comments from Tom.
>
>The third comment, calling for more description of the ideas that were
>in the NEPS workshop, has me confused. I think Dave has been
>encouraging that the proposal have less in the way of what the IETF
>should do to solve this problem, and I read this request as suggesting
>that we have more in the way of proposals for the solution.
>
>I'm certain we can do either, and I think I'd like to hear a little
>more direction from the group.
>
>Do we stay with this mostly solution free problem presentation, or do
>we add more about layout delegations and other NEPS proposed ideas?

I think it depends on how we intend to use the "presentation". I expect two scenarios, neither of which involves the IETF:

1. I catch a higher-level manager or VIP in the hall for a couple of minutes: He gets slide 2 first. If he shows any interest at all, give him the rest of slide 7. That's all he needs to know and he will almost surely lose interest if I try to tell him more.

2. Discussion with technical peer: Again start with slide 2 information. Expect that he will show enough interest to want it all. Have to use some judgement here.

In case 2, I think being ready with what we are thinking about for a solution is on target. A technical type will find supporting and problem areas that we really want to find out about.

You asked for opinions.

Tyce

>garth
>
>
>On Feb 22, 2004, at 7:48 PM, Talpey, Thomas wrote:
>
> > What is "EDA" (slide 3)? Spell it out, whatever it is! :-)
> >
> > I wouldn't include "FCIP" on slide 5.
Does anyone use it, > > especially for storage? > > > > I think there should be another slide after slide 2, which > > drills down at least a little into the ideas that have been > > discussed. It's a good place to establish the context of > > the presentation - as it is there isn't really any "proposal". > > I'm thinking maybe a single bullet for some or all of the > > whitepapers at NEPS? > > > > Tom. > > > > At 06:44 PM 2/22/2004, Garth Gibson wrote: > >> I understand the sensitivity of IETF to having solutions presented > >> instead of problems. > >> > >> Here is a revision following the recommendations below, with some word > >> choices of my own. Specifically, I'm reluctant to say nothing about > >> how NFSv4 is better for fixing this than NFSv3; that is, the > >> definition > > > >> of NFSv4 creates the opportunity for "direct" or "out-of-band" access. > >> > >> Page 5: the two lines in question, > >> > >> - NFSv4, relative to NFSv3, has enhanced client side optimizations > >> - NFSv4 minor extensions may suffice for incremental functionality > >> > >> Page 6: last section: > >> > >> Much interest in exploring NFSv4 extensions to meet scalability needs > >> - Extend NFSv4 "delegations" to provide "layout" information to > >> clients > > > >> - Clients use "layout" to directly access storage, avoiding > >> single-server bottleneck > >> - NFS, SCSI Block, and SCSI Object "layout" formats all discussed > >> - Support for multiple "layout" formats desirable (and looks doable) > >> > >> Dave, how is this? > >> > >> garth > >> > >> > >> On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: > >> > >>> I have some suggestions on slides 5 and 6. > >>> > >>> I would drop the line about delegations from this slide. Unless > >>> you come to this from the sorts of discussions we have been having > >>> (and thus aren't the critical part of the audience), this is > >>> really not going to be understandable. 
One problem that we have > >>> in presenting this is that if we explain the situation, we wind up > >>> having to explain that we think we pretty much know how to do this > >>> already and just need the IETF to bless our choice (I'm exaggerating > >>> but only some), and that isn't likely to go down very well with a > >>> lot of people. > >>> > >>> I'd express the last sub-bullet in this section as something like: > >>> > >>> NFSv4 minor version model a good way to provide incremental > >>> extensions > >>> > >>> which doesn't say that we know pretty much what these are (but it > >>> doesn't say we don't :-) > >>> > >>> As to the last section of slide 6, I'd revise to be something like > >>> the following, again to reduce the we-know-how-to-do-this tone. > >>> > >>> Much interest in exploring how v4 could be extended to solve > >>> this > >>> > >>> Extension of delegations to provide "layout" information > >>> to clients > >>> > >>> Clients use layout information to do IO and avoid > >>> single-server bottleneck > >>> > >>> NFS, SCSI Block, SCSI Object layout formats all discussed > >>> > >>> Support for multiple formats looks desirable (and doable). > >>> > >>> -----Original Message----- > >>> From: Garth Gibson [ mailto:garth@panasas.com > >>> ] > >>> Sent: Friday, February 20, 2004 7:22 PM > >>> To: pnfs-reqs@yahoogroups.com > >>> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started > >>> > >>> > >>> Based on feedback from Brent's concall 8 days ago, here is my cut at > >>> Gary's proposal for a short problem statement introduction > >>> presentation. > >>> > >>> garth > > > > > >Yahoo! 
Groups Links
> >

From Thomas.Talpey@netapp.com Mon Feb 23 12:00:05 2004
Return-Path:
X-Sender: Thomas.Talpey@netapp.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 2658 invoked from network); 23 Feb 2004 20:00:02 -0000
Received: from unknown (66.218.66.172) by m7.grp.scd.yahoo.com with QMQP; 23 Feb 2004 20:00:02 -0000
Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 23 Feb 2004 20:00:01 -0000
Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1NK00JC014888 for ; Mon, 23 Feb 2004 12:00:00 -0800 (PST)
Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1NK00DU004861 for ; Mon, 23 Feb 2004 12:00:00 -0800 (PST)
Received: from tmt.netapp.com ([10.97.6.37]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 23 Feb 2004 14:59:58 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3FA47.9CFDB300"
X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0
content-class: urn:content-classes:message
Date: Mon, 23 Feb 2004 11:59:51 -0800
Message-ID: <6.0.3.0.0.20040223145154.01c24c68@silver.nane.netapp.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [pnfs-reqs] RE: NEPS-REQS: getting started
Thread-Index: AcP6R504DgBYzoRWQRWi2hKoiYvDNQ==
To:
X-eGroups-Remote-IP: 198.95.226.53
From: "Talpey, Thomas"
Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started
X-Yahoo-Group-Post: member; u=44154239
X-Yahoo-Profile: tmtymailu

Garth - sorry for the delay in responding. Connectathon calls.

I don't mean to suggest presenting the NEPS ideas as task items. What I did mean was to use them to set some context, and show some diverse interests which pNFS is in the process of bringing together under the NFSv4 tent.
The reason to bring them up early in the talk is because it's very abstract without them. Most folks are seeing this for the first time, they're going to want to know roughly what it's all about. It's an important message to the IETF as well - there are multiple viewpoints, there is discussion, there is unified desire to move it to the open IETF forum. Definitely don't drill down on them - just the basic components of each. At least, this is my idea - comments? Tom. At 08:52 PM 2/22/2004, Garth Gibson wrote: >Okay, I think I'm in tune with all of Dave Noveck's comments and the >first two of these comments from Tom. > >The third comment, calling for more description of the ideas that were >in the NEPS workshop, has me confused. I think Dave has been >encouraging that the proposal have less in the way of what the IETF >should do to solve this problem, and I read this request as suggesting >that we have more in the way of proposals for the solution. > >I'm certain we can do either, and I think I'd like to hear a little >more direction from the group. > >Do we stay with this mostly solution free problem presentation, or do >we add more about layout delegations and other NEPS proposed ideas? > >garth > > >On Feb 22, 2004, at 7:48 PM, Talpey, Thomas wrote: > >> What is "EDA" (slide 3)? Spell it out, whatever it is! :-) >> >> I wouldn't include "FCIP" on slide 5. Does anyone use it, >> especially for storage? >> >> I think there should be another slide after slide 2, which >> drills down at least a little into the ideas that have been >> discussed. It's a good place to establish the context of >> the presentation - as it is there isn't really any "proposal". >> I'm thinking maybe a single bullet for some or all of the >> whitepapers at NEPS? >> >> Tom. >> >> At 06:44 PM 2/22/2004, Garth Gibson wrote: >>> I understand the sensitivity of IETF to having solutions presented >>> instead of problems. 
>>> >>> Here is a revision following the recommendations below, with some word >>> choices of my own. Specifically, I'm reluctant to say nothing about >>> how NFSv4 is better for fixing this than NFSv3; that is, the >>> definition >> >>> of NFSv4 creates the opportunity for "direct" or "out-of-band" access. >>> >>> Page 5: the two lines in question, >>> >>> - NFSv4, relative to NFSv3, has enhanced client side optimizations >>> - NFSv4 minor extensions may suffice for incremental functionality >>> >>> Page 6: last section: >>> >>> Much interest in exploring NFSv4 extensions to meet scalability needs >>> - Extend NFSv4 "delegations" to provide "layout" information to >>> clients >> >>> - Clients use "layout" to directly access storage, avoiding >>> single-server bottleneck >>> - NFS, SCSI Block, and SCSI Object "layout" formats all discussed >>> - Support for multiple "layout" formats desirable (and looks doable) >>> >>> Dave, how is this? >>> >>> garth >>> >>> >>> On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: >>> >>>> I have some suggestions on slides 5 and 6. >>>> >>>> I would drop the line about delegations from this slide. Unless >>>> you come to this from the sorts of discussions we have been having >>>> (and thus aren't the critical part of the audience), this is >>>> really not going to be understandable. One problem that we have >>>> in presenting this is that if we explain the situation, we wind up >>>> having to explain that we think we pretty much know how to do this >>>> already and just need the IETF to bless our choice (I'm exaggerating >>>> but only some), and that isn't likely to go down very well with a >>>> lot of people. 
>>>> >>>> I'd express the last sub-bullet in this section as something like: >>>> >>>> NFSv4 minor version model a good way to provide incremental >>>> extensions >>>> >>>> which doesn't say that we know pretty much what these are (but it >>>> doesn't say we don't :-) >>>> >>>> As to the last section of slide 6, I'd revise to be something like >>>> the following, again to reduce the we-know-how-to-do-this tone. >>>> >>>> Much interest in exploring how v4 could be extended to solve >>>> this >>>> >>>> Extension of delegations to provide "layout" information >>>> to clients >>>> >>>> Clients use layout information to do IO and avoid >>>> single-server bottleneck >>>> >>>> NFS, SCSI Block, SCSI Object layout formats all discussed >>>> >>>> Support for multiple formats looks desirable (and doable). >>>> >>>> -----Original Message----- >>>> From: Garth Gibson [ mailto:garth@panasas.com >>>> ] >>>> Sent: Friday, February 20, 2004 7:22 PM >>>> To: pnfs-reqs@yahoogroups.com >>>> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started >>>> >>>> >>>> Based on feedback from Brent's concall 8 days ago, here is my cut at >>>> Gary's proposal for a short problem statement introduction >>>> presentation. >>>> >>>> garth > > > >------------------------ Yahoo! Groups Sponsor ---------------------~--> >Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark >Printer at MyInks.com. Free s/h on orders $50 or more to the US & Canada. >http://www.c1tracking.com/l.asp?cid=5511 >http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM >---------------------------------------------------------------------~-> > > >Yahoo! Groups Links > ><*> To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > ><*> To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > ><*> Your use of Yahoo! 
Groups is subject to: > http://docs.yahoo.com/info/terms/ > > > From garth@panasas.com Thu Feb 26 08:07:43 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 48095 invoked from network); 26 Feb 2004 16:07:13 -0000 Received: from unknown (66.218.66.167) by m14.grp.scd.yahoo.com with QMQP; 26 Feb 2004 16:07:13 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 26 Feb 2004 16:07:12 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FWFX67WT; Thu, 26 Feb 2004 11:07:11 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: Content-Type: text/plain To: pnfs-reqs@yahoogroups.com Date: Thu, 26 Feb 2004 11:07:05 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: concall now to finalize problem statement slides for Seoul if possible X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson From Thomas.Talpey@netapp.com Thu Feb 26 14:32:10 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 10494 invoked from network); 26 Feb 2004 22:32:08 -0000 Received: from unknown (66.218.66.166) by m16.grp.scd.yahoo.com with QMQP; 26 Feb 2004 22:32:08 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 26 Feb 2004 22:32:08 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i1QMW8JC015683 for ; Thu, 26 Feb 2004 14:32:08 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i1QMVlDY029216 for ; Thu, 26 Feb 2004 14:32:07 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.35]) by silver.hq.netapp.com 
with Microsoft SMTPSVC(5.0.2195.5329); Thu, 26 Feb 2004 17:31:43 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3FCB8.4F3BB180" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Thu, 26 Feb 2004 14:31:38 -0800 Message-ID: <6.0.3.0.2.20040226172850.01f42ec0@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] concall now to finalize problem statement slides for Seoul if possible Thread-Index: AcP8uE/Rtfws08/2T2CcDBLrAGtqlA== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] concall now to finalize problem statement slides for Seoul if possible X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Garth - sorry I couldn't join the call from the Far East. Could you please send the minutes and the latest version so I can comment? I'm ready to deliver the presentation, in any case. Tom. At 11:07 AM 2/26/2004, Garth Gibson wrote: > > > > >Yahoo! Groups Links > ><*> To visit your group on the web, go to: > http://groups.yahoo.com/group/pnfs-reqs/ > ><*> To unsubscribe from this group, send an email to: > pnfs-reqs-unsubscribe@yahoogroups.com > ><*> Your use of Yahoo! 
Groups is subject to:
> http://docs.yahoo.com/info/terms/
>

From garth@panasas.com Thu Feb 26 18:36:42 2004
Return-Path:
X-Sender: garth@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 83484 invoked from network); 27 Feb 2004 02:36:42 -0000
Received: from unknown (66.218.66.172) by m10.grp.scd.yahoo.com with QMQP; 27 Feb 2004 02:36:42 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 27 Feb 2004 02:36:41 -0000
Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FWFX6066; Thu, 26 Feb 2004 21:36:39 -0500
Mime-Version: 1.0 (Apple Message framework v612)
In-Reply-To: <6.0.3.0.2.20040226172850.01f42ec0@silver.nane.netapp.com>
References: <6.0.3.0.2.20040226172850.01f42ec0@silver.nane.netapp.com>
Content-Type: multipart/mixed; boundary=Apple-Mail-37-757939893
Message-Id:
Date: Thu, 26 Feb 2004 21:36:27 -0500
To: pnfs-reqs@yahoogroups.com
X-Mailer: Apple Mail (2.612)
X-eGroups-Remote-IP: 65.194.124.178
From: Garth Gibson
Subject: Re: [pnfs-reqs] concall now to finalize problem statement slides for Seoul if possible
X-Yahoo-Group-Post: member; u=169457820
X-Yahoo-Profile: garth_a_gibson

Tom and others

In this morning's call Julian Satran, Peter Corbett, Benny Halevy and I discussed how to give Tom something more concrete to cut the level of abstraction without giving the appearance that we are providing a solution and not describing a problem.

Here is a new draft, unchanged except that I added a slide after slide 2 that shows a "Now vs Goal" pair of diagrams of storage, NFS server and client and says "Now: requested data moves through NFS server" and "Goal: reply from NFS server enables parallel access to storage servers."

Tom. Use it or not. Modify it as needed. You are the speaker :-)

garth

On Feb 26, 2004, at 5:31 PM, Talpey, Thomas wrote:
> Garth - sorry I couldn't join the call from the Far East.
> Could you please send the minutes and the latest > version so I can comment? I'm ready to deliver the > presentation, in any case. > > Tom. Attachment (not stored) pNFS-intro-2-26.ppt Type: application/vnd.ms-powerpoint From garth@panasas.com Thu Feb 26 18:57:52 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 71470 invoked from network); 27 Feb 2004 02:57:51 -0000 Received: from unknown (66.218.66.166) by m17.grp.scd.yahoo.com with QMQP; 27 Feb 2004 02:57:51 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 27 Feb 2004 02:57:51 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id FWFX609T; Thu, 26 Feb 2004 21:57:49 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <5.0.0.25.2.20040223110657.027281a0@poptop.llnl.gov> References: <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> <6.0.3.0.0.20040222194310.01b7da00@silver.nane.netapp.com> <5.0.0.25.2.20040223110657.027281a0@poptop.llnl.gov> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Date: Thu, 26 Feb 2004 21:57:41 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson ADVERTISEMENT Thanks Tyce. It is good that these slides are useful for internal and technical conversations as well as the IETF. I hope the additional slide helps with the technical conversations too. garth On Feb 23, 2004, at 2:15 PM, Tyce McLarty wrote: > At 08:52 PM 2/22/2004 -0500, Garth Gibson wrote: >> Okay, I think I'm in tune with all of Dave Noveck's comments and the >> first two of these comments from Tom. 
>> >> The third comment, calling for more description of the ideas that were >> in the NEPS workshop, has me confused. I think Dave has been >> encouraging that the proposal have less in the way of what the IETF >> should do to solve this problem, and I read this request as suggesting >> that we have more in the way of proposals for the solution. >> >> I'm certain we can do either, and I think I'd like to hear a little >> more direction from the group. >> >> Do we stay with this mostly solution free problem presentation, or do >> we add more about layout delegations and other NEPS proposed ideas? > > I think it depends on how we intend to use the "presentation". I > expect two > scenarios, neither of which involves the IETF: > > 1. I catch a higher level manager or VIP in the hall for a couple > minutes: > He gets slide 2 first. If he shows any interest at all give him the > rest of > slide 7. That's all he needs to know and he will almost surely lose > interest if I try to tell him more. > > 2. Discussion with technical peer: Again start with slide 2 > information. > Expect that he will show enough interest to want it all. Have to use > some > judgement here. > > In case 2, I think being ready with what we are thinking about for a > solution is on target. A technical type will find supporting an problem > areas that we really want to find out about. > > You asked for opinions. > > Tyce > > >> garth >> >> >> On Feb 22, 2004, at 7:48 PM, Talpey, Thomas wrote: >> >>> What is "EDA" (slide 3)? Spell it out, whatever it is! :-) >>> >>> I wouldn't include "FCIP" on slide 5. Does anyone use it, >>> especially for storage? >>> >>> I think there should be another slide after slide 2, which >>> drills down at least a little into the ideas that have been >>> discussed. It's a good place to establish the context of >>> the presentation - as it is there isn't really any "proposal". >>> I'm thinking maybe a single bullet for some or all of the >>> whitepapers at NEPS? 
>>> >>> Tom. >>> >>> At 06:44 PM 2/22/2004, Garth Gibson wrote: >>>> I understand the sensitivity of IETF to having solutions presented >>>> instead of problems. >>>> >>>> Here is a revision following the recommendations below, with some >>>> word >>>> choices of my own. Specifically, I'm reluctant to say nothing about >>>> how NFSv4 is better for fixing this than NFSv3; that is, the >>>> definition >>> >>>> of NFSv4 creates the opportunity for "direct" or "out-of-band" >>>> access. >>>> >>>> Page 5: the two lines in question, >>>> >>>> - NFSv4, relative to NFSv3, has enhanced client side optimizations >>>> - NFSv4 minor extensions may suffice for incremental functionality >>>> >>>> Page 6: last section: >>>> >>>> Much interest in exploring NFSv4 extensions to meet scalability >>>> needs >>>> - Extend NFSv4 "delegations" to provide "layout" information to >>>> clients >>> >>>> - Clients use "layout" to directly access storage, avoiding >>>> single-server bottleneck >>>> - NFS, SCSI Block, and SCSI Object "layout" formats all discussed >>>> - Support for multiple "layout" formats desirable (and looks doable) >>>> >>>> Dave, how is this? >>>> >>>> garth >>>> >>>> >>>> On Feb 22, 2004, at 1:22 PM, Noveck, Dave wrote: >>>> >>>>> I have some suggestions on slides 5 and 6. >>>>> >>>>> I would drop the line about delegations from this slide. Unless >>>>> you come to this from the sorts of discussions we have been having >>>>> (and thus aren't the critical part of the audience), this is >>>>> really not going to be understandable. One problem that we have >>>>> in presenting this is that if we explain the situation, we wind up >>>>> having to explain that we think we pretty much know how to do this >>>>> already and just need the IETF to bless our choice (I'm >>>>> exaggerating >>>>> but only some), and that isn't likely to go down very well with a >>>>> lot of people. 
>>>>> >>>>> I'd express the last sub-bullet in this section as something like: >>>>> >>>>> NFSv4 minor version model a good way to provide incremental >>>>> extensions >>>>> >>>>> which doesn't say that we know pretty much what these are (but it >>>>> doesn't say we don't :-) >>>>> >>>>> As to the last section of slide 6, I'd revise to be something like >>>>> the following, again to reduce the we-know-how-to-do-this tone. >>>>> >>>>> Much interest in exploring how v4 could be extended to solve >>>>> this >>>>> >>>>> Extension of delegations to provide "layout" information >>>>> to clients >>>>> >>>>> Clients use layout information to do IO and avoid >>>>> single-server bottleneck >>>>> >>>>> NFS, SCSI Block, SCSI Object layout formats all discussed >>>>> >>>>> Support for multiple formats looks desirable (and >>>>> doable). >>>>> >>>>> -----Original Message----- >>>>> From: Garth Gibson [ mailto:garth@panasas.com >>>>> ] >>>>> Sent: Friday, February 20, 2004 7:22 PM >>>>> To: pnfs-reqs@yahoogroups.com >>>>> Subject: Re: [pnfs-reqs] RE: NEPS-REQS: getting started >>>>> >>>>> >>>>> Based on feedback from Brent's concall 8 days ago, here is my cut >>>>> at >>>>> Gary's proposal for a short problem statement introduction >>>>> presentation. >>>>> >>>>> garth >> >> >> >> >> >> Yahoo! Groups Links >> >> >> >> >> >> > > > > ------------------------ Yahoo! Groups Sponsor > ---------------------~--> > Buy Ink Cartridges or Refill Kits for your HP, Epson, Canon or Lexmark > Printer at MyInks.com. Free s/h on orders $50 or more to the US & > Canada. > http://www.c1tracking.com/l.asp?cid=5511 > http://us.click.yahoo.com/mOAaAA/3exGAA/qnsNAA/W6uqlB/TM > --------------------------------------------------------------------- > ~-> > > > Yahoo! 
Groups Links > > > > From Thomas.Talpey@netapp.com Sun Feb 29 20:24:09 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 37218 invoked from network); 1 Mar 2004 04:24:08 -0000 Received: from unknown (66.218.66.167) by m3.grp.scd.yahoo.com with QMQP; 1 Mar 2004 04:24:08 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 1 Mar 2004 04:24:08 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i214O8JC008573 for ; Sun, 29 Feb 2004 20:24:08 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i214O78L027043 for ; Sun, 29 Feb 2004 20:24:07 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.31]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Sun, 29 Feb 2004 23:23:57 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C3FF45.03516C80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Sun, 29 Feb 2004 20:23:56 -0800 Message-ID: <6.0.3.0.2.20040229231754.01b88ec0@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] concall now to finalize problem statement slides for Seoul if possible Thread-Index: AcP/RQO41PpGg+75T2KWjKxeFLsw/g== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] concall now to finalize problem statement slides for Seoul if possible X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu I like the new version better. It roots the pNFS solution in reality pretty well. Kind of the extended version of the elevator pitch. I might try to split the longer slides. The WG meeting is Thursday Korean time - Wednesday US. Tom. 
At 09:36 PM 2/26/2004, Garth Gibson wrote: >Tom and others > >In this mornings call Julian Satran, Peter Corbett, Benny Halevy and I >discussed how to give Tom something more concrete to cut the level of >abstraction without giving the appearance that we are providing a >solution and not describing a problem. > >Here is a new draft, unchanged except that I added a slide after slide >2 that shows a "Now vs Goal" pair of diagrams of storage, NFS server >and client and says "Now: requested data moves through NFS server" and >"Goal: reply from NFS server enables parallel access to storage >servers." > >Tom. Use it or not. Modify it as needed. You are the speaker :-) > >garth > > >On Feb 26, 2004, at 5:31 PM, Talpey, Thomas wrote: >> Garth - sorry I couldn't join the call from the Far East. >> Could you please send the minutes and the latest >> version so I can comment? I'm ready to deliver the >> presentation, in any case. >> >> Tom.
From garth@panasas.com Wed Mar 03 09:05:37 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 59921 invoked from network); 3 Mar 2004 17:05:33 -0000 Received: from unknown (66.218.66.167) by m12.grp.scd.yahoo.com with QMQP; 3 Mar 2004 17:05:33 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 3 Mar 2004 17:05:33 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GFV6TLLQ; Wed, 3 Mar 2004 12:05:09 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Cc: Garth Gibson Date: Wed, 3 Mar 2004 12:05:00 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: concall tomorrow cancelled X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Sometime in the next 24 hours our problem statement gets presented to the IETF NFS TWG. Hey, cool, we delivered on that goal! Next week let's hear what Tom, Julian and David have to say about the reception. And let's get back to the core requirements document next week - review the items that are on it ... ie, things like:
1.0 Minimalism
1.1 Proxying
1.1.0 Legacy proxying
1.1.1 Strict proxying
1.1.2 Functional proxying
1.2 Cache consistency
1.3 Delegation promotion & reacquisition
1.4 Layout delegations
1.5 Concurrent write
1.6 Map revocation
1.7 Separability
1.8 NTFS application semantics
etc. 
garth From andros@citi.umich.edu Wed Mar 03 10:27:04 2004 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 38659 invoked from network); 3 Mar 2004 18:27:02 -0000 Received: from unknown (66.218.66.218) by m8.grp.scd.yahoo.com with QMQP; 3 Mar 2004 18:27:02 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta3.grp.scd.yahoo.com with SMTP; 3 Mar 2004 18:27:02 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id ECB22207EB; Wed, 3 Mar 2004 13:26:39 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-reqs@yahoogroups.com Cc: andros@citi.umich.edu In-reply-to: Your message of "Wed, 03 Mar 2004 12:05:00 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 03 Mar 2004 13:26:39 -0500 Message-Id: <20040303182640.ECB22207EB@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: Re: [pnfs-reqs] concall tomorrow cancelled X-Yahoo-Group-Post: member; u=169434965 hi garth where can i get a copy of the core requirements document? does it have any place holder for the 'single OPEN for a group of clients, meta data scaling' issue i brought up with you and harvey? -->Andy > Sometime in the next 24 hours our problem statement gets presented to > the IETF NFS TWG. Hey, cool, we delivered on that goal! Next week > lets hear what Tom, Julian and David have to say about the reception. > > And lets get back to the core requirements document next week - review > the items that are on it ... ie, things like: > > 1.0 Minimalism > 1.1 Proxying > 1.1.0 Legacy proxying > 1.1.1 Strict proxying > 1.1.2 Functional proxying > 1.2 Cache consistency > 1.3 Delegation promotion & reacquisition > 1.4 Layout delegations > 1.5 Concurrent write > 1.6 Map revocation > 1.7 Separability > 1.8 NTFS application semantics > > etc. > > garth
From garth@panasas.com Wed Mar 03 10:33:59 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 51877 invoked from network); 3 Mar 2004 18:33:58 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 3 Mar 2004 18:33:58 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 3 Mar 2004 18:33:58 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GFV6TL8V; Wed, 3 Mar 2004 13:33:56 -0500 In-Reply-To: <20040303182640.ECB22207EB@citi.umich.edu> References: <20040303182640.ECB22207EB@citi.umich.edu> Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <4F64D15E-6D41-11D8-A101-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Cc: Garth Gibson Date: Wed, 3 Mar 2004 13:33:47 -0500 To: Andy Adamson , pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] concall tomorrow cancelled X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Andy, there is not a core requirements document -- this document is the work item we need to get back to -- we have items that belong in it that have been described and discussed in email on this reflector -- let's add "group open" to that list and discuss it next week garth On Mar 3, 2004, at 1:26 PM, William A.(Andy) Adamson wrote: > hi garth > > where can i get a copy of the core requirements document? does it have > any > place holder for the 'single OPEN for a group of clients, meta data > scaling' > issue i brought up with you and harvey? > > -->Andy > >> Sometime in the next 24 hours our problem statement gets presented to >> the IETF NFS TWG. Hey, cool, we delivered on that goal! 
Next week >> lets hear what Tom, Julian and David have to say about the reception. >> >> And lets get back to the core requirements document next week - review >> the items that are on it ... ie, things like: >> >> 1.0 Minimalism >> 1.1 Proxying >> 1.1.0 Legacy proxying >> 1.1.1 Strict proxying >> 1.1.2 Functional proxying >> 1.2 Cache consistency >> 1.3 Delegation promotion & reacquisition >> 1.4 Layout delegations >> 1.5 Concurrent write >> 1.6 Map revocation >> 1.7 Separability >> 1.8 NTFS application semantics >> >> etc. >> >> garth
From mclarty3@llnl.gov Wed Mar 03 16:30:19 2004 Return-Path: X-Sender: mclarty3@llnl.gov X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 13572 invoked from network); 4 Mar 2004 00:30:16 -0000 Received: from unknown (66.218.66.216) by m13.grp.scd.yahoo.com with QMQP; 4 Mar 2004 00:30:16 -0000 Received: from unknown (HELO smtp-1.llnl.gov) (128.115.250.81) by mta1.grp.scd.yahoo.com with SMTP; 4 Mar 2004 00:30:16 -0000 Received: from poptop.llnl.gov (localhost [127.0.0.1]) by smtp-1.llnl.gov (8.12.3p2-20030917/8.12.3/LLNL evision: 1.13 $) with ESMTP id i240U3mm020251 for ; Wed, 3 Mar 2004 16:30:04 -0800 (PST) Received: from POLARBEAR.llnl.gov ([134.9.18.59] verified) by poptop.llnl.gov (CommuniGate Pro SMTP 4.0.6) with ESMTP id 37710231 for pnfs-reqs@yahoogroups.com; Wed, 03 Mar 2004 16:30:03 -0800 Message-Id: <5.0.0.25.2.20040303162808.0276deb0@poptop.llnl.gov> X-Sender: e002801@poptop.llnl.gov X-Mailer: QUALCOMM Windows Eudora Version 5.0 Date: Wed, 03 Mar 2004 16:30:03 -0800 To: pnfs-reqs@yahoogroups.com In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-eGroups-Remote-IP: 128.115.250.81 From: Tyce McLarty Subject: Re: [pnfs-reqs] concall tomorrow cancelled X-Yahoo-Group-Post: member; u=169320772 X-Yahoo-Profile: mclarty3 Garth, Didn't we say something in December about another 
face-to-face meeting at the time of the FAST'04 conference? I do not remember seeing anything more definite. Is that still on? Thanks, Tyce At 12:05 PM 3/3/2004 -0500, you wrote: >Sometime in the next 24 hours our problem statement gets presented to >the IETF NFS TWG. Hey, cool, we delivered on that goal! Next week >lets hear what Tom, Julian and David have to say about the reception. > >And lets get back to the core requirements document next week - review >the items that are on it ... ie, things like: > >1.0 Minimalism >1.1 Proxying >1.1.0 Legacy proxying >1.1.1 Strict proxying >1.1.2 Functional proxying >1.2 Cache consistency >1.3 Delegation promotion & reacquisition >1.4 Layout delegations >1.5 Concurrent write >1.6 Map revocation >1.7 Separability >1.8 NTFS application semantics > >etc. > >garth
From Thomas.Talpey@netapp.com Wed Mar 03 17:16:58 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 28052 invoked from network); 4 Mar 2004 01:16:56 -0000 Received: from unknown (66.218.66.172) by m20.grp.scd.yahoo.com with QMQP; 4 Mar 2004 01:16:56 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta4.grp.scd.yahoo.com with SMTP; 4 Mar 2004 01:16:56 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i241GtJC026219 for ; Wed, 3 Mar 2004 17:16:56 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i241Gt8N013164 for ; Wed, 3 Mar 2004 17:16:55 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.36]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 3 Mar 2004 20:16:45 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C40186.5BC37C80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: 
urn:content-classes:message Date: Wed, 3 Mar 2004 17:16:31 -0800 Message-ID: <6.0.3.0.2.20040303201511.01eba6d8@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] concall tomorrow cancelled Thread-Index: AcQBhl1nhCX47ptYSnazBJeLzm1jWA== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Re: [pnfs-reqs] concall tomorrow cancelled X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Good point - I've already put this in the slides. Someone let me know in the next 2.5 hours if that's not the case! BTW I'll send the updated slides in a second... Tom. At 07:30 PM 3/3/2004, Tyce McLarty wrote: >Garth, > >Didn't we say something in December about another face-to-face meeting at >the time of the FAST'04 conference? I do not remember seeing anything more >definite. Is that still on? > >Thanks, >Tyce > >At 12:05 PM 3/3/2004 -0500, you wrote: >>Sometime in the next 24 hours our problem statement gets presented to >>the IETF NFS TWG. Hey, cool, we delivered on that goal! Next week >>lets hear what Tom, Julian and David have to say about the reception. >> >>And lets get back to the core requirements document next week - review >>the items that are on it ... ie, things like: >> >>1.0 Minimalism >>1.1 Proxying >>1.1.0 Legacy proxying >>1.1.1 Strict proxying >>1.1.2 Functional proxying >>1.2 Cache consistency >>1.3 Delegation promotion & reacquisition >>1.4 Layout delegations >>1.5 Concurrent write >>1.6 Map revocation >>1.7 Separability >>1.8 NTFS application semantics >> >>etc. >> >>garth
From peter-yahoo@honeyman.org Wed Mar 03 17:20:44 2004 Return-Path: X-Sender: peter-yahoo@honeyman.org X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 3642 invoked from network); 4 Mar 2004 01:20:44 -0000 Received: from unknown (66.218.66.172) by m10.grp.scd.yahoo.com with QMQP; 4 Mar 2004 01:20:44 -0000 Received: from unknown (HELO n6.grp.scd.yahoo.com) (66.218.66.90) by mta4.grp.scd.yahoo.com with SMTP; 4 Mar 2004 01:20:43 -0000 Received: from [66.218.67.179] by n6.grp.scd.yahoo.com with NNFMP; 04 Mar 2004 01:20:43 -0000 Date: Thu, 04 Mar 2004 01:20:43 -0000 To: pnfs-reqs@yahoogroups.com Message-ID: In-Reply-To: <5.0.0.25.2.20040303162808.0276deb0@poptop.llnl.gov> User-Agent: eGroups-EW/0.82 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Length: 281 X-Mailer: Yahoo Groups Message Poster X-eGroups-Remote-IP: 66.218.66.90 From: "peterhoneyman" X-Originating-IP: 68.248.9.51 Subject: Re: concall tomorrow cancelled X-Yahoo-Group-Post: member; u=117991698 X-Yahoo-Profile: peterhoneyman yes, we have a room reserved in the conference hotel for the morning of march 31. peter > Didn't we say something in December about another face-to-face meeting at > the time of the FAST'04 conference? I do not remember seeing anything more > definite. Is that still on? 
From garth@panasas.com Wed Mar 03 17:27:22 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 37476 invoked from network); 4 Mar 2004 01:27:22 -0000 Received: from unknown (66.218.66.216) by m7.grp.scd.yahoo.com with QMQP; 4 Mar 2004 01:27:22 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 4 Mar 2004 01:27:17 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GFV6T3BW; Wed, 3 Mar 2004 20:27:13 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: References: Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <0B9C112B-6D7B-11D8-A101-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Date: Wed, 3 Mar 2004 20:27:05 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] Re: concall tomorrow cancelled X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson 3/31 (Wed) from 8:30am - 12:00pm in the Dolores room at the Grand Hyatt, in San Francisco -- same hotel as the NSDI and FAST conferences thanks to Peter for arranging this garth On Mar 3, 2004, at 8:20 PM, peterhoneyman wrote: > yes, we have a room reserved in the conference hotel for the morning > of march 31. > > peter > >> Didn't we say something in December about another face-to-face >> meeting at >> the time of the FAST'04 conference? I do not remember seeing anything >> more >> definite. Is that still on? 
> From Thomas.Talpey@netapp.com Wed Mar 03 17:49:53 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 41657 invoked from network); 4 Mar 2004 01:49:51 -0000 Received: from unknown (66.218.66.166) by m9.grp.scd.yahoo.com with QMQP; 4 Mar 2004 01:49:51 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta5.grp.scd.yahoo.com with SMTP; 4 Mar 2004 01:49:52 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i241npJC001618 for ; Wed, 3 Mar 2004 17:49:52 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i241np8L023439 for ; Wed, 3 Mar 2004 17:49:51 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.36]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 3 Mar 2004 20:49:39 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C4018A.F45BFB80" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Wed, 3 Mar 2004 17:48:51 -0800 Message-ID: <6.0.3.0.2.20040303204633.01c62ec0@silver.nane.netapp.com> X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: IETF-59 presentation updated draft Thread-Index: AcQBivg0h9qI3oqJRKStEnDSL6Ptzw== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: IETF-59 presentation updated draft X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Here's the latest version. The content is largely the same but I simplified a couple of areas and jazzed up the picture a little. Also I added a "what" slide to the front by adapting stuff from the back. The WG meeting is in two hours so sorry for the late appearance. I'll check for comments regularly until then. Tom. 
Attachment (not stored) pNFS-intro-ietf59.ppt Type: application/octet-stream From Thomas.Talpey@netapp.com Wed Mar 03 23:12:00 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 86040 invoked from network); 4 Mar 2004 07:12:00 -0000 Received: from unknown (66.218.66.217) by m14.grp.scd.yahoo.com with QMQP; 4 Mar 2004 07:12:00 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta2.grp.scd.yahoo.com with SMTP; 4 Mar 2004 07:11:59 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i247BlJC010083 for ; Wed, 3 Mar 2004 23:11:47 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i247Bl8L013242 for ; Wed, 3 Mar 2004 23:11:47 -0800 (PST) Received: from tmt.netapp.com ([10.97.6.30]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Thu, 4 Mar 2004 02:11:40 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C401B7.F0929E00" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: urn:content-classes:message Date: Wed, 3 Mar 2004 23:11:22 -0800 Message-ID: <6.0.3.0.2.20040304020718.01c55ec0@silver.nane.netapp.com> X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: Source for today's presentation Thread-Index: AcQBt/Ndrsk+aGmQQCODwZ9eFak9QA== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Source for today's presentation X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Here's the final powerpoint source. The presentation went well, though the important message conduit is the minutes/proceedings. I think I wrote the "elevator pitch" for them. :-) Tom. 
Attachment (not stored) pNFS-intro-ietf59-final.ppt Type: application/octet-stream From julian_satran@il.ibm.com Thu Mar 04 00:56:52 2004 Return-Path: X-Sender: Julian_Satran@il.ibm.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 55119 invoked from network); 4 Mar 2004 08:56:51 -0000 Received: from unknown (66.218.66.166) by m16.grp.scd.yahoo.com with QMQP; 4 Mar 2004 08:56:51 -0000 Received: from unknown (HELO mtagate1.de.ibm.com) (195.212.29.150) by mta5.grp.scd.yahoo.com with SMTP; 4 Mar 2004 08:56:50 -0000 Received: from d12relay01.megacenter.de.ibm.com (d12relay01.megacenter.de.ibm.com [9.149.165.180]) by mtagate1.de.ibm.com (8.12.10/8.12.10) with ESMTP id i248tQpS103912; Thu, 4 Mar 2004 08:55:27 GMT Received: from d12ml102.megacenter.de.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay01.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i248tTOQ289120; Thu, 4 Mar 2004 09:55:30 +0100 In-Reply-To: To: pnfs-reqs@yahoogroups.com Cc: Garth Gibson , pnfs-reqs@yahoogroups.com MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.5.1 January 21, 2004 Message-ID: Date: Thu, 4 Mar 2004 17:55:24 +0900 X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 04/03/2004 10:55:25, Serialize complete at 04/03/2004 10:55:25 Content-Type: text/plain; charset="US-ASCII" X-eGroups-Remote-IP: 195.212.29.150 From: Julian Satran Subject: Re: [pnfs-reqs] concall tomorrow cancelled X-Yahoo-Group-Post: member; u=64714603 X-Yahoo-Profile: julian_satran The room was not very full (and that is a euphemism). I tried to bring in the area director (Jon Petersen) and told him briefly what it's all about. He did not come but I met him again after the session and reiterated my position. As Jon is trying to close whatever groups he can - we will have to persuade him that NFSV4 is not yet done and this activity can't be pursued as a tiny addition that can be reviewed by 2 experts (that is what they do after closing WGs). 
David did not show up. Tom will have all the details. He chaired the WG for Beepy (who is sick). We proposed to make pNFS an official work item of the WG, but given the very sparse attendance in the room (other sessions were in this state too) this has to be brought to the mailing list and it has to generate some traffic (which I assume it will). Jon also complained about the lack of volunteers to write drafts. I assume he was not referring only to NFS, though. Regards, Julo Garth Gibson 04/03/2004 02:05 Please respond to pnfs-reqs To pnfs-reqs@yahoogroups.com cc Garth Gibson Subject [pnfs-reqs] concall tomorrow cancelled Sometime in the next 24 hours our problem statement gets presented to the IETF NFS TWG. Hey, cool, we delivered on that goal! Next week lets hear what Tom, Julian and David have to say about the reception. And lets get back to the core requirements document next week - review the items that are on it ... ie, things like: 1.0 Minimalism 1.1 Proxying 1.1.0 Legacy proxying 1.1.1 Strict proxying 1.1.2 Functional proxying 1.2 Cache consistency 1.3 Delegation promotion & reacquisition 1.4 Layout delegations 1.5 Concurrent write 1.6 Map revocation 1.7 Separability 1.8 NTFS application semantics etc. garth
From black_david@emc.com Thu Mar 04 03:00:54 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 96985 invoked from network); 4 Mar 2004 11:00:52 -0000 Received: from unknown (66.218.66.172) by m14.grp.scd.yahoo.com with QMQP; 4 Mar 2004 11:00:52 -0000 Received: from unknown (HELO srexchimc2.eng.emc.com) (168.159.100.11) by mta4.grp.scd.yahoo.com with SMTP; 4 Mar 2004 11:00:51 -0000 Received: from MAHO3MSX2.corp.emc.com (maho3msx2.isus.emc.com [128.221.11.32]) by srexchimc2.eng.emc.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id GD38HLZM; Thu, 4 Mar 2004 06:00:05 -0500 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Thu, 4 Mar 2004 06:00:05 -0500 Message-ID: X-Sybari-Trust: 03b6d71b 1d8c424f 578b0cff 0000013d To: pnfs-reqs@yahoogroups.com Date: Thu, 4 Mar 2004 06:00:04 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 168.159.100.11 From: black_david@emc.com Subject: Seoul results (was: concall tomorrow cancelled) X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237 Julian writes: > The room was not very full (and that is an eufemism). I tried to bring in > the area director (Jon Petersen) and told him briefly what it's all about. > He did not come but I met him again after the session and reiterated my > position. > As Jon is trying to close whatever groups he can - we will have to > persuade him that NFSV4 is not yet done and this activity can't be pursued > as a tiny addition that can be reviewed by 2 experts (that is what they do > after closing WGs). David did not show-up. Tom will have all the details. > He chaired the WG for Beepy (who is sick). My lack of attendance was a brain-fault on my part, for which I sincerely apologize. 
I talked to Jon both before and after the meeting - Jon's ok with adding pNFS to the nfsv4 WG charter, even considering the overall desire to close out long-running WGs like nfsv4. That's good news, but keep in mind that this is today's answer, and ADs always reserve the right to change their minds. > We proposed to make pNFS an official work-item of the WG but given the > completely minor presence in the room (other session where in this state > too) this has to be brought to the mailing list and it has to generate > some traffic (which I assume it will). Attendance from the US is down across the board here in Seoul. pNFS does need to be followed up on the mailing list and then the chairs (Beepy and Spencer) will need to work out the addition with Jon. So far, so good. Thanks, --David ---------------------------------------------------- David L. Black, Senior Technologist EMC Corporation, 176 South St., Hopkinton, MA 01748 +1 (508) 293-7953 FAX: +1 (508) 293-7786 black_david@emc.com Mobile: +1 (978) 394-7754 ---------------------------------------------------- From Brian.Pawlowski@netapp.com Thu Mar 04 03:29:54 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 87650 invoked from network); 4 Mar 2004 11:29:53 -0000 Received: from unknown (66.218.66.218) by m5.grp.scd.yahoo.com with QMQP; 4 Mar 2004 11:29:53 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 4 Mar 2004 11:29:53 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i24BTdJC014388 for ; Thu, 4 Mar 2004 03:29:39 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i24BTd8L022129 for ; Thu, 4 Mar 2004 03:29:39 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.7p1+Sun/8.11.6) id i24BTcH14038; 
Thu, 4 Mar 2004 03:29:38 -0800 (PST) Message-Id: <200403041129.i24BTcH14038@tooting-fe.eng.netapp.com> In-Reply-To: from "black_david@emc.com" at "Mar 4, 4 06:00:04 am" To: pnfs-reqs@yahoogroups.com Date: Thu, 4 Mar 2004 03:29:38 -0800 (PST) Cc: pnfs-reqs@yahoogroups.com X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: Re: [pnfs-reqs] Seoul results (was: concall tomorrow cancelled) X-Yahoo-Group-Post: member; u=169504717 X-Yahoo-Profile: brianpawlowski Most of the people that actually do real work on NFS V4 (implementations and even spec work) do not attend IETF. Such is life. I'm still returning from the dead. > Julian writes: > > > The room was not very full (and that is an eufemism). I tried to bring in > > the area director (Jon Petersen) and told him briefly what it's all about. > > > He did not come but I met him again after the session and reiterated my > > position. > > As Jon is trying to close whatever groups he can - we will have to > > persuade him that NFSV4 is not yet done and this activity can't be pursued > > > as a tiny addition that can be reviewed by 2 experts (that is what they do > > > after closing WGs). David did not show-up. Tom will have all the details. > > He chaired the WG for Beepy (who is sick). > > My lack of attendance was a brain-fault on my part, for which I sincerely > apologize. I talked to Jon both before and after the meeting - Jon's ok > with adding pNFS to the nfsv4 WG charter, even considering the overall > desire to close out long-running WGs like nfsv4. That's good news, but > keep in mind that this is today's answer, and ADs always reserve the right > to change their minds. 
> > > We proposed to make pNFS an official work-item of the WG but given the > > completely minor presence in the room (other session where in this state > > too) this has to be brought to the mailing list and it has to generate > > some traffic (which I assume it will). > > Attendance from the US is down across the board here in Seoul. pNFS does > need to be followed up on the mailing list and then the chairs (Beepy and > Spencer) will need to work out the addition with Jon. So far, so good. > > Thanks, > --David > ---------------------------------------------------- > David L. Black, Senior Technologist > EMC Corporation, 176 South St., Hopkinton, MA 01748 > +1 (508) 293-7953 FAX: +1 (508) 293-7786 > black_david@emc.com Mobile: +1 (978) 394-7754 > ----------------------------------------------------
From Thomas.Talpey@netapp.com Thu Mar 04 03:36:45 2004 Return-Path: X-Sender: Thomas.Talpey@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 30803 invoked from network); 4 Mar 2004 11:36:44 -0000 Received: from unknown (66.218.66.218) by m10.grp.scd.yahoo.com with QMQP; 4 Mar 2004 11:36:44 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta3.grp.scd.yahoo.com with SMTP; 4 Mar 2004 11:36:44 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i24BW8JC015149 for ; Thu, 4 Mar 2004 03:32:08 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.10.22.171]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i24BW88L023248 for ; Thu, 4 Mar 2004 03:32:08 -0800 (PST) Received: from tmt.netapp.com ([10.58.52.57]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Thu, 4 Mar 2004 06:32:00 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C401DC.4ED17800" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 content-class: 
urn:content-classes:message Date: Thu, 4 Mar 2004 03:31:42 -0800 Message-ID: <6.0.3.0.2.20040304062612.01b91708@silver.nane.netapp.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Seoul results/status Thread-Index: AcQB3E+j2XIdnixkSGWV5n7Qy6UwSg== To: X-eGroups-Remote-IP: 198.95.226.53 From: "Talpey, Thomas" Subject: Seoul results/status X-Yahoo-Group-Post: member; u=44154239 X-Yahoo-Profile: tmtymailu Julian's message about summed up the meeting itself, the attendance was disappointing, but not unexpected. The key thing is that we have put out the message, and the AD, and the NFS community, are aware of it. I too spoke with Jon before the meeting, and he did say that he was unlikely to attend but was open to the message. I sent the slides in pdf format to the nfsv4 reflector along with the agenda/proceedings. They're stuck in the moderator queue due to the size (only 50K) but should appear soon. So, folks will see them. I suggest a concall sometime next week to discuss next steps. Maybe just use our regular Thursday slot? It's important to start some buzz before the 3/31 BOF. Tom. 
From garth@panasas.com Thu Mar 04 08:06:06 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 90649 invoked from network); 4 Mar 2004 16:06:03 -0000 Received: from unknown (66.218.66.166) by m2.grp.scd.yahoo.com with QMQP; 4 Mar 2004 16:06:03 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 4 Mar 2004 16:06:03 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GFV6TQPD; Thu, 4 Mar 2004 11:05:39 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: <6.0.3.0.2.20040304062612.01b91708@silver.nane.netapp.com> References: <6.0.3.0.2.20040304062612.01b91708@silver.nane.netapp.com> Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Date: Thu, 4 Mar 2004 11:05:30 -0500 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: [pnfs-reqs] Seoul results/status X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Yes, the regular Thurs 8am PST, 11am EST concall can be used for this. If anyone has lost the dialin numbers, please send me a note requesting them. Thanks garth On Mar 4, 2004, at 6:31 AM, Talpey, Thomas wrote: > I suggest a concall sometime next week to discuss next > steps. Maybe just use our regular Thursday slot? It's > important to start some buzz before the 3/31 BOF. 
From garth@panasas.com Thu Mar 11 00:08:10 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 96609 invoked from network); 11 Mar 2004 08:08:07 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 11 Mar 2004 08:08:07 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 11 Mar 2004 08:08:08 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GT4QJN26; Thu, 11 Mar 2004 03:08:07 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <343800F0-7333-11D8-BDD5-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Cc: Garth Gibson Date: Thu, 11 Mar 2004 00:07:56 -0800 To: pnfs-reqs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: reminder: concall at 8am PST, 11am EST Thursday (today) X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Tentative agenda:
- Seoul debrief
- making plans for FAST BOF
- review requirements items in flight

garth

From andros@citi.umich.edu Wed Mar 17 09:45:57 2004 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 59954 invoked from network); 17 Mar 2004 17:43:41 -0000 Received: from unknown (66.218.66.166) by m13.grp.scd.yahoo.com with QMQP; 17 Mar 2004 17:43:41 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta5.grp.scd.yahoo.com with SMTP; 17 Mar 2004 17:43:41 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id B805420F71; Wed, 17 Mar 2004 12:43:40 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-reqs@yahoogroups.com Cc: pnfs-ops@yahoogroups.com, andros@citi.umich.edu
Mime-Version: 1.0 Content-Type: multipart/mixed ; boundary="==_Exmh_5501136620" Date: Wed, 17 Mar 2004 12:43:40 -0500 Message-Id: <20040317174340.B805420F71@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: pNFS, MPIO, and client group open X-Yahoo-Group-Post: member; u=169434965

Sorry for the long email :)

At the conclusion of the NEPS conference last November, Brent Welch emailed his notes as a starting point for a requirements document (attached). I use his pNFS extension language to describe a pNFS client using a 'normal open' servicing an open/write/close with direct access, and a large MPIO application using a proposed 'group open'.

I note that my knowledge of parallel filesystems is growing, so please excuse any misconceptions, comments welcome...

The architecture I'm picturing is a large cluster with a Parallel File System (PFS) consisting of PFS metadata servers (PFS MD) and PFS NAS/SAN. I know it's only one of many architectures the pNFS set of extensions is trying to address.

1000's of pNFS clients
10's of pNFSd, one per PFS MD
100's NAS/SAN

'Normal' open
********************
a) pNFS client issues a compound to one pNFSd consisting of:
OPEN with share: Access/Deny
    Multiple pNFSds need to resolve share.
DELEG_ASK: Request Byte-range Delegation
    Multiple pNFSds need to resolve delegation
READ/WRITE_IND request direct data access
    pNFSd queries PFS MD to get location map

b) pNFS client can then issue READ/WRITE directly to the NAS/SAN using the map returned in READ/WRITE_IND.

c) pNFS client issues a compound to one pNFSd consisting of:
COMMIT_IND:
CLOSE

An MPIO application opens one very large file, shared by 1000's of compute clients. Each compute client manipulates its portion of the file. The MPIO layer manages compute clients so that no client shares a byte range of the file with another.
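As a rough illustration of the 'normal' open and the non-overlapping byte ranges described above, here is a minimal Python sketch. It is not part of the original proposal: the ByteRange type, the partition helper, and the even split of the file are assumptions made for illustration. Only the operation names (OPEN, DELEG_ASK, WRITE_IND, COMMIT_IND, CLOSE) come from the message. The sketch hands each compute client a disjoint range and counts the operations the pNFSds would see if every client did its own normal open for a write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical description of one client's portion of the file."""
    offset: int
    length: int


def partition(file_size, n_clients):
    """Split [0, file_size) into n_clients disjoint, contiguous ranges.

    An even split is an assumption; a real MPIO layer may divide the
    file differently.
    """
    base, extra = divmod(file_size, n_clients)
    ranges = []
    offset = 0
    for rank in range(n_clients):
        # The first `extra` clients absorb the remainder, one byte each.
        length = base + (1 if rank < extra else 0)
        ranges.append(ByteRange(offset, length))
        offset += length
    return ranges


# The two compounds of a 'normal' open for a write, as listed in the message.
OPEN_COMPOUND = ("OPEN", "DELEG_ASK", "WRITE_IND")
CLOSE_COMPOUND = ("COMMIT_IND", "CLOSE")


def normal_open_ops(n_clients):
    """Operations sent to the pNFSds when every client opens on its own."""
    return n_clients * (len(OPEN_COMPOUND) + len(CLOSE_COMPOUND))
```

With 1000 compute clients this model gives 5000 operations per write phase, each of which may require the pNFSds to coordinate among themselves.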
This MPIO application consists of
 - supervisor code running on 1 MPIO supervisor node
 - compute code running on 1000's of MPIO compute nodes

This MPIO application has cyclic behavior.
I) Read initial data
II) compute intermediate result
III) wait for other compute nodes to finish computing
IV) all compute nodes write to file (their portion)
V) compute nodes trade 'edge conditions'
VI) goto II (compute).

While the application is not in IV (writing), another application, say the visualizer, needs READ access to the file in order to crunch it for visualization. Visualization is needed to tell if the MPIO application intermediate results are converging on a solution.

If in step IV all the compute nodes open/write/close using the Normal open described above, the pNFSds will be doing a lot of metadata processing: resolving share and delegation state between themselves as well as delivering per byte-range layout info. The group open is designed to reduce the metadata processing from 1000's to one.

I mention a couple of new fcntls used by the MPIO layer to communicate pNFS state from the supervisor node to the compute nodes. Don't worry about that(!).

Do worry about this: is there anything stopping a compute node from using OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as described below? If so, are the changes to pNFS to make this work small enough to be considered at this time?

Group Open
**********
step IV: supervisor OPENs file, all compute clients write file, supervisor CLOSES file.

specifically:
a) supervisor issues a compound with
OPEN: Access - Both, Deny Both to a pNFSd
    - pNFSds need to resolve the share
    - is this a normal nfsv4 OPEN? does pNFSd or the PFS need to know about the other compute clients?
    - do we need the concept of a group clientid?
DELEG_ASK: supervisor asks for WRITE delegation which should be granted given the OPEN Access-Both, Deny-Both share.
    - pNFSds need to resolve delegation request
WRITE_IND: supervisor gets whole file layout info

b) supervisor calls
    fcntl(fd, GET_GRPOPEN, cookie_buf);
which returns the filehandle, stateid, and layout map from the supervisor pNFS.

c) the supervisor code passes filehandle, stateid, and layout map to each compute node which calls
    fcntl(fd, SET_GRPOPEN, cookie_buf);
the pNFS compute node client receives the filehandle, stateid, and layout map, then performs a local open (nothing need go across the wire), stuffing the filehandle, stateid, and layout map into its state tree just as if an across the wire OPEN/DELEG_ASK/WRITE_IND occurred.

d) compute clients use SET_GRPOPEN filehandle, stateid and map to directly write the data to the appropriate NAS/SAN
    - what besides the filehandle, stateid, and layout map is needed?
    - when done writing, each compute client issues a COMMIT_IND.

e) when compute clients have flushed all data back to the file, supervisor issues a compound with
CLOSE

Attachment (not stored) brent_welch_pnfs_ops Type: text/plain

From dnoveck@netapp.com Wed Mar 17 12:23:00 2004 Return-Path: X-Sender: Dave.Noveck@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 38995 invoked from network); 17 Mar 2004 20:22:57 -0000 Received: from unknown (66.218.66.217) by m9.grp.scd.yahoo.com with QMQP; 17 Mar 2004 20:22:57 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta2.grp.scd.yahoo.com with SMTP; 17 Mar 2004 20:22:59 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i2HKMvJC013738; Wed, 17 Mar 2004 12:22:57 -0800 (PST) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.57.156.135]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i2HKMn3G018277; Wed, 17 Mar 2004 12:22:56 -0800 (PST) content-class: urn:content-classes:message Date: Wed, 17 Mar 2004 11:42:55 -0800 MIME-Version: 1.0 Content-Type: text/plain;
charset="US-ASCII" Content-Transfer-Encoding: quoted-printable Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Thread-Topic: [pnfs-ops] pNFS, MPIO, and client group open Thread-Index: AcQMR7Z6ZGojAvNQQaq0OSeLkpqjuAAC6hBw To: , X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Noveck, Dave" From: "Noveck, Dave" Subject: RE: [pnfs-ops] pNFS, MPIO, and client group open X-Yahoo-Group-Post: member; u=44152831 X-Yahoo-Profile: davidnoveck

Andy wrote:
> Do worry about this: is there anything stopping a compute node from using
> OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as described
> below?

I'm going to say "No". I know this wasn't the answer that I gave at the conference call (and might not be the answer I give at the next conference call :-), but listen to my reasoning before you decide I'm crazy.

In order to resolve this issue it is necessary to get all philosophical and address the question "What is a computer?". I know lots of people have already hit delete but I hope somebody is still reading.

Suppose I have an application cluster with 1K nodes and I put on my marketing hat (Gee, I hope I don't need a marketing jacket and tie, too :-) and say "This is really a powerful computer with a thousand (maybe two thousand) CPU's". Now that's marketing bullshit but it isn't exactly false. There are certainly tasks where you want a large number of CPU's sharing memory and a DSM arrangement's performance is going to suck. On the other hand, there are applications where having a thousand memories is going to be much better than trying to provide adequate memory bandwidth from a single memory to many many CPU's.

So what's the point? I think the point is that as far as the NFS server is concerned, whether the computer that is talking to it is "really" a computer, i.e. it has CPU's sharing memory, or is only a computer qua marketing bullshit, i.e.
a collection of CPU's that don't share memory, that use other methods to co-ordinate common activities, doesn't matter. All the server sees are the requests made and if the cluster represents itself as a single machine (i.e. in V4 does a single SETCLIENTID or in v4.1 maintains many connections bound to a single session), it is one. The server doesn't see the cluster's memory architecture. It sees an open and then use of that stateid. The fact that it comes over a different IP address doesn't disqualify it. A server might have options to check that (as a matter of security) but it isn't part of the protocol and we already have clients with multiple IP addresses. Having a thousand of them is a difference of degree (and may pose implementation issues) but I don't see a real protocol issue.

OK. Now you can decide if I'm crazy.

From bhalevy@panasas.com Wed Mar 17 12:48:55 2004 Return-Path: X-Sender: bhalevy@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 35299 invoked from network); 17 Mar 2004 20:48:53 -0000 Received: from unknown (66.218.66.218) by m16.grp.scd.yahoo.com with QMQP; 17 Mar 2004 20:48:53 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 17 Mar 2004 20:48:52 -0000 Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Wed, 17 Mar 2004 15:48:51 -0500 Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38938@PIKES.panasas.com> To: "'pnfs-reqs@yahoogroups.com'" , pnfs-ops@yahoogroups.com Date: Wed, 17 Mar 2004 15:48:50 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-eGroups-Remote-IP: 65.194.124.178 From: "Halevy, Benny" Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open X-Yahoo-Group-Post: member; u=169276676 X-Yahoo-Profile: benny_halevy

I completely agree with Dave and I certainly don't think he's crazy.

I perceive this solution as a "clustered" implementation of an nfsv4 client in which the v4 drivers in the client cluster are cooperating and propagating state (e.g. file handles, stateids) among each other.

I believe that the server should not be able to distinguish such a client from a multi-homed client that may have several IP addresses.

In the nfsv4 sessions world a (clustered) client may open multiple connections to the server that are associated with the same session - this will make life for such a client even easier, I hope.
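The state propagation discussed in this thread can be sketched roughly as follows. This is a toy model, not an NFSv4 implementation: the Server class, the JSON cookie encoding, and the functions get_grpopen and set_grpopen are all assumptions made for illustration. The function names echo the GET_GRPOPEN and SET_GRPOPEN fcntl commands proposed earlier in the thread, which are proposals and not existing fcntl commands. The model shows one OPEN by a supervisor, followed by writes from many addresses that the server accepts because it checks only the filehandle and stateid.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server model that tracks open state by stateid only."""
    next_stateid: int = 1
    open_state: dict = field(default_factory=dict)  # stateid -> filehandle
    open_count: int = 0

    def open(self, path):
        """Handle one OPEN and return (filehandle, stateid)."""
        self.open_count += 1
        stateid = self.next_stateid
        self.next_stateid += 1
        filehandle = "fh:" + path
        self.open_state[stateid] = filehandle
        return filehandle, stateid

    def write_allowed(self, filehandle, stateid, source_ip):
        """Accept a write if the stateid matches the filehandle.

        source_ip is deliberately ignored, mirroring the argument that a
        different IP address does not disqualify a request.
        """
        return self.open_state.get(stateid) == filehandle


def get_grpopen(filehandle, stateid, layout):
    """Supervisor side: pack the shared state into an opaque cookie."""
    cookie = {"fh": filehandle, "stateid": stateid, "layout": layout}
    return json.dumps(cookie).encode()


def set_grpopen(cookie):
    """Compute-node side: install the shared state with no wire traffic."""
    state = json.loads(cookie.decode())
    return state["fh"], state["stateid"], state["layout"]
```

In this model the server's open count stays at one however many compute nodes install the cookie, which is the reduction in metadata processing the group open is after.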
Benny

From garth@panasas.com Wed Mar 17 21:17:49 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 7857 invoked from network); 18 Mar 2004 05:17:45 -0000 Received: from unknown (66.218.66.172) by m9.grp.scd.yahoo.com with QMQP; 18 Mar 2004 05:17:45 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 18 Mar 2004 05:17:47 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id GT4QKPM4; Thu, 18 Mar 2004 00:17:46 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: <90E9DAD9-789B-11D8-ACBD-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Thu, 18 Mar 2004 00:17:35 -0500 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Concall notes from last week's call (Mar 11 2004, 11am EST) X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

FAST BOF planning

We have the Dolores room from 9am to 12 noon on Wed 3/31. It will be set for 50 people, continental breakfast, LCD projector, screen and microphone. We have requested that this be extended until 2pm, which is when the FAST conference starts.

The 9am - 12 noon time slot overlaps the last two sessions of the NSDI conference 9-10:30 and 11-12:30 (agendas attached below). There is a special registration deal for FAST attendees to go to these Wed morning sessions for free, so we should expect many/most will be doing just that. Some of us even.

We plan to accommodate this conflict by moving the wide open (no FAST registration needed) BOF to the lunch period between NSDI and FAST, 12:30 - 2 pm on Wed. We will need to arrange lunch food, and *** HELP NEEDED HERE *** find funds for it.
The contents of this wide open BOF are educational and direction setting, not technical debates (though they might so become with an interesting audience). We propose to plan the 90 mins something like this:

Reprise Seoul IETF pitch: 20-30 mins (Tom would be best, but won't be there)

Then 3-6 10-min pitches that cover parts of the possible solutions and show the range of active contributors. This is to generally show common focus and the intention to make progress, but if they divide up the proposed multiple backends and the core ops, that would be great too. So far:
- Peter Corbett, NetApp
- Brent Welch, Panasas
- Peter Honeyman or Andy Adamson, CITI

If time is left, we suggest a panel session of all speakers for general Q&A.

I should note that I am the keynote speaker for FAST, which will be 9 am on Thurs 4/1. I intend to pitch this activity in a portion of my talk, so the entire FAST audience (I hope) will catch at least the highest level pitch.

=================================================

We need an advertisement plan for this BOF. Posters at FAST and NSDI, probably. Maybe an email from FAST to registered attendees? Email on the NFSv4 reflector. Email to SNIA NAS working group and SPEC SFS committees? Other suggestions?

=================================================

We do not plan to give back the Dolores room from 9 to 12:30. We think that we should use this for a working meeting of the pNFS participants. Not closed, per se, but announced only on these distribution lists. Seems like we should not waste an opportunity for face-to-face debate on NFSv4 extension operations, requirements text and backend metadata formats.

=================================================

Last NSDI sessions, Wed 3/31

9-10:30 am: Miguel Castro, chair

Session State: Beyond Soft State
Benjamin C. Ling, Emre Kiciman, and Armando Fox, Stanford University

Path-Based Failure and Evolution Management
Mike Y.
Chen, University of California, Berkeley; Anthony Accardi, Tellme; Emre Kiciman, Stanford University; Dave Patterson, University of California, Berkeley; Armando Fox, Stanford University; Eric Brewer, University of California, Berkeley

Consistent and Automatic Replica Regeneration
Haifeng Yu, Intel Research Pittsburgh and Carnegie Mellon University; Amin Vahdat, University of California, San Diego

11-12:30: Jeff Chase, chair

Total Recall: System Support for Automated Availability Management
Ranjita Bhagwan, Kiran Tati, Yu-Chung Cheng, Stefan Savage, and Geoffrey M. Voelker, University of California, San Diego

TimeLine: A High Performance Archive for a Distributed Object Store
Chuang-Hue Moh and Barbara Liskov, MIT Computer Science and Artificial Intelligence Laboratory

Explicit Control in the Batch-Aware Distributed File System
John Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Miron Livny, University of Wisconsin, Madison

==========================================

Seoul debrief (Garth's impressions from the reports of others)
- success -- area director Jon Peterson is now aware of the proposal to extend the charter of NFSv4 WG, and although in general IETF directors are discouraging long lived working groups, in this case he is receptive
- next steps are to work with WG chairs to persuade and assist their deliberation on this proposal -- we don't know how long this step is, but it is not short

garth

From dhildebz@eecs.umich.edu Fri Mar 19 09:21:25 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 99385 invoked from network); 19 Mar 2004 17:21:24 -0000 Received: from unknown (66.218.66.217) by m18.grp.scd.yahoo.com with QMQP; 19 Mar 2004 17:21:24 -0000 Received: from unknown (HELO willow.eecs.umich.edu) (141.213.4.14) by mta2.grp.scd.yahoo.com with SMTP; 19 Mar 2004 17:21:24 -0000 Received: from willow.eecs.umich.edu (localhost.eecs.umich.edu [127.0.0.1]) by
willow.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i2JHKtdm013621 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 19 Mar 2004 12:20:56 -0500 Received: from localhost (dhildebz@localhost) by willow.eecs.umich.edu (8.12.11/8.12.11/Submit) with ESMTP id i2JHKtSS013618; Fri, 19 Mar 2004 12:20:55 -0500 X-Authentication-Warning: willow.eecs.umich.edu: dhildebz owned process doing -bs Date: Fri, 19 Mar 2004 12:20:55 -0500 (EST) To: "'pnfs-reqs@yahoogroups.com'" Cc: pnfs-ops@yahoogroups.com In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D38938@PIKES.panasas.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE X-eGroups-Remote-IP: 141.213.4.14 From: Dean Hildebrand Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus

Hi Benny,

Did you mean,

'In the nfsv4 sessions world a (clustered) client may open simultaneous connections to servers associated with the same session'

or

'In the nfsv4 sessions world a (clustered) client may open multiple simultaneous connections to a server that is associated with the same session'

I'm assuming the first as I'm not even sure what the second one means...but I do not know a lot about sessions.

Dean

On Wed, 17 Mar 2004, Halevy, Benny wrote:

> I completely agree with Dave and I certainly don't think he's
> crazy.
>
> I perceive this solution as a "clustered" implementation of a
> nfsv4 client in which the v4 drivers in the client cluster are
> cooperating and propagating state (e.g. file handles, stateids)
> among each other.
>
> I believe that the server should not be able to distinguish
> such client from a multi-homed client that may have several
> ip addresses.
> > In the nfsv4 sessions world a (clustered) client may open > multiple connections to the server that are associated with > the same session - this will make life for such client even > easier, I hope. > > Benny > > >-----Original Message----- > >From: Noveck, Dave [mailto:dnoveck@netapp.com] > >Sent: Wednesday, March 17, 2004 2:43 PM > >To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com > >Subject: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open > > > > > >Andy wrote: > >> Do worry about this: is there anything stoping a compute node from > >using > >> OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as > >described > >> below? > > > >I'm going to say "No". I know this wasn't the answer that I > >gave at the > > > >conference call, (and might not be the answer I give at the next > >conference > >call :-), but listen to my reasoning before you decide I'm crazy. > > > >In order to resolve this issue it is necessary to get all philosophical > >and address the question "What is a computer?". I know lots of people > >have already hit delete but I hope somebody is still reading. > > > >Suppose I have an application cluster with 1K nodes and I put on my > >marketing > >hat (Gee, I hope I don't need a marketing jacket and tie, too :-) and > >say "This is really a powerful computer with a thousand (maybe two > >thousand) > >CPU's". Now that's marketing bullshit but it isn't exactly false. > >There > >are certainly tasks where you want a large number of CPU's sharing > >memory > >and a DSM arrangement's performance is going to suck. On the other > >hand, > >there are applications where having a thousand memories is going to be > >much > >better than trying to provide adequate memory bandwidth from a single > >memory > >to many many CPU's. > > > >So what's the point? I think the point is that as far as the > >NFS server > >is > >concerned, whether the computer that is talking to it is "really" a > >computer, > >i.e. 
it has CPU's sharing memory, or is only a computer qua marketing > >bullshit, > >i.e. a collection of cpu's that don't share memory, that use other > >methods to > >co-ordinate common activities, doesn't matter. All the server sees are > >the > >requests made and if the cluster represents itself as a single machine > >(i.e. > >in V4 does a single SETCLIENTID or in v4.1 maintains many connections > >bound to > >a single session), it is one. The server doesn't see the cluster's > >memory > >architecture. It sees an open and then use of that that stateid. The > >fact > >that it comes over a different IP address doesn't disqualify it. A > >server > >might have options to check that (as a matter of security) but it isn't > >part > >of the protocol and we already have clients with multiple IP addresses. > >Having > >a thousand of them is a difference of degree (and may pose > >implementation > >issues) but I don't see a real protocol issue. > > > >OK. Now you can decide if I'm crazy. > > > >-----Original Message----- > >From: William A.(Andy) Adamson [mailto:andros@citi.umich.edu] > >Sent: Wednesday, March 17, 2004 12:44 PM > >To: pnfs-reqs@yahoogroups.com > >Cc: pnfs-ops@yahoogroups.com; andros@citi.umich.edu > >Subject: [pnfs-ops] pNFS, MPIO, and client group open > > > > > >Sorry for the long email :) > > > >At the conclusion of the NEPS conference last November, Brent Welch > >emailed > >his notes as a starting point for a requirements document (attached). I > >use > >his pNFS extention language to describe a pNFS client using a 'normal > >open' > >servicing an open/write/close with direct access, and a large MPIO > >application > >using a proposed 'group open'. > > > >I note that my knowledge of parallel filesystems is growing, so please > >excuse > >any misconceptions, comments welcome... > > > >The architecture i'm picturing is a large cluster with a Parallel File > >System > >(PFS) > >consisting of PFS Meta data servers(PFS MD) and PFS NAS/SAN. 
I > >know it's > >only > >one of many architectures the pNFS set of extensions is trying to > >address. > > > >1000's of pNFS clients > >10's of pNFSd, one per PFS MD > >100's NAS/SAN > > > > > >'Normal' open > >******************** > >a) pNFS client issues a compound to one pNFSd consisting of: > >OPEN with share: Access/Deny > > Multiple pNFSds need to resolve share. > >DELEG_ASK: Request Byte-range Delegation > > Multiple pNFSds need to resolve delegation > >READ/WRITE_IND request direct data access > > pNFSd queries PFS MD to get location map > > > >b) pNFS client can then issue READ/WRITE directly to the NAS/SAN using > >the map > >returned in READ/WRITE_IND. > > > >c) pNFS client issues a compound to one pNFSd consisting of: > >COMMIT_IND: > >CLOSE > > > > > >An MPIO application opens one very large file, shared by 1000's of > >compute > >clients. Each compute client manipulates its portion of the file. The > >MPIO > >layer manages compute clients so that no client shares a byte range of > >the > >file with another. > > > >This MPIO application consists of > > - supervisor code running on 1 MPIO supervisor node > > - compute code running on 1000's of MPIO compute nodes > > > >This MPIO application has cyclic behavior. > >I) Read initial data > >II) compute intermediate result > >III) wait for other compute nodes to finish computing > >IV) all compute nodes write to file (their portion) > >V) compute nodes trade 'edge conditions' > >VI) goto II (compute). > > > >While the application is not in IV (writing), another application, say > >the > >visualizer, needs READ access to the file in order to crunch it for > >visualization. Visualization is needed to tell if the MPIO application > >intermediate results are converging on a solution. 
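The cyclic behavior quoted above, and the window in which the visualizer can read, can be sketched as a small state machine. This is only an illustration of the description in the email, not part of any pNFS proposal; the names `Phase`, `next_phase`, and `visualizer_may_read` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    """Steps I-V of the MPIO cycle; step VI is the jump from V back to II."""
    READ_INITIAL = "I"
    COMPUTE = "II"
    BARRIER = "III"
    WRITE = "IV"
    TRADE_EDGES = "V"


_ORDER = [Phase.READ_INITIAL, Phase.COMPUTE, Phase.BARRIER,
          Phase.WRITE, Phase.TRADE_EDGES]


def next_phase(phase):
    """Advance one step; after V the application goes back to II, not I."""
    if phase is Phase.TRADE_EDGES:
        return Phase.COMPUTE
    return _ORDER[_ORDER.index(phase) + 1]


def visualizer_may_read(phase):
    """The visualizer needs READ access whenever the writers are not in IV."""
    return phase is not Phase.WRITE


if __name__ == "__main__":
    phase = Phase.READ_INITIAL
    for _ in range(9):
        print(phase.value, "visualizer may read:", visualizer_may_read(phase))
        phase = next_phase(phase)
```

The point of the sketch is that the write phase is the only part of the cycle that excludes readers, which is why the open/close bracket around step IV matters in the discussion that follows.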
> >
> >If in step IV all the compute nodes open/write/close as described above
> >as the Normal open, the pNFSds will be doing a lot of metadata
> >processing: resolving share and delegation state between themselves as
> >well as delivering per byte-range layout info. The group open is
> >designed to reduce the metadata processing from 1000's to one.
> >
> >I mention a couple of new fcntls used by the MPIO layer to communicate
> >pNFS state from the supervisor node to the compute nodes. Don't worry
> >about that(!).
> >
> >Do worry about this: is there anything stopping a compute node from
> >using OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as
> >described below? If so, are the changes to pNFS to make this work small
> >enough to be considered at this time?
> >
> >Group Open
> >**********
> >step IV: supervisor OPENs file, all compute clients write file,
> >supervisor CLOSES file.
> >
> >specifically:
> >a) supervisor issues a compound with
> >OPEN: Access - Both, Deny Both to a pNFSd
> >    - pNFSds need to resolve the share
> >    - is this a normal nfsv4 OPEN? does pNFSd or the PFS need to know
> >      about the other compute clients?
> >    - do we need the concept of a group clientid?
> >DELEG_ASK: supervisor asks for WRITE delegation which should be
> >    granted given the OPEN Access-Both, Deny-Both share.
> >    - pNFSds need to resolve delegation request
> >WRITE_IND: supervisor gets whole file layout info
> >
> >b) supervisor calls
> >    fcntl(fd, GET_GRPOPEN, cookie_buf);
> >   which returns the filehandle, stateid, and layout map from the
> >   supervisor pNFS.
> >
> >c) the supervisor code passes filehandle, stateid, and layout map to
> >each compute node which calls
> >    fcntl(fd, SET_GRPOPEN, cookie_buf);
> >the pNFS compute node client receives the filehandle, stateid, and
> >layout map, and performs a local open (nothing need go across the
> >wire), stuffing the filehandle, stateid, and layout map into its state
> >tree just as if an across-the-wire OPEN/DELEG_ASK/WRITE_IND had
> >occurred.
> >
> >d) compute clients use SET_GRPOPEN filehandle, stateid and map to
> >directly write the data to the appropriate NAS/SAN
> >    - what besides the filehandle, stateid, and layout map is needed?
> >    - when done writing, each compute client issues a COMMIT_IND.
> >
> >e) when compute clients have flushed all data back to the file,
> >supervisor issues a compound with
> >
> >CLOSE
>
> ________________________________________________________________________________
> Yahoo! Groups Links
> * To visit your group on the web, go to:
>   http://groups.yahoo.com/group/pnfs-reqs/
>
> * To unsubscribe from this group, send an email to:
>   pnfs-reqs-unsubscribe@yahoogroups.com
>
> * Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
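The group open handoff quoted above (steps a through e) can be sketched as follows. This is a rough model under stated assumptions, not the proposed interface: GET_GRPOPEN and SET_GRPOPEN are fcntl commands suggested in the thread and do not exist in any real system, so they are modelled here as plain functions; the JSON cookie encoding, the `GroupOpenState` type, and the layout-map shape are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GroupOpenState:
    """State the supervisor obtains from one OPEN/DELEG_ASK/WRITE_IND compound."""
    filehandle: str
    stateid: str
    layout: dict  # hypothetical shape: byte-range label -> storage device name


def get_grpopen(state):
    """Supervisor side (stand-in for GET_GRPOPEN): pack state into a cookie."""
    return json.dumps(asdict(state)).encode()


def set_grpopen(cookie, state_table):
    """Compute-node side (stand-in for SET_GRPOPEN): install the state locally.

    Nothing goes across the wire to a pNFSd; the node behaves as if it had
    performed the OPEN/DELEG_ASK/WRITE_IND itself.
    """
    state = GroupOpenState(**json.loads(cookie.decode()))
    state_table[state.filehandle] = state
    return state


def open_compounds(n_nodes, group_open):
    """Count only the OPEN/DELEG_ASK/WRITE_IND compounds the pNFSds process.

    Per-node COMMIT_IND traffic is not counted; each compute client still
    issues one in either scheme.
    """
    return 1 if group_open else n_nodes


if __name__ == "__main__":
    supervisor_state = GroupOpenState(
        filehandle="fh-1",
        stateid="sid-1",
        layout={"0-1023": "nas-a", "1024-2047": "nas-b"},
    )
    cookie = get_grpopen(supervisor_state)
    node_tables = [dict() for _ in range(4)]  # one state table per compute node
    for table in node_tables:
        set_grpopen(cookie, table)
    print(open_compounds(1000, group_open=False), "vs",
          open_compounds(1000, group_open=True))
```

The sketch makes the open question in the thread concrete: every compute node ends up presenting the same filehandle and stateid, so the server sees one open used from many addresses.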
From bhalevy@panasas.com Fri Mar 19 11:32:41 2004
Return-Path:
X-Sender: bhalevy@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 44665 invoked from network); 19 Mar 2004 19:32:40 -0000
Received: from unknown (66.218.66.167) by m18.grp.scd.yahoo.com with QMQP; 19 Mar 2004 19:32:40 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 19 Mar 2004 19:32:39 -0000
Received: by PIKES.panasas.com with Internet Mail Service (5.5.2653.19) id ; Fri, 19 Mar 2004 14:32:35 -0500
Message-ID: <30489F1321F5C343ACF6872B2CF7942A05D38951@PIKES.panasas.com>
To: "'dhildebz@eecs.umich.edu'"
Cc: "'pnfs-ops@yahoogroups.com'" , "'pnfs-reqs@yahoogroups.com'"
Date: Fri, 19 Mar 2004 14:32:26 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-eGroups-Remote-IP: 65.194.124.178
From: "Halevy, Benny"
Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

What I meant was:
  In the nfsv4 sessions world a (clustered) client may open
  simultaneous connections to a server that are associated with
  the same session.

Simply put: one session can have multiple connections associated with it.

The proposed NFSv4 session model
(http://www.ietf.org/internet-drafts/draft-talpey-nfsv4-rdma-sess-01.txt)
has another abstraction, a channel, that needs to be thought of too.

My intuition is that for a clustered client, which I think of as a single
logical NFSv4 client (i.e. all nfsv4 client instances share the same
client id and state), we definitely want all connections to bind to the
same session. It makes a lot of sense in this architecture to have
separate operations channels and back channels for the client hosts,
one or more of each per host.
Assuming the client hosts do not share memory, it will be a burden on
them to manage the per-channel resource management state.

Benny

>-----Original Message-----
>From: Dean Hildebrand [mailto:dhildebz@eecs.umich.edu]
>Sent: Friday, March 19, 2004 12:21 PM
>To: 'pnfs-reqs@yahoogroups.com'
>Cc: pnfs-ops@yahoogroups.com
>Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open
>
>Hi Benny,
>Did you mean,
>  'In the nfsv4 sessions world a (clustered) client may open
>  simultaneous connections to servers associated with the same session'
>or
>  'In the nfsv4 sessions world a (clustered) client may open
>  multiple simultaneous connections to a server that is associated with
>  the same session'
>
>I'm assuming the first as I'm not even sure what the second one
>means...but I do not know a lot about sessions.
>Dean

From Thomas.Talpey@netapp.com Fri Mar 19 20:00:18 2004
Return-Path:
X-Sender: Thomas.Talpey@netapp.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 80343 invoked from network); 20 Mar 2004 04:00:16 -0000
Received: from unknown (66.218.66.167) by m17.grp.scd.yahoo.com with QMQP; 20 Mar 2004 04:00:16 -0000
Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 20 Mar 2004 04:00:16 -0000
Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i2K40FZh029459; Fri, 19 Mar 2004 20:00:15 -0800 (PST)
Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.57.156.135]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i2K404Tt010051; Fri, 19 Mar 2004 20:00:15 -0800 (PST)
Received: from tmt.netapp.com ([10.97.6.35]) by silver.hq.netapp.com with Microsoft SMTPSVC(5.0.2195.5329); Fri, 19 Mar 2004 22:59:53 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C40E2F.CC79C280"
X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0
content-class: urn:content-classes:message
Date: Fri, 19 Mar 2004 18:09:55 -0800
Message-ID: <6.0.3.0.2.20040319210159.01ec0508@silver.nane.netapp.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [pnfs-ops] pNFS, MPIO, and client group open
Thread-Index: AcQOL8zPrk1+CSVUSeO7GdDagJ+XWg==
To:
Cc:
X-eGroups-Remote-IP: 198.95.226.53
From: "Talpey, Thomas"
Subject: RE: [pnfs-ops] pNFS, MPIO, and client group open
X-Yahoo-Group-Post: member; u=44154239
X-Yahoo-Profile: tmtymailu

I can tell you what the v4/sessions world might say - both.

Currently the sessions proposal explores a single client opening
multiple connections to a given server, then binding them together
by a single session. This allows trunking, failover, etc. During
the call I described that there is another possibility, that
multiple servers can share a session.
This would allow the client to stripe, in a similar way that the one-client-one-server trunks. In fact, we could also consider multiple clients sharing a session, but that makes my head hurt at the moment. Basically, you can think of a session as a mount point, abstracted to the server. The client would generally create one for each new mount, and bind both operation and callback channels to it. In the pNFS case, the server would exchange topology with the client, which in turn would lead to additional (parallel) pipes to the data being created and bound by the client. The important thing is that the whole picture hinges on the scope of the clientid (or sessionid). When we talk about a server-to-server protocol to allow standard server pooling, we effectively are making a way for this scope to be distributed. BTW I think we should defer that... Tom. [Do we really need to cc pnfs-reqs on these? Is everyone on both?] At 12:20 PM 3/19/2004, Dean Hildebrand wrote: >Hi Benny, >Did you mean, > 'In the nfsv4 sessions world a (clustered) client may open > simultaneous connections to servers associated with the same session' >or > 'In the nfsv4 sessions world a (clustered) client may open > multiple simultaneous connections to a server that is associated with > the same session' > >I'm assuming the first as I'm not even sure what the second one >means...but I do not know a lot of about sessions. >Dean > >On Wed, 17 Mar 2004, Halevy, Benny wrote: > >> I completely agree with Dave and I certainly don't think he's >> crazy. >> >> I perceive this solution as a "clustered" implementation of a >> nfsv4 client in which the v4 drivers in the client cluster are >> cooperating and propagating state (e.g. file handles, stateids) >> among each other. >> >> I believe that the server should not be able to distinguish >> such client from a multi-homed client that may have several >> ip addresses. 
>> >> In the nfsv4 sessions world a (clustered) client may open >> multiple connections to the server that are associated with >> the same session - this will make life for such client even >> easier, I hope. >> >> Benny >> >> >-----Original Message----- >> >From: Noveck, Dave [mailto:dnoveck@netapp.com] >> >Sent: Wednesday, March 17, 2004 2:43 PM >> >To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com >> >Subject: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open >> > >> > >> >Andy wrote: >> >> Do worry about this: is there anything stoping a compute node from >> >using >> >> OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as >> >described >> >> below? >> > >> >I'm going to say "No". I know this wasn't the answer that I >> >gave at the >> > >> >conference call, (and might not be the answer I give at the next >> >conference >> >call :-), but listen to my reasoning before you decide I'm crazy. >> > >> >In order to resolve this issue it is necessary to get all philosophical >> >and address the question "What is a computer?". I know lots of people >> >have already hit delete but I hope somebody is still reading. >> > >> >Suppose I have an application cluster with 1K nodes and I put on my >> >marketing >> >hat (Gee, I hope I don't need a marketing jacket and tie, too :-) and >> >say "This is really a powerful computer with a thousand (maybe two >> >thousand) >> >CPU's". Now that's marketing bullshit but it isn't exactly false. >> >There >> >are certainly tasks where you want a large number of CPU's sharing >> >memory >> >and a DSM arrangement's performance is going to suck. On the other >> >hand, >> >there are applications where having a thousand memories is going to be >> >much >> >better than trying to provide adequate memory bandwidth from a single >> >memory >> >to many many CPU's. >> > >> >So what's the point? 
I think the point is that as far as the >> >NFS server >> >is >> >concerned, whether the computer that is talking to it is "really" a >> >computer, >> >i.e. it has CPU's sharing memory, or is only a computer qua marketing >> >bullshit, >> >i.e. a collection of cpu's that don't share memory, that use other >> >methods to >> >co-ordinate common activities, doesn't matter. All the server sees are >> >the >> >requests made and if the cluster represents itself as a single machine >> >(i.e. >> >in V4 does a single SETCLIENTID or in v4.1 maintains many connections >> >bound to >> >a single session), it is one. The server doesn't see the cluster's >> >memory >> >architecture. It sees an open and then use of that that stateid. The >> >fact >> >that it comes over a different IP address doesn't disqualify it. A >> >server >> >might have options to check that (as a matter of security) but it isn't >> >part >> >of the protocol and we already have clients with multiple IP addresses. >> >Having >> >a thousand of them is a difference of degree (and may pose >> >implementation >> >issues) but I don't see a real protocol issue. >> > >> >OK. Now you can decide if I'm crazy. >> > >> >-----Original Message----- >> >From: William A.(Andy) Adamson [mailto:andros@citi.umich.edu] >> >Sent: Wednesday, March 17, 2004 12:44 PM >> >To: pnfs-reqs@yahoogroups.com >> >Cc: pnfs-ops@yahoogroups.com; andros@citi.umich.edu >> >Subject: [pnfs-ops] pNFS, MPIO, and client group open >> > >> > >> >Sorry for the long email :) >> > >> >At the conclusion of the NEPS conference last November, Brent Welch >> >emailed >> >his notes as a starting point for a requirements document (attached). I >> >use >> >his pNFS extention language to describe a pNFS client using a 'normal >> >open' >> >servicing an open/write/close with direct access, and a large MPIO >> >application >> >using a proposed 'group open'. 
From julian_satran@il.ibm.com Sun Mar 21 15:10:05 2004
Return-Path:
X-Sender: Julian_Satran@il.ibm.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 77571 invoked from network); 21 Mar 2004 23:10:03 -0000
Received: from unknown (66.218.66.167) by m14.grp.scd.yahoo.com with QMQP; 21 Mar 2004 23:10:03 -0000
Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta6.grp.scd.yahoo.com with SMTP; 21 Mar 2004 23:09:55 -0000
Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196]) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id i2LN9ixJ130726; Sun, 21 Mar 2004 23:09:44 GMT
Received: from d12ml102.megacenter.de.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i2LN9jgh085230; Mon, 22 Mar 2004 00:09:46 +0100
In-Reply-To: <30489F1321F5C343ACF6872B2CF7942A05D38938@PIKES.panasas.com>
To: pnfs-ops@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com, "'pnfs-reqs@yahoogroups.com'"
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 6.5.1 January 21, 2004
Message-ID:
Date: Mon, 22 Mar 2004 01:11:48 +0200
X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 22/03/2004 01:11:58, Serialize complete at 22/03/2004 01:11:58
Content-Type: multipart/alternative; boundary="=_alternative 00608DF1C2256E5E_="
X-eGroups-Remote-IP: 195.212.29.152
From: Julian Satran
Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open
X-Yahoo-Group-Post: member; u=64714603
X-Yahoo-Profile: julian_satran

I am sure we all want to be aware of this "twist". It may be more than just putting a label on it - e.g., security protocols must be able to "delegate" credentials and that makes some bindings bad.

Julo

"Halevy, Benny"
17/03/04 22:48
Please respond to pnfs-ops
To: "'pnfs-reqs@yahoogroups.com'", pnfs-ops@yahoogroups.com
cc:
Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open

I completely agree with Dave and I certainly don't think he's crazy.

I perceive this solution as a "clustered" implementation of an nfsv4 client in which the v4 drivers in the client cluster are cooperating and propagating state (e.g. file handles, stateids) among each other. I believe that the server should not be able to distinguish such a client from a multi-homed client that may have several IP addresses.

In the nfsv4 sessions world a (clustered) client may open multiple connections to the server that are associated with the same session - this will make life for such a client even easier, I hope.

Benny

>-----Original Message-----
>From: Noveck, Dave [mailto:dnoveck@netapp.com]
>Sent: Wednesday, March 17, 2004 2:43 PM
>To: pnfs-ops@yahoogroups.com; pnfs-reqs@yahoogroups.com
>Subject: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open
>
>
>Andy wrote:
>> Do worry about this: is there anything stopping a compute node from using
>> OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as described
>> below?
>
>I'm going to say "No". I know this wasn't the answer that I gave at the
>conference call (and might not be the answer I give at the next conference
>call :-), but listen to my reasoning before you decide I'm crazy.
>
>In order to resolve this issue it is necessary to get all philosophical
>and address the question "What is a computer?". I know lots of people
>have already hit delete but I hope somebody is still reading.
>
>Suppose I have an application cluster with 1K nodes and I put on my marketing
>hat (Gee, I hope I don't need a marketing jacket and tie, too :-) and
>say "This is really a powerful computer with a thousand (maybe two thousand)
>CPU's". Now that's marketing bullshit but it isn't exactly false. There
>are certainly tasks where you want a large number of CPU's sharing memory
>and a DSM arrangement's performance is going to suck. On the other hand,
>there are applications where having a thousand memories is going to be much
>better than trying to provide adequate memory bandwidth from a single memory
>to many many CPU's.
>
>So what's the point? I think the point is that as far as the NFS server is
>concerned, whether the computer that is talking to it is "really" a computer,
>i.e. it has CPU's sharing memory, or is only a computer qua marketing
>bullshit, i.e. a collection of CPU's that don't share memory, that use other
>methods to co-ordinate common activities, doesn't matter. All the server sees
>are the requests made and if the cluster represents itself as a single machine
>(i.e. in V4 does a single SETCLIENTID or in v4.1 maintains many connections
>bound to a single session), it is one. The server doesn't see the cluster's
>memory architecture. It sees an open and then use of that stateid. The fact
>that it comes over a different IP address doesn't disqualify it. A server
>might have options to check that (as a matter of security) but it isn't part
>of the protocol and we already have clients with multiple IP addresses. Having
>a thousand of them is a difference of degree (and may pose implementation
>issues) but I don't see a real protocol issue.
>
>OK. Now you can decide if I'm crazy.
>
>-----Original Message-----
>From: William A.(Andy) Adamson [mailto:andros@citi.umich.edu]
>Sent: Wednesday, March 17, 2004 12:44 PM
>To: pnfs-reqs@yahoogroups.com
>Cc: pnfs-ops@yahoogroups.com; andros@citi.umich.edu
>Subject: [pnfs-ops] pNFS, MPIO, and client group open
>
>
>Sorry for the long email :)
>
>At the conclusion of the NEPS conference last November, Brent Welch emailed
>his notes as a starting point for a requirements document (attached).

From julian_satran@il.ibm.com Sun Mar 21 15:11:41 2004
Return-Path:
X-Sender: Julian_Satran@il.ibm.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 78562 invoked from network); 21 Mar 2004 23:11:41 -0000
Received: from unknown (66.218.66.172) by m15.grp.scd.yahoo.com with QMQP; 21 Mar 2004 23:11:41 -0000
Received: from unknown (HELO mtagate3.de.ibm.com) (195.212.29.152) by mta4.grp.scd.yahoo.com with SMTP; 21 Mar 2004 23:11:39 -0000
Received: from d12relay02.megacenter.de.ibm.com (d12relay02.megacenter.de.ibm.com [9.149.165.196]) by mtagate3.de.ibm.com (8.12.10/8.12.10) with ESMTP id i2LN9ixJ057250; Sun, 21 Mar 2004 23:09:44 GMT
Received: from d12ml102.megacenter.de.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228]) by d12relay02.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id i2LN9jgg085230; Mon, 22 Mar 2004 00:09:46 +0100
In-Reply-To: <20040317174340.B805420F71@citi.umich.edu>
To: pnfs-ops@yahoogroups.com
Cc: andros@citi.umich.edu, pnfs-ops@yahoogroups.com, pnfs-reqs@yahoogroups.com
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 6.5.1 January 21, 2004
Message-ID:
Date: Mon, 22 Mar 2004 01:11:37 +0200
X-MIMETrack: Serialize by Router on D12ML102/12/M/IBM(Release 6.0.2CF2|July 23, 2003) at 22/03/2004 01:11:58
Content-Type: multipart/mixed; boundary="=_mixed 005F4336C2256E5E_="
X-eGroups-Remote-IP: 195.212.29.152
From: Julian Satran
Subject: Re: [pnfs-ops] pNFS, MPIO, and client group open
X-Yahoo-Group-Post: member; u=64714603
X-Yahoo-Profile: julian_satran

Nice description. I would add that having a single node coordinate the MPIO access also simplifies the coordination needed at open/close and fsync (when transitioning between computation phases).

Julo

"William A.(Andy) Adamson"
17/03/04 19:43
Please respond to pnfs-ops
To: pnfs-reqs@yahoogroups.com
cc: pnfs-ops@yahoogroups.com, andros@citi.umich.edu
Subject: [pnfs-ops] pNFS, MPIO, and client group open

Sorry for the long email :)

At the conclusion of the NEPS conference last November, Brent Welch emailed his notes as a starting point for a requirements document (attached). I use his pNFS extension language to describe a pNFS client using a 'normal open' servicing an open/write/close with direct access, and a large MPIO application using a proposed 'group open'.

I note that my knowledge of parallel filesystems is growing, so please excuse any misconceptions, comments welcome...

The architecture I'm picturing is a large cluster with a Parallel File System (PFS) consisting of PFS metadata servers (PFS MD) and PFS NAS/SAN. I know it's only one of many architectures the pNFS set of extensions is trying to address.

1000's of pNFS clients
10's of pNFSd, one per PFS MD
100's NAS/SAN

'Normal' open
********************
a) pNFS client issues a compound to one pNFSd consisting of:
OPEN with share: Access/Deny
    Multiple pNFSds need to resolve share.
DELEG_ASK: Request Byte-range Delegation
    Multiple pNFSds need to resolve delegation
READ/WRITE_IND request direct data access
    pNFSd queries PFS MD to get location map

b) pNFS client can then issue READ/WRITE directly to the NAS/SAN using the map returned in READ/WRITE_IND.

c) pNFS client issues a compound to one pNFSd consisting of:
COMMIT_IND:
CLOSE

An MPIO application opens one very large file, shared by 1000's of compute clients. Each compute client manipulates its portion of the file. The MPIO layer manages compute clients so that no client shares a byte range of the file with another.

This MPIO application consists of
 - supervisor code running on 1 MPIO supervisor node
 - compute code running on 1000's of MPIO compute nodes

This MPIO application has cyclic behavior.
I) Read initial data
II) compute intermediate result
III) wait for other compute nodes to finish computing
IV) all compute nodes write to file (their portion)
V) compute nodes trade 'edge conditions'
VI) goto II (compute).

While the application is not in IV (writing), another application, say the visualizer, needs READ access to the file in order to crunch it for visualization. Visualization is needed to tell if the MPIO application intermediate results are converging on a solution.

If in step IV all the compute nodes open/write/close as described above as the Normal open, the pNFSds will be doing a lot of metadata processing: resolving share and delegation state between themselves as well as delivering per byte-range layout info. The group open is designed to reduce the metadata processing from 1000's to one.

I mention a couple of new fcntls used by the MPIO layer to communicate pNFS state from the supervisor node to the compute nodes. Don't worry about that(!).

Do worry about this: is there anything stopping a compute node from using OPEN/DELEG_ASK/WRITE_IND state obtained by the supervisor node as described below? If so, are the changes to pNFS to make this work small enough to be considered at this time?

Group Open
**********
step IV: supervisor OPENs file, all compute clients write file, supervisor CLOSES file.

specifically:
a) supervisor issues a compound with
OPEN: Access - Both, Deny Both to a pNFSd
    - pNFSds need to resolve the share
    - is this a normal nfsv4 OPEN? does pNFSd or the PFS need to know about the other compute clients?
    - do we need the concept of a group clientid?
DELEG_ASK: supervisor asks for WRITE delegation which should be granted given the OPEN Access-Both, Deny-Both share.
    - pNFSds need to resolve delegation request
WRITE_IND: supervisor gets whole file layout info

b) supervisor calls
    fcntl(fd, GET_GRPOPEN, cookie_buf);
which returns the filehandle, stateid, and layout map from the supervisor pNFS.

c) the supervisor code passes filehandle, stateid, and layout map to each compute node which calls
    fcntl(fd, SET_GRPOPEN, cookie_buf);
the pNFS compute node client receives the filehandle, stateid, and layout map, and performs a local open (nothing need go across the wire), stuffing the filehandle, stateid, and layout map into its state tree just as if an across-the-wire OPEN/DELEG_ASK/WRITE_IND occurred.

d) compute clients use SET_GRPOPEN filehandle, stateid and map to directly write the data to the appropriate NAS/SAN
    - what besides the filehandle, stateid, and layout map is needed?
    - when done writing, each compute client issues a COMMIT_IND.

e) when compute clients have flushed all data back to the file, supervisor issues a compound with

CLOSE

Attachment (not stored)
brent_welch_pnfs_ops
Type: application/octet-stream

From bhalevy@panasas.com Sun Mar 21 17:20:33 2004
Return-Path:
X-Sender: bhalevy@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 27471 invoked from network); 22 Mar 2004 01:20:32 -0000
Received: from unknown (66.218.66.167) by m13.grp.scd.yahoo.com with QMQP; 22 Mar 2004 01:20:32 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta6.grp.scd.yahoo.com with SMTP; 22 Mar 2004 01:20:32 -0000
Received: from yang ([172.17.19.44]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id H2LG662S; Sun, 21 Mar 2004 20:20:29 -0500
To: ,
Cc:
Date: Sun, 21 Mar 2004 20:20:30 -0500
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0)
In-Reply-To:
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
X-eGroups-Remote-IP: 65.194.124.178
From: "Benny Halevy"
Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open
X-Yahoo-Group-Post: member; u=169276676
X-Yahoo-Profile: benny_halevy

Julo, I agree that the proposed method for parallel access requires the ability to delegate security information along with layout delegation. Still, I'm not sure there's a problem...

My assumption was that block-based storage networks are secured with some form of host-to-LUN mapping, so capabilities do not play a role in this game.

For file or object storage, "capabilities" cannot be delegated if they allow access to an object only to a specific client host. OSD capabilities (at least SNIA http://www.t10.org/ftp/t10/document.03/03-279r0.pdf) do not have that limitation as far as I know. The intention in this regard is well stated there: "Note this protocol does allow delegation of a credential if a host transfers both the secret part of the credential as well as the public capability arguments."

We haven't yet discussed the pNFS/NFS security model in detail, but assuming the back-end speaks NFSv4, the front-end metadata server needs to give the pNFS client an NFSv4 filehandle (which can theoretically serve as a capability) and then the client must still authenticate with the back-end data server.

Benny

-----Original Message-----
From: Julian Satran [mailto:julian_satran@il.ibm.com]
Sent: Sunday, March 21, 2004 18:12
To: pnfs-ops@yahoogroups.com
Cc: pnfs-ops@yahoogroups.com; 'pnfs-reqs@yahoogroups.com'
Subject: RE: [pnfs-reqs] RE: [pnfs-ops] pNFS, MPIO, and client group open

I am sure we all want to be aware of this "twist". It may be more than just putting a label on it - e.g., security protocols must be able to "delegate" credentials and that makes some bindings bad.

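The point about delegable versus host-bound capabilities can be illustrated with a minimal sketch. It is not the OSD credential format: the key name, the `|`-separated argument string, and all three function names are assumptions made up for this example. It only shows the idea from the quoted sentence, that whoever holds both the public capability arguments and the secret part can use the credential, unless the capability is bound to one host.

```python
import hashlib
import hmac
from typing import Optional, Tuple

# Hypothetical secret shared by the metadata server and the storage device.
DEVICE_KEY = b"shared-by-metadata-server-and-storage-device"


def issue_credential(object_id: str, rights: str,
                     bound_host: Optional[str] = None) -> Tuple[str, bytes]:
    """Metadata server: return (public capability args, secret capability key)."""
    # Assumes object_id and rights contain no '|' (illustrative encoding only).
    args = f"{object_id}|{rights}|{bound_host or '*'}"
    cap_key = hmac.new(DEVICE_KEY, args.encode(), hashlib.sha256).digest()
    return args, cap_key


def sign_request(cap_key: bytes, request: bytes) -> bytes:
    """Client: prove possession of the secret part without sending it."""
    return hmac.new(cap_key, request, hashlib.sha256).digest()


def device_accepts(args: str, request: bytes, tag: bytes, sender_host: str) -> bool:
    """Storage device: rebuild the key from the public args and check the tag."""
    _object_id, _rights, bound_host = args.split("|")
    if bound_host != "*" and bound_host != sender_host:
        return False  # host-bound capability: useless once handed to another node
    cap_key = hmac.new(DEVICE_KEY, args.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(tag, sign_request(cap_key, request))
```

In this sketch a credential issued without `bound_host` can be passed from the supervisor to any compute node and still verifies, while one issued with `bound_host="supervisor"` is rejected when a compute node presents it. That is the kind of binding Julo's remark warns about.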
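The difference between the 'normal' open and the proposed group open in this thread can be sketched as a toy simulation. Everything here is an assumption for illustration: `PnfsServer`, `Cookie`, the striping rule, and the helper names are invented, and GET_GRPOPEN/SET_GRPOPEN are the fcntl commands proposed in the message, not existing ones. The sketch counts only OPEN/DELEG_ASK/WRITE_IND compounds (COMMIT_IND traffic is left out) and, following Dave's argument, the server checks the stateid and never the identity of the writing node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Layout = List[Tuple[int, int, str]]  # (offset, length, device name)


@dataclass
class Cookie:
    """What GET_GRPOPEN might hand back in this sketch (hypothetical format)."""
    filehandle: str
    stateid: int
    layout: Layout


@dataclass
class PnfsServer:
    """Toy pNFSd plus storage; state is keyed by stateid, not by caller address."""
    open_compounds: int = 0
    valid_stateids: set = field(default_factory=set)
    written: Dict[int, Tuple[str, str]] = field(default_factory=dict)

    def open_deleg_layout(self, filehandle: str, size: int, devices: int) -> Cookie:
        """One compound: OPEN + DELEG_ASK + WRITE_IND."""
        self.open_compounds += 1
        stateid = self.open_compounds  # unique because the counter only grows
        self.valid_stateids.add(stateid)
        stripe = size // devices
        layout = [(i * stripe, stripe, f"dev{i}") for i in range(devices)]
        return Cookie(filehandle, stateid, layout)

    def write(self, node: str, stateid: int, offset: int, device: str) -> None:
        """Direct write, accepted from any node presenting a valid stateid."""
        if stateid not in self.valid_stateids:
            raise PermissionError("unknown stateid")
        self.written[offset] = (node, device)

    def close(self, stateid: int) -> None:
        self.valid_stateids.discard(stateid)


def device_for(layout: Layout, offset: int) -> str:
    """Look up which device holds a byte offset according to the layout map."""
    for start, length, device in layout:
        if start <= offset < start + length:
            return device
    raise ValueError("offset outside layout")


def normal_open_cycle(server: PnfsServer, nodes: List[str], size: int) -> None:
    """Step IV with every compute node doing its own open/write/close."""
    chunk = size // len(nodes)
    for i, node in enumerate(nodes):
        cookie = server.open_deleg_layout("fh-bigfile", size, devices=4)
        offset = i * chunk
        server.write(node, cookie.stateid, offset, device_for(cookie.layout, offset))
        server.close(cookie.stateid)


def group_open_cycle(server: PnfsServer, nodes: List[str], size: int) -> None:
    """Step IV with one supervisor open whose state is shared with all nodes."""
    cookie = server.open_deleg_layout("fh-bigfile", size, devices=4)  # supervisor
    chunk = size // len(nodes)
    for i, node in enumerate(nodes):
        # SET_GRPOPEN equivalent: a purely local step, no compound on the wire.
        offset = i * chunk
        server.write(node, cookie.stateid, offset, device_for(cookie.layout, offset))
    server.close(cookie.stateid)  # supervisor CLOSE once all nodes have flushed
```

With 1000 nodes and a 4000-byte file, `normal_open_cycle` leaves `open_compounds` at 1000, while `group_open_cycle` leaves it at 1. That is the "from 1000's to one" reduction the message describes, under the simplifications stated above.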
From garth@panasas.com Sun Mar 28 19:58:52 2004
Return-Path:
X-Sender: garth@panasas.com
X-Apparently-To: pnfs-reqs@yahoogroups.com
Received: (qmail 1848 invoked from network); 29 Mar 2004 03:58:48 -0000
Received: from unknown (66.218.66.217) by m9.grp.scd.yahoo.com with QMQP; 29 Mar 2004 03:58:48 -0000
Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta2.grp.scd.yahoo.com with SMTP; 29 Mar 2004 03:58:51 -0000
Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id HZGBBXTN; Sun, 28 Mar 2004 22:58:16 -0500
Mime-Version: 1.0 (Apple Message framework v612)
Content-Transfer-Encoding: quoted-printable
Message-Id: <46F74125-8135-11D8-BB3C-000A95A94F04@panasas.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
To: pnfs-sbc@yahoogroups.com, pnfs-obj@yahoogroups.com, pnfs-reqs@yahoogroups.com, pNFS Operations , pnfs-nfs@yahoogroups.com
Date: Sun, 28 Mar 2004 22:58:03 -0500
X-Mailer: Apple Mail (2.612)
X-eGroups-Remote-IP: 65.194.124.178
From: Garth Gibson
Subject: FAST04 BOF 3/31 12:30pm: seeking a Parallel NFS (pNFS)
X-Yahoo-Group-Post: member; u=169457820
X-Yahoo-Profile: garth_a_gibson

Announcing a public Birds of a Feather meeting for those interested in bringing into existence a Parallel NFS (pNFS) standard for network attached storage.

This BOF is to be held between USENIX' NSDI (www.usenix.org/events/nsdi04) and FAST (www.usenix.org/events/fast04) conferences, 12:30pm - 2pm, Wednesday March 31, 2004, in the Dolores room of the Grand Hyatt hotel in San Francisco. A simple box lunch for the first 50 attendees will be provided by Panasas Inc.

Speakers at this BOF will include Peter Corbett, Network Appliance; David Black, EMC; Julian Satran, IBM; Peter Honeyman, CITI; Sumanta Chatterjee, Oracle; and Brent Welch, Panasas.

Background materials from the organizers of this BOF can be found in the proceedings of a recent workshop, NFS Extensions for Parallel Storage, www.citi.umich.edu/NEPS/agenda.html, held by the Center for Information Technology Integration at the University of Michigan. Participants of that workshop summarized a statement of the problem pNFS might address in the following recent informational internet draft:

Title: pNFS Problem Statement
Author(s): Garth Gibson, Panasas & CMU, Peter Corbett, Network Appliance
Filename: draft-gibson-pnfs-problem-statement-00.txt
Pages: 12
Date: 2004-2-9

This draft considers the problem of limited bandwidth to NFS servers. The bandwidth limitation exists because an NFS server has limited network, CPU, memory and disk I/O resources. Yet, access to any one file system through the NFSv4 protocol requires that a single server be accessed. While NFSv4 allows file system migration, it does not provide a mechanism that supports multiple servers simultaneously exporting a single writable file system.

This problem has become aggravated in recent years with the advent of very cheap and easily expanded clusters of application servers that are also NFS clients. The aggregate bandwidth demands of such clustered clients, typically working on a shared data set preferentially stored in a single file system, can increase much more quickly than the bandwidth of any server. The proposed solution is to provide for the parallelization of file services, by enhancing NFSv4 in a minor version.

A URL for this Internet-Draft is: www.ietf.org/internet-drafts/draft-gibson-pnfs-problem-statement-00.txt From garth@panasas.com Sun Mar 28 20:03:07 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 33647 invoked from network); 29 Mar 2004 04:03:06 -0000 Received: from unknown (66.218.66.166) by m16.grp.scd.yahoo.com with QMQP; 29 Mar 2004 04:03:06 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta5.grp.scd.yahoo.com with SMTP; 29 Mar 2004 04:03:06 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id HZGBBX4A; Sun, 28 Mar 2004 23:02:41 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit Cc: Garth Gibson Date: Sun, 28 Mar 2004 23:02:27 -0500 To: pnfs-sbc@yahoogroups.com, pnfs-obj@yahoogroups.com, pnfs-reqs@yahoogroups.com, pNFS Operations , pnfs-nfs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: FACE-TO-FACE pNFS working meeting: 3/31 9am-12:30 Grand Hyatt, Dolores Rm, SF X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson Reminder, folks, Before the upcoming pNFS BOF at FAST Wed Mar 31, 12:30 - 2pm, in the Dolores room of the Grand Hyatt hotel in San Francisco, the pNFS community reached by these mailing lists will be meeting for a working session, 9am - 12:30pm. Tentative agenda: - Requirements update and discussion (update coming from Garth) - Operations update and discussion (update coming from Brent) - Use cases discussion (initial use cases coming from Andy) See you there! 
garth

From garth@panasas.com Mon Mar 29 12:49:03 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 91946 invoked from network); 29 Mar 2004 20:48:59 -0000 Received: from unknown (66.218.66.216) by m1.grp.scd.yahoo.com with QMQP; 29 Mar 2004 20:48:59 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta1.grp.scd.yahoo.com with SMTP; 29 Mar 2004 20:48:59 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id HZGBB9M4; Mon, 29 Mar 2004 15:48:56 -0500 Mime-Version: 1.0 (Apple Message framework v612) In-Reply-To: References: Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: <7694A65B-81C2-11D8-BB3C-000A95A94F04@panasas.com> Content-Transfer-Encoding: 7bit Date: Mon, 29 Mar 2004 15:48:42 -0500 To: pnfs-reqs@yahoogroups.com, pnfs-obj@yahoogroups.com, pnfs-sbc@yahoogroups.com, pNFS Operations , pnfs-nfs@yahoogroups.com X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Re: FACE-TO-FACE pNFS working meeting: 3/31 9am-12:30 Grand Hyatt, Dolores Rm, SF X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Word is that we will have a Polycom and a phone line in the face-to-face meeting. I will not post the call-in number, but I will respond to requests for the dial-in details (once I have them, which I do not yet). garth On Mar 28, 2004, at 11:02 PM, Garth Gibson wrote: > Reminder, folks, > > Before the upcoming pNFS BOF at FAST Wed Mar 31, 12:30 - 2pm, in the > Dolores room of the Grand Hyatt hotel in San Francisco, the pNFS > community reached by these mailing lists will be meeting for a working > session, 9am - 12:30pm.
> > Tentative agenda: > > - Requirements update and discussion (update coming from Garth) > - Operations update and discussion (update coming from Brent) > - Use cases discussion (initial use cases coming from Andy) > > See you there! > garth

From bwelch@panasas.com Mon Mar 29 23:41:49 2004 Return-Path: X-Sender: welch@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 61803 invoked from network); 30 Mar 2004 07:41:48 -0000 Received: from unknown (66.218.66.172) by m11.grp.scd.yahoo.com with QMQP; 30 Mar 2004 07:41:48 -0000 Received: from unknown (HELO medlicott.panasas.com) (63.80.58.202) by mta4.grp.scd.yahoo.com with SMTP; 30 Mar 2004 07:41:48 -0000 Received: from panasas.com (welch@localhost) by medlicott.panasas.com (8.11.6/8.11.6) with ESMTP id i2U7ccB14789; Mon, 29 Mar 2004 23:38:39 -0800 Message-Id: <200403300738.i2U7ccB14789@medlicott.panasas.com> X-Authentication-Warning: medlicott.panasas.com: welch owned process doing -bs X-Mailer: exmh version 2.6.3 04/02/2003 with nmh-1.0.4 To: pnfs-obj@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com, pnfs-sbc@yahoogroups.com, pNFS Operations , pnfs-nfs@yahoogroups.com In-reply-to: <7694A65B-81C2-11D8-BB3C-000A95A94F04@panasas.com> References: <7694A65B-81C2-11D8-BB3C-000A95A94F04@panasas.com> Comments: In-reply-to Garth Gibson message dated "Mon, 29 Mar 2004 15:48:42 -0500." X-URL: http://www.panasas.com/ X-Face: "HxE|?EnC9fVMV8f70H83&{fgLE.|FZ^$>@Q(yb#N,Eh~N]e&]=> r5~UnRml1:4EglY{9B+ :'wJq$@c_C!l8@<$t,{YUr4K,QJGHSvS~U]H`<+L*x?eGzSk>XH\W:AK\j?@?c1o From: Brent Welch Subject: pNFS summary, v2 X-Yahoo-Group-Post: member; u=169551413 X-Yahoo-Profile: brent_welch_1960

In preparation for the meeting at FAST I was tasked with updating my previous workshop summary with the ideas that have been developing on the mailing lists. I'm attaching what I have. The main caveat is that these are my words about lots of people's ideas, so I'm sure I'm not always conveying them as you may have intended.
I'm sure you'll speak up where I've strayed or if I've left out important items. I made no attempt to summarize all the arguments that flowed across the list, but instead I'm giving the reader's digest of what I think we agree on, and have enumerated the issue areas. See you Wednesday. -- Brent Welch Software Architect, Panasas Inc Delivering the premier storage system for scalable Linux clusters www.panasas.com welch@panasas.com Attachment (not stored) pnfs_summary.v2.txt Type: text/plain

From black_david@emc.com Tue Mar 30 07:46:53 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 41595 invoked from network); 30 Mar 2004 15:46:51 -0000 Received: from unknown (66.218.66.166) by m18.grp.scd.yahoo.com with QMQP; 30 Mar 2004 15:46:51 -0000 Received: from unknown (HELO MAHO3MSX2.corp.emc.com) (128.221.11.32) by mta5.grp.scd.yahoo.com with SMTP; 30 Mar 2004 15:46:51 -0000 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Tue, 30 Mar 2004 10:46:27 -0500 Message-ID: To: pnfs-reqs@yahoogroups.com, pnfs-obj@yahoogroups.com Cc: pnfs-sbc@yahoogroups.com, pnfs-ops@yahoogroups.com, pnfs-nfs@yahoogroups.com Date: Tue, 30 Mar 2004 10:46:21 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain X-eGroups-Remote-IP: 128.221.11.32 From: black_david@emc.com Subject: RE: [pnfs-reqs] pNFS summary, v2 X-Yahoo-Group-Post: member; u=82420288 X-Yahoo-Profile: dlb237

I looked over Brent's summary and the FMP protocol that is used in HighRoad, and turned up the following items for discussion:

- The exact details of COMMIT_IND (order in which it does things) matter a lot. That's below the level of Brent's current summary, and I presume we'll get to it as we flesh out the design.

- The notification functionality needs more fleshing out.
  Here's what FMP provides:
    o Recall specific extent delegation(s)
    o Downgrade (write to read) specific extent delegation(s)
    o Recall all extent delegations for a file handle
    o Recall all extent delegations for a filesystem
    o Set EOF (This is subtle, consider the case where client A sets EOF in the middle of an extent for which client B has a write delegation.)
  Again, this is something for further design, but I think this list is about at the level of Brent's summary.

- Completion callbacks. FMP supports both server queuing of requests (completion is a notification) and server rejection of requests. Rejection supports cases where the server has a notify outstanding to the client when it receives the client request and wants to force the client to process the notify; rejection is the right thing to do because the notification (e.g., recall) may affect whether the client resubmits the same request. I agree with Brent's view that the client has to retry if the operation is rejected, however the ability to support server queuing of conflicting requests allows the server to provide some liveness/fairness assurances if the server implementer chooses to do so. In essence a "Queued" response from the server promises to do the operation, but not immediately, and frees a client RPC execution context for things like notification handling.

- Operation ordering. There are some ordering requirements involving interaction of notifications and operations - for example, if a client responds to a recall notification and submits an operation based on having completed the notification, the server will need to process the notification completion before the client operation.
I strongly favor a "cut on the dotted line" approach where the ordering requirements for direct data access are clearly specified as part of the extension (even if they rely on existing NFS facilities for their realization) so that it's clear what has to be done to achieve the same functionality for other distributed filesystem protocols.

Thanks,
--David
----------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA 01748
+1 (508) 293-7953 FAX: +1 (508) 293-7786
black_david@emc.com Mobile: +1 (978) 394-7754
----------------------------------------------------

> -----Original Message-----
> From: Brent Welch [mailto:bwelch@panasas.com]
> Sent: Tuesday, March 30, 2004 2:39 AM
> To: pnfs-obj@yahoogroups.com
> Cc: pnfs-reqs@yahoogroups.com; pnfs-sbc@yahoogroups.com; pNFS
> Operations; pnfs-nfs@yahoogroups.com
> Subject: [pnfs-reqs] pNFS summary, v2
>
> In preparation for the meeting at FAST I was tasked with updating my
> previous workshop summary with the ideas that have been developing on
> the mailing lists. I'm attaching what I have. The main caveat is that
> these are my words about lots of people's ideas, so I'm sure I'm not
> always conveying them as you may have intended. I'm sure you'll speak
> up where I've strayed or if I've left out important items. I made no
> attempt to summarize all the arguments that flowed across the list,
> but instead I'm giving the reader's digest of what I think we agree on,
> and have enumerated the issue areas.
>
> See you Wednesday.
>
> --
> Brent Welch
> Software Architect, Panasas Inc
> Delivering the premier storage system for scalable Linux clusters
> www.panasas.com welch@panasas.com

From andros@citi.umich.edu Tue Mar 30 14:08:41 2004 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 7863 invoked from network); 30 Mar 2004 22:08:29 -0000 Received: from unknown (66.218.66.167) by m14.grp.scd.yahoo.com with QMQP; 30 Mar 2004 22:08:28 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta6.grp.scd.yahoo.com with SMTP; 30 Mar 2004 22:08:27 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id 26F9E2084D; Tue, 30 Mar 2004 17:08:27 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-reqs@yahoogroups.com Cc: pnfs-obj@yahoogroups.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 30 Mar 2004 17:08:27 -0500 Message-Id: <20040330220827.26F9E2084D@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: my 10 minute FAST talk X-Yahoo-Group-Post: member; u=169434965

hi speakers

i quickly describe an ASCI type cluster that uses a Scalable Global Parallel File System (SGPFS), and quickly describe how NFSv2/v3 is currently used. then move on to using stock NFSv4.0, and where it fails, ending up with pNFS and how it can succeed.

* getting rid of the NFSD on SGPFS client (a la NFSv2/v3)
* enterprise desktop NFSv4.0 can access proprietary SGPFS data
* high speed parallel data transfer between pNFS clusters.
-->Andy

From andros@citi.umich.edu Tue Mar 30 15:33:34 2004 Return-Path: X-Sender: andros@citi.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 17814 invoked from network); 30 Mar 2004 23:33:33 -0000 Received: from unknown (66.218.66.166) by m19.grp.scd.yahoo.com with QMQP; 30 Mar 2004 23:33:33 -0000 Received: from unknown (HELO citi.umich.edu) (141.211.133.111) by mta5.grp.scd.yahoo.com with SMTP; 30 Mar 2004 23:33:32 -0000 Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by citi.umich.edu (Postfix) with ESMTP id D90E7207F3; Tue, 30 Mar 2004 18:33:31 -0500 (EST) X-Mailer: exmh version 2.5 07/13/2001 with version: MH 6.8.3 #74[UCI] To: pnfs-obj@yahoogroups.com Cc: pnfs-reqs@yahoogroups.com, pnfs-sbc@yahoogroups.com, pnfs-ops@yahoogroups.com, pnfs-nfs@yahoogroups.com Mime-Version: 1.0 Content-Type: multipart/mixed ; boundary="==_Exmh_-17735067940" Date: Tue, 30 Mar 2004 18:33:31 -0500 Message-Id: <20040330233331.D90E7207F3@citi.umich.edu> X-eGroups-Remote-IP: 141.211.133.111 From: "William A.(Andy) Adamson" Subject: pNFS use cases for the FAST meeting, first pass! X-Yahoo-Group-Post: member; u=169434965

In preparation for tomorrow's meeting at FAST, i was tasked with beginning a list of use cases for pNFS. I came up with an initial list which is attached. I know it's not complete, and i hope the descriptions are meaningful :o see you tomorrow.
-->Andy Attachment (not stored) pnfs_use.txt Type: text/plain

From garth@panasas.com Tue Mar 30 23:13:14 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 15375 invoked from network); 31 Mar 2004 07:13:12 -0000 Received: from unknown (66.218.66.172) by m16.grp.scd.yahoo.com with QMQP; 31 Mar 2004 07:13:12 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 31 Mar 2004 07:13:11 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id HZGBCF03; Wed, 31 Mar 2004 02:12:58 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: quoted-printable Message-Id: Content-Type: text/plain; charset=UTF-8; format=flowed To: pnfs-reqs@yahoogroups.com Date: Tue, 30 Mar 2004 23:12:41 -0800 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: Requirements discussions update X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

An update on requirements discussions that have taken place on the pnfs-reqs@yahoogroups.com reflector (http://groups.yahoo.com/group/pnfs-reqs). These notes were prepared for the Wed 3/31 9am-12:30 face-to-face meeting before the pNFS BOF at FAST 2004. Though the ideas and concerns reported here are drawn from many people, including but not limited to Dave Noveck, Brent Welch, David Black, Andy Adamson, Craig Everhart, Julian Satran, Tom Talpey, Benny Halevy, Gary Grider, Tyce McLarty, Dean Hildebrand, Peter Honeyman, Peter Corbett, the errors and opinions coloring this document are probably mine.
garth gibson

----------------------------------------
Topics:
0.0 Defining Requirements
1.0 Minimalism
1.1 Proxying
1.2 Cache consistency
1.3 Delegation promotion & reacquisition
1.4 Layout delegations
1.5 Concurrent write
1.6 Map revocation
1.7 Separability
1.8 NTFS application semantics
2.0 NFS Append
2.1 separate read & write mappings; the ability to punch a hole
2.2 Extensible backend mappings
2.3 Group operations
2.4 Clustered server implementations
2.5 Client modified layouts

----------------------------------------
[0.0 Defining Requirements]: What is the scope of the requirements subgroup's work, and how is it related to the ops subgroup discussions?

I am beginning to see a significant difference between a "problem statement" document and a "requirements" document. I believe that in a problem statement we can make a strong case for a set of properties and applications that are currently underserved in NFSv4, and a direction that could in one or more steps resolve some or all of the problem. By contrast, I am coming to see the detailed requirements as a compendium of the most contentious and impactful issues, how they were argued and what resolution was accepted. I can see the problem statement getting done before we have sorted out all the hard problems, or even run into all of them, so it is a good document for establishing our interests in the IETF. But I suspect that the requirements document stays open well into agreement on the specification issues.

For comparison, the first NFSv4 document was called "Design Considerations" (RFC 2624): This document is to cover the "limitations and deficiencies of NFS version 3". This document will also be used as a mechanism to focus discussion and avenues of investigation as the definition of NFS version 4 progresses. Therefore, the contents of this document cover the general functional/feature areas that are anticipated for NFS version 4.
I propose that what we have started on in the requirements subgroup is the problem statement, and that we should be careful not to let it get bogged down in the longer term requirements resolutions.

----------------------------------------
[1.0 Minimalism]: How much additional functionality do we sacrifice to limit the changes we seek in NFSv4?

On one hand, some have said that getting to one true file system, with the high performance and the manageability of federated systems that might come with out-of-band access, is worth not matching *every* feature of all existing out-of-band file systems with this first set of extensions to NFSv4. That we should bite off what we can do quickly, correctly, with a clear incremental value to NFSv4, and roadmap more aggressive changes that could bog us down, or introduce so much complexity that interoperability becomes elusive. And that we should be mindful of the reception we may get from the IETF NFS working group if we *appear* to use out-of-band as an excuse to ask for a brace of changes in other aspects of NFSv4.

On the other hand, the other out-of-band file systems that are inspiring the evolution of NFSv4 have customers that may not accept any backward steps in an evolution to NFSv4. This could create the need to develop, carry and differentiate all the diverse one-off out-of-band file systems plus a new out-of-band NFSv4. Some think it makes more sense to go far enough with this first NFSv4 to simplify the marketplace by making it reasonable for various vendors to deprecate/end-of-life/begin to wean from their proprietary offering.

While it is certainly conceivable that we could be designing a roadmap of solutions in detail from the start, communication among standards bodies is hard enough without the challenge of designing specs for both with and without a requirement. This is a central issue in defining the requirements for out-of-band NFSv4, or at least for defining the scope of the first set of extensions.
JS: I am afraid that this text makes achieving compliance with existing out-of-band filesystems sound more complex than it might be. I see several items that we should strive to keep even in a minimalist set of requirements:
• attribute set rich enough to enable expressing the attributes of the major local filesystems (Unix brands and Windows)
• access control that accommodates the access control mechanisms of the major local filesystems and some of the popular distributed filesystems (AFS?)
• coherency mechanisms that enable vendors to optionally implement the two major flavors of coherent file access:
  ◦ completely coherent
  ◦ close-to-open coherent
None of those seems to me to involve major departures from NFSv4.

----------------------------------------
[1.1 Proxying]: Operations/work that can only be done out-of-band vs alternative access through the NFSv4 server for all operations/work

On one hand, some suggest that a set of out-of-band clients should not have to also have a data path through the NFSv4 metadata server. One reason is that customers may not tolerate the large variability in performance between out-of-band (when the going is good) and in-band (when the server chooses not to grant or to take away a delegation) accesses. Another reason, and I paraphrase someone else here, is that it is possible to construct out-of-band metadata servers that do not have access to the data servers except through the clients -- I encourage the source of this scenario to replace my paraphrasing with a correct use case, because I find it odd to design for file servers that do not have access to the data servers.

On the other hand, others have suggested that any access or work that a client can do out-of-band should be possible with one or more commands applied to the metadata server's data path.
This has been proposed for coping with recalled delegations, including concurrent writing by multiple clients; retry after client access errors, provided adequate idempotency of out-of-band operations; and many alternative implementations of out-of-band clients, including legacy clients that use out-of-band never or rarely. I think this is a topic that should be argued one way or the other in the requirements document. Use cases and examples in other systems would be best.

[1.1.0 Legacy proxying]: an NFS-v4.x server must be able to execute the full NFS-v4.0 or NFS-v4.1 protocol.

JS: Is it legal for a "compliant server" to have serving data disabled by a local administrative function (the old "must implement but may use")? Otherwise an organization that wants to discourage use of data serving through the metadata server has very little it can do to enforce policy in a way that will not affect other clients (it may serve poorly, but this still affects other clients).

[1.1.1 Strict proxying]: does an NFS-v4.x server have to be able to execute exactly the wire packet that an NFS-v4.x client might have sent to a SBC/OSD/NFS data server?

This captures the notion that a metadata server must also be a store-and-forward proxy for every data server it manages. It requires that NFS-v4.x servers implement SCSI SBC over FC, if their data servers implement it; and the same for objects and files. This only makes sense to me for NFS data servers. And it is not what I intended in my prior summary, although it is a relevant question. I would say that pNFS requirements should not require Strict Proxying.

[1.1.2 Functional proxying]: a file transformation achievable by an NFS-v4.x client using a set of data server operations must be equivalently achievable using a (probably different) set of NFS-v4.x server operations

This is the topic I intended to address in the last email.
I believe Dave is arguing that even with metadata servers that do not have access to their data servers, the vendor of such a metadata server can construct a proprietary protocol for the metadata server to (strict) proxy data server accesses through clients that do have data server access. I am not comfortable making up a counter to this, so I exhort those that want a metadata server without data server access to speak up if they disagree.

More on proxying -- suppose that a metadata server is asked to do reads or writes and it would rather not do this work (because it is busy or because its connection to storage is not as good as other nodes) -- can it "refer" the request to another server that is in a better position (by load or connectivity to storage) to do the work, like a file system referral?

[1.1.3 Recovery proxying]: a file transformation begun by an NFS-v4.x client using a set of data server operations, but interrupted before completion, must be equivalently completable using a (probably different) set of NFS-v4.x server operations

Some have suggested that having this property will greatly simplify the amount of spec that is devoted to out-of-band error recovery. Others have commented that a simple way to achieve this would be to require that all operations on data servers be idempotent.

----------------------------------------
[1.2 Cache consistency]: NFSv4 delegations are not about client cache consistency; does out-of-band access require stronger cache consistency than NFSv4 provides

NFSv4 cache consistency is a client function, based on testing file attributes on open and close. While a client holds a delegation, its users can close and reopen a file without recourse to the server, so inside a delegation a client's cache contents for that file must be valid and up to date.
However, a client cannot mandate getting a delegation on open; it must immediately (approximately) give up a delegation if it is recalled; and a client has no way to reacquire a delegation on an open file after that delegation has been recalled. So we must not confuse delegations with strong cache consistency.

Many of the various proprietary out-of-band file systems have much stronger client cache consistency, involving more different types and interactions of cache callbacks. Some of these differences may have been motivated by desire for differentiation, some by apps underserved by NFS cache consistency semantics, and some by the long standing designer belief that stronger semantics are theoretically better.

The question we must resolve, and argue in the requirements document, is whether out-of-band access within only the existing NFSv4 cache consistency and delegations is sufficient and, if not, why not and how much more must/should be added before such a product is valuable. I think that application use cases should be discussed. And I caution us that most of us are the converted, coming to NFSv4 from one of these proprietary file systems, so gaining agreement amongst ourselves easily is not a good predictor of the challenge of gaining the agreement of the NFS standards working group.

DB: HighRoad uses the same FMP protocol to provide both NFS-style close-to-open consistency for NFS clients and the stronger forms of consistency required by CIFS - as long as the server knows what clients have which access rights to what blocks, cache consistency strength comes down to server implementation decisions about what outstanding access rights conflict with a new request. We've actually built server prototypes that provide stronger consistency for NFS without change to either the FMP protocol or clients, but the shipped product only provides NFS-style consistency for NFS.
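
As a rough illustration of DB's point, here is a minimal Python sketch of how one table of per-client block access rights could yield either consistency level, depending only on which conflict rule the server applies to a new request. Everything in it is an assumption made for illustration: the names Grant, relaxed_conflict, strict_conflict and grants_to_recall are hypothetical, the two rules are simplified stand-ins for "NFS-style" and "stronger" policies, and none of it is taken from FMP or HighRoad.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of an access right held by (or requested for) a client."""
    client: str
    start: int  # first block covered, inclusive
    end: int    # last block covered, exclusive
    mode: str   # "read" or "write"


def overlaps(a: Grant, b: Grant) -> bool:
    """True when the two block ranges share at least one block."""
    return a.start < b.end and b.start < a.end


def relaxed_conflict(held: Grant, req: Grant) -> bool:
    """Assumed weaker rule: only overlapping writers on different clients conflict."""
    return (held.client != req.client and overlaps(held, req)
            and held.mode == "write" and req.mode == "write")


def strict_conflict(held: Grant, req: Grant) -> bool:
    """Assumed stronger rule: overlapping access conflicts if either side writes."""
    return (held.client != req.client and overlaps(held, req)
            and "write" in (held.mode, req.mode))


def grants_to_recall(held_grants, req, conflicts):
    """Return the outstanding grants the server would recall before granting req."""
    return [g for g in held_grants if conflicts(g, req)]


held = [Grant("A", 0, 100, "write"), Grant("C", 200, 300, "read")]
req = Grant("B", 50, 60, "read")
print(grants_to_recall(held, req, relaxed_conflict))
print(grants_to_recall(held, req, strict_conflict))
```

The grant table and the request are identical in both calls; only the policy function changes, which is the sense in which consistency strength becomes a server implementation decision rather than a protocol or client change.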
JS: I think that if we work towards common structures for mapping and caching we might end up letting the implementer or user decide about the consistency level he wants and support all. We certainly can't afford to ignore those that require consistency beyond the close-to-open level conventionally associated with NFS, especially when there are distributed or cluster file-systems whose customers use it today (GPFS, SAN-FS).

DN: mapping and caching information are distinct pieces of information in that one can change while the other does not, and if we decide to treat these two pieces of information as the same, we are going to be doing some silly things. If I write in place then my data is changing but the layout isn't. If my caching strategy to deal with multiple writers is not to cache data (which is for many applications quite reasonable) and I need the mapping information to access the data servers directly, then I don't want to have my layout delegations recalled because the data is changing. Because I am not caching I don't need or want data delegations but do need layout delegations and should be allowed to keep them when the layout is not changing.

DN: In many environments data and layout delegations will be recalled together and so it makes sense not to have these so distinct that, for example, I am doing separate recall messages for each piece of data. But in other environments, it may make a lot of sense for me to have one guarantee (the layout won't change) and not the other (the data won't change).

----------------------------------------
[1.3 Delegation promotion & reacquisition]: must/should NFSv4 offer mechanisms for clients to possess a delegation more than once per open

Delegations in NFSv4 are new, and came with significant concern about lots of complexity for not much performance, as they may do as little as avoid the client waiting for one round trip to the server on open.
So, as described above with respect to cache consistency, the limitations on delegations can mean great difficulties for clients having performance requirements calling for out-of-band access mostly, or exclusively. DB: Yes, and this is a strong reason for separating "layout" delegations from the existing "data" delegations, IMHO. Consider a web or video server that is caching file opens for performance reasons - if updating the content underneath the server makes it impossible to get the direct access ("layout") delegations back, the result is that one has to shut down and restart all the servers after the content update in order to restore performance. The sysadmin responsible for this annoying work will want to tar-and-feather the system designers who made it necessary (that would be us if we get this wrong ...). So we have begun to propose mechanisms for clients to be more aggressive about seeking, obtaining, reobtaining after a recall, and even waiting for a signal that a denied delegation is now available. This could lead to discussions of transitioning from a write delegation to a read delegation, rather than no delegation, when a second delegation is requested. We all know, or can imagine, plenty of mechanism for this type of logic -- after all, it is not far from what some systems do for cache consistency. But all of this comes with complexity, that threat to interoperability, and chips away at minimalism. DN: Downgrade in particular needs special attention. If I have a write delegation, then DELEGRETURN followed by DELEG_ASK (read), means that the data I have cached is not valid and may need to be fetched again, whereas a straight downgrade means nobody has ever had a conflicting delegation, and so allows me to do more. There are similar considerations for downgrade of a write delegation to a group-write (aka CW) delegation and downgrade of a group-write to a read delegation. 
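
To make DN's distinction concrete, the following is a small Python sketch of the client-side difference between returning a write delegation and asking again, versus downgrading it in place. It is only a model under stated assumptions: the class DelegatedFile and its methods are hypothetical names, DELEGRETURN and DELEG_ASK appear only in comments as the operations discussed on the list, flushing of dirty data before a return is ignored, and the cache is conservatively emptied whenever there has been a window with no delegation.

```python
class DelegatedFile:
    """Hypothetical client-side state for one file: delegation plus cached blocks."""

    def __init__(self):
        self.delegation = None  # None, "read", or "write"
        self.cache = {}         # block number -> bytes

    def acquire(self, mode):
        # A freshly granted delegation says nothing about what other clients
        # did before it was granted, so previously cached data is not trusted.
        self.cache.clear()
        self.delegation = mode

    def write(self, block, data):
        if self.delegation != "write":
            raise PermissionError("write delegation required")
        self.cache[block] = data

    def return_then_ask(self, mode):
        # Models DELEGRETURN followed by DELEG_ASK: there is a window with no
        # delegation, so a conflicting holder may have existed in between.
        self.delegation = None
        self.acquire(mode)

    def downgrade(self):
        # Models a straight write-to-read downgrade: no window without a
        # delegation, so the cached blocks remain usable.
        if self.delegation != "write":
            raise ValueError("only a write delegation can be downgraded")
        self.delegation = "read"


a = DelegatedFile()
a.acquire("write")
a.write(0, b"payload")
a.downgrade()            # cache survives the downgrade

b = DelegatedFile()
b.acquire("write")
b.write(0, b"payload")
b.return_then_ask("read")  # cache is discarded and must be refetched
```

Both clients end up holding a read delegation, but only the one that downgraded keeps its cached block, which is the extra latitude DN attributes to a straight downgrade.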
----------------------------------------
[1.4 Layout delegations]: can/should layout metadata "ride" on NFSv4 delegations or are new "layout" delegations needed

If the delegations currently provided by NFSv4 are insufficient, for reasons of cache consistency or the need to be able to reacquire a delegation in order to ensure that performance degradations can be limited, then some are suggesting that rather than proposing to change the semantics of the current delegations, we add new delegations tailored to the purpose, so-called layout delegations. This is consistent with the advice we heard Dec 4 that it is much easier, and more welcomed, to add new things to NFSv4 than to change what is already there.

Assuming that in response to requirements arguments, we find the existing NFSv4 delegations insufficient, then I think this topic is an implementation issue for the NFSv4 operations subgroup. But I for one would like to err on the side of fewer NFSv4 changes and slightly weaker semantics, where possible.

I'd summarize a lot of discussion to say that we need new operations for layout delegations. And many are suggesting that these layout delegations should be able to cover only portions of a file, and not imply anything about the data consistency.

----------------------------------------
[1.5 Concurrent write]: write delegations now are held by exactly one client, if any; should/must NFS support multiple clients holding concurrent layout delegations

One specifically excluded use case for out-of-band access is concurrent write, actually concurrent read and write, or write and write, by different clients. This is normally associated with expensive client cache consistency algorithms, but for our purposes here, the issue is managing the ordering, grouping/atomicity, and failure recovery of changes on data servers, not updating/invalidating the contents of client caches.
It is certainly feasible to address out-of-band concurrent writing to data servers without addressing client cache consistency, if we so choose. I believe three folks with experience with different existing file systems referred to databases as the use case for needing concurrent write. I believe out-of-band concurrent write is an important use case to call out carefully, because an ambitious implementation of it could lead to a lot of state-maintaining messaging.

Some have said that allowing multiple clients to hold the same lock is a current need in NFSv4, and that a solution to this can provide the infrastructure for concurrent delegation of layout maps for read and overwrite (when growing the size of the file is not needed). This seems like a good operations discussion topic.

DB: I understand the value of this to the self-coordinating HPC applications, but would like to see this functionality specified (assuming it is specified) as a cleanly separable option, as I think the desire to self-coordinate a shared write delegation will be limited to a small number of application spaces, like HPC. I also note Gary's comment that it's sufficient for parallel write to work in the non-overlapping case, which does not require any new concurrent write delegation as long as each client can hold an exclusive write delegation for its range.

JS: I agree with Gary that handling efficiently the "good-path" (e.g., concurrent writers with non-overlapping regions, or single writer with readers needing only close-to-open consistency) is essential. To me it looks as if all of those could be better handled if we could approach mapping and caching concurrently.

----------------------------------------
[1.6 Map revocation]: can/must the NFS server be able to revoke a client's use of a map, and enforce no future use (fence off the map)

NFSv4 delegations allow a broken or malicious client no additional power to damage the stored file system because state changes must go through the server.
But a delegated layout map that is held and used by a broken or malicious client after the delegation has been recalled could damage the stored file system in a way that the server, by not being on the data path, has no obvious way to protect against. So there has been a call for the ability for the server to fence out a client or enforce the revocation of a client's access to a specific file or filesystem.

At first glance all three data server technologies, blocks, objects, and files, have some solution (blocks: LUN masking/ACLs or SAN zoning; objects: capability revocation, key replacement; files: component file ACLs, volatile file handles). The scope and cost of each of these mechanisms may be dramatically different. Some would say that this is going to end up being a differentiating property of the choice of underlying data server. For example, many would say that in systems that allow out-of-band block access, the client machines must be trustworthy to respect the delegation recall message (and lease timeouts). Others would object to this weakening of the NFS server integrity. I also see this as a requirements argument.

DB: I tend to take the former position, as if one cannot fence off client access, not allowing access to untrustworthy clients becomes a fallback. In the block world, while mechanisms exist to fence off access, standard means of invoking them are somewhat immature.

----------------------------------------
[1.7 Separability]: Independence vs co-dependence of layout metadata access and NFSv4

On one hand, simple "an address per block/object/file" maps could be represented as an array of NFSv4 attributes, manipulated using existing NFSv4 attribute accessing commands, so as to reduce the amount of change to NFSv4. On the other hand, particularly for block maps of large files composed of extents, simple array indexing may be cumbersome and much bulkier than necessary.
And also on the other hand, some suggest that it is desirable for the metadata access protocol to be separate from NFSv4 attribute access, so that the same metadata access protocol might be reusable under other file services. I think this topic would benefit from proposed metadata formats, particularly the SBC (block) maps.

----------------------------------------
[1.8 NTFS application semantics]: applications coded to NTFS semantics are different from those coded to POSIX and UNIX semantics

NFS originated as an exported file system, whose semantics were defined by the underlying local filesystem on the file server. But since that local filesystem has almost always been UNIX or UNIX-like, customers have come to think of NFS semantics as a well-defined thing, not far from UNIX semantics (but with a customary list of POSIX exceptions). The semantics NTFS presents to applications using its storage are different in significant ways. Some of us see an evolution to better support for clients trying to support NTFS well to be very desirable. Others see chasing this as more than the NFS group as a whole is likely to bite off. This, and any other issues about wire protocol support for important semantics needed by different application file system interfaces (middleware exploited API extensions in databases or parallel programming systems such as MPI-IO) are also requirements topics.

DB: IMHO, this is an orthogonal tarpit we should stay out of. I strongly believe that trying to extend NFSv4 so it can be just as good as CIFS for applications coded to Windows APIs should be someone else's problem.

----------------------------------------
[2.0 NFS Append]: is append semantics part of this effort, or separable?

Folks think it is interesting. But it can be taken to NFSv4 directly, and not necessarily as a part of the pNFS extensions.

DN: I am a big fan of this, but my experience is that it can be controversial.
I'm not sure I understand why but there are some people who really don't like it. I think it may have to do with the fact that some people are uncomfortable with the idea that you can do a write (and append is a write) and have no way to validly reflect that write in your buffer cache.

----------------------------------------
[2.1 separate read & write mappings; the ability to punch a hole]

SAN.FS extents come with both read and write extent mappings and block usage bitmaps. The separate read and write mappings allow for clients to participate in copy-on-write functionality - IIRC, Craig has described this.

[2.1.1]: Should protocol include support for client participation in copy-on-write?

A motivation for the separate arrays of block usage bits appears to be allowing clients to turn file data into holes (e.g., AIX fclear system call).

[2.1.2]: Is the ability to turn valid data into a file "hole" (e.g., AIX fclear) at the client important to support?

FMP does not support separate read mappings or usage bitmaps, and hence is not capable of involving clients in copy-on-write or allowing a client to turn valid data into a file "hole".

DN: If we do it, it should be an NFSv4 server operation too because the space recovery benefits are not unique to a block backend.

There was also some confusion between holes in the data and holes in the layout map. Writing into a hole in the data changes the data, so any other client mapping that same region of the file sees or does not see the change according to the cache consistency mechanism employed. But if one client has a layout map with holes then data can be written into these holes without recalling the map because the client cannot assume or see anything about the missing part of the map. That is, are maps delegated in part or only as a whole? I think I heard strong support for delegation of map ranges as well as maps for whole files.

----------------------------------------
[2.2 Extensible backend mappings.]
The precedent for our proposed multiple backends in the one IETF protocol is the GSS security framework and extensible flavors. One backend may be required, though I think the required backend is the NFSv4 metadata server that has to be able to do data access for a legacy client.

----------------------------------------
[2.3 Group operations]

A cluster can be seen as one computer, so we may need to explore group operations; that is, a clientid may cover all the CPUs in a "cluster computer." This brings up a client with many fault domains, which is not generally a problem in NFS today. Will we need an NFSv4 metadata filer to deliver a callback to an alternative or failover address in some circumstances?

DN: Suppose I have an application cluster with 1K nodes and I say "This is really a powerful computer with a thousand (maybe two thousand) CPUs". Now that's marketing bullshit but it isn't exactly false. I think the point is that as far as the NFS server is concerned, whether the computer that is talking to it is "really" a computer, i.e. it has CPUs sharing memory, or is only a computer qua marketing bullshit, i.e. a collection of CPUs that don't share memory, that use other methods to co-ordinate common activities, doesn't matter. All the server sees are the requests made and if the cluster represents itself as a single machine (i.e. in V4 does a single SETCLIENTID or in v4.1 maintains many connections bound to a single session), it is one. The server doesn't see the cluster's memory architecture. It sees an open and then use of that stateid. The fact that it comes over a different IP address doesn't disqualify it. A server might have options to check that (as a matter of security) but it isn't part of the protocol and we already have clients with multiple IP addresses. Having a thousand of them is a difference of degree (and may pose implementation issues) but I don't see a real protocol issue.
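DN's point can be sketched roughly as follows, assuming a toy server that keys open state on the clientid alone. The `ToyServer` class and its methods are hypothetical and stand in for no real NFSv4 implementation; the source address is accepted but deliberately ignored.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server: open state belongs to a clientid, not to an IP."""
    opens: dict = field(default_factory=dict)  # stateid -> (clientid, path)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def open(self, clientid: str, src_ip: str, path: str) -> int:
        # src_ip is not recorded: it is not part of the state's identity.
        stateid = next(self._ids)
        self.opens[stateid] = (clientid, path)
        return stateid

    def use(self, clientid: str, src_ip: str, stateid: int) -> bool:
        # A request is honored when the stateid belongs to the presenting
        # clientid, whichever cluster node (address) it arrived from.
        owner = self.opens.get(stateid)
        return owner is not None and owner[0] == clientid


server = ToyServer()
sid = server.open("cluster-A", "10.0.0.1", "/data/file")
same_cluster_other_node = server.use("cluster-A", "10.0.0.2", sid)
different_client = server.use("cluster-B", "10.0.0.1", sid)
```

A server that also wanted to check source addresses as a security matter would add that as local policy; as noted above, it would not be part of the protocol.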
BH: This is a "clustered" implementation of an NFSv4 client in which the v4 drivers in the client cluster are cooperating and propagating state (e.g. file handles, stateids) among each other. I believe that the server should not be able to distinguish such a client from a multi-homed client that may have several IP addresses. In the NFSv4 sessions world a (clustered) client may open multiple connections to the server that are associated with the same session - this will make life for such a client even easier, I hope.

----------------------------------------
[2.4 Clustered server implementations.]

Talking about clustered clients brought up the issue of clustered servers. We do not think the server-to-server protocols needed for implementing clustered servers should be a part of the client protocols we are discussing herein. This might change if client protocols have to understand anything other than filesystem migration and server failover, as they do now. That is, pNFS extensions are not necessarily part of a solution to a standard clustered filer protocol.

----------------------------------------
[2.5 Client modified layouts]

BH: A write layout delegation (which I'm not proposing) could be a delegation to modify the *layout*. It is theoretically possible to give a single client such exclusive access to the file layout, but I think this is going one step too far from where we are right now and can be problematic with respect to interoperability.

DN: I agree. This would add problems. Unless there is a big payoff, I'd stay away from it.
----------------------------------------
From Brian.Pawlowski@netapp.com Wed Mar 31 09:02:28 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 70593 invoked from network); 31 Mar 2004 17:02:24 -0000 Received: from unknown (66.218.66.216) by m11.grp.scd.yahoo.com with QMQP; 31 Mar 2004 17:02:24 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta1.grp.scd.yahoo.com with SMTP; 31 Mar 2004 17:02:24 -0000 Received: from frejya.corp.netapp.com (frejya [10.57.157.119]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i2VGwYZh010906 for ; Wed, 31 Mar 2004 08:58:34 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by frejya.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i2VGwYbc010248 for ; Wed, 31 Mar 2004 08:58:34 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.7p1+Sun/8.11.6) id i2VGwXO28367 for pnfs-reqs@yahoogroups.com; Wed, 31 Mar 2004 08:58:33 -0800 (PST) Message-Id: <200403311658.i2VGwXO28367@tooting-fe.eng.netapp.com> In-Reply-To: from Garth Gibson at "Mar 28, 4 11:02:27 pm" To: pnfs-reqs@yahoogroups.com Date: Wed, 31 Mar 2004 08:58:33 -0800 (PST) X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: Re: [pnfs-reqs] FACE-TO-FACE pNFS working meeting: 3/31 9am-12:30 Grand Hyatt, Dolores Rm, SF X-Yahoo-Group-Post: member; u=169504717 X-Yahoo-Profile: brianpawlowski

I'm a little behind - will be there shortly.

> Reminder, folks,
>
> Before the upcoming pNFS BOF at FAST Wed Mar 31, 12:30 - 2pm, in the
> Dolores room of the Grand Hyatt hotel in San Francisco, the pNFS
> community reached by these mailing lists will be meeting for a working
> session, 9am - 12:30pm.
>
> Tentative agenda:
>
> - Requirements update and discussion (update coming from Garth)
> - Operations update and discussion (update coming from Brent)
> - Use cases discussion (initial use cases coming from Andy)
>
> See you there!
> garth

From dhildebz@eecs.umich.edu Wed Mar 31 11:22:28 2004 Return-Path: X-Sender: dhildebz@eecs.umich.edu X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 90049 invoked from network); 31 Mar 2004 18:08:40 -0000 Received: from unknown (66.218.66.218) by m14.grp.scd.yahoo.com with QMQP; 31 Mar 2004 18:08:39 -0000 Received: from unknown (HELO smtp.eecs.umich.edu) (141.213.4.43) by mta3.grp.scd.yahoo.com with SMTP; 31 Mar 2004 18:08:39 -0000 Received: from oemcomputer (da001d1735.stl-mo.osd.concentric.net [66.236.102.199]) (authenticated bits=0) by smtp.eecs.umich.edu (8.12.11/8.12.11) with ESMTP id i2VI7X3e003579 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Wed, 31 Mar 2004 13:07:36 -0500 Message-ID: <002101c4174a$bc39a650$06396a83@oemcomputer> To: , Date: Wed, 31 Mar 2004 13:05:11 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-Spam-Status: No -- Hits: -4.901 Required: 5 X-Spam-Summary: BAYES_00 X-Scanned-By: MIMEDefang 2.40 X-eGroups-Remote-IP: 141.213.4.43 From: "Dean Hildebrand" Subject: QoS papers X-Yahoo-Group-Post: member; u=169352062 X-Yahoo-Profile: seattleplus

Here are 2 papers on QoS, I'm sure there are many others.
http://www.usenix.org/events/fast03/tech/lumb.html
http://www.almaden.ibm.com/StorageSystems/autonomic_storage/clockwork/index.shtml

Dean

From garth@panasas.com Wed Mar 31 15:46:21 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 26118 invoked from network); 31 Mar 2004 23:46:14 -0000 Received: from unknown (66.218.66.172) by m19.grp.scd.yahoo.com with QMQP; 31 Mar 2004 23:46:14 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta4.grp.scd.yahoo.com with SMTP; 31 Mar 2004 23:46:14 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id HZGBCJAD; Wed, 31 Mar 2004 18:46:07 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: <8A31B552-836D-11D8-BB3C-000A95A94F04@panasas.com> Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Wed, 31 Mar 2004 15:45:50 -0800 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: FAST BOF was a crowded success X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

Thanks to all participants, today's pNFS BOF packed the room. We asked for space for 50, thinking that would be many more than would show up. Instead we had more than 85 people in the room, nearly all of whom stayed for the full 90 mins.

Our speakers, as usual, were informative and enthusiastic. Most of the questions had to do with our commitment to supporting NFSv4 completely, that the granularity of layout delegation was in fact smaller than the filesystem, and that the semantics of file attributes like mtime and EOF might be vague at specific times while the file is being changed external to the metadata server.
Our Oracle guest speaker, Sumanta Chatterjee, stirred up the room when he said "good start and while you are at it, please look at full user-level IO, async IO, list IO, exposed layouts, batched interrupts, lower system CPU usage."

With respect to our broad goal of spreading the message, raising the buzz and seeking additional participants, the first two look really well met. The third will be measurable on these mailing lists.

garth

From Brian.Pawlowski@netapp.com Wed Mar 31 17:03:48 2004 Return-Path: X-Sender: beepy@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 51103 invoked from network); 1 Apr 2004 01:03:46 -0000 Received: from unknown (66.218.66.167) by m1.grp.scd.yahoo.com with QMQP; 1 Apr 2004 01:03:46 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta6.grp.scd.yahoo.com with SMTP; 1 Apr 2004 01:03:45 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i3113jZh002905 for ; Wed, 31 Mar 2004 17:03:45 -0800 (PST) Received: from tooting-fe.eng.netapp.com (tooting-fe.eng.netapp.com [10.56.10.118]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i3113jTR012799 for ; Wed, 31 Mar 2004 17:03:45 -0800 (PST) Received: (from beepy@localhost) by tooting-fe.eng.netapp.com (8.11.7p1+Sun/8.11.6) id i3113iw20019; Wed, 31 Mar 2004 17:03:44 -0800 (PST) Message-Id: <200404010103.i3113iw20019@tooting-fe.eng.netapp.com> In-Reply-To: <8A31B552-836D-11D8-BB3C-000A95A94F04@panasas.com> from Garth Gibson at "Mar 31, 4 03:45:50 pm" To: pnfs-reqs@yahoogroups.com Date: Wed, 31 Mar 2004 17:03:44 -0800 (PST) Cc: pnfs-reqs@yahoogroups.com X-Mailer: ELM [version 2.4ME++ PL40 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: Brian Pawlowski From: Brian Pawlowski Subject: Re: [pnfs-reqs] FAST BOF was a crowded success X-Yahoo-Group-Post: member; u=169504717
X-Yahoo-Profile: brianpawlowski I counted up to 100 and then more people came in. > Thanks to all participants, today's pNFS BOF packed the room. We asked > for space for 50, thinking that would be many more than would show up. > Instead we had more than 85 people in the room, nearly all who stayed > for the full 90 mins. > > Our speakers, as usual, were informative and enthusiastic. Most of the > questions had to do with our commitment to supporting NFSv4 completely, > that the granularity of layout delegation was in fact smaller than the > filesystem, and that the semantics of file attributes like mtime and > EOF might be vague as specific times while the file is being changed > external to the metadata server. Our Oracle guest speaker, Sumanta > Chatterjee, stirred up the room while said "good start and while you > are at it, please look at full user-level IO, async IO, list IO, > exposed layouts, batched interrupts, lower system CPU usage." > > With respect to our broad goal of spreading the message, raising the > buzz and seeking additional participants, the first two look really > well met. The third will be measurable on these mailing lists. > > garth > > > > > > Yahoo! 
Groups Links > > > > >

From garth@panasas.com Sat Apr 03 20:31:03 2004 Return-Path: X-Sender: garth@panasas.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 56257 invoked from network); 4 Apr 2004 04:31:00 -0000 Received: from unknown (66.218.66.218) by m14.grp.scd.yahoo.com with QMQP; 4 Apr 2004 04:31:00 -0000 Received: from unknown (HELO PIKES.panasas.com) (65.194.124.178) by mta3.grp.scd.yahoo.com with SMTP; 4 Apr 2004 04:31:00 -0000 Received: from [127.0.0.1] ([172.17.19.3]) by PIKES.panasas.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id 2B56KCPR; Sat, 3 Apr 2004 23:30:56 -0500 Mime-Version: 1.0 (Apple Message framework v612) Content-Transfer-Encoding: 7bit Message-Id: Content-Type: text/plain; charset=US-ASCII; format=flowed To: pnfs-reqs@yahoogroups.com Date: Sat, 3 Apr 2004 20:30:41 -0800 X-Mailer: Apple Mail (2.612) X-eGroups-Remote-IP: 65.194.124.178 From: Garth Gibson Subject: planning for our next face-to-face X-Yahoo-Group-Post: member; u=169457820 X-Yahoo-Profile: garth_a_gibson

We talked about our next face-to-face being in Ann Arbor during the week of the NFSv4 bake-a-thon, June 7-11. David mentioned the conflict with T11 in Chicago, but thought it might be workable. Andy, Peter -- what are the best days in that week -- I think we are looking for a one day meeting. I personally prefer the 7th or 8th.

Also, I encourage us all to post our notes from the face-to-face and BOF.
thanks garth From pcorbett@netapp.com Sun Apr 04 10:20:22 2004 Return-Path: X-Sender: Peter.Corbett@netapp.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 61451 invoked from network); 4 Apr 2004 17:20:21 -0000 Received: from unknown (66.218.66.217) by m6.grp.scd.yahoo.com with QMQP; 4 Apr 2004 17:20:21 -0000 Received: from unknown (HELO mx01.netapp.com) (198.95.226.53) by mta2.grp.scd.yahoo.com with SMTP; 4 Apr 2004 17:20:21 -0000 Received: from hawk.corp.netapp.com (hawk [10.57.156.122]) by mx01.netapp.com (8.12.10/8.12.10/NTAP-1.4) with ESMTP id i34HKGZh006528 for ; Sun, 4 Apr 2004 10:20:16 -0700 (PDT) Received: from svlexc01.hq.netapp.com (svlexc01.corp.netapp.com [10.57.156.135]) by hawk.corp.netapp.com (8.12.9/8.12.9/NTAP-1.5) with ESMTP id i34HKGTR001543 for ; Sun, 4 Apr 2004 10:20:16 -0700 (PDT) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Date: Sun, 4 Apr 2004 10:20:08 -0700 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [pnfs-reqs] planning for our next face-to-face Thread-Index: AcQZ/bxMb4ySfse8T2CP52qiBqibXgAa03bg To: X-eGroups-Remote-IP: 198.95.226.53 X-eGroups-From: "Corbett, Peter" From: "Corbett, Peter" Subject: RE: [pnfs-reqs] planning for our next face-to-face X-Yahoo-Group-Post: member; u=44152959 X-Yahoo-Profile: pfcorbett2004 I think any of those days is fine with me. -----Original Message----- From: Garth Gibson [mailto:garth@panasas.com] Sent: Saturday, April 03, 2004 8:31 PM To: pnfs-reqs@yahoogroups.com Subject: [pnfs-reqs] planning for our next face-to-face We talked about our next face to face being in Ann Arbor during the week of the NFSv4 bake-a-thon. in the week of June 7-11. David mentioned the conflict with T11 in Chicago, but thought it might be workable. 
Andy, Peter -- what are the best days in that week -- I think we are looking for a one day meeting. I personally prefer the 7th or 8th.

Also, I encourage us all to post our notes from the face-to-face and BOF.

thanks
garth

From black_david@emc.com Mon Apr 05 07:55:42 2004 Return-Path: X-Sender: Black_David@emc.com X-Apparently-To: pnfs-reqs@yahoogroups.com Received: (qmail 21349 invoked from network); 5 Apr 2004 14:55:40 -0000 Received: from unknown (66.218.66.166) by m14.grp.scd.yahoo.com with QMQP; 5 Apr 2004 14:55:40 -0000 Received: from unknown (HELO MAHO3MSX2.corp.emc.com) (128.221.11.32) by mta5.grp.scd.yahoo.com with SMTP; 5 Apr 2004 14:55:39 -0000 Received: by maho3msx2.corp.emc.com with Internet Mail Service (5.5.2653.19) id ; Mon, 5 Apr 2004 10:55:14 -0400 Message-ID: