Teragrid 2011 Slides
This week we gave a presentation at Teragrid ‘11. The title of the talk was “SLASH2 – File System for Widely Distributed Systems”. Here are the slides.
Support for st_blocks in struct stat has been implemented to benefit utilities such as du(1), and statvfs(2) now more usefully reflects a mount_slash client's preferred I/O system, which benefits utilities such as df(1).
The statvfs(2) implementation simply tracks each I/O node's backend file system statistics through updates sent to the MDS whenever convenient. Then, when a mount_slash client issues a STATFS request to the MDS for file system stats such as free blocks, the MDS returns the statvfs(2) data pertaining to that client's preferred I/O system.
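A minimal sketch of this flow is below. The structure and function names (ios_tab, mds_ios_statvfs_update, mds_handle_statfs) are hypothetical stand-ins, not the actual SLASH2 code; the point is only to show the MDS caching one statvfs per I/O system and answering STATFS from the cache entry for the client's preferred IOS.

```c
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

#define MAX_IOS 16

struct ios_statvfs {
	int		valid;		/* entry has been populated by an update */
	struct statvfs	sv;		/* last statvfs reported by this I/O node */
};

static struct ios_statvfs ios_tab[MAX_IOS];	/* indexed by IOS id */

/* Called when an I/O node piggybacks its backend statvfs onto an update. */
void
mds_ios_statvfs_update(int ios_id, const struct statvfs *sv)
{
	ios_tab[ios_id].sv = *sv;
	ios_tab[ios_id].valid = 1;
}

/*
 * Called when a mount_slash client issues STATFS: return the cached
 * statvfs for that client's preferred IOS, or -1 if none has been seen.
 */
int
mds_handle_statfs(int pref_ios, struct statvfs *out)
{
	if (pref_ios < 0 || pref_ios >= MAX_IOS || !ios_tab[pref_ios].valid)
		return (-1);
	*out = ios_tab[pref_ios].sv;
	return (0);
}

int
main(void)
{
	struct statvfs sv, res;

	/* Pretend an I/O node reported its backend file system stats. */
	memset(&sv, 0, sizeof(sv));
	sv.f_bsize = 4096;
	sv.f_blocks = 1000000;
	sv.f_bfree = 250000;
	mds_ios_statvfs_update(3, &sv);

	if (mds_handle_statfs(3, &res) == 0)
		printf("IOS 3: %lu of %lu blocks free\n",
		    (unsigned long)res.f_bfree, (unsigned long)res.f_blocks);
	return (0);
}
```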
Future improvements: persistently store the statvfs data somewhere in the MDS.
The st_blocks implementation is a bit more involved but works in a somewhat analogous fashion: whenever a sliod sends CRC or REPLWK updates, which happens after I/O has been issued from the client or a replica has been made/updated, the st_blocks for the file involved is sent along to the MDS. This value is tracked in the inode for each I/O system replica and, in the case of CRC updates, the delta from the previous value in the inode to the new value for this I/O system is applied to the file's st_blocks.
Specifically, st_blocks in SLASH2 means the number of 512-byte blocks in use by a file across all non-replicated regions of data, wherever these regions may reside. Replicas update only their per-IOS count of blocks and not the aggregate st_blocks, whereas WRITE updates affect both the per-IOS count as well as the aggregate st_blocks value returned to mount_slash clients.
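The accounting can be illustrated with the toy sketch below. The struct and helper names (slash_inode, inode_crc_update, inode_repl_update) are hypothetical and not the actual SLASH2 on-disk inode; it only demonstrates the rule above: a CRC update from a WRITE applies the delta to both the per-IOS count and the aggregate st_blocks, while a replication update touches the per-IOS count only.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_REPLICAS 4

struct slash_inode {
	uint64_t	st_blocks;			/* aggregate, 512-byte units */
	uint64_t	ios_nblks[MAX_REPLICAS];	/* per-IOS block counts */
};

/* CRC update after a client WRITE on I/O system `ios': adjust both counts. */
void
inode_crc_update(struct slash_inode *ino, int ios, uint64_t new_nblks)
{
	int64_t delta = (int64_t)new_nblks - (int64_t)ino->ios_nblks[ios];

	ino->ios_nblks[ios] = new_nblks;
	ino->st_blocks += delta;
}

/* REPLWK update after a replica is made/updated: per-IOS count only. */
void
inode_repl_update(struct slash_inode *ino, int ios, uint64_t new_nblks)
{
	ino->ios_nblks[ios] = new_nblks;
}

int
main(void)
{
	struct slash_inode ino = { 0 };

	inode_crc_update(&ino, 0, 2048);	/* write 1 MiB of data to IOS 0 */
	inode_repl_update(&ino, 1, 2048);	/* replicate it to IOS 1 */
	printf("st_blocks=%llu (the replica does not inflate the aggregate)\n",
	    (unsigned long long)ino.st_blocks);
	return (0);
}
```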
Yu Fu from the University of Florida has mounted the extenci-slash2 file system at his site. I believe they have a storage resource residing at UF but are using an MDS hosted here at PSC. PSC also has a storage resource in the configuration. Good work from Yu and Jared!
Jared gave a short talk to some application users of wide-area networks.
Over the past few weeks a number of patches addressing reliability in the face of network failures have made their way into the tree. There are still lurking issues, as the patches touch quite involved code, but progress is coming along and the functionality should be in working order "soon".
The implementation makes a number of assumptions which may eventually evolve into tunable knobs. For example, by default, 10 retries are made before returning an error to the issuing client process' I/O syscall. This hard limit was chosen to prevent indefinite lockups but, as explained, can easily be made tunable.
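The retry behavior amounts to something like the sketch below. The names (io_retry_limit, issue_io_rpc, client_io_with_retry) and the fixed one-second backoff are assumptions for illustration, not the real implementation; the point is that a bounded retry count, kept in a variable, is what would later become the tunable knob.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static int io_retry_limit = 10;		/* default; candidate for a knob */

/* Stand-in for the RPC that performs the actual I/O; returns 0 or -errno. */
static int
issue_io_rpc(void)
{
	/* ... network send/receive would go here ... */
	return (-ETIMEDOUT);
}

/* Retry transient failures up to io_retry_limit times, then give up. */
int
client_io_with_retry(void)
{
	int i, rc = 0;

	for (i = 0; i < io_retry_limit; i++) {
		rc = issue_io_rpc();
		if (rc == 0)
			return (0);
		fprintf(stderr, "I/O attempt %d failed (%d), retrying\n",
		    i + 1, rc);
		sleep(1);		/* simple fixed backoff */
	}
	return (rc);			/* surfaced to the client's I/O syscall */
}

int
main(void)
{
	int rc = client_io_with_retry();

	if (rc)
		fprintf(stderr, "giving up after %d retries: %d\n",
		    io_retry_limit, rc);
	return (rc ? 1 : 0);
}
```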
There are other planned ideas, such as knobs that can be tuned to alert the process' user of the condition (via system mail) or even to write to the stderr of the process (obviously not desired by default), as well as mechanisms that may allow dynamic control via the process environment. The framework should be able to support these features as soon as users request them.
In the meantime, the goal is to make SLASH2 behave reliably over unreliable networks, as one of our test environments happens to have very flaky TCP sockets.