Full truncate / generation numbering bug

Full truncate causes the FID’s generation number to be increased, mainly so that the old sliod objects can be garbage collected asynchronously. Upon being presented with a new FID/gen pair, sliod will create a new object. Full truncate makes this technique tricky: in single-writer mode we must ensure that queued writes originating prior to the truncate are either not flushed at all or, at minimum, do not end up in the sliod object representing the new FID/gen pair.
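As a rough illustration (the path layout and function name below are made up, not the actual sliod code), the backing object is effectively keyed on the FID/gen pair, so a bumped generation silently starts a fresh, empty object while the old one is left behind for asynchronous GC:

/*
 * Illustrative sketch only; the naming scheme here is hypothetical.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

static int
slio_open_backing_obj(uint64_t fid, uint32_t gen)
{
        char path[256];

        /* One object file per FID/gen pair. */
        snprintf(path, sizeof(path), "/var/lib/sliod/%016llx.%u",
            (unsigned long long)fid, gen);

        /*
         * O_CREAT means an unseen FID/gen pair transparently produces a
         * fresh object; the pre-truncate object (old gen) is untouched
         * and may be reclaimed later by the garbage collector.
         */
        return (open(path, O_RDWR | O_CREAT, 0600));
}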

In multi-writer mode each client should be in directio mode, meaning that some relative ordering of operations can be implied. Therefore bdbufs with an old generation number may still be targeted at the current backing file so long as a directio indicator is set. Such an indicator should be similar to the one used for O_APPEND writes.

Any read, regardless of its bdbuf’s generation number, should be targeted at the current sliod object.
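Putting the above rules together, the sliod-side dispatch would look roughly like the following sketch. The struct, enum, and field names are invented for illustration; only the policy matters:

#include <stdbool.h>
#include <stdint.h>

enum io_type { IO_READ, IO_WRITE };

struct io_req {
        enum io_type    type;
        uint32_t        bdbuf_gen;      /* generation carried in the bdbuf */
        bool            directio;       /* DIO indicator, like the O_APPEND one */
};

enum io_disposition { TARGET_CURRENT_OBJ, DROP_REQUEST };

static enum io_disposition
slio_dispatch(const struct io_req *rq, uint32_t current_gen)
{
        /* Reads always target the current sliod object. */
        if (rq->type == IO_READ)
                return (TARGET_CURRENT_OBJ);

        if (rq->bdbuf_gen == current_gen)
                return (TARGET_CURRENT_OBJ);

        /*
         * Stale generation: only DIO writes, whose relative ordering the
         * client guarantees, may still land in the current backing file.
         * Stale buffered (asynchronous) writes predate the truncate and
         * are dropped.
         */
        return (rq->directio ? TARGET_CURRENT_OBJ : DROP_REQUEST);
}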

For now, our client-side implementation of truncate will work like this:

Full truncate:
All existing bmaps are "orphaned" so they may be replaced by bmaps with the most recent bdbuf tokens. Pending or in-flight I/O's may still be processed; this is allowable because sliod will see that the generation number is old and that these are asynchronous writes, and will therefore drop them. The client could also be made to cancel the flush operation for these writes itself (see the sketch after this list).
Partial truncate:
Block or cancel all writes past the truncation point so that asynchronous writes do not compromise the operational ordering.
Reads and DIO writes:
These always go to the most recent object.
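A minimal sketch of the client-side flush check implied by the full-truncate rule above, assuming hypothetical fcmh/bmap fields carrying the generation numbers (none of these names exist in the real client):

#include <stdbool.h>
#include <stdint.h>

struct fcmh { uint32_t gen; };
struct bmap { struct fcmh *fcmh; uint32_t bdbuf_gen; };

static bool
msl_should_flush(const struct bmap *b)
{
        /*
         * Orphaned by a full truncate: the bdbuf generation predates the
         * fcmh's current generation, so sliod would drop the write anyway.
         * Cancel the flush on the client side instead.
         */
        if (b->bdbuf_gen < b->fcmh->gen)
                return (false);
        return (true);
}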

More fcmh generation number woes

It seems there’s no good way for the MDS to inform a client holding existing bdbufs that the generation number has changed. sliod relies on the bdbuf to authenticate I/O from the client, but when a full truncate occurs there is currently no good method for adjusting the generation number within the bdbufs the client already holds. The result is that I/O's are targeted at the incorrect backing file on sliod.

fcmh generation problems

The method I’m exploring now is one where the clients flush their bmap caches and release bdbufs for bmaps belonging to the truncated FID. We must ensure that no async I/O's issued prior to the truncate are sent to sliod before new bdbufs are acquired. Therefore we must lock the fcmh and iteratively free its bmaps prior to sending any truncate RPC to the MDS.
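A sketch of that ordering, with placeholder lock, list, and RPC helpers standing in for the real SLASH2 interfaces:

struct fcmh;
struct bmap;

/* Placeholder interfaces, declared only so the sketch reads cleanly. */
void     fcmh_lock(struct fcmh *);
void     fcmh_unlock(struct fcmh *);
struct bmap *fcmh_first_bmap(struct fcmh *);
void     bmap_wait_inflight(struct bmap *);      /* or cancel queued writes */
void     bdbuf_release(struct bmap *);
void     bmap_free(struct bmap *);
int      mds_truncate_rpc(struct fcmh *, unsigned long);

static int
msl_full_truncate(struct fcmh *f)
{
        struct bmap *b;

        fcmh_lock(f);
        /* Free every cached bmap so no stale bdbuf can be used later. */
        while ((b = fcmh_first_bmap(f)) != NULL) {
                bmap_wait_inflight(b);
                bdbuf_release(b);
                bmap_free(b);
        }
        fcmh_unlock(f);

        /* Only now is it safe to bump the generation on the MDS. */
        return (mds_truncate_rpc(f, 0));
}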

Verify that the s2 generation number is being handled properly for CRC updates

File size of 0 found during kernel untar

I think this is due to several restarts of the MDS while I/O to the sliod was ongoing. Still tracking.

For now, here’s a simple regression test to track this:

cli1:

(cd /tmp/ && tar jxf ~pauln/linux-2.6.34.tar.bz2 && find linux-2.6.34 -type f -exec md5sum {} \; > /tmp/md5s)

cli2:

(~pauln/s2reg/kernel.sh > /dev/null)

cli1:

(cd <kernel.sh outdir> && md5sum -c /tmp/md5s)