Spinning on bmapFlushQ

Pending-write accounting should be refined with a separate field for unsent writes. That way the bmap flush thread would not keep reprocessing bmaps whose write biorqs have already been sent but have not yet completed. At the moment the bmap flush thread polls the flushQ, sleeping for 2µs.
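A minimal sketch of the idea, assuming a per-bmap pair of counters. The struct and function names here (bmap_wr_state, wr_queue, wr_sent, wr_complete, flushq_count_work) are hypothetical and are not taken from the SLASH2 source; the point is only that a bmap with pending but no unsent writes gives the flush thread nothing to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bmap write accounting; names are illustrative only. */
struct bmap_wr_state {
	int pending;	/* writes queued or in flight, not yet completed */
	int unsent;	/* subset of pending that has not been sent yet */
};

/* A new write biorq is queued: it is both pending and unsent. */
void
wr_queue(struct bmap_wr_state *b)
{
	b->pending++;
	b->unsent++;
}

/* The flush thread sent one biorq: it stays pending until it completes. */
void
wr_sent(struct bmap_wr_state *b)
{
	assert(b->unsent > 0);
	b->unsent--;
}

/* A previously sent biorq completed. */
void
wr_complete(struct bmap_wr_state *b)
{
	assert(b->pending > b->unsent);
	b->pending--;
}

/* Only bmaps with unsent writes need the flush thread's attention. */
int
wr_needs_flush(const struct bmap_wr_state *b)
{
	return (b->unsent > 0);
}

/* Count how many bmaps on a (hypothetical) flush queue have real work. */
size_t
flushq_count_work(const struct bmap_wr_state *q, size_t n)
{
	size_t i, work = 0;

	for (i = 0; i < n; i++)
		if (wr_needs_flush(&q[i]))
			work++;
	return (work);
}
```

With this split, a bmap whose writes are all sent but uncompleted still has pending > 0, yet wr_needs_flush() returns 0 for it, so a pass over the queue can skip it.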

Functionality test results (r13241)

  • STWT_SZV - FAIL (hang in the bmpc reaper; run after Iozone and MTWT)
  • MTWT_SZV - SUCCESS
  • IOZONE - SUCCESS
  • CREATE+STAT - NOTRUN
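Iozone completes! The columns below follow the standard iozone report layout: file size (KB), record length (KB), then throughput in KB/s for write, rewrite, read, reread, random read, random write, bkwd read, record rewrite, stride read, fwrite, frewrite, fread and freread.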
	131072   16384  151040  152250    63398    63130   70574  110137   69885   742318   401939  113827    87376   63103    62733
	262144      64  127957  131976    50374    49968   51286  109407   50925   658132    53109   109602   109616   49703    48757
	262144     128  128460  130409    63484    64353   67561  109540   66389   939217    67346   61564    62272   63887    64436
	262144     256  124699  123667    64261    64619   66418  109579   66304   890023    68356   52875    60140   64129    63939
	262144     512  127747  128208    64963    64782   67166  109366   66881   910048    68942   51256    60371   63676    64156
	262144    1024  129086  132626    64449    64465   65850  109590   66171   836729    67849   45656    62659   64524    64553
	262144    2048  134714  135328    63958    64081   67473  109536   65392   718438    68401   41272    45193   63625    63095
	262144    4096  125039  127863    64413    64623   68209  109585   67035  1104080    69146   41757    58955   63877    63899
	262144    8192  125004  128574    64351    63647   66848  109677   63678   882054    72393   44599    46715   64699    64725
	262144   16384  124248  125691    64046    63929   68465  109797   63929   725180   405494   53512    63159   64061    63793
	524288      64  118469   85031    23169    30484   10701    7866   19810   672532     8548   107962   106495   20325    26591
	524288     128  105908   89033    25575    35472   18851   13254   19944   876107    18114   35142    33814   21247    30385
	524288     256  117260   95669    30480    34230   22685   29827   23022   817172    35354   30332    46171   21616    30748
	524288     512  116184   72952    26786    35881   28867   87851   27261   871125    38115   28145    42941   20739    30704
	524288    1024  117427   85795    28558    36190   37673  104597   27181   774002    38999   26876    40118   21071    30782
	524288    2048  107134   93798    28449    35137   37236  107733   30773   890050    39438   30477    52565   21195    31767
	524288    4096  119097   98051    28793    34432   37358  106891   29528  1109057    39114   30421    51219   22030    33060
	524288    8192  107633   96480    26593    34921   37258  107955   30603   889855    40154   35802    55059   22491    33081
	524288   16384  116682  106040    28736    34376   38013  107656   31217   736744    40580   36463    54324   23274    34070

iozone test complete.

bml_key removed from bmap lease structure

To address yesterday’s bmap lease bug, I’ve decided to remove the bml_key, and hence the bml association, from the odtable entry. At this point the odtable key is known only by the bmdsi. Hopefully this will properly address the issues with out-of-order lock releases.

Rev 13218 should greatly improve performance for large file writes

The buffer cache LRU reclamation was horribly buggy and inefficient. Yesterday I determined that in some cases newer bmpces were being placed at the front of the list, causing older entries to be skipped (i.e. not considered for reclamation). I also found strong evidence that the bmpce LRUs were ordered from oldest to newest instead of the other way around. Preliminary test results with Iozone are promising: performance has jumped by a factor of 3 to 4!
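To illustrate why list ordering matters for reclamation, here is a small sketch of an LRU list in which insertion and reclamation agree on one convention. It assumes the head holds the oldest entry and the tail the newest; the actual orientation in the bmpce code may be the reverse. The names (lru_ent, lru_add_newest, lru_touch, lru_reclaim) and the pinned flag are hypothetical and not taken from the SLASH2 source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry on a doubly linked LRU list. */
struct lru_ent {
	struct lru_ent *prev, *next;
	int id;
	int pinned;		/* nonzero: in use, must not be reclaimed */
};

/* Convention assumed in this sketch: head = oldest, tail = newest. */
struct lru {
	struct lru_ent *head;
	struct lru_ent *tail;
};

/* Unlink an entry from the list. */
void
lru_remove(struct lru *l, struct lru_ent *e)
{
	if (e->prev)
		e->prev->next = e->next;
	else
		l->head = e->next;
	if (e->next)
		e->next->prev = e->prev;
	else
		l->tail = e->prev;
	e->prev = e->next = NULL;
}

/* New or freshly used entries always go to the newest end. */
void
lru_add_newest(struct lru *l, struct lru_ent *e)
{
	e->next = NULL;
	e->prev = l->tail;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
}

/* An access moves the entry to the newest end. */
void
lru_touch(struct lru *l, struct lru_ent *e)
{
	lru_remove(l, e);
	lru_add_newest(l, e);
}

/* Reclaim the oldest unpinned entry, scanning from the oldest end. */
struct lru_ent *
lru_reclaim(struct lru *l)
{
	struct lru_ent *e;

	for (e = l->head; e != NULL; e = e->next)
		if (!e->pinned) {
			lru_remove(l, e);
			return (e);
		}
	return (NULL);
}
```

If new entries were instead inserted at the end the reclaimer scans from, the newest pages would be evicted first and the oldest would linger, which is the kind of mismatch described above.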

Tricky bug in the bmap write code

Iozone does some interesting things to SLASH2. By holding bmaps open for long periods of time and performing chwrmode on read bmaps, Iozone has been shaking out many types of bugs, especially in the bmap lease code.

Here’s a case where sliod relinquishes its most recent lease for bmap@0x15e0c10 while another, duplicate write lease remains. Since the duplicate is an older write lease, it doesn’t have the correct odtable key for releasing the odtable slot. Soon the MDS will crash because bmdsi_writers == 0 but the mion is still assigned.

This bug is partially caused by the fact that sliod only remembers one release sequence number per bmap. Subsequent release requests for write bmaps overwrite any previously stored sequence number. While this seems buggy, it in fact helps simulate downed or otherwise unreachable sliods. Therefore the problem must be dealt with by the MDS.

One suggestion is that revoking a newer write lease should also invalidate all older write leases. This would solve the problem of the odtable entry key.
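The suggestion above could be sketched roughly as follows. This is a simplified model, not the MDS code: it assumes all leases in the table are write leases from the same client on one bmap, and the names (wlease, bmap_leases, lease_add, lease_release, odtable_slot_releasable) are hypothetical. Releasing a lease with sequence number S also invalidates every valid write lease whose sequence number is at most S, so the writer count reaches zero together with the newest lease.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEASES 8

/* Hypothetical write lease record. */
struct wlease {
	uint64_t seq;		/* lease sequence number */
	int valid;
};

/* Hypothetical per-bmap lease table for a single client. */
struct bmap_leases {
	struct wlease wl[MAX_LEASES];
	size_t n;
	int writers;		/* number of valid write leases */
};

/* Record a new write lease; returns 0 on success, -1 if the table is full. */
int
lease_add(struct bmap_leases *b, uint64_t seq)
{
	if (b->n == MAX_LEASES)
		return (-1);
	b->wl[b->n].seq = seq;
	b->wl[b->n].valid = 1;
	b->n++;
	b->writers++;
	return (0);
}

/*
 * Release the lease with the given sequence number and, with it, every
 * older write lease that is still valid. Returns the number invalidated.
 */
int
lease_release(struct bmap_leases *b, uint64_t seq)
{
	size_t i;
	int dropped = 0;

	for (i = 0; i < b->n; i++)
		if (b->wl[i].valid && b->wl[i].seq <= seq) {
			b->wl[i].valid = 0;
			b->writers--;
			dropped++;
		}
	return (dropped);
}

/* The odtable slot may be freed only once no valid write lease remains. */
int
odtable_slot_releasable(const struct bmap_leases *b)
{
	return (b->writers == 0);
}
```

Using the two sequence numbers from the log below (4294969689 and 4294969716), releasing 4294969716 in this model drops both leases at once, so the slot can be released while the correct key is still in hand. Releasing only the older 4294969689 leaves the newer lease, and therefore the slot, intact.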

[1282675290:846711 slmrmithr03:7617:bmap:mds_handle_rls_bmap:1011] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=4 release 4294969716 nid=562995062530842 pid=2147502232 bml=0x15d4820
[1282675290:846742 slmrmithr03:7617:bmap:mds_bmap_bml_release:774] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=4 bml=0x15d4820 seq=4294969716 key=-3267403435540628159
[1282675290:846761 slmrmithr03:7617:bmap:mds_bmap_dupls_find:464] bmap@0x15e0c10 b:1 m:1024 i:80000002d9347 opcnt=4 bml=0x15d4c40 tmp=0x15d4c40 (wlease=1 rlease=0) (nwtrs=1 nrdrs=0)
[1282675290:846775 slmrmithr03:7617:bmap:mds_bmap_dupls_find:464] bmap@0x15e0c10 b:1 m:1024 i:80000002d9347 opcnt=4 bml=0x15d4c40 tmp=0x15d4820 (wlease=2 rlease=0) (nwtrs=1 nrdrs=0)
[1282675290:846791 slmrmithr03:7617:bmap:mds_bmap_bml_release:950] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=4 removing reference (type=2)
[1282675290:846807 slmrmithr03:7617:bmap:mds_handle_rls_bmap:1019] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=3 removing reference (type=0)
[1282675295:139646 slmbmaptimeothr:7613:bmap:mds_bmap_bml_release:774] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=2 bml=0x15d4c40 seq=4294969689 key=8707359475757897713
[1282675295:139661 slmbmaptimeothr:7613:bmap:mds_bmap_dupls_find:464] bmap@0x15e0c10 b:1 m:1024 i:80000002d9347 opcnt=2 bml=0x15d4c40 tmp=0x15d4c40 (wlease=1 rlease=0) (nwtrs=1 nrdrs=0)
[1282675295:139677 slmbmaptimeothr:7613:bmap:mds_bmap_bml_release:894] bmap@0x15e0c10 b:1 m:1024 i:80000002d9347 opcnt=2 bml=0x15d4c40 bmdsi_writers=0 bmdsi_readers=0
[1282675295:139697 slmbmaptimeothr:7613:bmap:mds_bmap_bml_release:928] bmap@0x15e0c10 b:1 m:1024 i:80000002d9347 opcnt=2 !bmdsi_writers but bml_key (8707359475757897713) != odtr_key(-3267403435540628159)
[1282675295:139711 slmbmaptimeothr:7613:bmap:mds_bmap_bml_release:950] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=2 removing reference (type=2)
[1282675299:523693 slmrmcthr25:7666:bmap:bmap_lookup_cache_locked:123] bmap@0x15e0c10 b:1 m:0 i:80000002d9347 opcnt=2 took reference (type=0)