More bugzzz

The client is needlessly issuing read-before-write requests:

[1282083602:849976 sliricthr01:14541:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=98304 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=0
[1282083602:962496 sliricthr03:14543:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=229376 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083602:983695 sliricthr02:14542:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=360448 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083602:995708 sliricthr18:14558:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=491520 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083602:999686 sliricthr14:14554:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=622592 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:015023 sliricthr00:14540:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=753664 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:021103 sliricthr03:14543:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=884736 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:027943 sliricthr10:14550:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1015808 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:035300 sliricthr09:14549:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1146880 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:042009 sliricthr21:14561:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1277952 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:047650 sliricthr08:14548:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1409024 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:055499 sliricthr21:14561:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1540096 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:061232 sliricthr14:14554:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1671168 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:068275 sliricthr04:14544:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1802240 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:074854 sliricthr08:14548:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=1933312 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:080775 sliricthr17:14557:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2064384 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:088872 sliricthr21:14561:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2195456 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:096729 sliricthr08:14548:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2326528 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:103729 sliricthr21:14561:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2457600 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083603:109248 sliricthr29:14569:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2588672 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083606:738186 sliricthr19:14559:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2719744 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083607:150602 sliricthr17:14557:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:2 sz=76546032 :: bmapno=1 size=32768 off=2850816 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083607:185957 sliricthr08:14548:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:3 sz=76546032 :: bmapno=1 size=1048576 off=0 rw=43  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083607:204033 sliricthr21:14561:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:4 sz=76546032 :: bmapno=1 size=1048576 off=1048576 rw=43  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
[1282083607:221164 sliricthr20:14560:gen:sli_ric_handle_io:162] fcmh@0x71af20 fg:0x080000002d8eb0:0 DA ref:5 sz=76546032 :: bmapno=1 size=655344 off=2097152 rw=43  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302
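As a rough way to see the pattern in the log above, the sketch below parses the size, off, and rw fields and reports which reads land entirely inside a byte range that a write in the same log also covers. It assumes rw=42 is a read and rw=43 is a write (inferred from the context here, not checked against the SLASH2 source), and it treats a fully overwritten read as a needless one. The names summarize and covered_by_writes are hypothetical helpers, not part of SLASH2.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of SLASH2: pull size, off, and rw out of
# sli_ric_handle_io debug lines like the ones in the log.
LINE_RE = re.compile(r"size=(\d+) off=(\d+) rw=(\d+)")

# Assumption: rw=42 is the read opcode and rw=43 is the write opcode.
RW_NAMES = {42: "read", 43: "write"}


def summarize(lines):
    """Group (offset, size) pairs by request type."""
    by_type = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an I/O line
        size, off, rw = (int(g) for g in m.groups())
        by_type[RW_NAMES.get(rw, f"rw={rw}")].append((off, size))
    return dict(by_type)


def covered_by_writes(summary):
    """Return the reads that fall entirely inside some write's byte range."""
    writes = summary.get("write", [])
    return [
        (off, size)
        for off, size in summary.get("read", [])
        if any(w_off <= off and off + size <= w_off + w_size
               for w_off, w_size in writes)
    ]


# Three lines abbreviated from the log (the prefix is replaced with "...").
sample = [
    "... :: bmapno=1 size=32768 off=98304 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=0",
    "... :: bmapno=1 size=32768 off=229376 rw=42  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302",
    "... :: bmapno=1 size=1048576 off=0 rw=43  sbd_seq=4294967302 biod_cur_seqkey[0]=4294967302",
]
summary = summarize(sample)
needless = covered_by_writes(summary)
print(len(summary["read"]), "reads,", len(summary["write"]), "writes,",
      len(needless), "reads overwritten")
```

Feeding the full sliod log through summarize instead of the three sample lines gives the same breakdown for the whole run.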

Today's basic functionality tests (r12984)

  • Single-threaded write test size verification from multiple clients (STWT_SZV) - PASS
  • Multi-threaded write test size verification from multiple clients - FAIL

Multi-threaded write test size verification from multiple clients

This test is NOT working at log level 5. Note that these tests used the bessemer@PSC I/O backend with 2 I/O nodes.

group 8peReadWrite {
  files_per_dir = 4;
  tree_depth    = 0;
  tree_width    = 0;
  pes           = 4;
  test_freq     = 0;
  block_freq    = 0;
  path          = /s2/pauln;
  output_path   = /home/pauln/fio/tmp;
  filename      = largeioc;
  file_size     = 4g;
  block_size    = 1m;
  thrash_lock   = yes;
  samedir       = yes;
  samefile      = no;
  intersperse   = no;
  seekoff       = no;
  fsync_block   = no;
  verify        = yes;
  barrier       = yes;
  time_block    = yes;
  block_barrier = no;
  time_barrier  = no;
  iterations    = 1;
  debug_conf    = no;
  debug_block    = no;
  debug_memory    = no;
  debug_buffer    = no;
  debug_output    = no;
  debug_dtree     = no;
  debug_barrier   = no;
  debug_iofunc    = no;

  iotests (
	WriteEmUp [create:openwr:write:close]
  )
}

All blocks were written:

(pauln@lemon:TGFIO_tests)$ grep "block# 4095"  ./largeio.test1.outc
1282068769.436847 PE_00002 do_io() :: bl_wr 0000.090650 MB/s 0011.031429 block# 4095 bwait 00.000000
1282068775.774260 PE_00003 do_io() :: bl_wr 0000.088850 MB/s 0011.254921 block# 4095 bwait 00.000000
1282068776.913274 PE_00001 do_io() :: bl_wr 0000.083602 MB/s 0011.961443 block# 4095 bwait 00.000000
1282068778.415998 PE_00000 do_io() :: bl_wr 0000.075364 MB/s 0013.268957 block# 4095 bwait 00.000000
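The reason for grepping on block# 4095 can be checked with a little arithmetic. This is a minimal sketch, assuming fio's "4g" and "1m" suffixes are binary (GiB and MiB) and that block numbers start at zero; under those assumptions block 4095 is the last block each PE should write.

```python
# Assumption: file_size = 4g means 4 GiB and block_size = 1m means 1 MiB.
FILE_SIZE = 4 * 1024**3
BLOCK_SIZE = 1024**2

num_blocks = FILE_SIZE // BLOCK_SIZE
last_block = num_blocks - 1  # assuming zero-based block numbering

print(num_blocks, last_block)
```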

However, all files should be 4294967296 bytes (4 GiB). At least the clients agree on the sizes, which points to the MDS or sliod as the culprit.

Orange:

-rw-r--r-- 1 pauln staff 4215275520 Aug 17 14:07 fio_f.pe0.largeioc.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 17 14:07 fio_f.pe1.largeioc.0.0
-rw-r--r-- 1 pauln staff 4202037232 Aug 17 14:07 fio_f.pe2.largeioc.0.0
-rw-r--r-- 1 pauln staff 4215930864 Aug 17 14:07 fio_f.pe3.largeioc.0.0

Lemon:

-rw-r--r-- 1 pauln staff 4215275520 Aug 17 14:07 fio_f.pe0.largeioc.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 17 14:07 fio_f.pe1.largeioc.0.0
-rw-r--r-- 1 pauln staff 4202037232 Aug 17 14:07 fio_f.pe2.largeioc.0.0
-rw-r--r-- 1 pauln staff 4215930864 Aug 17 14:07 fio_f.pe3.largeioc.0.0
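
To put numbers on the discrepancy, the sketch below subtracts each reported size from the expected 4294967296 bytes and expresses the shortfall in 1 MiB blocks. The sizes are copied from the listings above; reading block_size = 1m as 1 MiB is an assumption.

```python
EXPECTED = 4 * 1024**3  # 4294967296 bytes
BLOCK = 1024**2         # block_size = 1m, assumed to mean 1 MiB

# Sizes copied from the ls output (identical on Orange and Lemon).
sizes = {
    "fio_f.pe0.largeioc.0.0": 4215275520,
    "fio_f.pe1.largeioc.0.0": 4294967296,
    "fio_f.pe2.largeioc.0.0": 4202037232,
    "fio_f.pe3.largeioc.0.0": 4215930864,
}

shortfalls = {name: EXPECTED - size for name, size in sizes.items()}
for name, missing in shortfalls.items():
    blocks, rem = divmod(missing, BLOCK)
    print(f"{name}: short by {missing} bytes ({blocks} blocks + {rem} bytes)")
```

Only pe1 has the full size. pe0 is short by a whole number of blocks (76), while pe2 and pe3 are short by amounts that are not block-aligned.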

Single-threaded write test with size verification from multiple clients

This test is working at log level 5 on clients and servers. stat(2) calls from the writer client and from a third-party client both return the correct size, with the third-party client’s cached size attributes timing out after 8 seconds.

group 8peReadWrite {
  files_per_dir = 1;
  tree_depth    = 0;
  tree_width    = 0;
  pes           = 1;
  test_freq     = 0;
  block_freq    = 0;
  path          = /s2/pauln;
  output_path   = /home/pauln/fio/tmp;
  filename      = largeiob;
  file_size     = 4g;
  block_size    = 1m;
  thrash_lock   = yes;
  samedir       = yes;
  samefile      = no;
  intersperse   = no;
  seekoff       = no;
  fsync_block   = no;
  verify        = yes;
  barrier       = yes;
  time_block    = yes;
  block_barrier = no;
  time_barrier  = no;
  iterations    = 1;
  debug_conf    = no;
  debug_block    = no;
  debug_memory    = no;
  debug_buffer    = no;
  debug_output    = no;
  debug_dtree     = no;
  debug_barrier   = no;
  debug_iofunc    = no;

  iotests (
	WriteEmUp [create:openwr:write:close]
  )
}

Orange:
-rw-r--r-- 1 pauln staff 4294967296 Aug 17 13:58 fio_f.pe0.largeiob.0.0

Lemon:
-rw-r--r-- 1 pauln staff 4294967296 Aug 17 13:58 fio_f.pe0.largeiob.0.0

Wow. Been a while since I’ve updated this!

TG10 BOF talk

Paul gave a Birds of a Feather talk at TeraGrid 2010.

Web site online

The Web site is now available.

SLASH client mounted at WVU

After a few go-rounds with various firewall and security mumbo-jumbo, we have finally mounted wolverine’s SLASH2 export at WVU.

Here’s the df(1) output and a gdb stack trace for the first bug:

(root@castor:mount_slash)# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3             68959192   5880592  59575628   9% /
/dev/sda1               101086     20617     75250  22% /boot
tmpfs                  1029476       640   1028836   1% /dev/shm
/slashfs_client      478468950    672976 477795975   1% /slashfs_client

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe6be8950 (LWP 28034)]
0x0000003e8c232f05 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install fuse-libs-2.7.4-2.fc10.x86_64 glibc-2.9-3.x86_64
(gdb) bt
#0  0x0000003e8c232f05 in raise () from /lib64/libc.so.6
#1  0x0000003e8c234a73 in abort () from /lib64/libc.so.6
#2  0x0000000000456477 in _psclogv (
    fn=0x4aef30 "..//..//psc_fsutil_libs/include/psc_util/lock.h",
    func=0x4aef25 "_tands", line=203, subsys=5, level=0, options=0,
    fmt=0x4aef80 "lock %p has invalid value (%d)", ap=0x7fffe6be78c0)
    at ..//..//psc_fsutil_libs/psc_util/log.c:225
#3  0x00000000004566af in _psc_fatal (
    fn=0x4aef30 "..//..//psc_fsutil_libs/include/psc_util/lock.h",
    func=0x4aef25 "_tands", line=203, subsys=5, level=0, options=0,
    fmt=0x4aef80 "lock %p has invalid value (%d)")
    at ..//..//psc_fsutil_libs/psc_util/log.c:246
#4  0x0000000000404f9b in _tands (s=0x7ffe58)
    at ..//..//psc_fsutil_libs/include/psc_util/lock.h:203
#5  0x0000000000404ebd in spinlock (s=0x7ffe58)
    at ..//..//psc_fsutil_libs/include/psc_util/lock.h:212
#6  0x0000000000404e58 in reqlock (sl=0x7ffe58)
    at ..//..//psc_fsutil_libs/include/psc_util/lock.h:255
#7  0x00000000004105f2 in slash2fuse_lookup_helper (req=0x7fffe0011460,
    parent=13240447, name=0x7fffe43a1038 "fio_f.pe6.8peRW_1mbs.0.35")
    at main.c:1099
#8  0x0000000000403ee2 in slash2fuse_listener_loop (arg=0x0)
    at fuse_listener.c:260
#9  0x0000003e8ce073da in start_thread () from /lib64/libpthread.so.0
#10 0x0000003e8c2e62bd in clone () from /lib64/libc.so.6
(gdb)