Functionality test results (pre-r13130)

  • STWT_SZV - SUCCESS
  • MTWT_SZV - SUCCESS
  • IOZONE - FAIL
  • CREATE+STAT - NOTRUN

Today's bugfixes

Have uncommitted patches on grapefruit that deal with a problem Zhihui spotted during a kernel compile. The problem was that the client was releasing a write bmap before doing any I/O. Shortly afterward the client issued the I/O, but by then sliod had already scheduled the bmap to be released. sliod does not actually release the bmap, because it detects the pending I/Os; but once the final bcr has been committed to the MDS, sliod tries to schedule the bmap for release again and fails, because the bmap had already been placed on the release queue.
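
One way to handle this is to make the release scheduling idempotent: check, under the lock, whether the bmap is already queued before trying to queue it again. A standalone sketch of that idea (the names, types, and locking here are hypothetical, not the real sliod structures):

#include <pthread.h>
#include <stdbool.h>

struct bmap_iod_rls {
        pthread_mutex_t lock;
        int             pending_ios;    /* outstanding I/Os against the bmap */
        bool            rls_scheduled;  /* already on the release queue? */
};

/*
 * Called both when the client drops its lease and when the final bcr is
 * committed to the MDS; only the first caller actually queues the bmap.
 */
bool
try_schedule_release(struct bmap_iod_rls *b)
{
        bool queued = false;

        pthread_mutex_lock(&b->lock);
        if (b->pending_ios == 0 && !b->rls_scheduled) {
                b->rls_scheduled = true;
                queued = true;  /* caller adds the bmap to the release queue */
        }
        pthread_mutex_unlock(&b->lock);
        return queued;
}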

Testing patches for a chwrmode bug on the MDS, where a mode change blindly increments bmdsi_writers even when the client already holds a write lease (i.e., the new write lease is a duplicate).
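
The gist is to check for an existing write lease from the same client before bumping the writer count. A rough standalone sketch; the structures below are made up for illustration, and only bmdsi_writers is a real name:

#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical structures for illustration only. */
struct bmap_lease {
        struct bmap_lease *next;
        long               client_id;
        bool               write;
};

struct bmap_mds_info {
        struct bmap_lease *leases;
        int                bmdsi_writers;
};

/*
 * On a read -> write mode change, only count a new writer if this client
 * does not already hold a write lease on the bmap.
 */
void
lease_upgrade_to_write(struct bmap_mds_info *bmdsi, long client_id)
{
        struct bmap_lease *bml;

        for (bml = bmdsi->leases; bml != NULL; bml = bml->next)
                if (bml->client_id == client_id && bml->write)
                        return;         /* duplicate write lease: no increment */

        bmdsi->bmdsi_writers++;
        /* ... then mark the client's lease as a write lease ... */
}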

Functionality test results (r13105)

  • STWT_SZV - NOTRUN
  • MTWT_SZV - NOTRUN
  • IOZONE - FAILS with stack trace below
  • CREATE+STAT - NOTRUN

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffebbfff710 (LWP 537)]
0x00000000004a2e26 in mds_bmap_directio (b=0x15e0c20, rw=SL_READ, np=0x15d3bd8)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/mds.c:207
warning: Source file is more recent than executable.
207                     psc_assert(bml->bml_flags & BML_WRITE);
(gdb) bt
#0  0x00000000004a2e26 in mds_bmap_directio (b=0x15e0c20, rw=SL_READ, np=0x15d3bd8)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/mds.c:207
#1  0x00000000004a63b6 in mds_bmap_bml_add (bml=0x15d3bc0, rw=SL_READ, prefios=131072)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/mds.c:597
#2  0x00000000004acd67 in mds_bmap_load_cli (f=0x8ecd30, bmapno=1, flags=0, rw=SL_READ,
    prefios=131072, sbd=0x7ffeb00028c0, exp=0x7ffe880010c0, bmap=0x7ffebbffe878)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/mds.c:1440
#3  0x00000000004c50be in slm_rmc_handle_getbmap (rq=0x19796d0)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/rmc.c:253
#4  0x00000000004ca59f in slm_rmc_handler (rq=0x19796d0)
    at /home/pauln/Code/fc13/projects/slash_nara/slashd/rmc.c:991
#5  0x000000000045b4f1 in pscrpc_server_handle_request (svc=0x183f210, thread=0x7ffeb00008c0)
    at /home/pauln/Code/fc13/projects/psc_fsutil_libs/psc_rpc/service.c:371
#6  0x000000000045e59b in pscrpcthr_main (thr=0x7ffeb00008c0)
    at /home/pauln/Code/fc13/projects/psc_fsutil_libs/psc_rpc/service.c:731
#7  0x0000000000489896 in _pscthr_begin (arg=0x7fffffffd4e0)
    at /home/pauln/Code/fc13/projects/psc_fsutil_libs/psc_util/thread.c:281
#8  0x0000003bf7e07761 in start_thread (arg=0x7ffebbfff710) at pthread_create.c:301
#9  0x0000003bf7ae14ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Client bug due to bmap_flush() sleep time being too short

To be clear, this isn't exactly a bug, but it does cause some CPU abuse:

[1282327206:159214 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 restore to dirty list
[1282327206:161331 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 try flush (outstandingRpcCnt=75)
[1282327206:161347 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 try flush (outstandingRpcCnt=75)
[1282327206:161360 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 restore to dirty list
[1282327206:161374 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 restore to dirty list
[1282327206:163490 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 try flush (outstandingRpcCnt=75)
[1282327206:163507 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 try flush (outstandingRpcCnt=75)
[1282327206:163520 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 restore to dirty list
[1282327206:163533 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 restore to dirty list
[1282327206:165650 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 try flush (outstandingRpcCnt=75)
[1282327206:165668 msbflushthr:23667:bmap:bmap_flush:639] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 try flush (outstandingRpcCnt=75)
[1282327206:165682 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x881330 b:1 m:8259 i:80000002d907b opcnt=65 restore to dirty list
[1282327206:165700 msbflushthr:23667:bmap:bmap_flush:754] bmap@0x882110 b:3 m:8259 i:80000002d907b opcnt=19 restore to dirty list

The bmap_flush thread wakes up every 2ms to process new work. The sleep is kept that short so the thread can service new requests quickly; for small files this is especially important because it lowers latency. Perhaps a separate queue for delayed or pending I/Os would solve this, letting new work reach the flush queue immediately without the 2ms spin (see the sketch below).
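
Another option would be to block on a condition variable and have the I/O path signal the flush thread whenever a bmap becomes flushable. A minimal standalone sketch of that pattern, with hypothetical names (this is not the actual client code):

#include <pthread.h>

/* Hypothetical flush bookkeeping; in the client this would be the dirty list. */
static pthread_mutex_t flush_mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flush_cond = PTHREAD_COND_INITIALIZER;
static int             flush_work;      /* number of items ready to flush */

/* Producer side: called whenever an I/O makes a bmap flushable. */
void
flush_wakeup(void)
{
        pthread_mutex_lock(&flush_mtx);
        flush_work++;
        pthread_cond_signal(&flush_cond);
        pthread_mutex_unlock(&flush_mtx);
}

/* Flush thread: blocks until work arrives instead of polling every 2ms. */
void *
bmap_flush_thread(void *arg)
{
        (void)arg;
        for (;;) {
                pthread_mutex_lock(&flush_mtx);
                while (flush_work == 0)
                        pthread_cond_wait(&flush_cond, &flush_mtx);
                flush_work = 0;
                pthread_mutex_unlock(&flush_mtx);

                /* ... drain the dirty list and issue the write RPCs ... */
        }
        return NULL;
}

For bmaps that must be deferred (e.g. while outstandingRpcCnt is too high), a pthread_cond_timedwait() against the earliest retry deadline would still avoid the fixed 2ms spin.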

File sizes for large files seem to be working again!

Today’s functionality test results (r13082)

  • STWT_SZV - PASS
  • MTWT_SZV - PASS
  • IOZONE - FAIL
  • CREATE+STAT - ??

Lemon:

Every 1.0s: ls -la /s2/pauln/FIO_TEST_ROOT/fio_f.pe0.largeioj.0.0 /s2/pauln/FIO_TEST_ROOT/fio_f.pe1.large...  Thu Aug 19 17:42:16 2010

-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe0.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe1.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe2.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe3.largeioj.0.0

Orange:

Every 1.0s: ls -la /s2/pauln/FIO_TEST_ROOT/fio_f.pe0.largeioj.0.0 /s2/pauln/FIO_TEST_ROOT/fio_f.pe1.large...  Thu Aug 19 17:42:16 2010

-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe0.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe1.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe2.largeioj.0.0
-rw-r--r-- 1 pauln staff 4294967296 Aug 19 17:27 /s2/pauln/FIO_TEST_ROOT/fio_f.pe3.largeioj.0.0

biod_bcr, biod_bcr_sched, and biod_bklog_bcrs

Retooling this a bit. The current code was last touched during the mad rush for TG’10.

biod_bcr
The pointer to the most recent, non-full bcr for the given biodi.
biod_bcr_sched
Indicates that a bcr from this biodi is on either the hold or the ready list.
biod_bklog_bcrs
Overflow queue for bcrs that are waiting to be sent to the MDS or waiting for more CRCs. Any bcr not at the tail of this list must be full.

Bcr processing is tricky because at any given time sliod can send only a single bcr to the MDS. A bcr must always be on one of the lists: hold, ready, or backlog.
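
Restating those invariants schematically (the types and list type below are placeholders; only the three biod_* field names come from the code):

/* Schematic only: the surrounding types are placeholders. */
struct bcr;                               /* bmap CRC report */

struct bcr_list {
        struct bcr *head, *tail;
};

struct bmap_iod_info {
        struct bcr      *biod_bcr;        /* most recent, non-full bcr; new CRCs
                                           * are appended here */
        int              biod_bcr_sched;  /* nonzero iff a bcr from this biodi is
                                           * on the hold or ready list */
        struct bcr_list  biod_bklog_bcrs; /* overflow queue: bcrs waiting on the
                                           * MDS or on more CRCs; every bcr
                                           * except the tail must be full */
};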