0

I have a Pi 3 B running Raspbian Stretch. Recently, I started experiencing lag logging in with SSH, and sporadically during use. It really became apparent when I tried zipping my 2GB Plex library files. It would hang a few times per minute on a small file. Monitoring in htop, the process will change to state D and drop to 0% CPU. Typically when it's hung, I can't execute a command from another window. It resumes after 1-2 seconds.

I suspected power issues, but running this command...

$ /opt/vc/bin/vcgencmd get_throttled

...indicates nothing is throttling the system when the freezes happen. I also don't get the lighting bolt on the screen.

I ran this command as root:

# echo w > /proc/sysrq-trigger

When no processes are hung, it puts this in /var/log/messages:

Dec  8 21:00:57 gax-pi-2 kernel: [  549.369634] sysrq: SysRq : Show Blocked State
Dec  8 21:00:57 gax-pi-2 kernel: [  549.369662]   task                PC stack   pid father

When the zip process is hung, it puts this in /var/log/messages:

Dec  8 21:01:41 gax-pi-2 kernel: [  594.077763] sysrq: SysRq : Show Blocked State
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077802]   task                PC stack   pid father
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077856] jbd2/mmcblk0p2- D    0    69      2 0x00000000
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077903] [<8079f530>] (__schedule) from [<8079fb98>] (schedule+0x50/0xa8)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077924] [<8079fb98>] (schedule) from [<8014b690>] (io_schedule+0x20/0x40)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077943] [<8014b690>] (io_schedule) from [<8021d718>] (wait_on_page_bit+0x110/0x130)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077963] [<8021d718>] (wait_on_page_bit) from [<8021d818>] (__filemap_fdatawait_range+0xe0/0x114)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.077979] [<8021d818>] (__filemap_fdatawait_range) from [<8021d8b8>] (filemap_fdatawait_keep_errors+0x34/0x60)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078000] [<8021d8b8>] (filemap_fdatawait_keep_errors) from [<8037b7bc>] (jbd2_journal_commit_transaction+0x79c/0x17f0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078020] [<8037b7bc>] (jbd2_journal_commit_transaction) from [<80381920>] (kjournald2+0x110/0x2c0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078037] [<80381920>] (kjournald2) from [<8013dd98>] (kthread+0x13c/0x16c)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078058] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078174] zip             D    0  1693   1689 0x00000000
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078200] [<8079f530>] (__schedule) from [<8079fb98>] (schedule+0x50/0xa8)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078215] [<8079fb98>] (schedule) from [<8014b690>] (io_schedule+0x20/0x40)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078230] [<8014b690>] (io_schedule) from [<807a0508>] (bit_wait_io+0x1c/0x70)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078247] [<807a0508>] (bit_wait_io) from [<807a0198>] (__wait_on_bit+0x94/0xd0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078264] [<807a0198>] (__wait_on_bit) from [<807a025c>] (out_of_line_wait_on_bit+0x88/0x90)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078281] [<807a025c>] (out_of_line_wait_on_bit) from [<80379544>] (do_get_write_access+0x254/0x4f4)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078299] [<80379544>] (do_get_write_access) from [<8037982c>] (jbd2_journal_get_write_access+0x48/0x74)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078319] [<8037982c>] (jbd2_journal_get_write_access) from [<8031c68c>] (__ext4_journal_get_write_access+0x44/0x84)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078340] [<8031c68c>] (__ext4_journal_get_write_access) from [<8033a618>] (ext4_reserve_inode_write+0x8c/0xc0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078356] [<8033a618>] (ext4_reserve_inode_write) from [<8033a830>] (ext4_mark_inode_dirty+0x48/0x1e0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078371] [<8033a830>] (ext4_mark_inode_dirty) from [<8033f8cc>] (ext4_dirty_inode+0x54/0x70)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078389] [<8033f8cc>] (ext4_dirty_inode) from [<802bbe9c>] (__mark_inode_dirty+0x54/0x408)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078414] [<802bbe9c>] (__mark_inode_dirty) from [<802a7e78>] (generic_update_time+0x84/0xc0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078432] [<802a7e78>] (generic_update_time) from [<802a81c0>] (file_update_time+0xc8/0x11c)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078451] [<802a81c0>] (file_update_time) from [<802217fc>] (__generic_file_write_iter+0xa0/0x1f8)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078472] [<802217fc>] (__generic_file_write_iter) from [<80328dd0>] (ext4_file_write_iter+0x188/0x454)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078491] [<80328dd0>] (ext4_file_write_iter) from [<8028b8c0>] (__vfs_write+0xe0/0x138)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078511] [<8028b8c0>] (__vfs_write) from [<8028bae0>] (vfs_write+0xb4/0x1bc)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078528] [<8028bae0>] (vfs_write) from [<8028bd30>] (SyS_write+0x54/0xb0)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078545] [<8028bd30>] (SyS_write) from [<80108000>] (ret_fast_syscall+0x0/0x28)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078553] kworker/0:0     D    0  1717      2 0x00000000
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078577] Workqueue: events_freezable mmc_rescan
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078598] [<8079f530>] (__schedule) from [<8079fb98>] (schedule+0x50/0xa8)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078615] [<8079fb98>] (schedule) from [<8061a79c>] (__mmc_claim_host+0xb8/0x1cc)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078632] [<8061a79c>] (__mmc_claim_host) from [<8061a8e0>] (mmc_get_card+0x30/0x34)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078650] [<8061a8e0>] (mmc_get_card) from [<806234dc>] (mmc_sd_detect+0x20/0x74)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078665] [<806234dc>] (mmc_sd_detect) from [<8061d1a8>] (mmc_rescan+0x1c8/0x394)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078681] [<8061d1a8>] (mmc_rescan) from [<801379b4>] (process_one_work+0x158/0x454)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078695] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078710] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
Dec  8 21:01:41 gax-pi-2 kernel: [  594.078726] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)

I ran this command as well:

# time badblocks -sv /dev/mmcblk0p2 -o mmcblk0p2.log

And this was the result. Which seems pretty good to me.

Checking blocks 0 to 15511039
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
0.52user 13.93system 13:24.49elapsed 1%CPU (0avgtext+0avgdata 2260maxresident)k
31022080inputs+0outputs (0major+139minor)pagefaults 0swaps

I added fsck.mode=force to cmdline.txt and rebooted, with this result:

Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: Pass 1: Checking inodes, blocks, and sizes
Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: Pass 2: Checking directory structure
Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: Pass 3: Checking directory connectivity
Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: Pass 4: Checking reference counts
Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: Pass 5: Checking group summary information
Dec  8 22:11:13 gax-pi-2 systemd-fsck[122]: /dev/mmcblk0p2: 335467/955808 files (0.3% non-contiguous), 3338340/3877760 blocks

I feel like the stack trace from /var/log/messages is telling me there are FS or SD Card problems, but it isn't showing up in the tests. Do I need to replace my card, or is there something else I should be trying?

Edit:

Before running the Zip, $ free -m says:

              total        used        free      shared  buff/cache   available
Mem:            927         224          33          12         669         633
Swap:            99           0          99

During a hang, it says:

              total        used        free      shared  buff/cache   available
Mem:            927         348          31           5         546         509
Swap:            99           7          92

It seems to hang whether the swap used is changing or not.

  • Have you checked your memory usage? My Pi lags when it starts swapping to disk. – CoderMike Dec 09 '18 at 16:57
  • htop typically indicates ~350M/927M. I'll update the post above with results of `$ free -m`. – Dustin Lewis Dec 09 '18 at 18:47
  • If you notice applications regularly and randomly being afflicted with a D state, that is a clue there may be storage problems. If it is only an app that is zipping a 2 GB file on a machine with 1 GB of memory and probably a 10 MB/s write speed, maybe that's not so strange. – goldilocks Dec 09 '18 at 19:38

1 Answers1

1

The tests you have made are mainly made for rotating magnetic disk storages. The tests do not fit very well to solid-state storages like SD Cards. The hardware of them correct bad cells silently without any reports to the operating system. I have never seen bad blocks on a solid-state drive (SSD) or a SD Card. The critical path accessing data is writing to the SD Card. If writing data is problematic but still completed after 1-2 seconds then also fsck will not find errors.

There is also another issue with buffering data on the operating system and writing it back to the storage later. Details for this you can see at Detect an SD card that became read-only. There you find also programs for checking SD Cards.

As long as you can read your SD Card I suggest to clone it to a new one and look if it works without delays on access.

Ingo
  • 40,606
  • 15
  • 76
  • 189
  • 1
    Even on spinning disks, passing a `badblocks` test with flying colours doesn't mean the device is working properly -- it just means it doesn't seem to have any bad blocks. In other words, you can prove something is broken that way, but you can't prove it isn't. – goldilocks Dec 09 '18 at 19:36
  • Makes sense. I went ahead and replaced the SD card and the misbehavior went away, performing the same tasks with the same disk image. – Dustin Lewis Dec 09 '18 at 21:45