2

I currently use a Raspberry Pi that I connect to over SSH, since its physical location is more than 100 km away. After running for 1.5 years, it now shows I/O errors when writing to the database. In the syslog, I found the following entries:

Jan 28 12:24:33 raspberrypi kernel: [ 1573.144567] print_req_error: I/O error, dev mmcblk0, sector 12371948
Jan 28 12:24:41 raspberrypi kernel: [ 1581.336573] print_req_error: I/O error, dev mmcblk0, sector 12371949
Jan 28 12:24:49 raspberrypi kernel: [ 1589.528572] print_req_error: I/O error, dev mmcblk0, sector 12371950

These entries are also listed in dmesg.

The command df -l gives:

Filesystem     1GB-blocks  Used Available Use% Mounted on
/dev/root            32GB   6GB      25GB  18% /
devtmpfs              2GB   0GB       2GB   0% /dev
tmpfs                 3GB   0GB       3GB   0% /dev/shm
tmpfs                 3GB   1GB       3GB   1% /run
tmpfs                 1GB   1GB       1GB   1% /run/lock
tmpfs                 3GB   0GB       3GB   0% /sys/fs/cgroup
/dev/mmcblk0p1        1GB   1GB       1GB  21% /boot
tmpfs                 1GB   0GB       1GB   0% /run/user/1001

So I guess that mmcblk0p1 is my SD card. However, when listing /dev, I can see the following devices: mmcblk0, mmcblk0p1, mmcblk0p2

So:

  1. Is my SD card really bricked? How can I double-check that?
  2. What exactly are mmcblk0, mmcblk0p1 and mmcblk0p2?

Raspi Info: Linux raspberrypi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

Edit: Logging fsck at startup gives:

Jan 28 18:18:32 raspberrypi systemd-fsck[126]: e2fsck 1.44.5 (15-Dec-2018)
Jan 28 18:18:32 raspberrypi systemd-fsck[126]: Pass 1: Checking inodes, blocks, and sizes
Jan 28 18:18:44 raspberrypi systemd-fsck[126]: Pass 2: Checking directory structure
Jan 28 18:18:45 raspberrypi systemd-fsck[126]: Pass 3: Checking directory connectivity
Jan 28 18:18:46 raspberrypi systemd-fsck[126]: Pass 4: Checking reference counts
Jan 28 18:18:46 raspberrypi systemd-fsck[126]: Pass 5: Checking group summary information
Jan 28 18:18:46 raspberrypi systemd-fsck[126]: rootfs: 68411/1895552 files (0.5% non-contiguous), 1383640/7725184 blocks
Jan 28 18:18:46 raspberrypi systemd[1]: Started File System Check on Root Device.
Jan 28 18:18:48 raspberrypi systemd[1]: Starting File System Check on /dev/disk/by-partuuid/738a4d67-01...
Jan 28 18:18:49 raspberrypi systemd-fsck[253]: fsck.fat 4.1 (2017-01-24)
Jan 28 18:18:49 raspberrypi systemd-fsck[253]: /dev/mmcblk0p1: 232 files, 106929/516190 clusters
Jan 28 18:18:49 raspberrypi systemd[1]: Started File System Check on /dev/disk/by-partuuid/738a4d67-01.
Jan 28 18:19:19 raspberrypi systemd[1]: systemd-fsckd.service: Succeeded.
Seamus
moosehead42

2 Answers

3

"After running 1.5 years..."

Hopefully, you have learned two things in 1.5 years:

  1. Keep a current backup - for example
  2. SD cards wear out & always fail eventually; there are many SD card Q&A here

"its physical location is >100km far away."

That will make things difficult. If you had physical access, you could un-mount (umount) the SD card, and run fsck on it. Here are a couple of things that may be worth trying. I can vouch for the first one, but have never tried the second:

  1. Run fsck on every boot & check the results

  2. A tutorial from SwitchDoc Labs

  3. NOTE: Both 1. & 2. above address checking the filesystem - not the media/SD card per se. I know of no reliable, non-destructive way to test the media; media testing seems to be a trial-and-error process. However, the fsck result informs a reasonable path forward.
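For item 1, here is a minimal sketch of pulling the fsck results out of the log after a reboot. It assumes the results are written to /var/log/syslog and tagged systemd-fsck, as in the log excerpt in the question; the function name fsck_report is made up for illustration. Forcing a check on every boot is usually done by adding fsck.mode=force to the kernel command line (/boot/cmdline.txt on this generation of Raspberry Pi OS), but treat that as an assumption and check systemd-fsck@.service(8) on your release.

```shell
#!/bin/sh
# fsck_report [LOG]
# Hypothetical helper: print only the lines written by systemd-fsck.
# LOG defaults to /var/log/syslog (an assumption about where your
# system logs boot-time fsck output).
fsck_report() {
    log="${1:-/var/log/syslog}"
    # -F treats the pattern as a fixed string, so "[" needs no escaping
    grep -F 'systemd-fsck[' "$log"
}

# Example (run after a reboot):
#   fsck_report /var/log/syslog
```

A clean run shows the five e2fsck passes followed by a summary line for rootfs. Anything about errors being fixed or a required reboot deserves a closer look.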

How to proceed?

All of that said, the print_req_error messages in /var/log/syslog may be indicators of brewing trouble (see @goldilocks' comments below). Here's one way to proceed:

  1. If your fsck looks clean, you can and should get serious about frequent, regular backups. A good backup solution for RPi is image-backup. It supports making both full image backups, and updating the same image file with incremental backups on a running system in very little time. See this post for a walk-through.

  2. It seems prudent to monitor your /var/log/syslog for recurrences of the print_req_error. One way to do this is to grep the file regularly, appending matches to an incident file, and then run uniq on that file. If you're particularly concerned, you can also search through your compressed syslog files with zgrep. Be especially alert for increasingly frequent errors.
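The monitoring idea in item 2 could be sketched roughly as follows. The function name record_io_errors, the incident file location, and the default pattern are my own choices for illustration. The pattern matches the print_req_error lines from the question; other kernel versions may word the message differently, so adjust it to whatever your log actually shows.

```shell
#!/bin/sh
# record_io_errors LOG INCIDENTS [PATTERN]
# Append matching lines from LOG (plain text or .gz) to INCIDENTS, then
# de-duplicate INCIDENTS so repeated runs over the same log do not
# inflate the count. All names here are hypothetical.
record_io_errors() {
    log="$1"
    incidents="$2"
    pattern="${3:-print_req_error}"
    case "$log" in
        *.gz) zgrep "$pattern" "$log" >> "$incidents" || true ;;
        *)    grep  "$pattern" "$log" >> "$incidents" || true ;;
    esac
    # sort -u does the job of "sort | uniq" in one step
    sort -u -o "$incidents" "$incidents"
}

# Example:
#   record_io_errors /var/log/syslog      "$HOME/io-incidents.log"
#   record_io_errors /var/log/syslog.2.gz "$HOME/io-incidents.log"
```

Run it on a schedule (cron, for instance) and watch whether the line count of the incident file keeps growing; that is the "increasingly frequent errors" signal mentioned above.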

"Other stuff":

A. IMHO, the best view of your file system is with this command:

# from my system

$ lsblk --fs

NAME        FSTYPE LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda
└─sda1      ext4   PASSPORT2TB 86645948-d127-4991-888c-a466b7722f05    1.5T    10% /mnt/Passport2TB
mmcblk0
├─mmcblk0p1 vfat               6969-16D1                               206M    19% /boot
└─mmcblk0p2 ext4               f6ea6ef9-68be-479d-b447-5f76391cc02f   22.3G    19% /

With this command, you don't have to guess (incorrectly) that mmcblk0p1 is your SD card. Your SD card is mmcblk0, and it has 2 partitions: mmcblk0p1 (vfat file system, mounted at /boot), and mmcblk0p2 (ext4 file system, mounted at /; i.e. your operating system, etc).
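Since the kernel messages name the whole device (dev mmcblk0) plus a sector number, it can help to see which partition that sector belongs to. Below is a rough sketch, assuming the sector in the message is counted in 512-byte units from the start of the device and that sysfs exposes each partition's start and size files under /sys/class/block. The function name sector_to_partition is hypothetical, and the glob only fits devices whose partitions are named with a "p" suffix (mmcblk0p1, not sda1).

```shell
#!/bin/sh
# sector_to_partition SECTOR [DISK] [SYSFS_ROOT]
# Print the partition of DISK whose sector range contains SECTOR.
# SYSFS_ROOT is overridable so the function can be tried on a fake tree.
sector_to_partition() {
    sector="$1"
    disk="${2:-mmcblk0}"
    sysfs="${3:-/sys/class/block}"
    for part in "$sysfs/$disk"p*; do
        [ -r "$part/start" ] || continue
        start=$(cat "$part/start")   # first sector of the partition
        size=$(cat "$part/size")     # length in 512-byte sectors
        if [ "$sector" -ge "$start" ] && [ "$sector" -lt $((start + size)) ]; then
            basename "$part"
            return 0
        fi
    done
    echo "sector $sector is outside every partition of $disk" >&2
    return 1
}

# Example:
#   sector_to_partition 12371948
```

On a typical Raspberry Pi OS layout the boot partition is only a few hundred MB, so a sector above 12 million most likely lands in mmcblk0p2, but check on your own card rather than take my word for it.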

B. Determination of actual failure vs Prediction of future failures:

There are (at least) two (2) closely-related Q&A here that address how to determine when an SD card needs to be replaced:

1. One of the answers states, "fsck [is] useless for failure prediction". Based on what I know about "wear-leveling" in flash media, I believe this is a true statement.

2. An answer was proposed, but the OP did not follow up to "close the loop". Personally, I do not understand how the proposed answer would identify an about-to-fail SD card in general.

I'll try to summarize what this means:

Writing to SD cards wears them out & inevitably causes failure. SD cards have an in-built controller that spreads the writes around to various storage locations so that all locations experience wearout in an even & gradual manner - this is called wear-leveling. As you can see, this sort of algorithm eventually results in all storage cells reaching end-of-life at approximately the same time. Unfortunately, while fsck is good at telling us about filesystem failures that have already occurred, there seems to be no way to accurately predict when the card will begin to fail.

Seamus
  • For sure I did regular backups and was aware that the card might fail one day. But my question was more about how to infer whether it is really the SD card that causes the trouble - is that what the error is saying? – moosehead42 Jan 28 '22 at 15:43
  • @moosehead42: Did you look at the logs as outlined in the link above? What did they tell you? AFAIK, the best discrete, definitive test for SD cards is `fsck`. Beyond that, you might check the SanDisk website for proprietary solutions. – Seamus Jan 28 '22 at 16:13
  • I did, and posted the results of fsck above. But it all looks fine... Thanks for your help. – moosehead42 Jan 28 '22 at 17:23
  • *"the best discrete, definitive test for SD cards is fsck"* -> **No**, `fsck` is a test for **filesystems**. If the card is physically damaged and that corrupts the filesystem, fsck may fix the filesystem, but it cannot fix the card. Also, if the card is physically damaged but the filesystem is not corrupted (eg., because you just ran `fsck` on it successfully), `fsck` will tell you everything is 100% fine, because "everything" just refers to a filesystem (hence the OP's check did not reveal any problems). – goldilocks Jan 28 '22 at 20:14
  • Further: The I/O error from those logs is not caused by filesystem corruption, it is a physical error trying to read/write raw data from/to the device. This may or may not lead to fs corruption or other problems. If it does, there's nothing you can do about it; SD cards are not repairable and although using something like `badblocks` on it may (or may not) mask the problem it will likely not last long. – goldilocks Jan 28 '22 at 20:14
  • @goldilocks: Wrt your first comment: The OP question concerned his SD card - not his filesystem. Just so I am clear, are you saying that errors in the media **will never** manifest themselves as errors in the filesystem? I understand that filesystem & media are different entities, but it seems to me that media failures can lead to filesystem errors. Am I wrong? – Seamus Jan 28 '22 at 21:34
  • @goldilocks: Wrt your 2nd comment: Is your conclusion that the I/O error is caused by a device/media error due to the device & sector data in the error message? I did a couple of searches trying to determine the meaning, but drew a blank. – Seamus Jan 28 '22 at 21:44
  • It is from the kernel, labelled as an I/O error, and refers to a physical address on a physical device, neither of which has anything to do with filesystems (filesystem errors including those from `fsck` use addresses but these are offsets using filesystem based units). You can search the kernel source for bits of the error text when in doubt. I didn't, but I've been looking at these kinds of things for years and will call it 99.999% sure here == I'd bet a case of Islay magic elixir on this ;) – goldilocks Jan 28 '22 at 22:26
  • @goldilocks: OK, good enough for me. I'll make another revision incorporating all of this - I'd appreciate any feedback. – Seamus Jan 28 '22 at 22:40
1

I agree with most of the other Answer and Comments. If you get messages about defective sectors the card is faulty and attempting to repair the FS won't help.

All is not lost, although the fix is not simple. If you use the SD Association formatter and perform a low level format, the SD Card may be usable, with possibly reduced capacity. Ultimately problems will recur because the underlying NAND is failing but I have had some success this way.

SD Cards use NAND memory which is usually managed in 4MB blocks, which are mapped by the SD Card firmware to the blocks used by the FS. If only a small number of 4MB blocks fail, they are removed from the pool by the formatter and the resulting card may last for some time - although replacing it with a new SD Card is a better strategy.
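As a small worked example of that block size, this sketch computes which 4 MiB-aligned region of the logical address space a reported sector falls in. It assumes 512-byte sectors and takes the 4 MB figure from the paragraph above as a default; the function name region_of_sector is hypothetical.

```shell
#!/bin/sh
# region_of_sector SECTOR [REGION_MIB]
# Print the index of the REGION_MIB-sized, aligned region of the logical
# address space that contains a 512-byte SECTOR. The 4 MiB default is an
# assumption; real cards may use a different size.
region_of_sector() {
    sector="$1"
    region_mib="${2:-4}"
    sectors_per_region=$((region_mib * 1024 * 1024 / 512))
    echo $((sector / sectors_per_region))
}

# Example with the sectors from the question:
#   region_of_sector 12371948
#   region_of_sector 12371950
```

Both example calls print 1510, so the consecutive sectors in the question sit in the same 4 MiB-aligned logical region. That is consistent with a single bad area, but the firmware remaps logical to physical blocks, so it does not prove which NAND block is actually failing.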

I have a couple of SD Cards I have "repaired" and use for non critical tasks.

Milliways