7

I have a Raspberry Pi in a remote location, running from a battery charged by a solar panel. A Sleepy Pi starts it every hour, and it runs for a few minutes to snap some pictures, take some measurements and upload them.

The problem is that fairly frequently (after about 2-7 days of use) the SD card gets damaged and needs to be replaced. At first I thought the cause was data being written to the SD card when the power goes out, so I mounted all partitions read-only and made all writes go to RAM drives only, but the SD card corruption keeps happening.

The question is: how can an entirely read-only SD card keep getting corrupted?

I'm actually swapping between two cards and it happens with both, so it is probably not a card issue. The cards are of the same type but were bought at different times, so they are likely from different production batches (G.Skill 32Gb Class 10 MicroSDHC Flash Card with SD Adapter (FF-TSDG32GA-C10), http://www.amazon.com/gp/product/B007MO0YAI/ref=oh_aui_detailpage_o03_s00?ie=UTF8&psc=1)

Below is my fstab file:

proc            /proc     proc    defaults   0   0
/dev/mmcblk0p5  /boot     vfat    ro         0   0
/dev/mmcblk0p6  /         ext4    ro         0   0
/dev/mmcblk0p7  /home     ext4    ro         0   0
none            /var/run  ramfs   size=5M    0   0
none            /var/log  ramfs   size=50M   0   0
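
To double-check that nothing on the card ends up mounted read-write at runtime, the live mount table can be inspected as well as fstab. Below is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, type, options); the function name `list_rw_sd_mounts` is made up for illustration. On Raspbian the root partition may show up as /dev/root rather than /dev/mmcblk0p6, so the sketch matches both.

```shell
#!/bin/sh
# Print "device mountpoint" for every mount that lives on the SD card
# (/dev/mmcblk* or /dev/root) and has "rw" in its option list.
# Reads /proc/mounts by default; another file in the same format can be
# passed as the first argument.
list_rw_sd_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '$1 ~ /^\/dev\/(mmcblk|root)/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "rw") { print $1, $2; break }
    }' "$mounts_file"
}
```

Calling `list_rw_sd_mounts` late in the boot sequence (or from the hourly job) and logging any output to the RAM drive would show whether something remounts a partition read-write behind your back. Empty output means every SD card partition is currently mounted ro.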

EDIT: To clarify some points pointed out by goldilocks:

  1. There are two SD cards (same type but purchased at different times, so a common production defect is unlikely).

  2. The SD cards get rewritten with dd from the same image after every corruption, and when the next corruption happens they just get swapped. As such, it is always the same 2 cards being rotated.

  3. I don't know why the Raspberry Pi doesn't boot, as this is a headless system and only the maintenance crew has occasional access to it. I have asked them to take an image (dd) of a damaged card before they reload it from the backup image, and to upload it to me. I will take a look at it when I receive it; maybe it will help me identify the point at which the boot fails.

  4. No, I'm not running fsck on the cards, they get reloaded completely from the backup image using dd.

  5. Both cards were bought new for this purpose, so they are unlikely to be worn out.

  6. While I can't say for sure that this wasn't corruption due to low voltage, the last time it happened the battery was at 98% and the sun was up (so the solar panel was supplying power too), so a low-voltage scenario is unlikely, at least on that occasion.
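
Since every partition on the card is read-only, any byte that differs between the damaged card's image and the backup image points at real corruption. Here is a rough sketch of how the two dd images might be compared, assuming both are raw images of the whole card; the function names are hypothetical, and `cmp -l` can be slow on 32 GB images with many differences.

```shell
#!/bin/sh
# Count the bytes that differ between two raw images.
count_diff_bytes() {
    { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Print the 1-based offset of the first differing byte
# (prints nothing if the images are identical).
first_diff_offset() {
    { cmp -l "$1" "$2" 2>/dev/null || true; } | awk 'NR == 1 { print $1; exit }'
}

# Convert a 1-based byte offset into a 0-based 512-byte sector number,
# which can be matched against the partition start sectors from fdisk -l.
offset_to_sector() {
    echo $(( ($1 - 1) / 512 ))
}
```

For example, `offset_to_sector "$(first_diff_offset backup.img damaged.img)"` (file names are placeholders) gives the first damaged sector, which tells you whether the damage sits in the partition table, /boot, or one of the ext4 partitions.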

Sparhawk
Zoltan Fedor
  • One possible explanation is weather: it sounds like the device may be outdoors. It is winter, maybe the cold is not good for SD cards? – Kryten Jan 13 '15 at 17:05
  • It is indoors, but there is no heating, so temps do go down below freezing. The system actually logs temperatures and I don't see any correlation between the corruptions and the temperature. It was -13C when it was running fine, and then it got corrupted at +2C. – Zoltan Fedor Jan 13 '15 at 17:09
  • 1
    Just checked, the operating temperature of the very SD card I'm using is -25C - +85C (which seems to be typical for SD cards) [see http://www.gskill.com/en/product/ff-tsdg32ga-c10], so nothing points to being a temperature-related issue. – Zoltan Fedor Jan 13 '15 at 17:29
  • Are you sure that nothing gets auto-mounted anywhere in your system? – Alessandro Lai Jan 13 '15 at 17:31
  • 1
    I've used them in a pi at -20C before without issue. People have submerged running pis in liquid nitrogen and they work down to -100C. It's not the temperature. It's the brown-out. – goldilocks Jan 13 '15 at 17:35
  • Jean, re: "Are you sure that nothing gets auto-mounted anywhere in your system?" - I don't think so. I looked for it and didn't find anything, but honestly I don't know a way to be 100% certain that there isn't some temporary auto-mounting / remounting somewhere. What is certain is that there are no new partitions, only the ones in fstab mounted with ro. – Zoltan Fedor Jan 13 '15 at 17:49
  • New development on this problem, see here: http://raspberrypi.stackexchange.com/a/31391/5538 – goldilocks May 09 '15 at 18:06
  • @goldilocks Nitpick: liquid nitrogen has the boiling point of −195°C, so it's pretty much impossible to cool something to −100°C by submerging it in LN, and −195°C is too cold for general purpose electronics. Also, getting something to work for 5 minutes doesn't mean it's stable. – Dmitry Grigoryev Nov 14 '16 at 11:21
  • @DmitryGrigoryev http://www.geek.com/chips/raspberry-pi-proven-to-be-stable-when-submerged-in-liquid-nitrogen-1555235/ They do say the SD card was a concern when doing this, but it looks like it worked to me. – goldilocks Nov 14 '16 at 11:22
  • @goldilocks The RPi was inside a dried container / plastic bag, with no direct contact with LN. Not quite the same as submerging. More similar to https://en.wikipedia.org/wiki/Computer_cooling#/media/File:2007TaipeiITMonth_IntelOCLiveTest_Overclocking-6.jpg – Dmitry Grigoryev Nov 14 '16 at 11:24
  • Yes that is a nitpick ;) It does confirm what I claimed though, that at least for a few moments people have had them working at -100 °C. The general point here was that it should be fine in a -13 °C ambient environment. – goldilocks Nov 14 '16 at 11:25

3 Answers

5

The contacts on the SD connector will bend an SD card, causing it to fail. This is especially true if you often exchange cards. Some cards are less stiff than others and bend more easily. We are doing development that requires swapping cards often, and this caused us a lot of trouble. The amount of bending is barely visible but can cause some pins to lose contact. We assumed that our application was causing corruption, which turned out not to be true. The B+ boards use microSD and do not have this problem.

The SD cards tend to straighten after they are removed and allowed to sit in a warm room. You can test the card by pressing it down during bootup. If it boots when you are pressing it down, the card is bending.

The only reliable workaround that we have found is to use a low-profile microSD card adapter like this: http://www.adafruit.com/product/966 We suspect that many cases of 'corruption' are actually due to this problem.

Patrick Cook
alan baker
1

You could try adding this to the end of /etc/rc.local:

/bin/echo "-y" > /forcefsck

Which will run fsck -y (see man fsck) on the root filesystem early in the boot process. It will add 10-15 seconds to the boot time. You won't be able to do this this way on a read-only filesystem, obviously. You could try just permanently putting the file there, but I suspect this won't work because it happens with the fs unmounted, and is then removed (which is why it must be rewritten again later during boot via rc.local).

Of course, that's no help if the data on the card is so corrupt it cannot boot at all. I wonder what could be wrong?

  1. Both SD cards are defunct

    This would be a crazy coincidence, but not completely impossible, of course. Presuming you bought them new for this purpose, based on what you are saying about the purpose, they can't be worn out at this point, regardless of whether you used them ro or rw. Unless we consider the elephant in the room, possibility #2...

  2. Corruption due to low voltage

    Setting the card RO will prevent the chance of minor corruption due to sudden power loss -- this happens because the filesystem is left in an inconsistent state by the OS. You can also prevent it by running sync intermittently, or by using the sync mount option. In any case, this kind of corruption is:

    1. Unlikely to happen at all in the first place, unless the system is extremely busy constantly -- think, enterprise internet server. That's not the case here.

    2. Incredibly unlikely to result in a problem that leaves the system unbootable; I've never actually seen nor heard of such a case (although there are plenty of people who seem to think this happens to them, a meme which is very pernicious online WRT the pi and SD cards). Beyond that, using the /forcefsck mentioned earlier will deal with this possibility.

    Whatever's gone wrong here is not caused simply by the power suddenly dying. What it might well be caused by, though, is the slow drop in voltage that occurs when the power runs out. This presumably could cause problems on a hardware level, so setting the card RO won't make any difference.

    However, I could not find anything conclusive online regarding this possibility; some people claim that SD cards are not prone to this issue because they are built for battery powered devices.

I think you need to implement something that shuts the pi down when the voltage starts to drop. The new pi + versions have a brown-out detector that may help; while it won't cleanly shut down the OS, it will cut the power quickly rather than letting it slowly fade. As already described, sudden power loss is very unlikely to cause any significant damage that can't be corrected with fsck. Note, however, that relying on this is probably not good practice in the long run, since you may still occasionally lose some data (fsck does the best it can and will leave the filesystem consistent and usable) [1]. For a clean shutdown, you need to attach a voltmeter with a chip that can message the OS via GPIO; there are various kinds of things like this for the pi available online.


[1] "Consistent and usable" here means it can be mounted without error. Since this is the root filesystem, however, there's always the possibility that "data loss" includes something crucial. Again, though, events like this will be few and far between (guesstimate < 0.1% probability).
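
As a rough illustration of the GPIO idea, here is a minimal sketch of a watcher that runs a command once a "voltage low" line has been asserted for several consecutive readings. Everything hardware-specific is an assumption: it supposes an external monitor that pulls a GPIO line low when the supply sags, and that the line is exposed as a value file through the kernel's sysfs GPIO interface. The function names are made up.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given GPIO value file currently reads "0".
# Active-low polarity is an assumption about the external monitor.
low_voltage_asserted() {
    [ "$(cat "$1" 2>/dev/null)" = "0" ]
}

# Usage: watch_voltage VALUE_FILE NEEDED INTERVAL COMMAND [ARGS...]
# Polls VALUE_FILE every INTERVAL seconds and runs COMMAND once NEEDED
# consecutive low readings have been seen; a single glitch resets the count.
watch_voltage() {
    value_file=$1
    needed=$2
    interval=$3
    shift 3
    count=0
    while [ "$count" -lt "$needed" ]; do
        if low_voltage_asserted "$value_file"; then
            count=$((count + 1))
        else
            count=0
        fi
        if [ "$count" -lt "$needed" ]; then
            sleep "$interval"
        fi
    done
    "$@"
}
```

With the line exported as, say, GPIO 17 (a placeholder pin number), something like `watch_voltage /sys/class/gpio/gpio17/value 3 1 shutdown -h now` run as root in the background would halt the pi after three low readings one second apart. Requiring several consecutive readings is a design choice to avoid shutting down on a momentary dip.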

goldilocks
  • Thanks for the detailed answer - let me start by clarifying the parts you pointed out not being clear. Yes, I have two cards and I have a backup image of the whole content (dd of the whole card), so when one dies then the maintenance crew just replaces it with the other card which has the original image copied onto it. Unfortunately it is a headless system, so the maintenance crew doesn't know what is wrong - what is certain that it doesn't boot. They took an image of the card when last corrupted and uploading it now, so I will take a look - maybe will provide more info re where it fails. – Zoltan Fedor Jan 13 '15 at 17:46
  • If you find out something new and edit these details into the question, leave a comment here and I'll reopen this. – goldilocks Jan 13 '15 at 17:58
  • Thanks, I have edited these details into the question, could you please reopen it? I'm still waiting to receive the image of the damaged card, after that I might have further details. – Zoltan Fedor Jan 13 '15 at 18:08
  • Could not find much support for my "low voltage" thesis online. But corruption just from yanking the power is definitely too unusual to cause this kind of problem repeatedly, especially if the filesystems are read-only. – goldilocks Jan 13 '15 at 19:05
  • Regarding your suggestion of adding /bin/echo "-y" > /forcefsck to /etc/rc.local, this could only work if the file system is not read-only. Obviously I can remount read-write in rc.local, do the forcefsck, then remount read-only again; I'm just worried that remounting read-write even for fsck would increase the probability of filesystem damage. Or did I misunderstand something? Did you mean something else with forcefsck? – Zoltan Fedor Jan 14 '15 at 02:12
  • Yeah -- I added a bit about that conundrum when I edited earlier. I'm not sure how this would work out on a RO system. Obviously you can leave a `/forcefsck` file there; the point of the `echo` at boot is to replace it, since post-fsck it will be removed (that's the mechanism). I'm guessing that will happen regardless of whether you intended the partition mounted RO or RW (I've never looked into the mechanism). But based on your description, it's hard to imagine what increasing "the probability of filesystem damage" would amount to... – goldilocks Jan 14 '15 at 02:37
  • ...Aren't you essentially at 100% probability already? I have a pi that's been on pretty much 24/7 for a few years (same SD card) without corruption, and I don't think that is unusual. I try to avoid cutting the power on them, but it does happen from time to time -- same thing, no problem. So if this is an issue for you on a weekly basis, there's some unusual factor at play. If you started down the read-only path *because* of this problem and yet *it hasn't made any difference,* perhaps it isn't a factor at all (i.e., you might as well just use read-write)... – goldilocks Jan 14 '15 at 02:38
  • ...This is not to say forcing fsck will help either (again, it does have to actually boot the kernel for that to mean anything). – goldilocks Jan 14 '15 at 02:38
  • I'm just curious: how did you diagnose for sure your Pi didn't boot since it's a headless system? –  Aug 08 '16 at 21:23
1

Mounting a filesystem read-only only prevents writes as long as the system is stable. You're telling the kernel not to write to a particular device, but in the case of a kernel crash or a brown-out (a sag in the supply voltage) anything can happen: the code to write to the SD card is still there, and if it gets executed the contents of your card will likely be damaged.

If you want to make sure your SD card is read-only, you should write-protect it, e.g. using sdtool:

sudo sdtool /dev/mmcblk0 lock

Of course, you still need to keep the read-only settings in /etc/fstab, otherwise Linux will keep trying to write to the SD card, fail to do so and report all sorts of filesystem errors. Current Linux drivers seem to only understand the mechanical lock switch present on full-size SD cards, and fail to understand the locked status when no switch is present.

sdtool for the Raspberry Pi can be downloaded here.

Dmitry Grigoryev