I have a strange issue with my Raspberry Pi 3: it works fine, then after a while it stops working for about 5 hours. Afterwards it recovers by itself (without rebooting!) and works fine again. This happens about every 3 days.

What can block the raspberry? How to prevent this blocking? Any Ideas?

Some more information:

I've logged the CPU temperature (with `vcgencmd measure_temp`).

[Figure: temperature graph]

What's strange in the temperature log: when the Raspberry starts working again, the temperature drops. Possible explanations:

  • Idea A: When the Raspberry starts working again, a "lot of stuff" has to be done. This leads to high CPU load and a high CPU temperature. Logging only resumes after the "lot of stuff" is done.
  • Idea B: "Something" drives the Raspberry to high CPU load (and high temperature) and blocks everything else. Once this "something" is resolved, the CPU load returns to normal and the temperature can fall back to normal. What could this "something" be? A blocked USB controller?
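One way to tell Idea A from Idea B is to log a timestamp and the load average next to each temperature reading: gaps between timestamps show when the logger itself was stalled, and the load value shows whether the heat coincides with CPU load. A minimal sketch, assuming Python 3 is available on the Pi; the log file name `temp_load.log` and the 10-second interval are placeholders, not something taken from the original setup:

```python
import re
import subprocess
import time


def parse_temp(output: str) -> float:
    """Parse the output of `vcgencmd measure_temp`, e.g. "temp=48.3'C"."""
    match = re.search(r"temp=([0-9.]+)", output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


def read_sample() -> str:
    """Return one log line: timestamp, temperature, 1-minute load average."""
    raw = subprocess.run(
        ["vcgencmd", "measure_temp"], capture_output=True, text=True, check=True
    ).stdout
    with open("/proc/loadavg") as f:
        load1 = f.read().split()[0]  # first field is the 1-minute average
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} temp={parse_temp(raw):.1f} load1={load1}"


def log_forever(path: str = "temp_load.log", interval_s: float = 10.0) -> None:
    """Append one sample every interval_s seconds (hypothetical defaults)."""
    with open(path, "a") as log:
        while True:
            log.write(read_sample() + "\n")
            log.flush()  # hand each line to the OS right away
            time.sleep(interval_s)
```

Calling `log_forever()` from a shell started after boot would produce the log; a jump of several hours between two consecutive timestamps marks the blocked period.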

Attached to the Raspberry:

  • USB LTE Modem
  • USB Serial Converter


(There is a watchdog, but I want to solve the source of the problem)

powerpete
  • Too little information. You need to add diagnostics to home in on what is happening. Could the polyfuse be tripping and then recovering? – joan Jun 14 '18 at 09:26
  • @joan: no reboot is done. Crazy – powerpete Jun 14 '18 at 09:31
  • You've labeled the temperature spikes in your graph as "high CPU load". How have you determined that? First guess is some sort of a hardware issue, but it could be anything really. And to echo @joan's comment, you'll either need more information to resolve this, or "get lucky". Here's a [related post that has some things to try.](https://raspberrypi.stackexchange.com/questions/75822/how-can-i-troubleshoot-kernel-panics) – Seamus Jun 14 '18 at 17:24
  • As a test, I generated some CPU load myself. That resulted in the parts labeled "high CPU load". – powerpete Jun 18 '18 at 06:01
  • What operating system are you using? What software do you have installed? Have you tested with a freshly flashed [Raspbian Stretch Lite 2018-04-18](https://www.raspberrypi.org/downloads/raspbian/)? Does the freezing also occur with it? – Ingo Jun 18 '18 at 17:59

2 Answers

1

There are a few things I would do if I were you:

  1. Get a decent power supply, something well regulated that delivers 5-10 A at 5 V, not an "el-cheapo" wall wart that sags to 4.5 V under high load.

  2. Disconnect the USB peripherals one by one and check whether the behaviour persists. Start with the LTE modem; these can be quite power hungry.

  3. If bad things still happen once all USB devices are disconnected and the power supply is good and stable, wrap your device in plastic and put it into ice water. Or use dry ice, or whatever, to avoid those spikes above 70 °C.

  4. Make sure your device does not actually reboot. It's hard to watch the screen for 4-5 hours straight, so open an SSH connection over the network and see if it gets closed once your RPi is frozen.

From where I sit, it looks like the high load leads to a power brown-out, after which the board eventually cools down and reboots without you noticing. Would be glad to hear where I'm wrong =)
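A silent reboot can also be checked without watching the screen or holding an SSH session open: the first field of `/proc/uptime` is the number of seconds since boot, so it only ever decreases across a reboot. A minimal sketch, assuming Python 3 on the Pi; the helper names are made up for illustration:

```python
import time


def parse_uptime(text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime."""
    return float(text.split()[0])


def read_uptime() -> float:
    """Read the current uptime in seconds from the running kernel."""
    with open("/proc/uptime") as f:
        return parse_uptime(f.read())


def boot_time(now: float, uptime_s: float) -> float:
    """Return the boot moment as a Unix timestamp."""
    return now - uptime_s


def rebooted(prev_uptime_s: float, cur_uptime_s: float) -> bool:
    """Uptime grows while the system stays up; a smaller value means a reboot."""
    return cur_uptime_s < prev_uptime_s
```

Writing `time.ctime(boot_time(time.time(), read_uptime()))` to a log every few minutes gives a boot time that should stay constant (to within a second or so) across a freeze if no reboot happened.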

lenik
  • A reboot without noticing is not my case: `uptime -s` shows no reboot. In addition, I started the temperature logging manually after boot and it is still running. – powerpete Jun 26 '18 at 06:31
  • ok, reboot is not on the table anymore, how about the rest of 1..3 things? =) – lenik Jun 26 '18 at 07:33
  • For 1): the power supply is a 20 W DC-DC. 2) Not checked yet. The problem only occurs every 1-3 days ;=( – powerpete Jun 26 '18 at 07:37
  • @powerpete well, if the problem occurs only rarely, use the opposite approach -- load up the USB bus with more devices (a portable HDD is my favourite =), increase the ambient temperature and find the conditions that make it fail. Then we can think about how to deal with that. – lenik Jun 26 '18 at 08:40
0

Perhaps it helps someone else.

In addition to the temperature, I've logged the free system memory. This helped me find my issue (I hope so ;-) :

The issue is not related to USB. I had a process with a memory leak. Over several days, the free system memory sank to about 35 MB.

  1. At 35 MB the system stayed stable for some hours
  2. afterwards it got "unstable"/"blocked" for some hours
  3. afterwards the leaking process killed itself ...
  4. ... and the system ran normally again (with a lot of free memory)
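For anyone who wants to log free memory the same way, the values can be read from `/proc/meminfo`. This is only a sketch of the idea, not the exact logging used here; it assumes Python 3, and the 50 MB warning threshold is an arbitrary placeholder:

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def read_meminfo() -> dict:
    """Read and parse the live /proc/meminfo."""
    with open("/proc/meminfo") as f:
        return parse_meminfo(f.read())


def low_memory(values: dict, threshold_kb: int = 50 * 1024) -> bool:
    """True when free memory is below threshold_kb (hypothetical default)."""
    # MemAvailable is missing on very old kernels, so fall back to MemFree.
    available = values.get("MemAvailable", values["MemFree"])
    return available < threshold_kb
```

Logging `read_meminfo()["MemFree"]` with a timestamp every minute makes a slow leak visible as a steady downward slope long before the system blocks.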

Some additional info: swap is disabled on my system. Looking at `top` shows that `kswapd` has consumed a lot of CPU time.
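The CPU time of `kswapd` (the kernel thread that reclaims memory pages) can also be logged without `top`, by reading `/proc/<pid>/stat`. A rough sketch, assuming Python 3 and the field layout documented in proc(5), where utime and stime are fields 14 and 15; the function names are made up for illustration:

```python
import os


def process_name(stat_text: str) -> str:
    """Return the command name, which /proc/<pid>/stat wraps in parentheses."""
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def cpu_ticks(stat_text: str) -> int:
    """Return utime + stime in clock ticks from /proc/<pid>/stat contents."""
    # The command name may contain spaces, so split only what follows
    # the last ')'. rest[0] is then field 3 (state), which puts
    # utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def kswapd_cpu_seconds() -> dict:
    """Return accumulated CPU seconds for every thread named kswapd*."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = process_name(text)
        if name.startswith("kswapd"):
            result[name] = cpu_ticks(text) / ticks_per_s
    return result
```

If the value returned by `kswapd_cpu_seconds()` climbs quickly while free memory is low, the system is spending its time trying to reclaim memory, which would match the blocked periods described above.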

powerpete