When I use the ethernet interface with Raspbian, the connection is interrupted for many seconds at a time, so streams break and an ssh terminal hangs for minutes. With this behavior the ethernet interface is not usable for me. I have been looking into this for weeks but haven't been able to fix it. I wonder whether other people have the same problem. Wifi works flawlessly.

My question is:
What can cause this behavior, and how can I fix the ethernet connection so that it works as reliably as wifi?

I use Raspbian Stretch Lite, version November 2017, updated with sudo apt full-upgrade, on a Raspberry Pi 3 Model B. To test this behavior I ping the raspi's ethernet interface from a PC with this command:

pc ~$ ping -i 0.3 -qc 1000 192.168.10.111
PING 192.168.10.111 (192.168.10.111) 56(84) bytes of data.

--- 192.168.10.111 ping statistics ---
1000 packets transmitted, 836 received, 16% packet loss, time 300960ms
rtt min/avg/max/mdev = 0.304/0.344/0.809/0.032 ms

To run the test continuously I put it in a loop:

pc ~$ while true; do ping -i 0.3 -qc 1000 192.168.10.89 >> ping.log; done

I interrupt this with <CTRL>Z and kill %1. I extract the results with:

pc ~$ grep -P "^1000 packets transmitted," ping.log

A typical output is something like:

1000 packets transmitted, 836 received, 16% packet loss, time 300960ms
1000 packets transmitted, 874 received, 12% packet loss, time 300685ms
1000 packets transmitted, 845 received, 15% packet loss, time 300913ms
1000 packets transmitted, 914 received, 8% packet loss, time 300381ms
1000 packets transmitted, 946 received, 5% packet loss, time 300105ms
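
To get one overall figure instead of reading the summary lines one by one, the log can be totalled with awk. This is only a minimal sketch: summarize_ping_log is a hypothetical helper name, and it assumes the log contains the summary lines in exactly the format shown above.

```shell
# summarize_ping_log FILE
# Hypothetical helper: adds up the summary lines that "ping -q" wrote
# to the log and prints one overall loss figure.
summarize_ping_log() {
    awk '/packets transmitted,/ {
             tx += $1          # field 1: packets transmitted
             rx += $4          # field 4: packets received
         }
         END {
             if (tx == 0) { print "no summary lines found"; exit 1 }
             printf "%d transmitted, %d received, %.1f%% loss\n",
                    tx, rx, 100 * (tx - rx) / tx
         }' "$1"
}
```

Called as summarize_ping_log ping.log, the five sample lines above add up to 5000 transmitted, 4415 received, 11.7% loss.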

The lost pings are not uniformly distributed over time. The replies stop for many seconds and then resume. This is a typical example with lost pings between icmp_req=224 and icmp_req=345:

64 bytes from 192.168.10.111: icmp_req=222 ttl=64 time=0.350 ms
64 bytes from 192.168.10.111: icmp_req=223 ttl=64 time=0.371 ms
64 bytes from 192.168.10.111: icmp_req=224 ttl=64 time=0.346 ms

64 bytes from 192.168.10.111: icmp_req=345 ttl=64 time=0.408 ms
64 bytes from 192.168.10.111: icmp_req=346 ttl=64 time=0.327 ms
64 bytes from 192.168.10.111: icmp_req=347 ttl=64 time=0.352 ms
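
Such gaps can be located automatically. The following is a sketch under two assumptions: the log was written by ping without the -q option, so it contains the per-packet lines, and find_ping_gaps is a hypothetical helper name. The duration is only an estimate from the number of lost pings and the ping interval.

```shell
# find_ping_gaps FILE [INTERVAL]
# Hypothetical helper: prints every gap in the ICMP sequence numbers of
# a per-packet ping log. INTERVAL is the ping interval in seconds
# (default 0.3, as used in the tests above).
find_ping_gaps() {
    awk -v interval="${2:-0.3}" '
        # older ping versions print icmp_req=, newer ones icmp_seq=
        match($0, /icmp_(req|seq)=[0-9]+/) {
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (have_prev && seq > prev + 1) {
                lost = seq - prev - 1
                printf "lost %d pings after seq %d (about %.1f s)\n",
                       lost, prev, lost * interval
            }
            prev = seq
            have_prev = 1
        }' "$1"
}
```

For the example above this reports 120 lost pings after sequence number 224, which at an interval of 0.3 seconds corresponds to roughly 36 seconds without a reply.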

Testing with a wifi connection I get 0% packet loss.

When I ping from the raspi to the PC, it also works flawlessly without packet loss.

Below I document step by step what I have tried so far to find a solution, without success up to now.

Is this a problem only for me?

I can't answer this myself and need your help. If you like, please run the simple ping test for a while from another computer against the ethernet connection of your raspi and tell me whether you see packet loss.

pc ~$ while true; do ping -i 0.3 -qc 1000 {raspi_ip_addr} >> ping.log; done
pc ~$ grep -P "^1000 packets transmitted," ping.log

Could it be that most people use wifi and therefore this failure goes unnoticed? I have searched for reports of my problem and found several questions describing similar problems, none of which has a solution:

Raspberry Pi Ethernet connection
ethernet internet connection stopped working
RaspBerry PI 2B loses ethernet connection
Why the Raspberry PI loses the ethernet connection? (also in comment @Fran Marzoa)
Ethernet connection drops after several seconds (comment from @Suncatcher May 17 '17 at 9:33)
Internet connection impossible through PC via Ethernet
Share WiFi connection through ethernet
Raspberry Pi 3 with CentOS no SSH without a keyboard

The Google query "raspberry pi stops responding to ssh" turns up many more reports of this problem.

Ensure it's not a problem with power supply

I use a dedicated power supply for raspberry pi with 5.0 V / 3.0 A output connected to the micro usb power connector on the raspi. Except the ethernet cable and a USB to TTL converter cable for the serial console there is nothing else connected to the raspi.

Ensure it's not a bad ethernet connection

I use a high-quality, double-shielded CAT 7 ethernet patch cable. To test the connection I replaced the raspi with a laptop and ran the ping test: 10,000 packets transmitted, 10,000 received.

Ensure it's not a failure of the used hardware

I used the same new class 10 sd card with the test program on two different Raspberry Pi 3 Model B boards and on a Raspberry Pi 2 Model B. It doesn't make any difference. I always get the same packet loss.

Ensure it's not a bad sd card

I flashed the following sd cards with the test program for the ping test:
sd card class 2, 4GB
sd card class 4, 4GB
sd card class 6, 16GB
sd card class 10, 8GB
sd card class 10, 16GB
None of this makes a difference. I always get the packet loss.

Reproducible setup

With so many possible parameter combinations it is not easy to get a reproducible setup, and what works for me is not guaranteed to hold in your test environment. This is what I settled on:

  • use a Raspberry Pi 3 Model B
  • use a class 6 (or greater) sd card
  • flash it with Raspbian Stretch Lite
  • update with sudo apt update && sudo apt full-upgrade. This will also update some firmware
  • use a power supply with at least output 5.0 V / 2.5 A. I use one with output 5 V / 3.0 A.

This produces a fairly stable connection after a warm-up phase following boot. A typical measurement looks like this:

1000 packets transmitted, 504 received, 49% packet loss, time 303659ms
1000 packets transmitted, 729 received, 27% packet loss, time 301849ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299707ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
....

Testing different operating systems

Here I found the first hints of differing behavior.

Linux From Scratch
Both Linux From Scratch on the Raspberry Pi and a bootstrapped raspbian stretch worked without packet loss (5000 packets transmitted, 5000 received).

Debian Buster
Next I will try Debian Buster for Raspberry Pi.

wheezy
@Fran Marzoa commented in question Why the Raspberry PI loses the ethernet connection?:

Raspberry Pi B+ here. Never had this problem before with wheezy, but yesterday I upgraded to jezzie and begin having such problems. I've added the auto eth0 as suggested by thekiwi5000 but I guess that won't prevent from connections to be broken if it's drop while doing something (for example, transferring a file via SMB).

At this time I only have a Raspberry Pi 3 for testing, but the oldoldstable raspbian wheezy does not run on an RPi3 out of the box. We have to use an updated version. Thanks to @Mike Redrobe, who made an updated raspbian wheezy for the RPi3. Because of this, the test has to be taken with caution. I downloaded and flashed it, ran sudo apt-get update && sudo apt-get dist-upgrade, rebooted and tested. The result was very stable but not 100 %. Of 106000 packets transmitted, 2419 were lost.

jessie lite
jessie is the successor of wheezy and marks the changeover to systemd. I installed the latest version, 2017-07-05-raspbian-jessie-lite (for my setup I added enable_uart=1 to config.txt and deleted quiet from cmdline.txt). After warm-up the ping test showed no packet loss. Then I updated the installation:

pi ~$ sudo -Es
root ~# apt-get clean
root ~# rm /var/lib/apt/lists/*
root ~# apt update
root ~# apt full-upgrade
root ~# exit
pi ~$ sudo systemctl reboot

Now I get typical packet loss.

jessie
I installed the latest version, 2017-07-05-raspbian-jessie, prepared like jessie lite. The ping test showed the typical packet loss, with or without full-upgrade.

ping-test results

operating system      sd card class  updated  ping-test
----------------------------------------------------------------------------------------
bootstrapped stretch  4              n.a.     100% response (5000 packets)
linux from scratch    4              n.a.     100% response (5000 packets)
wheezy for RPi3       4              yes      106000 packets transmitted, 2419 packet loss
jessie lite           4              no       warm-up
jessie lite           4              yes      typical packet loss
jessie lite           10             no       warm-up
jessie lite           10             yes      typical packet loss
jessie                4              no       typical packet loss
jessie                4              yes      typical packet loss

Problem with avahi-daemon

Next I looked at which network services are running that I don't need. These are avahi and ipv6. Disabling them gave some interesting results. To minimize side effects I switched over to a bootstrapped installation. With it I found the following to be 100 % reproducible:

no avahi-daemon (default from bootstrap), no gmediarender, ipv6 disabled
Ping-test with a result that I call stable

1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299697ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms

avahi-daemon installed, no gmediarender, ipv6 disabled
rpi ~$ sudo apt install avahi-daemon
Ping-test with a result that I call warm-up

1000 packets transmitted, 593 received, 40% packet loss, time 302923ms
1000 packets transmitted, 578 received, 42% packet loss, time 303065ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms

avahi-daemon installed, no gmediarender, ipv6 enabled or
avahi-daemon deinstalled, gmediarender running, ipv6 enabled or
avahi-daemon deinstalled, gmediarender running, ipv6 disabled
Ping-test with a result that I call unstable

1000 packets transmitted, 814 received, 18% packet loss, time 301156ms
1000 packets transmitted, 556 received, 44% packet loss, time 303229ms
1000 packets transmitted, 381 received, 61% packet loss, time 304641ms
1000 packets transmitted, 732 received, 26% packet loss, time 301819ms

avahi-daemon deinstalled, no gmediarender, ipv6 enabled
rpi ~$ sudo apt --autoremove purge avahi-daemon
Ping-test with a result that I call stable

1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299697ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms
1000 packets transmitted, 1000 received, 0% packet loss, time 299699ms

Result: avahi-daemon is a problem. It makes the ethernet connection unstable and, together with ipv6, unusable. But it does not seem to be the root cause. Thinking I had found it, I deinstalled avahi-daemon and installed gmediarender. Now the ethernet connection is unstable again, no matter whether ipv6 is enabled or disabled. gmediarender also uses mdns (Multicast DNS, used for service discovery), like avahi. It seems that mdns on the ethernet device is the problem.

What to do next

There are some major questions:

  • Does all this affect only me?
    This can't be answered easily.
  • Why does it work with wifi?
    To test this I would use a wifi repeater that is connected via ethernet to the access point (FRITZ!Box 7490), instead of connecting wifi directly to the FRITZ!Box. But for this I have to reconfigure my infrastructure, which affects other users, so I would do it last.
  • Is this specific to the raspberry pi or is it a general problem with mdns?
    I have a workstation with kvm/libvirt. To eliminate outside noise I will set up two virtual machines with an internal-only network and test there. That is what I will do next.
  • Analyse network traffic with tcpdump specific to mdns.
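
For the last point, a capture limited to mdns could look like the sketch below. mdns uses UDP port 5353, so a filter on that port isolates it. The function name capture_mdns is hypothetical and eth0 is an assumed interface name; adjust both as needed.

```shell
# capture_mdns [INTERFACE]
# Hypothetical helper: shows only mdns packets on the given interface.
# eth0 is an assumed default; check the real name with "ip link".
capture_mdns() {
    # -n prints addresses numerically instead of resolving names;
    # mdns traffic is UDP on port 5353
    sudo tcpdump -n -i "${1:-eth0}" 'udp port 5353'
}
```

Running capture_mdns on the raspi while the ping test runs on the PC would show whether the ping pauses coincide with mdns packets.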

Summary

There seems to be a hard-to-find instability in the ethernet connection that is triggered by software: the more software is installed, the more packets are lost, and the loss always appears after a full-upgrade. The sd card class does not seem to matter. The main difference between the working operating systems (Linux From Scratch on the Raspberry Pi and wheezy) and Raspbian Stretch Lite is that the working ones lack systemd. The bootstrapped raspbian stretch has systemd, but only in a minimal version. Is there a component in systemd that can cause the ethernet connection interrupts? What else could do that? However, the problems found with avahi and gmediarender suggest that systemd is not the problem. It seems to be mdns.

Ingo
  • `Is this a problem only for me?` seems so - do you have any other USB devices attached (as the ethernet port on RPi is a USB device) that may be interfering with USB. Also, check your loadavg - perhaps some process is hogging all the CPU :p – Jaromanda X Feb 20 '18 at 03:27
  • @JaromandaX I wrote: _Except the ethernet cable and a USB to TTL converter cable for the serial console there is nothing else connected to the raspi._ `uptime` gives me: `11:52:41 up 14:19, 1 user, load average: 0,00, 0,00, 0,00`. But your comment points me in another direction. I'm just looking at `/proc/interrupts` etc. and at the [kernel parameter for dwc_otg](https://raspberrypi.stackexchange.com/q/1886/79866) – Ingo Feb 20 '18 at 11:57
  • yeah, sorry - bit of a scroller this question :p – Jaromanda X Feb 20 '18 at 12:05
  • Yes, scroller :-) It's also for myself to remember what I've done. Better here than in a private file. – Ingo Feb 20 '18 at 13:53
  • just food for thought: the other end of the ethernet cable, is it connected to a 100 Mbit/s port? Or is your router/switch/similar providing 1 Gbit/s there? In case you use a Fritz!Box: LAN4 should be configurable. You might want to change the LAN port for your RPi from 1G to 100M and monitor again. Munin could be useful. – Fabian Feb 22 '18 at 13:36
  • @Fabian The other end of the cable is connected to a semiprofessional configurable 52-port switch. I've set the port for the raspi to every combination of full/half duplex and 10MBit/100MBit. It doesn't matter. At the moment it is set to autoconfigure. Journalctl on the raspi tells me it is connected with 100MBit/full duplex. I have added "What to do next" to my question. I didn't know about `munin` before. Thanks for the tip. I will have a look at it, but it is a little bit heavy for this, isn't it? – Ingo Feb 22 '18 at 23:46
  • I don't see any mention of the possibility of a duplicate IP address on your home network. As a simple test, when the problem is occurring, turn off the RPi. On another computer, ping the IP address the RPi was using. Ignore the ping results (unless the ping receives a reply). After the ping, check the ARP cache for a MAC address associated with that IP address. If you see a MAC address that is not from your RPi, it is using your RPi's IP address. MAC addresses contain vendor codes, so they can be used to help identify the device using the same IP address. – Chad Farmer Mar 02 '18 at 19:57
  • @ChadFarmer Thanks for the tip :-) I've tested that way but cannot find a duplicate IP address. – Ingo Mar 05 '18 at 15:39
  • Good work @Ingo eliminating variables. I'm seeing something somewhat similar. No avahi or gmediarender - but I see almost a complete packet transmit failure by my rpi3B+ (recompiled raspbian 4.14.31 on the pi) until I run tcpdump on the pi (ie. enable promiscuous). Ssh connections to the pi also timeout until promisc is enabled. My workaround: a shell running `# tcpdump -i eth0 port 9999` fixes everything. netstat shows only dnsmasq, ntpd, and sshd are listening. This is on a private net: two machines and only an 8 port Netgear home switch. Hope this little bit of info is useful... – duanev Apr 08 '18 at 06:58

1 Answer

  • It is a problem with gmediarender in conjunction with multicast.
  • It is not specific to the raspberry pi (so should the question be migrated?). Knowing the cause, I was able to reproduce it on my laptop.
  • It occurs in my configuration because of multicast, but it can also be a problem for others who use gmediarender together with igmp snooping.

I had not mentioned that I use multicast IPTV streaming in my home network, with a configurable smart switch that can handle it. Most ethernet ports belong to a multicast VLAN with igmp snooping enabled, so clients can simply plug into a port and watch TV, e.g. with VLC.

I focused on gmediarender because it produced clear instability. Stripping the problem down step by step, I ended up watching tcpdump during the ping test with the raspi connected directly to the switch. I saw that the ICMP echo requests pause when gmediarender sends an IGMP v3 report (I don't know why this is sent). Here is a typical trace (192.168.10.121 is the address of the raspi):

rpi3 ~$ sudo tcpdump -n igmp or icmp
[...]
23:59:26.416336 IP 192.168.10.4 > 192.168.10.121: ICMP echo request, id 7011, seq 79, length 64
23:59:26.416357 IP 192.168.10.121 > 192.168.10.4: ICMP echo reply, id 7011, seq 79, length 64
23:59:26.674095 IP 192.168.10.121 > 224.0.0.22: igmp v3 report, 1 group record(s)

23:59:44.258292 IP 192.168.10.4 > 192.168.10.121: ICMP echo request, id 7011, seq 137, length 64
23:59:44.258357 IP 192.168.10.121 > 192.168.10.4: ICMP echo reply, id 7011, seq 137, length 64
23:59:44.558261 IP 192.168.10.4 > 192.168.10.121: ICMP echo request, id 7011, seq 138, length 64
23:59:44.558284 IP 192.168.10.121 > 192.168.10.4: ICMP echo reply, id 7011, seq 138, length 64
[...]

As you can see there is a pause of 18 seconds in the ethernet connection.
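
Such pauses can also be picked out of a longer capture automatically. This is a minimal sketch, assuming the tcpdump text output was saved to a file with the default HH:MM:SS.microseconds timestamps; report_capture_pauses is a hypothetical helper name.

```shell
# report_capture_pauses FILE [LIMIT]
# Hypothetical helper: prints every silence longer than LIMIT seconds
# (default 5) between two consecutive packets of a saved tcpdump trace.
report_capture_pauses() {
    awk -v limit="${2:-5}" '
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                delta = now - prev
                if (delta < 0) delta += 86400   # capture ran past midnight
                if (delta > limit)
                    printf "pause of %.1f s before %s\n", delta, $1
            }
            prev = now
            seen = 1
        }' "$1"
}
```

For the trace above it reports a pause of 17.6 s before 23:59:44.258292, which is the roughly 18 seconds mentioned.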

To avoid these connection interruptions I have to eliminate the effect of sending the igmp v3 report to the multicast group. I can do one of the following:

  • on the switch, remove the port the raspi is connected to from the multicast VLAN
  • on the switch, disable IGMP snooping on the port the raspi is connected to
  • on the raspi, block sending the igmp v3 report with iptables:
    rpi3 ~$ sudo iptables -I OUTPUT -d 224.0.0.22 -j DROP
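
A rule added this way is lost on reboot, and running the command twice inserts it twice. The sketch below wraps the same rule so that it is added only once and can be removed again. The function names are hypothetical; the rule itself is the one shown above.

```shell
# Hypothetical helpers around the rule from the answer.
block_igmp_reports() {
    # -C only checks whether the rule already exists; insert it if not
    sudo iptables -C OUTPUT -d 224.0.0.22 -j DROP 2>/dev/null ||
        sudo iptables -I OUTPUT -d 224.0.0.22 -j DROP
}

unblock_igmp_reports() {
    # -D deletes one matching rule from the OUTPUT chain
    sudo iptables -D OUTPUT -d 224.0.0.22 -j DROP
}
```

To survive a reboot, block_igmp_reports would have to be called from some boot-time script or the rule saved with the distribution's own mechanism, which is not covered here.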

I don't know whether this is a problem with gmediarender, something protocol-related, or an issue with my switch.

Ingo
  • As you found out, *mdns* is most likely the culprit. It uses multicast packets, and these need proper hardware support (or promiscuous mode). I think it's one of the things not well tested with the Pi and/or the distribution kernel you are using. – Janka Feb 27 '18 at 13:04
  • Is electronic contact spray the solution for you too, like for Philippe Gachoud? Btw: I see outstanding packet drops for all our RPis on Munin if connected to a 1GBit/s router/switch, not if the router/switch provides exactly 100MBit/s. – Fabian Feb 27 '18 at 14:36
  • @Fabian contact spray doesn't help, also not Philippe Gachoud. Read its comment at his answer. No, nothing helps, no spray, no new sd card, no better power supply, no better cable, or what else was suggested. We all should accept that there is a real problem with the raspberry pi and its ethernet port since years. To show this is my goal. – Ingo Feb 27 '18 at 14:50
  • @Janka Aah yes, _promiscuous mode_. Thanks for the tip! With this the ethernet port will get all packets on the line, not only those addressed to it. We have a heavy incoming load. And it explains different test results with the same test setup at different times. At TV prime time, at 20 o'clock, at least two IPTV multicast streams in HD quality and some radio streams are running in my home network. Then we have a really heavy load on the ethernet. At 3 o'clock in the night I will get good tests - all theoretically. I will test against promiscuous mode. – Ingo Feb 27 '18 at 15:17
  • I think it's simply the Raspi breaks down under the heavy interrupt load then. The LAN9514 supports 64 multicast hashes and Linux use these, but maybe there's something fishy with it. – Janka Feb 27 '18 at 16:04
  • You could also test with a dumb switch instead, maybe your intelligent switch uses multicasting for internal purposes as well. – Janka Feb 27 '18 at 16:07
  • @Janka Good idea, but not in place of my central switch. My whole home media setup would break down. I will put it into the connection to the raspi. Btw.: I test with `gmediarender`. That is what I want to use. I've checked, it doesn't enable _promiscuous mode_. – Ingo Feb 27 '18 at 21:08