
I have:

  • Raspi 3b+
  • 1st internet connection on eth0 through built in adapter
  • 2nd internet connection on eth1 through a USB-dongle

I followed the official manual and the AP is running just fine.

What I'm trying to do is route the traffic through eth1 when no internet connection is available on eth0. It works, but with very high latency and packet drops.

Case 1:

  • eth0 has internet

  • eth1 has internet

Result: everything works smoothly.

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.1.254   0.0.0.0         UG    202    0        0 eth0
default         192.168.8.1     0.0.0.0         UG    205    0        0 eth1
192.168.1.0     0.0.0.0         255.255.255.0   U     202    0        0 eth0
192.168.8.0     0.0.0.0         255.255.255.0   U     205    0        0 eth1
192.168.253.0   0.0.0.0         255.255.255.0   U     303    0        0 wlan0

Case 2:

  • eth0 has no internet anymore

  • eth1 has internet

Result: high latency, packet drops.

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.8.1     0.0.0.0         UG    205    0        0 eth1
192.168.1.0     0.0.0.0         255.255.255.0   U     202    0        0 eth0
192.168.8.0     0.0.0.0         255.255.255.0   U     205    0        0 eth1
192.168.253.0   0.0.0.0         255.255.255.0   U     303    0        0 wlan0

# cat /etc/dnsmasq.conf
resolv-file=/etc/resolv.dnsmasq.conf
interface=wlan0
    server=8.8.8.8
    server=8.8.4.4
dhcp-range=192.168.253.2,192.168.253.254,255.255.255.0,12h
dhcp-authoritative

# iptables-save
# Generated by iptables-save v1.6.0 on Sun May  5 18:44:06 2019
*nat
:PREROUTING ACCEPT [2637:573309]
:INPUT ACCEPT [605:71308]
:OUTPUT ACCEPT [658:46686]
:POSTROUTING ACCEPT [10:1489]
-A POSTROUTING -o eth1 -j MASQUERADE
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
# Completed on Sun May  5 18:44:06 2019
# Generated by iptables-save v1.6.0 on Sun May  5 18:44:06 2019
*filter
:INPUT ACCEPT [1667:192581]
:FORWARD ACCEPT [24823:15540031]
:OUTPUT ACCEPT [1590:161791]
-A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth0 -o eth1 -j ACCEPT
COMMIT
# Completed on Sun May  5 18:44:06 2019

# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.103  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 zzzzzzzzzzzzzzz  prefixlen 64  scopeid 0x20<link>
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 133500  bytes 132927619 (126.7 MiB)
        RX errors 0  dropped 12  overruns 0  frame 0
        TX packets 97296  bytes 63923420 (60.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.8.100  netmask 255.255.255.0  broadcast 192.168.8.255
        inet6 vvvvvvvvvvvvvvvv  prefixlen 64  scopeid 0x20<link>
        ether yy:yy:yy:yy:yy:yy  txqueuelen 1000  (Ethernet)
        RX packets 23672  bytes 11549930 (11.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19765  bytes 10665918 (10.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 253  bytes 30503 (29.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 253  bytes 30503 (29.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.253.1  netmask 255.255.255.0  broadcast 192.168.253.255
        inet6 mmmmmmmmmmmmmmmmmmmm  prefixlen 64  scopeid 0x20<link>
        ether aa:aa:aa:aa:aa:aa  txqueuelen 1000  (Ethernet)
        RX packets 155008  bytes 100683215 (96.0 MiB)
        RX errors 0  dropped 8  overruns 0  frame 0
        TX packets 193745  bytes 187522062 (178.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

When the USB dongle is plugged into a PC, the Internet connection is smooth, so the dongle itself is not the problem.

Could anyone please help me figure out what's going on and how to fix it? Thanks in advance.


[UPDATE]

Update with information from the comments on the answer: Meanwhile I have another problem after configuring bonding for the interfaces. eth1 is the USB modem, which becomes an Ethernet interface via usb_modeswitch. When this interface is not bonded, it comes up with an IP address (the normal situation). When it is bonded, eth1 is down while eth0 is up. I believe the problem comes from the USB dongle router.

This issue definitely appears right after I disable DHCP on eth1 with: echo "denyinterfaces eth0 eth1" >> /etc/dhcpcd.conf

Maxim Ilin

2 Answers

1

What you want to achieve is a typical failover scenario. You cannot simply bring up two connections and hope the second one takes over cleanly when the first one fails. Having two connections, each with its own IP address, is no problem in itself, but the kernel only ever uses one default route to the internet at a time: the one with the lowest metric. In Case 1 that is eth0 with metric 202 (lower than 205 for eth1), via gateway 192.168.1.254 and with source IP address 192.168.1.103.

If eth0 fails, the kernel has no problem switching dynamically to eth1, the next available default route. It then uses gateway 192.168.8.1 and the new source IP address 192.168.8.100.

And that is the problem. Any stateful TCP connection established with source IP 192.168.1.103 to any destination will break. These are mostly ssh sessions, other authenticated login sessions, and perhaps database connections; anything that relies on a stateful connection.
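
The metric-based choice described above can be reproduced on the `route` output from Case 1. This is only a minimal sketch of the selection rule, not the kernel's actual lookup; the function name `lowest_metric_default` and the embedded sample table are assumptions made for illustration.

```shell
#!/bin/bash
# Sample `route` output from Case 1 (both uplinks available).
SAMPLE_ROUTES='default         192.168.1.254   0.0.0.0         UG    202    0        0 eth0
default         192.168.8.1     0.0.0.0         UG    205    0        0 eth1
192.168.1.0     0.0.0.0         255.255.255.0   U     202    0        0 eth0'

# Print "<gateway> <interface>" of the default route with the lowest metric.
# Columns: Destination Gateway Genmask Flags Metric Ref Use Iface
lowest_metric_default() {
    printf '%s\n' "$1" | awk '
        $1 == "default" && (best == "" || $5 + 0 < best) {
            best = $5 + 0; gw = $2; ifc = $8
        }
        END { if (gw != "") print gw, ifc }'
}

lowest_metric_default "$SAMPLE_ROUTES"
```

With the eth0 line removed from the sample, the same function returns the eth1 route, which mirrors the switch the kernel makes in Case 2.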

This problem is solved by bonding. It defines an intermediate interface bond0 whose IP address does not change; only the underlying slave interfaces eth0 and eth1 switch the physical connection. For how it works in principle, have a look at Howto migrate from networking to systemd-networkd with dynamic failover. You may be able to implement it with classic networking, or you can switch to systemd-networkd and also configure the access point with it, following Setting up a Raspberry Pi as an access point - the easy way.
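
For orientation, an active-backup bond under systemd-networkd is usually described by a handful of small files. The sketch below only writes such files to a scratch directory so they can be inspected; the file names, the `OUT_DIR` variable and the choice of eth0 as primary are assumptions for illustration and are not taken from the linked how-to. Whether a USB modem that hands out its own subnet can take part in such a bond is not settled by this sketch.

```shell
#!/bin/bash
# Write example systemd-networkd files for an active-backup bond into OUT_DIR.
# Nothing is installed; review the files before copying them anywhere.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# The bond device: mode 1 (active-backup), link state checked every second.
cat > "$OUT_DIR/10-bond0.netdev" << EOF
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=active-backup
PrimaryReselectPolicy=always
MIIMonitorSec=1s
EOF

# eth0 joins the bond as the preferred (primary) slave.
cat > "$OUT_DIR/20-eth0.network" << EOF
[Match]
Name=eth0

[Network]
Bond=bond0
PrimarySlave=yes
EOF

# eth1 joins the bond as the backup slave.
cat > "$OUT_DIR/30-eth1.network" << EOF
[Match]
Name=eth1

[Network]
Bond=bond0
EOF

# bond0 carries the IP configuration instead of eth0/eth1.
cat > "$OUT_DIR/40-bond0.network" << EOF
[Match]
Name=bond0

[Network]
DHCP=yes
EOF

echo "Example files written to $OUT_DIR"
```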

Ingo
  • That's a very good answer! [hat is off](https://media.giphy.com/media/l396VO7eqMcpD08ve/giphy.gif). :) – Seamus May 06 '19 at 00:16
  • Thank you very much for pointing me in the right direction! Finally I know what this technique is called: bonding ))) Meanwhile I have another problem after configuring bonding for the interfaces. eth1 is the USB modem, which becomes an Ethernet interface via usb_modeswitch. When this interface is not bonded, it comes up with an IP address (the normal situation). When it is bonded, eth1 is down while eth0 is up. I believe the problem comes from the USB dongle router. What do you think, where should I dig to solve this issue? – Maxim Ilin May 06 '19 at 16:20
  • Definitely, this issue happens right after I disable dhcp on eth1. echo "denyinterfaces eth0 eth1" >> /etc/dhcpcd.conf – Maxim Ilin May 06 '19 at 16:30
  • @MaximIlin Sorry, but I can't help with classic networking or with **dhcpcd**. I haven't used it for years. In general it should be possible to bond any interface, including the one from your modem dongle. I don't know how it gets its IP address. The **bond0** interface should get the IP address instead. – Ingo May 06 '19 at 17:09
0

Finally I got this 'failover' working!

As @Milliways said, the main problem came from having multiple default gateways.

Following @Ingo's suggestion I tried to implement bonding; however, as I was told here, this technique was not possible in my case.

So I just wrote several bash scripts to make it work.


Task

  • Two Internet providers: one via the built-in Ethernet adapter (eth0), the other via a USB modem dongle (eth1)

  • eth0 is the first-priority channel, while USB modem is an emergency channel

  • when eth0 is available, we pass the whole traffic through it

  • when eth0 is unavailable, we switch to the USB modem and carry on

  • as soon as eth0 is back online, we switch back.
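
The priority rule in the list above boils down to "use the first interface in the list that is reachable". A minimal sketch of that rule follows; `pick_interface`, `link_is_up` and the `UP_LINKS` variable are hypothetical names, and the probe is a stub that would be replaced by a real check such as a ping to the gateway.

```shell
#!/bin/bash
# Interfaces in priority order: the first entry is preferred.
PRIORITY="eth0 eth1"

# Hypothetical probe: succeeds when the interface is listed in UP_LINKS.
# A real setup would test reachability here instead.
link_is_up() {
    case " $UP_LINKS " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the first interface from PRIORITY whose probe succeeds.
pick_interface() {
    local ifc
    for ifc in $PRIORITY; do
        if link_is_up "$ifc"; then
            echo "$ifc"
            return 0
        fi
    done
    return 1
}

UP_LINKS="eth0 eth1"; pick_interface   # both links up: prints eth0
UP_LINKS="eth1";      pick_interface   # eth0 down: prints eth1
```

Because the list is re-evaluated from the top on every call, switching back to eth0 once it recovers needs no extra logic.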


# definitions
INTERFACE_FROM="wlan0"
ACCESSPOINT_IP="192.168.253"
INTERFACE_TO_0="eth0"
INTERFACE_TO_1="eth1"

mkdir -p /root/failover
cp "DIR_WITH_SCRIPTS (see below)" /root/failover
chmod 0755 -R --quiet /root/failover
sed -i "s/##INTERFACES##/$INTERFACE_TO_0 $INTERFACE_TO_1/g" /root/failover/modify_routes
sed -i "s/##INTERFACE_FROM##/$INTERFACE_FROM/g" /root/failover/modify_routes
sed -i "s/##INTERFACES##/$INTERFACE_TO_0 $INTERFACE_TO_1/g" /root/failover/checker
sed -i "s/##INTERFACE_FROM##/$INTERFACE_FROM/g" /root/failover/checker

cat > /etc/network/interfaces.d/10_$INTERFACE_TO_0 << EOF
auto $INTERFACE_TO_0
allow-hotplug $INTERFACE_TO_0
iface $INTERFACE_TO_0 inet dhcp
    dns-nameservers 8.8.8.8 8.8.4.4
    # With logging to custom log file
    #   post-up /root/failover/save_interface_data $INTERFACE_TO_0 >> /root/failover/log 2>&1
    # Logging by means of Linux
    post-up /root/failover/save_interface_data $INTERFACE_TO_0
metric 20
EOF

cat > /etc/network/interfaces.d/20_$INTERFACE_TO_1 << EOF
auto $INTERFACE_TO_1
allow-hotplug $INTERFACE_TO_1
iface $INTERFACE_TO_1 inet dhcp
    dns-nameservers 8.8.8.8 8.8.4.4
    pre-up sleep 20
    # With logging to custom log file
    #   post-up /root/failover/save_interface_data $INTERFACE_TO_1 >> /root/failover/log 2>&1
    # Logging by means of Linux
    post-up /root/failover/save_interface_data $INTERFACE_TO_1
metric 40
EOF

# With logging to custom log file
#   for i in $( seq 0 5 55 ); do (crontab -l ; echo "* * * * * sleep $i; /root/failover/checker >> /root/failover/log 2>&1") | crontab -; done
# Logging to /dev/null
for i in $( seq 0 5 55 ); do (crontab -l ; echo "* * * * * sleep $i; /root/failover/checker > /dev/null 2>&1") | crontab -; done
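
cron cannot schedule jobs more often than once per minute, which is why the loop above registers twelve entries that each sleep for a different offset. The sketch below prints those entries instead of installing them, so they can be reviewed first; `generate_cron_lines` is a hypothetical helper name, and the checker path is the one used in this answer.

```shell
#!/bin/bash
# Print the twelve staggered crontab entries without touching the crontab.
CHECKER="/root/failover/checker"

generate_cron_lines() {
    local offset
    for offset in $(seq 0 5 55); do
        echo "* * * * * sleep $offset; $CHECKER > /dev/null 2>&1"
    done
}

generate_cron_lines

# One possible way to install them while dropping earlier copies, so that
# re-running the setup does not pile up duplicates:
#   ( crontab -l 2>/dev/null | grep -vF "$CHECKER"; generate_cron_lines ) | crontab -
```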

'save_interface_data' file

#!/bin/bash

set -e

echo -e "Save_interface_data: Start [$(date '+%Y-%m-%d %H:%M:%S')]"


INTERFACE_NAME=$1
INTERFACES_DATA_DIR='/root/failover/interfaces_data'

mkdir -p $INTERFACES_DATA_DIR

echo "Save_interface_data: Truncate $INTERFACES_DATA_DIR/current_active_interface"
> $INTERFACES_DATA_DIR/current_active_interface

INTERFACE_IP=''
INTERFACE_GATEWAY=''
echo "Save_interface_data: Truncate $INTERFACES_DATA_DIR/$INTERFACE_NAME"
> $INTERFACES_DATA_DIR/$INTERFACE_NAME


# Wait for IP
echo "Save_interface_data: [$INTERFACE_NAME] Waiting up to 20 seconds for an IP address"
SUCCESS="false"
end=$((SECONDS+20))
while [ $SECONDS -lt $end ]; do
    printf "."
    INTERFACE_IP=$((ip address show dev $INTERFACE_NAME 2>/dev/null || echo "") | (grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' || echo "") | (grep -Eo '([0-9]*\.){3}[0-9]*' || echo "") | grep -v '127.0.0.0')
    if [[ !  -z  "$INTERFACE_IP"  ]]; then
        SUCCESS="true"
        break
    fi
    sleep 1
done
echo ""
if [ "$SUCCESS" = "true" ]; then
    echo "Save_interface_data: [$INTERFACE_NAME] Ok"
    echo "INTERFACE_IP=$INTERFACE_IP" >> $INTERFACES_DATA_DIR/$INTERFACE_NAME
else
    echo -e "Save_interface_data: [$INTERFACE_NAME] Cannot get the IP address within 20 seconds.\nAborting."
    exit 0
fi


# Wait for Gateway
echo "Save_interface_data: [$INTERFACE_NAME] Waiting up to 20 seconds for a gateway address"
SUCCESS="false"
end=$((SECONDS+20))
while [ $SECONDS -lt $end ]; do
    printf "."
    INTERFACE_GATEWAY=$(ip route | (grep -Eo "default(.*?)dev(.*?)$INTERFACE_NAME" 2>/dev/null || echo "") | (grep -Eo '([0-9]*\.){3}[0-9]*' || echo "") | head -1)
    if [[ !  -z  "$INTERFACE_GATEWAY"  ]]; then
        SUCCESS="true"
        break
    fi
    sleep 1
done
echo ""
if [ "$SUCCESS" = "true" ]; then
    echo "Save_interface_data: [$INTERFACE_NAME] Ok"
    echo "INTERFACE_GATEWAY=$INTERFACE_GATEWAY" >> $INTERFACES_DATA_DIR/$INTERFACE_NAME
else
    echo -e "Save_interface_data: [$INTERFACE_NAME] Cannot get the Gateway address within 20 seconds.\nAborting."
    exit 0
fi

echo "Save_interface_data: End"
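
The chained `grep` calls in the script above can also be expressed as field-based parsing, which some may find easier to read. The following is a sketch under assumptions: `parse_ipv4` and `parse_gateway` are hypothetical helpers, and the sample lines imitate the format of `ip -4 -o address show dev eth0` and `ip route show` using the addresses from the question.

```shell
#!/bin/bash
# Sample lines in the format of `ip -4 -o address show` and `ip route show`.
SAMPLE_ADDR='2: eth0    inet 192.168.1.103/24 brd 192.168.1.255 scope global eth0'
SAMPLE_ROUTE='default via 192.168.1.254 dev eth0 metric 202
default via 192.168.8.1 dev eth1 metric 205'

# Read `ip -o` address output on stdin and print the first IPv4 address
# without its prefix length.
parse_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Read `ip route` output on stdin and print the default gateway that
# belongs to the interface given as $1.
parse_gateway() {
    awk -v ifc="$1" '
        $1 == "default" && $2 == "via" {
            for (i = 4; i < NF; i++)
                if ($i == "dev" && $(i + 1) == ifc) { print $3; exit }
        }'
}

printf '%s\n' "$SAMPLE_ADDR"  | parse_ipv4
printf '%s\n' "$SAMPLE_ROUTE" | parse_gateway eth1
```

On a live system the sample variables would be replaced by the real command output, for example `ip -4 -o address show dev eth0 | parse_ipv4`.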

'modify_routes' file

#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

set -e

echo -e "Modify_routes: Start [$(date '+%Y-%m-%d %H:%M:%S')]"


THIS_SCRIPT=$(realpath "$0")

if [ -f "$THIS_SCRIPT-RUNNING" ]; then
    echo -e "Modify_routes: script already running.\nAborting."
    exit 0
fi



echo "Modify_routes: Mark script as running"
touch "$THIS_SCRIPT-RUNNING"


WATCH_INTERFACES=( ##INTERFACES## )
# WATCH_INTERFACES=( eth0 eth1 )
INTERFACE_FROM="##INTERFACE_FROM##"
# INTERFACE_FROM="wlan0"
INTERFACES_DATA_DIR='/root/failover/interfaces_data'
# DONT_SAVE_current_active_interface=$1


(
    # Reset current_active_interface
    if [ ! -f "$INTERFACES_DATA_DIR/current_active_interface" ]; then
        echo "Modify_routes: Touch $INTERFACES_DATA_DIR/current_active_interface"
        touch $INTERFACES_DATA_DIR/current_active_interface
    else
        echo "Modify_routes: Truncate $INTERFACES_DATA_DIR/current_active_interface"
        truncate -s 0 $INTERFACES_DATA_DIR/current_active_interface
    fi


    echo "Modify_routes: Remove all 'default' gateways"
    while [[ !  -z  $(ip route | (grep -Eo "default" 2>/dev/null || echo "")) ]]; do
        route delete default
    done


    FOUND="false"

    echo "Modify_routes: Find first available interface and set new default gateway"
    for INTERFACE_NAME in "${WATCH_INTERFACES[@]}"
    do
        echo "Modify_routes: Checking $INTERFACE_NAME"

        # Drop any existing MASQUERADE rule for this interface; ignore the error if there is none
        iptables -t nat -D POSTROUTING -o $INTERFACE_NAME -j MASQUERADE 1>/dev/null 2>&1 || true

        if [ "$FOUND" = "false" ]; then
            echo "Modify_routes: Nothing found so far, so move on!"

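            echo "Modify_routes: Source $INTERFACES_DATA_DIR/$INTERFACE_NAME"
            # Reset first so a missing data file cannot reuse the previous interface's values
            INTERFACE_IP=''
            INTERFACE_GATEWAY=''
            [ -f "$INTERFACES_DATA_DIR/$INTERFACE_NAME" ] && source "$INTERFACES_DATA_DIR/$INTERFACE_NAME"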

            # Variables coming from "$INTERFACES_DATA_DIR/$INTERFACE_NAME" file
            #   $INTERFACE_IP
            #   $INTERFACE_GATEWAY

            echo "Modify_routes: Pinging $INTERFACE_GATEWAY through $INTERFACE_NAME"
            if ping -q -c 1 -W 1 -I $INTERFACE_NAME $INTERFACE_GATEWAY >/dev/null 2>&1; then
                echo "Modify_routes: Good!"

                echo "Modify_routes: Add default gateway $INTERFACE_GATEWAY"
                route add default gw "$INTERFACE_GATEWAY"

                echo "Modify_routes: Mark as found"
                FOUND="true"
                iptables -t nat -A POSTROUTING -o $INTERFACE_NAME -j MASQUERADE

                echo "Modify_routes: Set new active interface $INTERFACE_NAME"
                echo "$INTERFACE_NAME" > $INTERFACES_DATA_DIR/current_active_interface
            fi
        else 
            echo "Modify_routes: Skipping"
        fi
    done
) || echo -e "Modify_routes: Error occurred while running $THIS_SCRIPT.\nAborting."


echo "Modify_routes: Remove running flag"
rm -f "$THIS_SCRIPT-RUNNING"

echo "Modify_routes: End"
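
One weakness of the `-RUNNING` marker file is that it stays behind if the script is killed between `touch` and `rm`, after which every later run aborts. A common alternative is an atomic `mkdir` lock that is released by a trap. The sketch below shows that pattern in isolation; `LOCK_DIR`, `acquire_lock` and `release_lock` are hypothetical names, and this is not the locking used in the script above.

```shell
#!/bin/bash
# Lock location is an assumption; any path on a local filesystem works.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/modify_routes.lock}"

# mkdir is atomic: it succeeds for exactly one caller at a time.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Remove the lock; harmless if it is already gone.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null || true
}

if acquire_lock; then
    # Release the lock on normal exit and when interrupted or terminated.
    trap release_lock EXIT
    trap 'exit 1' INT TERM
    echo "Lock acquired, routes can be modified now"
    # ... the route changes would go here ...
else
    echo "Another instance is running. Aborting."
fi
```

A `kill -9` still bypasses the trap, so a stale lock remains possible; checking the age of the lock directory would be one way to cover that case.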

'checker' file

cron runs this script every 5 seconds, using the twelve staggered crontab entries created during setup.

#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

set -e

echo -e "Checker: Start [$(date '+%Y-%m-%d %H:%M:%S')]"


WATCH_INTERFACES=( ##INTERFACES## )
# WATCH_INTERFACES=( eth0 eth1 )
INTERFACE_FROM="##INTERFACE_FROM##"
# INTERFACE_FROM="wlan0"
INTERFACES_DATA_DIR='/root/failover/interfaces_data'
ACTIVE_INTERFACE=$(cat $INTERFACES_DATA_DIR/current_active_interface || echo "")


# Remove all defaults except one
remove_all_default_gateways_except_one() {
    local GW=$1

    echo "Checker: Remove all defaults except $GW"

    for tmp_GW in $(ip route | (grep -Eo "^default(.*?)" || echo "") | (grep -Eo '([0-9]*\.){3}[0-9]*' || echo "") | grep -v '127.0.0.0' )
    do
        echo "Checker: Found gateway $tmp_GW"
        if [ "$tmp_GW" != "$GW" ]; then
            echo "Checker: This gateway ($tmp_GW) is not $GW"
            # Report success only when the route was actually deleted
            if route delete default gw "$tmp_GW" 2>/dev/null; then
                echo "Checker: Gateway ($tmp_GW) removed"
            else
                echo "Checker: Remove error (unknown gateway [$tmp_GW])"
            fi
        else
            echo "Checker: Skipping"
        fi
    done
}


# If empty $ACTIVE_INTERFACE
if [[ -z  "$ACTIVE_INTERFACE" ]]; then
    echo "Checker: Active interface is empty"
    /root/failover/modify_routes
    echo "Checker: End"
    exit 0
fi



echo "Checker: Try to get back to 1st-priority interface"
if [ "$ACTIVE_INTERFACE" != "${WATCH_INTERFACES[0]}" ]; then
    echo "Checker: Active interface is not the 1-st priority one (${WATCH_INTERFACES[0]})"

    echo "Checker: Source $INTERFACES_DATA_DIR/${WATCH_INTERFACES[0]}"
    [ -f "$INTERFACES_DATA_DIR/${WATCH_INTERFACES[0]}" ] && source "$INTERFACES_DATA_DIR/${WATCH_INTERFACES[0]}"

    echo "Checker: Pinging $INTERFACE_GATEWAY through ${WATCH_INTERFACES[0]}"
    if ping -q -c 1 -W 1 -I "${WATCH_INTERFACES[0]}" $INTERFACE_GATEWAY >/dev/null 2>&1; then
        echo "Checker: Good!"
        /root/failover/modify_routes
        echo "Checker: End"
        exit 0
    else
        echo "Checker: Bad"
    fi
else
    echo "Checker: Active interface is already the 1-st priority one (${WATCH_INTERFACES[0]})"
fi



echo "Checker: Common case for currently active interface"
echo "Checker: Source $INTERFACES_DATA_DIR/$ACTIVE_INTERFACE"
[ -f "$INTERFACES_DATA_DIR/$ACTIVE_INTERFACE" ] && source "$INTERFACES_DATA_DIR/$ACTIVE_INTERFACE"

echo "Checker: Pinging $INTERFACE_GATEWAY through $ACTIVE_INTERFACE"
if ping -q -c 1 -W 1 -I $ACTIVE_INTERFACE $INTERFACE_GATEWAY >/dev/null 2>&1; then
    echo "Checker: Good!"
    remove_all_default_gateways_except_one $INTERFACE_GATEWAY
    echo "Checker: End"
    exit 0
else 
    echo "Checker: Bad"
    /root/failover/modify_routes
    echo "Checker: End"
    exit 0
fi
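
Before letting a cleanup like `remove_all_default_gateways_except_one` loose on a live routing table, it can help to print what would be deleted. The following dry-run sketch works on sample `ip route` text; `plan_default_cleanup` is a hypothetical helper and only prints commands, it does not execute them.

```shell
#!/bin/bash
# Sample `ip route` output with two competing default routes.
SAMPLE='default via 192.168.1.254 dev eth0 metric 202
default via 192.168.8.1 dev eth1 metric 205
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.103'

# Read `ip route` output on stdin and print (not run) one delete command
# for every default route whose gateway differs from the one to keep ($1).
plan_default_cleanup() {
    awk -v keep="$1" '
        $1 == "default" && $2 == "via" && $3 != keep {
            print "ip route del default via", $3, "dev", $5
        }'
}

printf '%s\n' "$SAMPLE" | plan_default_cleanup 192.168.1.254
```

On a real system the input would come from `ip route show` instead of the sample text, and the printed lines could be reviewed before being executed as root.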

Cheers!

Maxim Ilin
  • What an effort! Something like a [Rube Goldberg machine](https://en.wikipedia.org/wiki/Rube_Goldberg_machine). You were wrongly told that you cannot use bonding for your problem. It is a classic use case for bonding **Mode 1 (active-backup)**. I have shown you an example. – Ingo May 12 '19 at 21:06
  • "It is a classical use case for bonding" - I'm of the same opinion! However, I could not get it working. Unfortunately, I don't have enough background in Linux administration. – Maxim Ilin May 13 '19 at 09:09
  • You could use my example. Then I can help you. – Ingo May 13 '19 at 09:15