TCP hole punching

In computer networking, TCP NAT traversal and TCP hole punching (sometimes called NAT punch-through) refer to the situation in which two hosts, each behind a network address translation (NAT) device, try to connect to each other with outbound TCP connections. This scenario is particularly important for peer-to-peer communications, such as Voice-over-IP (VoIP), file sharing, teleconferencing, chat systems and similar applications.

TCP hole punching is an experimentally used NAT traversal technique for establishing a TCP connection between two peers on the Internet that are both behind NAT devices. NAT traversal is a general term for techniques that establish and maintain TCP/IP network connections traversing NAT gateways.

Terminology

In the following, the terms host, client and peer are used almost interchangeably.

local endpoint, internal endpoint: the local IP:port as seen locally by the host and the internal part of the NAT.
public endpoint, external endpoint: the external IP:port mapped by the NAT, as seen by the network and the external part of the NAT.
remote endpoint: the IP:port of the other peer as seen by the network, or the external parts of both NATs.

Description

NAT traversal through TCP hole punching establishes bidirectional TCP connections between Internet hosts in private networks using NAT. It does not work with all types of NATs, as their behavior is not standardized. When two hosts connect to each other over TCP, both via outbound connections, they are in the "simultaneous TCP open" case of the TCP state machine diagram.[1]

Network Drawing

Peer A ←→ Gateway A (NAT-a) ← .. Network .. → Gateway B (NAT-b) ←→ Peer B

Types of NAT

The availability of TCP hole punching depends on the type of port allocation used by the NAT. For two peers behind NATs to connect to each other via TCP simultaneous open, each needs to know the "location" of the other peer, that is, the remote endpoint: the IP address and port that the peer will connect to. So when two peers, A and B, initiate TCP connections by binding to local ports Pa and Pb, respectively, each needs to know the remote port as mapped by the other peer's NAT to make the connection. When both peers are behind a NAT, discovering the public remote endpoint of the other peer is a problem called NAT port prediction. All TCP NAT traversal and hole punching techniques have to solve the port prediction problem.

NAT port allocation falls into one of two categories:

predictable: the gateway uses a simple algorithm to map the local port to the NAT port. Most of the time a NAT will use port preservation, which means that the local port is mapped to the same port on the NAT.
non-predictable: the gateway uses an algorithm that is either random or too impractical to predict.

Whether the TCP connection can be made via a TCP simultaneous open depends on whether the NATs exhibit predictable or non-predictable behavior, as shown by the following connection matrix of the different cases and their impact on end-to-end communication:

	A predictable	A non-predictable
B predictable	YES	YES
B non-predictable	YES	NO

YES: the connection will work all the time
NO: the connection will almost never work
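
The matrix can be restated as a one-line predicate. The following is only an illustrative sketch; the function name simultaneous_open_feasible is hypothetical and does not come from any library.

```python
def simultaneous_open_feasible(a_predictable: bool, b_predictable: bool) -> bool:
    """Return True where the connection matrix says YES."""
    # Only the case in which neither NAT is predictable is marked NO.
    return a_predictable or b_predictable


# Print the four cells of the matrix.
for a in (True, False):
    for b in (True, False):
        verdict = "YES" if simultaneous_open_feasible(a, b) else "NO"
        print(f"A predictable={a!s:5} B predictable={b!s:5} -> {verdict}")
```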

Techniques

Methods of Port Prediction (with predictable NATs)

Here are some of the methods used by NATs to allow peers to perform port prediction:

The NAT assigns sequential external ports to sequential internal ports: if the remote peer knows one mapping, it can guess the value of subsequent mappings. The TCP connection happens in two steps. First, the peers make a connection to a third party and learn their mapping. Second, both peers guess what the NAT port mapping will be for subsequent connections, which solves port prediction. This method requires at least two consecutive connections from each peer and the use of a third party. It does not work properly with carrier-grade NAT when many subscribers share each IP address, as only a limited number of ports are available and allocating consecutive ports to the same internal host may be impractical or impossible.
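
As a rough illustration of the sequential case, the guess can be written as simple arithmetic on the mapping learned from the third party. The function name, the step parameter and the default step of 1 are assumptions made for this sketch. Real NATs may use a different increment, and other hosts behind the same NAT may consume ports in between, which invalidates the guess.

```python
def predict_sequential_port(observed_public_port: int,
                            step: int = 1,
                            connections_ahead: int = 1) -> int:
    """Guess the public port of a later mapping on a sequentially allocating NAT.

    observed_public_port: public port reported by the third party.
    step: assumed increment between consecutive mappings (hypothetical default).
    connections_ahead: how many new connections after the observed one.
    """
    predicted = observed_public_port + step * connections_ahead
    # This sketch does not model wrap-around; it rejects out-of-range guesses.
    if not 1 <= predicted <= 65535:
        raise ValueError("prediction falls outside the valid TCP port range")
    return predicted


# The third party saw the mapping on port 40000, so the next connection
# is guessed to appear on port 40001.
print(predict_sequential_port(40000))
```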

The NAT uses the port preservation allocation scheme: the NAT maps the source port of the internal peer to the same public port. In this case, port prediction is trivial, and the peers simply have to exchange the port to which they are bound through another communication channel (such as UDP, or DHT) before making the outbound connections of the TCP simultaneous open. This method requires only one connection per peer and does not require a third party to perform port prediction.

The NAT uses "endpoint independent mapping": two successive TCP connections coming from the same internal endpoint are mapped to the same public endpoint. With this solution, the peers first connect to a third-party server that saves their port mapping values and gives each peer the port mapping value of the other. In a second step, both peers reuse the same local endpoint to perform a TCP simultaneous open with each other. This unfortunately requires setting the SO_REUSEADDR option on the TCP sockets, and such use violates the TCP standard and can lead to data corruption. It should only be used if the application can protect itself against such data corruption.
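
A minimal sketch of how a peer might prepare a socket so that the same local endpoint can be reused for the second connection. The helper name bind_reusable is hypothetical. Setting SO_REUSEPORT in addition to SO_REUSEADDR is an assumption of this sketch, not something the text requires; some platforms need it to bind a second socket to the same address and port, and it is not available everywhere.

```python
import socket


def bind_reusable(local_port: int, local_ip: str = "0.0.0.0") -> socket.socket:
    """Create a TCP socket bound to local_port with address reuse enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Option named in the text: lets the local endpoint be bound again.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Assumption: also request SO_REUSEPORT where the platform offers it.
    if hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass  # not supported on this system; continue with SO_REUSEADDR only
    sock.bind((local_ip, local_port))
    return sock


# Port 0 lets the operating system pick a free port for this demonstration.
s = bind_reusable(0, "127.0.0.1")
print("bound to", s.getsockname())
s.close()
```

In the scheme described above, the first socket created this way would connect to the third-party server, and a second socket bound to the same local port would then connect to the other peer's public endpoint.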

Details of a typical TCP connection instantiation with TCP Hole Punching

We assume here that port prediction has already taken place through one of the methods outlined above, and that each peer knows the remote peer endpoint. Both peers make a POSIX connect call to the other peer endpoint. TCP simultaneous open will happen as follows:

- Peer A sends a SYN to Peer B.
- Peer B sends a SYN to Peer A.
- When NAT-a receives the outgoing SYN from Peer A, it creates a mapping in its state machine.
- When NAT-b receives the outgoing SYN from Peer B, it creates a mapping in its state machine.

Both SYNs cross somewhere along the network path, then:

- The SYN from Peer A reaches NAT-b, and the SYN from Peer B reaches NAT-a.
- Depending on the timing of these events (where in the network the SYNs cross), at least one of the NATs will let the incoming SYN through and map it to the internal destination peer.

Upon receipt of the SYN, the peer sends a SYN+ACK back and the connection is established.
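
The peer side of this exchange can be sketched as an outbound connect issued from a fixed local port and retried a few times, since an early SYN may be dropped or refused before the other side's mapping exists. The function name connect_from_port and its parameters (attempts, timeout, delay) are hypothetical choices for illustration. Whether the SYNs actually cross and a connection results depends on the NATs and on timing.

```python
import socket
import time


def connect_from_port(local_port, remote, attempts=5, timeout=2.0, delay=0.2):
    """Try an outbound TCP connect to `remote` from a fixed local port.

    remote is the (ip, port) public endpoint of the other peer.
    Returns the connected socket, or raises ConnectionError.
    """
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow rebinding the same local port on each retry.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        try:
            sock.bind(("", local_port))
            sock.connect(remote)      # sends the outgoing SYN
            sock.settimeout(None)     # back to blocking mode once connected
            return sock
        except OSError as exc:        # refused, timed out, unreachable, ...
            last_error = exc
            sock.close()
            time.sleep(delay)
    raise ConnectionError(f"no connection after {attempts} attempts") from last_error
```

Each peer would call this at roughly the same time with the other peer's predicted public endpoint, for example connect_from_port(Pa, (public_ip_of_B, public_port_of_B)) on Peer A, where both values come from the port prediction step.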

Interoperability requirements on the NAT for TCP Hole Punching

Other requirements on the NAT to comply with TCP simultaneous open

For the TCP simultaneous open to work, the NAT should:

not send an RST as a response to an incoming SYN packet that is not part of any mapping
accept an incoming SYN for a public endpoint when the NAT has previously seen an outgoing SYN for the same endpoint

This is enough to guarantee that NATs behave nicely with respect to the TCP simultaneous open.
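
The two requirements can be illustrated with a toy model of the NAT's decision on an incoming SYN. The class ModelNat, its method names and the example addresses are hypothetical. The model only tracks which public endpoints have sent an outgoing SYN, as in the list above, and leaves out timeouts, port allocation and any filtering on the remote address.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class ModelNat:
    """Toy NAT that follows the two requirements listed above."""
    # Public endpoints for which an outgoing SYN has been seen.
    seen_outgoing: Set[Endpoint] = field(default_factory=set)

    def on_outgoing_syn(self, public: Endpoint) -> None:
        self.seen_outgoing.add(public)

    def on_incoming_syn(self, public: Endpoint) -> str:
        if public in self.seen_outgoing:
            return "forward"  # requirement 2: accept after an outgoing SYN
        return "drop"         # requirement 1: stay silent, never answer with RST


nat_a, nat_b = ModelNat(), ModelNat()
a_public, b_public = ("198.51.100.1", 4000), ("203.0.113.2", 5000)

nat_a.on_outgoing_syn(a_public)            # A's SYN leaves NAT-a
early = nat_b.on_incoming_syn(b_public)    # arrives before B's SYN has left NAT-b
nat_b.on_outgoing_syn(b_public)            # B's SYN leaves NAT-b
late = nat_a.on_incoming_syn(a_public)     # arrives after A's SYN has left NAT-a
print(early, late)
```

In this ordering the first SYN is dropped silently and the second is forwarded, which matches the case where only one of the two NATs lets the incoming SYN through.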