Identifying OpenVPN sessions within network traffic

A team of researchers from the University of Michigan has published the results of a study on the ability to identify (VPN fingerprinting) connections to OpenVPN-based servers by monitoring transit traffic. The study identified three methods for distinguishing the OpenVPN protocol from other network packets; these methods can be used in traffic inspection systems to block OpenVPN-based virtual networks.

Testing of the proposed methods on the network of the Internet provider Merit, which has more than a million users, showed that 85% of OpenVPN sessions could be identified with a low false-positive rate. For the tests, the researchers prepared a toolkit that first detected OpenVPN traffic on the fly in passive mode and then verified the result with an active check of the server. A traffic flow of approximately 20 Gbps was mirrored to the analyzer built by the researchers.

During the experiment, the analyzer successfully identified 1,718 of 2,000 test OpenVPN connections established by a test client under the researchers' control, which used 40 different typical OpenVPN configurations (the method worked for 39 of the 40 configurations). In addition, over the eight days of the experiment, 3,638 OpenVPN sessions were identified in transit traffic, of which 3,245 were confirmed. The researchers note that the upper bound on false positives for the proposed method is three orders of magnitude lower than for previously proposed methods based on machine learning.

The effectiveness of the methods that commercial services use to protect OpenVPN traffic from tracking was assessed separately: of 41 tested VPN services that use techniques for hiding OpenVPN traffic, traffic was identified in 34 cases. The services that could not be detected used additional layers on top of OpenVPN to hide traffic (for example, forwarding OpenVPN traffic through an additional encrypted tunnel). Most of the services that were successfully identified distorted traffic with an XOR operation, used additional obfuscation layers without proper random padding of the traffic, or ran non-obfuscated OpenVPN services on the same server.

The identification methods rely on OpenVPN-specific patterns in unencrypted packet headers, ACK packet sizes, and server responses. The first method uses the “opcode” field in the packet header as the identifying feature during connection negotiation: the field takes values from a fixed range and changes in a predictable way depending on the connection setup stage. Identification comes down to detecting a particular sequence of opcode changes in the first N packets of the flow.
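
The opcode check can be illustrated with a short Python sketch. It is a simplified heuristic, not the researchers' actual algorithm: it assumes UDP payloads (over TCP each packet is preceded by a 2-byte length), reads the opcode from the upper five bits of the first byte, and expects a client reset, a server reset, and then only known non-reset opcodes. The function names and the window size `n=10` are assumptions made for the example.

```python
from typing import List, Tuple

# Opcode values from the OpenVPN wire protocol.
CLIENT_RESETS = {1, 7, 10}         # P_CONTROL_HARD_RESET_CLIENT_V1/V2/V3
SERVER_RESETS = {2, 8}             # P_CONTROL_HARD_RESET_SERVER_V1/V2
KNOWN_OPCODES = set(range(1, 12))  # 1..11 are the defined opcodes


def opcode_of(payload: bytes) -> int:
    """The opcode is the upper 5 bits of the first byte; the lower 3 are key_id."""
    return payload[0] >> 3


def looks_like_openvpn_handshake(flow: List[Tuple[bool, bytes]], n: int = 10) -> bool:
    """flow holds (from_client, udp_payload) pairs in arrival order.

    Hypothetical heuristic: the first packet is a client reset, the first
    server packet is a server reset, and no further reset opcodes or
    undefined opcodes appear within the first n packets.
    """
    window = [(from_client, opcode_of(p)) for from_client, p in flow[:n] if p]
    if len(window) < 2:
        return False

    first_from_client, first_op = window[0]
    if not first_from_client or first_op not in CLIENT_RESETS:
        return False

    server_ops = [op for from_client, op in window if not from_client]
    if not server_ops or server_ops[0] not in SERVER_RESETS:
        return False

    rest = [op for _, op in window[1:]]
    resets_seen = sum(1 for op in rest if op in CLIENT_RESETS | SERVER_RESETS)
    # Exactly one reset after the first packet: the server's reply.
    return resets_seen == 1 and all(op in KNOWN_OPCODES for op in rest)
```

A real classifier would also have to tolerate retransmitted resets and packet reordering, which this sketch deliberately ignores.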

The second method relies on the fact that OpenVPN uses ACK packets only at the connection negotiation stage and that these packets have a characteristic size. Identification is based on packets of that size occurring only in certain parts of the session (for example, in OpenVPN the first ACK packet is usually the third packet sent in the session).
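
A rough Python sketch of the ACK-size idea follows. It is an illustration under stated assumptions rather than the published method: the third packet is taken as the candidate ACK, the flow is split into bins of 10 packets, and the flow is flagged when packets of the candidate size recur early and then disappear. The bin size, the `quiet_after_bins` parameter, and the function names are hypothetical.

```python
from typing import List


def ack_size_profile(sizes: List[int], bin_size: int = 10, ack_index: int = 2) -> List[int]:
    """Count, per bin of bin_size packets, how many match the candidate ACK size.

    The candidate ACK size is the size of the packet at ack_index
    (the third packet by default).
    """
    if len(sizes) <= ack_index:
        return []
    ack_size = sizes[ack_index]
    return [
        sum(1 for s in sizes[i:i + bin_size] if s == ack_size)
        for i in range(0, len(sizes), bin_size)
    ]


def looks_like_openvpn_acks(sizes: List[int], bin_size: int = 10,
                            ack_index: int = 2, quiet_after_bins: int = 2) -> bool:
    """Flag flows where ACK-sized packets cluster in the early bins only."""
    profile = ack_size_profile(sizes, bin_size, ack_index)
    if len(profile) <= quiet_after_bins:
        return False  # flow too short to observe the quiet phase
    early = sum(profile[:quiet_after_bins])
    late = sum(profile[quiet_after_bins:])
    # Require at least two ACK-sized packets early and none afterwards.
    return early >= 2 and late == 0
```

For instance, a flow whose first ten packet sizes contain four 62-byte packets and whose later packets are all 1400 bytes yields the profile `[4, 0, 0, 0]` and is flagged, while a flow of uniformly sized packets is not.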

The third method is an active check. It relies on the fact that an OpenVPN server answers a connection reset request with a specific RESET packet (the check does not work in “tls-auth” mode, since the OpenVPN server ignores requests from clients not authenticated through TLS).
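
The active check can be sketched in Python as a pair of pure functions, one that builds a probe and one that classifies a reply. This is an assumption-laden illustration, not the researchers' tool: it assumes UDP transport without “tls-auth”, a 14-byte client hard reset (opcode byte, 8-byte session ID, empty ACK list, 4-byte packet ID), and a server reply that acknowledges the probe by echoing the client's session ID. The function names are hypothetical, and no network I/O is performed.

```python
P_CONTROL_HARD_RESET_CLIENT_V2 = 7
P_CONTROL_HARD_RESET_SERVER_V2 = 8


def build_client_reset(session_id: bytes) -> bytes:
    """Build a minimal client hard-reset packet (assumed layout, no tls-auth)."""
    if len(session_id) != 8:
        raise ValueError("session_id must be exactly 8 bytes")
    header = bytes([P_CONTROL_HARD_RESET_CLIENT_V2 << 3])  # key_id = 0
    ack_count = b"\x00"          # no packets acknowledged yet
    packet_id = b"\x00" * 4      # first control packet
    return header + session_id + ack_count + packet_id


def is_server_reset(response: bytes, client_session_id: bytes) -> bool:
    """Check whether a reply looks like a server hard reset answering our probe."""
    if len(response) < 14:
        return False
    if response[0] >> 3 != P_CONTROL_HARD_RESET_SERVER_V2:
        return False
    # Skip the opcode byte and the server's own 8-byte session ID, then
    # look for our session ID echoed in the acknowledgement section.
    return client_session_id in response[9:]
```

In use, the probe from `build_client_reset` would be sent to the suspected server over a UDP socket with a random session ID, and any reply would be passed to `is_server_reset`; a server configured with “tls-auth” would simply not answer.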
