UNIVERSITY POLITEHNICA OF BUCHAREST
FACULTY OF AUTOMATIC CONTROL AND COMPUTERS
COMPUTER SCIENCE DEPARTMENT
DIPLOMA PROJECT
IPsec: Performance analysis and speed optimizations
in VPN networks
Stan Gabriel
Thesis advisor:
Sl. dr. ing. Mihai Chiroiu
BUCHAREST
2018

CONTENTS
1 Introduction
1.1 Context
1.2 Problem
1.3 Objectives
1.4 Solution
2 State of the Art
2.1 Resume Protocol From Cached State
2.2 Network Stack Optimization
2.3 Separation of Control and Data Plane
3 Architecture
3.1 IPSec Operation
3.2 IPSec Protocols
3.3 Overall IPSec architecture
4 Background
4.1 Dedicated Cryptographic Chips
4.2 Optimization of Software Encryption Functions
4.3 IPSec Engines
5 Implementation
6 Results
6.1 Disable Fragmentation and change MSS
6.2 Shorter SA Lifetimes
6.3 Linux TCP Optimizations
6.4 Adjust InterruptThrottleRate
6.5 Resize send / receive buffers
6.6 Increased receive and send buffer and disable fragmentation
7 Weak Points and Future Work
8 Conclusions

ABSTRACT
Virtual Private Networks (VPNs) are slowly becoming a necessity on the public Internet, as security
changes from an option desired only by big companies into a fundamental principle required
by every user. Internet Protocol Security (IPsec) is a standard that stands at the basis of
VPN technology; it provides the required security in a modular fashion, so that the solution
can be tailored to each client. Unfortunately, cryptographic implementations
are considered a performance bottleneck and are constantly subjected to new optimization
techniques. We present a set of optimizations that target the system and VPN configuration,
supported by extensive measurements, and that achieve a notable increase in throughput.

1 INTRODUCTION
1.1 Context
If we were to describe today's technology with a single word that summarizes everything
that surrounds us, that word would be "information". It is a very broad abstraction,
but information is undeniably the principle that stands at the basis of today's technology. In
the 1950s a concept was created to facilitate the exchange of information, which
would later evolve and become known as the Internet. This revolutionary advance in
technology was gradually integrated into our day-to-day lives, and issues such as the
security of the information that we handle became more important. Multiple
solutions to those concerns were created, and a very important one that is widely
used today is Internet Protocol Security (IPsec).
1.2 Problem
Unfortunately, those solutions come with a considerable trade-off: they provide security but significantly
reduce the speed at which information is transferred. Cryptographic operations have a
notable overhead and are expensive in terms of the computational
power they require, so better management of those resources has a direct impact
on the data transfer rate. However, we must keep in mind that those resources are shared
with other processes, so giving cryptographic operations a monopoly over them is not
a viable solution.
1.3 Objectives
Our objectives are to analyze the performance of the IPSec protocol suite, determine which
factors influence it the most, and focus on removing the bottleneck once we
discover it. We want to obtain significant throughput increases that can be applied by a wide
range of users in a modular fashion, without requiring aggressive changes to their systems
or networks.
1.4 Solution
The solutions we discovered are covered in Chapter 6, where each of them is
explained in detail and suggestions are made as to which should be applied and in what
scenario. We considered the factors that may influence the performance of our target
application and built multiple testing scenarios to decide where and when the solutions we
provide are best used.

2 STATE OF THE ART
Since security over the Internet gains importance as technology evolves
and IPSec offers an implementation that addresses those security issues, a number of studies and papers
have examined the performance of this protocol and possible optimizations. They propose
different solutions that try to maximize the throughput of the network while maintaining
the security standards. In the following sub-chapters I will present the work that has been
done on this subject and discuss its possible shortcomings.
2.1 Resume Protocol From Cached State
Research by Craig A. Shue and Minaxi Gupta [18] noted the overhead generated by
the Internet Key Exchange (IKE) protocol [12]. IKE is a two-phase protocol which uses a
Security Association (SA) to establish the way in which the two peers communicate. This
requires the end points to agree on the cryptographic algorithm used to encrypt traffic, the
mechanism used to authenticate the other end point, and the hash algorithm.
In IKE phase 1 the protocol needs to authenticate the IPSec participants and set up the
secure channel between the peers in order to enable future key exchanges. IKE phase 1 has
the following tasks:
• Protect the identities of the IPSec peers and ensure their authentication
• Negotiate a matching IKE SA policy between the members to protect future exchanges
• Establish matching shared secret keys through an authenticated Diffie-Hellman exchange
• Set up the secure tunnel which will be used to negotiate IKE phase 2 parameters
The purpose of IKE phase 2 is to set up the IPSec tunnel by first negotiating the IPSec SAs
to be used. The following functions must be accomplished by IKE phase 2:
• Negotiate the IPSec SA parameters to be used, protected by an existing IKE SA
• Establish the IPSec security associations
• Renegotiate the IPSec SA when its lifetime expires, to ensure forward secrecy
• Perform a supplementary Diffie-Hellman exchange to refresh connection keys
The issue with this protocol, as previously described, lies in the computationally
intensive modular exponentiations that the Diffie-Hellman algorithm requires, so the proposed
solution involves caching the IKE keys and the IPSec keys to be reused in future connections.
The authors developed a cryptographically secure cache resumption protocol whose overhead is
significantly lower than the overhead generated by the Diffie-Hellman algorithm.
Figure 1: Resume Cache Connection Protocol [18]
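The cost that the cache resumption protocol tries to avoid comes from the modular exponentiations of the Diffie-Hellman exchange. The following Python sketch illustrates the exchange itself. The modulus, the base and the function names are assumptions made for this example; the parameters are deliberately tiny and far too weak for real use, where IKE relies on much larger standardized groups.

```python
import secrets

# Toy public parameters, chosen only for illustration (NOT secure).
P = 0xFFFFFFFB  # a small prime modulus (2**32 - 5)
G = 5           # illustrative base; not verified to be a generator

def dh_keypair():
    """Return (private, public) for one peer."""
    private = secrets.randbelow(P - 3) + 2   # private exponent in [2, P - 2]
    public = pow(G, private, P)              # modular exponentiation: the costly step
    return private, public

def dh_shared(private, peer_public):
    """Derive the shared secret from our private value and the peer's public value."""
    return pow(peer_public, private, P)

# Both peers arrive at the same secret without ever sending it.
a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()
assert dh_shared(a_priv, b_pub) == dh_shared(b_priv, a_pub)
```

Each peer performs two exponentiations per exchange, which is the work a cached session would skip.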
Unfortunately, the proposed solution is only viable when implemented on a server that expects
multiple visits from the same client over a short period of time, since only in this scenario
would each phase of the IKE protocol otherwise be executed for every new connection, without
considering whether a previous one can be reused. Also, the notion of caching keys runs
counter to the re-keying mechanism already present in IPSec, which is performed in the second
phase of the IKE protocol and maintains the Perfect Forward Secrecy property. This solution,
as mentioned in the work, is best suited for multiple short connections and can be adapted
to perform re-keying only when necessary, rather than having it triggered by issues regarding
network connectivity.
2.2 Network Stack Optimization
Another notable proposal, by Michael G. Iatrou and Artemios G. Voyiatzis [10],
is centered on optimizations that can be achieved by understanding the interactions
between IPSec and the Linux kernel networking stack. The implementation of IPSec can
be considered a latency-generating component from a system's point of view: every packet
received or sent must pass through the IPSec implementation, where a series
of operations is executed on it, such as encryption, decryption, hash verification or hash
generation. The accumulated function calls can be seen as a longer execution path, so
it is preferable to process as many bytes as possible in a single pass. The paper
introduces a series of system and IPSec optimizations that accomplish significant throughput
gains without intrusive modifications of the implementation.
The first focus of their solution was the TCP/IP stack. Since it is not a static set of protocols,
certain parameters can be configured to better fit the specific end-to-end link attributes. They
explored specific Linux kernel extensions that allow customizable parameters to be changed in
order to provide a more fitting environment for a high-performance network. The options
chosen concerned timestamps, window scaling, Selective ACK (SACK) and
various sizes for the send and receive socket buffers, for which the kernel
adapts and sets the appropriate value based on the available memory.
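As a small illustration of where these options live on a Linux system, the sketch below reads the corresponding sysctl entries from `/proc/sys`. The helper name `read_tcp_tunables` and the choice of entries are ours, not taken from the cited paper. The script only inspects the values and does not change them.

```python
from pathlib import Path

# Linux sysctl entries for the options discussed above.
TCP_TUNABLES = (
    "net/ipv4/tcp_timestamps",
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_rmem",   # min, default, max receive buffer (bytes)
    "net/ipv4/tcp_wmem",   # min, default, max send buffer (bytes)
)

def read_tcp_tunables(root="/proc/sys"):
    """Return {name: list of ints, or None if the entry is unavailable}."""
    values = {}
    for name in TCP_TUNABLES:
        try:
            text = (Path(root) / name).read_text()
            values[name] = [int(field) for field in text.split()]
        except (OSError, ValueError):
            values[name] = None
    return values

if __name__ == "__main__":
    for name, value in read_tcp_tunables().items():
        print(name.replace("/", "."), "=", value)
```

The `root` parameter exists only so the function can be pointed at a different directory, for example in a test.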
Another important factor is the overhead created by the interrupts that the network interface
card (NIC) raises when a packet is received. Those interrupts must be serviced by the proper
interrupt handler, which means the kernel constantly suspends and resumes the other running
processes. For a constant flux of packets the overall system performance can
be affected, and since the IPSec operations are computationally intensive, the ideal case would
be to minimize the number of those interrupts. The Linux kernel offers a solution through
the implementation of NAPI, a heuristics-based mechanism that better manages the
excessive number of interrupts from the NIC. After a certain rate of interrupts for the NIC is
reached, the kernel disables them and processes further packets using a polling mechanism.
If the rate drops below the set threshold, the kernel switches back and re-enables the
interrupt handler. This approach addresses the interrupt overload by reducing it
significantly and maintains better system performance, while also handling possible system
overloading caused by network traffic.
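The switching behaviour described above can be modelled in a few lines of Python. This is a simplified model of the description in this section, not kernel code: the per-tick packet counts, the threshold and the function names are all assumptions made for illustration.

```python
def choose_modes(packets_per_tick, threshold):
    """Return the receive mode ('interrupt' or 'poll') used for each tick.

    Simplified model: the mode for a tick is chosen from the packet rate
    observed in the previous tick; the first tick starts in interrupt mode.
    """
    modes = []
    mode = "interrupt"
    for rate in packets_per_tick:
        modes.append(mode)
        mode = "poll" if rate >= threshold else "interrupt"
    return modes

def interrupts_raised(packets_per_tick, threshold):
    """Count interrupts: one per packet in interrupt mode, none while polling."""
    modes = choose_modes(packets_per_tick, threshold)
    return sum(rate for rate, mode in zip(packets_per_tick, modes)
               if mode == "interrupt")

# A burst in the middle of the trace is handled by polling.
traffic = [10, 500, 800, 900, 20, 5]
print(choose_modes(traffic, threshold=100))
print(interrupts_raised(traffic, threshold=100), "interrupts for", sum(traffic), "packets")
```

In this model most packets of the burst are processed without raising an interrupt, which is the effect the paragraph attributes to NAPI.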
The last optimization that they decided to focus on is the variable size of the Maximum
Transmission Unit (MTU). The default MTU is 1500 bytes, and the usable value
depends on the network architecture: the device that supports the lowest
MTU sets the limit for all the devices connected to that network. Since the
tests were made on a direct connection between a client and a server, the MTU value could
be changed, being limited only by the NIC. Increasing the MTU allows bigger datagrams
or frames to be passed between the peers and handed to the IPSec implementation for
processing. However, a balance must be found between the processing power of the station
and the size of the MTU, since traffic congestion can appear if the packets are
sent faster than they are processed.
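A short calculation shows why a larger MTU reduces the number of passes through the IPSec implementation. The 100-byte per-packet overhead and the 9000-byte jumbo-frame MTU below are assumed round figures for illustration, not values measured in the cited work.

```python
def packets_needed(total_bytes, mtu, per_packet_overhead):
    """Number of packets required to carry total_bytes of application data."""
    payload = mtu - per_packet_overhead
    if payload <= 0:
        raise ValueError("overhead must be smaller than the MTU")
    return -(-total_bytes // payload)   # integer ceiling division

# Assumed overhead: IP/transport headers plus IPSec encapsulation, rounded to 100 bytes.
OVERHEAD = 100
for mtu in (1500, 9000):
    print(mtu, packets_needed(10**9, mtu, OVERHEAD))
```

Under these assumptions, transferring 1 GB takes 714286 packets at an MTU of 1500 and 112360 at an MTU of 9000, so the per-packet processing path is traversed roughly six times less often.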
The solutions presented do not require an intrusive approach to the IPSec implementation and
produce impressive results using only existing kernel modules and IPSec configurations
that can be easily adapted to fit the requirements of the supporting network. On the other
hand, a clear observation can be made: for those optimizations to be practical, the design of
the network and the hardware used must be known beforehand. This means that the proposed
optimizations are not applicable when an IPSec tunnel is established between peers over
the Internet, or in a bigger private network which uses auxiliary devices such as routers or
modems, since those do not possess the processing power of a computer.
2.3 Separation of Control and Data Plane
The work of Kun Tan and Paul Wang [19] is centered
on the use of the IPSec protocol inside cloud-based environments, in which clients
can request a virtual machine (VM) running a software IPSec gateway to secure the
connection to the cloud framework. The main issue with this implementation is the wasteful
use of the limited resources that the cloud can provide, since a VM needs to be allocated
for each tenant and resources that were assigned to a machine cannot be redistributed
based on their utilization. If a client does not use all the resources given, the
unused fraction is wasted.
The proposed solution separates the control and data planes of the IPSec protocol and assigns
those responsibilities to separate VMs, allowing multiple clients to be managed by the same
Gateway Management Node (GMN), a machine that supervises the control plane and redirects
the data that needs to be processed to multiple Gateway Processing Nodes (GPNs), which oversee
the data plane. Two additional nodes are used, called Gateway Ingress Node (GIN) and
Gateway Egress Node (GEN), whose purpose is to balance the data load of incoming and
outgoing connections and forward the traffic to the destination virtual IP address.
Figure 2: Protego Architecture [19]
Another interesting concept introduced in this paper is tunnel migration. In order to
achieve a viable load balance for the GPNs, the GMN needs to be able to redistribute
the traffic among multiple GPNs if the data input is too high for the existing GPNs to
process effectively. The process is managed by the GMN, which sends a request for creating
a new child security association to the GIN and receives a response containing a DH value and
the nonce of the responder. The GMN then hands over the SA to the new GPN while also
informing the GIN and GEN of a new steering rule for the traffic.
Using the described implementation they were able to achieve impressive results, as can be
seen in Figure 3. Using only one GPN, and depending on the number of CPU cores, the
throughput can reach as high as 17 Gbps, a major increase in performance. They also took
into consideration the overhead generated by tunnel migration and the packet processing latency
caused by the communication maintained between the multiple VMs involved, and concluded
that those factors do not influence the performance considerably.
While this paper provides an interesting idea centered on separating the IPSec execution
into two planes, each managed by a different machine, it does not offer a solution
for performance optimization when a single client is involved. The
implementation offers better resource management for cloud applications, and the results were
obtained when multiple clients were connected and redirected to the same GPN. So it is not
a single IPSec connection that was optimized to reach a throughput of 17 Gbps, but multiple
tunnels being processed in parallel on the same machine.
Figure 3: Protego Results [19]

3 ARCHITECTURE
Internet Protocol Security (IPSec) [8] is a protocol suite that helps to ensure private and secure
communications over Internet Protocol (IP) networks using cryptographic security services.
Because IPSec is integrated at the Network layer, it ensures security for almost all protocols
in the TCP/IP suite, and because it is applied transparently to other applications, no
further security configuration is required for other products that use the TCP/IP suite
to be compatible with it. The main functionalities that IPSec provides, such as network-level
data integrity, data confidentiality, data origin authentication and replay protection, create a
stable defense against attacks such as:
• Network-based attacks from untrusted computers
• Data corruption
• Data theft
• User-credential theft
• Administrative control of servers, other computers, and the network
3.1 IPSec Operation
IPSec supports two modes of operation, designed to cover the needs of different clients, each
best suited for a certain scenario. Those two modes are tunnel mode and transport mode.
In transport mode, the source and destination hosts must directly perform all cryptographic
operations. The encrypted payload is sent through an L2TP (Layer 2 Tunneling Protocol) tunnel and
the encryption and authentication services are established only for the original IP datagram.
This mode of operation is used for host-to-host communications and is not compatible with
gateway services.
Tunnel mode, on the other hand, was created specifically to work with
gateway services. Each gateway performs cryptographic processing in addition to the source
and destination hosts. Many tunnels are created in succession between gateways, providing
gateway-to-gateway security. The original IP datagram is fully encapsulated, offering security
services for both the IP header and the payload.
3.2 IPSec Protocols
The IPSec standards define three main protocols [14] that are required to establish and
maintain a VPN connection. Those protocols are Authentication Header (AH), Encapsulating
Security Payload (ESP) and Internet Key Exchange (IKE).
Authentication Header
AH provides data integrity and replay protection for the whole IP datagram by using a
keyed version of a hash function (e.g., MD5 or SHA1), which takes an authentication key as
input to compute the integrity checksum value of the packet. The receiver then recomputes
this checksum and compares it with the one received. An attacker cannot make
any changes to the original data without knowing the key used to generate the integrity
checksum. The protocol specifies a set of mutable IP header fields that must not be used when
calculating the checksum, since they may change in transit and the receiver would not be able
to recompute it.
Figure 4: Authentication Header Protocol
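The integrity check performed by AH can be sketched with Python's standard `hmac` module. The key, the packet bytes and the function names below are hypothetical; real AH operates on the actual IP header and payload, and transmits a truncated HMAC value rather than the full digest.

```python
import hashlib
import hmac

def compute_icv(key: bytes, immutable_data: bytes) -> bytes:
    """Keyed integrity check value over the fields that do not change in transit."""
    return hmac.new(key, immutable_data, hashlib.sha1).digest()

def verify_icv(key: bytes, immutable_data: bytes, received_icv: bytes) -> bool:
    """Recompute the ICV on the receiver side and compare in constant time."""
    return hmac.compare_digest(compute_icv(key, immutable_data), received_icv)

key = b"shared-authentication-key"            # hypothetical key agreed via the SA
packet = b"header-immutable-fields|payload"   # placeholder for the protected bytes
icv = compute_icv(key, packet)

assert verify_icv(key, packet, icv)                    # untouched packet is accepted
assert not verify_icv(key, packet + b"tampered", icv)  # any modification is detected
```

Without the key an attacker cannot produce a matching value for modified data, which is the property the paragraph above relies on.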
Encapsulating Security Payload
In addition to the benefits provided by AH, ESP also gives the user data confidentiality for
the IP datagram. It uses a symmetric-key encryption algorithm to create the ciphertext to
be sent, and like the previously described protocol it computes a checksum to ensure data
integrity. However, ESP does not authenticate the IP header itself when used in transport
mode.
Figure 5: Encapsulating Security Payload Protocol
Internet Key Exchange
Now that we have established the protocols used to ensure the encryption of the data payload and
the protection of the packet headers, we need a protocol that allows us to safely come to
an agreement about the keys and cryptographic algorithms that we are going to use in our
tunnel. The cryptographic parameters that are negotiated are stored in a security association
(SA), which also has a lifetime counter associated with it in order to enforce the renewal of
keys and cryptographic algorithms and so maintain the perfect forward secrecy property. All the
SAs that are created, either manually or automatically through negotiation, are stored in the
Security Association Database (SAD), which is used by the Security Policy Database (SPD)
to determine what action is to be taken on an incoming IP packet. Based on a selector field
from the IP packet, the SPD can take one of the following three actions:
• Drop the packet
• Pass the packet to the IPSec module with the corresponding SA
• Pass the packet to the IP stack for normal forwarding
The Internet Key Exchange (IKE) protocol defines the procedure used to dynamically
establish SAs between two IPSec peers. The IKE protocol is initiated when two IPSec end
points wish to communicate, and is divided into two phases:
1. Phase I (Authentication Phase) starts by assuming that no secure channel exists.
Therefore, the goals of this phase are to establish the required secure channel, authenticate
the participating parties and generate shared keys for the protection of future IKE
protocol messages.
2. Phase II (Key Exchange) is used to determine the IPSec SA and to generate and
renew the encryption key material. For an additional layer of security, a full Diffie-Hellman
key exchange can be performed to provide perfect forward secrecy, but it is more
time-consuming. The other option is to derive the keys from the Phase I keying material.
3.3 Overall IPSec architecture
Figure 6: IPSec Architecture [14]
In Figure 6 we see the overall architecture and how each component of the IPSec protocol suite
is used in the processing of every individual IP packet. The Policy Manager is a module
that acts as an interface allowing the user to manually add security policies to the SPD, the IKE
Daemon module is responsible for the automatic SA negotiations between IPSec peers, and
the Certificate Manager validates and registers certificates used for authentication purposes.
When an IP packet is received, the IPSec module extracts the selector from the packet and,
if the policy is "IPSec", we look in the SPD for an entry that should point to an SA in the
SAD. The module then retrieves the corresponding SAD entry and checks its validity. If
the SA has expired, the IKE Daemon is notified to start a new SA negotiation. After the SA has
been determined, the packet is processed by the cryptographic module according to the
specifications of the SA and is then passed to the transmit queue.
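The lookup flow just described can be summarized in a minimal Python sketch. All names here (`Action`, `SecurityAssociation`, `process_packet`), the dictionary-based SPD and SAD, and the default of dropping packets with no matching policy are assumptions made for illustration; they do not reproduce any particular IPSec implementation.

```python
import time
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    DROP = "drop"
    IPSEC = "ipsec"
    BYPASS = "bypass"

@dataclass
class SecurityAssociation:
    cipher: str
    expires_at: float            # absolute expiry time, seconds since epoch

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at

def process_packet(selector, spd, sad, now=None, on_expired=lambda sel: None):
    """Return a string describing what happens to the packet with this selector."""
    now = time.time() if now is None else now
    # SPD maps a selector to (action, SA identifier); unknown selectors are dropped.
    action, sa_id = spd.get(selector, (Action.DROP, None))
    if action is Action.DROP:
        return "dropped"
    if action is Action.BYPASS:
        return "forwarded without IPSec"
    sa = sad.get(sa_id)
    if sa is None or not sa.is_valid(now):
        on_expired(selector)     # stand-in for notifying the IKE daemon
        return "waiting for new SA"
    return f"protected with {sa.cipher}"
```

A caller would populate `spd` with entries such as `{("10.0.0.1", "10.0.0.2"): (Action.IPSEC, 1)}` and `sad` with `{1: SecurityAssociation("AES-128-CBC", expires_at=...)}`; the addresses and cipher name are placeholders.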

4 BACKGROUND
Performance analysis of the IPSec standard has been an interesting topic of discussion over
the years [9], [15]. The research shares a common conclusion:
the trade-off between the complexity of the IPSec implementation, which is what makes it
flexible enough for a wide range of scenarios, and the overall performance of the protocol is
unfavorable. However, although IPSec may not be perfect, it is considered the best IP security protocol
available when compared to PPTP (Point-to-Point Tunneling Protocol) and L2TP (Layer
2 Tunneling Protocol) [6]. In order to make this trade-off between complexity and performance
acceptable, some notable solutions have been proposed that tackle the issue of the protocol's
performance bottleneck.
4.1 Dedicated Cryptographic Chips
Because cryptographic operations are computationally expensive, moving those operations
to a hardware chip whose sole purpose is to compute them, without interference from other
instructions, is an idea that has attracted a lot of research. The works of Ingrid
Verbauwhede [20] and Holger Sedlak [17] show that this concept has been studied
since the 1980s, the desired solution being the creation of a separate hardware entity that is
responsible for handling the heavy computations of cryptographic algorithms such as RSA and
DES. Later, a more ambitious solution was proposed by Issam Andoni, Pawel Chodowiec,
and Jacek Radzikowski [5], which focused on the algorithms that were present in the IPSec
implementation of the time. The main goals were to create a hardware implementation of
DES, 3DES and AES, as well as the Diffie-Hellman algorithm for key exchange and the more
frequently used hash functions HMAC-MD5 and HMAC-SHA.
A more recent take on this idea is AES-NI [21], an instruction set extension created by Intel and
integrated into its processors. Six new instructions [4] were added to Intel CPUs
in order to provide faster encryption, decryption and key generation
operations for the AES algorithm:
• AESENC. Performs a single round of encryption. Combines four steps of the AES
algorithm – ShiftRows, SubBytes, MixColumns and AddRoundKey – into one instruction
• AESENCLAST. Performs the last round of encryption. Combines three steps of the
AES algorithm – ShiftRows, SubBytes and AddRoundKey – into one instruction
• AESDEC. Performs a single round of decryption. Combines four steps of the AES
algorithm – InvShiftRows, InvSubBytes, InvMixColumns and AddRoundKey – into one
instruction
• AESDECLAST. Performs the last round of decryption. Combines three steps of the
AES algorithm – InvShiftRows, InvSubBytes and AddRoundKey – into one instruction
• AESKEYGENASSIST. Facilitates the generation of the encryption keys used in each
round
• AESIMC. Converts the encryption round keys to a form usable for decryption
This new instruction set achieved great performance enhancements, being two times faster
when executing AES operations in non-parallel modes such as CBC, and ten times faster
in parallelizable modes such as CTR. The parallelization is done using Intel's Hyper-Threading
Technology [11], executing multiple threads on the same core; each thread performs the
required encrypt or decrypt operation on an assigned block of data and generates
the specific round keys that it needs to use.
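The difference between the two kinds of modes comes from data dependencies between blocks, which the following Python sketch makes visible. It uses a keyed hash as a stand-in for the block cipher call, so it is not AES and the CBC function cannot be decrypted; the function names are ours. The thread pool only demonstrates that CTR blocks are independent, it is not meant as a performance measurement.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16

def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher call (NOT AES): a keyed 16-byte function."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, blocks):
    """Each output feeds the next call, so the loop is inherently sequential."""
    out, prev = [], iv
    for block in blocks:
        prev = toy_block(key, xor(block, prev))
        out.append(prev)
    return out

def ctr_block(key, nonce, index, block):
    """Block i depends only on (nonce, i), never on other blocks."""
    counter = nonce + index.to_bytes(8, "big")
    return xor(block, toy_block(key, counter))

def ctr_encrypt_parallel(key, nonce, blocks, workers=4):
    """Process all blocks independently; results keep their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda args: ctr_block(key, nonce, *args),
                             enumerate(blocks)))
```

In `cbc_encrypt` block i cannot start before block i-1 has finished, whereas `ctr_encrypt_parallel` can hand every block to a different worker.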
4.2 Optimization of Software Encryption Functions
Considering that computer systems did not always benefit from encryption hardware as they
do today, software optimization was considered a more viable approach,
easier to deploy across multiple machines. One of the first works on this subject was
written by Ralph C. Merkle [13], who proposed two encryption algorithms, Khufu and Khafre,
and one hashing function, Snefru, as faster alternatives to the preferred cryptographic algorithm of the
time, DES.
Khufu is a block cipher that operates on 64-bit blocks and acts similarly to DES in the sense
that it is a multi-round encryption function that splits the plaintext block into two equal parts. The
optimization that Khufu brings is related to the number of table lookups: DES uses 8 table
lookups per round of encryption, while Khufu benefits from a larger substitution box (S-box) that
requires only one lookup per round. The S-boxes in Khufu are kept secret and precomputed
from a user-supplied key.
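The structure described above, two halves and one key-derived table lookup per round, can be illustrated with a toy Feistel network in Python. This is not Khufu: the S-box derivation through SHA-256, the absence of rotations and key whitening, and the function names are simplifications assumed for this example.

```python
import hashlib

def make_sbox(key: bytes):
    """Derive a 256-entry table of 32-bit words from the key (toy derivation)."""
    return [
        int.from_bytes(hashlib.sha256(key + bytes([i])).digest()[:4], "big")
        for i in range(256)
    ]

def encrypt_block(left, right, sbox, rounds=16):
    """Feistel rounds on two 32-bit halves with a single table lookup per round."""
    for _ in range(rounds):
        left, right = right, left ^ sbox[right & 0xFF]
    return left, right

def decrypt_block(left, right, sbox, rounds=16):
    """Undo the rounds one by one, starting from the last."""
    for _ in range(rounds):
        left, right = right ^ sbox[left & 0xFF], left
    return left, right

sbox = make_sbox(b"user supplied key")     # precomputed once per key
cipher = encrypt_block(0x01234567, 0x89ABCDEF, sbox)
assert decrypt_block(*cipher, sbox) == (0x01234567, 0x89ABCDEF)
```

The table is computed once per key and then reused for every block, which is why the precomputation pays off only when enough data is encrypted under the same key.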
Khafre is similar in design to Khufu, the significant difference being that this algorithm
does not precompute the substitution boxes (S-boxes) but instead uses a predefined set. Because of this,
a key-mixing mechanism must be implemented, since the S-boxes can no longer act as the
cryptographic secret. Khafre rounds are more computationally expensive than those of the
previously mentioned algorithm, and a larger number of iterations is necessary to achieve the
same level of security. The benefits of Khafre are noticeable on small amounts of data, since
precomputing S-boxes in this case would not be optimal.
Unfortunately, the cryptographic weakness of those algorithms for a small number of rounds
leaves them vulnerable to a wide range of attacks based on chosen plaintext, known
plaintext and ciphertext. It is argued that although those optimizations make Khufu and
Khafre faster than DES, they also require more rounds to ensure the same cryptographic
strength, which cancels out the optimization that they aimed to achieve.
Another example based on the work of Merkle is the SEAL algorithm [16], developed by Phillip
Rogaway and Don Coppersmith. In contrast, SEAL is a stream cipher, which allows it to
push the speed gains further. The same principles defined by Khufu also apply to SEAL:
preprocessing the key and table lookups for the algorithm permutations. In addition to those,
a length-increasing pseudorandom function was added to be used in the encryption process
of the stream cipher. The shortcoming of this implementation is the basis on which it was made:
because it is a stream cipher, its keystream must be as long as or even longer than
the plaintext, making it unreliable for encrypting a large amount of data.
4.3 IPSec Engines
In order to overcome the challenges of computational complexity and algorithm scalability that
the IPSec protocol suite poses, dedicated pieces of software and hardware were researched
and developed.
One example of dedicated software is PAC (Parallel Algorithm Core) [7], which was created
to exploit the existing parallelism of the IPSec implementation while also maintaining
scalability. The main goal of this project was to identify the sections of code
that had the potential to be parallelized and then use dedicated pieces of software called
"algorithm pieces", each created for a specific algorithm type, to exploit this property. Multiple
algorithm pieces form an algorithm suite, as we can see in Figure 7, which is managed by
a scheduler responsible for guiding newly arriving IPSec packets to the corresponding
piece based on the requested algorithm and for ordering the processed packets in the
same succession in which they arrived. We can create as many algorithm suites as
needed, being limited only by the hardware resources at our disposal. Each suite
can be composed of different algorithm pieces, each responsible for processing the packets
that use its assigned algorithm. The entire architecture is managed by a
system-level scheduler which directs the newly arrived packets to specific suites that are
capable of serving them, while also achieving load balance among suites.
Figure 7: Pac Suite Structure [7]
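The two duties of the suite scheduler, dispatching by algorithm and restoring arrival order, can be sketched as follows. The sketch is only an analogy written for this text: the `PIECES` table with its string transformations and the `run_suite` function are hypothetical stand-ins and do not reflect how PAC is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "algorithm pieces": one handler per algorithm name.
PIECES = {
    "upper": lambda payload: payload.upper(),
    "reverse": lambda payload: payload[::-1],
}

def run_suite(packets, workers=4):
    """Process (algorithm, payload) pairs in parallel and keep arrival order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Dispatch each packet to the piece that matches its algorithm.
        futures = [pool.submit(PIECES[alg], payload) for alg, payload in packets]
        # Collecting results in submission order restores the arrival sequence,
        # regardless of which piece finishes first.
        return [future.result() for future in futures]

print(run_suite([("upper", "ab"), ("reverse", "cd"), ("upper", "ef")]))
```

Several such suites could run side by side, with an outer dispatcher choosing a suite per packet, which corresponds to the system-level scheduler in the description.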

5 IMPLEMENTATION
For the measurements we used a basic architecture created with two identical Linux machines
running Ubuntu 14.04, one acting as the OpenVPN server and the other as the OpenVPN
client, as illustrated in Figure 8. The data packets were sent bidirectionally to ensure that all
the cryptographic processes were used and that the same amount of data that was encrypted
and sent was also received and decrypted. The hardware configurations are as follows:
• CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
• First NIC: Intel Corporation Ethernet Connection I217-LM
• Second NIC: InfiniBand Mellanox Technologies MT26428
• HDD: ATA Disk WDC WD10EZEX-60M
• Connectors: Mellanox 40Gb QSFP passive copper cable
Figure 8: Network Architecture
The IPSec connections were managed by the Mellanox network cards, since they offer a
higher bandwidth of 40Gb/s and we wanted to eliminate the limitations imposed on
the IPSec performance by the capabilities of the NIC. The Intel Ethernet Connection I217-LM
was kept in order to simulate normal network data traffic that may occur in a standard
IPSec usage scenario, where the machine is still responsible for handling other
IP packets that do not have to pass through the IPSec tunnel itself. We wanted to study
the impact that the kernel interrupts generated by the NIC have on the IPSec cryptographic
performance, and at the same time have a dedicated network card that does not have to take
the responsibility of raising those interrupts.
In our tests we chose the iperf3 [1] tool to generate traffic between the two end
points, increasing the size of the datagram to better understand how the IPSec
implementation responds to the size of the packets it receives. We tested both the
UDP and TCP protocols, with the OpenVPN client and server running IPSec in tunnel
mode. The algorithm set on which we made our measurements includes DES-CBC,
BF-CBC, RC2-64-CBC, AES-128-CBC, AES-192-CBC and AES-256-CBC; for
authentication we chose SHA512.
To generate our results we ran two scripts, one on the server and one on the client, that
generated data using the iperf3 tool for all the cases mentioned above. The end results
were obtained by executing multiple measurements and averaging them. A total of ten
measurements were made for each cipher, starting the OpenVPN application in TCP and UDP
mode one at a time.
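The measurement procedure described above can be sketched roughly as follows. This is a minimal illustration and not the script used in the project: the host address, the `measure` callback (which would start OpenVPN with the given cipher, run iperf3 and return a throughput figure) and the exact block size steps are assumptions. Only the cipher list, the two transport protocols, the 2048-byte cap and the ten runs per case come from the text.

```python
from itertools import product
from statistics import mean

# Cipher names, protocols and the number of runs follow the text.
CIPHERS = ["DES-CBC", "BF-CBC", "RC2-64-CBC",
           "AES-128-CBC", "AES-192-CBC", "AES-256-CBC"]
PROTOCOLS = ["tcp", "udp"]
BLOCK_SIZES = [128, 256, 512, 1024, 2048]  # bytes; the steps are assumed
RUNS = 10


def iperf3_args(host, protocol, block_size):
    """Build an iperf3 client command line for a single run."""
    # -c: client mode, -l: buffer length in bytes, -J: JSON output
    args = ["iperf3", "-c", host, "-l", str(block_size), "-J"]
    if protocol == "udp":
        # -u: UDP mode, -b 0: no target bitrate limit
        args += ["-u", "-b", "0"]
    return args


def average_matrix(measure, host="10.8.0.1"):
    """Average RUNS measurements for every protocol/cipher/block size case.

    `measure(cipher, protocol, args)` is a hypothetical callback that
    returns one throughput sample for the given case.
    """
    results = {}
    for protocol, cipher, size in product(PROTOCOLS, CIPHERS, BLOCK_SIZES):
        args = iperf3_args(host, protocol, size)
        samples = [measure(cipher, protocol, args) for _ in range(RUNS)]
        results[(protocol, cipher, size)] = mean(samples)
    return results
```

Passing the measurement step in as a callback keeps the averaging logic independent of how OpenVPN and iperf3 are actually launched on the test machines.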
We noticed that if the Linux machines had been running for a long period of time, the results
varied significantly. We believe this is caused by the increasing number of swaps the kernel
needs to make to bring memory pages in or out of RAM, which adds substantial overhead given
that the storage device is an HDD. To avoid large variations between our measurements, we
power cycled our test setup before each trial, replicating the same state as closely as possible.

Figure 9: Client Side Script
Figure 9 presents the client side of the script that we used to generate our data. We begin by
changing the client OpenVPN configuration file to expect the desired transport layer protocol.
Afterwards we cycle through all cryptographic algorithms from the ciphers array, connect to
the OpenVPN server and start our measurement: ten iterations for each block size.
Figure 10 shows the symmetric implementation of the server side script. We start the
server after changing the transport protocol and the expected cryptographic cipher in the
configuration file, and wait for the client to finish its measurements. We assume that the
iperf3 server is already running on this machine. The synchronization is done using timed
sleeps, since we can calculate how long the client will need to run for the testing script to
complete. We also allow some margin of error, depending on how fast the OpenVPN process
starts and how much time the client needs to establish a stable connection.

Figure 10: Server Side Script
The server also sends data back to the client to trigger the encryption algorithms of the
OpenVPN implementation. To exercise the behavior of our IPSec setup fully, and to keep the
measurements as close to a real-life scenario as possible, we aimed to cover all
cryptographic operations that may affect the performance. Note that we are not interested
in the results generated by the server script; its sole purpose is to create traffic in the
network. The footprint of those packets and the cost of the cryptographic operations on the
CPU are still reflected in the measurements done by the client.

6 RESULTS
The cipher suites chosen are taken from the OpenVPN documentation [3]; at the moment
OpenVPN does not offer support for other block cipher modes. We limit ourselves to a
maximum block size of 2048 bytes because we noticed that increasing it further does not offer
any noticeable improvement in throughput, as the measurements stabilize. In the following we
show the results and improvements that we obtained through our solutions and discuss why
they work and when they can be applied.
Figure 11: No IPSec
Figure 11 shows the throughput that we obtain on the 40 Gb link without running IPSec. We
use this as a reference for the maximum practical throughput that can be obtained, thus
eliminating the link capacity as a factor that may affect the IPSec performance.
In Figure 12 and Figure 13 we can see how the throughput is affected when running our tests
over the IPSec tunnel. We observe a massive decrease in performance, from 8 Gb/s down
to only 1 Gb/s when using the UDP protocol. We also noticed a large difference in
throughput when using the AES cryptographic algorithm instead of the other block ciphers
such as DES, BF or RC2. This is because our processors support AES-NI, which puts this
particular cryptographic algorithm ahead of the others when it comes to performance.
Figure 12: IPSec (TCP)
Again, we can observe that the throughput peaks when using block sizes of 1024 bytes. This
is explained by the configuration of the network interfaces on which the OpenVPN
client and server are running. The MTU for those interfaces is set to the default value of
1500 bytes, meaning that any packet larger than that value, be it TCP or UDP, must be
fragmented before being sent. This means that if the client sends 2048 bytes of data,
it will actually send two packets containing 1500 bytes and 548 bytes of data respectively.
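The arithmetic behind this example can be written out as a small sketch. It follows the simplified model used in the text, in which the whole MTU is available for data; the function name is hypothetical, and a real IP stack would also subtract header overhead from each fragment.

```python
def split_payload(size, mtu=1500):
    """Split `size` bytes into chunks of at most `mtu` bytes.

    Simplified as in the text: protocol header overhead is ignored.
    """
    if size < 0 or mtu <= 0:
        raise ValueError("size must be >= 0 and mtu must be > 0")
    full, rest = divmod(size, mtu)
    return [mtu] * full + ([rest] if rest else [])


print(split_payload(2048))  # [1500, 548]
print(split_payload(1024))  # [1024]
```

A 1024-byte block fits in a single packet, while a 2048-byte block already needs two, which is consistent with the throughput peak observed at 1024 bytes.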
Figure 13: IPSec (UDP)

6.1 Disable Fragmentation and change MSS
The following two figures, Figure 14 and Figure 15, show how the performance improves
when the OpenVPN application is executed with the --fragmentation 0 parameter. With this
option, the UDP datagrams are no longer fragmented before being sent. The same result can
be achieved for TCP frames by setting the --mss x parameter to a large value. This is possible
because the Mellanox network card has an MTU of 65535 bytes, the maximum size of an IP
packet, which allows us to run our tests using jumbograms and very large TCP frames. An
increase of approximately 13% can be seen for the AES algorithm when running the tests
over the TCP protocol, and a smaller one of 10% for the other cryptographic algorithms. For
the UDP protocol we see a substantial performance increase of 40% when using a block size
of 2048 bytes. We did not continue our measurements past the 2048 block size cap because
the throughput drops when attempting to send bigger packets. The retransmission rate for
the TCP protocol increases, since bigger data frames are more susceptible to corruption, and
the drop rate of the UDP protocol stays above 90%, which explains the loss in efficiency.
Figure 14: IPSec disabled fragmentation (TCP)

Figure 15: IPSec disabled fragmentation (UDP)
6.2 Shorter SA Lifetimes
IKE needs to create a new key for the data channel once the current one expires or a preset
amount of data has been processed with it. These key generation methods usually invoke the
Diffie-Hellman key exchange algorithm, which is computationally expensive, so we can
assume that re-keying has a negative impact on the performance of our IPSec tunnel. To
better understand the influence of this process in IPSec, we simulate a scenario where an SA
expires every five seconds and a new key must be generated. In Figure 16 and Figure 17 we
see the influence that the Diffie-Hellman exponentiations have on the IPSec tunnel, compared
to the basic IPSec measurements. While the decrease in performance is only 5%, we must
note that an OpenVPN server needs to serve multiple clients at once. This means more
IPSec tunnels which, in the worst case, may need re-keying at the same time, throttling the
performance. For this kind of issue we can use a pre-shared secret key for the IKE exchanges,
eliminating the need to generate a new key whenever necessary, and also eliminating
the Diffie-Hellman exchange by deriving the IPSec tunnel keys from the material
of the IKE key.
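To make the cost of re-keying more concrete, the sketch below shows the modular exponentiations that a Diffie-Hellman exchange performs on each side. It is a toy illustration only: the modulus and generator are assumptions chosen for brevity (2^127 - 1 is a Mersenne prime, far too small for real use), and IKE uses standardized groups with much larger parameters, which is where the computational expense comes from.

```python
import secrets

# Toy parameters (assumptions): 2**127 - 1 is prime but much too small
# for any real deployment.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Generate a private exponent and the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)             # first modular exponentiation
    return private, public


def dh_shared(private, peer_public):
    """Derive the shared secret from our private key and the peer's public value."""
    return pow(peer_public, private, P)     # second modular exponentiation


# Each side performs two exponentiations per exchange.
a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()
assert dh_shared(a_priv, b_pub) == dh_shared(b_priv, a_pub)
```

With an SA lifetime of five seconds, every tunnel repeats these exponentiations every five seconds, and a server handling many tunnels repeats them once per tunnel.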

Figure 16: IPSec Rekey at 5 seconds (TCP)
Figure 17: IPSec Rekey at 5 seconds (UDP)

6.3 Linux TCP Optimizations
This set of changes focuses on the optimizations that can be made for the TCP protocol. The
default parameters under which Linux operates are chosen to suit a wide range of scenarios
and do not focus on performance. By changing those default values, we can improve the
communication between peers that use the TCP protocol. By increasing the maximum TCP
read and write buffer sizes, used by the Linux kernel when managing the network packets that
a socket reads or writes, we allow more data to be processed at a time.
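As an illustration of the buffer change, the sketch below generates sysctl.conf-style lines for the Linux keys that control the TCP buffer ceilings. The sysctl key names are standard Linux settings, but the numeric values (a 16 MiB ceiling and the minimum and default sizes) are assumptions and not the values used in our tests; the helper name is hypothetical.

```python
def tcp_buffer_sysctls(max_bytes=16 * 1024 * 1024):
    """Return sysctl.conf-style lines raising the TCP buffer ceilings.

    All numeric values are illustrative assumptions.
    """
    min_bytes, default_bytes = 4096, 87380  # assumed minimum and default sizes
    triple = f"{min_bytes} {default_bytes} {max_bytes}"
    return [
        f"net.core.rmem_max = {max_bytes}",  # ceiling for socket receive buffers
        f"net.core.wmem_max = {max_bytes}",  # ceiling for socket send buffers
        f"net.ipv4.tcp_rmem = {triple}",     # min, default, max for TCP reads
        f"net.ipv4.tcp_wmem = {triple}",     # min, default, max for TCP writes
    ]


for line in tcp_buffer_sysctls():
    print(line)
```

The generated lines could be placed in a sysctl configuration file and loaded with the usual system tools; suitable values depend on the link speed and the memory available on the machine.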
Further improvements can be obtained by enabling SACK (Selective ACK). This option
improves the retransmission of lost or altered TCP frames: the receiver reports which
segments have arrived, so the sender retransmits only the missing segments instead of
everything sent after the first loss. Enabling TCP timestamps can also benefit us. TCP
implements reliable data delivery by retransmitting segments that are not acknowledged
within some retransmission timeout (RTO) interval, and timestamps make the round-trip
time measurements behind that timeout more accurate. Accurate dynamic determination of
an appropriate RTO is essential to TCP performance, so the RTO is determined using an
estimate of the mean and variance of the measured round-trip time of the connection.
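The RTO computation mentioned above can be sketched with the standard estimator from RFC 6298, which keeps a smoothed round-trip time (SRTT) and a round-trip time variation (RTTVAR). The class name is hypothetical and the clock granularity term of the RFC is left out for brevity; the 1 second lower bound is the RFC's recommendation, while real stacks may use a different minimum.

```python
class RtoEstimator:
    """Retransmission timeout estimator in the style of RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # multiplier applied to the variation

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto  # seconds

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return max(self.min_rto, self.srtt + self.K * self.rttvar)
```

A steady stream of identical samples makes the variation term decay, so the RTO converges towards the smoothed RTT (or the configured minimum), while a noisy connection keeps the timeout conservatively high.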
Figure 18: IPSec TCP Optimized

6.4 Adjust InterruptThrottleRate
To understand how this can improve our IPSec performance we must first understand how the
Linux kernel behaves when a new packet of data is presented to it. When a packet is received,
the NIC raises an interrupt and asks the kernel to process that packet before everything
else. Handling a single interrupt takes very little processing time, but the overhead
generated by many interrupts cannot be ignored. Every context switch is costly
and affects the task that was running on the processor when the interrupt occurred. This
is more noticeable when normal traffic is generated outside the IPSec tunnel. There is,
however, a way to limit the number of interrupts that a NIC is allowed to generate every
second; packets that are not processed once this threshold has been reached are
dropped. An obvious side effect is the additional latency on the limited network interface,
since a lot of the data will have to be retransmitted.
This limitation of the interrupts that the NIC can raise is achieved dynamically through NAPI
("New API"), an extension of the network device driver processing framework which achieves
interrupt mitigation by disabling a set of interrupts when high traffic is detected. We can also
set our own threshold by interacting with the NIC driver itself, if it offers support for this.
On both machines we are running the default Linux network driver, e1000e [2], which
allows the user to set the number of interrupts that it will generate to a preferred value, the
default value being 3000 interrupts per second.
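The trade-off introduced by an interrupt limit can be illustrated with a short calculation. The sketch below is a simplified model, not a description of the driver: it assumes packets arrive at a constant rate, and the packet rate used in the example is hypothetical. It shows how lowering the limit spaces interrupts further apart and forces each interrupt to cover more packets.

```python
def interrupt_budget(rate_per_s, packets_per_s):
    """Return (minimum interval between interrupts in microseconds,
    average number of packets each interrupt has to cover).

    Simplified model assuming a constant packet arrival rate.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    interval_us = 1_000_000 / rate_per_s
    packets_per_irq = max(1.0, packets_per_s / rate_per_s)
    return interval_us, packets_per_irq


# Hypothetical load of 30000 packets per second on the limited NIC.
for rate in (3000, 300):
    interval, batch = interrupt_budget(rate, 30_000)
    print(f"{rate} irq/s: {interval:.0f} us between interrupts, {batch:.0f} packets each")
```

Under this model a tenfold reduction of the limit means ten times fewer interruptions of the running task, at the cost of a tenfold longer worst-case wait before a received packet is handled.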
If we want to limit the traffic that does not concern the IPSec tunnel even more, we can
change the sizes of the RX/TX queues of the network driver. This again achieves the same
result as limiting the number of interrupts, but also affects the size of the
packets that can be received by that network device.
Looking at Figure 19 and Figure 20, we see an improvement of only 3% after setting the
interrupt throttle rate to 300 interrupts per second for the NIC that does not manage the
IPSec tunnel. The result is influenced by the amount of data traffic that is received while
the IPSec cryptographic operations are executed. Testing a scenario where the OpenVPN
server is flooded with normal traffic outside the IPSec tunnel, we noticed that the IPSec
performance with no optimization changes drops by 19%, while the test in which we limit
the interrupt throttle rate showed a drop of only 4%.

Figure 19: IPSec adjusted InterruptThrottleRate (TCP)
Figure 20: IPSec adjusted InterruptThrottleRate (UDP)

6.5 Resize send / receive buffers
The send and receive buffers used by the OpenVPN implementation have a default size
of 65536 bytes, which is more than enough from a client's point of view, but an OpenVPN
server that manages a very large number of connections, each with its own IPSec tunnel,
could benefit from a larger buffer.
We can observe in Figure 21 and Figure 22 that the throughput increase that comes with this
change is modest, only 4%, but it does improve the quality of the connection,
since this gain is obtained because the number of retransmitted TCP frames
and lost UDP datagrams has decreased significantly.
Figure 21: IPSec increased send / receive buffers (TCP)

Figure 22: IPSec increased send / receive buffers (UDP)

6.6 Increased receive and send buffer and disable fragmentation
From our observations, increasing the tunnel MTU and disabling fragmentation gave
the IPSec implementation an important throughput gain, but those enhancements were held
back by the large number of TCP frame retransmissions and UDP datagram losses. To
reduce this factor, we also increased the size of rcvbuf and sndbuf so that
a larger number of packets can be sent and received. As can be seen in Figure 23
and Figure 24, this change allows the throughput to increase even more and makes the data
transmission more reliable for both the TCP and UDP protocols.
Figure 23: IPSec increased send / receive buffers (TCP)
Tests that also set the Interrupt Throttle Rate to a smaller value did not change the
connection in any substantial way, but they proved useful when heavy traffic was
observed on the machines. We believe that under certain scenarios limiting the number of
interrupts on the NIC and also reducing the number of keys generated for both the control
and data channels may significantly benefit the performance of the IPSec tunnel.

Figure 24: IPSec increased RX/TX buffer and no fragmentation (UDP)

7 WEAK POINTS AND FUTURE WORK
The optimizations previously proposed and tested require more information about the
network architecture and require all machines in the network to possess certain
processing capabilities. Our future work will therefore concentrate on studying other,
more generally applicable solutions that bring at least the same performance
enhancements as the ones mentioned above.
One important part that we shall focus on in the future is the parallelization potential of
the IPSec cryptographic operations. OpenVPN does not seem to have a major part of its
IPSec implementation exploitable for parallelization, since the cipher sets that it provides
mainly use the Cipher Block Chaining (CBC) block cipher mode, which can only be
parallelized in the decryption operation. We therefore wish to move our work
to another IPSec implementation that gives us more ground on which to achieve
those future goals.
We also consider analyzing the possible benefits of using a dedicated crypto
processor, designed not only for the execution of intensive cryptographic operations, but
created with the sole purpose of speeding up the whole IPSec
protocol suite. While Intel AES-NI does indeed provide remarkable performance, the fact
that it still runs on a CPU core and not on a separate hardware device can put the dedicated
crypto processor ahead as a performance optimization mechanism.
A very important milestone that we want to reach in the future is the implementation of a
scheduler, required by our parallelization of the IPSec cryptographic operations, that is
capable of sending encrypted data to workers and retrieving the plain text together with its
sequence number, so that the message can be reconstructed properly. We believe this to be a
solution that can successfully exploit the parallelization potential present in the IPSec
implementation and significantly increase the performance.
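The scheduler idea can be sketched as follows. This is only an illustration of the dispatch and reordering logic, under stated assumptions: the XOR routine stands in for a real decryption function, the key and function names are hypothetical, and Python threads are used for brevity even though they do not provide true CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

KEY = 0x5A  # toy single-byte key (assumption, not a real cipher)


def toy_decrypt(seq, ciphertext):
    """Stand-in for a real decryption routine; returns (seq, plaintext)."""
    return seq, bytes(b ^ KEY for b in ciphertext)


def schedule(chunks, workers=4):
    """Decrypt chunks on a pool of workers and reassemble them in order."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(toy_decrypt, seq, chunk)
                   for seq, chunk in enumerate(chunks)]
        for future in as_completed(futures):  # completion order is arbitrary
            seq, plain = future.result()
            results[seq] = plain
    # The sequence numbers restore the original order of the message.
    return b"".join(results[i] for i in range(len(chunks)))


message = b"data sent through the tunnel"
encrypted = [bytes(b ^ KEY for b in message[i:i + 4])
             for i in range(0, len(message), 4)]
assert schedule(encrypted) == message
```

Tagging every chunk with its sequence number lets the workers finish in any order, which is the property a real implementation would need in order to keep several cores busy without corrupting the reconstructed message.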

8 CONCLUSIONS
After studying the IPSec protocol suite we found that the performance bottleneck is
indeed the CPU. The implementation of the IPSec protocol suite may not be perfect and
it may be described by many as overly complicated, but it is currently the best
IP security solution. Being overly complicated, however, means that there are many factors
that influence its performance, and therefore many factors that can be optimized.
We provided simple solutions that do not require invasive changes to the IPSec
implementation and offer impressive performance enhancements, depending on the needs and
processing capabilities of the network. We exploited the processor to its maximum, giving
our test subject as much CPU time as possible so that it finishes its
execution faster, stretching the limits of the serial approach.
We believe that from here on we must focus our work on the parallelization capabilities of
the IPSec protocol suite, and improve the throughput of VPN networks so that
security becomes a need that can be satisfied with as small a trade-off as possible.

BIBLIOGRAPHY
[1] Iperf3 documentation. https://iperf.fr/iperf-doc.php. Last accessed: 10 June 2018.
[2] Linux e1000e base driver for Intel Gigabit Ethernet network connections.
https://www.intel.com/content/www/us/en/support/articles/000005480/network-and-i-o/ethernet-products.html.
Last accessed: 10 June 2018.
[3] OpenVPN documentation. https://openvpn.net/index.php/open-source/documentation.html.
Last accessed: 25 June 2018.
[4] Kahraman D. Akdemir, Martin Dixon, Wajdi Feghali, Patrick Fay, Vinodh Gopal, J. Guilford,
Erdinc Ozturk, Gil Wolrich, and Ronen Zohar. White paper: Breakthrough AES
performance with Intel AES New Instructions. 2010.
[5] Issam Andoni, Pawel Chodowiec, and Jacek Radzikowski. Hardware implementation of
IPSec cryptographic transformations. 2001.
[6] Poonam Arora, Prem R Vemuganti, and Praveen Allani. Comparison of VPN protocols -
IPSec, PPTP, and L2TP. 2001.
[7] Dong-Nian Cheng, Yu-Xiang Hu, and Cai-Xia Liu. Parallel algorithm core: A novel IPSec
algorithm engine for both exploiting parallelism and improving scalability. Journal of
Computer Science and Technology, 23(5):792-805, 2008.
[8] Microsoft Corp. What is IPSec? https://technet.microsoft.com/pt-pt/library/cc776369(v=ws.10).aspx.
Last accessed: 26 May 2018.
[9] Niels Ferguson and Bruce Schneier. A cryptographic evaluation of IPSec. Technical
report, Counterpane Internet Security, 2000.
[10] Michael G. Iatrou, Artemios G. Voyiatzis, and Dimitrios N. Serpanos. Network stack
optimization for improved IPSec performance on Linux. International Conference on Security
and Cryptography (SECRYPT 2009), 2009.

[11] Debbie Marr, Frank Binns, D Hill, Glenn Hinton, D Koufaty, et al. Hyper-threading
technology in the NetBurst microarchitecture. 14th Hot Chips, 2002.
[12] Andrew Mason. IPSec overview part four: Internet Key Exchange (IKE).
http://www.ciscopress.com/articles/article.asp?p=25474&seqNum=7. Last accessed: 22 May 2018.
[13] Ralph C Merkle. Fast software encryption functions. In Conference on the Theory and
Application of Cryptography, pages 477-501. Springer, 1990.
[14] Pradosh Kumar Mohapatra and Mohan Dattatreya. IPSec VPN fundamentals?
https://www.eetimes.com/document.asp?doc_id=1275828. Last accessed: 10 May 2018.
[15] Radia Perlman and Charlie Kaufman. Key exchange in IPSec: analysis of IKE. IEEE Internet
Computing, 2000.
[16] Phillip Rogaway and Don Coppersmith. A software-optimized encryption algorithm. In
International Workshop on Fast Software Encryption, pages 56-63. Springer, 1993.
[17] Holger Sedlak. The RSA cryptography processor. In Workshop on the Theory and
Application of Cryptographic Techniques, pages 95-105. Springer, 1987.
[18] Craig A. Shue, Minaxi Gupta, and Steven A. Myers. IPSec: Performance analysis and
enhancements. 2007 IEEE International Conference on Communications, 2007.
[19] Jeongseok Son, Yongqiang Xiong, Kun Tan, Paul Wang, Ze Gan, and Sue Moon.
Protego: Cloud-scale multitenant IPSec gateway. 2017 USENIX Annual Technical Conference
(USENIX ATC 17), 2017.
[20] Ingrid Verbauwhede, Frank Hoornaert, Joos Vandewalle, and Hugo J De Man. Security
and performance optimization of a new DES data encryption chip. IEEE Journal of
Solid-State Circuits, 23(3):647-656, 1988.
[21] Leslie Xu. Securing the enterprise with Intel AES-NI. September 2010.
