Is instance-based vEPC implementation the right approach?

Surveying deployment strategies for virtualizing LTE, we find that most efforts focus on instance-based vEPC implementation, where vEPC NFs (MME, SGW, PGW, etc.) are installed as virtual machine instances. We address two major issues in the context of instance-based vEPC implementation:

(1) Signaling storm during peak hours, and

(2) Timely execution of mission-critical events

Controlling network signaling storm: LTE devices frequently interact with the LTE network to execute their events. These events include Device Attach, Service Request and Release, Handover, Paging, Bearer Activation, Modification and Deactivation, Detach Request, and many others.

Some of these events are executed far more frequently than others: during busy hours, the Handover event is executed at least 50 times more often than Device Attach. To execute one such event, different NF components interact with each other and generate a large number of signaling messages, and some events produce more messages than others. For example, one handover event can generate up to 32 signaling messages, compared to a paging event that produces only 6. When the events from all devices are combined during busy hours, a signaling storm hits the EPC NFs. We are motivated to provide a solution that controls the signaling storm at the LTE core without restricting devices' network access.
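The per-event message counts above make the aggregate load easy to estimate. Below is a minimal back-of-the-envelope sketch; the handover (32) and paging (6) message counts come from the text, while the other counts and all event rates are hypothetical values chosen only for illustration.

```python
# Per-event signaling message counts. Handover and paging counts are from
# the text above; attach and service_request counts are assumed.
MESSAGES_PER_EVENT = {
    "handover": 32,
    "paging": 6,
    "service_request": 10,  # assumed for illustration
    "attach": 20,           # assumed for illustration
}

def signaling_rate(event_rates):
    """Total signaling messages/sec arriving at EPC NFs, given
    per-event-type rates (events/sec across all devices)."""
    return sum(MESSAGES_PER_EVENT[e] * r for e, r in event_rates.items())

# Hypothetical busy hour: handovers ~50x more frequent than attaches.
busy_hour = {"attach": 10, "handover": 500, "paging": 300, "service_request": 200}
print(signaling_rate(busy_hour))  # 10*20 + 500*32 + 300*6 + 200*10 = 20000
```

Even with modest event rates, the handover term dominates the total, which is why the text singles it out as the main driver of the signaling storm.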

Defining mission-critical events: Operationally, mission-critical events are those whose delay or failure impacts ongoing user services such as voice, data, and multimedia. These events are:

• Handover event, which ensures seamless user traffic flow during user mobility

• Paging event, which wakes a device from the idle state when voice/data traffic is pending at the LTE network

• Service Request event, which provides on-demand network resources to a device

Interestingly, these three events make up 50% of all LTE network signaling traffic. We focus on virtualized NFV infrastructure, which consists of commodity servers running Virtualized Network Functions (VNFs) over cloud platforms. In contrast to legacy LTE NF implementation, NFV introduces a number of changes:

1) VNFs may be located multiple hops apart, unlike traditional NFs, which are directly connected. Therefore, during congestion, long queueing delays in switches introduce high latencies.

2) During high data-center utilization, packet loss probability increases, which can adversely affect traffic flows; for instance, the loss of an ACK may cause TCP to perform a timed-out retransmission.

3) Data-center traffic exploits the inherent multi-path nature of data-center networks, resulting in out-of-order packet delivery.

4) The data-center network is designed to meet application deadlines; it provides mechanisms to meet traffic-flow deadlines (e.g., for mice flows) rather than packet-level guarantees.

In short, the data-center network is designed to meet Service Level Agreements (SLAs) by enforcing execution bounds on traffic flows. In LTE, however, service guarantees are made through the timely execution of mission-critical events.

The LTE-NFV framework provides flexibility and network scalability by decomposing the original NFs into multiple VNFs. To match the original capacity of an NF, multiple VNF instances are needed: for example, hundreds (if not thousands) of MME-VNF instances must be started over commodity servers to serve the 10 million subscribers supported by a conventional MME function. These VNF instances are distributed within the data center. Ideally, related EPC VNF instances (e.g., MME, SGW, PGW) are placed within the same rack, which eliminates network delays between two EPC NFs. However, during mobility, a device switches from a source eNodeB to a target eNodeB that is connected to a different MME instance. As a result, the device session migrates from its source MME to the target MME during handover. Thereafter, the new serving MME and the remaining old serving EPC NFs (e.g., SGW and PGW) end up residing in different racks, and network delays start to play an important role in the timely execution of network signaling messages. In practice, it has been observed that the LTE-NFV framework cannot cope with varying delays among different VNF instances, for the following reasons:
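The placement effect described above can be sketched with a simple latency model. All the numbers here are assumed, illustrative values (intra-rack vs. inter-rack round trips), not measurements; the point is only that moving one NF of the chain to another rack inflates end-to-end signaling latency by an order of magnitude.

```python
# Assumed per-hop round-trip times, for illustration only.
SAME_RACK_RTT_MS = 0.05   # co-located VNF pair, via top-of-rack switch
CROSS_RACK_RTT_MS = 1.0   # VNF pair in different racks, congestion-free

def event_latency(hops, cross_rack_hops):
    """Total signaling latency (ms) for an event traversing `hops`
    NF-to-NF exchanges, of which `cross_rack_hops` cross rack boundaries."""
    same = hops - cross_rack_hops
    return same * SAME_RACK_RTT_MS + cross_rack_hops * CROSS_RACK_RTT_MS

# Before handover: all related VNFs co-located in one rack (~0.4 ms).
print(event_latency(hops=8, cross_rack_hops=0))
# After session migration: the new serving MME sits in another rack,
# so several exchanges now cross racks (~5.15 ms).
print(event_latency(hops=8, cross_rack_hops=5))
```

Under congestion the cross-rack term grows further, which is what makes the timer behavior below so fragile.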

Expiry of a timer at any NF may lead to event failure: LTE was designed around a small number of EPC NFs directly connected over dedicated fiber links. Therefore, in a legacy LTE network, the variation in RTT values is negligible.

This motivated LTE network designers to use RTT-based timers between each pair of EPC NFs for two purposes:

(1) path management, and

(2) calculating the message retransmission timer

Path management: All EPC NFs, and the connections between them, must always be active to serve users. To determine that a peer NF is active, NFs exchange echo-request and echo-response messages; this exchange allows for quick path failure detection.
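A minimal sketch of this keepalive logic is below: a peer is declared unreachable after a number of consecutive echo-requests go unanswered. The retry count and timeout are assumed values for illustration; in real deployments they are operator-configurable.

```python
def path_alive(send_echo, timeout_s=3.0, max_retries=3):
    """Return True if the peer NF answers an echo-request in time.

    `send_echo(timeout_s)` is a placeholder for the transport layer: it
    sends one echo-request and returns True iff an echo-response arrives
    within `timeout_s` seconds.
    """
    for _ in range(max_retries):
        if send_echo(timeout_s):
            return True   # peer responded: path is up
    return False          # max_retries consecutive misses: path failure
```

For example, `path_alive(lambda t: False)` models a dead link and returns `False` after three attempts, triggering the path-failure handling described above.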

Retransmission timer: Echo-request and echo-response messages also help in calculating the packet retransmission time at an EPC NF. The retransmission timer is derived from RTT measurements (i.e., the time difference between echo-request and echo-response). Such a timer value can absorb small RTT fluctuations, but it does not account for large RTT variations, because communication delay barely varies between directly connected legacy LTE NFs. A virtualized EPC implementation, however, must handle significant RTT variations. The data-center network's link redundancy provides multiple paths for each distributed pair of NFs, so the echo-request and echo-response packets may traverse two different paths for each RTT calculation. This can cause significant variation between two subsequent RTT readings. To make things worse, data-center congestion can cause RTT spikes of tens of milliseconds, making the EPC retransmission timer calculation even harder. Under normal network load, RTT varies from a few microseconds to 1000 microseconds; this 1000x spread translates into hundreds of different timer values. Unnecessary signaling message retransmissions delay overall event execution, and expiry of the event timer running at the device results in event failure.

Expiration of a timer has a domino effect: For one event execution, multiple EPC NFs are chained such that one NF's output is the input of the next. For example, handover execution exchanges signaling messages among, say, five different NFs (source MME, source SGW, target MME, target SGW, and PGW), and each pair of NFs runs a different retransmission timer value. When one timer expires, it produces a domino effect that expires the preceding NF's timer as well. Even under mild network congestion, the response from the SGW is delayed and the timer at the target MME expires. Because the source MME is waiting for a response from the target MME, the timer at the source MME eventually expires too, and the failure can cascade through all the NFs chained for handover execution. In short, the EPC is by design not only sensitive to network delays but also intolerant of even mild delay variance. It is challenging to provide both constant and small network delays in a virtualized data-center network, where packets may face congestion and take multiple traversal paths, neither of which was the case in the legacy EPC.
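To see why RTT variance breaks timer calculation, consider an RTT-smoothed retransmission timeout in the style of TCP's RFC 6298 (SRTT/RTTVAR). This is a stand-in for illustration only: the EPC's actual timer algorithm is implementation-specific. With dedicated-link-like stable RTTs the timeout stays tight, but a single congestion spike inflates it by more than an order of magnitude.

```python
def rto_series(rtt_samples_ms, alpha=0.125, beta=0.25):
    """RFC 6298-style retransmission timeout: RTO = SRTT + 4 * RTTVAR.
    Returns the RTO after each RTT sample (all values in ms)."""
    srtt, rttvar, out = None, None, []
    for r in rtt_samples_ms:
        if srtt is None:
            srtt, rttvar = r, r / 2          # initialization per RFC 6298
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        out.append(srtt + 4 * rttvar)
    return out

stable = rto_series([0.5, 0.5, 0.5, 0.5])   # dedicated-link-like RTTs
spiky  = rto_series([0.5, 0.5, 20.0, 0.5])  # one congestion spike to 20 ms
print(round(stable[-1], 2), round(spiky[-1], 2))
```

After the spike, the RTO stays inflated for many samples even though the path has recovered; during that window a peer NF either retransmits needlessly or waits far too long, feeding the domino effect described above.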





More articles by Jai Shankar
