How to solve a network bottleneck on OpenStack (using DVR)

I had a chance to migrate some of our services from AWS to dedicated server hosting to cut costs. However, we still wanted to keep AWS-like flexibility. I tried several solutions and finally chose OpenStack, deploying it on the hosted servers with Kolla-Ansible. After several failed attempts, the deployment finally succeeded, and everything seemed to work fine. Even Kubernetes on OpenStack was working well.

Tens of virtual instances were migrated onto OpenStack without any problem. Since I still had plenty of spare resources, I decided to check whether the instances on OpenStack on the hosted servers could handle more traffic.

When we switched routing to send some of the heavy traffic to OpenStack, everything was fine for a couple of minutes, but then the instances stopped responding correctly, even though neither CPU nor memory was fully used.

After tracing the network flow between the host nodes, I figured out that all outgoing traffic to the public network went through a single Neutron L3 router on the control node. In other words, the whole OpenStack cluster's bandwidth was capped by the limit of the control node's single interface.

In a co-location environment this design is usually reasonable, I would say, since the network gateway may have enough bandwidth. However, on server hosting each host has its own bandwidth limit. That limit is big enough for a single host, but when the hosts work as a cluster and all outbound traffic shares one host's uplink, it's an entirely different story.
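To make the bottleneck concrete, here is a minimal back-of-the-envelope sketch. The host count and NIC speed are hypothetical numbers, not measurements from this cluster, and the function only gives an idealized upper bound that ignores protocol overhead and uneven load.

```python
def cluster_egress_gbps(num_hosts, nic_gbps, distributed):
    """Idealized upper bound on total outbound bandwidth of the cluster.

    distributed=False: all public traffic leaves through one node's NIC.
    distributed=True:  each host sends its own public traffic directly.
    """
    if distributed:
        return num_hosts * nic_gbps
    return nic_gbps


# Hypothetical cluster: 10 hosts, each limited to 1 Gbit/s by the provider.
hosts = 10
nic = 1.0

print("centralized router:", cluster_egress_gbps(hosts, nic, distributed=False), "Gbit/s")
print("distributed routing:", cluster_egress_gbps(hosts, nic, distributed=True), "Gbit/s")
```

With a centralized router, adding hosts adds CPU and memory but no outbound bandwidth, which matches the symptom above: the instances stalled while CPU and memory were still free.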

After painful hours of searching, I finally found this guide (https://docs.openstack.org/liberty/networking-guide/scenario-dvr-ovs.html). With DVR I could distribute the traffic across the hosts; all of the instances went back to normal, and everything worked like a charm. In the end, I deployed around 200 instances on OpenStack without any problem.
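For orientation, the sketch below collects the main Neutron options that a DVR setup with Open vSwitch typically involves. It is a summary under assumptions, not the exact configuration used here: the file names vary by release (the agent options may live in ml2_conf.ini or openvswitch_agent.ini), and `render_ini` is a hypothetical helper that just prints the options in INI form. Kolla-Ansible exposes an `enable_neutron_dvr` option in globals.yml for this, so check the documentation for your release before editing files by hand.

```python
# Typical DVR-related options, grouped by config file (file names are
# approximate and depend on the OpenStack release).
DVR_SETTINGS = {
    "neutron.conf": {
        "DEFAULT": {"router_distributed": "True"},
    },
    "ml2_conf.ini": {
        "ml2": {"mechanism_drivers": "openvswitch,l2population"},
    },
    "Open vSwitch agent config": {
        "agent": {
            "l2_population": "True",
            "enable_distributed_routing": "True",
        },
    },
    "l3_agent.ini (compute nodes)": {
        "DEFAULT": {"agent_mode": "dvr"},
    },
    "l3_agent.ini (network/control node)": {
        "DEFAULT": {"agent_mode": "dvr_snat"},
    },
}


def render_ini(sections):
    """Hypothetical helper: format {section: {key: value}} as INI text."""
    lines = []
    for section, options in sections.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            lines.append(f"{key} = {value}")
    return "\n".join(lines)


for filename, sections in DVR_SETTINGS.items():
    print(f"# {filename}")
    print(render_ini(sections))
    print()
```

The split between `dvr` and `dvr_snat` agent modes is what leaves one node handling centralized SNAT while the compute nodes route floating-IP traffic themselves.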

If you have a similar problem, I strongly recommend trying Neutron DVR (Distributed Virtual Router).

Note:

1. With DVR, each host node needs a public IP, which OpenStack (Neutron) allocates. Therefore, you should have enough public IPs available.

2. If an instance on OpenStack does not have a public IP, its outbound traffic still goes through the central public gateway (on the control node).
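The second note can be summarized as a simple decision rule. The sketch below is only a model of that rule, not Neutron code; the function and node names are hypothetical.

```python
def egress_node(instance_host, has_public_ip, central_gateway="control-node"):
    """Return the node through which an instance's outbound traffic leaves.

    With DVR, an instance that has a public IP is routed on its own host;
    an instance without one falls back to the central gateway.
    """
    if has_public_ip:
        return instance_host
    return central_gateway


# Hypothetical instances on two compute hosts.
print(egress_node("compute-01", has_public_ip=True))
print(egress_node("compute-02", has_public_ip=False))
```

So DVR only removes the bottleneck for instances that have their own public IP; heavy traffic from instances without one can still saturate the control node.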

More articles by Minchul Jeong
