Deep Dive: How does the NSX vSwitch Work

Edit: Thanks to Ron Flax for helping me correct some errors.

I have been blessed of late to be involved in some VMware NSX deployments and I am really excited about the technology. I am by no means a master of NSX, but I will post about my understanding as a way to spread information and assist with my personal learning. In this post I will cover only the switching capabilities of NSX.

 

Traditional Switches

The key element of a layer 2 Ethernet switch is the MAC address. This is a (supposedly) unique identifier on a network card; each network adapter should have a unique address. A traditional physical switch learns the MAC addresses connected on each port when the network device first tries to communicate. For example:

Lunch

When you power on the physical Windows server, the physical switch learns that MAC 00:00:00:00:01:01 is connected to port 1. Any message destined for 00:00:00:00:01:01 should be sent to port 1. This allows the switch to create logical connections between ports and limit the amount of wasted traffic. This entry in the switch’s MAC table (sometimes called a CAM table) stays present for 5 minutes (user configurable) and is refreshed whenever the server uses its network card. The Linux server on port 2 is discovered in exactly the same way: it talks on the port and the table is updated for port 2. If Windows wants to talk to Linux, their communication never leaves the switch as long as they are in the same subnet. If the destination MAC address is unknown to the switch, it floods the frame out every other port in the same VLAN and learns the correct port when the destination replies.
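
The learning behavior described above can be sketched as a toy model. This is only an illustration, not how any particular switch is implemented: the class name, the four-port layout and the timestamps are assumptions made up for the example, and it assumes the common behavior of flooding frames whose destination has not been learned yet.

```python
import time


class LearningSwitch:
    """Toy layer 2 switch: learns source MACs per port and ages them out."""

    def __init__(self, ports, aging_seconds=300):
        self.ports = set(ports)
        self.aging_seconds = aging_seconds  # 5 minutes, as in the text
        self.mac_table = {}  # MAC address -> (port, time last seen)

    def receive(self, in_port, src_mac, dst_mac, now=None):
        """Process one frame and return the set of ports it is sent out of."""
        now = time.monotonic() if now is None else now
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = (in_port, now)
        # Forget entries that have not been refreshed within the aging time.
        self.mac_table = {
            mac: (port, seen)
            for mac, (port, seen) in self.mac_table.items()
            if now - seen <= self.aging_seconds
        }
        entry = self.mac_table.get(dst_mac)
        if entry is not None:
            return {entry[0]}  # known destination: one port only
        # Unknown destination: flood everywhere except the ingress port.
        return self.ports - {in_port}


switch = LearningSwitch(ports=[1, 2, 3, 4])
WINDOWS = "00:00:00:00:01:01"
LINUX = "00:00:00:00:02:02"

first = switch.receive(1, WINDOWS, LINUX, now=0)  # Linux not learned yet
reply = switch.receive(2, LINUX, WINDOWS, now=1)  # Windows already learned
later = switch.receive(1, WINDOWS, LINUX, now=2)  # both are known now
print(first, reply, later)
```

The first frame is flooded, while the reply and every later frame go to a single port, which is the "limit the amount of wasted traffic" part.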

Address Resolution Protocol (ARP)

ARP is a protocol used to resolve IP addresses to their MAC addresses. It is critical to understand that ARP does not necessarily return the MAC address of the final destination; it only returns the MAC address of the next hop (which is the destination itself only when it sits on the local subnet). This is because Ethernet is only concerned with the next hop’s MAC address, not the end destination.

Lunch

You can follow the ARP exchanges between each layer of the diagram. The key point is that if the destination IP is not local, the sender ARPs for its default gateway, which returns its own MAC address and forwards the traffic onward.
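
To make the next-hop idea concrete, here is a minimal sketch of the decision a host makes before sending an ARP request. The function name, the sender address 192.168.10.10/24 and the gateway 192.168.10.1 are assumptions chosen for illustration; only 192.168.10.11 appears later in this post.

```python
import ipaddress


def arp_target(src_interface, dst_ip, default_gateway):
    """Return the IP address the sender will ARP for.

    src_interface is the sender's address with prefix length,
    for example "192.168.10.10/24".
    """
    iface = ipaddress.ip_interface(src_interface)
    dst = ipaddress.ip_address(dst_ip)
    if dst in iface.network:
        # Same subnet: resolve the destination's own MAC address.
        return str(dst)
    # Different subnet: resolve the MAC address of the next hop instead.
    return default_gateway


local = arp_target("192.168.10.10/24", "192.168.10.11", "192.168.10.1")
remote = arp_target("192.168.10.10/24", "10.20.30.40", "192.168.10.1")
print(local)   # 192.168.10.11
print(remote)  # 192.168.10.1
```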

Traditional Virtual Switches

In order to understand the NSX vSwitch it is critical that you understand how a traditional virtual switch works. In a traditional virtual switch (VSS and dVS) the switch learns the MAC addresses of virtual machines when they are powered on. As soon as a virtual machine is assigned a switch port, its MAC address becomes hard-coded in the MAC table for that virtual switch. Anything that is local to that switch in the same VLAN or segment will be delivered locally. Otherwise the virtual switch just forwards the message out its uplink and allows the physical switches to resolve the connection.

NSX Virtual Switch

The NSX virtual switch adds functionality to the traditional virtual switch. The key feature is the ability to use VXLAN to span layer 2 segments between hosts without the use of multiple stretched VLANs. VXLAN also allows you to stretch layer 2 to distant datacenters and supports up to 16 million segments versus the current limit of 4096 VLANs. There are some common components that need to be understood:

  • VTEP (VXLAN Tunnel End Point) – this is an ESXi virtual adapter that has its own VLAN and IP address, including a gateway. This interface must be set for 1600 MTU and all physical switches/routers that handle this traffic must allow at least 1600 MTU.
  • NSX virtual switch (also called logical switch) – This is a software, kernel-based construct that does the heavy lifting. It is deployed to a dVS and works as an extension to the dVS.
  • NSX Manager – This is the management plane for NSX; it acts as a central point for communication, scripting and control. As part of the management plane, it is only required when making changes.
  • NSX Control cluster – This is a series of virtual machines that are clustered via software. There should be an odd number of nodes, at least three. Each node contains all required information and load is distributed between all of them. (Best Practice: Create a DRS rule to keep these on separate hosts; future releases may do this for you.)
  • VNI (VXLAN Network Identifier) – this is an identifier used by VXLAN to separate networks (think VLAN tag). In NSX they start at 5000 and go up to roughly 16 million. It is easiest for people to think of VLAN tags when working with VNIs.
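
The numbers in this list can be checked with a little arithmetic. The sketch below assumes the standard header sizes (12-bit VLAN ID, 24-bit VNI, 14-byte Ethernet, 20-byte IPv4, 8-byte UDP, 8-byte VXLAN headers) and an inner MTU of 1500; the variable names are mine.

```python
# Size of the identifier space: 12-bit VLAN ID versus 24-bit VNI.
VLAN_ID_BITS = 12
VNI_BITS = 24
vlan_ids = 2 ** VLAN_ID_BITS
vni_ids = 2 ** VNI_BITS
print(vlan_ids, vni_ids)  # 4096 16777216

# Bytes added to the guest's IP packet before it crosses the transport
# network. The outer Ethernet header is not counted, because MTU is
# measured at the IP layer.
INNER_ETHERNET = 14  # the VM's original Ethernet header travels as payload
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4

inner_mtu = 1500  # assumed guest MTU
required_mtu = inner_mtu + overhead
print(overhead, required_mtu)  # 50 1550
```

A 1600 MTU on the VTEP and physical network therefore leaves some headroom for things like an inner VLAN tag.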

With all the terminology out of the way, it’s time to get down to the path. The NSX virtual switch includes one key component: the ability to switch packets between nodes or clusters without having layer 2 stretched between the clusters. For my networking friends, this means a reduction in spanning tree issues.

So let me lay it out below:

Lunch

We have a three-node NSX control cluster that has been deployed. We have two ESXi hosts running dVSs with the NSX virtual switch. VXLAN has been enabled and a virtual network, VNI:5000, has been created. The VTEPs have been configured. We have created two virtual machines as shown in green. Neither has been connected to the VNI network yet.

 

Time to learn our first MAC:

  • We connect the Windows server to VNI:5000 as shown below
  • The MAC table on our local switch is updated (it learns the MAC), then it passes its learned information to the control cluster
  • The control cluster passes it to all members of the logical switch (there are three methods to pass the information which I will cover in another post unicast, multicast and hybrid)

Lunch

 

This syncing of the MAC table ensures that each member of the VNI knows how to handle switching, creating a distributed switch (like a switch stack, where multiple switches act as one).

When we power on the Linux server the same method is used:

  • We connect the Linux server to VNI:5000 as shown below
  • The MAC table on our local switch is updated (it learns the MAC), then it passes its learned information to the control cluster
  • The control cluster passes it to all members of the logical switch (there are three methods to pass the information which I will cover in another post unicast, multicast and hybrid)
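
The learn-and-sync steps can be sketched as a toy model. This is only my mental model of the flow, not NSX code: the class names, the VTEP IP addresses and the idea of replaying known entries to a newly joined host are assumptions for illustration.

```python
class Controller:
    """Toy control cluster: remembers MAC -> VTEP per VNI and pushes updates."""

    def __init__(self):
        self.members = {}      # VNI -> list of member host switches
        self.mac_to_vtep = {}  # VNI -> {MAC: VTEP IP}

    def join(self, vni, host):
        members = self.members.setdefault(vni, [])
        if host not in members:
            members.append(host)
            # Bring the new member up to date with what is already known.
            for mac, vtep_ip in self.mac_to_vtep.get(vni, {}).items():
                host.install(vni, mac, vtep_ip)

    def report(self, vni, mac, vtep_ip):
        self.mac_to_vtep.setdefault(vni, {})[mac] = vtep_ip
        # Pass the learned entry to every member of the logical switch.
        for host in self.members.get(vni, []):
            host.install(vni, mac, vtep_ip)


class HostSwitch:
    """Toy NSX virtual switch on one ESXi host."""

    def __init__(self, name, vtep_ip, controller):
        self.name = name
        self.vtep_ip = vtep_ip
        self.controller = controller
        self.table = {}  # (VNI, MAC) -> VTEP IP that owns the MAC

    def connect_vm(self, vni, mac):
        # Learn locally, then tell the control cluster about it.
        self.controller.join(vni, self)
        self.controller.report(vni, mac, self.vtep_ip)

    def install(self, vni, mac, vtep_ip):
        self.table[(vni, mac)] = vtep_ip


controller = Controller()
esxi1 = HostSwitch("ESXi1", "10.0.0.1", controller)  # hypothetical VTEP IPs
esxi2 = HostSwitch("ESXi2", "10.0.0.2", controller)

esxi1.connect_vm(5000, "00:00:00:00:01:01")  # Windows server
esxi2.connect_vm(5000, "00:00:00:00:02:02")  # Linux server
print(esxi1.table == esxi2.table)  # both hosts hold the same two entries
```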

Lunch

Now we have an ARP table available on each switch, which works great. Let’s follow the flow of communication. Assume the following: the Windows server wants to open a web page on the Linux server on port 80:

  • User on the Windows server brings up Internet Explorer and types in 192.168.10.11
  • Windows server sends out an ARP request for 192.168.10.11
  • ESXi1’s virtual switch returns the MAC address 00:00:00:00:02:02
  • Windows server sends out an IP packet with the destination MAC address of 00:00:00:00:02:02
  • ESXi1’s virtual switch forwards the packet out VTEP1 by encapsulating it, destined for the IP of VTEP2
  • VTEP2 opens the packet, removes the VXLAN encapsulation and forwards the packet to the ESXi2 virtual switch on VNI:5000
  • The switch on ESXi2 sends the packet to the virtual port that the Linux server’s network card is connected to.
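
The encapsulation and decapsulation steps in this flow can be sketched with the 8-byte VXLAN header layout (a flags byte, reserved bits and a 24-bit VNI). This is a simplified illustration: the function names are mine, the inner frame is a stand-in, and the outer UDP/IP/Ethernet headers that carry the result from VTEP1 to VTEP2 are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prepend an 8-byte VXLAN header to an Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet):
    """Return (vni, inner_frame) from a VXLAN payload."""
    if len(packet) < 8:
        raise ValueError("too short to contain a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8, packet[8:]


# Stand-in inner frame: destination MAC, source MAC, EtherType, payload.
inner = (
    bytes.fromhex("000000000202")    # Linux server
    + bytes.fromhex("000000000101")  # Windows server
    + bytes.fromhex("0800")          # IPv4
    + b"GET / HTTP/1.1"
)

on_the_wire = vxlan_encapsulate(5000, inner)          # done at VTEP1
vni, delivered = vxlan_decapsulate(on_the_wire)       # done at VTEP2
print(vni, delivered == inner)
```

The virtual machines never see the extra header; the frame that reaches the Linux server is byte-for-byte the one the Windows server sent.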

 

This is how an NSX virtual switch handles switching. At first you may say this makes no sense at all… wouldn’t a VLAN just be easier? There are a number of benefits this brings:

  • Limits your Spanning tree to potentially top of rack switches if architected correctly
  • Allows you to expand past the 4096 VLAN limit
  • Opens the door for other NSX services (which I will post about in the future.)

 

As I mentioned, this is just my understanding; I do not have inside knowledge. If I have made a mistake let me know, and I’ll test and then correct it.

Central Ohio VMware Lunch and Learn

I have been toying with the idea of starting a community series of lunch and learn sessions to assist people in learning about VMware technology.   I am happy to announce that the first session will be Sep. 25th at Noon at:

 

OARnet – Bale Conference Room

1224 Kinnear Road

Columbus, OH 43212

 

It was very kind of my previous employer to be willing to host us for these sessions. I am excited to announce that VMware education has also provided some certification discount codes for me to pass out. The format will be a bit loose. I will be focusing on VCP content but it will be open to discussion. I want it to be a forum. I have also invited others to present in the future and hope to make it a monthly occurrence. The topic for this month will be vSphere networking. It will be a great refresher course for anyone looking to study for the VCP-NV. The event is 100% open to the public and we have seating for about 60 people. There is standing room for about 40 more. Bring your lunch and join us. Feel free to contact me via comments or Twitter if you have questions or would like to present a future topic. The one request I have is that this be a technical conversation, not a sales pitch. I want it to be a discussion between technical people.

 

Looking forward to seeing you there.

How do I explain virtualization to my Mother

As I have progressed in my career it’s been increasingly hard to explain my job to people both inside and outside IT. There used to be a time when people in IT understood what I did… at this point most people really don’t understand what I do or why. I have given up explaining it to people; I just say I work with computers. Two years ago while at VMworld the crew from VMware TV stopped me on the street and asked, “How do you explain virtualization to your mother?” They totally stumped me. I am lucky my mother has some technology in her life. She recently got a Nook and has discovered she can get books without leaving the house. For a woman in her 70s she is about as technically savvy as I can expect. My religious studies have taught me that analogies can be a great way to teach. So I present my analogy to explain virtualization.

The Apartment building

Imagine with me that I have just bought a 30,000 square foot housing space. As the owner I could rent out this space to a single four-person family. They would be very happy and have more space than they could ever use. It does present some critical problems. The family would have to be very rich in order to pay for my whole building. There is no way they could possibly use all the space, so there would be lots of wasted space. In my case, if the one family moved out I would have a huge expense that I would have to shoulder until I found another rich family who wanted 30,000 square feet. I have other issues too: unless I was very handy I would have to hire someone to fix and repair the apartment when things broke. This is an expense that is wasted when no one is living in the apartment. The cost of heating, cooling and powering the apartment would be a huge expense that I would pass on to my single family. At this point the power bill alone might force the family to move out, once again leaving me to shoulder the whole bill. In reality, running a 30,000 square foot apartment building with a single tenant is a huge risk. In some neighborhoods it’s totally possible to rent out a space like this to a single family and make a huge profit, either because money is no object or because they have some requirement that offsets the costs (like a home office).

The subdivided apartment

I prefer investments with less risk. After some examination I have discovered that in the neighborhood there is a demand for one, two and three bedroom apartments. Each type of apartment has some common components: a bathroom, a living room and a kitchen. I create three standard configurations and start to subdivide my building into separate living spaces. Some of my living space is lost to overhead like hallways and doors. There are some shared areas which represent a space saver, for example stairs, elevators and laundry rooms. Making some areas shared reduces the space lost to overhead. I may even consider putting in a pool on the roof to increase the rent of my individual apartments and increase my profit. Each of the apartments has its own plumbing with sinks, toilets and showers. Once these components leave your individual space they join the building plumbing and water and utilize shared resources. It’s important that I take into account the total amount of possible shared utilization at the same time to avoid loss of individual services. After all, if everyone flushes their toilet at 5:00 PM I cannot have the pipes get stuck. I have to be careful that the individual actions of a single tenant cannot create a failure for all other tenants. This is one of the key reasons why each apartment has its own water heater: we never want the actions of a single bad neighbor to affect everyone else’s experience.

What does the apartment have to do with virtualization

Virtualization is very much like the apartment. I have a large computer. Most of the time its 30,000 square feet is about 2% utilized. If I engineered the correct solution I could utilize the other 98% of wasted space. Much like humans, my applications don’t like to live in the same space. Virtualization creates separate apartments for each service; these virtual apartments have some shared components and some individual components. For example I may have shared network connections, power, even portions of memory (hallways) and shared storage (laundry room) while I have my own water heater (reservation/allocation of resources). I may have a flash cache on my server (pool on the roof) to improve the amenities and encourage higher rent. All of this is done in a fashion to protect the security of individual families and homes (hypervisor security). Virtualization has to take into account peak usage to avoid having the pipes filled with you know what at 5:00 PM. Much like my apartment, I need to hire systems administrators to provide care and feeding to my virtualization; the more apartments I deploy, the better my cost savings in theory. (Yes, I know there are diminishing returns when I need more workers.)

What does virtualization not have to do with an apartment building

Virtualization brings a few key differences to the table over my apartment building. It is very costly for me to reconfigure my available space into larger or smaller apartments to fulfill demand; virtualization can do this on demand. If my apartment building burns to the ground due to faulty wiring, my families cannot be moved within minutes to another apartment nearby with their furniture and home goods intact. Virtualization can do that.

Key elements

  • Virtualization is like an apartment building created to make efficient use of large wasted space
  • Virtualization has overhead due to shared components but the overhead uses what would be wasted space so it’s a net gain in most situations
  • Virtualization has limits on shared components and should be sized correctly (no full pipes at 5:00PM)
  • Virtualization is better than single homes in almost every way except one: It is still a shared resource and bad neighbors can still make it unlivable

Death of the sysadmin and birth of….

Sysadmins

I started my career as a sysadmin. I didn’t want to spend all day sitting in a chair writing applications. I wanted to touch the hardware. I am a firm believer that every sysadmin is a control freak in some respect. They love how the machine obeys them. They enjoy telling users “no, you cannot” and figuring out ways to limit access. The essence of every good sysadmin is the innate need for improvement. In the early days of my career I was exposed to systems administrators who had hundreds of shell scripts; they had everything automated. As the years passed these older sysadmins seemed to be replaced with younger admins who had been raised in an easy world. They were used to clicking next to install applications and things that just work… (hence the appeal of the iPhone). I am all for simple and easy gadgets and bringing computers into every old person’s life. The relative ease of the solutions has made life a little too easy for us.

Cloud

Then this crazy thing happened… the cloud. Amazon brought the easy button to server deployments. Some embraced the ease of the solution, others liked the agility. Whatever your motivation for using AWS, they have changed IT again. Everywhere I go business units want to know why it takes so long to deploy a server. They want to know how to create their own AWS cloud. People everywhere have been deploying operating systems and getting IT done without systems admins.

The Auto Industry

When the auto industry first started, Henry Ford and his engineers would assemble a car from scratch. Everyone working on the car understood each component and how it worked. They understood the flow of assembly to make the car. Each of the assembly guys could build a car from scratch or design a car. As time passed, demand increased for the product and Ford had to increase his agility to create cars. He hired workers to build cars and assigned them specific roles with rote tasks. These workers would do the same task over and over again. This provided a few advantages: they got good at the task and they did not need to know how to build a whole car. It also introduced some challenges: if they missed their task due to human error, failures were introduced. Eventually human workers were replaced with robotics. This reduced the errors and increased the cost. It also allowed Ford to build a lot more cars. Not all the jobs went away; they just changed. Workers were replaced with robotics and automation engineers. The people working on the cars had no idea how to build cars; they just kept the robotics working. Every other car manufacturer followed suit to compete. Auto manufacturing plants became huge, and downtime cost millions of dollars. Massive amounts of money are spent to ensure the plants keep running.

What lessons can we learn from Auto Industry

  • Having highly skilled humans build the cars worked great
  • Having architects design cars then hand off work instructions to workers introduced a lot of errors
  • Having automation reduces errors and requires workers with a new skill set
  • It is not required that the people keeping the automation running have an understanding of the product, they just need to understand the automation
  • The cost of automation will force a centralization of building cars
  • As manufacturing became centralized downtime became a critical issue

What does this have to do with sysadmins?

Thanks for bearing with me this far. If you’re still reading and wondering why I wrote this article, let me explain. I want to suggest that the world is changing for systems administrators and, as control freaks, they don’t really like it. AWS has created a gold standard: “We can deploy the system in minutes, why can’t you?” If you have not faced this question you will soon. Every shop wants to have AWS.

Every shop wants to have AWS but do they need it?

AWS has a very specific business model. Deploy base templates for customers, then step away and collect cash. It’s a great model. Functionality of the virtual product beyond being powered on is 100% your problem. AWS ensures uptime of power and networking. You still have to do a lot of work to make that server deployed in minutes usable. AWS has saved you the time of procuring hardware and working with siloed teams to get a server in place, but you don’t have a money-making machine until you install your product. What is it about AWS that you really need? I suggest it is not agility; instead it’s less hassle. AWS provides you freedom from people who seem to create never-ending road blocks while doing their job. Yes, I am looking at you, security team. Yes, I am looking at you, server deployment team. Yes, I am looking at you…

Why is IT so hard

IT is hard because it’s never the same. In my career I have rarely seen the same request twice. If your business is Netflix and you have three types of servers then automation makes sense. Most IT shops are not Netflix; every single business unit wants to drive IT choices and so we get a spaghetti mess of IT. IT is hard because the business unit wants to drive technical choices instead of business requirements. Many years ago we had a business unit demand that their new workflow be built in SharePoint, forgetting the fact that we were a Linux shop with no SharePoint. So the lesson is:

  • Business units, stop messing with IT. Bring your well-defined needs to IT and let us implement them. Trust us to do our job; it’s why we cost so much.

Why do Menus exist

Restaurants have menus for the following reasons:

  • To limit customers’ options – they could not possibly have all ingredients
  • To help customers make choices – if left to their own devices customers would become confused by the options and leave
  • Create standard workflows and realize cost savings
  • Give the customers the illusion of choice

Why doesn’t IT have a menu? Here it comes: the ITIL service catalog. Most service catalogs are too technical and don’t represent what the customer really needs. What the customer needs is a service, which is normally a lot more complex than a single server. They have a project.

Project

Yep, that word again, project… it’s so important we have a certification and a role to manage it. Business units rarely want one more Netflix streaming server; they expect IT to handle that if needed. They want to create a whole new business and that requires a project. Our menu really needs to be a project menu, not a server menu. We need to stop offering the business unit separate components of our offering or they will keep getting into our business. We need to provide the business unit with the correct choices that keep them away from dictating technology.

Death of a sysadmin … birth of a ..process engineer

So now that I have ranted for too long, what is the future of systems administration? I think we need to become process engineers. Very few people are going to understand the whole product. More will administrate from an automation console rather than logging into a server. How do we re-tool for this change? I have a few suggestions:

  • Learn to examine process. Do something manually first. Document the process in extreme detail, using a process diagram. Look critically at your process diagram. Do you see how many manual processes you have? How can you automate them?
  • Help customers standardize and learn their language. Stop jumping to technical solutions with your customers. Focus on their needs and requirements and allow the technology to be a black box.
  • Develop standard methods for documenting and ingesting new projects… create a documented process and follow it.
  • Automate everything you can and develop solutions with the automation mindset. How would I do this if I had to deploy 100 servers instead of two?
  • Ask yourself: does this process, technology or choice scale up? If I had to increase the amount of these by 1,000, would this process work?

Well thanks for reading my rant.  Let me know where I am wrong.

Design Scenario: Gigabit network and iSCSI ESXi 5.x

Many months ago I posted some design tips on the VMware forums (I am Gortee there if you are wondering). Today a user updated the thread with a new scenario looking for some advice. While it would be a bad idea personally and professionally for me to give specific advice without a design engagement, I thought I might provide some thoughts about the scenario here. This will allow me to justify some design choices I might make in the situation. In no way should this be taken as law. In reality everyone’s situation is different and little requirements can really change the design. The original post is here.

The scenario provided was the following:

3 ESXI hosts (2xDell R620,1xDell R720) each with 3×4 port NICS (12 ports total), 64GB RAM. (Wish I would have put more on them ;-))

1 Dell MD3200i iSCSI disk array with 12 x 450GB SAS 15K Drives (11+1 Spare) w/2 4 port GB Ethernet Ports

2 x Dell 5424 switches dedicated for traffic between the MD3200i and the 3 Hosts

Each host is connected to the iSCSI network though 4 dedicated NIC Ports across two different cards

Each Host has 1 dedicated VMotion Nic Port connected to its own VLAN connected to a stacked N3048 Dell Layer 3 switch

Each Host will have 2 dedicated (active\standby) Nic ports (2 different NIC Cards) for management

Each Hosts will have a dedicated NIC for backup traffic (Has its own Layer 3 dedicated network/switch)

Each host will use the remaining 4 Nic Ports (two different NIC cards) for the production/VM traffic)

 would you be so kind to give me some recommendations based on our environment?

Requirements

  • Support 150 virtual machines
  • Do not interrupt systems during the design changes

Constraints

  • Cannot buy new hardware
  • Not all traffic is vlan segmented
  • Lots of 1GB ports per server

Assumptions

  • Standard Switches only (Assumed by me)
  • Software iSCSI is in use (Assumed again by me)
  • Not using Enterprise plus licenses

 

Storage

Dell MD3200i iSCSI disk array with 12 x 450GB SAS 15K Drives (11+1 Spare) w/2 4 port GB Ethernet Ports

2 x Dell 5424 switches dedicated for traffic between the MD3200i and the 3 Hosts

Each host is connected to the iSCSI network though 4 dedicated NIC Ports across two different cards

I personally have never used this array model; the vendor should be included in the design to make sure my suggestions here are valid for this storage system. Looking at the VMware HCL we learn the following:

  • Only supported on ESXi 4.1 U1 through 5.5 (no 5.5 U1 yet so don’t update)
  • You should be using VMW_PSP_RR (Round Robin) for path failover
  • The array supports the following VAAI primitives: Block Zero, Full Copy, HW Assisted Locking

The following suggestions should apply to physical cabling:

Storage

Looking at the diagram I made the following design choices:

  • From my limited understanding of the array, the cabling follows the best practice guide I could find.
  • Connections from the ESXi hosts to the switches are made to create as much redundancy as possible, including all available cards. It is critical that the storage be as redundant as possible.
  • Each uplink (physical NIC) should be configured to connect to an individual vmkernel port group. Each port group should be configured with only one uplink.
  • Physical switches and port groups should be configured to use native port, assuming these switches don’t do anything other than provide storage traffic between these four devices (three ESXi hosts and one array). If the array and switches are providing storage to more things, you should follow your vendor’s best practices for segmenting traffic.
  • Port binding for iSCSI should be configured as per VMware document and vendor documents

New design considerations from storage:

  • Four 1Gb links per host will be used to represent the maximum traffic the system will provide
  • The array does not support 5.5 U1 yet so don’t upgrade
  • We have some VAAI primitives to help speed up processes and avoid SCSI locks
  • Software iSCSI requires that forged transmissions be allowed on the switch
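
To put the four 1Gb links in perspective, here is a rough back-of-the-envelope calculation. It is a sketch under loud assumptions: the 150 virtual machines are spread evenly over the three hosts, every VM is busy at the same moment, and protocol overhead is ignored, so real numbers will be lower.

```python
ISCSI_LINKS_PER_HOST = 4
LINK_SPEED_GBIT = 1  # each link is 1 gigabit per second
HOSTS = 3
VIRTUAL_MACHINES = 150

# 1 Gbit/s is 1000 Mbit/s; divide by 8 to convert bits to bytes.
host_storage_mb_per_s = ISCSI_LINKS_PER_HOST * LINK_SPEED_GBIT * 1000 / 8

# Assumption: VMs are spread evenly and all demand storage at once.
vms_per_host = VIRTUAL_MACHINES / HOSTS
worst_case_mb_per_vm = host_storage_mb_per_s / vms_per_host

print(host_storage_mb_per_s)  # 500.0
print(vms_per_host)           # 50.0
print(worst_case_mb_per_vm)   # 10.0
```

Ten megabytes per second per VM at a simultaneous peak is not much, which is why finding the actual bottleneck matters before changing anything.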

Advice to speed up iSCSI storage

  • Find your bottleneck – is it switch speed, array processors, or ESXi software iSCSI? Then solve it.
  • You might want to consider Storage DRS on your array to automatically balance load and IO metrics (requires an Enterprise Plus license but saves so much time) – it also has an impact on CBT backups, making them do a full backup.
  • Hardware iSCSI adapters might also be worth the time… though they have little real benefit in the 5.x generation of ESXi

 

Networking

We will assume that we now have 8 total 1GB ports available on each host.   We have a current network architecture that looks like this (avoided the question of how many virtual switches):

network

I may have made mistakes in my reading, but a few items pop out to me:

  • vMotion does not have any redundancy, which means if that card fails we will have to power off VMs to move them to another host.
  • Backup also does not have redundancy, which is less of an issue than the vMotion network
  • Not all traffic has redundant switches, creating single points of failure

A few assumptions have to be made:

  • No single virtual machine will require more than 1Gb of traffic at any time (otherwise we have to be looking into LACP or etherchannel solutions).
  • Management traffic, vMotion and virtual machine traffic can live on the same switches as long as they are segmented with VLANs

 

Recommended design:

Drawing1

  • Combine the management switch and VM traffic switch into dual function switches to provide both types of traffic.
  • This uses VLAN tags to include vMotion and management traffic on the same two uplinks, providing card redundancy (configured active/passive). It could also be configured with multi-NIC vMotion, but I would avoid that due to the complexity around management network starvation in your situation.
  • Backup continues to have its own two adapters to avoid contention

This does require some careful planning and may not be the best possible use of links.   I am not sure you need 6 links for your VM traffic but it cannot hurt.

 

Final Thoughts:

Is any design perfect?  Nope lots of room for error and unknowns.  Look at the design and let me know what I missed.  Tell me how you would have done it differently… share so we can both learn.  Either way I hope it helps.

Deep Dive: Network Health check

vSphere 5.1 introduced one of my favorite new features: network health check. This feature is designed to identify problems with MTU and VLAN settings. It is easy enough to set up MTU and VLANs in ESXi, especially with a dVS. In most environments the vSphere admins don’t control the physical switches, making confirmation of upstream configuration hard. The health check resolves these issues. It is only available on dVS switches and only via the web client. (I know, time to start using that web client… your magical fat client is going away.) If you have an upstream issue with MTU then you will get an alert in vCenter. You can find the health check by selecting the dVS and clicking on the Manage tab. In the middle pane you will see Health check, which you can edit and enable. You came here because you want to know how it works.

 

MTU

The MTU check is easy. Each system sends out a ping message to the other nodes. This ping message has a special header flag that tells the network not to fragment (split) the packet. In addition it has a payload (empty data) to make the ping the size of the max MTU. If the host gets a return message from the ping it knows the MTU is correct. If it fails then we know the MTU is bad. Each node checks its MTU at an interval. You can manually check your MTU with vmkping, but the syntax has changed between 5.0, 5.1 and 5.5 so look up the latest syntax.
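
The sizing behind such a ping can be sketched as follows. This assumes an IPv4 ping with a 20-byte IP header and an 8-byte ICMP header; the function names and the example path are made up for illustration and this is not the health check’s actual code.

```python
IPV4_HEADER = 20
ICMP_HEADER = 8


def ping_payload_for_mtu(mtu):
    """Largest ICMP payload that still fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_succeeds(payload_size, path_mtus):
    """A don't-fragment ping only gets through if every hop can carry it."""
    packet_size = payload_size + IPV4_HEADER + ICMP_HEADER
    return packet_size <= min(path_mtus)


print(ping_payload_for_mtu(1500))  # 1472
print(ping_payload_for_mtu(9000))  # 8972

# Host and far end are set to 9000, but one upstream switch is still at 1500.
jumbo_payload = ping_payload_for_mtu(9000)
print(df_ping_succeeds(jumbo_payload, [9000, 9000, 9000]))  # True
print(df_ping_succeeds(jumbo_payload, [9000, 1500, 9000]))  # False
```

The second case is exactly the upstream misconfiguration the health check is meant to catch.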

 

VLAN

Checking the VLAN is a little more complex. Each VLAN has to be checked. So one host on the same vDS (I am not sure which one, but I am willing to bet it’s the master node) sends out a broadcast layer 2 packet on the VLAN. Then it waits for each node to reply to the broadcast via a unicast layer 2 packet. You can determine which hosts have VLAN issues based upon who reports back. I assume that a host marked as bad then tries to broadcast as a method to identify failed configuration or partitions. This test is repeated on each VLAN and at regular intervals. It only works when two peers can connect.
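
The broadcast-and-reply idea can be sketched as a small simulation. It follows my description above rather than any documented algorithm: the function name, host names and trunk settings are all hypothetical, and a host is simply assumed to answer when its upstream port carries the VLAN.

```python
def check_vlans(vlans, trunked_vlans_by_host, sender):
    """Return {vlan: [hosts that never answered the sender's broadcast]}."""
    failures = {}
    others = sorted(host for host in trunked_vlans_by_host if host != sender)
    for vlan in vlans:
        if vlan not in trunked_vlans_by_host[sender]:
            # The broadcast never leaves the sender, so nobody can answer.
            failures[vlan] = others
            continue
        # A peer replies via unicast only if its upstream port carries the VLAN.
        silent = [h for h in others if vlan not in trunked_vlans_by_host[h]]
        if silent:
            failures[vlan] = silent
    return failures


# Hypothetical physical switch configuration per host uplink.
trunks = {
    "esxi1": {10, 20, 30},
    "esxi2": {10, 20},  # VLAN 30 missing upstream
    "esxi3": {10, 30},  # VLAN 20 missing upstream
}

result = check_vlans([10, 20, 30], trunks, sender="esxi1")
print(result)  # {20: ['esxi3'], 30: ['esxi2']}
```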

Teaming policy

In ESXi 5.5 they added a check of the teaming policy against the physical switch. This check identifies mismatches between IP Hash teaming and switches that are not configured for etherchannel/LACP.

 

Negative Effect of Health check

So why should I not use health check? Well, it does produce some traffic. It does require you to use the web client to enable it and to determine which VLANs are bad… otherwise I cannot figure out a reason not to use it. It is a simple and easy way to determine issues.

Design Advice on health check

Health check is a proactive way to find upstream VLAN or MTU issues before you deploy production to that VLAN.  It saves a ton of time otherwise spent troubleshooting and fighting between networking and server teams.  I really cannot see a reason not to use it.    I have not tested the required bandwidth but it cannot be huge.   My two cents: turn it on if you have a vDS... if you don’t have a vDS I hope you only have ten or fewer VLANs.

Deep Dive: vSphere Traffic Shaping

Traffic shaping is all about the bad actor scenario.  We have hundreds of virtual machines that all get along with each other.  The application team deploys an appliance that goes nuts and starts to use 100% of its link.  Suddenly you get a call about database and website outages.  How do you deal with the application team’s bad actor?  This is the most common reason why every apartment has its own water heater.   My wife would be very unhappy if she could not take her hot shower in the morning because Bob upstairs took an extra long shower an hour ago.   Sharing resources is great as long as resources are unlimited, not over provisioned, or usage patterns stay static.  In the real world none of those things are true.  You are likely limited on resources, over provisioned, and your traffic patterns change every single day.   Limits allow us to put constraints on portions of resources in order to control bad actors.

Limits (available on any type of switch)

Limits are, as expected, limits that a machine cannot cross.  This allows a machine to see a 10Gb uplink but only use 1Gb at most.  The slowdown is injected into the communication stream via normal protocol methods.   Limit settings in VMware can be applied on the port group of a standard switch, or on a dvPort or dvPort group.  Notice the difference: on dVS switches we can apply limits on individual ports as well as port groups.  Limits on standard switches apply to outbound traffic only, while on a dVS they can be applied inbound and outbound.  There are three options on limits:

  • Average bandwidth – Average number of bits per second to allow across the port.
  • Peak bandwidth – Max bits per second to allow across the port when it is using its burst; this caps the bandwidth the port can consume while bursting.
  • Burst size – Max number of bytes to allow in a burst.  This is the amount of data allocated to bursting when the port needs more than its average.  It can be viewed as a bank: when you don’t use all your average bandwidth it is stored, up to the burst size, to be used when needed.
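The bank analogy maps nicely onto a token bucket. Below is a minimal sketch of how the three settings could interact; it is a generic token bucket written for illustration, not VMware's actual shaper, and the function name, the one second interval and the tiny numbers are all assumptions.

```python
def shape(demand_bits, average_bps, peak_bps, burst_bytes, dt=1.0):
    """Return the bits actually sent in each interval of length dt seconds."""
    bank_capacity = burst_bytes * 8   # burst size is in bytes, bandwidth in bits
    banked = 0.0                      # unused average bandwidth saved for later
    sent = []
    for wanted in demand_bits:
        budget = average_bps * dt + banked           # this interval's allowance plus savings
        allowed = min(wanted, budget, peak_bps * dt) # never exceed the peak rate
        banked = min(bank_capacity, budget - allowed)
        sent.append(allowed)
    return sent


# Average 100 bits/s, peak 300 bits/s, burst size 50 bytes (400 bits).
# The port idles for two seconds, then tries to push 500 bits every second.
print(shape([0, 0, 500, 500, 500], average_bps=100, peak_bps=300, burst_bytes=50))
```

In this run the port banks 200 bits while idle, bursts to the 300 bit peak for one interval, and then falls back to the 100 bit average once the bank is empty.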

 

Limits of the Limits

Limits produce some, well... limits.   Limits are always enforced, meaning that even if bandwidth is available it will not be allocated to the port group or port.  Limits on a VSS are outbound only, meaning you can still flood a switch.  Limits are not reservations.  Machines without limits can consume all available resources on a system.  So effectively limits are only useful to stop a bad actor from hurting everyone else.  They are not a sharing method.  Limits on the network do have their place but I would avoid general use if possible.

 

Network IO Control a better choice

Network IO Control (NIOC) is available only on the vDS.  It provides a solution to the bad actor problem while providing flexibility.  NIOC is applied to outbound traffic.  NIOC works very much like resource pools for compute and memory.  You set up a NIOC share (resource pool) with a number between 1 and 100.   vSphere comes with some system defined NIOC shares like vMotion and management.  You can also define new resource pools and assign them to port groups.  NIOC only comes into play during times of contention on the uplink.  All NIOC shares are calculated on an uplink by uplink basis.  The shares of all the traffic types active on the uplink are added together.  For example, assume my uplink has the following shares:

  • Management 10
  • vMotion 20
  • iSCSI 40
  • Virtual machines 50

If contention arises and only management, iSCSI and virtual machines are active we would have 100 total shares (10 + 40 + 50).  This number is then used to divide the total available bandwidth on that uplink.  Let’s assume we have a 10Gb uplink.  Each active traffic type would then get, based on shares:

  • Management 1Gb
  • iSCSI 4Gb
  • Virtual machines 5Gb
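The share math above is just a proportional split over whichever traffic types are active. A minimal sketch using the example shares from this post (the function name is mine, and this models only the arithmetic, not the NIOC scheduler itself):

```python
def split_by_shares(shares, active, uplink_gbps):
    """Divide the uplink among the active traffic types in proportion to their shares."""
    total_shares = sum(shares[name] for name in active)
    return {name: uplink_gbps * shares[name] / total_shares for name in active}


shares = {"Management": 10, "vMotion": 20, "iSCSI": 40, "Virtual machines": 50}

# vMotion is idle, so only 100 shares are in play on the 10Gb uplink.
print(split_by_shares(shares, ["Management", "iSCSI", "Virtual machines"], 10))
# {'Management': 1.0, 'iSCSI': 4.0, 'Virtual machines': 5.0}
```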

This example also assumes each traffic type is using 100% of its allocation.  If management is only using 100Mb, the others get its leftover amount divided by their share amounts (in this case 900Mb / 90 shares = 10Mb per share, so an extra 400Mb goes to iSCSI and an extra 500Mb to virtual machines).   If a new traffic type comes into play then the shares are recalculated to meet the demand.   This allows you to plan for worst case scenarios and guarantee a minimum to each traffic type.  For example, with all four types active (120 total shares) on the 10Gb uplink:

  • Management will get at least 0.83Gb
  • vMotion will get at least 1.67Gb
  • iSCSI will get at least 3.33Gb
  • Virtual machines will get at least 4.17Gb
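The hand back of unused bandwidth described above (management only using 100Mb) can be sketched as a share weighted fill. This is my own illustration of the arithmetic, with a hypothetical function name and demands expressed in Mb; I am not claiming this is how the NIOC scheduler is written.

```python
def allocate(shares, demand, capacity):
    """Share-weighted split where bandwidth a type does not need goes back to the rest."""
    alloc = {}
    pending = dict(demand)      # traffic types still waiting for an allocation
    remaining = capacity
    while pending:
        per_share = remaining / sum(shares[name] for name in pending)
        # Types that want less than their fair slice simply get what they ask for.
        satisfied = [name for name, want in pending.items()
                     if want <= per_share * shares[name]]
        if not satisfied:
            # Everyone left wants more than their slice: split by shares and stop.
            for name in pending:
                alloc[name] = per_share * shares[name]
            break
        for name in satisfied:
            alloc[name] = pending.pop(name)
            remaining -= alloc[name]
    return alloc


shares = {"Management": 10, "iSCSI": 40, "Virtual machines": 50}
demand = {"Management": 100, "iSCSI": 10000, "Virtual machines": 10000}  # Mb
print(allocate(shares, demand, capacity=10000))
# {'Management': 100, 'iSCSI': 4400.0, 'Virtual machines': 5500.0}
```

Management keeps its 100Mb, and the 900Mb it left on the table is split 40/50 between iSCSI and virtual machines on top of their original 4Gb and 5Gb.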

There is one wrinkle to this plan with multi-nic vMotion but I will address that in another post.

 

Design Choices

Limits have their uses.  They are hard to manage and really hard to diagnose... Imagine coming into a vSphere environment where limits are in place but you did not know.   It could take a week to figure out what was causing the issues.   My vote: use them sparingly.   NIOC on the other hand should be used in almost every environment with Enterprise Plus licenses.   It really has no drawback and provides control over traffic.