NetApp ONTAP Network Flow Control

To flow control or not to flow control, that is the question. The answer is, it depends, but what does it depend on?

1. First, what is flow control at a high level? Technical Report TR-4182 does a great job of explaining the mechanism.


From page 30, section 6.1, “Ethernet Flow Control”:

“Ethernet flow control is a layer 2 network mechanism that is used to manage the rate of data transmission between two endpoints. It provides a mechanism for one network node to control the transmission speed of another so that the receiving node is not overwhelmed with data.”
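To make the quoted description concrete, here is a deliberately simplified Python model of a receiver pausing a faster sender. It is a sketch under stated assumptions, not how any NIC or switch implements it: real IEEE 802.3x PAUSE frames carry a pause duration, whereas this model treats pausing as simple on/off, and the buffer size, rates, and high/low watermarks are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    delivered: int     # frames the receiver processed
    dropped: int       # frames lost because the receive buffer was full
    paused_ticks: int  # ticks during which the sender was held off


def simulate(ticks, send_rate, drain_rate, buffer_size,
             flow_control, high_mark, low_mark):
    """Simplified on/off model of link-level flow control (hypothetical values)."""
    buffered = delivered = dropped = paused_ticks = 0
    paused = False
    for _ in range(ticks):
        if flow_control:
            # Receiver asks the sender to pause above the high watermark
            # and lets it resume once the buffer drains to the low watermark.
            if buffered >= high_mark:
                paused = True
            elif buffered <= low_mark:
                paused = False
        if paused:
            paused_ticks += 1
        else:
            # Sender transmits; anything that does not fit in the buffer is dropped.
            accepted = min(send_rate, buffer_size - buffered)
            buffered += accepted
            dropped += send_rate - accepted
        # Receiver processes up to drain_rate frames per tick.
        drained = min(buffered, drain_rate)
        buffered -= drained
        delivered += drained
    return Result(delivered, dropped, paused_ticks)


with_fc = simulate(100, 10, 6, 40, True, 30, 10)
without_fc = simulate(100, 10, 6, 40, False, 30, 10)
print(with_fc)
print(without_fc)
```

With these numbers (the sender offers 10 frames per tick, the receiver drains 6), both runs deliver the same number of frames, but only the run without flow control drops frames once the buffer fills; with flow control the sender sits idle instead.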

2. Next, what are the ONTAP flow control defaults?

By default, ONTAP disables flow control on cluster interconnect ports and enables it on data ports. Flow control recommendations have changed many times over the years. Knowledge base article 1002403 provides the best practice: match the setting end-to-end on data ports.

3. What are the flow control best practices for 10GbE?

https://kb.netapp.com/app/answers/answer_view/a_id/1002403

  • Disable flow control on cluster network ports in the Data ONTAP cluster.  This is the default, so do not change this.
  • Flowcontrol on the remaining network ports (the ports that provide data, management, and intercluster connectivity) should be configured to match the settings within the rest of your environment.
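The “match end-to-end” guidance can be expressed as a small audit. This Python sketch is hypothetical: it assumes you have already collected each storage port's flowcontrol-admin value and translated the switch team's settings into the same none/receive/send/full vocabulary, and the port names are made up. It interprets “matching” directionally: a link passes when each side's send setting agrees with the other side's receive setting, so full/full, none/none, and send/receive all pass, while full/none does not.

```python
# Map each flow control value to (sends PAUSE frames, honors received PAUSE frames).
DIRECTIONS = {
    "none": (False, False),
    "receive": (False, True),
    "send": (True, False),
    "full": (True, True),
}


def link_matches(storage, switch):
    """True when each side's send direction agrees with the peer's receive direction."""
    s_send, s_recv = DIRECTIONS[storage]
    w_send, w_recv = DIRECTIONS[switch]
    return s_send == w_recv and s_recv == w_send


def audit(ports):
    """ports: list of (name, role, storage_setting, switch_setting) tuples.

    role is "cluster" for cluster interconnect ports; anything else is treated
    as a data, management, or intercluster port that must match its switch port.
    """
    findings = []
    for name, role, storage, switch in ports:
        if role == "cluster":
            if storage != "none":
                findings.append(f"{name}: cluster port should stay 'none' (the ONTAP default)")
        elif not link_matches(storage, switch):
            findings.append(f"{name}: storage '{storage}' does not match switch '{switch}'")
    return findings


# Hypothetical inventory gathered from the storage, network, and server teams.
report = audit([
    ("node1:e0a", "cluster", "none", "none"),
    ("node1:e0c", "data", "full", "full"),
    ("node1:e0d", "data", "full", "none"),
    ("node1:e0e", "data", "send", "receive"),
    ("node1:e0f", "cluster", "full", "full"),
])
for line in report:
    print(line)
```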

4. ONTAP output explained

When flow control is disabled on a physical port, any interface group (ifgrp) or VLAN built on it still shows flow control as enabled (“full”), regardless of the underlying port setting. The setting that actually takes effect is the one on the underlying physical ports. For example, ifgrp a0a and its VLANs a0a-11 and a0a-12 show “full”, but flow control is not enabled on them, because the underlying ports e0a and e0b are set to “none”.
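The rule that virtual ports take their effective behavior from the physical ports underneath can be sketched as a recursive lookup. The topology below mirrors the example in the text (ifgrp a0a over e0a and e0b, VLANs a0a-11 and a0a-12 on a0a); the dictionaries and function names are hypothetical and are not an ONTAP API.

```python
# Hypothetical port hierarchy mirroring the example in the text.
PHYSICAL = {"e0a": "none", "e0b": "none"}        # flowcontrol-admin on physical ports
IFGRP_MEMBERS = {"a0a": ["e0a", "e0b"]}          # ifgrp -> member physical ports
VLAN_PARENT = {"a0a-11": "a0a", "a0a-12": "a0a"}  # VLAN -> parent port (ifgrp or physical)


def member_ports(port):
    """Resolve any port (physical, ifgrp, or VLAN) to the physical ports beneath it."""
    if port in PHYSICAL:
        return [port]
    if port in VLAN_PARENT:
        return member_ports(VLAN_PARENT[port])
    if port in IFGRP_MEMBERS:
        result = []
        for member in IFGRP_MEMBERS[port]:
            result.extend(member_ports(member))
        return result
    raise KeyError(f"unknown port {port!r}")


def effective_flowcontrol(port):
    """Return the set of physical-port settings that actually govern this port."""
    return {PHYSICAL[p] for p in member_ports(port)}


for name in ("a0a", "a0a-11", "a0a-12"):
    print(name, sorted(effective_flowcontrol(name)))
```

Returning a set rather than a single value means a result with more than one entry would flag ifgrp members configured inconsistently.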

5. Follow-up and next steps

Talk to the network and server teams to find out how flow control is set on the switch ports and the hosts, then match those settings on the storage ports. Below are two real customer examples in which opposite changes each improved performance. What matters is matching the environment.

Example 1: Enabling Flow Control

I had one customer with Cisco CNAs that did not disable flow control on the host. Throughput was throttled to 1 Gb/s on 10GbE ports. By enabling flow control on both the switches and the NetApp storage, we were able to saturate the 10GbE ports.

Example 2: Disabling Flow Control

At another customer with large sequential I/O, disabling flow control end-to-end increased throughput by 20%.

NetApp ONTAP Tip – Performance Grouping with Infinite QoS Policies

Do you ever need to break out reporting by groups of volumes within an SVM? You can report granular real-time performance with the “qos statistics” commands. To do this, we assign volumes to QoS policy groups with no limit, using the infinite “INF” setting. You can then see real-time IOPS, throughput, latency, and more by volume and by policy group. This is similar to my prior post on setting a NULL qtree quota to report on usage without quota enforcement. The example below groups volumes within an SVM by division: sales is monitored on vol1 and vol2, and engineering is monitored on vol3 and vol4.
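A minimal Python sketch of what grouping buys you: per-volume numbers rolled up into per-policy-group totals. The volume-to-group mapping mirrors the example; the sample figures and the rollup function are hypothetical, and the IOPS-weighted average latency is an approximation for illustration, not necessarily how ONTAP computes policy-group latency.

```python
from collections import defaultdict

# Volume-to-policy-group mapping from the example in the text.
POLICY_GROUPS = {
    "vol1": "sales-unlimited",
    "vol2": "sales-unlimited",
    "vol3": "engineering-unlimited",
    "vol4": "engineering-unlimited",
}


def rollup(samples, groups):
    """samples: {volume: (iops, mb_per_s, latency_us)}
    returns:  {group: (total_iops, total_mb_per_s, iops_weighted_latency_us)}"""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])  # iops, MB/s, sum of iops * latency
    for vol, (iops, mbps, lat_us) in samples.items():
        t = totals[groups[vol]]  # KeyError for a volume with no group assigned
        t[0] += iops
        t[1] += mbps
        t[2] += iops * lat_us
    return {
        group: (iops, mbps, weighted / iops if iops else 0.0)
        for group, (iops, mbps, weighted) in totals.items()
    }


# Hypothetical per-volume samples: (IOPS, MB/s, latency in microseconds).
samples = {
    "vol1": (1000, 50, 400),
    "vol2": (3000, 150, 800),
    "vol3": (500, 20, 1000),
    "vol4": (500, 30, 2000),
}
by_group = rollup(samples, POLICY_GROUPS)
for group, stats in by_group.items():
    print(group, stats)
```

Weighting latency by IOPS keeps a busy volume's latency from being averaged away by a quiet one in the same group.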

There is also a very useful “statistics” command to complement the QoS commands below. And don’t forget about Cloud Insights, OCI, Unified Manager, System Manager, and Grafana for graphical and scale-out analytics views. Cloud Insights has a free version with 7 days of retention for NetApp customers: https://cloud.netapp.com/cloud-insights

Call to action – enable Infinite QoS on all of your volumes.  Then you can track performance and also measure the current utilization in case you want to enforce QoS later.
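Once infinite policy groups have collected some history, those measurements can seed a limit if you decide to enforce QoS later. The Python sketch below is one hypothetical way to do that, not a NetApp recommendation: it takes a nearest-rank percentile of observed IOPS samples and adds headroom, and both the 95th percentile and the 20% headroom are arbitrary defaults chosen for illustration. The result would only be a candidate for a future -max-throughput value.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 0 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def candidate_limit(iops_samples, pct=95, headroom_pct=20):
    """Hypothetical starting point for a QoS limit: high percentile plus headroom."""
    baseline = percentile(iops_samples, pct)
    return baseline * (100 + headroom_pct) // 100


# Hypothetical IOPS samples collected while the volume ran under an INF policy.
observed = [300, 100, 900, 500, 200, 1000, 700, 400, 800, 600]
print(candidate_limit(observed))
```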

CLI Example: Setting Up Infinite QoS for Performance Tracking (9.7RC1 VSIM)

Set up QoS

Create two QoS policy groups with no QoS enforcement (infinite):

set adv

qos policy-group create -policy-group sales-unlimited -vserver SVM1 -max-throughput INF

qos policy-group create -policy-group engineering-unlimited -vserver SVM1 -max-throughput INF

Apply the QoS policy groups to the four data volumes:

volume modify -vserver SVM1 -volume vol1,vol2 -qos-policy-group sales-unlimited 

volume modify -vserver SVM1 -volume vol3,vol4 -qos-policy-group engineering-unlimited 

Note: You do not need to set QoS on the SVM root (vsroot) volume. If you already did, set it to “none” to remove the setting:

volume modify -vserver SVM1 -volume SVM1_root -qos-policy-group none

Verify the policy groups and the workloads assigned to them:

qos policy-group show

qos workload show

Reporting QoS

The “qos statistics” commands provide a lot of useful real-time data. To refresh the screen on each iteration, add the “-refresh-display true” parameter to any of the commands below.

qos statistics characteristics show

qos statistics latency show

qos statistics performance show

qos statistics volume characteristics show

Here are some more QoS statistics commands to try out:

qos statistics volume latency show

qos statistics volume performance show

qos statistics volume resource cpu show

qos statistics volume resource disk show

qos statistics resource cpu show

qos statistics resource disk show 

qos statistics workload characteristics show

qos statistics workload latency show

qos statistics workload performance show

qos statistics workload resource cpu show

qos statistics workload resource disk show