The relationship between Linux network packet size and transmission bandwidth, tuning kernel network queue/buffer lengths, etc. (on VirtualBox VMs, be sure to use the paravirtualized virtio NIC)

2014-04-16 16:27:50 | Category: Linux

Related commands:
iperf
iptraf
tcpdump -ni eth1
ethtool -g eth1
vmstat
watch -d -n 1 cat /proc/interrupts
wireshark -> menu -> Statistics: average packet length 170 bytes, average throughput 15.115 Mbit/sec

Throughput with a 111-byte payload, TCP_NODELAY enabled (-N):
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5 -l 111 -t 600 -N
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 57866 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  10.6 MBytes  17.8 Mbits/sec
[  3]  5.0-10.0 sec  9.58 MBytes  16.1 Mbits/sec
[  3] 10.0-15.0 sec  9.35 MBytes  15.7 Mbits/sec

Throughput with a 111-byte payload, without TCP_NODELAY:
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5 -l 111 -t 600
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 57867 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  35.7 MBytes  60.0 Mbits/sec
[  3]  5.0-10.0 sec  35.8 MBytes  60.1 Mbits/sec
[  3] 10.0-15.0 sec  35.8 MBytes  60.0 Mbits/sec

root@debian02:/home/bright# tcpdump -ni eth1
tcpdump shows that with TCP_NODELAY every TCP packet carries exactly 111 bytes; without it, Nagle's algorithm coalesces successive writes and packets can grow to the full 1448-byte MSS (for an MTU of 1500), which is why throughput is several times higher.
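
For reference, iperf's -N flag just sets the TCP_NODELAY socket option. A minimal sketch of that call (the set_nodelay helper name is mine, not from iperf):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket: every write() goes out
 * immediately as its own segment instead of being coalesced with later
 * writes. Lower latency for small messages, but many more small packets
 * on the wire, hence the throughput drop seen above. */
static int set_nodelay(int sockfd)
{
    int one = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}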

With a standard payload length of 1440 bytes:
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5 -l 1440 -t 600
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 57868 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   122 MBytes   205 Mbits/sec
[  3]  5.0-10.0 sec   119 MBytes   200 Mbits/sec
[  3] 10.0-15.0 sec   121 MBytes   202 Mbits/sec

With a 1440-byte payload, throughput reaches roughly 200 Mbits/sec (about 25 MBytes/sec).

Throughput is highest when the write size is not restricted at all:
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5  -t 600
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 57869 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  1.13 GBytes  1.95 Gbits/sec
[  3]  5.0-10.0 sec  1.04 GBytes  1.79 Gbits/sec
[  3] 10.0-15.0 sec  1.00 GBytes  1.72 Gbits/sec

On this host-internal network, tcpdump shows individual packets as large as 26130 bytes, far beyond the MTU of 1500 reported by ifconfig. This is segmentation offload at work: the kernel hands oversized segments to the (virtual) NIC, which splits them later, so tcpdump captures them before segmentation (ethtool -k eth1 lists the offload settings).

Simulating the application's measured average packet length of 170 bytes with iperf:
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5 -l 170 -t 600 -N
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 57871 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  14.3 MBytes  24.0 Mbits/sec
[  3]  5.0-10.0 sec  13.6 MBytes  22.8 Mbits/sec
[  3] 10.0-15.0 sec  14.1 MBytes  23.6 Mbits/sec
[  3] 15.0-20.0 sec  13.6 MBytes  22.8 Mbits/sec
[  3] 20.0-25.0 sec  13.6 MBytes  22.9 Mbits/sec

A bit faster than the roughly 15 Mbits/sec measured for the real application above.

NICs likely have a packet-per-second ("Packet Forwarding Rate") limit rather than a pure byte-rate limit; the larger each packet, the closer you can get to the NIC's theoretical bandwidth.
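
A back-of-the-envelope check of that idea (my own arithmetic, not from the original measurements): if this e1000 path tops out around 20,000 packets per second, which matches the interrupt rate seen in the vmstat/iptraf numbers below, then 170-byte payloads yield at most 20,000 × 170 × 8 ≈ 27 Mbit/s, close to the 23-24 Mbit/s just measured, while 1440-byte payloads would yield 20,000 × 1440 × 8 ≈ 230 Mbit/s, close to the ~200 Mbit/s of the -l 1440 run. The per-packet cost, not the byte count, is the bottleneck.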

=====================
The per-second interrupt count from vmstat 1 and the per-second packet count from iptraf track each other closely; at top speed they exceed 20,000 per second, which feels high.

perf on the VirtualBox guest shows the network path is extremely expensive:
root@debian02:/home/bright# perf top -G -t 16406
Samples: 64K of event 'cpu-clock', Event count (approx.): 1429151560
+  73.22%  [e1000]             [k] e1000_xmit_frame
+  15.16%  [e1000]             [k] e1000_clean
+   9.22%  [e1000]             [k] e1000_alloc_rx_buffers
+   1.17%  [kernel]            [k] __do_softirq
+   0.21%  [kernel]            [k] finish_task_switch
+   0.06%  [kernel]            [k] native_read_msr_safe

With UDP and the same packet size, only 1.05 Mbits/sec gets through, about 1/20 of TCP, which looks odd at first; note, though, that iperf's UDP mode rate-limits itself to roughly 1 Mbit/sec by default, and you have to pass -b (e.g. -b 100M) to ask for more.
root@debian01:/opt/sigtran/map_smpp_client# iperf -s -i 5
root@debian02:/opt/sigtran/map_smpp_server# iperf -c 192.168.56.102 -i 5 -l 170 -t 600 -u
------------------------------------------------------------
Client connecting to 192.168.56.102, UDP port 5001
Sending 170 byte datagrams
UDP buffer size:  160 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 45440 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   640 KBytes  1.05 Mbits/sec
[  3]  5.0-10.0 sec   640 KBytes  1.05 Mbits/sec
[  3] 10.0-15.0 sec   640 KBytes  1.05 Mbits/sec
[  3] 15.0-20.0 sec   640 KBytes  1.05 Mbits/sec
[  3] 20.0-25.0 sec   640 KBytes  1.05 Mbits/sec


It is not obvious whether the system's various queue and buffer lengths have much impact on performance.
======================================
Check the NIC driver's ring buffer sizes:
root@debian02:/home/bright# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

Set the driver ring sizes:
ethtool -G eth1 rx 512 tx 512

Byte Queue Limits (BQL) caps how many bytes may sit in the driver queue, countering the packet-queueing latency caused by bufferbloat:
root@debian02:/home/bright# cat /sys/devices/pci0000\:00/0000\:00\:08.0/net/eth1/queues/tx-0/byte_queue_limits/limit_max
1879048192

The Linux kernel's qdisc layer defaults to a queue length of 1000 packets:
root@debian02:/home/bright# ifconfig eth1 | grep txqueuelen
          collisions:0 txqueuelen:1000
It can be changed with the following ip command:
ip link set txqueuelen 500 dev eth0

Kernel receive backlog (max packets queued between the driver and protocol processing):
root@debian02:/home/bright# cat /proc/sys/net/core/netdev_max_backlog
1000

The TCP layer's per-socket output queue limit; no need to change it:
root@debian02:/home/bright# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
131072

4.1 Socket buffers and bandwidth delay product
socket buffer size = 2 × bandwidth × delay
int sndsize = 2 * bandwidth * delay;  /* bytes: the bandwidth-delay product */
err = setsockopt(socket_descriptor, SOL_SOCKET, SO_SNDBUF, (char *)&sndsize, sizeof(sndsize));
int rcvsize = 2 * bandwidth * delay;
err = setsockopt(socket_descriptor, SOL_SOCKET, SO_RCVBUF, (char *)&rcvsize, sizeof(rcvsize));
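
A self-contained sketch of the sizing calculation (the 1 Gbit/s bandwidth and 2 ms RTT are made-up illustration figures; note that Linux doubles the value you request to leave room for bookkeeping overhead, and caps it at net.core.wmem_max):

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    /* Hypothetical link: 1 Gbit/s bandwidth, 2 ms round-trip delay. */
    double bandwidth = 1e9 / 8;                  /* bytes per second */
    double delay = 0.002;                        /* seconds          */
    int bufsize = (int)(2 * bandwidth * delay);  /* = 500000 bytes   */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 1;
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));

    /* Read back what the kernel actually granted (the doubled value). */
    int actual;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len);
    printf("requested %d bytes, kernel granted %d\n", bufsize, actual);
    return 0;
}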

4.2 Socket buffer memory queue limits: r/w mem (default and max)
The TCP-layer parameters live under /proc/sys/net/core and /proc/sys/net/ipv4/ and can be set with sysctl:
/sbin/sysctl -w net.core.wmem_max=VALUE
/sbin/sysctl -w net.ipv4.tcp_mem="MIN DEFAULT MAX"
/sbin/sysctl -w net.ipv4.tcp_rmem="MIN DEFAULT MAX"
/sbin/sysctl -w net.ipv4.tcp_wmem="MIN DEFAULT MAX"

root@debian02:/home/bright# modinfo e1000
filename:       /lib/modules/3.11-0.bpo.2-686-pae/kernel/drivers/net/ethernet/intel/e1000/e1000.ko
version:        7.3.21-k8-NAPI
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     B31BEB3E096A24BCE972499
alias:          pci:v00008086d00002E6Esv*sd*bc*sc*i*
alias:          pci:v00008086d000010B5sv*sd*bc*sc*i*
alias:          pci:v00008086d00001099sv*sd*bc*sc*i*
alias:          pci:v00008086d0000108Asv*sd*bc*sc*i*
alias:          pci:v00008086d0000107Csv*sd*bc*sc*i*
alias:          pci:v00008086d0000107Bsv*sd*bc*sc*i*
alias:          pci:v00008086d0000107Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001079sv*sd*bc*sc*i*
alias:          pci:v00008086d00001078sv*sd*bc*sc*i*
alias:          pci:v00008086d00001077sv*sd*bc*sc*i*
alias:          pci:v00008086d00001076sv*sd*bc*sc*i*
alias:          pci:v00008086d00001075sv*sd*bc*sc*i*
alias:          pci:v00008086d00001028sv*sd*bc*sc*i*
alias:          pci:v00008086d00001027sv*sd*bc*sc*i*
alias:          pci:v00008086d00001026sv*sd*bc*sc*i*
alias:          pci:v00008086d0000101Esv*sd*bc*sc*i*
alias:          pci:v00008086d0000101Dsv*sd*bc*sc*i*
alias:          pci:v00008086d0000101Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001019sv*sd*bc*sc*i*
alias:          pci:v00008086d00001018sv*sd*bc*sc*i*
alias:          pci:v00008086d00001017sv*sd*bc*sc*i*
alias:          pci:v00008086d00001016sv*sd*bc*sc*i*
alias:          pci:v00008086d00001015sv*sd*bc*sc*i*
alias:          pci:v00008086d00001014sv*sd*bc*sc*i*
alias:          pci:v00008086d00001013sv*sd*bc*sc*i*
alias:          pci:v00008086d00001012sv*sd*bc*sc*i*
alias:          pci:v00008086d00001011sv*sd*bc*sc*i*
alias:          pci:v00008086d00001010sv*sd*bc*sc*i*
alias:          pci:v00008086d0000100Fsv*sd*bc*sc*i*
alias:          pci:v00008086d0000100Esv*sd*bc*sc*i*
alias:          pci:v00008086d0000100Dsv*sd*bc*sc*i*
alias:          pci:v00008086d0000100Csv*sd*bc*sc*i*
alias:          pci:v00008086d00001009sv*sd*bc*sc*i*
alias:          pci:v00008086d00001008sv*sd*bc*sc*i*
alias:          pci:v00008086d00001004sv*sd*bc*sc*i*
alias:          pci:v00008086d00001001sv*sd*bc*sc*i*
alias:          pci:v00008086d00001000sv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.11-0.bpo.2-686-pae SMP mod_unload modversions 686
parm:           TxDescriptors:Number of transmit descriptors (array of int)
parm:           RxDescriptors:Number of receive descriptors (array of int)
parm:           Speed:Speed setting (array of int)
parm:           Duplex:Duplex setting (array of int)
parm:           AutoNeg:Advertised auto-negotiation setting (array of int)
parm:           FlowControl:Flow Control setting (array of int)
parm:           XsumRX:Disable or enable Receive Checksum offload (array of int)
parm:           TxIntDelay:Transmit Interrupt Delay (array of int)
parm:           TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm:           RxIntDelay:Receive Interrupt Delay (array of int)
parm:           RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm:           InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm:           SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm:           copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm:           debug:Debug level (0=none,...,16=all) (int)

The TxIntDelay and RxIntDelay parameters control the NIC's interrupt delay: they reduce the number of IRQs, at the cost of extra packet-processing latency. Being module parameters, they can be set at load time, e.g. with a line like "options e1000 RxIntDelay=64" in /etc/modprobe.d/ (the value here is illustrative).

NAPI should also cut the IRQ count, yet right now it is one IRQ per packet, as if NAPI weren't working.

root@debian02:/home/bright# cat /lib/modules/`uname -r`/build/.config |grep NAPI
CONFIG_TULIP_NAPI=y
CONFIG_TULIP_NAPI_HW_MITIGATION=y

(Strictly, those CONFIG_TULIP_NAPI options are for the tulip driver; for e1000, the -NAPI suffix in the driver version string from modinfo above is the better evidence.) So NAPI appears to be enabled, yet the network IRQ count is still very high.

Optimizing VirtualBox's network settings (the excerpt below is from the VirtualBox documentation)
====================
The "Paravirtualized network adapter (virtio-net)" is special. If you select this, then VirtualBox does not virtualize common networking hardware (that is supported by common guest operating systems out of the box). Instead, VirtualBox then expects a special software interface for virtualized environments to be provided by the guest, thus avoiding the complexity of emulating networking hardware and improving network performance. Starting with version 3.1, VirtualBox provides support for the industry-standard "virtio" networking drivers, which are part of the open-source KVM project.

The "virtio" networking drivers are available for the following guest operating systems:

Linux kernels version 2.6.25 or later can be configured to provide virtio support; some distributions also back-ported virtio to older kernels.

Performance-wise the virtio network adapter is preferable over Intel PRO/1000 emulated adapters, which are preferred over PCNet family of adapters. Both virtio and Intel PRO/1000 adapters enjoy the benefit of segmentation and checksum offloading. Segmentation offloading is essential for high performance as it allows for less context switches, dramatically increasing the sizes of packets that cross VM/host boundary.

Whenever possible use virtio network adapter, otherwise use one of Intel PRO/1000 adapters;

So the paravirtualized virtio NIC should perform better. I had started with "Intel PRO/1000 MT Desktop (82540EM)"; let's switch to virtio and rerun the tests.

root@debian02:/home/bright# iperf -c 192.168.56.102 -i 5 -l 170 -t 600 -N
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 32882 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  54.1 MBytes  90.8 Mbits/sec
[  3]  5.0-10.0 sec  53.9 MBytes  90.4 Mbits/sec
[  3] 10.0-15.0 sec  54.0 MBytes  90.6 Mbits/sec
[  3] 15.0-20.0 sec  54.0 MBytes  90.6 Mbits/sec

virtio is indeed much better: with identical parameters, the measured bandwidth is about 4x that of the Intel PRO/1000 MT Desktop (82540EM).


perf no longer shows the huge CPU cost the e1000 driver had:
root@debian02:/home/bright# perf top -G
Samples: 15K of event 'cpu-clock', Event count (approx.): 1280502375
+  27.16%  [kernel]            [k] __do_softirq
+   8.60%  [kernel]            [k] iowrite16
+   6.06%  [kernel]            [k] tcp_write_xmit
+   4.81%  [kernel]            [k] sysenter_past_esp
+   3.55%  [vdso]              [.] 0x00000424
+   3.54%  [kernel]            [k] tcp_sendmsg


iptraf reports about 30,000 packets per second:
| Total rates:  55838.8 kbits/sec    Broadcast packets: 0 |
|               30961.4 packets/sec  Broadcast bytes:   0 |

vmstat shows only 800-odd interrupts per second (the "in" column), whereas with the e1000 NIC it was over 20,000 per second, roughly one per packet:
root@debian02:/home/bright# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 1068856  70712  96040    0    0   211    15  791 2997  2 20 71  8
 1  0      0 1068948  70712  96028    0    0     0     0  809  373  1 26 73  0
 0  0      0 1069012  70712  96028    0    0     0     0  892  379  2 37 61  0
 1  0      0 1069012  70712  96028    0    0     0     0  887  383  1 36 63  0


The unrestricted full-speed test also shows about twice the e1000's throughput:
root@debian02:/home/bright# iperf -c 192.168.56.102 -i 5 -t 600
------------------------------------------------------------
Client connecting to 192.168.56.102, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 32884 connected with 192.168.56.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  2.21 GBytes  3.80 Gbits/sec
[  3]  5.0-10.0 sec  2.29 GBytes  3.94 Gbits/sec
[  3] 10.0-15.0 sec  2.26 GBytes  3.88 Gbits/sec
[  3] 15.0-20.0 sec  2.31 GBytes  3.97 Gbits/sec
[  3] 20.0-25.0 sec  2.17 GBytes  3.73 Gbits/sec
[  3] 25.0-30.0 sec  2.21 GBytes  3.80 Gbits/sec
[  3] 30.0-35.0 sec  2.22 GBytes  3.82 Gbits/sec
[  3] 35.0-40.0 sec  2.24 GBytes  3.85 Gbits/sec
^C[  3]  0.0-40.4 sec  18.1 GBytes  3.85 Gbits/sec

So on VirtualBox you really should use the paravirtualized virtio NIC; the emulated Intel e1000 is not even close. For reasons unknown, the IRQ rate with e1000 is strangely high, and it drags the whole system's performance down.




References:
QUEUEING IN THE LINUX NETWORK STACK
https://www.coverfire.com/articles/queueing-in-the-linux-network-stack/

how to achieve Gigabit speeds with Linux
http://datatag.web.cern.ch/datatag/howto/tcp.html

Bandwidth, Packets Per Second, and Other Network Performance Metrics
http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html
