gmd20's personal space

// Programming and life

 
 
 

Perf: the performance testing tool integrated into the Linux kernel

2011-06-14 11:52:51 | Category: Linux

I came across an article about this on IBM developerWorks and decided to try the tool myself.
http://www.ibm.com/developerworks/cn/linux/l-cn-perf1/index.html
Perf -- a system performance tuning tool for Linux


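On Ubuntu 11.04 the perf commands only become available after installing the two packages below. I installed them from the Software Center; apt-get install should presumably work just as well.
linux-tools
linux-tools-common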


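The tool is quite good. It can collect a lot of hardware-level statistics such as CPU cache hits and misses, branch prediction, and instruction and cycle counts. It can also attach to a specific process and show per-function sample counts, which gives a rough picture of where that process spends its time.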


widebright@:~/桌面$ perf list

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]

  cpu-clock                                  [Software event]
  task-clock                                 [Software event]
  page-faults OR faults                      [Software event]
  minor-faults                               [Software event]
  major-faults                               [Software event]
  context-switches OR cs                     [Software event]
  cpu-migrations OR migrations               [Software event]
  alignment-faults                           [Software event]
  emulation-faults                           [Software event]

  L1-dcache-loads                            [Hardware cache event]
  L1-dcache-load-misses                      [Hardware cache event]
  L1-dcache-stores                           [Hardware cache event]
  L1-dcache-store-misses                     [Hardware cache event]
  L1-dcache-prefetches                       [Hardware cache event]
  L1-dcache-prefetch-misses                  [Hardware cache event]
  L1-icache-loads                            [Hardware cache event]
  L1-icache-load-misses                      [Hardware cache event]
  L1-icache-prefetches                       [Hardware cache event]
  L1-icache-prefetch-misses                  [Hardware cache event]
  LLC-loads                                  [Hardware cache event]
  LLC-load-misses                            [Hardware cache event]
  LLC-stores                                 [Hardware cache event]
  LLC-store-misses                           [Hardware cache event]
  LLC-prefetches                             [Hardware cache event]
  LLC-prefetch-misses                        [Hardware cache event]
  dTLB-loads                                 [Hardware cache event]
  dTLB-load-misses                           [Hardware cache event]
  dTLB-stores                                [Hardware cache event]
  dTLB-store-misses                          [Hardware cache event]
  dTLB-prefetches                            [Hardware cache event]
  dTLB-prefetch-misses                       [Hardware cache event]
  iTLB-loads                                 [Hardware cache event]
  iTLB-load-misses                           [Hardware cache event]
  branch-loads                               [Hardware cache event]
  branch-load-misses                         [Hardware cache event]

  rNNN (see 'perf list --help' on how to encode it) [Raw hardware event descript

  mem:<addr>[:access]                        [Hardware breakpoint]
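
The rNNN entry in the list above takes a hexadecimal descriptor that, per the man page further down, combines eventsel and umask. As a minimal sketch, assuming the usual x86 layout (event select in bits 0-7, unit mask in bits 8-15), the descriptor could be built like this. The helper name `raw_event` is made up for illustration, and the example codes are the Intel architectural "LLC misses" event; other CPUs need the values from their own manuals.

```python
def raw_event(event_select, umask):
    """Build an rNNN descriptor from an event-select code and a unit mask.

    Assumption: x86-style encoding, umask in bits 8-15 and event select
    in bits 0-7. Other architectures encode raw events differently.
    """
    if not 0 <= event_select <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event_select and umask must each fit in one byte")
    return "r{:x}".format((umask << 8) | event_select)


if __name__ == "__main__":
    # Intel architectural event "LLC misses": event select 0x2E, umask 0x41.
    print(raw_event(0x2E, 0x41))  # r412e
```

The resulting string would then be passed to -e, for example `perf stat -e r412e ./a.out`.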








perf stat ./a.out
^C
 Performance counter stats for './a.out':

             9,044 cache-misses             #      0.003 M/sec  (scaled from 66.87%)
           523,191 cache-references         #      0.172 M/sec  (scaled from 66.96%)
        21,838,315 branch-misses            #      6.678 %      (scaled from 33.13%)
       327,014,993 branches                 #    107.285 M/sec  (scaled from 33.04%)
     2,355,587,681 instructions             #      0.349 IPC    (scaled from 49.41%)
     6,740,540,287 cycles                   #   2211.403 M/sec  (scaled from 67.03%)
               100 page-faults              #      0.000 M/sec
                30 CPU-migrations           #      0.000 M/sec
               482 context-switches         #      0.000 M/sec
       3048.082970 task-clock-msecs         #      0.596 CPUs

        5.118230246  seconds time elapsed
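
The right-hand column of this output is derived from the raw counts, and it is easy to reproduce by hand. The sketch below recomputes the ratios from the numbers copied out of the run above. Note that several counters are marked "scaled from N%", which means they were multiplexed and extrapolated, so the ratios are estimates. The cache miss rate is not printed by perf here; it is my own addition.

```python
# Counts copied from the 'perf stat ./a.out' run above.
counts = {
    "cache-misses": 9044,
    "cache-references": 523191,
    "branch-misses": 21838315,
    "branches": 327014993,
    "instructions": 2355587681,
    "cycles": 6740540287,
}
task_clock_ms = 3048.082970
elapsed_s = 5.118230246

# Instructions retired per cycle.
ipc = counts["instructions"] / counts["cycles"]
# Share of branches that were mispredicted, in percent.
branch_miss_pct = 100.0 * counts["branch-misses"] / counts["branches"]
# Share of cache references that missed, in percent (not shown by perf above).
cache_miss_pct = 100.0 * counts["cache-misses"] / counts["cache-references"]
# CPU time divided by wall-clock time.
cpus_used = (task_clock_ms / 1000.0) / elapsed_s

if __name__ == "__main__":
    print("IPC:              %.3f" % ipc)
    print("branch miss rate: %.3f %%" % branch_miss_pct)
    print("cache miss rate:  %.3f %%" % cache_miss_pct)
    print("CPUs utilized:    %.3f" % cpus_used)
```

The first, second and fourth values agree with the 0.349 IPC, 6.678 % and 0.596 CPUs that perf printed.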





widebright@:~/桌面$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:21 ?        00:00:00 /sbin/init
root         2     0  0 08:21 ?        00:00:00 [kthreadd]
root         3     2  0 08:21 ?        00:00:00 [ksoftirqd/0]
root         5     2  0 08:21 ?        00:00:00 [kworker/u:0]
root         6     2  0 08:21 ?        00:00:00 [migration/0]
root         7     2  0 08:21 ?        00:00:00 [migration/1]
root         9     2  0 08:21 ?        00:00:00 [ksoftirqd/1]
root        10     2  0 08:21 ?        00:00:01 [kworker/0:1]


widebright@:~/桌面$ perf top -c 1000 -p 5
  Fatal: Permission error - are you root?
     Consider tweaking /proc/sys/kernel/perf_event_paranoid.
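
The error points at /proc/sys/kernel/perf_event_paranoid. Here is a small sketch that reads that file and explains the value. The restriction table follows the current kernel sysctl documentation; that is an assumption on my part for this machine, since the older kernel shipped with Ubuntu 11.04 may differ in detail. The function names are made up for illustration.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (threshold, what unprivileged users lose once the level reaches it).
# Assumption: taken from current kernel documentation, not from 2.6.38.
RESTRICTIONS = [
    (0, "raw tracepoint access is disallowed"),
    (1, "CPU-wide event access is disallowed"),
    (2, "kernel profiling is disallowed"),
]


def restrictions_for(level):
    """Return the restrictions that apply to unprivileged users at this level."""
    return [text for threshold, text in RESTRICTIONS if level >= threshold]


def read_level(path=PARANOID_PATH):
    """Return the current level, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_level()
    if level is None:
        print("perf_event_paranoid not available on this system")
    else:
        print("perf_event_paranoid =", level)
        for text in restrictions_for(level):
            print(" -", text)
```

Profiling a kernel thread, as attempted above, is kernel profiling, so running perf as root is the simplest way around the restriction.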

widebright@:~/桌面$ sudo perf top -c 1000 -p 5
[sudo] password for widebright:

-------------------------------------------------------------------------------
   PerfTop:     302 irqs/sec  kernel:100.0%  exact:  0.0% [1000 cycles],  (target_pid: 5)
-------------------------------------------------------------------------------

             samples  pcnt function                         DSO
             _______ _____ ________________________________ ________

              603.00 19.7% i915_gem_retire_requests_ring    [i915]  
              352.00 11.5% kref_put                         [kernel]
              250.00  8.2% __ticket_spin_lock               [kernel]
              184.00  6.0% i915_gem_object_move_to_inactive [i915]  
              143.00  4.7% kfree                            [kernel]
              113.00  3.7% __ticket_spin_unlock             [kernel]
              102.00  3.3% find_busiest_group               [kernel]
               69.00  2.3% i915_gem_retire_work_handler     [i915]  
               65.00  2.1% __slab_free                      [kernel]
               64.00  2.1% mod_timer                        [kernel]
               61.00  2.0% i915_gem_object_move_to_active   [i915]  
               53.00  1.7% update_cfs_load                  [kernel]
               52.00  1.7% process_one_work                 [kernel]

=========================
PERF-STAT(1)                      perf Manual                     PERF-STAT(1)

NAME
       perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
       perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
       perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

DESCRIPTION
       This command runs a command and gathers performance counter statistics
       from it.

OPTIONS
       <command>...
           Any command you can specify in a shell.

       -e, --event=
           Select the PMU event. Selection can be a symbolic event name (use
           perf list to list all events) or a raw PMU event (eventsel+umask)
           in the form of rNNN where NNN is a hexadecimal event descriptor.

       -i, --no-inherit
           child tasks do not inherit counters

       -p, --pid=<pid>
           stat events on existing process id

       -t, --tid=<tid>
           stat events on existing thread id

       -a, --all-cpus
           system-wide collection from all CPUs

       -c, --scale
           scale/normalize counter values

       -r, --repeat=<n>
           repeat command and print average + stddev (max: 100)

       -B, --big-num
           print large numbers with thousands' separators according to locale

       -C, --cpu=
           Count only on the list of CPUs provided. Multiple CPUs can be
           provided as a comma-separated list with no space: 0,1. Ranges of
           CPUs are specified with -: 0-2. In per-thread mode, this option is
           ignored. The -a option is still necessary to activate system-wide
           monitoring. Default is to count on all CPUs.

       -A, --no-aggr
           Do not aggregate counts across all monitored CPUs in system-wide
           mode (-a). This option is only valid in system-wide mode.

       -n, --null
           null run - don’t start any counters

       -v, --verbose
           be more verbose (show counter open errors, etc)

       -x SEP, --field-separator SEP
           print counts using a CSV-style output to make it easy to import
           directly into spreadsheets. Columns are separated by the string
           specified in SEP.

EXAMPLES
       $ perf stat -- make -j

           Performance counter stats for 'make -j':

           8117.370256  task clock ticks     #      11.281 CPU utilization factor
                   678  context switches     #       0.000 M/sec
                   133  CPU migrations       #       0.000 M/sec
                235724  pagefaults           #       0.029 M/sec
           24821162526  CPU cycles           #    3057.784 M/sec
           18687303457  instructions         #    2302.138 M/sec
             172158895  cache references     #      21.209 M/sec
              27075259  cache misses         #       3.335 M/sec

           Wall-clock time elapsed:   719.554352 msecs

SEE ALSO
       perf-top(1), perf-list(1)
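
For scripting repeated measurements, the options documented above can be assembled into a command line programmatically. The following is a rough sketch that uses only flags listed in this man page (-e, -a, -p, -r, -x and the -- separator); the function name `perf_stat_argv` and its parameters are my own invention, and the result would be handed to something like subprocess.run on a machine where perf is installed.

```python
import shlex


def perf_stat_argv(command, events=None, pid=None, repeat=None,
                   all_cpus=False, sep=None):
    """Assemble an argument list for 'perf stat' from the documented options."""
    argv = ["perf", "stat"]
    if events:
        # Several events can be given to -e as one comma-separated list.
        argv += ["-e", ",".join(events)]
    if all_cpus:
        argv.append("-a")
    if pid is not None:
        argv += ["-p", str(pid)]
    if repeat is not None:
        if not 1 <= repeat <= 100:
            raise ValueError("the man page above caps --repeat at 100")
        argv += ["-r", str(repeat)]
    if sep is not None:
        argv += ["-x", sep]
    if command:
        # '--' keeps the command's own options away from perf.
        argv += ["--"] + list(command)
    return argv


if __name__ == "__main__":
    argv = perf_stat_argv(["./a.out"], events=["cycles", "instructions"], repeat=3)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building a list rather than a single string avoids shell quoting problems when the measured command has arguments of its own.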


======================================

PERF-TOP(1)                                                             perf Manual                                                            PERF-TOP(1)

NAME
       perf-top - System profiling tool.

SYNOPSIS
       perf top [-e <EVENT> | --event=EVENT] [<options>]

DESCRIPTION
       This command generates and displays a performance counter profile in real time.

OPTIONS
       -a, --all-cpus
           System-wide collection. (default)

       -c <count>, --count=<count>
           Event period to sample.

       -C <cpu-list>, --cpu=<cpu>
           Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1. Ranges of CPUs are
           specified with -: 0-2. Default is to monitor all CPUS.

       -d <seconds>, --delay=<seconds>
           Number of seconds to delay between refreshes.

       -e <event>, --event=<event>
           Select the PMU event. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form
           of rNNN where NNN is a hexadecimal event descriptor.

       -E <entries>, --entries=<entries>
           Display this many functions.

       -f <count>, --count-filter=<count>
           Only display functions with more events than this.

       -g, --group
           Put the counters into a counter group.

       -F <freq>, --freq=<freq>
           Profile at this frequency.

       -i, --inherit
           Child tasks inherit counters, only makes sense with -p option.

       -k <path>, --vmlinux=<path>
           Path to vmlinux. Required for annotation functionality.

       -m <pages>, --mmap-pages=<pages>
           Number of mmapped data pages.

       -p <pid>, --pid=<pid>
           Profile events on existing Process ID.

       -t <tid>, --tid=<tid>
           Profile events on existing thread ID.

       -r <priority>, --realtime=<priority>
           Collect data with this RT SCHED_FIFO priority.

       -s <symbol>, --sym-annotate=<symbol>
           Annotate this symbol.

       -K, --hide_kernel_symbols
           Hide kernel symbols.

       -U, --hide_user_symbols
           Hide user symbols.

       -D, --dump-symtab
           Dump the symbol table used for profiling.

       -v, --verbose
           Be more verbose (show counter open errors, etc).

       -z, --zero
           Zero history across display updates.

INTERACTIVE PROMPTING KEYS
       [d]
           Display refresh delay.

       [e]
           Number of entries to display.

       [E]
           Event to display when multiple counters are active.

       [f]
           Profile display filter (>= hit count).

       [F]
           Annotation display filter (>= % of total).

       [s]
           Annotate symbol.

       [S]
           Stop annotation, return to full profile display.

       [w]
           Toggle between weighted sum and individual count[E]r profile.

       [z]
           Toggle event count zeroing across display updates.

       [qQ]
           Quit.

       Pressing any unmapped key displays a menu, and prompts for input.

SEE ALSO
       perf-stat(1), perf-list(1)
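
The -c and -F options above are two ways of expressing the same thing: -c fixes the period (one sample every N events), while -F asks for a target number of samples per second. The arithmetic relating them is sketched below with hypothetical helper names. It is an idealised model that ignores any throttling the kernel may apply; the cycle rate is the 2211.403 M/sec figure from the perf stat run of ./a.out earlier.

```python
def samples_per_second(events_per_second, period):
    """Samples per second when one sample is taken every 'period' events."""
    if period <= 0:
        raise ValueError("period must be positive")
    return events_per_second / period


def period_for_frequency(events_per_second, freq_hz):
    """Period that would give roughly 'freq_hz' samples per second."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return max(1, round(events_per_second / freq_hz))


if __name__ == "__main__":
    cycles_per_second = 2211403000  # from the perf stat output of ./a.out
    print(samples_per_second(cycles_per_second, 1000))
    print(period_for_frequency(cycles_per_second, 1000))
```

With -c 1000 a busy process at that cycle rate would generate over two million samples per second, which is why such a small period only makes sense for a mostly idle target like the kernel thread profiled above.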