[Introduction]
The following shell scripts use 'perf' with the ARM Cortex-A9 PMU, plus the
PL310 PMU (I2) or the LLC PMU (I3), to profile memcpy.
- perf_memcpy_I2_l2x0.sh
script for I2 with both ARM Cortex-A9 PMU & PL310 PMU profiling.
- perf_memcpy_I3.sh
script for I3 with both ARM Cortex-A9 PMU & LLC PMU profiling.
[Prerequisites]
for I2:
- Linux 3.18 Kernel with the following CONFIGs:
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HW_PERF_EVENTS=y
CONFIG_CACHE_L2X0_PMU=y
- perf executable
- mstar ms_sys driver
- shell script (perf_memcpy_I2_l2x0.sh)
for I3:
- Linux 3.18 Kernel with the following CONFIGs:
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HW_PERF_EVENTS=y
- perf executable
- mstar ms_sys driver
- shell script (perf_memcpy_I3.sh)
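The CONFIG lists above can be checked against a kernel config file before running either script. The following is a minimal sketch, not part of the shipped scripts: the function name missing_configs and the variables REQUIRED_I2 and REQUIRED_I3 are hypothetical, and it assumes a plain-text config file is available.

```shell
#!/bin/sh
# Sketch: report which required CONFIG options are not set to "y" in a
# kernel config file. The option lists mirror the prerequisites above;
# the function and variable names are illustrative assumptions.

REQUIRED_I2="CONFIG_HAVE_PERF_EVENTS CONFIG_PERF_EVENTS CONFIG_HW_PERF_EVENTS CONFIG_CACHE_L2X0_PMU"
REQUIRED_I3="CONFIG_HAVE_PERF_EVENTS CONFIG_PERF_EVENTS CONFIG_HW_PERF_EVENTS"

# missing_configs CONFIG_FILE OPTION...
# Prints every OPTION that has no "OPTION=y" line in CONFIG_FILE.
missing_configs() {
    cfg_file=$1
    shift
    for opt in "$@"; do
        grep -q "^${opt}=y\$" "$cfg_file" || echo "$opt"
    done
}
```

On a target whose kernel exposes /proc/config.gz (this requires CONFIG_IKCONFIG_PROC, which is not listed above and may not be enabled), one possible use is `zcat /proc/config.gz > /tmp/kconfig; missing_configs /tmp/kconfig $REQUIRED_I2`. Empty output means every listed option is set.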
[Usage]
for I2:
Usage: ./perf_memcpy_I2_l2x0.sh BUFFER_SIZE L2_PMU_SELECT [memcpy scheme]
[memory type] [cachable]
BUFFER_SIZE: number of KB for each iteration (total bytes transferred: 64KB *
10000, so a 32KB buffer runs for 20000 iterations)
L2_PMU_SELECT: valid option r|w|e|x
r: drreq and drhit
w: dwreq and dwhit
e: cc and ipfalloc
x: dwtreq and wa
[memcpy scheme]: valid option 0|1|2
0: C runtime memcpy
1: memcpy.S with NEON
2: memcpy.S without NEON
[memory type]: valid option MIU|IMI
[cachable]: valid option 0|1
EXAMPLE: ./perf_memcpy_I2_l2x0.sh 32 r 0
Runs the [CRT] memcpy scheme test with a [32]KB buffer for 20000 iterations,
profiling with the perf PMU plus the additional L2 PMU events [drreq/drhit].
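The L2_PMU_SELECT table above maps one letter to a pair of L2 event names. A minimal sketch of that lookup is shown below; the function name l2_events is hypothetical, and how perf_memcpy_I2_l2x0.sh actually passes these names to perf is not shown here.

```shell
#!/bin/sh
# Sketch: translate an L2_PMU_SELECT letter into the event pair listed in
# the usage text. l2_events is an illustrative name, not taken from the
# script itself.

# l2_events SELECTOR
# Prints the two event names, or fails with status 1 for an unknown letter.
l2_events() {
    case "$1" in
        r) echo "drreq drhit" ;;
        w) echo "dwreq dwhit" ;;
        e) echo "cc ipfalloc" ;;
        x) echo "dwtreq wa" ;;
        *) echo "invalid L2_PMU_SELECT: $1 (expected r|w|e|x)" >&2
           return 1 ;;
    esac
}
```

For the example above, `l2_events r` prints `drreq drhit`, matching the events named in the description.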
for I3:
Usage: ./perf_memcpy_I3.sh BUFFER_SIZE L2_PMU_SELECT [memcpy scheme]
[memory type] [cachable]
BUFFER_SIZE: number of KB for each iteration (total bytes transferred: 64KB *
10000, so a 32KB buffer runs for 20000 iterations)
L2_PMU_SELECT: not valid for I3 (a value is still passed in this position, as
in the EXAMPLE)
[memcpy scheme]: valid option 0|1|2
0: C runtime memcpy
1: memcpy.S with NEON
2: memcpy.S without NEON
[memory type]: valid option MIU|IMI
[cachable]: valid option 0|1
EXAMPLE: ./perf_memcpy_I3.sh 32 r 0
Runs the [CRT] memcpy scheme test with a [32]KB buffer for 20000 iterations,
profiling with the perf PMU plus the LLC PMU.
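For both scripts, the usage text gives a total transfer of 64KB * 10000 and the examples report 20000 iterations for a 32KB buffer, which suggests the iteration count is the total divided by BUFFER_SIZE. The sketch below works through that arithmetic; the names TOTAL_KB and iterations_for are hypothetical, and the truncating division for sizes that do not divide evenly is an assumption rather than documented script behaviour.

```shell
#!/bin/sh
# Sketch: derive the iteration count from BUFFER_SIZE, assuming the total
# transfer is fixed at 64KB * 10000 as stated in the usage text.

TOTAL_KB=$((64 * 10000))   # 640000 KB in total

# iterations_for BUFFER_SIZE_KB
# Prints TOTAL_KB / BUFFER_SIZE_KB using integer division (assumption:
# any remainder is dropped).
iterations_for() {
    buf_kb=$1
    case "$buf_kb" in
        ''|*[!0-9]*)
            echo "BUFFER_SIZE must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$buf_kb" -le 0 ]; then
        echo "BUFFER_SIZE must be a positive integer" >&2
        return 1
    fi
    echo $((TOTAL_KB / buf_kb))
}
```

With these assumptions, `iterations_for 32` gives 20000, in line with both EXAMPLE descriptions, and `iterations_for 64` gives 10000.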