[Introduction]

The following shell scripts use 'perf' with the ARM/PL310 PMU for memcpy profiling.

  - perf_memcpy_I2_l2x0.sh
      script for I2, profiling with both the ARM Cortex-A9 PMU and the PL310 PMU.
  - perf_memcpy_I3.sh
      script for I3, profiling with both the ARM Cortex-A9 PMU and the LLC PMU.

[Prerequisites]

for I2:
  - Linux 3.18 kernel with the following CONFIGs:
      CONFIG_HAVE_PERF_EVENTS=y
      CONFIG_PERF_EVENTS=y
      CONFIG_HW_PERF_EVENTS=y
      CONFIG_CACHE_L2X0_PMU=y
  - perf executable
  - mstar ms_sys driver
  - shell script (perf_memcpy_I2_l2x0.sh)

for I3:
  - Linux 3.18 kernel with the following CONFIGs:
      CONFIG_HAVE_PERF_EVENTS=y
      CONFIG_PERF_EVENTS=y
      CONFIG_HW_PERF_EVENTS=y
  - perf executable
  - mstar ms_sys driver
  - shell script (perf_memcpy_I3.sh)

[Usage]

for I2:
  Usage: ./perf_memcpy_I2_l2x0.sh BUFFER_SIZE L2_PMU_SELECT [memcpy scheme] [memory type] [cachable]

    BUFFER_SIZE: buffer size in KB copied in each iteration
                 (total bytes transferred: 64KB * 10000)
    L2_PMU_SELECT: valid option r|w|e|x
        r: drreq and drhit
        w: dwreq and dwhit
        e: cc and ipfalloc
        x: dwtreq and wa
    [memcpy scheme]: valid option 0|1|2
        0: C runtime memcpy
        1: memcpy.S with NEON
        2: memcpy.S without NEON
    [memory type]: valid option MIU|IMI
    [cachable]: valid option 0|1

  EXAMPLE:
    ./perf_memcpy_I2_l2x0.sh 32 r 0
      Runs the [CRT] memcpy scheme test with a [32]KB buffer for 20000
      iterations, profiling with the perf PMU plus the additional L2 PMU
      events [drreq/drhit].

for I3:
  Usage: ./perf_memcpy_I3.sh BUFFER_SIZE L2_PMU_SELECT [memcpy scheme] [memory type] [cachable]

    BUFFER_SIZE: buffer size in KB copied in each iteration
                 (total bytes transferred: 64KB * 10000)
    L2_PMU_SELECT: not valid for I3
    [memcpy scheme]: valid option 0|1|2
        0: C runtime memcpy
        1: memcpy.S with NEON
        2: memcpy.S without NEON
    [memory type]: valid option MIU|IMI
    [cachable]: valid option 0|1

  EXAMPLE:
    ./perf_memcpy_I3.sh 32 r 0
      Runs the [CRT] memcpy scheme test with a [32]KB buffer for 20000
      iterations, profiling with the perf PMU plus the LLC PMU.
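
[Iteration count sketch]

The BUFFER_SIZE description and the examples above suggest that the total transfer is fixed at 64KB * 10000, so the iteration count is that total divided by BUFFER_SIZE (32KB gives 20000 iterations). The sketch below only illustrates that arithmetic. The function name iterations_for and the variable TOTAL_KB are hypothetical and are not taken from the scripts, and how the real scripts handle sizes that do not divide the total evenly is not stated here.

```shell
#!/bin/sh
# Hypothetical helper, not part of perf_memcpy_I2_l2x0.sh or perf_memcpy_I3.sh.
# Assumption: total transfer is fixed at 64 KB * 10000, as the usage text states.
TOTAL_KB=$((64 * 10000))

# Print the iteration count implied by a buffer size given in KB.
iterations_for() {
    buf_kb=$1
    # Reject empty or non-numeric input before doing arithmetic.
    case "$buf_kb" in
        ''|*[!0-9]*)
            echo "BUFFER_SIZE must be a positive integer (KB)" >&2
            return 1
            ;;
    esac
    # Reject zero to avoid a division by zero.
    if [ "$buf_kb" -eq 0 ]; then
        echo "BUFFER_SIZE must be greater than 0" >&2
        return 1
    fi
    # Integer division: any remainder is dropped in this sketch.
    echo $((TOTAL_KB / buf_kb))
}

iterations_for 32   # prints 20000, matching the examples above
iterations_for 64   # prints 10000
```

Running the helper before a profiling session gives a quick estimate of how many iterations to expect for a given BUFFER_SIZE.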