From 25153b3cbcf63f4a099b56f7a185ba43d4e2f503 Mon Sep 17 00:00:00 2001 From: eaw Date: Sat, 14 Mar 2026 10:07:28 +0000 Subject: [PATCH] Update README from eaw.app website content --- README.md | 242 +++++++++++++++++++++++++++++------------------------- 1 file changed, 129 insertions(+), 113 deletions(-) diff --git a/README.md b/README.md index a90174c..fab59ad 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,26 @@ +# ZPU + +**Website:** [engineers@work](https://eaw.app) | **Repository:** [git.eaw.app/eaw/ZPU](https://git.eaw.app/eaw/ZPU) + +--- +
The ZPU is a 32bit Stack based microprocessor and was originally designed by Øyvind Harboe from [Zylin AS](https://opensource.zylin.com/) and original documentation can be found on the [Zylin/OpenCore website or Wikipedia](https://en.wikipedia.org/wiki/ZPU_\(microprocessor\)). It is a microprocessor intended for FPGA embedded applications with minimal logic element and BRAM usage with the sacrifice of speed of execution. Zylin produced two designs which it made open source, namely the Small and Medium ZPU versions. Additional designs were produced by external developers such as the Flex and ZPUino variations, each offering enhancements to the original design such as Wishbone interface, performance etc. -This document describes another design which I like to deem as the ZPU Evo(lution) model whose focus is on *performance*, *connectivity* and *instruction expansion*. This came about as I needed a CPU for an emulator of a vintage computer i am writing which would act as the IO processor to provide Menu, Peripheral and SD services. +This document describes another design, which I call the ZPU Evo(lution) model, whose focus is on *performance*, *connectivity* and *instruction expansion*. It came about because I needed a CPU for an emulator of a vintage computer I am writing, where it acts as the IO processor providing Menu, Peripheral and SD services. -An example of the *performance* of the ZPU Evo can be seen using CoreMark which returns a value of 22.2 @ 100MHz on Altera fabric using BRAM and for Dhrystone 13.2DMIPS. Comparisons can be made with the original ZPU designs in the gallery below paying attention to the CoreMark score which seems to be the defacto standard now. *Connectivity* can be seen via implementation of both System and Wishbone buses, allowing for connection of many opensource IP devices.
*Instruction expansion* can be seen by the inclusion of a close coupled L1 cache where multiple instruction bytes are sourced and made available to the CPU which in turn can be used for optimization (ie. upto 5 IM instructions executed in 1 cycle) or for extended multi-byte instructions (ie. implementation of a LoaD Increment Repeat instruction). There is room for a lot more improvements such as stack cache, SDRAM to L2 burst mode, parallel instruction execution (ie. and + neqbranch) which are on my list. +An example of the *performance* of the ZPU Evo can be seen using CoreMark, which returns a score of 22.2 @ 100MHz on Altera fabric using BRAM, and Dhrystone, which returns 13.2 DMIPS. Comparisons can be made with the original ZPU designs in the gallery below, paying attention to the CoreMark score, which now seems to be the de facto standard. *Connectivity* can be seen via the implementation of both System and Wishbone buses, allowing many open-source IP devices to be connected. *Instruction expansion* can be seen by the inclusion of a closely coupled L1 cache, where multiple instruction bytes are sourced and made available to the CPU; these can be used for optimization (ie. up to 5 IM instructions executed in 1 cycle) or for extended multi-byte instructions (ie. implementation of a LoaD Increment Repeat instruction). There is room for many more improvements, such as a stack cache, SDRAM to L2 burst mode and parallel instruction execution (ie. and + neqbranch), which are on my list. -## The CPU +## The CPU -The ZPU Evo follows on from the ZPU Medium and Flex and areas of the code are similar, for example the instruction decoding. The design differs though due to caching and implementation of a Memory Transaction Processor where all Memory/IO operations (except for direct Instruction reads if dual-port instruction bus is enabled) are routed.
The original CPU's all handled their memory requirements in-situ or part of the state machine whereas the Evo submits a request to the MXP whenever a memory operation is required. +The ZPU Evo follows on from the ZPU Medium and Flex, and areas of the code, for example the instruction decoding, are similar. The design differs, though, through its caching and its Memory Transaction Processor (MXP), through which all Memory/IO operations are routed (except direct Instruction reads when the dual-port instruction bus is enabled). The original CPUs all handled their memory requirements in situ as part of the state machine, whereas the Evo submits a request to the MXP whenever a memory operation is required. The following sections indicate some of the features and changes to original ZPU designs. -### Bus structure +### Bus structure The ZPU has a linear address space with all memory and IO devices directly addressable within this space. Existing ZPU designs either provide a system bus or a wishbone bus whereas the Evo provides both. The ZPU Evo creates up to two distinct regions within the address space depending on configuration, to provide a *system bus* and a *wishbone bus*. @@ -24,17 +30,17 @@ If configured, a wishbone bus can be instantiated and this extends the maximum a A third bus can be configured, which is for instruction reads only. This bus typically shadows the system bus in memory region but is deemed to be connected to fast access memory for reading of instructions without the need for L2 Cache. This would typically be the 2nd port of a dual-port BRAM block with the 1st port connected to the system bus. -### L1 Cache +### L1 Cache In order to gain performance but more especially for instruction optimisations and extended instructions, an L1 cache is implemented using registers.
Using registers consumes fabric space so should be very small but it allows random access in a single cycle which is needed for example if compacting a 32bit IM load (which can be 5 instructions) into a single cycle. Also for extended instructions, the first byte indicates an extended instruction and the following 1-5 bytes defines the instruction which is then executed in a single cycle. -### L2 Cache +### L2 Cache -Internal BRAM (on-board Block RAM within the FPGA) doesn't need an L2 Cache as it's access time is 1-2 cycles. As BRAM is a limited resource it is assumed external RAM or SDRAM will be used which is much slower and this needs to be cached to increase throughput. The L2 Cache is used for this purpose, to read ahead a block of external RAM and feed the L1 Cache as needed. On analysis, the C programs generated by GCC are typically loops and calls within a local area (unless using large libraries), so implementing a simple direct mapping cache between external RAM and BRAM (used for the L2 Cache) indexed relative to the Program Counter is sufficient to keep the CPU from stalling most of the time. +Internal BRAM (on-board Block RAM within the FPGA) doesn't need an L2 Cache as its access time is 1-2 cycles. As BRAM is a limited resource, it is assumed that external RAM or SDRAM will be used; this is much slower and needs to be cached to increase throughput. The L2 Cache serves this purpose, reading ahead a block of external RAM and feeding the L1 Cache as needed. On analysis, the C programs generated by GCC typically consist of loops and calls within a local area (unless large libraries are used), so a simple direct-mapped cache between external RAM and BRAM (used for the L2 Cache), indexed relative to the Program Counter, is sufficient to keep the CPU from stalling most of the time.
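The direct-mapped lookup described above can be sketched as follows. This is a generic illustration of the technique, not the Evo's RTL: the line size and line count (`LINE_BITS`, `INDEX_BITS`) and the `split_addr` helper are hypothetical values chosen for the example.

```shell
# Generic direct-mapped cache address split (hypothetical geometry, not the Evo RTL).
LINE_BITS=4     # hypothetical: 16-byte cache lines
INDEX_BITS=10   # hypothetical: 1024 lines, i.e. a 16KB cache

# split_addr ADDR -> prints "tag index offset" for a byte address such as the PC.
split_addr() {
    addr=$(( $1 ))
    # Byte offset within the cache line.
    offset=$(( $addr & ((1 << $LINE_BITS) - 1) ))
    # Line index: the bits directly above the offset select exactly one line.
    index=$(( ($addr >> $LINE_BITS) & ((1 << $INDEX_BITS) - 1) ))
    # Tag: the remaining upper bits, stored with the line to detect a hit.
    tag=$(( $addr >> ($LINE_BITS + $INDEX_BITS) ))
    echo "$tag $index $offset"
}

split_addr 0x123456   # prints "72 837 6"
split_addr 0x127456   # prints "73 837 6": same line, different tag
```

Because each address maps to exactly one line, two addresses 16KB apart in this geometry evict each other. Tight, local loops of the kind GCC typically produces rarely hit that case, which is the reasoning behind a simple direct-mapped scheme.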
-### Instruction Set +### Instruction Set -A feature of the ZPU is it's use of a minimal fixed set of hardware implemented instructions and a soft set of additional instructions which are implemented in pseudo micro-code (ie. the fixed set of instructions). This is achieved by 32byte vectors in the region 0x0000 - 0x0400 and each soft instruction branches to the vector if it is not implemented in hardware. The benefit is reduced FPGA resources but the penalty is performance. +A feature of the ZPU is its use of a minimal fixed set of hardware-implemented instructions plus a soft set of additional instructions implemented in pseudo micro-code (ie. sequences of the fixed set of instructions). This is achieved with 32-byte vectors in the region 0x0000 - 0x0400; each soft instruction branches to its vector if it is not implemented in hardware. The benefit is reduced FPGA resources, but the penalty is performance. The ZPU Evo implements all instructions in hardware but this can be adjusted in the configuration to use soft instructions if required in order to conserve FPGA resources. This allows for a balance of resources versus performance. Ultimately though, if resources are tight then the use of the Small/Flex ZPU models may be a better choice. @@ -51,7 +57,7 @@ Where ParamSize = Some extended instructions are under development (ie. LDIR) an exact opcode value and extended instruction set has not yet been fully defined. The GNU AS assembler will be updated with these instructions so they can be invoked within a C program and eventually if they have benefit to C will be migrated into the GCC compiler (ie. ADD32/DIV32/MULT32/LDIR/LDDR as from what I have seen, these will have a big impact on CoreMark/Dhrystone tests).
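As a worked example of the soft-instruction vectors, the sketch below computes the vector address for an emulated opcode. It assumes the original ZPU EMULATE encoding (`001xxxxx`, branching to `32 * xxxxx`). This is consistent with 32 vectors of 32 bytes filling 0x0000 - 0x0400. `emulate_vector` is a hypothetical helper, not part of the ZPU tool chain.

```shell
# Hypothetical helper, assuming the original ZPU EMULATE encoding 001xxxxx.
# emulate_vector OPCODE -> prints the address of the 32-byte vector for OPCODE.
emulate_vector() {
    opcode=$(( $1 ))
    # EMULATE opcodes have 001 as their top three bits.
    if [ $(( $opcode & 0xE0 )) -ne 32 ]; then
        echo "not an EMULATE opcode: $1" >&2
        return 1
    fi
    # The low 5 bits select one of 32 vectors, each 32 bytes long: 0x0000..0x03E0.
    printf '0x%04X\n' $(( ($opcode & 0x1F) * 32 ))
}

emulate_vector 0x22   # prints 0x0040
emulate_vector 0x3F   # prints 0x03E0, the last vector below 0x0400
```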
-### Implemented Instruction Set +### Implemented Instruction Set | Name | Opcode | Description | | ---------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -109,17 +115,17 @@ Some extended instructions are under development (ie. LDIR) an exact opcode valu *\** = Emulated instruction if not implemented in hardware.
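The sketch below shows why a 32-bit IM load can take 5 instructions, which the L1 cache allows to be executed in a single cycle. It encodes a constant as five IM opcodes and then decodes it again. It assumes the original ZPU IM semantics: opcode `1xxxxxxx`, where the first IM pushes its sign-extended 7-bit field and each following IM shifts the top of stack left by 7 and ORs in its field. `im_encode` and `im_decode` are hypothetical helpers.

```shell
# Hypothetical helpers illustrating the original ZPU IM encoding (not the Evo RTL).

# im_encode VALUE -> prints five IM opcodes that load the 32-bit VALUE.
im_encode() {
    value=$(( $1 & 0xFFFFFFFF ))
    # Payload bits: 4 + 7 + 7 + 7 + 7 = 32. The first IM carries bits 31..28;
    # its field bits 4..6 would land above bit 31 after four 7-bit shifts, so they stay 0.
    printf '0x%02X 0x%02X 0x%02X 0x%02X 0x%02X\n' \
        $(( 0x80 | (($value >> 28) & 0x0F) )) \
        $(( 0x80 | (($value >> 21) & 0x7F) )) \
        $(( 0x80 | (($value >> 14) & 0x7F) )) \
        $(( 0x80 | (($value >> 7) & 0x7F) )) \
        $(( 0x80 | ($value & 0x7F) ))
}

# im_decode OPCODE... -> prints the 32-bit top of stack after executing the IMs.
im_decode() {
    tos=0
    first=1
    for op in "$@"; do
        field=$(( $op & 0x7F ))
        if [ "$first" -eq 1 ]; then
            # First IM: sign-extend bit 6 of the field to 32 bits.
            if [ "$field" -ge 64 ]; then
                tos=$(( $field | 0xFFFFFF80 ))
            else
                tos=$field
            fi
            first=0
        else
            # Following IMs: shift left 7 bits and append the field, kept to 32 bits.
            tos=$(( (($tos << 7) | $field) & 0xFFFFFFFF ))
        fi
    done
    printf '0x%08X\n' "$tos"
}

im_encode 0x12345678               # prints 0x81 0x91 0xD1 0xAC 0xF8
im_decode $(im_encode 0x12345678)  # prints 0x12345678
```

Smaller constants can be loaded with fewer IMs. Always emitting five keeps the sketch simple and matches the worst case that the L1 cache compaction targets.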
-### Implemented Instructions Comparison Table +### Implemented Instructions Comparison Table ![alt text](../images/ImplInstructions.png) -### Hardware Variable Byte Write +### Hardware Variable Byte Write In the original ZPU designs there was scope but not the implementation to allow the ZPU to perform byte/half-word/full-word writes. Either the CPU always had to perform 32bit Word aligned operations or it performed the operation in micro-code. In the Evo, hardware was implemented (build time selectable) to allow Byte and Half-Word writes and also hardware Read-Update-Write operations. If the hardware Byte/Half-Word logic is not enabled then it falls back to the 32bit Word Read-Update-Write logic. Both methods have performance benefits, the latter taking 3 cycles longer. -### Hardware Debug Serializer +### Hardware Debug Serializer In order to debug the CPU or just provide low level internal operating information, a cached UART debug module is implemented. Currently this is only for output but has the intention to be tied into the IOCP for in-situ debugging when Simulation/Signal-Tap is not available. @@ -131,16 +137,16 @@ Embedded within the CPU RTL are selectable level triggered statements which issu All critical information such as current instruction being executed (or not if stalled), Signals/Flags, L1/L2 Cache contents and Memory contents can be output. -### Timing Constraints +### Timing Constraints This is a work in progress, I am slowly updating the design and/or adding constraints such that timing is fully met. Currently there is negative slack at 100MHz albeit the design fully works, this will in the future be corrected so timing as analyzed by TimeQuest will be met. -## System On a Chip +## System On a Chip In order to provide a working framework in which the ZPU Evo could be used, a System On a Chip wrapper was created which allows for the instantiation of various devices (ie. UART/SD card). 
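The word-level Read-Update-Write fallback described under Hardware Variable Byte Write reads the aligned 32-bit word, replaces one byte lane, and then writes the word back. Below is a minimal sketch of the merge step. It assumes big-endian byte lanes as on the ZPU, where byte address 0 maps to bits 31..24. `rmw_byte` is a hypothetical helper, not part of the RTL.

```shell
# Hypothetical helper sketching the merge step of a word Read-Update-Write byte store.
# Assumes big-endian byte lanes: byte address 0 of a word occupies bits 31..24.
# rmw_byte WORD ADDR BYTE -> prints the 32-bit word to write back.
rmw_byte() {
    word=$(( $1 )); addr=$(( $2 )); byte=$(( $3 & 0xFF ))
    # Bit position of the addressed byte lane within the aligned word.
    lane_shift=$(( (3 - ($addr & 3)) * 8 ))
    lane_mask=$(( 0xFF << $lane_shift ))
    # Clear the lane in the word that was read, then OR in the new byte.
    new_word=$(( ($word & ~$lane_mask) | ($byte << $lane_shift) ))
    printf '0x%08X\n' "$new_word"
}

rmw_byte 0x11223344 0x1001 0xAA   # prints 0x11AA3344
rmw_byte 0x11223344 0x1000 0xAA   # prints 0xAA223344
```

The fallback needs this extra read and merge. That is consistent with it taking longer than the dedicated Byte/Half-Word hardware.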
-As part of the development, the ZPU Small/Medium/Flex models were incorporated into the framework allowing the choice of CPU when fabric space is at a premium or comparing CPU's, albeit features such as Wishbone are not available on the original ZPU models. I didn't include the ZPUino as this design already has a very good eco system or the ZY2000. +As part of the development, the ZPU Small/Medium/Flex models were incorporated into the framework, allowing a choice of CPU when fabric space is at a premium or when comparing CPUs, although features such as Wishbone are not available on the original ZPU models. I didn't include the ZPUino, as that design already has a very good ecosystem, or the ZY2000. The SoC currently implements (in the build tree): @@ -150,7 +156,8 @@ The SoC currently implements (in the build tree): | Wishbone Bus | Yes | 32 bit Wishbone bus. | | (SB) BRAM | Yes | Implement a configurable block of BRAM as the boot loader and stack. | | Instruction Bus BRAM | Yes | Enable a separate bus (or Dual-Port) to the boot code implemented in BRAM. This is generally a dual-port BRAM shared with the Sysbus BRAM but can be independent. | -| (SB) RAM | Yes | Implement a block of BRAM as RAM, seperate from the BRAM used for the boot loader/stack. | +| (SB) RAM | Yes | Implement a block of BRAM as RAM, separate from the BRAM used for the boot loader/stack. | +| (SB) SDRAM | Yes | Implement an SDRAM controller on the system bus. | | (WB) SDRAM | Yes | Implement an SDRAM controller over the Wishbone bus. | | (WB) RAM | Yes | Implement a block of BRAM as RAM over the Wishbone bus. | | (WB) I2C | Yes | Implements an I2C Controller over the Wishbone bus. | @@ -162,15 +169,16 @@ The SoC currently implements (in the build tree): | (SB) PS2 | Yes | A PS2 Keyboard and Mouse controller. | | (SB) SPI | Yes | A configurable number of Serial Peripheral Interface controllers. | | (SB) SD | Yes | A configurable number of hardware based SPI SD controllers.
| +| (SB) IOCTL | Yes | An IOCTL bus controller for MiSTer HPS-to-FPGA communication (download/upload of data between the ARM HPS and the FPGA fabric). | | (SB) SOCCFG | Yes | A set of registers to indicate configuration of the ZPU and SoC to the controlling program. | Within the SoC configuration, items such as starting Stack Address, Reset Vector, IO Start/End (SB) and (WB) can be specified. With the addition of the wishbone bus, it is very easy to add further opencore IP devices, for the system bus some work may be needed as the opencore IP devices use differing signals. -### SDRAM +### SDRAM +The SoC supports SDRAM via two independent paths: a system bus (SB) controller (`SOC_IMPL_SDRAM`) and a Wishbone bus (WB) controller (`SOC_IMPL_WB_SDRAM`). Both are disabled by default and can be enabled independently. The SDRAM timing parameters (tRCD, tRP, tRFC, tREF) and geometry (rows, columns, banks, data width) are all configurable in `zpu_soc_pkg.vhd`. The WB SDRAM controller is a cached variant supporting burst access and is the recommended choice when both buses are active and external RAM bandwidth is critical. - -## Software +## Software The software provided includes: @@ -183,24 +191,24 @@ The software provided includes: 21/04/2020: Software for the ZPU has now been merged with the tranZPUter and is kept and maintained in the [zSoft](/zsoft) repository. -## Configuration +## Configuration -This section shows how to configure the ZPU and the SoC, either to use the ZPU seperately or as part of the included SoC. +This section shows how to configure the ZPU and the SoC, either to use the ZPU separately or as part of the included SoC. -### Configure the CPU +### Configure the CPU
The CPU is configurable using the configuration file 'cpu/zpu_pkg.vhd'. It generally specifies the size of the address bus and what hardware features should be enabled. The following table outlines the configurable options.   | Configuration Variable | Model | Values | Description | | ------------------------ | ----- | ------------ | ---------------------------------------------------------------------------| -| EVO_USE_INSN_BUS | Evo | true/false | Use a seperate instruction bus to connect to the BRAM memory. All other Memory and I/O operations will go over the normal bus. This option is primarily used with Dual Port BRAM, one side connected to the Instruction Bus the other side to the standard bus and will give a significant performance boost when the executed code is in this memory. | +| EVO_USE_INSN_BUS | Evo | true/false | Use a separate instruction bus to connect to the BRAM memory. All other Memory and I/O operations will go over the normal bus. This option is primarily used with Dual Port BRAM, one side connected to the Instruction Bus the other side to the standard bus and will give a significant performance boost when the executed code is in this memory. | | EVO_USE_HW_BYTE_WRITE | Evo | true/false | This option implements hardware writing of bytes, reads are always 32bit and aligned. | | EVO_USE_HW_WORD_WRITE | Evo | true/false | This option implements hardware writing of 16bit words, reads are always 32bit and aligned. | | EVO_USE_WB_BUS | Evo | true/false | Implement the wishbone interface in addition to the system bus. | | DEBUG_CPU | All | true/false | Enable CPU debugging output. This generally consists of core data being serialised and output via the UART1 TX. There are pre-defined blocks of debug data (debug level) for output but it is easy to add in another if your targetting a specific CPU area/instruction. | | DEBUG_LEVEL | All | 0 to 5 | Level of debugging output. 
0 = Basic, such as Breakpoint, 1 =+ Executing Instructions, 2 =+ L1 Cache contents, 3 =+ L2 Cache contents, 4 =+ Memory contents, 5=+ Everything else. | -| DEBUG_MAX_TX_FIFO_BITS | All | 2 .. ~16 | Size of UART TX Fifo for debug output. One point to note, if too much data is output and the output Baud rate too low, the CPU will wait so cache size is irrelevant. Cache is only useful if outputting small amounts of data (ie. a targetted instruction) where the cache never becomes full and the CPU doesnt need to wait. | +| DEBUG_MAX_TX_FIFO_BITS | All | 2 .. ~16 | Size of the UART TX FIFO for debug output. Note that if too much data is output and the Baud rate is too low, the CPU will wait, so the FIFO size is irrelevant. The FIFO is only useful when outputting small amounts of data (ie. a targeted instruction), where it never becomes full and the CPU doesn't need to wait. | | DEBUG_MAX_FIFO_BITS | All | 2 .. ~16 | Size of debug output data records fifo. Each request to output data via the serialiser is made via debug records which consume memory, the more records available the less chance of the CPU stalling. | | DEBUG_TX_BAUD_RATE | All | Any Baud integer value | This option sets the output Baud rate of the debug serializer transmitter, ie. 115200 | | maxAddrBit | All | \<16..31n> + WB_ACTIVE | This option sets the width of the address bus. WB_ACTIVE adds 1 to the width of the bus if the WishBone bus is enabled as the wishbone bus operates in the top half of the addressable memory area. | @@ -241,7 +249,7 @@ This section shows how to configure the ZPU and the SoC, either to use the ZPU s
-### Configure the SoC +### Configure the SoC
The System on a Chip is configurable using the configuration file 'zpu_soc_pkg.vhd'. The following table outlines the options which can be configured to adapt the SoC to a specific application. @@ -257,19 +265,19 @@ This section shows how to configure the ZPU and the SoC, either to use the ZPU s | ZPU_EVO_MINIMAL | \<0 or 1\> | Select the Minimalist EVOLUTION CPU, which is the EVO CPU with all configurable options disabled using less fabric. |
- : The following options set the frequencies for the various boards. Normally these dont need changing, add additional constants if using a different board to those defined and add in your _Topleavel.vhd file. NB. This option only changes logic dependent on frequency, it doesnt change the PLL which needs to be done seperately in HDL. +  : The following options set the frequencies for the various boards. Normally these don't need changing; if you are using a board other than those defined, add additional constants and add them in your _Toplevel.vhd file. NB. This option only changes logic that depends on the frequency; it doesn't change the PLL, which must be set up separately in HDL.   | Configuration Variable | Values | Description | | ------------------------ | ------------ | --------------------------------------------------------------------------- | -| SYSCLK_E115_FREQ | \ | Set the frequency for the E115 FPGA Board. | -| SYSCLK_QMV_FREQ | \ | Set the frequency for the QMTECH Cyclone V FPGA Board. | -| SYSCLK_DE0_FREQ | \ | Set the frequency for the DE0-Nano FPGA Board. | -| SYSCLK_DE10_FREQ | \ | Set the frequency for the DE10-Nano FPGA Board. | -| SYSCLK_CYC1000_FREQ | \ | Set the frequency for the Trenz CYC1000 FPGA Board. | -| SYSTEM_FREQUENCY | 100000000 | Default system clock frequency if not overriden by the above values in the top level. | +| SYSCLK_E115_FREQ | 75000000 (default) | Set the frequency for the E115 FPGA Board. | +| SYSCLK_QMV_FREQ | 75000000 (default) | Set the frequency for the QMTECH Cyclone V FPGA Board. | +| SYSCLK_DE0_FREQ | 100000000 (default) | Set the frequency for the DE0-Nano FPGA Board. | +| SYSCLK_DE10_FREQ | 100000000 (default) | Set the frequency for the DE10-Nano FPGA Board. | +| SYSCLK_CYC1000_FREQ | 100000000 (default) | Set the frequency for the Trenz CYC1000 FPGA Board. | +| SYSTEM_FREQUENCY | 75000000 | Default system clock frequency if not overridden by the above values in the top level. |
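A UART baud-rate divisor is one example of the frequency-dependent logic that these constants feed. The `baud_divisor` helper below is hypothetical, and the SoC's UART may compute its divisor differently. It only shows why the constant must match the clock that the PLL actually generates.

```shell
# Hypothetical helper: derive a UART divisor from a system clock constant.
# baud_divisor SYSCLK_HZ BAUD -> prints the divisor rounded to the nearest integer.
baud_divisor() {
    sysclk=$1; baud=$2
    echo $(( ($sysclk + $baud / 2) / $baud ))
}

baud_divisor 75000000 115200    # prints 651
baud_divisor 100000000 115200   # prints 868
```

Suppose SYSCLK_QMV_FREQ were left at 75000000 while the PLL actually produced 100MHz. A divisor of 651 would then give roughly 153.6 kBaud instead of 115.2 kBaud.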
- : Set the ID's for the various ZPU models. The format is 2 bytes, MSB=\, LSB=\. This is only necessary if your making a different version and you need to detect in your software. +  : Set the IDs for the various ZPU models. The format is 2 bytes, MSB=\, LSB=\. This is only necessary if you're making a different version and need to detect it in your software.   | Configuration Variable | Values | Description | | ------------------------ | ------------ | --------------------------------------------------------------------------- | @@ -329,7 +337,7 @@ This section shows how to configure the ZPU and the SoC, either to use the ZPU s   | Configuration Variable | Values | Description | | ------------------------ | ------------ | --------------------------------------------------------------------------- | -| SOC_IMPL_RAM | true/false | Implement RAM using BRAM, typically for Application programs seperate to BIOS. | +| SOC_IMPL_RAM | true/false | Implement RAM using BRAM, typically for Application programs, separate from the BIOS. | | SOC_MAX_ADDR_RAM_BIT | \, ie.14 | Max address bit of the System RAM. | | SOC_ADDR_RAM_START | \, ie.32768 | Start address of RAM. | @@ -366,7 +374,7 @@ This section shows how to configure the ZPU and the SoC, either to use the ZPU s | SOC_RESET_ADDR_CPU | \ | Initial address to start execution from after reset. This is normally set as the start of BRAM, ie. SOC_ADDR_BRAM_START | | SOC_START_ADDR_MEM | \ | Start location of program memory (BRAM/ROM/RAM). This is normally set as the start of BRAM, ie. SOC_ADDR_BRAM_START | | SOC_STACK_ADDR | \ | Stack start address (BRAM/RAM). This is normally set as the top of the BRAM less 2 words, ie. SOC_ADDR_BRAM_END - 8 | -| SOC_ADDR_IO_START | \ | Start address of the Evo system bus IO region. This is normally via the forumula: '2^(maxAddrBit-WB_ACTIVE)) - (2^maxIOBit)' which sets the address space based on the address bus width and wether the wishbone bus is implemented.
|| +| SOC_ADDR_IO_START | \ | Start address of the Evo system bus IO region. This is normally set via the formula '(2^(maxAddrBit-WB_ACTIVE)) - (2^maxIOBit)', which sets the address space based on the address bus width and whether the Wishbone bus is implemented. | | SOC_ADDR_IO_END | \ | End address of the Evo system bus IO region. This is normally via the formula: (2^(maxAddrBit-WB_ACTIVE)) - 1 | | SOC_WB_IO_START | \, ie. 32505856 | Start address of the Wishbone bus IO range. | | SOC_WB_IO_END | \, ie. 33554431 | End address of the Wishbone bus IO range. | @@ -411,13 +419,13 @@ This section shows how to configure the ZPU and the SoC, either to use the ZPU s
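Below is a worked example of the SOC_ADDR_IO_START and SOC_ADDR_IO_END formulas, using shell arithmetic. The values are illustrative assumptions: maxAddrBit = 25 (24 address bits plus WB_ACTIVE = 1) and maxIOBit = 20. These were chosen to be consistent with the example Wishbone IO range of 32505856 to 33554431 (0x1F00000 to 0x1FFFFFF), which is the top 1MB of a 32MB space. Substitute the values from your own configuration.

```shell
# Illustrative values only; substitute those from your own configuration.
MAX_ADDR_BIT=25   # address bus width, including WB_ACTIVE
WB_ACTIVE=1       # 1 when the Wishbone bus is enabled, otherwise 0
MAX_IO_BIT=20     # the system bus IO region spans 2^MAX_IO_BIT bytes

# Top of the system bus region, 2^(maxAddrBit-WB_ACTIVE), computed with a left shift.
SB_TOP=$(( 1 << ($MAX_ADDR_BIT - $WB_ACTIVE) ))
# SOC_ADDR_IO_START = (2^(maxAddrBit-WB_ACTIVE)) - (2^maxIOBit)
IO_START=$(( $SB_TOP - (1 << $MAX_IO_BIT) ))
# SOC_ADDR_IO_END = (2^(maxAddrBit-WB_ACTIVE)) - 1
IO_END=$(( $SB_TOP - 1 ))

printf 'SOC_ADDR_IO_START = %d (0x%X)\n' "$IO_START" "$IO_START"
printf 'SOC_ADDR_IO_END   = %d (0x%X)\n' "$IO_END" "$IO_END"
```

With these values, the system bus IO region occupies 0xF00000 to 0xFFFFFF, the top 1MB of the lower 16MB. The Wishbone bus occupies the upper half of the address space.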
-## Build +## Build -This section shows how to make a basic build and assumes the target development board is the [QMTECH Cyclone V board](https://github.com/ChinaQMTECH/QM_CYCLONE_V). There are many configuration options but these will be covered seperately. +This section shows how to make a basic build and assumes the target development board is the [QMTECH Cyclone V board](https://github.com/ChinaQMTECH/QM_CYCLONE_V). There are many configuration options but these will be covered separately.
-### Software build +### Software build Jenkins can be used to automate the build but for simple get up and go compilation use the build.sh and hierarchical Makefile system following the basic instructions here. @@ -426,9 +434,9 @@ Jenkins can be used to automate the build but for simple get up and go compilati ```shell export PATH=$PATH:/opt/zpu/bin ``` -3. Clone the [ZPU Evo](https://github.com/pdsmart/zpu) repository -4. Edit the \/software/zputa/zputa.h file and select which functions you want building into the zputa core image (by default, all functions are built as applets but these will be ignored if they are built into the zputa core image). You select a function by setting the BUILTIN_ to '1', set to '0' if you dont want it built in. -5. Decide which memory map you want and wether ZPUTA will be an application or bootloader (for your own applications, it is they same kind of choice), see build.sh in the table below for options. Once decided, issue the build command. +3. Clone the [ZPU Evo](https://git.eaw.app/eaw/zpu) repository +4. Edit the \/software/zputa/zputa.h file and select which functions you want built into the zputa core image (by default, all functions are built as applets but these will be ignored if they are built into the zputa core image). You select a function by setting the BUILTIN_ to '1', or to '0' if you don't want it built in. +5. Decide which memory map you want and whether ZPUTA will be an application or bootloader (for your own applications, it is the same kind of choice); see build.sh in the table below for options. Once decided, issue the build command. ```shell cd /software # For this build we have chosen a Tiny IOCP Bootloader, building ZPUTA as an @@ -448,7 +456,7 @@ Jenkins can be used to automate the build but for simple get up and go compilati
-### RTL Bit Stream build +### RTL Bit Stream build To build the FPGA bit stream (conversion of HDL into a configuration map for the FPGA), there are two methods: @@ -467,7 +475,7 @@ To build the FPGA bit stream (conversion of HDL into a configuration map for the
-### ZPU Small Build +### ZPU Small Build The ZPU Small CPU can be built by changing the configuration as follows: @@ -484,13 +492,13 @@ Edit: zpu_soc_pkg.vhd constant ZPU_EVO : integer := 0; -- Use the EVOLUTION CPU. constant ZPU_EVO_MINIMAL : integer := 0; -- Use the Minimalist EVOLUTION CPU. -2. Disable WishBone devices as the ZPU Small doesnt support the wishbone interface: +2. Disable WishBone devices, as the ZPU Small doesn't support the wishbone interface: constant SOC_IMPL_WB_I2C : boolean := false; -- Implement I2C over wishbone interface. constant SOC_IMPL_WB_SDRAM : boolean := false; -- Implement SDRAM over wishbone interface. -3. Disable any other devices you dont need, such as PS2 by setting the flag to false. +3. Disable any other devices you don't need, such as PS2, by setting the flag to false. -4. If your using a frequency other than 100MHz as your main clock, enter it in the table +4. If you're using a main clock frequency other than the default for your board, enter it in the table below against your board. If you are using a different board, add a constant with suitable name and use this in your TopLevel (ie. as per E115_zpu_Toplevel.vhd). NB. If using your own board it is still imperative that you setup a PLL correctly to @@ -499,8 +507,8 @@ Edit: zpu_soc_pkg.vhd
-- - constant SYSCLK_E115_FREQ : integer := 100000000; -- E115 FPGA Board - constant SYSCLK_QMV_FREQ : integer := 100000000; -- QMTECH Cyclone V FPGA Board + constant SYSCLK_E115_FREQ : integer := 75000000; -- E115 FPGA Board + constant SYSCLK_QMV_FREQ : integer := 75000000; -- QMTECH Cyclone V FPGA Board constant SYSCLK_DE0_FREQ : integer := 100000000; -- DE0-Nano FPGA Board constant SYSCLK_DE10_FREQ : integer := 100000000; -- DE10-Nano FPGA Board constant SYSCLK_CYC1000_FREQ : integer := 100000000; -- Trenz CYC1000 FPGA Board @@ -520,11 +528,16 @@ Edit: cpu/zpu_pkg.vhd constant DEBUG_TX_BAUD_RATE : integer := 115200; --230400; -- Baud rate for the debug transmitter ```` -Using Quartus Prime following the 'RTL Bit Stream build' above, build the RTL in the usual manner with this new configuration. You cannot use the Makefile build as it will entail Makefile changes so just use the Quartus Prime GUI at this time.

The software is the same and unless you have less memory, no changes need to be made to the software build.
+Using Quartus Prime following the 'RTL Bit Stream build' above, build the RTL in the usual manner with this new configuration. Alternatively, use the Makefile build system; for example, to build the Small CPU for the QMV board: +```shell + cd /build + make QMV_SMALL +``` +The software is the same and, unless your target has less memory, no changes need to be made to the software build.

-### ZPU Medium Build +### ZPU Medium Build The ZPU Medium CPU can be built by changing the configuration as follows: @@ -541,13 +554,13 @@ Edit: zpu_soc_pkg.vhd constant ZPU_EVO : integer := 0; -- Use the EVOLUTION CPU. constant ZPU_EVO_MINIMAL : integer := 0; -- Use the Minimalist EVOLUTION CPU. -2. Disable WishBone devices as the ZPU Medium doesnt support the wishbone interface: +2. Disable WishBone devices, as the ZPU Medium doesn't support the wishbone interface: constant SOC_IMPL_WB_I2C : boolean := false; -- Implement I2C over wishbone interface. constant SOC_IMPL_WB_SDRAM : boolean := false; -- Implement SDRAM over wishbone interface. -3. Disable any other devices you dont need, such as PS2 by setting the flag to false. +3. Disable any other devices you don't need, such as PS2, by setting the flag to false. -4. If your using a frequency other than 100MHz as your main clock, enter it in the table +4. If you're using a main clock frequency other than the default for your board, enter it in the table below against your board. If you are using a different board, add a constant with suitable name and use this in your TopLevel (ie. as per E115_zpu_Toplevel.vhd). NB. If using your own board it is still imperative that you setup a PLL correctly to @@ -556,8 +569,8 @@ Edit: zpu_soc_pkg.vhd
-- - constant SYSCLK_E115_FREQ : integer := 100000000; -- E115 FPGA Board - constant SYSCLK_QMV_FREQ : integer := 100000000; -- QMTECH Cyclone V FPGA Board + constant SYSCLK_E115_FREQ : integer := 75000000; -- E115 FPGA Board + constant SYSCLK_QMV_FREQ : integer := 75000000; -- QMTECH Cyclone V FPGA Board constant SYSCLK_DE0_FREQ : integer := 100000000; -- DE0-Nano FPGA Board constant SYSCLK_DE10_FREQ : integer := 100000000; -- DE10-Nano FPGA Board constant SYSCLK_CYC1000_FREQ : integer := 100000000; -- Trenz CYC1000 FPGA Board @@ -577,11 +590,16 @@ Edit: cpu/zpu_pkg.vhd constant DEBUG_TX_BAUD_RATE : integer := 115200; --230400; -- Baud rate for the debug transmitter ```` -Using Quartus Prime following the 'RTL Bit Stream build' above, build the RTL in the usual manner with this new configuration. You cannot use the Makefile build as it will entail Makefile changes so just use the Quartus Prime GUI at this time.

The software is the same and unless you have less memory, no changes need to be made to the software build.
+Using Quartus Prime and following the 'RTL Bit Stream build' section above, build the RTL in the usual manner with this new configuration. Alternatively, use the Makefile build system; for example, to build the Medium CPU for the QMV board: +```shell + cd /build + make QMV_MEDIUM +``` +The software is the same; unless you have less memory, no changes need to be made to the software build.

-### ZPU Flex Build +### ZPU Flex Build The ZPU Flex CPU can be built by changing the configuration as follows: @@ -598,13 +616,13 @@ Edit: zpu_soc_pkg.vhd constant ZPU_EVO : integer := 0; -- Use the EVOLUTION CPU. constant ZPU_EVO_MINIMAL : integer := 0; -- Use the Minimalist EVOLUTION CPU. -2. Disable WishBone devices as the ZPU Flex doesnt support the wishbone interface: +2. Disable WishBone devices as the ZPU Flex doesn't support the wishbone interface: constant SOC_IMPL_WB_I2C : boolean := false; -- Implement I2C over wishbone interface. constant SOC_IMPL_WB_SDRAM : boolean := false; -- Implement SDRAM over wishbone interface. -3. Disable any other devices you dont need, such as PS2 by setting the flag to false. +3. Disable any other devices you don't need, such as PS2 by setting the flag to false. -4. If your using a frequency other than 100MHz as your main clock, enter it in the table +4. If you're using a frequency other than 100MHz as your main clock, enter it in the table below against your board. If you are using a different board, add a constant with suitable name and use this in your TopLevel (ie. as per E115_zpu_Toplevel.vhd). NB. If using your own board it is still imperative that you setup a PLL correctly to @@ -613,8 +631,8 @@ Edit: zpu_soc_pkg.vhd -- Frequencies for the various boards. 
-- - constant SYSCLK_E115_FREQ : integer := 100000000; -- E115 FPGA Board - constant SYSCLK_QMV_FREQ : integer := 100000000; -- QMTECH Cyclone V FPGA Board + constant SYSCLK_E115_FREQ : integer := 75000000; -- E115 FPGA Board + constant SYSCLK_QMV_FREQ : integer := 75000000; -- QMTECH Cyclone V FPGA Board constant SYSCLK_DE0_FREQ : integer := 100000000; -- DE0-Nano FPGA Board constant SYSCLK_DE10_FREQ : integer := 100000000; -- DE10-Nano FPGA Board constant SYSCLK_CYC1000_FREQ : integer := 100000000; -- Trenz CYC1000 FPGA Board @@ -634,11 +652,16 @@ Edit: cpu/zpu_pkg.vhd constant DEBUG_TX_BAUD_RATE : integer := 115200; --230400; -- Baud rate for the debug transmitter ```` -Using Quartus Prime following the 'RTL Bit Stream build' above, build the RTL in the usual manner with this new configuration. You cannot use the Makefile build as it will entail Makefile changes so just use the Quartus Prime GUI at this time.

The software is the same and unless you have less memory, no changes need to be made to the software build.
+Using Quartus Prime and following the 'RTL Bit Stream build' section above, build the RTL in the usual manner with this new configuration. Alternatively, use the Makefile build system; for example, to build the Flex CPU for the QMV board: +```shell + cd /build + make QMV_FLEX +``` +The software is the same; unless you have less memory, no changes need to be made to the software build.

-### ZPU Evo Build +### ZPU Evo Build The ZPU Evo has 2 pre-defined versions, the same CPU using different settings. These are the EVO and 'EVO MINIMAL'. The latter implements most of its instructions in micro-code like the ZPU Small. Assuming we are building the EVO without the WishBone interface, change the configuration as follows: @@ -656,13 +679,13 @@ Edit: zpu_soc_pkg.vhd constant ZPU_EVO : integer := 1; -- Use the EVOLUTION CPU. constant ZPU_EVO_MINIMAL : integer := 0; -- Use the Minimalist EVOLUTION CPU. -2. Disable WishBone devices as we arent using the wishbone interface: +2. Disable WishBone devices as we aren't using the wishbone interface: constant SOC_IMPL_WB_I2C : boolean := false; -- Implement I2C over wishbone interface. constant SOC_IMPL_WB_SDRAM : boolean := false; -- Implement SDRAM over wishbone interface. -3. Disable any other devices you dont need, such as PS2 by setting the flag to false. +3. Disable any other devices you don't need, such as PS2 by setting the flag to false. -4. If your using a frequency other than 100MHz as your main clock, enter it in the table +4. If you're using a frequency other than 100MHz as your main clock, enter it in the table below against your board. If you are using a different board, add a constant with suitable name and use this in your TopLevel (ie. as per E115_zpu_Toplevel.vhd). NB. If using your own board it is still imperative that you setup a PLL correctly to @@ -671,8 +694,8 @@ Edit: zpu_soc_pkg.vhd -- Frequencies for the various boards. 
-- - constant SYSCLK_E115_FREQ : integer := 100000000; -- E115 FPGA Board - constant SYSCLK_QMV_FREQ : integer := 100000000; -- QMTECH Cyclone V FPGA Board + constant SYSCLK_E115_FREQ : integer := 75000000; -- E115 FPGA Board + constant SYSCLK_QMV_FREQ : integer := 75000000; -- QMTECH Cyclone V FPGA Board constant SYSCLK_DE0_FREQ : integer := 100000000; -- DE0-Nano FPGA Board constant SYSCLK_DE10_FREQ : integer := 100000000; -- DE10-Nano FPGA Board constant SYSCLK_CYC1000_FREQ : integer := 100000000; -- Trenz CYC1000 FPGA Board @@ -692,12 +715,17 @@ Edit: cpu/zpu_pkg.vhd constant DEBUG_TX_BAUD_RATE : integer := 115200; --230400; -- Baud rate for the debug transmitter ```` -Using Quartus Prime following the 'RTL Bit Stream build' above, build the RTL in the usual manner with this new configuration. You cannot use the Makefile build as it will entail Makefile changes so just use the Quartus Prime GUI at this time.

The software is the same and unless you have less memory, no changes need to be made to the software build.
+Using Quartus Prime and following the 'RTL Bit Stream build' section above, build the RTL in the usual manner with this new configuration. Alternatively, use the Makefile build system; for example, to build the Evo CPU for the QMV board: +```shell + cd /build + make QMV_EVO +``` +The software is the same; unless you have less memory, no changes need to be made to the software build.

-### Notes on setting up a new development board +### Notes on setting up a new development board If you are using your own FPGA board (ie. not one in the list I've tested and created Quartus configuration files for), please ensure you create these necessary files: ```` @@ -895,9 +923,9 @@ In the build/NEW_zpu_Toplevel.vhd: -### Connecting the Development board +### Connecting the Development board -1. In order to run the ZPU Evo iand it's software in basic form on the QMTECH board you need 2 USB to Serial (ie. [USB to Serial](https://www.amazon.co.uk/Laqiya-FT232RL-Converter-Adapter-Breakout/dp/B07H6XMC2X)) adapters and you wire them up according to the pinout as is defined in the \/build/QMV_zpu.qsf file. Ensure the adapters are set to 3.3V. See Images section for colour coded wiring. +1. To run the ZPU Evo and its software in basic form on the QMTECH board you need two USB to Serial adapters (e.g. [USB to Serial](https://www.amazon.co.uk/Laqiya-FT232RL-Converter-Adapter-Breakout/dp/B07H6XMC2X)), wired according to the pinout defined in the \/build/QMV_zpu.qsf file. Ensure the adapters are set to 3.3V. See the Images section for colour-coded wiring. ```shell ##============================================================ # UART @@ -925,11 +953,11 @@ set_location_assignment PIN_Y20 -to SDCARD_CS[0]
-## Repository Structure +## Repository Structure The GIT Repository is organised as per the build environment shown in the tables below. -### RTL +### RTL | Folder | RTL File | Description | | ---------------- | -------------------- | ------------------------------------------------------------ | @@ -953,41 +981,28 @@ The GIT Repository is organised as per the build environment shown in the tables | devices/WishBone | I2C | I2C Controller | | | SRAM | Encapsulated Byte Addressable BRAM | | | SDRAM | Byte Addressable 32Bit SDRAM Controller | -| build | CYC1000 | Quartus definition files and Top Level VHDL for the Trenz Electronic CYC1000 Cyclone 10LP development board. | -| | E115 | Quartus definition files and Top Level VHDL for the Cyclone IV EP4CE115 DDR2 64BIT development board. | -| | QMV | Quartus definition files and Top Level VHDL for the QMTech Cyclone V development board. | -| | DE10 | Quartus definition files and Top Level VHDL for the Altera DE10 development board as used in the MiSTer project. | -| | DE0 | Quartus definition files and Top Level VHDL for the Altera DE0 development board. | +| build | CYC1000 | Quartus definition files and Top Level VHDL for the Trenz Electronic CYC1000 (Cyclone 10LP 10CL025YU256C8G) development board. | +| | E115 | Quartus definition files and Top Level VHDL for the Cyclone IV E (EP4CE115F23I7) DDR2 64BIT development board. | +| | QMV | Quartus definition files and Top Level VHDL for the QMTech Cyclone V (5CEFA2F23C8) development board. | +| | DE10 | Quartus definition files and Top Level VHDL for the Terasic DE10-Nano (Cyclone V 5CSEBA6U23I7) development board as used in the MiSTer project. | +| | DE0 | Quartus definition files and Top Level VHDL for the Terasic DE0-Nano (Cyclone V 5CSEMA4U23C6) development board. | +| | ReVerSE-U16 | Top Level VHDL for the ReVerSE-U16 development board. | | | Clock_* | Refactored Altera PLL definitions for various development board source clocks. 
These need to be made more generic for eventual inclusion of Xilinx fabric. |
-### Software -| Folder | Src File | Description | -| ------- | -------- | ------------------------------------------------------------ | -| apps | | The ZPUTA application can either have a feature embedded or as a separate standalone disk based applet in addition to extended applets. The purpose is to allow control of the ZPUTA application size according to available BRAM and SD card availability.
All applets for ZPUTA are stored in this folder. | -| build | | Build tree output suitable for direct copy to an SD card.
The initial bootloader and/or application as selected are compiled directly into a VHDL file for preloading in BRAM in the devices/sysbus/BRAM folder. | -| common | | Common C modules such as Elm Chan's excellent Fat FileSystem. | -| include | | C Include header files. | -| iocp | | A small bootloader/monitor application for initialization of the ZPU. Depending upon configuration this program can either boot an application from SD card or via the Serial Line and also provide basic tools such as memory examination. | -| startup | | Assembler and Linker files for generating ZPU applications. These files are critical for defining how GCC creates and links binary images as well as providing the micro-code for ZPU instructions not implemented in hardware. | -| utils | | Some small tools for converting binary images into VHDL initialization data. | -| zputa | | The ZPU Test Application. This is an application for testing the ZPU and the SoC components. It can either be built as a single image for pre-loading into a BRAM via VHDL or as a standalone application loaded by the IOCP bootloader from an SD card. The services it provides can either be embedded or available on the SD card as applets depending on memory restrictions. | -| | build.sh | Unix shell script to build IOCP, ZPUTA and Apps for a given design.

NAME
    build.sh -  Shell script to build a ZPU program or OS.

SYNOPSIS
    build.sh [-dIOoMBsxAh]

DESCRIPTION

OPTIONS
    -I  = 0 - Full, 1 - Medium, 2 - Minimum, 3 - Tiny (bootstrap only)
    -O        = zputa, zos
    -o    = 0 - Standalone, 1 - As app with IOCP Bootloader,
                    2 - As app with tiny IOCP Bootloader, 3 - As app in RAM 
    -M      = Max size of the boot ROM/BRAM (needed for setting Stack).
    -B      = Base address of , default -o == 0 : 0x00000 else 0x01000 
    -A      = App address of , default 0x0C000
    -s      = Maximum size of an app, defaults to (BRAM SIZE - App Start Address - Stack Size) 
                    if the App Start is located within BRAM otherwise defaults to 0x10000.
    -d            = Debug mode.
    -x            = Shell trace mode.
    -h            = This help screen.

EXAMPLES
    build.sh -O zputa -B 0x00000 -A 0x50000

EXIT STATUS
     0    The command ran successfully

     >0    An error ocurred. | +## Quartus Prime in Docker -
- -## Quartus Prime in Docker - -Installing Quartus Prime can be tedious and time consuming, especially as the poorly documented linux installation can lead to a wrong mix or missing packages which results in a non-functioning installation. To ease the burden I have pieced together a Docker Image containing Ubuntu, the necessary packages and Quartus Prime 17.1.1. +Installing Quartus Prime can be tedious and time-consuming, especially as the poorly documented Linux installation can lead to a wrong mix of packages or missing packages, resulting in a non-functioning installation. +To ease the burden, I have pieced together a Docker image containing Ubuntu, the necessary packages and Quartus Prime 13.0sp1, 13.1 and 17.1.1. 1. Clone the repository: ````bash cd ~ - git clone https://github.com/pdsmart/zpu.git + git clone https://git.eaw.app/eaw/zpu.git cd zpu/docker/QuartusPrime ```` @@ -1010,14 +1025,15 @@ Installing Quartus Prime can be tedious and time consuming, especially as the po Build the docker image: ````bash - docker build -f Dockerfile.17.1.1 -t quartus-ii-17.1.1 . + docker build -f Dockerfile.17.1.1 -t quartus-ii-17.1.1 --build-arg user_uid=`id -u` --build-arg user_gid=`id -g` --build-arg user_name=`whoami` . ```` + For Quartus Prime 13.0.1 and 13.1, replace 17.1.1 with the required version. Quartus Prime 13.0.1 supports the older Cyclone devices. 2. Setup your X DISPLAY variable to point to your xserver: ````bash - export DISPLAY=: + export DISPLAY=:> # ie. export DISPLAY=192.168.1.1:0 ```` @@ -1056,9 +1072,9 @@ Installing Quartus Prime can be tedious and time consuming, especially as the po ````
-## Images +## Images -### Images of QMTECH Cyclone V wiring +### Images of QMTECH Cyclone V wiring ![SD Card Wiring](../images/IMG_9837.jpg) ![UART 1 Wiring](../images/IMG_9838.jpg) @@ -1068,9 +1084,9 @@ Installing Quartus Prime can be tedious and time consuming, especially as the po
Above are the wiring connections for the QMTECH Cyclone V board as used in the Build section, colour co-ordinated for reference.
-### Images of ZPUTA on a ZPU EVO CPU +### Images of ZPUTA on a ZPU EVO CPU -#### ZPU Performance +#### ZPU Performance ![ZPUTA Performance Test](../images/ScreenZPU1.png) Dhrystone and CoreMark performance tests of the ZPU Evo CPU. Depending on Fabric there are slight variations, these tests are on a Cyclone V CEFA chip, on a Cyclone IV CE I7 the results are 13.2DMIPS for Dhrystone and 22.2 for CoreMark. @@ -1083,10 +1099,10 @@ Help screen for ZPUTA, help in this instance is an applet on the SD Card. A * be ![ZPUTA SD Directory](../images/ScreenZPU3.png) SD Directory listings of all the compiled applets. -#### SDRAM Performance +#### SDRAM Performance ![ZPUTA SDRAM Performance Sysbus No Cache](../images/ZPUSDRAMPerformance.png) -SDRAM operating over the SYSBUS and with no cache. Not quite true memory performance as the ZPU makes several stack operations for a memory read/write, ie. IM
, IM , STORE for a write which would entail upto 11 instruction reads (3 cycles on the Evo) and two stack writes. +SDRAM operating over the SYSBUS and with no cache. Not quite true memory performance as the ZPU makes several stack operations for a memory read/write, ie. IM
, IM , STORE for a write which would entail up to 11 instruction reads (3 cycles on the Evo) and two stack writes. ![ZPUTA SDRAM Performance Sysbus Cache](../images/ZPUSDRAMPerformanceCached.png) SDRAM operating over the SYSBUS with full page cache per bank for read and write-thru cache for write. @@ -1098,7 +1114,7 @@ SDRAM operating over the WishBone Bus and with no cache. SDRAM operating over the WishBone Bus with full page cache per bank for read and write-thru cache for write.
-## Links +## Links | Recommended Site | | ---------------------------------------------------------------------------------------------- | @@ -1110,17 +1126,17 @@ SDRAM operating over the WishBone Bus with full page cache per bank for read and
-## Credits +## Credits Where I have used or based any component on a 3rd parties design I have included the original authors copyright notice within the headers or given due credit. All 3rd party software, to my knowledge and research, is open source and freely useable, if there is found to be any component with licensing restrictions, it will be removed from this repository and a suitable link/config provided.
-## Licenses +## Licenses The original ZPU uses the Free BSD license and such the Evo is also released under FreeBSD. SoC components and other developments written by me are currently licensed using the GPL. 3rd party components maintain their original copyright notices. -### The FreeBSD license +### The FreeBSD license Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. @@ -1130,7 +1146,7 @@ The original ZPU uses the Free BSD license and such the Evo is also released und The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the this project. -### The Gnu Public License v3 +### The Gnu Public License v3 The source and binary files in this project marked as GPL v3 are free software: you can redistribute it and-or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The source files are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.