diff --git a/README.md b/README.md
index fd9efb3..a90174c 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,14 @@
-Please consult my [GitHub](https://pdsmart.github.io) website for more upto date information.
-
-
-The ZPU is a 32bit Stack based microprocessor and was designed by Øyvind Harboe from [Zylin AS](https://opensource.zylin.com/) and original documentation can be found on the [Zylin/OpenCore website or Wikipedia](https://en.wikipedia.org/wiki/ZPU_\(microprocessor\)). It is a microprocessor intended for FPGA embedded applications with minimal logic element and BRAM usage with the sacrifice of speed of execution.
+The ZPU is a 32bit stack based microprocessor originally designed by Øyvind Harboe from [Zylin AS](https://opensource.zylin.com/); the original documentation can be found on the [Zylin/OpenCore website or Wikipedia](https://en.wikipedia.org/wiki/ZPU_\(microprocessor\)). It is a microprocessor intended for FPGA embedded applications with minimal logic element and BRAM usage, at the cost of execution speed.
Zylin produced two designs which it made open source, namely the Small and Medium ZPU versions. Additional designs were produced by external developers such as the Flex and ZPUino variations, each offering enhancements to the original design such as Wishbone interface, performance etc.
This document describes another design which I like to deem as the ZPU Evo(lution) model whose focus is on *performance*, *connectivity* and *instruction expansion*. This came about as I needed a CPU for an emulator of a vintage computer I am writing, which would act as the IO processor to provide Menu, Peripheral and SD services.
-An example of the *performance* of the ZPU Evo can be seen using CoreMark which returns a value of 19.1 @ 100MHz on Altera fabric using BRAM and for Dhrystone 11.2DMIPS. Comparisons can be made with the original ZPU designs in the gallery below paying attention to the CoreMark score which seems to be the defacto standard now. *Connectivity* can be seen via implementation of both System and Wishbone buses, allowing for connection of many opensource IP devices. *Instruction expansion* can be seen by the inclusion of a close coupled L1 cache where multiple instruction bytes are sourced and made available to the CPU which in turn can be used for optimization (ie. upto 5 IM instructions executed in 1 cycle) or for extended multi-byte instructions (ie. implementation of a LoaD Increment Repeat instruction). There is room for a lot more improvements such as stack cache, SDRAM to L2 burst mode, parallel instruction execution (ie. and + neqbranch) which are on my list.
+An example of the *performance* of the ZPU Evo can be seen using CoreMark, which returns a value of 22.2 @ 100MHz on Altera fabric using BRAM, and Dhrystone, which returns 13.2DMIPS. Comparisons can be made with the original ZPU designs in the gallery below, paying attention to the CoreMark score which seems to be the de facto standard now. *Connectivity* can be seen via implementation of both System and Wishbone buses, allowing for connection of many opensource IP devices. *Instruction expansion* can be seen by the inclusion of a close coupled L1 cache where multiple instruction bytes are sourced and made available to the CPU, which in turn can be used for optimization (e.g. up to 5 IM instructions executed in 1 cycle) or for extended multi-byte instructions (e.g. implementation of a LoaD Increment Repeat instruction). There is room for many more improvements, such as stack cache, SDRAM to L2 burst mode and parallel instruction execution (e.g. and + neqbranch), which are on my list.
-# The CPU
+## The CPU
The ZPU Evo follows on from the ZPU Medium and Flex and areas of the code are similar, for example the instruction decoding. The design differs though due to caching and implementation of a Memory Transaction Processor (MXP) through which all Memory/IO operations (except for direct Instruction reads if dual-port instruction bus is enabled) are routed. The original CPUs all handled their memory requirements in-situ or as part of the state machine, whereas the Evo submits a request to the MXP whenever a memory operation is required.
@@ -45,67 +42,76 @@ In addition to the original instructions, a mechanism exists to extend the instr
***Extend Instruction,,[byte],[byte],[byte],[byte]***
-Where ParamSize = 00 - No parameter bytes
- 01 - 8 bit parameter
- 10 - 16 bit parameter
- 11 - 32 bit parameter
+Where ParamSize =
+ - 00 - No parameter bytes,
+ - 01 - 8 bit parameter,
+ - 10 - 16 bit parameter,
+ - 11 - 32 bit parameter
Some extended instructions are under development (e.g. LDIR); exact opcode values and the extended instruction set have not yet been fully defined. The GNU AS assembler will be updated with these instructions so they can be invoked within a C program and eventually, if they have benefit to C, they will be migrated into the GCC compiler (e.g. ADD32/DIV32/MULT32/LDIR/LDDR as, from what I have seen, these will have a big impact on CoreMark/Dhrystone tests).
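
As a rough illustration of the ParamSize encoding above, the following C sketch maps the 2 bit field to the number of parameter bytes and assembles the parameter value. The function names `param_size_bytes` and `read_param` are hypothetical, and the byte order (most significant byte first) is an assumption, since it is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of parameter bytes selected by the 2 bit ParamSize field:
 * 00 -> none, 01 -> 8 bit, 10 -> 16 bit, 11 -> 32 bit. */
static unsigned param_size_bytes(unsigned param_size)
{
    static const unsigned bytes[4] = { 0, 1, 2, 4 };
    assert(param_size < 4);
    return bytes[param_size];
}

/* Assemble the parameter from the bytes that follow the instruction.
 * ASSUMPTION: most significant byte first; the byte order is not
 * defined in this document. */
static uint32_t read_param(const uint8_t *p, unsigned param_size)
{
    uint32_t value = 0;
    unsigned n = param_size_bytes(param_size);
    for (unsigned i = 0; i < n; i++)
        value = (value << 8) | p[i];
    return value;
}
```

Under that byte order assumption, a ParamSize of `10` followed by the bytes `0x12 0x34` would yield the 16 bit parameter `0x1234`.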
### Implemented Instruction Set
-| Name | Opcode | | Description |
-|------------------|-----------|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| BREAKPOINT | 0 | 00000000 | The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like debugger interface is added, it will be convenient to be able to distinguish between a breakpoint and an illegal(possibly emulated) instruction. |
-| IM | 1xxx xxxx | 1xxx xxxx | Pushes 7 bit sign extended integer and sets the a «instruction decode interrupt mask» flag(IDIM). If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits. Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag. To push a 14 bit integer onto the stack, use two consecutive IM instructions. If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP. |
-| STORESP | 010x xxxx | 010x xxxx | Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer. |
-| LOADSP | 011x xxxx | 011x xxxx | Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack. |
-| ADDSP | 0001 xxxx | 0001 xxxx | Add value of memory location SP+xxxx*4 to value on top of stack. |
-| EMULATE | 001x xxxx | 010x xxxx | Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. See zpupgk.vhd for list of emulate opcode values used. zpu_core.vhd contains reference implementations of these instructions rather than letting the ZPU execute the EMULATE instruction. One way to improve performance of the ZPU is to implement some of the EMULATE instructions.|
-| PUSHPC | emulated | emulated | Pushes program counter onto the stack. |
-| POPPC | 0000 0100 | 0000 0100 | Pops address off stack and sets PC |
-| LOAD | 0000 1000 | 0000 1000 | Pops address stored on stack and loads the value of that address onto stack. Bit 0 and 1 of address are always treated as 0(i.e. ignored) by the HDL implementations and C code is guaranteed by the programming model never to use 32 bit LOAD on non-32 bit aligned addresses(i.e. if a program does this, then it has a bug).|
-| STORE | 0000 1100 | 0000 1100 | Pops address, then value from stack and stores the value into the memory location of the address. Bit 0 and 1 of address are always treated as 0 |
-| PUSHSP | 0000 0010 | 0000 0010 | Pushes stack pointer. |
-| POPSP | 0000 1101 | 0000 1101 | Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads. |
-| ADD | 0000 0101 | 0000 0101 | Pops two values on stack adds them and pushes the result |
-| AND | 0000 0110 | 0000 0110 | Pops two values off the stack and does a bitwise-and & pushes the result onto the stack |
-| OR | 0000 0111 | 0000 0111 | Pops two integers, does a bitwise or and pushes result |
-| NOT | 0000 1001 | 0000 1001 | Bitwise inverse of value on stack |
-| FLIP | 0000 1010 | 0000 1010 | Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc. The raison d'etre for this instruction is mainly to emulate other instructions. |
-| NOP | 0000 1011 | 0000 1011 | No operation, clears IDIM flag as side effect, i.e. used between two consecutive IM instructions to push two values onto the stack. |
-| PUSHSPADD | 61 | 00111101 | a=sp; b=popIntStack()*4; pushIntStack(a+b); |
-| POPPCREL | 57 | 00111001 | setPc(popIntStack()+getPc()); |
-| SUB | 49 | 00110001 | int a=popIntStack(); int b=popIntStack(); pushIntStack(b-a); |
-| XOR | 50 | | pushIntStack(popIntStack() ^ popIntStack()); |
-| LOADB | 51 | | 8 bit load instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test. pushIntStack(cpuReadByte(popIntStack())&0xff); |
-| STOREB | 52 | | 8 bit store instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test. addr = popIntStack(); val = popIntStack(); cpuWriteByte(addr, val); |
-| LOADH | 34 | | 16 bit load instruction. Really only here for compatibility with C programming model. pushIntStack(cpuReadWord(popIntStack())); |
-| STOREH | 35 | | 16 bit store instruction. Really only here for compatibility with C programming model. addr = popIntStack(); val = popIntStack(); cpuWriteWord(addr, val); |
-| LESSTHAN | 36 | | Signed comparison a = popIntStack(); b = popIntStack(); pushIntStack((a < b) ? 1 : 0); |
-| LESSTHANOREQUAL | 37 | | Signed comparison a = popIntStack(); b = popIntStack(); pushIntStack((a <= b) ? 1 : 0); |
-| ULESSTHAN | 38 | | Unsigned comparison long a; //long is here 64 bit signed integer long b; a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff b = ((long) popIntStack()) & INTMASK; pushIntStack((a < b) ? 1 : 0); |
-| ULESSTHANOREQUAL | 39 | | Unsigned comparison long a; //long is here 64 bit signed integer long b; a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff b = ((long) popIntStack()) & INTMASK; pushIntStack((a <= b) ? 1 : 0); |
-| EQBRANCH | 55 | | int compare; int target; target = popIntStack() + pc; compare = popIntStack(); if (compare == 0) { setPc(target); } else { setPc(pc + 1); } |
-| NEQBRANCH | 56 | | int compare; int target; target = popIntStack() + pc; compare = popIntStack(); if (compare != 0) { setPc(target); } else { setPc(pc + 1); } |
-| MULT | 41 | | Signed 32 bit multiply pushIntStack(popIntStack() * popIntStack()); |
-| DIV | 53 | | Signed 32 bit integer divide. a = popIntStack(); b = popIntStack(); if (b == 0) { // undefined } pushIntStack(a / b); |
-| MOD | 54 | | Signed 32 bit integer modulo. a = popIntStack(); b = popIntStack(); if (b == 0) { // undefined } pushIntStack(a % b); |
-| LSHIFTRIGHT | 42 | | unsigned shift right. long shift; long valX; int t; shift = ((long) popIntStack()) & INTMASK; valX = ((long) popIntStack()) & INTMASK; t = (int) (valX >> (shift & 0x3f)); pushIntStack(t); |
-| ASHIFTLEFT | 43 | | arithmetic(signed) shift left. long shift; long valX; shift = ((long) popIntStack()) & INTMASK; valX = ((long) popIntStack()) & INTMASK; int t = (int) (valX << (shift & 0x3f)); pushIntStack(t); |
-| ASHIFTRIGHT | 43 | | arithmetic(signed) shift left. long shift; int valX; shift = ((long) popIntStack()) & INTMASK; valX = popIntStack(); int t = valX >> (shift & 0x3f); pushIntStack(t); |
-| CALL | 45 | | call procedure. int address = pop(); push(pc + 1); setPc(address); |
-| CALLPCREL | 63 | | call procedure pc relative int address = pop(); push(pc + 1); setPc(address+pc); |
-| EQ | 46 | | pushIntStack((popIntStack() == popIntStack()) ? 1 : 0); |
-| NEQ | 47 | | pushIntStack((popIntStack() != popIntStack()) ? 1 : 0); |
-| NEG | 48 | | pushIntStack(-popIntStack()); |
+| Name | Opcode | Description |
+| ---------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ADD | 00000101 | Pops two values on stack adds them and pushes the result |
+| ADDSP | 0001xxxx | Add value of memory location SP+xxxx*4 to value on top of stack. |
+| AND | 00000110 | Pops two values off the stack and does a bitwise-and & pushes the result onto the stack |
+| ASHIFTLEFT \*| 00101011 | arithmetic(signed) shift left. long shift; long valX; shift = ((long) popIntStack()) & INTMASK; valX = ((long) popIntStack()) & INTMASK; int t = (int) (valX << (shift & 0x3f)); pushIntStack(t); |
+| ASHIFTRIGHT \*| 00101100 | arithmetic(signed) shift right. long shift; int valX; shift = ((long) popIntStack()) & INTMASK; valX = popIntStack(); int t = valX >> (shift & 0x3f); pushIntStack(t); |
+| BREAKPOINT | 00000000 | The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like debugger interface is added, it will be convenient to be able to distinguish between a breakpoint and an illegal(possibly emulated) instruction. |
+| CALL \*| 00101101 | call procedure. int address = pop(); push(pc + 1); setPc(address); |
+| CALLPCREL \*| 00111111 | call procedure pc relative int address = pop(); push(pc + 1); setPc(address+pc); |
+| DIV \*| 00110101 | Signed 32 bit integer divide. a = popIntStack(); b = popIntStack(); if (b == 0) { // undefined } pushIntStack(a / b); |
+| EMULATE | 001xxxxx | Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. Emulated Opcodes are marked with a star (\*) in this table. |
+| EQ \*| 00101110 | pushIntStack((popIntStack() == popIntStack()) ? 1 : 0); |
+| EQBRANCH \*| 00110111 | int compare; int target; target = popIntStack() + pc; compare = popIntStack(); if (compare == 0) { setPc(target); } else { setPc(pc + 1); } |
+| ESR | *E*00000000 | Copy Extended Status Register to TOS. Bit 31 : 1 = reserved. Bit 0 = Background Transfer in Progress (1). |
+| EXTEND | 00001111 | Extended instruction set. Byte following this instruction represents the new instruction. |
+| FIADD32 \*| 00111010 | Fixed point (Q15) addition. TOS and NOS are added and the result placed in TOS. |
+| FIDIV32 \*| 00111011 | Fixed point (Q15) division. TOS is the dividend, NOS is the divisor, result is placed in TOS. |
+| FIMULT32 \*| 00111100 | Fixed point (Q15) multiplication. TOS is multiplied by NOS and the result is placed in TOS. |
+| FLIP | 00001010 | Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc. The raison d'etre for this instruction is mainly to emulate other instructions. |
+| IM | 1xxxxxxx | Pushes 7 bit sign extended integer and sets the «instruction decode interrupt mask» flag(IDIM). If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits. Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag. To push a 14 bit integer onto the stack, use two consecutive IM instructions. If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP. |
+| LDIR | *E*00001yxx | LoaD Increment Repeat, copies \ words of memory from source to destination. TOS = Source Address, NOS = Destination Address, *xx* = bytes to transfer where *'01'* = 8 bit parameter, *'10'* = 16 bit parameter, *'11'* = 32 bit parameter. *y* = mode of operation, *'0'* = CPU waits for completion, *'1'* = Transfer operates in background. If a previous transfer is operating in the background, CPU waits for completion prior to executing instruction. Consult ESR for current status of background execution. |
+| LESSTHAN \*| 00100100 | Signed comparison a = popIntStack(); b = popIntStack(); pushIntStack((a < b) ? 1 : 0); |
+| LESSTHANOREQUAL \*| 00100101 | Signed comparison a = popIntStack(); b = popIntStack(); pushIntStack((a <= b) ? 1 : 0); |
+| LOAD | 00001000 | Pops address stored on stack and loads the value of that address onto stack. Bit 0 and 1 of address are always treated as 0(i.e. ignored) by the HDL implementations and C code is guaranteed by the programming model never to use 32 bit LOAD on non-32 bit aligned addresses(i.e. if a program does this, then it has a bug).|
+| LOADB \*| 00110011 | 8 bit load instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test. pushIntStack(cpuReadByte(popIntStack())&0xff); |
+| LOADH \*| 00100010 | 16 bit load instruction. Really only here for compatibility with C programming model. pushIntStack(cpuReadWord(popIntStack())); |
+| LOADSP | 011xxxxx | Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack. |
+| LSHIFTRIGHT \*| 00101010 | unsigned shift right. long shift; long valX; int t; shift = ((long) popIntStack()) & INTMASK; valX = ((long) popIntStack()) & INTMASK; t = (int) (valX >> (shift & 0x3f)); pushIntStack(t); |
+| MOD \*| 00110110 | Signed 32 bit integer modulo. a = popIntStack(); b = popIntStack(); if (b == 0) { // undefined } pushIntStack(a % b); |
+| MULT \*| 00101001 | Signed 32 bit multiply pushIntStack(popIntStack() * popIntStack()); |
+| NEG \*| 00110000 | pushIntStack(-popIntStack()); |
+| NEQ \*| 00101111 | pushIntStack((popIntStack() != popIntStack()) ? 1 : 0); |
+| NEQBRANCH \*| 00111000 | int compare; int target; target = popIntStack() + pc; compare = popIntStack(); if (compare != 0) { setPc(target); } else { setPc(pc + 1); } |
+| NOP | 00001011 | No operation, clears IDIM flag as side effect, i.e. used between two consecutive IM instructions to push two values onto the stack. |
+| NOT | 00001001 | Bitwise inverse of value on stack |
+| OR | 00000111 | Pops two integers, does a bitwise or and pushes result |
+| POPPC | 00000100 | Pops address off stack and sets PC |
+| POPPCREL \*| 00111001 | setPc(popIntStack()+getPc()); |
+| POPSP | 00001101 | Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads. |
+| PUSHPC | emulated | Pushes program counter onto the stack. |
+| PUSHSP | 00000010 | Pushes stack pointer. |
+| PUSHSPADD \*| 00111101 | a=sp; b=popIntStack()*4; pushIntStack(a+b); |
+| STORESP | 010xxxxx | Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer. |
+| STORE | 00001100 | Pops address, then value from stack and stores the value into the memory location of the address. Bit 0 and 1 of address are always treated as 0 |
+| STOREB \*| 00110100 | 8 bit store instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test. addr = popIntStack(); val = popIntStack(); cpuWriteByte(addr, val); |
+| STOREH \*| 00100011 | 16 bit store instruction. Really only here for compatibility with C programming model. addr = popIntStack(); val = popIntStack(); cpuWriteWord(addr, val); |
+| SUB \*| 00110001 | int a=popIntStack(); int b=popIntStack(); pushIntStack(b-a); |
+| ULESSTHAN \*| 00100110 | Unsigned comparison long a; //long is here 64 bit signed integer long b; a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff b = ((long) popIntStack()) & INTMASK; pushIntStack((a < b) ? 1 : 0); |
+| ULESSTHANOREQUAL\*| 00100111 | Unsigned comparison long a; //long is here 64 bit signed integer long b; a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff b = ((long) popIntStack()) & INTMASK; pushIntStack((a <= b) ? 1 : 0); |
+| XOR \*| 00110010 | pushIntStack(popIntStack() ^ popIntStack()); |
+*E* = Extended instruction, prefixed by EXTEND opcode.
+*\** = Emulated instruction if not implemented in hardware.
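
To make the IM and EMULATE rows of the table more concrete, here is a minimal C sketch of the value a run of consecutive IM opcodes leaves on top of the stack, and of the vector address an emulated opcode jumps to. The helper names `im_sequence_value` and `emulate_vector` are hypothetical and exist only for this illustration; they model the table descriptions, not the RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value left on top of stack after n consecutive IM opcodes (1xxxxxxx). */
static int32_t im_sequence_value(const uint8_t *ops, size_t n)
{
    assert(n > 0);
    /* First IM: push the 7 bit immediate, sign extended from bit 6. */
    uint32_t v = ops[0] & 0x7fu;
    if (v & 0x40u)
        v |= 0xffffff80u;
    /* Following IMs (IDIM set): shift left by 7 bits and store the
     * new immediate into the lower 7 bits. */
    for (size_t i = 1; i < n; i++)
        v = (v << 7) | (ops[i] & 0x7fu);
    /* Reinterpret the 32 bit pattern as signed (two's complement). */
    return (int32_t)v;
}

/* Vector address for an EMULATE opcode (001xxxxx): PC = 0x0 + xxxxx * 32. */
static uint32_t emulate_vector(uint8_t opcode)
{
    assert((opcode & 0xe0u) == 0x20u);
    return (uint32_t)(opcode & 0x1fu) * 32u;
}
```

For example, the two bytes `0x81 0x80` leave 128 on the stack, and LOADB (`00110011`), when emulated rather than implemented in hardware, vectors to address `0x260`.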
### Implemented Instructions Comparison Table
-
+
### Hardware Variable Byte Write
@@ -117,17 +123,20 @@ In the Evo, hardware was implemented (build time selectable) to allow Byte and H
In order to debug the CPU or just provide low level internal operating information, a cached UART debug module is implemented. Currently this is only for output but has the intention to be tied into the IOCP for in-situ debugging when Simulation/Signal-Tap is not available.
-Embedded within the CPU RTL are statements which issue snapshot information to the serialiser, if enabled in the configuration along with the information level. This is then serialized and output to a connected terminal. A snapshot of the output information can be seen below (with manual comments):
+Embedded within the CPU RTL are selectable level triggered statements which issue snapshot information to the serialiser. The statements are expanded and then serialized and output to a connected terminal. A snapshot of the output information can be seen below (with manual comments):
| |
| ------------------------------------------------------------ |
-| 000477 01ffec 00001ae4 00000000 70.17 04770484 046c047c 08f0046c 0b848015 17700500 05000500 05001188 11ef2004
Break Point - Illegal instruction PC Stack TOS NOS Insn Signals Signals Signals Signals L1 Insn Q L1 Insn Q L1 Insn Q L1 Insn Q 000478 01ffe8 00001ae4 00001ae4 00.05 04780484 046c0478 08f0046c 0b888094 05000500 05000500 118811ef 20041188
L1 Cache Dump 000478 (480)-> 11 e2 2a 51 11 a0 11 8f <-(483) (004)->11 ed 20 04 05 00 05 00 05 00 05 00 05 00 05 00 20 (46c)->04 11 b5 11 e4 17 70 <-(46f) (004)-> 11 ed 20 04 05 00 05 00 05 00 05 00 05 00 05 00 20 (46c)->04 11 b5 11 e4 17 70 11 b6 11 c4 2d 27 11 8b <-(473) 05 00 05 00 05 00 05 00 (46c)->20 04 11 b5 11 e4 17 70 11 b6 11 c4 2d 27 11 8b 1c 38 11 80 17 71 17 70 -<(477) (46c)->20 04 11 b5 11 e4 17 70 11 b6 11 c4 2d 27 11 8b 1c 38 11 80 17 71 17 70 -<(477) 05 00 05 00 05 00 05 00 (470)->11 b6 11 c4 2d 27 11 8b 1c 38 11 80 17 71 17 70 <-(477) -> 05 00 05 00 05 00 05 00 (47c)->11 88 11 ef 20 04 11 88 <-(47f) (474)->1c 38 11 80 17 71 17 70 05 00 05 00 05 00 05 00 11 88 11 ef 20 04 11 88 11 e2 2a 51 11 a0 11 8f 05 00 05 00 05 00 05 00 11 88 11 ef 20 04 11 88 11 e2 2a 51 11 a0 11 8f 11 ed 20 04 05 00 05 00 11 88 11 ef 20 04 11 88 11 e2 2a 51 11 a0 11 8f 11 ed 20 04 05 00 05 00 05 00 05 00 05 00 05 00 L2 Cache Dump 000000 88 08 8c 08 ed 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000020 88 08 8c 08 90 08 0b 0b 0b 88 80 08 2d 90 0c 8c 0c 88 0c 04 00 00 00 00 00 00 00 00 00 00 00 00 000040 71 fd 06 08 72 83 06 09 81 05 82 05 83 2b 2a 83 ff ff 06 52 04 00 00 00 00 00 00 00 00 00 00 00 |
All critical information such as current instruction being executed (or not if stalled), Signals/Flags, L1/L2 Cache contents and Memory contents can be output.
+### Timing Constraints
+
+This is a work in progress: I am slowly updating the design and/or adding constraints so that timing is fully met. Currently there is negative slack at 100MHz, although the design fully works; this will be corrected in the future so that timing as analyzed by TimeQuest is met.
-# System On a Chip
+## System On a Chip
In order to provide a working framework in which the ZPU Evo could be used, a System On a Chip wrapper was created which allows for the instantiation of various devices (ie. UART/SD card).
@@ -135,29 +144,34 @@ As part of the development, the ZPU Small/Medium/Flex models were incorporated i
The SoC currently implements (in the build tree):
-| Component | Selectable (ie not hardwired) |
-| ------------------------- | ------------------------------------------------------------ |
-| CPU | Choice of ZPU Small, Medium, Flex, Evo or Evo Minimal. |
-| Wishbone Bus | Yes, 32 bit bus. |
-| (SB) BRAM | Yes, implement a configurable block of BRAM as the boot loader and stack. |
-| Instruction Bus BRAM | Yes, enable a separate bus (or Dual-Port) to the boot code implemented in BRAM. This is generally a dual-port BRAM shared with the Sysbus BRAM but can be independent. |
-| (SB) RAM | Implement a block of BRAM as RAM, seperate from the BRAM used for the boot loader/stack. |
-| (WB) SDRAM | Yes, implement an SDRAM controller over the Wishbone bus. |
-| (WB) RAM | Implement a block of BRAM as RAM over the Wishbone bus. |
-| (WB) I2C | Yes, implements an I2C Controller over the Wishbone bus. |
-| (SB) Timer 0 | No, implements a hardware 12bit Second, 18bit milliSec and 24bit uSec down counter with interrupt, a 32bit milliSec up counter with interrupt and a YMD HMS Real Time Clock. The down counters are ideal for scheduling. |
-| (SB) Timer 1 | Yes, a selectable number of pre-scaled 32bit down counters. |
-| (SB) UART 0 | No, a cached UART used for monitor output and command input/program load. |
-| (SB) UART 1 | No, a cached UART used for software (C program)/hardware (ZPU debug serializer) output. |
-| (SB) Interrupt Controller | Yes, a prioritized configurable (# of inputs) interrupt controller. |
-| (SB) PS2 | Yes, a PS2 Keyboard and Mouse controller. |
-| (SB) SPI | Yes, a configurable number of Serial Peripheral Interface controllers. |
-| (SB) SD | Yes, a configurable number of hardware based SPI SD controllers. |
-| (SB) SOCCFG | Yes, a set of registers to indicate configuration of the ZPU and SoC to the controlling program. |
+| Component | Option | Comment |
+| ----------------------------- | ------ | ------------------------------------------------------------ |
+| CPU | Yes | ZPU Small, Medium, Flex, Evo or Evo Minimal. |
+| Wishbone Bus | Yes | 32 bit Wishbone bus. |
+| (SB) BRAM | Yes | Implement a configurable block of BRAM as the boot loader and stack. |
+| Instruction Bus BRAM | Yes | Enable a separate bus (or Dual-Port) to the boot code implemented in BRAM. This is generally a dual-port BRAM shared with the Sysbus BRAM but can be independent. |
+| (SB) RAM | Yes | Implement a block of BRAM as RAM, separate from the BRAM used for the boot loader/stack. |
+| (WB) SDRAM | Yes | Implement an SDRAM controller over the Wishbone bus. |
+| (WB) RAM | Yes | Implement a block of BRAM as RAM over the Wishbone bus. |
+| (WB) I2C | Yes | Implements an I2C Controller over the Wishbone bus. |
+| (SB) Timer 0 | No | Implements a hardware 12bit Second, 18bit milliSec and 24bit uSec down counter with interrupt, a 32bit milliSec up counter with interrupt and a YMD HMS Real Time Clock. The down counters are ideal for scheduling. |
+| (SB) Timer 1 | Yes | A selectable number of pre-scaled 32bit down counters. |
+| (SB) UART 0 | No | A cached UART used for monitor output and command input/program load. |
+| (SB) UART 1 | No | A cached UART used for software (C program)/hardware (ZPU debug serializer) output. |
+| (SB) Interrupt Controller | Yes | A prioritized configurable (# of inputs) interrupt controller. |
+| (SB) PS2 | Yes | A PS2 Keyboard and Mouse controller. |
+| (SB) SPI | Yes | A configurable number of Serial Peripheral Interface controllers. |
+| (SB) SD | Yes | A configurable number of hardware based SPI SD controllers. |
+| (SB) SOCCFG | Yes | A set of registers to indicate configuration of the ZPU and SoC to the controlling program. |
-Within the SoC configuration, items such as starting Stack Address, Reset Vector, IO Start/End (SB) and (WB) can be specified. Given the wishbone bus, it is very easy to add further opencore IP devices, for the system bus some work may be needed as the opencore IP devices use differing signals.
+Within the SoC configuration, items such as starting Stack Address, Reset Vector, IO Start/End (SB) and (WB) can be specified. With the addition of the Wishbone bus, it is very easy to add further opencore IP devices; for the system bus some work may be needed as the opencore IP devices use differing signals.
+
+### SDRAM
+
+
+
+## Software
-# Software
The software provided includes:
@@ -166,143 +180,248 @@ The software provided includes:
3. A disk operating system, zOS (ZPU Operating System). A version of ZPUTA but aimed at production code where all functionality resides as disk applications.
4. Library functions in C to aid in building applications, including 3rd party libs, e.g. FatFs by ChaN.
-### IOCP
-
-The I/O Control Program (IOCP) is basically a bootloader, it can operate standalone or as the first stage in booting an application. At the time of writing the following functionality and memory maps have been defined in the build.sh and within the parameterisation of the IOCP/ZPUTA/RTL but any other is possible by adjusting the parameters.
-
- - Tiny - IOCP is the smallest size possible to boot from SD Card. It is useful for a SoC configuration where there is limited BRAM and the applications loaded from the SD card would potentially run in external RAM.
- - Minimum - As per tiny but adds: print IOCP version, interrupt handler, boot message and SD error messages.
- - Medium - As per small but adds: command line processor to add commands below, timer on auto boot so it can be disabled by pressing a key
-
- | Command | Description |
- | ------- | ------------------------------------------ |
- | 1 | Boot Application in Application area BRAM |
- | 4 | Dump out BRAM (boot) memory |
- | 5 | Dump out Stack memory |
- | 6 | Dump out application RAM |
- | C | Clear Application area of BRAM |
- | c | Clear Application RAM |
- | d | List the SD Cards directory |
- | R | Reset the system and boot as per power on |
- | h | Print out help on enabled commands |
- | i | Prints version information |
-
- - Full - As medium but adds additional commands below.
-
- | Command | Description |
- | ------- | ------------------------------------------ |
- | 2 | Upload to BRAM application area, in binary format, from serial port |
- | 3 | Upload to RAM, in binary format, from serial port |
- | i | Print detailed SoC configuration |
-
-### ZPUTA
-
-ZPUTA started life as a basic test application to verify ZPU Evo and SoC operations. As it evolved and different FPGA's were included in the ZPU Evo scope, it became clear that it had to be more advanced due to limited resources.
-
-ZPUTA has two primary methods of execution, a) as an application booted by IOCP, b) standalone booted as the ZPU Evo startup firmware. The mode is chosen in the configuration and functionality is identical.
-
-In order to cater for limited FPGA BRAM resources, all functionality of ZPUTA can be enabled/disabled within the loaded image. If an SD Card is present then some/all functionality can be shifted from the loaded image into applets (1 applet per function, ie. memory clear) and stored on the SD card - this mode is like DOS where typing a command retrieves the applet from SD card and executes it.
-
-The functionality currently provided by ZPUTA can be summarised as follows.
-
-| Category | Command | Parameters | Description |
-| -------- | ------- | ---------- | ----------------------------------------------- |
-| Disk IO Commands | ddump | \[ \] | Dump a sector |
-| | dinit | \ \[\] | Initialize disk |
-| | dstat | \ | Show disk status |
-| | dioctl | \ | ioctl(CTRL_SYNC) |
-| Disk Buffer Commands | bdump | \ | Dump buffer |
-| | bedit | \ \[\] ... | Edit buffer |
-| | bread | \ \ \[\] | Read into buffer |
-| | bwrite | \ \ \[\] | Write buffer to disk |
-| | bfill | \ | Fill buffer |
-| | blen | \ | Set read/write length for fread/fwrite command |
-| Filesystem Commands | finit | \ \[\] | Force init the volume |
-| | fopen | \ \ | Open a file |
-| | fclose | | Close the open file |
-| | fseek | \ | Move fp in normal seek |
-| | fread | \ | Read part of file into buffer |
-| | finspect | \ | Read part of file and examine |
-| | fwrite | \ \ | Write part of buffer into file |
-| | ftrunc | | Truncate the file at current fp |
-| | falloc | \ \ | Allocate ctg blks to file |
-| | fattr | \ \ \ | Change object attribute |
-| | ftime | | Change object timestamp |
-| | frename | \ \ | Rename an object |
-| | fdel | \ | Delete an object |
-| | fmkdir | \ | Create a directory |
-| | fstat | \[\] | Show volume status |
-| | fdir | \[\] | Show a directory |
-| | fcat | \ | Output file contents |
-| | fcp | \ \ | Copy a file |
-| | fconcat | \ \ \ | Concatenate 2 files |
-| | fxtract | \ \ \ \ | Extract a portion of file |
-| | fload | \ \[\] | Load a file into memory |
-| | fexec | \ \ \ \ | Load and execute file |
-| | fsave | \ \ \ | Save memory range to a file |
-| | fdump | \ \[\] | Dump a file contents as hex |
-| | fcd | \ | Change current directory |
-| | fdrive | \ | Change current drive |
-| | fshowdir | | Show current directory |
-| | flabel | \