Technical data
VAX 6000 Series Vector Processor
2.4.2 Vector Floating-Point Unit Chip
The FPU chip is a multi-stage pipelined floating-point processor. Among
its features are:
• VAX vector floating-point instructions and data types. The FPU
implements instruction and data type support for all VAX vector
floating-point instructions as well as the integer multiply operation.
Floating-point data types F_, D_, and G_floating are supported.
• High-throughput external interface. The FPU receives two 32-bit
operands from the vector register file chip every cycle. It drives back
a 32-bit result to the vector register file chip in the same cycle.
• Design based on the floating-point accelerator chip (the F-chip) used
on the scalar processor module.
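The two-operands-in, one-result-out timing described above can be sketched as a simple software pipeline model. This is an illustrative model only: the pipeline depth, structure, and names below are assumptions for the example, not details from the manual, and the arithmetic is reduced to a single multiply.

```c
#include <stdint.h>

#define STAGES 4  /* assumed pipeline depth; the text does not give one */

/* Toy model of a multi-stage FP pipeline: one operation (two operands)
 * enters every cycle, and one result leaves every cycle once the
 * pipeline is full.  The arithmetic is modeled as completing
 * immediately and merely being delayed by STAGES cycles. */
typedef struct {
    float results[STAGES];  /* results in flight, one per stage */
    int   valid[STAGES];
} fp_pipe;

/* Advance one cycle: accept operands a and b, and report (via *out)
 * the result emerging from the final stage, if any. */
static int fp_pipe_cycle(fp_pipe *p, float a, float b, float *out)
{
    int done = p->valid[STAGES - 1];
    if (done)
        *out = p->results[STAGES - 1];

    for (int i = STAGES - 1; i > 0; i--) {   /* shift the pipe forward */
        p->results[i] = p->results[i - 1];
        p->valid[i]   = p->valid[i - 1];
    }
    p->results[0] = a * b;   /* new operation enters stage 0 */
    p->valid[0]   = 1;
    return done;
}
```

Fed a new operand pair every cycle, the model sustains one result per cycle after a STAGES-cycle startup latency, which is the throughput behavior the bullet list describes.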
2.5 LOAD/STORE UNIT
When a load/store instruction is issued, the load/store unit becomes bus
master and controls the internal cache data (CD) bus. Once a load/store
instruction starts execution, no further instructions can be issued on
the CD bus until it completes. The load/store unit handles the memory
reference instructions, the address translation, the cache management,
and the memory bus interface.
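The bus-mastership rule above amounts to a simple ownership flag: while a load/store instruction holds the CD bus, no further instructions can be issued on it. A minimal sketch, with invented names (this is not DEC's interface):

```c
/* Minimal model of CD-bus mastership.  Issuing a load/store makes the
 * load/store unit bus master; nothing further can be issued on the CD
 * bus until that instruction completes. */
typedef struct {
    int busy;   /* nonzero while a load/store instruction owns the bus */
} cd_bus;

/* Try to issue a load/store: fails (stalls) while the bus is owned. */
static int cd_bus_issue(cd_bus *bus)
{
    if (bus->busy)
        return 0;      /* issue stalls until the current op completes */
    bus->busy = 1;     /* load/store unit becomes bus master */
    return 1;
}

/* Instruction completion releases the bus so issue may resume. */
static void cd_bus_complete(cd_bus *bus)
{
    bus->busy = 0;
}
```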
If a memory instruction uses register offsets, the offset register is first
read into a buffer, and each element of the offset register is then added
to the base. This avoids turning the internal bus around for each offset
read. If a register offset is not used, addresses are generated by adding
the stride to the base. The resulting virtual address is translated to
a physical address by an on-chip 136-entry, fully associative translation
buffer (TB). An address predictor checks two entries at a time, looking
for a successful translation of the last element. This early prediction
allows the scalar processor to be released before the instruction
completes, so memory reference instructions appear asynchronous to it.
The load/store unit handles translation buffer misses on its own, but
returns status to the scalar processor for invalid or swapped-out pages.
Once the scalar processor corrects the fault, the instruction is retried
from the beginning.
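The address-generation and translation steps above can be sketched in code. The TB size and full associativity are from the text, and the 512-byte page size is the VAX architecture's; everything else (structure layout, field names, refill policy) is an assumption for illustration.

```c
#include <stdint.h>

#define TB_ENTRIES 136   /* per the text: fully associative, 136 entries */
#define PAGE_SHIFT 9     /* VAX pages are 512 bytes */

typedef struct {
    uint32_t vpn;        /* virtual page number */
    uint32_t pfn;        /* page frame number */
    int      valid;
} tb_entry;

typedef struct { tb_entry e[TB_ENTRIES]; } tb;

/* Fully associative lookup: every entry is compared against the VPN.
 * Returns 1 and fills *pa on a hit; returns 0 on a TB miss (which the
 * load/store unit would service on its own). */
static int tb_translate(const tb *t, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            *pa = (t->e[i].pfn << PAGE_SHIFT)
                | (va & ((1u << PAGE_SHIFT) - 1));
            return 1;
        }
    }
    return 0;
}

/* Element address generation: base + i*stride in the strided case, or
 * base + offsets[i] when a register offset is used (the offsets having
 * been prefetched into a buffer, as the text describes). */
static uint32_t element_va(uint32_t base, int32_t stride,
                           const uint32_t *offsets, int i)
{
    if (offsets)
        return base + offsets[i];
    return base + (uint32_t)i * (uint32_t)stride;
}
```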
Once a physical address is obtained, the load/store unit looks it up in
the 32K-entry tag store. The address is delayed and then passed to the
1-Mbyte cache data store. This delay permits the cache lookup to complete
before data is written to the cache on store operations. In parallel, the