Quadrics QsNetII Interconnect
since the last time that all the boards and chips were selected and cleared. When
using this raw format of error data, you must decide whether the registers are
reporting genuine link errors or simply errors caused by node reboots. Look for a
link that shows errors repeatedly, day after day, during normal production-mode testing.
Use the following procedure to run this test:
1. Open a connection to the interconnect’s master control card, or launch the
jtest utility remotely as described in Section 11.2.
2. At the jtest utility prompt, select all boards as follows:
# jtest> b -1
board in slot 0 is of type QM501_CU
board in slot 4 is of type QM502_CU
board in slot 8 is of type QM503
board in slot 9 is of type QM503
3. At the jtest utility prompt, select all switch chips as follows:
# jtest> c -1
4. At the jtest utility prompt, enter the error command:
# jtest> error
jtest: no errors on boards 0 4 8 9 chips : 0 1234567
jtest>
If you see the same error recurring on a link, that error indicates a
potential fault. The error registers do not count errors; they indicate only
that at least one error has occurred since the register was last cleared.
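Because each register is a flag rather than a counter, recurrence can only be judged by comparing snapshots taken on different days. The following Python sketch illustrates one way to do that comparison. It is not part of jtest: the function name recurring_links, the daily_flags structure, and the threshold of three days are all assumptions made for illustration, and the sketch assumes that the registers were cleared after each snapshot was recorded.

```python
from collections import Counter


def recurring_links(daily_flags, min_days=3):
    """Return the links flagged on at least min_days separate days, sorted.

    daily_flags maps a day label to the set of "B:C:L" identifiers whose
    error flag was set in that day's snapshot (hypothetical structure).
    """
    counts = Counter()
    for links in daily_flags.values():
        # A register only records "at least one error", so each link
        # counts once per day regardless of how many errors occurred.
        counts.update(set(links))
    return sorted(link for link, days in counts.items() if days >= min_days)


# Hypothetical snapshots, not taken from a real system.
daily_flags = {
    "day1": {"0:0:0", "4:1:3"},
    "day2": {"0:0:0"},
    "day3": {"0:0:0", "8:2:5"},
}

print(recurring_links(daily_flags))  # ['0:0:0']
```

Links that appear only on days when nodes were rebooted are the likely candidates to discount; a link that is reported in every snapshot is the one to investigate.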
The jtest error command generates the following information:
• B:C:L The board, chip and link being reported.
• E An error has occurred.
• RtCRC CRC error on route byte (packet and transaction error). This indicates
some bit errors on the route values.
• TrCRC CRC error on transaction (packet and transaction error). This indicates
some bit errors in one of the transactions.
• RcvLk Receiver lock error (low level line error). Problems with the received
or local clock.
• Dskew Deskew error (low level line error). Only likely to be caused by a hard
failing data bit.
• Phase Phase error (low level line error). Probably a missed clock on the
incoming link.
• DataE Data error (low level line error). Not a valid data value or a valid token.
• ChM45 Mod 4/5 change detected on link (low level line error).
• Fifo0 FIFO overrun on virtual channel 0 (protocol error).
• Fifo1 FIFO overrun on virtual channel 1 (protocol error).
• OpenT Packet has been open at the input for too long (protocol error).
• PktRT Packet acknowledge return error (protocol error).
Protocol errors are normally caused by a very high error rate on another
link. They can arise only when a double or triple bit error converts one type
of token into another valid token. Note that data errors occur when a node is
reset. The following example demonstrates a protocol error:
B:C:L  E  RtCRC  TrCRC  RcvLk  Dskew  Phase  Fifo0  Fifo1  OpenT  PktRT  ChM45  DataE  Value
0:0:0  1  0      0      10     0      1      1      1      0      0      0      1      00f022
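A report in this layout can be matched against the field descriptions programmatically. The following Python sketch is an illustration only, not a jtest feature: the names CATEGORY and parse_error_report are hypothetical, the category labels are taken from the field descriptions in this section, and treating any value other than 0 as a raised flag is an assumption. The sample text is copied from the example above.

```python
# Category of each flag column, taken from the field descriptions.
CATEGORY = {
    "RtCRC": "packet and transaction",
    "TrCRC": "packet and transaction",
    "RcvLk": "low level line",
    "Dskew": "low level line",
    "Phase": "low level line",
    "DataE": "low level line",
    "ChM45": "low level line",
    "Fifo0": "protocol",
    "Fifo1": "protocol",
    "OpenT": "protocol",
    "PktRT": "protocol",
}

SAMPLE = """\
B:C:L E RtCRC TrCRC RcvLk Dskew Phase Fifo0 Fifo1 OpenT PktRT ChM45 DataE Value
0:0:0 1 0 0 10 0 1 1 1 0 0 0 1 00f022
"""


def parse_error_report(text):
    """Pair each row with the header and list the flags that are raised."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    reports = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row does not match header: %r" % (row,))
        fields = dict(zip(header, row))
        # Assumption: any value other than "0" means the flag is raised.
        raised = [n for n in header if n in CATEGORY and fields[n] != "0"]
        reports.append({
            "link": fields["B:C:L"],
            "raised": raised,
            "categories": sorted({CATEGORY[n] for n in raised}),
        })
    return reports


for report in parse_error_report(SAMPLE):
    print(report["link"], report["raised"], report["categories"])
```

For the sample row, the sketch reports link 0:0:0 with RcvLk, Phase, Fifo0, Fifo1, and DataE raised, covering both the low level line and protocol categories.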
12-18 Maintenance and Diagnostic Procedures