www.hitachi.com BladeSymphony 1000 Architecture White Paper 15
Figure 6. Hitachi Node Controller connects multiple server blades
Dividing the SMP system across several server blades solves the memory bus contention problem,
because each blade has its own memory. A processor's access to its on-board memory incurs no
penalty. The two processors (four cores) can access up to 64 GB at the full speed of local memory.
When a processor needs data that is not in its locally attached memory, its node controller
must contact the node controller that holds the data to retrieve it. Retrieving that data
therefore takes longer than retrieving data from local memory. Because access time depends on
where the data resides, this design is known as a non-uniform memory architecture (NUMA). The
advantage of non-uniform memory is the ability to scale to a larger number of processors within
a single system image while still allowing for the speed of local memory access.
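The trade-off described above can be sketched as a weighted average of local and remote latency. This is only an illustrative model: the function name and the latency figures are hypothetical and are not taken from BladeSymphony 1000 measurements.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected memory latency when a given fraction of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
LOCAL_NS = 100.0
REMOTE_NS = 200.0

for fraction in (1.0, 0.75, 0.25):
    latency = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local accesses -> {latency:.1f} ns average")
```

Under these assumed numbers, the more accesses that stay in local memory, the closer the average comes to local memory speed, which is why data placement matters on a NUMA system.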
While there is a penalty for accessing remote memory, a number of operating systems include
enhancements that improve performance on NUMA systems. These operating systems take into account
where data is located when scheduling tasks to run on CPUs, using the closest CPU where possible.
Some operating systems are able to rearrange the location of data in memory to move it closer to
the processors where it is needed. For operating systems that are not NUMA-aware, the
BladeSymphony 1000 offers a number of memory interleaving options that can improve performance.
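As a rough illustration of why interleaving can help an operating system that is not NUMA-aware, the sketch below compares a contiguous address layout with an interleaved one. The node count, memory size per node, and interleave granularity are assumptions made for the example; the actual interleaving options of the BladeSymphony 1000 are not described here.

```python
NODES = 4             # assumed: four server blades in one SMP system
NODE_BYTES = 1 << 20  # hypothetical memory per node, kept small for the demo
GRANULE = 128         # hypothetical interleave granularity in bytes


def node_contiguous(addr):
    """Each node owns one contiguous block of the address space."""
    return addr // NODE_BYTES


def node_interleaved(addr):
    """Consecutive granules rotate across the nodes."""
    return (addr // GRANULE) % NODES


def spread(mapper, start, length, step=GRANULE):
    """Count how many granules of a buffer land on each node."""
    counts = [0] * NODES
    for addr in range(start, start + length, step):
        counts[mapper(addr)] += 1
    return counts


buffer_len = 64 * 1024
print("contiguous :", spread(node_contiguous, 0, buffer_len))
print("interleaved:", spread(node_interleaved, 0, buffer_len))
```

In this model the contiguous layout places the whole buffer on one node, while the interleaved layout spreads it evenly. Even spreading means most accesses are remote, but no single node's memory becomes a hot spot when the operating system cannot place data near the CPU that uses it.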
Each Node Controller can connect to up to three other Node Controllers, providing a point-to-point
connection between each pair of Node Controllers. The advantage of point-to-point connections is
that they eliminate the bus, which would be prone to contention, and the crossbar switch, which
reduces contention compared with a bus but adds complexity and latency. A remote memory access is
streamlined because it needs to pass through only two Node Controllers, which results in lower
latency than in other SMP systems.
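The point-to-point arrangement can be sketched as a fully connected graph of four Node Controllers. The helper names below are hypothetical and the model deliberately ignores everything except connectivity.

```python
from itertools import combinations

NODE_COUNT = 4  # one Node Controller plus up to three peers, per the text


def full_mesh_links(n):
    """Every pair of Node Controllers gets its own direct link."""
    return list(combinations(range(n), 2))


def controllers_on_path(src, dst):
    """Node Controllers a memory access passes through in a full mesh."""
    return [src] if src == dst else [src, dst]


links = full_mesh_links(NODE_COUNT)
peers = {n: sum(n in link for link in links) for n in range(NODE_COUNT)}

print("links:", len(links))
print("peers per controller:", peers)
print("remote access 0 -> 3 passes through:", controllers_on_path(0, 3))
```

With four controllers this gives six direct links and three peers per controller, so any remote access involves only the requesting controller and the controller that holds the data, with no intermediate hop.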
[Diagram labels: four Node Controllers (NDC) joined by point-to-point, low-latency CC-NUMA links;
two Memory Controllers (MC), each attached to DDR2 memory; PCI Bridge, PCI Bus, and PCI Slots;
PCI-Express (4 lane); L3 Cache Copy Tag.
Bandwidth labels: 2 GB/s x3; Processor Bus 6.4 GB/s (FSB 400 MHz) or 10.6 GB/s (FSB 667 MHz);
Memory Bus 4.8 GB/s (FSB 400 MHz) or 5.3 GB/s (FSB 667 MHz); Node Bandwidth 4.8 GB/s
(FSB 400 MHz) or 5.3 GB/s (FSB 667 MHz).]