
Technical white paper | HP Reference Architecture for Hortonworks Data Platform on HP ProLiant SL4540 Gen8 Server
Pre-deployment considerations
Several key factors should be considered before designing and deploying a Hadoop cluster. The following subsections
articulate the design decisions behind the balanced baseline configuration for the reference architectures. The rationale
provided gives you the information needed to take the configurations and modify them to suit a particular custom scenario.
Table 3. Overview of Functional Components and Configurable Values

Functional Component    Value
Operating System        Improves Availability and Reliability
Computation             Ability to balance Price with Performance
Memory                  Ability to balance Price with Capacity and Performance
Storage                 Ability to balance Price with Capacity and Performance
Network                 Ability to balance Price with Performance and Availability
Operating system
Hortonworks supports 64-bit Red Hat Enterprise Linux (RHEL) 5.x and 6.x and CentOS 5.x and 6.x as choices for the
operating system.
Note
HP recommends using a 64-bit operating system to avoid constraining the amount of memory that can be used on worker
nodes. 64-bit Red Hat Enterprise Linux 6.1 or greater is recommended due to better ecosystem support, more
comprehensive functionality for components such as RAID controllers, and compatibility with HP Insight CMU. The Reference
Architectures listed in this document were tested with 64-bit Red Hat Enterprise Linux 6.2.
Computation
The processing or computational capacity of a Hortonworks Data Platform (HDP) cluster is determined by the aggregate
number of MapReduce slots available across all nodes. MapReduce slots are configured on a per server basis. Employing
Hyper-Threading improves process scheduling, allowing you to configure more MapReduce slots. Refer to the Storage
section to see how I/O performance issues arise from sub-optimal disk-to-core ratios (too many slots and too few disks). For
CPU-bound workloads, we recommend processors with faster clock speeds to remove the bottleneck.
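The per-server slot sizing described above can be sketched as a simple rule of thumb. The function below is an illustrative assumption, not HP or Hortonworks guidance: it allots roughly one slot per hardware thread, reserves a couple of threads for the TaskTracker and DataNode daemons, and splits the remainder between map and reduce slots. The `map_fraction` ratio and the daemon reserve are hypothetical tuning knobs.

```python
def estimate_slots(physical_cores, hyperthreading=True, map_fraction=0.67):
    """Estimate MapReduce slots for one worker node (MRv1).

    Illustrative sketch only: one slot per hardware thread, minus two
    threads reserved for the TaskTracker and DataNode daemons, with
    map_fraction splitting the remainder between map and reduce slots.
    """
    threads = physical_cores * (2 if hyperthreading else 1)
    usable = max(threads - 2, 1)            # reserve threads for daemons
    map_slots = max(int(usable * map_fraction), 1)
    reduce_slots = max(usable - map_slots, 1)
    return map_slots, reduce_slots

# Example: an 8-core worker with Hyper-Threading enabled.
map_slots, reduce_slots = estimate_slots(8)   # -> (9, 5)
```

In an MRv1 deployment such values would be applied per server via the `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` properties in mapred-site.xml.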
Note
Oracle Java JDK 6 (not JRE) is required to execute MapReduce tasks.
Memory
Use of error-correcting code (ECC) memory is a practical requirement for the Hortonworks Data Platform (HDP) and is
standard on all HP ProLiant servers. Memory requirements differ between the management services and the worker
services. For the worker services, sufficient memory is needed to run the TaskTracker and DataNode services in addition to
the sum of all the memory assigned to each of the MapReduce slots. If you have a memory-bound MapReduce job, we
recommend increasing the amount of memory on all the nodes running worker services.
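The worker-node memory requirement above is additive, so it can be sketched as a short sizing calculation. The heap sizes and daemon overheads below are illustrative assumptions, not figures from this reference architecture:

```python
def worker_memory_gb(map_slots, reduce_slots, heap_per_slot_gb=2,
                     tasktracker_gb=1, datanode_gb=1, os_gb=4):
    """Sum the per-slot task heaps plus daemon and OS overhead.

    All default sizes are hypothetical placeholders; substitute the
    heap settings actually configured for your cluster.
    """
    slot_memory = (map_slots + reduce_slots) * heap_per_slot_gb
    return slot_memory + tasktracker_gb + datanode_gb + os_gb

# Example: 12 map + 8 reduce slots at 2GB each, plus overheads.
worker_memory_gb(12, 8)   # -> 46 (GB)
```

If the result exceeds the installed memory, either reduce the slot count or, as recommended above, add memory to all nodes running worker services.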
Best practice
It is important to populate all the available memory channels to ensure optimal use of the memory bandwidth. For example,
consider a two-socket server in which each processor has three memory channels and each channel supports two DIMMs,
for a total of six (6) DIMMs per installed processor, or twelve (12) DIMMs for the server. Fully populating the channels with
8GB DIMMs results in a configuration of 96GB of memory per server.
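The DIMM arithmetic in the example above generalizes to any socket, channel, and DIMM-size combination; the sketch below just restates it as a product (the defaults mirror the worked example in the text):

```python
def total_memory_gb(sockets=2, channels_per_socket=3,
                    dimms_per_channel=2, dimm_gb=8):
    """Total server memory when every channel is fully populated.

    Defaults reproduce the example above: 2 sockets x 3 channels
    x 2 DIMMs x 8GB = 96GB.
    """
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb

total_memory_gb()             # -> 96 (GB)
total_memory_gb(dimm_gb=16)   # -> 192 (GB) with 16GB DIMMs
```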