Partnered with OpenSystems Media

Best Practices for System Management in the Hyperscale Cloud
Michael Stumpf, Lead Solution Engineer, Dell’s PowerEdge C Server Line
Moderator: Curt Schwaderer, Technology Editor, OpenSystems Media
Agenda
I. A quick tour of the viewer tools
II. Introduction
III. Presentation
IV.
Best Practices for System Management in Hyperscale Clouds
Michael Stumpf
System Management & Tools, PowerEdge-C Data Center Solutions
Agenda
• Hyperscale cloud environments
• Techniques
• Command line tools
• Health monitoring
• Best practices
http://PowerEdgeC.
Cloud Computing
Yes, it’s the latest industry trend… but this one isn’t all ponies and rainbows!
So, what does “Cloud Computing” actually mean?
• In a nutshell, the goal is to put together a lot of
  – Efficient
  – Cheap
  – Dense
  – Computing capability & storage
• That is “just manageable enough”
• Don’t pay for features you don’t need/won’t use
Cloud Computing Environments
• Failures will occur
  – Plan for them
  – Minimize impact
  – Minimize total cost
  – Replace it; throw the old one on the junk pile
• Failure is expected, so single points of failure are OK
• The software stack expects, detects, and handles failure
  – Hadoop
• Why pay 10x for 99.999% uptime?
  – Is 5.26 minutes of downtime per year realistic?
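The five-nines figure is simple arithmetic. A 365-day year has 525,600 minutes, and 0.001% of that is about 5.26 minutes. Here is a minimal helper that does the same calculation for any availability level. The name `downtime_minutes` is only for illustration.

```bash
#!/bin/sh
# downtime_minutes PERCENT
# Yearly downtime, in minutes, allowed by an availability percentage,
# assuming a 365-day year: (100 - PERCENT) / 100 * 365 * 24 * 60.
downtime_minutes() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}
```

`downtime_minutes 99.999` prints 5.26. `downtime_minutes 99.9` prints 525.60, which is almost nine hours a year.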
BMC (Baseboard Management Controller)
• The BMC is really ideal for cloud computing
  – Cheap, bolt-on, auxiliary maintenance & monitoring
  – Out-of-band management
  – Present on every server node
  – Always on
  – Powers the host on/off
  – Provides virtual KVM, virtual media, and a serial port over IP
  – Monitors server node health
  – Allows physical separation of management traffic
• Enables totally virtualized server management
  – Put the BMC on the network,
  – and the server is fully remotely manageable (even bare metal!)
PowerEdge C6105 Server
• 4 separate servers in 2U
• Power-efficient AMD Opteron 4000 processors
• 4 separate BMCs
First Goals: The “Get My Server Manageable” Plan
• Get the BMC on the network
  – set up an IP address, or
  – collect the MAC address and set up a DHCP daemon
• Access it remotely
  – IPMI over LAN
  – Serial-over-LAN (via the BMC)
  – Virtual console
  – Virtual media
  – Serial port (physical; cheap!)
  – IP KVM ($$$$$)
Then,
• Configure it
• Remotely install an OS (kickstart file)
  – Or use a read-only, centralized, network-based PXE image (if practical!)
• Set up monitoring
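For the DHCP route, each collected BMC MAC address needs a host reservation. The sketch below emits ISC dhcpd `host` declarations from a plain `name mac ip` inventory. The inventory format, the function name, and the dhcpd.conf path in the comment are all assumptions.

```bash
#!/bin/sh
# bmc_dhcpd_hosts < bmcs.txt >> /etc/dhcp/dhcpd.conf   (path varies by distro)
# Reads lines of "name mac ip" (hypothetical inventory format) and prints one
# ISC dhcpd host declaration per BMC, pinning its MAC to a fixed address.
bmc_dhcpd_hosts() {
  local name mac ip
  while read -r name mac ip; do
    # skip blank lines and comment lines
    case $name in ''|'#'*) continue ;; esac
    printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' \
      "$name" "$mac" "$ip"
  done
}
```

After appending the output, restart dhcpd so that the BMCs pick up their reserved addresses on the next lease request.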
PXE Boot (boot from network)
• Uses DHCP to get an IP address
• The MAC address is linked to a boot image
• PXE pulls the boot image with TFTP
• It’s always a good idea to make PXE the first boot device
  – You can simply bypass it with a timeout and “localboot”
  – Helps mechanize discovery, provisioning, and FA/crash-cart work
• Driven by symlinks on the TFTP/PXE server
  – Newer approaches exist (iPXE)
  – They’re faster, sexy, and usually work
  – When they don’t, you’re in trouble
  – PXE will always work
• Construct states (1–7): …
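Here is one way to drive PXE states with symlinks, assuming PXELINUX (from SYSLINUX). PXELINUX looks for a per-host config file named `01-` plus the lowercase, dash-separated MAC address before it falls back to `default`. The `states/` directory and state names such as `localboot` are hypothetical. A `localboot` state file would hold a PXELINUX `LOCALBOOT 0` entry, so the node falls through to its own disk.

```bash
#!/bin/sh
# Assumed TFTP layout (hypothetical):
#   $ROOT/pxelinux.cfg/          what PXELINUX reads
#   $ROOT/pxelinux.cfg/states/   one config file per state

# pxe_cfg_name MAC: PXELINUX's per-host file name, "01-" (Ethernet ARP type)
# followed by the MAC in lowercase with dashes.
pxe_cfg_name() {
  printf '01-%s\n' "$(printf '%s' "$1" | tr 'ABCDEF:' 'abcdef-')"
}

# set_pxe_state ROOT MAC STATE: point the host's config at a state file.
set_pxe_state() {
  local root=$1 mac=$2 state=$3 cfg
  [ -f "$root/pxelinux.cfg/states/$state" ] || {
    echo "no such state: $state" >&2
    return 1
  }
  cfg=$(pxe_cfg_name "$mac")
  # relative link; -f replaces the link left by the previous state
  ln -sfn "states/$state" "$root/pxelinux.cfg/$cfg"
}
```

Moving a node through its states then means re-pointing a single symlink and rebooting it. Nothing on the node itself has to change.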
PXE Boot: Crowbar
• Dell’s Crowbar implements this strategy today
  – http://github.com/dellcloudedge/crowbar/wiki
• Bare metal to fully functioning cloud in under 2 hours
• Open; Apache 2 license
• Not restricted to Dell hardware
• Embodies Dell’s cloud experience
Configure the Server (Provision)
• PXE image (runs out of a ramdisk)
  – Mounts an NFS share, or uses FTP (ncftp is handy) to pull over your custom scripts
  – Unbundles & runs the scripts
  – Optionally feeds results back upstream
• Things to set:
  – Custom BIOS settings
  – Custom BMC settings
  – Custom storage adapter settings
  – Firmware updates (if needed)
  – Storage configuration (array creation)
• After provisioning, a kickstart file automates the OS install
  – Warning: the environment is very thin; services like IPMI will …
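Here is a sketch of the "unbundle & run scripts, feed results back" step inside the PXE image. It assumes the scripts arrive as numbered, executable `*.sh` files (for example `10-bios.sh`, `20-bmc.sh`). That naming scheme and the function name are hypothetical.

```bash
#!/bin/sh
# run_provision_scripts DIR LOG
# Runs every executable *.sh in DIR in name order, appends "script rc" to LOG
# (ready to ship upstream), and returns non-zero if any script failed.
run_provision_scripts() {
  local dir=$1 log=$2 script rc failed=0
  for script in "$dir"/*.sh; do
    # also skips the literal pattern when nothing matches
    [ -x "$script" ] || continue
    "$script"
    rc=$?
    printf '%s %d\n' "$(basename "$script")" "$rc" >> "$log"
    [ "$rc" -eq 0 ] || failed=1
  done
  return "$failed"
}
```

In the image, this would run after the NFS mount or FTP fetch. The log can then be pushed back to the provisioning server with the same tool used to fetch the scripts.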
IPMI Tools
Many choices. Each has a different focus.
• ipmitool – what I use & script. Easy to build. Good all-around.
• freeipmi – contains ipmiping; useful for probing
• ipmiutil – contains idiscover; useful for scanning a network
FAQ: How do I scan the network for BMCs?
  – Does it ping? + Does it respond to IPMI? = This is a BMC
  – The BMC should be able to identify its host
  – Useful for discovery, provisioning, and detecting unexpected changes on the network
FAQ: How do I encrypt IPMI traffic?
  – Force IPMI 2.
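The two-step BMC test ("does it ping, and does it answer IPMI?") fits in one small function. This sketch assumes Linux iputils `ping` (`-W` is a timeout in seconds) and ipmitool speaking IPMI 2.0 (`-I lanplus`). It also assumes `IPMI_USER` and `IPMI_PASSWORD` are set in the environment; the password is read by `ipmitool -E`. The function name is hypothetical.

```bash
#!/bin/sh
# is_bmc HOST: exit 0 if HOST answers ping and an IPMI "mc info" request.
is_bmc() {
  local host=$1
  # step 1: is anything there at all?
  ping -c 1 -W 1 "$host" >/dev/null 2>&1 || return 1
  # step 2: does it speak IPMI? (stdin closed so a loop's input is not eaten)
  ipmitool -I lanplus -H "$host" -U "$IPMI_USER" -E mc info \
    >/dev/null 2>&1 </dev/null
}
```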
IPMI Cheat Sheet
# Start the IPMI service on a RHEL/CentOS-style server (usually installed, but not enabled)
service ipmi start
# Check power state; power on
ipmitool power status
ipmitool power on
# Issue an ACPI shutdown (soft shutdown to the OS)
ipmitool chassis power soft
# Reset the BMC itself ("mc" = management controller)
ipmitool mc reset cold
# Activate serial-over-LAN (type ~? for help; type ~.
IPMI Cheat Sheet, continued
# See the BMC LAN configuration
ipmitool lan print 1
# Change the BMC LAN configuration (DHCP/static IP, netmask, gateway, etc.)
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 192.168.0.5
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 192.168.0.
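The static-address steps can be wrapped so that a failure in one setting stops the sequence before the BMC is left half-configured. This is a sketch with a hypothetical function name. It runs in-band on the host against its local BMC. Channel 1 matches the cheat sheet but is board-specific, so confirm it with `ipmitool lan print` first.

```bash
#!/bin/sh
# bmc_set_static CHANNEL IP NETMASK GATEWAY
# Applies the static LAN settings in order, stopping at the first failure,
# then prints the resulting configuration for a visual check.
bmc_set_static() {
  local ch=$1 ip=$2 mask=$3 gw=$4
  ipmitool lan set "$ch" ipsrc static &&
    ipmitool lan set "$ch" ipaddr "$ip" &&
    ipmitool lan set "$ch" netmask "$mask" &&
    ipmitool lan set "$ch" defgw ipaddr "$gw" &&
    ipmitool lan print "$ch"
}
```

Usage: `bmc_set_static 1 192.168.0.5 255.255.255.0 GATEWAY_IP`, where `GATEWAY_IP` stands for your gateway address.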
IPMI Network Scanning
Recipes to determine if an IP is a BMC:
• ipmitool – slowest; ~20 seconds per IP
• ipmiping – fast; ~2 seconds per IP
• idiscover – very fast; uses broadcast or GetChannelAuthCap
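Those per-IP times add up. Scanning a /24 (254 addresses) serially at ~20 seconds each takes 5,080 seconds, about 85 minutes. With 32 probes in flight, it takes roughly 8 rounds of 20 seconds, under three minutes. The sketch below parallelizes any probe command with `xargs -P`, which GNU and BSD xargs both support. The probe is an assumption: it must exit 0 only when the address answers IPMI, so check that your tool's exit status behaves that way.

```bash
#!/bin/sh
# scan_subnet PREFIX JOBS PROBE [ARGS...]
# Probes PREFIX.1 .. PREFIX.254 (a /24 written as "A.B.C"), running up to JOBS
# probes at once, and prints each address for which "PROBE ARGS... ADDRESS"
# exits 0. Output order is not deterministic; pipe through sort if needed.
scan_subnet() {
  local prefix=$1 jobs=$2 i=1
  shift 2
  while [ "$i" -le 254 ]; do
    echo "$prefix.$i"
    i=$((i + 1))
  done |
    xargs -P "$jobs" -I{} sh -c \
      'ip=$1; shift; if "$@" "$ip" >/dev/null 2>&1; then echo "$ip"; fi' \
      sh {} "$@"
}
```

A typical call would be `scan_subnet 192.168.0 32 ./probe-bmc.sh`. Here `probe-bmc.sh` is a hypothetical wrapper that runs `ipmitool -I lanplus -H "$1" ... mc info` and returns its exit status.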
Burn-in
Explicit Stress/Validation Tools
• CPU: cpuburn: http://freecode.com/projects/cpuburn
• Memory: memtester: http://pyropus.ca/software/memtester/
• Memory: memtest86+: http://www.memtest.org/
• Memory/IO: stressapptest: http://code.google.com/p/stressapptest/
# Allocate 256MB of memory and run 8 "warm copy" threads, and 8 CPU load threads. Exit after 20 seconds.
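Below is a sketch of a one-line memory burn-in verdict built around the run described in the comment above. It uses stressapptest's `-s` (seconds), `-M` (MB of memory), `-m` (copy threads), `-C` (CPU-load threads) and `-W` (more CPU-stressful "warm" copy) options. Verify them against `stressapptest --help` on your build. The wrapper also assumes stressapptest exits 0 only on PASS, and its name is hypothetical.

```bash
#!/bin/sh
# burnin_memory [SECONDS] [MEGABYTES] [THREADS]
# Defaults reproduce the run described above: 20 s, 256 MB,
# 8 warm-copy threads and 8 CPU-load threads. Prints one verdict line.
burnin_memory() {
  local secs=${1:-20} mb=${2:-256} threads=${3:-8}
  if stressapptest -s "$secs" -M "$mb" -m "$threads" -C "$threads" -W \
      >/dev/null 2>&1; then
    echo "$(uname -n) memory burn-in: PASS"
  else
    echo "$(uname -n) memory burn-in: FAIL"
    return 1
  fi
}
```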
Command Line Tools
Some people prefer GUIs…
• Give me Unix-style, single-purpose tools that return success/failure
  – Scriptable & scale well
  – Easy to reconfigure & change
  – These are the instrumentation steps you would plug into a GUI anyway
• Encapsulate common operations as small tools:
  – Server quick overview (1 line)
  – Server full state
  – Server inventory
  – Server healthy? (1 line: yes/no)
  – Server full health info, including sensor readings
• Once built, splice small scripts together to view projects or …
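Because each small tool reports only through its exit status, splicing them into a view is trivial. Here is a minimal dashboard sketch. `CHECK` stands for any single-purpose command that takes a host name and exits 0 when that host is healthy. The function name and the check are hypothetical.

```bash
#!/bin/sh
# dashboard CHECK HOST...
# One line per host (OK/FAIL) plus a summary; exit 0 only if all hosts pass.
dashboard() {
  local check=$1 host bad=0
  shift
  for host in "$@"; do
    if "$check" "$host" >/dev/null 2>&1 </dev/null; then
      printf '%-20s OK\n' "$host"
    else
      printf '%-20s FAIL\n' "$host"
      bad=$((bad + 1))
    fi
  done
  printf '%d/%d healthy\n' "$(($# - bad))" "$#"
  [ "$bad" -eq 0 ]
}
```

Since `dashboard` itself returns success/failure, it composes in turn: it can feed cron mail, a pager, or a bigger view.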
PowerEdge C Tools: Inventory & State
• Dashboard & Full View
• Dashboard Script
Command Line Tools: pdsh
pdsh (parallel distributed shell): http://sourceforge.net/projects/pdsh/
Run a task (including fan-out) in parallel across many hosts. Here is a crash course:
• Build pdsh
./configure --without-rsh --with-ssh
make
make install
# Put this in your .bash_profile:
export PDSH_RCMD_TYPE=ssh
• Set up SSH keys
# Create an SSH key pair & copy it to the remote host
ssh-keygen -t ed25519
ssh-copy-id username@remotehost.com
“No passphrase” is most convenient.
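Once pdsh and the keys are in place, a typical fan-out names hosts with pdsh's bracketed ranges. It then pipes the results through `dshbak -c`, which ships with pdsh and folds identical output from many hosts into one block. This is a sketch wrapper. The function name and the fan-out width of 32 are arbitrary choices.

```bash
#!/bin/sh
# fanout HOSTLIST COMMAND...
# Runs COMMAND over ssh on every host in a pdsh hostlist such as
# "node[01-16]"; -f caps simultaneous connections; dshbak -c merges
# identical output so that outliers stand out.
fanout() {
  local hosts=$1
  shift
  PDSH_RCMD_TYPE=ssh pdsh -f 32 -w "$hosts" "$@" | dshbak -c
}
```

Usage: `fanout 'node[01-16]' uname -r`. Quote the hostlist so that the shell does not try to glob the brackets.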
Health Monitoring
Monitoring need not be complicated. Two options:
• Out-of-band (BMC-based)
  – Very easy to set up
  – Completely agentless; OS agnostic
  – The BMC watches sensors for issues
  – Monitor by polling (once a minute is enough)
  – Management traffic is physically separable
  – Impartial observer
• In-band
  – An agent runs on the production OS
  – A great example is collectd (http://collectd.
BMC Health Monitoring
Strategy:
- Issue this command
- Look for any status other than “ok” or “ns”
- Once per minute should be enough
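Here is a sketch of that strategy. It assumes the command is `ipmitool sdr`, whose lines read `Name | Reading | Status`; the slide's actual command is not shown. It also assumes IPMI 2.0 (`lanplus`) and `IPMI_USER`/`IPMI_PASSWORD` in the environment, with the password read by `-E`. Function names are hypothetical.

```bash
#!/bin/sh
# nonok_sensors: print sdr-style lines whose status is neither "ok" nor "ns".
nonok_sensors() {
  awk -F'|' '
    NF >= 3 {
      status = $3
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      if (status != "ok" && status != "ns") print
    }'
}

# check_bmc HOST: 0 = all sensors ok/ns, 1 = problems (printed), 2 = no answer.
check_bmc() {
  local out bad
  out=$(ipmitool -I lanplus -H "$1" -U "$IPMI_USER" -E sdr </dev/null) || {
    echo "$1: no response from BMC"
    return 2
  }
  bad=$(printf '%s\n' "$out" | nonok_sensors)
  if [ -z "$bad" ]; then return 0; fi
  printf '%s\n' "$bad" | awk -v h="$1" '{ print h ": " $0 }'
  return 1
}
```

Run `check_bmc` for each BMC once a minute, for example from cron. Treat an unreachable BMC as its own alarm rather than as "healthy".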
Health Monitoring, ii (Bonus Points)
• Nagios provides a graphical console front end
• Bonus points if you
  – feed sensor data into a database, and
  – apply visualization tools like Cacti (uses RRDtool)
  – You may discover non-obvious trends (hot spots in the data center)
• A service-level monitor is also nice
  – The OS can appear alive while the app stack is dead
  – Monitoring may be very app-specific
  – A framework like Munin works well (http://munin-monitoring.org/)
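To feed sensor data into a database, the BMC's readings first need to become rows. This sketch turns `ipmitool sdr`-style lines (`Name | Reading | Status`) into CSV that most databases can bulk-load, or that a loader can push into RRD files. The CSV column layout and the function name are assumptions. Only numeric readings are kept.

```bash
#!/bin/sh
# sdr_to_csv HOST < sdr.txt
# Emits "epoch,host,sensor,value,unit" for numeric readings such as
# "45 degrees C" or "5400 RPM"; discrete values ("0x01", "no reading")
# are skipped.
sdr_to_csv() {
  awk -F'|' -v host="$1" -v now="$(date +%s)" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 3 {
      name = trim($1); reading = trim($2)
      if (reading ~ /^-?[0-9.]+ /) {
        value = reading; sub(/ .*/, "", value)
        unit = reading;  sub(/^[^ ]+ /, "", unit)
        printf "%s,%s,%s,%s,%s\n", now, host, name, value, unit
      }
    }'
}
```

Collected from every node once a minute, rows like these are what reveal slow trends, such as one rack running consistently warmer than its neighbors.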
Best Practice: Tuning & Incremental Rollout
• Build a proof of concept rather than speculate
  – Theoretical knowledge only goes so far with such complex systems
  – Workload patterns drive the architecture
  – Sometimes it is impossible to know until you actually start
• If possible, scale up slowly (staged rollout)
  – 5%, 20%, 50%, 100%
  – Shake out bugs & limit their impact
  – Gives you the opportunity to limit the scope of physical changes, if required
  – It will become apparent whether the ratios are right (CPU cores :: amount of RAM :: …
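The 5/20/50/100% stages translate into host counts by rounding up, so even a small cluster gets at least one host in the first stage. For 1,000 nodes, the stages are 50, 200, 500, then 1,000. This is a sketch that picks each stage's hosts from a list. The function name is hypothetical, and it assumes the host file is already ordered (or shuffled) the way your rollout policy wants.

```bash
#!/bin/sh
# stage_hosts HOSTFILE PERCENT
# Prints the first ceil(N * PERCENT / 100) non-blank hosts of HOSTFILE, so
# each later stage includes every host from the earlier ones.
stage_hosts() {
  local file=$1 pct=$2 n k
  n=$(awk 'NF { n++ } END { print n + 0 }' "$file")
  k=$(( (n * pct + 99) / 100 ))   # integer ceiling
  awk -v k="$k" 'NF && c < k { print; c++ }' "$file"
}
```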
Best Practices, continued
• badblocks: 3 passes @ 100% clean saves a lot of trouble with drives (detects early mortality)
• If it’s not broken, don’t touch it
  – Old firmware is OK
  – Unless it isn’t
  – You’ll know if it isn’t
  – Upgrades always carry risk
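Here is a sketch of that drive burn-in. It assumes "3 passes @ 100% clean" maps onto badblocks' `-p 3` option, which keeps scanning until three consecutive passes find no new bad blocks. It combines that with the destructive write test `-w`, and any block listed in the `-o` output file fails the drive. This destroys all data on the device, so run it only on new or blank drives. The function name is hypothetical.

```bash
#!/bin/sh
# burnin_disk DEVICE   (DESTRUCTIVE: -w overwrites every block on DEVICE)
# Exit 0 = PASS, 1 = bad blocks found, 2 = badblocks itself failed.
burnin_disk() {
  local dev=$1 list rc=0
  list=$(mktemp) || return 2
  if ! badblocks -wsv -p 3 -o "$list" "$dev"; then
    echo "$dev: ERROR (badblocks did not complete)"
    rc=2
  elif [ -s "$list" ]; then
    echo "$dev: FAIL ($(wc -l < "$list" | tr -d ' ') bad blocks)"
    rc=1
  else
    echo "$dev: PASS"
  fi
  rm -f "$list"
  return "$rc"
}
```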
Updating Mass Numbers of Machines
Two major strategies: BMC or PXE
• BMC (manage by IPMI)
  – Managed out-of-band, from a central point
  – Simple in concept
  – Limited in what it can do (settings, firmware)
  – May be able to carry out updates without host reboots
• PXE image to carry out actions
  – More complicated
  – Higher upfront effort, lower effort each use
  – Tailored to each vendor’s hardware
  – No limits on scope or on what can be done
  – Requires reboots & some downtime
  – Test on 1, then 5%, before rolli…
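Both strategies start from the same central loop: one ipmitool command sent to a list of BMCs, with a per-host OK/FAIL result. This sketch assumes IPMI 2.0 (`lanplus`) and `IPMI_USER`/`IPMI_PASSWORD` in the environment, with the password read by `-E`. The function and file names are hypothetical.

```bash
#!/bin/sh
# bmc_each HOSTFILE IPMITOOL-ARGS...
# Runs one ipmitool subcommand against every BMC in HOSTFILE (one per line)
# and reports "host OK" or "host FAIL"; returns 1 if any host failed.
bmc_each() {
  local hostfile=$1 host rc=0
  shift
  while read -r host; do
    [ -n "$host" ] || continue
    # </dev/null keeps ipmitool from consuming the rest of HOSTFILE
    if ipmitool -I lanplus -H "$host" -U "$IPMI_USER" -E "$@" \
        </dev/null >/dev/null 2>&1; then
      echo "$host OK"
    else
      echo "$host FAIL"
      rc=1
    fi
  done < "$hostfile"
  return "$rc"
}
```

For the PXE strategy, this can send a test group into an update image. Run `bmc_each pilot.txt chassis bootdev pxe && bmc_each pilot.txt chassis power cycle`, where `pilot.txt` first holds one host and then the 5% group.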
Questions and Answers
System Management in the Hyperscale Cloud
Ask Dell’s Michael Stumpf
Thanks for joining us.
More information on AMD’s cloud offerings can be found at www.AMD.com/cloud
You’ll find an archive of this event at: http://ecast.opensystemsmedia.com/
Send us your comments on the presentation: clong@opensystemsmedia.