Steve Womer Rick Troth Mike MacIsaac Kyle S. Black Chapter 1. Sharing and maintaining SLES 10 SP2 Linux under z/VM This paper is divided into the following sections: 1.1, “History” on page 2 - a history of the two versions of the paper 1.2, “Overview” on page 2 - an overview of the entire paper 1.3, “Background of read-only root Linux” on page 3 - describes the shared root file structure and the maintenance system 1.
1.1 History This paper was originally published as the IBM Redpaper Sharing and maintaining Linux under z/VM, largely based on input from architects and system administrators from Nationwide Insurance, published in February of 2008, available on the Web at: http://www.redbooks.ibm.com/abstracts/redp4322.html In 2009, it was updated, with most input coming from system administrators at Penn State University. This paper is available on the Web page: http://linuxvm.
Staff productivity: fewer people are needed to manage a large-scale virtual server environment running on z/VM Operational flexibility: companies can leverage and utilize their IT infrastructure to enhance their business results A word of caution and a disclaimer are necessary. The techniques described in this paper are not simple to implement. Both z/VM and Linux on System z skills are needed. It is not guaranteed that such a system would be supported.
During the boot process, read-write copies of the directories /etc/, /srv/, and /root/ are bind-mounted from /local/ over the read-only copies, while /var/ is its own minidisk. There is a background discussion on bind-mounts in section 1.3.5, “Overview of bind mounts” on page 14. Figure 1-1 shows a block diagram of the entire system. The boxes above the dashed line represent a maintenance system for conventional Linux servers with most file systems being read-write.
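The bind mounts performed at boot can be sketched as a dry run. This is a minimal illustration, not the paper's boot script: the function name bind_mount_cmds is hypothetical, and it only prints the mount commands for the three directories kept under /local/ instead of running them.

```shell
#!/bin/bash
# Dry-run sketch: print the bind-mount commands for the read-write
# directories that live under /local/. Nothing is actually mounted.
# bind_mount_cmds is a hypothetical helper, not part of the paper's tar file.
bind_mount_cmds() {
    local local_root=$1
    shift
    local dir
    for dir in "$@"; do
        # Each read-write copy under /local/ is mounted over the
        # read-only directory of the same name in the root file system.
        printf 'mount -n --bind %s/%s /%s\n' "$local_root" "$dir" "$dir"
    done
}

bind_mount_cmds /local etc root srv
```

Printing the commands first makes it easy to review the mount plan on a test system before anything touches the real root file system.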
A more detailed summary of these systems is in section 1.4, “Summary of virtual machines” on page 19. 1.3.3 High level approach of read-only root system In the section that follows, the read-only root solution is described at a high level. In addition to the read-only root environment, a maintenance process is also established. Details on implementing the solution begin in section 1.5, “Building a read-write maintenance system” on page 20.
old and next, you would simply create more sets of minidisks on the gold user ID, with an agreed upon device numbering convention.
[Figure 1-4 (diagram): S10RWMNT, S10RWTST, install media, GOLD and CMSCLONE, connected by the mnt2tst, tst2mnt and mnt2gold EXECs]
Figure 1-4 Freezing a golden copy of a Linux system
When a new Linux server needs to be cloned or provisioned with conventional read-write disks, it is copied from the gold minidisks.
[Figure 1-6 (diagram): S10RWMNT, LNXCLONE and S10ROGLD]
Figure 1-6 Creating a read-only root system with the mnt2rogld.sh script
Then the cleanup script is run on the first read-only root Linux system and the system is shut down. Now a REXX EXEC (RO2GOLD) is executed to copy from S10ROGLD to the 21Bx minidisks on S10GOLD. These are now the gold read-only root disks.
1.3.4 Directory structure of the read-only root system The directory structure for the Linux shared read-only root is displayed in Figure 1-8.
/lib Essential shared libraries and kernel modules
/media Mount point for removable media
/mnt Mount point for mounting a file system temporarily
/opt Add-on application software packages
/sbin Essential system binaries
/srv Data for services provided by this system
/tmp Temporary files
/usr Secondary hierarchy
/var Variable data
The following directories, or symbolic links to directories, must be in /, if the corresponding subsystem is installed:
/home /lib /root User home directories (optional) A
/dev This directory highlights one important characteristic of the Linux file system: almost everything is a file or a directory. If you look at this directory, you should see dasda1, dasda2, etc., which represent the various partitions on the disk drive of the system. The entries beginning with sda* are SCSI devices. Each logical minidisk is represented as dasda for the first, dasdb for the second, and so on.
“The /lib directory contains kernel modules and those shared library images (the C programming code library) needed to boot the system and run the commands in the root file system, i.e. by binaries in /bin and /sbin. Libraries are readily identifiable through their filename extension of *.so. Windows® equivalent to a shared library would be a DLL (dynamically linked library) file. They are essential for basic system functionality.
“Although most distributions neglect to create the directories /opt/bin, /opt/doc, /opt/include, /opt/info, /opt/lib, and /opt/man they are reserved for local system administrator use. Packages may provide “front-end” files intended to be placed in (by linking or copying) these reserved directories by the system administrator, but must function normally in the absence of these reserved directories. Programs to be invoked by users are located in the directory /opt/'package'/bin.
the program. The reasoning behind this is for compliance with IEEE standard P1003.2 (POSIX, part 2).” In the read-only root system, /tmp is also an in-memory-only virtual file system; for more information, see tmpfs (also known as SHMFS).
/usr The LSB states that: “/usr usually contains by far the largest share of data on a system. Hence, this is one of the most important directories in the system as it contains all the user binaries, their documentation, libraries, header files, etc....
The methodology used to name subdirectories of /srv is unspecified as there is currently no consensus on how this should be done. One method for structuring data under /srv is by protocol, e.g. ftp, rsync, www, and cvs. On large systems it can be useful to structure /srv by administrative context, such as /srv/physics/www, /srv/compsci/cvs, etc. This setup will differ from host to host.
[Figure 1-9 (diagram): /guestvol/there and /mnt/here before and after the command: mount --bind /guestvol/there /mnt/here]
Figure 1-9 Overview of bind mounts
1.3.6 Summary of read-only root file systems
Table 1-1 summarizes the file systems in the read-only root system. Note that file systems which will be read-only use type ext2, because a journal cannot be written to a read-only file system. Read-write file systems use type ext3, which is more conventional.
Table 1-1 Summary of file systems and swap spaces Directory FS type Attribu tes Device Vaddr Notes / ext2 R/O /dev/disk/by-path/ccw -0.0.
1.3.7 The modified boot process
During the normal Linux boot process, the root file system is initially mounted read-only, and then later mounted read-write. In the read-only root system, it is not remounted read-write. Figure 1-10 on page 17 shows the System z Linux boot process.
[Figure 1-10 (flowchart): Start, boot loader (zipl), load kernel, initial RAMdisk, /sbin/init, inittab, init.d/boot with the boot.d scripts, run level rc.d]
Checks and mounts /local/ (1B5 disk)
Bind mounts /etc/, /srv/ and /root/
This script leaves the root file system in read-only mode after performing a file system check on the /local/ directory. Section 1.10.5, “Modified boot.rootfsck file” on page 81 contains a copy of the modified script. The boot.findself script runs on the first boot to update the IP address for the new virtual Linux server. This allows the servers to be cloned with identical IP addresses and then updated on first boot. 1.3.
CLONERW EXEC Clones a read-write machine from S10GOLD
MNT2GOLD EXEC Copies the S10RWMNT minidisks to S10GOLD 11Bx
RO2GOLD EXEC Copies disks from S10ROGLD to S10GOLD 21Bx
RO2GOLD2 EXEC Copies disks from S10ROGLD to S10GOLD2 21Bx
MNT2TST EXEC Copies disks from S10RWMNT to S10RWTST
TST2MNT EXEC Copies disks from S10RWTST to S10RWMNT
LNXCLONE A Linux system that contains the tools used to create the S10ROGLD machine, i.e. all shared root images.
11Bx Read-write machines are cloned from this series of minidisks. These minidisks were populated and updated from S10RWMNT.
21Bx Read-only machines link these minidisks as RR. Consider these the production binaries for shared-root.
1.5 Building a read-write maintenance system
Before building a read-only root system, a system for maintaining and cloning a conventional read-write Linux system is described. The read-write system is created with a maintenance plan in mind.
[Figure 1-11 (diagram): DASD volumes DM63C9, DM63CA, DM63CB and DM63CD (3390-9) and DM6364 (3390-3), holding the user IDs S10RWMNT, S10RWTST, S10GOLD (xxxGOLD; 11Bx, 21Bx), S10ROGLD, LNX226, LNX227, CMSCLONE and LNXCLONE]
Figure 1-11 Disk planning with four 3390-9s and one 3390-3
Defining virtual machines
A user ID CMSCLONE is defined with the following directory definition. It is given two minidisks:
191 A small 30 cylinder minidisk for storing the cloning EXECs.
192 A 100 cylinder common disk that will become the Linux user IDs’ read-only 191 disk.
The LINK statement to the CMSCLONE 192 disk enables the other user IDs to share a common read-only 191 disk:
PROFILE RORDFLT
 IPL CMS
 MACHINE ESA 4
 CPU 00 BASE
 CPU 01
 NICDEF 0600 TYPE QDIO LAN SYSTEM VSW1
 SPOOL 000C 2540 READER *
 SPOOL 000D 2540 PUNCH A
 SPOOL 000E 1403 A
 CONSOLE 009 3215 T
 LINK MAINT 0190 0190 RR
 LINK MAINT 019D 019D RR
 LINK MAINT 019E 019E RR
 LINK CMSCLONE 192 191 RR
 LINK TCPMAINT 592 592 RR
A user ID LNXCLONE is created with the following directory definition.
 MDISK 01B7 3390 7238 2290 DM63C9 MR PASSWD PASSWD PASSWD
 MDISK 01B8 3390 9528 0489 DM63C9 MR PASSWD PASSWD PASSWD
The user ID S10GOLD is created with the following directory definition. This ID should never be logged on to, so the password is set to NOLOG. The minidisks 11Bx are the read-write gold disks and 21Bx are the read-only gold disks. It requires the space of a complete 3390-9 because it stores two systems.
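The device numbering convention for the gold disks can be sketched in shell. This is only an illustration under stated assumptions: the helper name gold_vaddr is hypothetical, and the mapping (prefix 1 for read-write gold, prefix 2 for read-only gold) is inferred from the 11Bx and 21Bx addresses used in this paper. The VDISK swap addresses 1B2 and 1B3 are left out because they are created at logon rather than copied.

```shell
#!/bin/bash
# Sketch of the gold minidisk numbering convention used in this paper:
# source disk 1Bx maps to 11Bx (read-write gold) and 21Bx (read-only gold).
# gold_vaddr is a hypothetical helper name, not one of the paper's tools.
gold_vaddr() {
    local kind=$1
    local vaddr=${2#0}          # drop one leading zero: 01B0 -> 1B0
    case $kind in
        rw) printf '1%s\n' "$vaddr" ;;
        ro) printf '2%s\n' "$vaddr" ;;
        *)  echo "unknown kind: $kind" >&2
            return 1 ;;
    esac
}

# List the mapping for the disks that are copied (VDISK swap is skipped).
for v in 1B0 1B1 1B4 1B5 1B6 1B7 1B8; do
    printf '%s -> rw gold %s, ro gold %s\n' \
        "$v" "$(gold_vaddr rw "$v")" "$(gold_vaddr ro "$v")"
done
```

Keeping the convention in one small function makes it straightforward to add further sets of disks, should more golden copies be stored later.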
sbin/
sbin/mnt2rogld.sh
sbin/cloneprep.sh
sbin/boot.findself
sbin/offliner.sh
sbin/fstab.ror
sbin/boot.rootfsck.diffs
vm/
vm/CLONERO.EXEC
vm/CLONERW.EXEC
vm/MNT2GOLD.EXEC
vm/MNT2TST.EXEC
vm/PROFILE.EXEC
vm/RO2GOLD.EXEC
vm/TST2MNT.EXEC
vm/COPYMDSK.EXEC
vm/SAMPLE.PARM-S10
vm/SLES10S2.EXEC
vm/LINKED.EXEC
vm/PROFILE.XEDIT
vm/RO2GOLD2.EXEC
vm/SWAPGEN.EXEC
You should now have access to the files associated with this paper.
1.5.3 Populating CMS disks on CMSCLONE
The new CMSCLONE user ID is logged onto.
SLES10S2 KERNEL The SLES 10 SP2 kernel. This is available from the /boot/s390x/ directory of the SLES 10 SP2 install media, where it is named vmrdr.ikr. SLES10S2 INITRD The SLES 10 SP2 initial RAMdisk. This is also available from the /boot/s390x/ directory of the SLES 10 SP2 install media, where it is named initrd. SWAPGEN EXEC The EXEC to create VDISK swap spaces. It is included in the tar file for convenience.
...
ftp> put PROFILE.EXEC
...
ftp> put PROFILE.XEDIT
...
ftp> quit
1.5.4 Installing SLES 10 SP2 Linux
Linux is installed twice:
1. Onto the golden image (S10RWMNT) which will be the system that is cloned
2. Onto a worker system (LNXCLONE) which will be used for running Linux scripts
Installing SLES 10 SP2 onto S10RWMNT
This section does not supply every detail on installing Linux.
DMSACP723I A (191) R/O DMSACP723I C (592) R/O DIAG swap disk defined at virtual address 1B2 (16245 4K pages of swap space) DIAG swap disk defined at virtual address 1B3 (16245 4K pages of swap space) Do you want to IPL Linux from DASD 1B0? y/n n Now a minimal SLES 10 SP2 system is installed onto S10RWMNT. The install is started with the SLES10S2 EXEC.
The minidisks are formatted, by first selecting and activating devices 1B0-1B8. Then 1B2 and 1B3 are deselected, so the VDISK swap spaces created by SWAPGEN EXEC are not trashed. The remaining seven disks are formatted as shown in Figure 1-12. Figure 1-12 Formatting seven minidisks For partitioning the DASD, select the devices, mount points and file system types as shown in the Directory, FS Type and Device columns of Table 1-1 on page 16.
When creating partitions on minidisks, it is extremely important to click the Fstab Options button as shown in the upper left of Figure 1-13 and select Device Path in the Mount in /etc/fstab by radio button group as shown in the lower right. The default value in SLES 10 SP1 and SP2 is Device ID, which makes cloning nearly impossible because, with this setting, the volume ID is stored in the /etc/fstab file. This must be set for each of the six minidisks in this example.
Figure 1-14 shows a summary of the file systems created via the Expert Partitioner: Figure 1-14 Partitioning 1B0 (/dev/dasda) - 1B8 (/dev/dasdi) For Software Selection, all package groups are deselected except for Server Base System.
Figure 1-15 Software selection and Installation settings The first half of the install completes and the new system is IPLed from 1B0. The second half of the install is completed with the following notes. On the Host and Domain Name panel, the box Change Hostname via DHCP is deselected. On the Network Configuration panel, the Firewall is disabled. On the Installation Completed panel, the box Clone This System for Autoyast is deselected.
/dev/disk/by-path/ccw-0.0.01b6-part1 /var              ext3    acl,user_xattr    1 2
/dev/dasdc1                          swap              swap    defaults          0 0
/dev/dasdd1                          swap              swap    defaults          0 0
/dev/disk/by-path/ccw-0.0.01b4-part1 swap              swap    defaults          0 0
proc                                 /proc             proc    defaults          0 0
sysfs                                /sys              sysfs   noauto            0 0
debugfs                              /sys/kernel/debug debugfs noauto            0 0
devpts                               /dev/pts          devpts  mode=0620,gid=5   0 0
Important: Verify all the minidisks are accessed by path. The entries in /etc/fstab should look similar to the above.
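The verification above can also be scripted. The following is a minimal sketch, assuming that any entry whose device field starts with /dev/disk/by-id/ is one that was created with the Device ID setting; the function name find_by_id_entries is hypothetical and is not part of the tar file associated with this paper.

```shell
#!/bin/bash
# Sketch: list fstab entries that reference a disk by ID.
# Such entries embed a volume ID and therefore do not survive cloning.
# find_by_id_entries is a hypothetical helper name.
find_by_id_entries() {
    local fstab=${1:-/etc/fstab}
    # Skip comment lines; print device and mount point of by-id entries.
    awk '!/^[[:space:]]*#/ && $1 ~ /^\/dev\/disk\/by-id\// { print $1, $2 }' "$fstab"
}
```

Running find_by_id_entries /etc/fstab on the freshly installed system should print nothing. Any line it prints names a file system that needs to be changed to Device Path before the system is cloned.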
vm/CLONERO.EXEC vm/CLONERW.EXEC vm/MNT2GOLD.EXEC vm/MNT2TST.EXEC vm/PROFILE.EXEC vm/RO2GOLD.EXEC vm/TST2MNT.EXEC vm/COPYMDSK.EXEC vm/SAMPLE.PARM-S10 vm/SLES10S2.EXEC vm/LINKED.EXEC vm/PROFILE.XEDIT vm/RO2GOLD2.EXEC vm/SWAPGEN.EXEC You should see a README file and directories vm/ whose files should have already been copied to z/VM, and sbin/ for Linux files: # ls sbin boot.findself boot.rootfsck.diffs cloneprep.sh fstab.ror mnt2rogld.sh fstab.dcss mnt2rogld-dcss.sh offliner.
on S10RWTST. In this fashion, there is a backup copy of Linux. If tests are not successful on the test system, a fresh copy of Linux can quickly be rolled back from the maintenance system. There are a number of ways to copy the disks. It is important that before copying, both the source and target systems are shut down and their virtual machines are logged off z/VM. The MNT2TST EXEC is run from the CMSCLONE user ID. It tries to use FLASHCOPY to copy the minidisks quickly.
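The "try the fast copy first, then fall back" pattern can be sketched in shell. This is not the MNT2TST EXEC itself, which is written in REXX. The function copy_with_fallback and the two stand-in functions are hypothetical; the stand-ins only simulate success or failure so that the sketch runs on any system, whereas real ones would issue the corresponding CP commands.

```shell
#!/bin/bash
# Sketch of a copy that tries a fast method first and falls back to a
# slower one. The copy methods are passed in as function names so the
# pattern can be exercised without z/VM.
copy_with_fallback() {
    local fast=$1 slow=$2 src=$3 tgt=$4
    if "$fast" "$src" "$tgt"; then
        echo "$src -> $tgt: copied with $fast"
    elif "$slow" "$src" "$tgt"; then
        echo "$src -> $tgt: copied with $slow (fallback)"
    else
        echo "$src -> $tgt: copy failed" >&2
        return 1
    fi
}

# Hypothetical stand-ins (assumptions for illustration only).
fake_flashcopy() { return 1; }   # simulate the fast copy being unavailable
fake_ddr()       { return 0; }   # simulate a successful slow copy

copy_with_fallback fake_flashcopy fake_ddr 01B0 11B0
```

Passing the copy methods as arguments keeps the fallback logic separate from the commands themselves, so each can be tested on its own.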
00: built on IBM Virtualization Technology
00: There is no logmsg data
00: FILES: NO RDR, NO PRT, NO PUN
00: LOGON AT 10:17:18 EDT FRIDAY 08/31/07
z/VM V5.3.0 2007-06-19 08:41
DMSACP723I A (191) R/O
DMSACP723I C (592) R/O
DIAG swap disk defined at virtual address 1B2 (16245 4K pages of swap space)
DIAG swap disk defined at virtual address 1B3 (16245 4K pages of swap space)
Do you want to IPL Linux from DASD 1B0? y/n
y
Linux should then boot:
00: zIPL v1.5.3 interactive boot menu
00:
00: 0.
at 1B9 for /sbin/ and one at 1BA for /bin/). The additional slots from 2B0-2BF can be used for adding devices for logical volumes, and the additional slots from 320-32F can be considered reserved for future growth. The vmpoff=LOGOFF parameter is added so that VM user IDs are logged off after Linux is shut down. Back up the original zipl.conf then make the following changes:
# cd /etc
# cp zipl.conf zipl.conf.orig
# vi zipl.conf
# Modified by YaST2.
Parsing RPM database... Summary: [S1:1][package]cmsfs-1.1.8-3.2.s390x Continue? [y/n]: y Downloading: [S1:1][package]cmsfs-1.1.8-3.2.s390x, 39.7 K(136.9 K unpacked) Installing: [S1:1][package]cmsfs-1.1.8-3.2.s390x Setting the cmm and vmcp module to be loaded When the cmm module is loaded, in conjunction with configuration changes on z/VM, significant performance gains are possible. Collaborative Memory Management and VMRM are discussed in more detail in 1.9.
The boot.findself script is set to run at boot time with the chkconfig command and verified with chkconfig --list:
# chkconfig boot.findself on
# chkconfig --list boot.findself
boot.findself 0:off 1:off 2:off 3:off 4:off 5:off 6:off B:on
See “The boot.findself script” on page 67 for a listing of the code and a brief description of the logic.
Creating empty mount points under /opt/
Mounting middleware binaries read-only is beyond the scope of this paper.
Do you want to copy R/W disks from S10RWTST to S10RWMNT? y/n
y
Copying minidisk 01B0 to 11B0 ...
00: Command complete: FLASHCOPY 01B0 0 END TO 11B0 0 END
Return value = 0
Copying minidisk 01B1 to 11B1 ...
00: Command complete: FLASHCOPY 01B1 0 END TO 11B1 0 END
Return value = 0
Copying minidisk 01B4 to 11B4 ...
00: Command complete: FLASHCOPY 01B4 0 END TO 11B4 0 END
Return value = 0
Copying minidisk 01B5 to 11B5 ...
HCPDDR696I VOLID READ IS 0X01B1 COPYING 0X01B1 COPYING DATA 10/12/07 AT 16.47.31 GMT FROM 0X01B1 TO 0X01B1 INPUT CYLINDER EXTENTS OUTPUT CYLINDER EXTENTS START STOP START STOP 0 399 0 399 END OF COPY END OF JOB Return value = 0 Copying minidisk 01B4 to 11B4 ... ... Note that FLASHCOPY did not succeed in every case, and the EXEC falls back to DDR. This is not unexpected as the copying of the data from the previous EXEC had probably not completed in the background of the disk subsystem.
3. Grant the new user ID access to the VSWITCH. The following statement is put in AUTOLOG1’s PROFILE EXEC: 'cp set vswitch vsw1 grant lnx226' This command is also run interactively from the command line for the current z/VM session. By using the SLES 10 SP2 parameter file to maintain the IP address and host name, there is a side effect.
You can now log on to LNX226 and IPL from 1B0. At the initial logon, be sure there are no errors related to minidisks or VSWITCH access:
LOGON LNX226
00: NIC 0600 is created; devices 0600-0602 defined
00: z/VM Version 5 Release 4.0, Service Level 0801 (64-bit),
00: built on IBM Virtualization Technology
00: There is no logmsg data
00: FILES: NO RDR, NO PRT, NO PUN
00: LOGON AT 10:33:03 EDT TUESDAY 05/05/09
z/VM V5.4.
... These messages show that the boot.findself script ran and modified the IP address and host name. You should now be able to start an SSH session with the cloned system at the updated IP address. You can view the file systems with the df -h command: # df -h Filesystem /dev/dasdb1 udev /dev/dasda1 /dev/dasdf1 /dev/dasdi1 /dev/dasdh1 /dev/dasdg1 Size 341M 121M 41M 69M 333M 1.6G 706M Used 79M 128K 13M 4.
1.6.2 Creating a prototype read-only root Linux
The script mnt2rogld.sh and some modified files were written to help create a read-only root system from a conventional read-write Linux system. Understand that these are neither supported nor heavily tested. Again, check with your Linux distributor and/or support company to be sure that such a system will be supported. If you implement it, test everything well. The global variables and function calls are at the bottom of the script.
  TGT="/mnt/target"
  echo ""      # (1)
  echo "Backing up and modifying /etc/init.d/boot script ..."
  cp $TGT/etc/init.d/boot $TGT/etc/init.d/boot.orig
  if [ "$?" != 0 ]; then exit 47; fi
  cat $TGT/etc/init.d/boot.orig | \
    sed -e 's:bootrc=/etc/init.d/boot.d:bootrc=/sbin/etc/init.d/boot.d:g' > \
    $TGT/etc/init.d/boot
  if [ "$?" != 0 ]; then exit 48; fi
  echo ""      # (2)
  echo "Backing up and patching /etc/init.d/boot.rootfsck script ..."
  cp $TGT/etc/init.d/boot.rootfsck $TGT/etc/init.d/boot.rootfsck.
The above lines make copies of /etc/, /root/ and /srv/ to /local/. These are three of the four directories that will be read-write in the read-only root system (the fourth directory, /var/, is its own minidisk). These three directories will later be bind-mounted from /local/ to their original locations by the modified boot.rootfsck script.
  echo ""
  echo "Manipulating /etc/init.d and /sbin on $TGT ..."
  chroot $TGT mv /etc/init.d /sbin
  chroot $TGT ln -s /sbin/init.d /etc
}
The above lines move /etc/init.
Backing up and modifying /etc/init.d/boot script ... Backing up and patching /etc/init.d/boot.rootfsck script ... patching file /mnt/target/etc/init.d/boot.rootfsck Backing up and copying modified /etc/fstab file ... Copying /etc/fstab file Backing up and modifying /etc/zipl.conf file ... Running zipl in target environment ... ... Copying source /etc/, /root/ and /srv/ to target /local/ ... Making mountpoint for R/O RPM Database... Manipulating /etc/init.d and /sbin on /mnt/target ... Cleaning up ...
00: built on IBM Virtualization Technology 00: There is no logmsg data 00: FILES: NO RDR, NO PRT, NO PUN 00: LOGON AT 13:01:24 EDT TUESDAY 05/05/09 z/VM V5.4.0 2008-12-05 12:25 DMSACP723I A (191) R/O DMSACP723I C (592) R/O DIAG swap disk defined at virtual address 1B2 (16245 4K pages of swap space) DIAG swap disk defined at virtual address 1B3 (16245 4K pages of swap space) Do you want to IPL Linux from DASD 1B0? y/n y 00: zIPL v1.6.3 interactive boot menu 00: 00: 0. default (ipl) 00: 00: 1. ipl 00: 2.
The first five commands should succeed because those are read-write directories. The last three commands should fail because those directories are accessed read-only. You may choose to leave the empty foo files and verify that they remain in place across a reboot. After that is verified, you may want to delete the empty files so they don’t get cloned.
# rm /etc/foo /var/foo /root/foo /srv/foo /tmp/foo
Shut down the read-only root system.
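The write test described above can be wrapped in a small function for repeated use. This is a minimal sketch with a hypothetical function name, check_writable; it creates and immediately removes a probe file, so it leaves nothing behind to be cloned. It does not distinguish a read-only mount from a missing directory or a permission problem.

```shell
#!/bin/bash
# Sketch: report whether each given directory accepts a new file.
# check_writable is a hypothetical helper name.
check_writable() {
    local dir probe
    for dir in "$@"; do
        probe="$dir/.rwtest.$$"
        if touch "$probe" 2>/dev/null; then
            rm -f "$probe"          # clean up so the probe is never cloned
            echo "$dir: read-write"
        else
            echo "$dir: read-only or not writable"
        fi
    done
}
```

For example, check_writable /etc /var /root /srv /tmp should report read-write for all five directories on the read-only root system, run as root.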
Return value = 0 01B0-01B1 DETACHED 01B4-01B8 DETACHED 21B0-21B1 DETACHED 21B4-21B8 DETACHED The modified read-only version of the gold read-write system should now be “alongside” the read-write version on the S10GOLD disks. 1.6.4 Cloning a read-only Linux You should now be able to clone a read-only Linux system. A user ID, LNX227, is created to clone the system to. Only the 1B4 (swap), 1B5 (/local/) and 1B6 (/var/) minidisks are read-write.
===> copy s10rwmnt parm-s10 d lnx227 = = ===> x lnx227 parm-s10 d ramdisk_size=65536 root=/dev/ram1 ro init=/linuxrc TERM=dumb HostIP=129.40.179.227 Hostname=ntc227.pbm.ihost.com Gateway=129.40.179.254 Netmask=255.255.255.0 Broadcast=129.40.179.255 Layer2=0 ReadChannel=0.0.0600 WriteChannel=0.0.0601 DataChannel=0.0.0602 Nameserver=129.40.106.1 Portname=dontcare Install=nfs://129.40.179.200/nfs/sles10/SLES-10-CD-s390x-GMC-CD1.
...
/etc/init.d/boot.findself: changing (escaped) gpok222\.endicott\.ibm\.com to gpok227.endicott.ibm.com in /etc/HOSTNAME
/etc/init.d/boot.findself: changing gpok222 to gpok227 and IP address in /etc/hosts
/etc/init.d/boot.findself: changing (escaped) 9\.60\.18\.222 to 9.60.18.227 in /etc/sysconfig/network/ifcfg-qeth-bus-ccw-0.0.0600
You should be able to start an SSH session to the new IP address.
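The "(escaped)" values in these messages exist because a dot in a sed pattern matches any character. The following is a minimal sketch of the idea, not the boot.findself code itself: the function names escape_dots and replace_in_file are hypothetical, and GNU sed is assumed for the --in-place option, which the script in this paper also uses.

```shell
#!/bin/bash
# Sketch: replace a host name or IP address literally, escaping the dots
# so that "." in the search string matches only a real dot.
# escape_dots and replace_in_file are hypothetical helper names.
escape_dots() {
    printf '%s\n' "$1" | sed 's/\./\\./g'
}

replace_in_file() {
    local from=$1 to=$2 file=$3
    # Only the search side is escaped; the replacement is used as given.
    sed --in-place -e "s/$(escape_dots "$from")/$to/g" "$file"
}
```

Without the escaping, a pattern such as 9.60.18.222 would also match unrelated strings like 9x60y18z222, which could corrupt a configuration file on first boot.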
6. Test the read-only root system. 7. When both systems are fully tested copy them to the gold read-write disks via MNT2GOLD and RO2GOLD. Now the change should be reflected in any new Linux system that is cloned, be it read-write or read-only. Of course this does not affect any of the Linux systems already in existence. Updating the cloned servers Performing maintenance on existing Linux servers is a little more difficult. The following approach is just one option.
[Figure 1-16 (diagram): RO2GOLD and RO2GOLD2 copy the S10ROGLD 1Bx disks to S10GOLD [current] and S10GOLD2 [new]; the RO clones LNX227 and LNX242 use profile S10DFLT, and LNX242 is also shown with profile S10DFLT2]
Figure 1-16 Adding a second user ID, ROGOLD2, to store the next
Once the Linux machines are running and linked to the read-only gold disks, the gold disks cannot be updated until those machines are either shut down or migrated. Figure 1-16 depicts how systems can be safely migrated.
[Figure 1-17 (diagram): DASD volumes DM63C9, DM63CA, DM63CB and DM63CD (3390-9) and DM6364 (3390-3), holding the user IDs S10RWMNT, S10RWTST, S10GOLD (xxxGOLD; 11Bx, 21Bx), S10GOLD2, S10ROGLD, LNX226, LNX227, LNX242, CMSCLONE and LNXCLONE]
Figure 1-17 Disk planning with four 3390-9s
Preparing for the new maintenance system
To prepare for this new methodology, perform the following steps:
1. Create a S10GOLD2 user ID with a set of 21Bx minidisks, but not 11Bx. The 11Bx disks are not needed because a second set of read/write disks is not required.
3. Bring the directory changes online for the new profile.
There is now a second gold virtual machine, S10GOLD2, and a second user directory PROFILE, S10DFLT2.
Creating a new read-only Linux system
Currently in the example system described by this paper, there are only two clones: a read-write system running on LNX226 and a read-only system running on LNX227. A new read-only system is created to demonstrate the maintenance system.
1. Log on to MAINT and create a new read-only user ID.
DIAG swap disk defined at virtual address 1B2 (16245 4K pages of swap space) DIAG swap disk defined at virtual address 1B3 (16245 4K pages of swap space) Do you want to IPL Linux from DASD 1B0? y/n y ... The new read-only Linux system has been created, cloned and started. Start a CMS session on CMSCLONE.
S10GOLD2 so as to not affect the other read-only systems linked to the read-only disks on S10GOLD (LNX227 and LNX242 in this example).
1. From CMSCLONE, the read-write system is copied from S10RWTST to S10RWMNT:
==> tst2mnt
...
2. From CMSCLONE, the read-write system is copied from S10RWMNT to S10GOLD:
==> mnt2gold
...
3. From the Linux system running on LNXCLONE, the read-write system on S10RWMNT is converted into a read-only system on S10ROGLD.
5. Run the LINKED EXEC again to see the change: ==> linked S10ROGLD LINKS: TEST S10GOLD LINKS: GOLD 1 LNX227 LNX242 S10GOLD2 LINKS: [[GOLD 2]] The two read-only Linux systems are still linked to S10GOLD, but note that the double brackets are now around GOLD 2. This shows that S10GOLD2 contains the latest golden image. Now the golden image has been updated and its read-only contents copied to S10GOLD2. The current system still exists and is running on S10GOLD.
# updatedb
You have new mail in /var/mail/root
# locate foo | head -1
/opt/gnome/share/icons/hicolor/16x16/stock/navigation/stock_navigator-foonote-body-toggle.png
A second test is run on the read-write gold disk. The read-write system running on LNX226 is shut down and the user ID is logged off. A new read-write Linux system is cloned to LNX226:
==> clonerw lnx226
...
An SSH session is started to the newly cloned LNX226 system.
Think of all the things in /usr/, /boot/, / etc. to begin with. All of this runs read-only, and any read-write work is done in the home directory or /etc/ or /var/ etc... These software packages work great, and no explanation is needed. Just install them on S10RWTST/MNT and away you go. 2. Read-only, but picky and needs to be tricked...
  echo ""
  echo "Making logical volumes ..."
  fdasd -a $dev22b0
  pvcreate "$dev22b0"1
  vgcreate rwVG "$dev22b0"1
  lvcreate -L 200M -n varLV rwVG
  lvcreate -L 1.2G -n srvLV rwVG
  mke2fs /dev/rwVG/varLV > /dev/null
  mke2fs /dev/rwVG/srvLV > /dev/null
  echo ""
  echo "Mounting logical volumes ..."
  vgscan
  vgchange -ay
  # device paths must match the volume group and logical volume names created above
  mount /dev/rwVG/srvLV /mnt/target/srv
  mount /dev/rwVG/varLV /mnt/target/var
}
1.8.
1.8.3 Enabling Collaborative Memory Management (CMM)
VMRM is a z/VM tool that provides functions to dynamically tune the z/VM system. Groups of virtual machines can be defined to be part of a workload. The workloads can then be managed by VMRM to goals that are also defined.
Cooperative Memory Management (CMM1) is a facility that provides z/VM with a way to indirectly control the amount of memory that Linux uses. This technique is known as ballooning, and is implemented by a special driver in Linux.
ADMIN MSGUSER VMRMADMN
NOTIFY MEMORY LNX* S10* RH5*
This example will apply to z/VM user IDs whose names begin with LNX, S10 or RH5.
3. Edit the PROFILE EXEC on AUTOLOG1 so that VMRMSVM is auto-logged at IPL time. Log on to AUTOLOG1.
4. Before pressing Enter at the VM READ prompt, type acc (noprof so that the PROFILE EXEC is not run. Then add
LOGON AUTOLOG1
z/VM Version 5 Release 4.
1.9 Contents of tar file
The file ro-root-S10.tgz is a compressed tar file that contains the following files and directories:
README.TXT The README file
sbin/ The Linux code - see Appendix 1.10, “Linux code” on page 65. The subdirectory is named sbin/ so you can untar it from /usr/local/ and have the scripts in your default PATH.
vm/ The z/VM code - see Appendix 1.11, “z/VM code” on page 84
Following is a list of all the files in the tar file:
README.txt
sbin/
sbin/mnt2rogld.sh
sbin/cloneprep.
It then enables the ID’s 191 disk which should be the CMSCLONE 192 disk and contain the source parameter file: S10RWMNT PARM-S10 and the target parameter file where the file name is the same as the target user ID name. #!/bin/bash # # /etc/init.d/boot.findself # ### BEGIN INIT INFO # Provides: boot.findself # Required-Start: boot.
# Enable my 191 (A) disk
#+--------------------------------------------------------------------------+
{
  chccwdev -e 191 > /dev/null 2>&1
  rc=$?
  if [ $rc != 0 ]; then # unable to enable 191 disk
    echo "$0: Unable to enable 191, rc from chccwdev = $rc"
    exit 1
  fi
  sleep 1 # wait a sec to be sure disk is ready
  Adisk=/dev/$(egrep '^0.0.
  echo "$0: changing (escaped) $sourceName to $targetName in /etc/HOSTNAME"
  sed --in-place -e "s/$sourceName/$targetName/g" /etc/HOSTNAME
  echo "$0: changing $sourceHost to $targetHost and IP address in /etc/hosts"
  sed --in-place -e "s/$sourceHost/$targetHost/g" \
    -e "s/$sourceIP/$targetIP/g" /etc/hosts
  echo "$0: changing (escaped) $sourceIP to $targetIP in $eth0file"
  sed --in-place -e "s/$sourceIP/$targetIP/g" $eth0file
  hostname $targetHost
}
#+-----------------------------------------------------------------
. /etc/rc.status @@ -74,6 +75,9 @@ case "$1" in start) + # ROR: add 1 + echo "Read-only root: In modified $0" + # # fsck may need a huge amount of memory, so make sure, it is there. # @@ -107,6 +111,7 @@ # if not booted via initrd, /dev is empty.
+# +# + + + + + +# +# + +# +# + +# +# + + + + @@ +# +# +# + + + + @@ +# + + + 70 test $FSCK_RETURN -gt 0 && touch /fsck_corrected_errors ROR: del 1, add 4 mount -n -o remount,rw / mount -n /local mount -n --bind /local/etc /etc mount -n --bind /local/root /root mount -n --bind /local/srv /srv test $FSCK_RETURN -gt 0 && touch /fsck_corrected_errors else echo echo '*** ERROR! Cannot fsck because root is not read-only!' ROR: chg 1 echo '*** ERROR! Cannot fsck because root is not read-only!' echo '*** E
1.10.3 The cloneprep script This script prepares the Linux system on S10RWMNT to be cloned. #!/bin/bash # # IBM DOES NOT WARRANT OR REPRESENT THAT THE CODE PROVIDED IS COMPLETE # OR UP-TO-DATE. IBM DOES NOT WARRANT, REPRESENT OR IMPLY RELIABILITY, # SERVICEABILITY OR FUNCTION OF THE CODE. IBM IS UNDER NO OBLIGATION TO # UPDATE CONTENT NOR PROVIDE FURTHER SUPPORT. # ALL CODE IS PROVIDED "AS IS," WITH NO WARRANTIES OR GUARANTEES WHATSOEVER.
#!/bin/sh # mnt2rogld.sh - script to create a read-only root system on target user ID # Hard-coded virtual device addresses: # 1b0 - /boot # 1b1 - / # 1b2 - swap (VDISK) # 1b3 - swap (VDISK) # 1b4 - swap # 1b5 - /local # 1b6 - /var # 1b7 - /usr # 1b8 - /opt # # Source disks are linked as 11Bx # Target disks are linked as 21Bx # # IBM DOES NOT WARRANT OR REPRESENT THAT THE CODE PROVIDED IS COMPLETE # OR UP-TO-DATE.
#+--------------------------------------------------------------------------+ { userID=$1 CPcmd QUERY $userID rc=$? case $rc in 0) # user ID is logged on or disconnected echo "Error: $userID user ID must be logged off" exit 2 ;; 3) # user ID does not exist echo "Error: $ID user ID does not exist" exit 3 ;; 45) # user ID is logged off - this is correct - fall through ;; *) # unexpected echo "Unexpected rc from QUERY: $rc" echo "$targetID user ID must exist and be logged off" exit 4 esac } #+-----------------
if [ $? != CPcmd link if [ $? != CPcmd link if [ $? != CPcmd link if [ $? != 0 ]; then $targetID 0 ]; then $targetID 0 ]; then $targetID 0 ]; then exit 15; 1b6 21b6 exit 16; 1b7 21b7 exit 17; 1b8 21b8 exit 18; fi mr fi mr fi mr fi } #+--------------------------------------------------------------------------+ function enableSourceDisks() # Enable the source and target disks (except swap disks x1b4) #+--------------------------------------------------------------------------+ { echo "" echo "Enabling sou
#+--------------------------------------------------------------------------+
{
  source=$1
  target=$2
  echo ""
  echo "copying $source to $target ..."
  sDev=/dev/$(egrep ^0.0.$source /proc/dasd/devices | awk '{ print $7 }')
  if [ "$?" != 0 ]; then exit 33; fi
  tDev=/dev/$(egrep ^0.0.$target /proc/dasd/devices | awk '{ print $7 }')
  if [ "$?" != 0 ]; then exit 34; fi
  echo ""
  echo "dasdfmt-ing $tDev ..."
  dasdfmt -y -b 4096 -f $tDev
  if [ "$?" != 0 ]; then exit 35; fi
  echo ""
  echo "dd-ing $sDev to $tDev ...
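One detail of the device lookup above is worth illustrating. The exit status tested after each assignment is that of the last command in the pipeline (awk), so a device address that is missing from /proc/dasd/devices would most likely go undetected and leave the variable set to just "/dev/". The following is a minimal sketch of a stricter lookup. The function name dasd_dev_for and the optional second argument (an alternative table file, useful for testing) are assumptions for illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the /dev node
# for a DASD virtual device address such as 11b1.
# Arg 1: device address; Arg 2 (assumed, optional): table to search,
# defaulting to /proc/dasd/devices.
dasd_dev_for() {
    addr=$1
    table=${2:-/proc/dasd/devices}
    # Field 1 starts with the bus ID (0.0.<addr>); field 7 is the device
    # name, the same field the original script prints.
    name=$(awk -v id="0.0.$addr" 'index($1, id) == 1 { print $7; exit }' "$table")
    if [ -z "$name" ]; then
        echo "Error: no DASD found at address $addr" >&2
        return 1
    fi
    echo "/dev/$name"
}
```

With such a helper, a caller could write sDev=$(dasd_dev_for $source) || exit 33 and be sure that a non-empty device name was found before dasdfmt or dd is run.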
#+--------------------------------------------------------------------------+
function mountSourceRoot()
# Mount disk at 11b1 over /mnt/source, then make mount points.
# Then mount disks at 11b7 over usr, 11b8 over opt and 11b0 over boot
#+--------------------------------------------------------------------------+
{
  echo ""
  echo "Making source mount point ...
function modifySystem()
# Arg 1: "dcss" - if converting to DCSS R/O file systems also, else null
# 1) Copy modified /etc/init.d/boot, /etc/init.d/boot.rootfsck and /etc/fstab
# 2) Copy source /etc/ and /root/ directories to target /local/
# 3) Move /etc/init.d under /sbin/ and create symlink to point back
#+--------------------------------------------------------------------------+
{
  TGT="/mnt/target"
  echo ""
  echo "Backing up and modifying /etc/init.d/boot script ..."
  cp $TGT/etc/init.d/boot $TGT/etc/init.
    sed -e 's/dasd=1b0-1bf/ro dasd=1b0(ro),1b1(ro),1b2-1b7,1b8(ro),1b9-1bf/g' > \
      $TGT/etc/zipl.conf
  fi
  echo ""
  echo "Running zipl in target environment ..."
  chroot $TGT zipl
  if [ "$?" != 0 ]; then exit 59; fi
  echo ""
  echo "Copying source /etc/, /root/ and /srv/ to target /local/ ..."
  cp -a $TGT/etc $TGT/local
  echo "mount -n -o ro --bind /local/.var/lib/rpm /var/lib/rpm" >> $TGT/local/etc/init.d/boot.local
  echo 'blogger "RETURN CODE FROM boot.rorpm = $?"' >> $TGT/local/etc/init.d/boot.
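The lines appended to boot.local are meant to run at boot time on the cloned system, so any $? in them has to reach the file unexpanded, together with the double quotes around the message. The following is a minimal sketch of quoting that achieves this. The function name append_boot_local_lines is hypothetical, and the file is passed as a parameter so the behavior can be checked against a scratch file.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): append the two
# boot.local lines to the file named in argument 1.
append_boot_local_lines() {
    target=$1
    # Double quotes are fine here: the line contains nothing for the
    # shell to expand.
    echo "mount -n -o ro --bind /local/.var/lib/rpm /var/lib/rpm" >> "$target"
    # Single quotes keep $? and the inner double quotes literal, so the
    # return code is evaluated when boot.local runs, not when this
    # function runs.
    echo 'blogger "RETURN CODE FROM boot.rorpm = $?"' >> "$target"
}
```

With nested double quotes instead, the shell would expand $? while the conversion script is running and would drop the inner quotes, so boot.local would report a fixed number rather than the return code of the bind mount.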
  # clean up source disks
  umount /mnt/source
  chccwdev -d 11b0
  chccwdev -d 11b1
  chccwdev -d 11b4
  chccwdev -d 11b5
  chccwdev -d 11b6
  chccwdev -d 11b7
  chccwdev -d 11b8
  vmcp det 11b0
  vmcp det 11b1
  vmcp det 11b4
  vmcp det 11b5
  vmcp det 11b6
  vmcp det 11b7
  vmcp det 11b8
}
# main()
# global variables
sourceID="S10RWMNT"
targetID="S10ROGLD"
rorDiffs="/usr/local/sbin/boot.rootfsck.diffs"
fstabFile="/usr/local/sbin/fstab.ror"
fstabDCSSfile="/usr/local/sbin/fstab.
#
# /etc/init.d/boot.readonlyroot
# derived from /etc/init.d/boot.rootfsck (SuSE)
#
### BEGIN INIT INFO
# Provides: boot.readonlyroot
# Required-Start:
# Required-Stop:
# Default-Start: B
# Default-Stop:
# Description: check and mount /local filesystem
### END INIT INFO
. /etc/rc.status
# to get max number of parallel fsck processes
. /etc/sysconfig/boot
if [ -f /etc/sysconfig/dump ]; then
  . /etc/sysconfig/dump
fi
export FSCK_MAX_INST
rc_reset
#
# LKCD is active when
# - $DUMP_ACTIVE = 1
# - boot.
    # [[ $2 =~ $sl ]] || NEED_REMOUNT=1
    case "$sl" in
      ro) WRITABLE=ro ;;
      rw) WRITABLE=rw ;;
    esac
  done
  test "$WRITABLE" = "unknown" && WRITABLE=rw
  [[ ${2} =~ $WRITABLE ]] || NEED_REMOUNT=1
}
case "$1" in
  start)
    # ROR: add 1
    echo "Read-only root: In modified $0"
    #
    # fsck may need a huge amount of memory, so make sure, it is there.
    #
    # However, in the case of an active LKCD configuration, we need to
    # recover the crash dump from the swap partition first, so we cannot
    # yet activate them.
      if test -n "$ROOTFS_BLKDEV" -a "$ROOTFS_BLKDEV" != "/" -a -b "$ROOTFS_BLKDEV" ; then
        MAY_FSCK=1
      fi
    fi
    #
    #
    #
    FSCK_FORCE=""
    if test -f /forcefsck ; then
      FSCK_FORCE="-f"
      ROOTFS_FSCK=""
    fi
    if test "$ROOTFS_FSCK" = "0" ; then
      # already checked and ok, skip the rest
      MAY_FSCK=0
    fi
    if test ! -f /fastboot -a -z "$fastboot" -a $MAY_FSCK -eq 1 ; then
      # on an umsdos root fs this mount will fail,
      # so direct error messages to /dev/null.
      # this seems to be ugly, but should not really be a problem.
        echo " bash# mount -n -o remount,rw /local"
        echo
        echo "Attention: Only CONTROL-D will reboot the system in this"
        echo "maintenance mode. shutdown or reboot will not work."
        echo
        PS1="(repair filesystem) # "
        export PS1
        /sbin/sulogin /dev/console
        # if the user has mounted something rw, this should be umounted
        echo "Unmounting file systems (ignore error messages)"
        umount -avn
        # on umsdos fs this would lead to an error message.
    fi
    # start with a clean mtab and enter root fs entry
    rm -f /etc/mtab*
    mount -f /
    ;;
  stop)
    ;;
  # ROR: add 3
  halt|reboot)
    mount -n -o remount,ro /local 2> /dev/null
    ;;
  restart)
    rc_failed 3
    rc_status -v
    ;;
  status)
    rc_failed 4
    rc_status -v
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 1
    ;;
esac
rc_exit
1.10.6 Configuration file /etc/fstab
The contents of the sbin/fstab.ror file follow. This file is copied to the read-only gold system as /etc/fstab:
/dev/disk/by-path/ccw-0.0.
“RO2GOLD2 EXEC” on page 94
“TST2MNT EXEC” on page 95
“SAMPLE PARM-S10 file” on page 96
“SLES10S2 EXEC” on page 96
As these files are not supported, all of them start with the following disclaimer:
/*-----------------------------------------------------------------
THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
  Say 'Target user ID is a required parameter'
  Say
  Say 'Syntax: CLONERO targetID'
  Exit 1
End
/* Kyle Black March 26, 2009 */
/* Grab gold disk variable from file */
SAY 'Setting current Gold Disk Variable ...
/*+------------------------------------------------------------------+*/
CleanUp: Procedure expose ro_addrs rw_addrs
/*| Detach all disks before exiting |*/
/*| parm 1: Exit code |*/
/*+------------------------------------------------------------------+*/
Parse Arg retVal
Say
Say 'Cleaning up ...
rw_addrs = '01B0 01B1 01B4 01B5 01B6 01B7 01B8'
ro_addrs = '11B0 11B1 11B4 11B5 11B6 11B7 11B8'
/* link target disks R/W */
Do a = 1 to Words(rw_addrs)
  addr = Word(rw_addrs,a)
  'CP LINK' target addr addr 'MR'
  If (rc \= 0) Then Call CleanUp 100+a
End
/* link source disks R/O */
Do a = 1 to Words(ro_addrs)
  addr = Word(ro_addrs,a)
  'CP LINK' source addr addr 'RR'
  If (rc \= 0) Then Call CleanUp 200+a
End
/* copy disks */
Do a = 1 to Words(ro_addrs)
  source_addr = Word(ro_addrs,a)
  target_addr = Word(rw_addrs,a)
  'EX
  End
End
1.11.4 MNT2GOLD EXEC
The MNT2GOLD EXEC copies the S10RWMNT 1Bx read-write disks to the S10GOLD read-write 11Bx disks:
/* Copy Linux from maintenance ID to gold ID's read-write disks */
Address 'COMMAND'
source = 'S10RWMNT'
target = 'S10GOLD'
ro_addrs = '01B0 01B1 01B4 01B5 01B6 01B7 01B8'
rw_addrs = '11B0 11B1 11B4 11B5 11B6 11B7 11B8'
Call CheckLoggedOff source
Call CheckLoggedOff target
Say 'Do you want to copy R/W disks from S10RWMNT to S10GOLD? y/n'
Parse Upper Pull answer .
  When (rc = 3) Then Do /* user ID does not exist */
    Say "Error:" virt_machine "user ID does not exist"
    Exit 3
  End
  When (rc = 45) Then /* user ID is logged off - this is correct */
    Return 0
  Otherwise Do /* unexpected */
    Say "Error:" virt_machine "user ID must exist and be logged off"
    Exit 4
  End
End
/*+------------------------------------------------------------------+*/
CleanUp: Procedure Expose ro_addrs rw_addrs
/*| Detach all disks before exiting |*/
/*| parm 1: Exit code |*/
/*+----------------------------
  source_addr = Word(ro_addrs,a)
  target_addr = Word(rw_addrs,a)
  'EXEC COPYMDSK' source_addr target_addr
  If (rc \= 0) Then Call CleanUp 300+a
End
/* cleanup */
Call CleanUp 0
/*+------------------------------------------------------------------+*/
CheckLoggedOff: Procedure
/*| Verify that a user ID is logged off |*/
/*| parm 1: User ID to check |*/
/*+------------------------------------------------------------------+*/
Parse arg virt_machine .
  'CP IPL' iplDisk
Else Do /* user is interactive -> prompt */
  Say 'Do you want to IPL Linux from DASD' iplDisk'? y/n'
  Parse Upper Pull answer .
  If (answer = 'Y') Then 'CP IPL' iplDisk
End
Exit
1.11.
Do a = 1 to Words(ro_addrs)
  addr = Word(ro_addrs,a)
  'CP LINK' source addr addr 'RR'
  If (rc \= 0) Then Call CleanUp 200+a
End
/* copy disks */
Do a = 1 to Words(ro_addrs)
  source_addr = Word(ro_addrs,a)
  target_addr = Word(rw_addrs,a)
  'EXEC COPYMDSK' source_addr target_addr
  If (rc \= 0) Then Call CleanUp 300+a
End
/* cleanup */
Call CleanUp 0
/*+------------------------------------------------------------------+*/
CheckLoggedOff: Procedure
/*| Verify that a user ID is logged off |*/
/*| parm 1: User ID to chec
1.11.8 RO2GOLD2 EXEC
The RO2GOLD2 EXEC is almost identical except that it copies to the 1xxx disks of S10ROGOLD2:
/* Copy Linux from read-only ID to Gold ID */
'pipe cp link s10gold2 21b7 21b7 rr'
'pipe cp 10000 q links 21b7 |',
  'split /,/|',
  'strip|',
  'strip /,/|',
  'strip|',
  'count lines |',
  'var howmany |',
  'console|',
  'stem names.'
'cp det 21b7'
If (howmany > 1 ) Then Do
  Say 'WARNING WARNING WARNING WARNING!'
  Say 'You have Linux guests currently reading the S10GOLD'
  Say 'disks.
/*+------------------------------------------------------------------+*/
CleanUp: Procedure Expose ro_addrs rw_addrs
/*| Detach all disks before exiting |*/
/*| parm 1: Exit code |*/
/*+------------------------------------------------------------------+*/
Parse Arg retVal
Say
Say 'Cleaning up ...'
'CP DETACH' ro_addrs rw_addrs
/* Write the line S10GOLD2 to the file CURGOLD FILE A */
SAY 'Setting current Gold Disk Variable ...
/* CleanUp */
Call CleanUp 0
/*+------------------------------------------------------------------+*/
CheckLoggedOff: Procedure
/*| Verify that a user ID is logged off |*/
/*| parm 1: User ID to check |*/
/*+------------------------------------------------------------------+*/
Parse arg virt_machine .
'CP SPOOL PUN *'
'CP CLOSE RDR'
'CP PURGE RDR ALL'
'PUNCH SLES10S2 KERNEL * (NOHEADER'
'PUNCH' Userid() 'PARM-S10 * (NOHEADER'
'PUNCH SLES10S2 INITRD * (NOHEADER'
'CP CHANGE RDR ALL KEEP'
'CP IPL 00C CLEAR'
1.12 The team that wrote this paper
This paper was originally written at Nationwide and converted into a Redpaper at IBM in Poughkeepsie in 2007 by the following authors:
Steve Womer is a Sr. Consulting IT Architect at Nationwide Insurance in Columbus, Ohio.
1.12.