Boot mechanism for discless HP-UX - technical
Perry E. Scott
THE IMPLEMENTATION OF A DISCLESS WORKSTATION requires three distinct services: a remote file system, a remote swapping capability, and the ability to load and initialize the operating system from a remote source. All of these services are implemented for the HP-UX 6.0 system with the goal of maintaining a single-system view. For the boot mechanism this means that although the operating system and its loader are on a remote system (i.e., the root server), a user can power up any workstation in a cluster and get the same boot sequence that is experienced with a stand-alone system. A stand-alone system is a workstation that uses a local disc for booting and file system operations. This article describes how the standard HP-UX boot mechanism works, and the modifications made for the HP-UX 6.0 discless implementation.
Overview
The major modules and interfaces involved in the HP-UX system boot mechanism are shown in Fig. 1. Fig. 1a shows the boot components for a conventional stand-alone HP-UX system and Fig. 1b shows the components for a discless configuration. The following sequence outlines what happens when a discless workstation is powered on and booted. A more detailed description of these steps and the components shown in Fig. 1 is given later.
* After power-up, the boot ROM searches for and assigns an input device (keyboard) and an output device (display) to use as a console.
* The boot ROM checks for and tests interface cards, RAM, and other internal peripherals. It then displays the information shown on the left side of Fig. 2. This is called self-test.
* The boot ROM loader polls all supported mass storage devices and LANs connected to the computer for an operating system, and the message SEARCHING FOR A SYSTEM (RETURN To Pause) appears on the display (see Fig. 2).
* If the user strikes the keyboard during self-test the boot ROM assumes the user wants to control the selection of the operating system to boot. This is called the attended mode. When this is done a list of available operating systems appears on the right side of the display (see Fig. 2). The user selects a system by entering one of the two-character codes (e.g., 1H). If no key is struck the boot ROM loader automatically selects the first bootable system it finds. This is called the unattended mode.
* Once the operating system is chosen (assume 1H) the boot ROM retrieves the secondary loader from the server and loads it into RAM on the discless cnode. Control is then transferred to the secondary loader.
* The secondary loader retrieves the operating system (e.g., /hp-ux) from the server, loads it and transfers control to the operating system.
* The operating system initializes the discless kernel.
The first five steps in this sequence are called the boot ROM phase, and the last two steps are called the secondary loader phase and the HP-UX initialization phase, respectively.
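The choice between attended and unattended mode can be sketched as follows. This is only an illustration of the selection rule described above, not boot ROM code: the function name, the list-of-pairs representation, and the pairing of the code 2B with SYSDEBUG are assumptions made for the example.

```python
def select_system(found_systems, key_struck, user_choice=None):
    """Return the (code, name) pair to boot.

    found_systems: (code, name) pairs in the order they were discovered.
    key_struck:    True if a key was hit during self-test (attended mode).
    user_choice:   the two-character code typed by the user, if any.
    """
    if not found_systems:
        raise LookupError("no bootable system found")
    if not key_struck:
        # Unattended mode: take the first bootable system found.
        return found_systems[0]
    # Attended mode: match the code the user typed against the list.
    for code, name in found_systems:
        if code == user_choice:
            return (code, name)
    raise ValueError(f"unknown selection: {user_choice!r}")


# Hypothetical search result; the codes follow the 1H / 2B style of Fig. 2.
systems = [("1H", "SYSHPUX"), ("2B", "SYSDEBUG")]
print(select_system(systems, key_struck=False))
print(select_system(systems, key_struck=True, user_choice="2B"))
```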
Except for searching the LAN connection and loading the secondary loader from the server, these same actions also take place when a stand-alone HP-UX system is booted. The difference is that the stand-alone system accesses files directly from its local disc instead of going over the LAN. From the user's perspective, the boot process looks the same.
There may be more than one cluster of workstations connected to a LAN cable, and therefore more than one server may exist on the LAN. One of the main features of the discless boot mechanism is that when a booting cnode is polling the LAN connection for an operating system it is able to select the correct server. The mechanism for doing this is explained later.
Discless Workstation Boot Modules
Boot ROM Loader. The HP 9000 Series 300 boot ROM loader is one of the boot ROM modules located in EPROM on the CPU board. After self-test the boot ROM loader initiates communication with the server to retrieve the bootable system files. During the boot sequence, when the boot ROM loader finds a LAN interface it broadcasts a server identify request packet. Typically a cnode belongs to one server; however, a cnode can be configured with more than one server. Each server has a process called /etc/rbootd listening to the LAN. Based on the information in the server's configuration file (/etc/clusterconf), /etc/rbootd decides whether to respond with the server's host name. The host name is then displayed on the cnode's system console. The process /etc/rbootd, which is discussed later, is a server daemon that handles communication with discless cnodes during boot.
For each server responding, the boot ROM loader sends a file list request packet containing a file number. The file number is incremented for each file list request sent to a particular server. As the file names are sent to the requesting cnode they are displayed on its system console (see Fig. 2). This is done until the file number exceeds the number of boot file names the server has available to send. At this point the server responds with a reply packet that indicates there are no more file names to send. When a bootable file is selected (e.g., 1H) the boot ROM sends a request to open the file. This file (e.g., SYSHPUX) is the secondary loader and resides on the server as /usr/boot/SYSHPUX.
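The file list exchange amounts to a loop that increments the file number until the server reports that nothing is left. The sketch below models only that loop. The class FakeServer, the use of None as the "no more file names" reply, and the choice to start numbering at 1 are assumptions for illustration; the real packet layout is not described here.

```python
class FakeServer:
    """Stand-in for a server answering file list requests (hypothetical)."""

    def __init__(self, boot_files):
        self.boot_files = list(boot_files)

    def file_list_request(self, file_number):
        # Assumption: file numbers start at 1.
        if file_number > len(self.boot_files):
            return None  # models the "no more file names" reply packet
        return self.boot_files[file_number - 1]


def list_boot_files(server):
    """Ask one server for its boot file names, one request per file number."""
    names = []
    file_number = 1
    while True:
        name = server.file_list_request(file_number)
        if name is None:
            break
        names.append(name)   # the boot ROM would display the name here
        file_number += 1     # incremented for each request to this server
    return names


print(list_boot_files(FakeServer(["SYSHPUX", "SYSDEBUG"])))
```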
In addition to opening the boot file, the boot ROM records several global variables in RAM that are used by the secondary loader and the HP-UX kernel. These values include:
* MSUS (mass storage unit specifier). Information about the boot device, such as the directory format, device type, and select code.
* SYSNAME. The name of the selected operating system (e.g., SYSHPUX).
* SYSFLAG2. The name of the processor type on the cnode (e.g., 68020).
* LOWRAM, HIGHRAM. The low and high limits of system memory.
* F_AREA. A driver scratch area where the LAN link level address of the server is stored. The link level address is retrieved from the IEEE 802.3 packet containing the server's host name.
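As a reading aid, the variables listed above can be pictured as a single record handed from the boot ROM to the secondary loader and the kernel. The sketch below is a model only: the field types, the sample values, and the memory_span helper are assumptions, and the actual in-RAM encoding of these variables is not given in this article.

```python
from dataclasses import dataclass


@dataclass
class BootVariables:
    msus: dict      # boot device info: directory format, device type, select code
    sysname: str    # name of the selected operating system
    sysflag2: str   # processor type of the cnode
    lowram: int     # low limit of system memory
    highram: int    # high limit of system memory
    f_area: bytes   # driver scratch area holding the server's link level address

    def memory_span(self):
        # Assumption: the span between the two limits, in bytes.
        return self.highram - self.lowram


# All values below are illustrative, not taken from a real system.
boot_vars = BootVariables(
    msus={"directory_format": "remote", "device_type": "LAN", "select_code": 21},
    sysname="SYSHPUX",
    sysflag2="68020",
    lowram=0x1000,
    highram=0x400000,
    f_area=bytes.fromhex("080009000001"),
)
print(boot_vars.sysname, hex(boot_vars.memory_span()))
```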
After the boot file is opened, the boot ROM loader issues a read request packet to the server to read the secondary loader into the discless cnode's memory. When the secondary loader has been loaded, a boot complete packet is sent to close the boot file and terminate the session. The boot ROM then passes control to the secondary loader.
Boot ROM User Interface. The displays produced during boot and the handling of user input are the responsibilities of the boot ROM user interface modules. When a key is struck during self-test (attended mode) the interface module is responsible for assigning the two-character codes (e.g., 1H, 2B) to each bootable operating system that is found. All prompts and error messages go through the user interface routines.
Boot ROM Read Interface. The read interface provides file open, read, and close facilities to the boot ROM loader and the secondary loader, and it functions as an interface to the driver modules. The boot ROM loader uses the read interface to load the secondary loader, and the secondary loader uses it to load the HP-UX system.
The read interface operates in either an absolute mode or a file mode. In file mode, file relative addressing is used to access files on the server. The booting cnode relies on the server to resolve the logical address into physical disc blocks. In absolute mode, device relative addressing is used and the calling routine is responsible for performing the logical-to-physical disc block mappings.
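The difference between the two modes can be illustrated with a small model. In the absolute mode function the caller walks a logical-to-physical block map itself; in the file mode function it names only an offset and a length, leaving the mapping to the server. The block size, the block map representation, and both function names are assumptions chosen for the example.

```python
BLOCK_SIZE = 1024  # assumed block size for this sketch


def absolute_mode_read(device, block_map, offset, length):
    """Device relative addressing: the caller maps logical to physical blocks."""
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        logical_block, within = divmod(pos, BLOCK_SIZE)
        physical_block = block_map[logical_block]
        take = min(BLOCK_SIZE - within, end - pos)
        start = physical_block * BLOCK_SIZE + within
        out += device[start:start + take]
        pos += take
    return bytes(out)


def file_mode_read(server_file, offset, length):
    """File relative addressing: the server resolves the offset itself."""
    return server_file[offset:offset + length]


# Build a 3-block file and scatter its blocks on a fake device.
file_data = bytes(range(256)) * 12
block_map = [2, 0, 1]  # logical block i lives at physical block block_map[i]
device = bytearray(len(file_data))
for logical, physical in enumerate(block_map):
    device[physical * BLOCK_SIZE:(physical + 1) * BLOCK_SIZE] = \
        file_data[logical * BLOCK_SIZE:(logical + 1) * BLOCK_SIZE]

same = absolute_mode_read(bytes(device), block_map, 1000, 100) == \
    file_mode_read(file_data, 1000, 100)
print(same)
```

Both calls return the same bytes; what differs is which side of the interface does the mapping work.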
For the discless implementation one of the design goals was to make the read interface to the LAN driver look like other devices so that existing secondary loaders would not have to change. The original HP-UX loader was built on the assumption that it was always booting from a local disc; therefore, it uses the absolute mode. The absolute mode proved impractical for the LAN driver, so the HP-UX secondary loader was modified to recognize nondisc devices and use the file mode. We already had secondary loaders for our BASIC and Pascal workstations, which use the file mode to boot over the Shared Resource Manager (SRM). The SRM has characteristics similar to the LAN.
Root Server Boot Modules
/etc/rbootd (remote boot daemon). /etc/rbootd is a process that runs on the root server and handles all of the boot protocol requests between the server and the discless workstations. Rbootd uses two files to determine how it should respond to requests from the discless cnodes: a configuration file /etc/clusterconf and a boot table /etc/boottab. The configuration file contains the names and link level addresses of the cnodes associated with the server. /etc/boottab contains a list of boot files available to each cnode in the cluster. Rbootd detects when changes are made to either of these files and reconfigures itself using the new information.
To allow context dependent boot files (files tailored to the capabilities of the workstation), rbootd emulates the pathname lookup code used by the HP-UX 6.0 kernel to handle context dependent files. The emulation is not perfect since rbootd cannot determine some of the hardware-specific context (e.g., whether the discless cnode has an MC68881 floating-point coprocessor installed). Therefore, hardware-specific context elements are not supported for boot files. Context dependent files (CDF) are discussed in detail in the article "A Discless HP-UX File System," on page 10.
Rbootd supports four levels of error and information logging, ranging from logging only fatal errors to recording the beginning and end of every boot session. The logging level is set with a command line option.
The communication protocol used by rbootd is based on a simple request/reply model. When a packet arrives, rbootd wakes up and processes the packet, usually by sending a reply, and then goes back to sleep. Requests are queued by the link level access driver in the kernel. Because queue space is limited, rbootd uses HP's real-time priority feature to ensure that boot (especially unattended boot) does not fail because of dropped packets.
Several boot protocols were investigated for our discless implementation. The Trivial File Transfer Protocol (TFTP) was considered, but could not be used. First, the boot ROM read interface is random-access and TFTP is sequential-only access. Second, TFTP is built on top of IP, which would require more code in the boot ROM. Finally, the boot ROM must obtain a list of file names, which is not provided by TFTP. We could have worked around many of these limitations; however, we decided to use a version of the Remote Maintenance Protocol (RMP) boot capability. This protocol was already in use within HP and the only capability missing was the ability to obtain a list of files from the server. Investigation showed that special interpretation of certain fields in the boot request packet would allow this feature to be implemented.
Rbootd services five types of requests: server identify, boot file list, boot request, read request, and boot complete. The boot request, read request, and boot complete packet types are standard RMP requests. The server identify and boot file list packet types are extensions to the RMP boot request packet.
* Server Identify Request. In the boot ROM phase the discless cnode uses the server identify request to get a server's host name. At the same time the server's link level network address is obtained from the IEEE 802.3 packet header sent by the server's LAN driver.
* Boot File List Request. The boot file list request is sent by the boot ROM to obtain the names of the files listed in /etc/boottab. The request packet contains an index number that is used by rbootd to respond with the name of the file. If the number is greater than the number of files available, rbootd responds with a packet indicating that there are no more boot files.
* Boot Request. A boot request opens the requested boot file and allocates a session number. This session number is used by the discless cnode for the read request and boot complete request. Session numbers are used to support concurrent boot requests.
* Read Request. A read request is used to read a boot file. The request packet contains an offset and the number of bytes to be read from the file. This enables the discless cnode to access data randomly from the boot file. Rbootd responds with a packet containing the number of bytes actually read.
* Boot Complete Request. Boot complete causes rbootd to close the boot file and deallocate the session number.
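The five request types can be summarized as a small request/reply dispatcher. The following is a toy model of the behavior described above, not rbootd itself: the class name RbootdSketch, the dictionary-based packets, the field names, and the sequential session numbering are all assumptions, and /etc/clusterconf checks are left out.

```python
class RbootdSketch:
    """Toy model of the request/reply behavior described for rbootd."""

    def __init__(self, hostname, boot_files):
        self.hostname = hostname
        self.boot_files = dict(boot_files)   # boot file name -> contents
        self.names = list(self.boot_files)   # order used for the file list
        self.sessions = {}                   # session number -> open file data
        self.next_session = 1

    def handle(self, request):
        kind = request["type"]
        if kind == "server_identify":
            return {"hostname": self.hostname}
        if kind == "boot_file_list":
            index = request["index"]         # assumed to start at 1
            if index > len(self.names):
                return {"more": False}       # no more boot files
            return {"more": True, "name": self.names[index - 1]}
        if kind == "boot_request":
            name = request["name"]
            if name not in self.boot_files:
                return {"error": "no such boot file"}
            session = self.next_session      # one session per open boot file
            self.next_session += 1
            self.sessions[session] = self.boot_files[name]
            return {"session": session}
        if kind == "read_request":
            data = self.sessions[request["session"]]
            offset, size = request["offset"], request["size"]
            chunk = data[offset:offset + size]
            return {"count": len(chunk), "data": chunk}  # bytes actually read
        if kind == "boot_complete":
            del self.sessions[request["session"]]        # close and deallocate
            return {"ok": True}
        return {"error": "unknown request"}


daemon = RbootdSketch("server1", {"SYSHPUX": b"loader-bytes"})
session = daemon.handle({"type": "boot_request", "name": "SYSHPUX"})["session"]
reply = daemon.handle(
    {"type": "read_request", "session": session, "offset": 7, "size": 100})
print(reply["count"], reply["data"])
daemon.handle({"type": "boot_complete", "session": session})
```

Because each boot request receives its own session number, two cnodes can read the same boot file concurrently at different offsets.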
Secondary Loader. In a stand-alone system the secondary loader resides in Logical Interchange Format (LIF) in the first 8K bytes of the boot disc. It is transferred to memory by the boot ROM interface routines at the end of the boot ROM phase. The purpose of the secondary loader is to load the /hp-ux a.out file (i.e., the HP-UX operating system) into low memory and execute it. Fig. 3 shows the secondary loader's flow of control and the processes involved for discless and stand-alone loading situations. The open(), read(), and close() routines emulate the behavior of the HP-UX system routines of the same name, and provide the secondary loader with an interface to the boot ROM read interface open, read, and close routines. The file system parser is a routine that understands the HP-UX file system structure and is responsible for resolving HP-UX pathnames during a boot file open in the absolute mode. Bookkeeping functions include the activities performed to keep track of data being transferred from disc (for instance, keeping a count of the number of blocks and current file offset and size, or processing partial or multiblock data transfers).
The secondary loader starts the loading process by examining the LOWRAM variable to determine the load point for the HP-UX kernel, and then uses the variable MSUS to determine the boot device. The name of the boot file is retrieved from the variable SYSNAME and translated to an HP-UX pathname, and then the open() routine is called. For instance, the boot file SYSHPUX is translated to /hp-ux.
The open() routine selects either the absolute or the file mode of the open operation depending on the type of boot device. For local boot the file system parser resolves the HP-UX pathname by using the boot ROM read interface read routine to perform pathname lookup. For a remote boot, as in the discless situation, the LAN driver is invoked through the boot ROM read interface open routine and a boot request is sent to the server where it is processed by rbootd.
The read() routine makes the same selection as the open() regarding absolute or file mode and uses the boot ROM read interface read routine to access the drivers. For absolute mode the loader uses the bookkeeping function to keep track of character counts, number of blocks read, and block addressing. For the discless situation a read request is sent to the server to be processed by rbootd. The read() operation results in transferring the selected operating system (/hp-ux) to the discless cnode's memory. The loading sequence for the operating system proceeds as follows: first the /hp-ux a.out header, which contains the sizes of the text, data, and uninitialized data areas, is read into a temporary area, and then the file /hp-ux is read into memory in two read calls, one for text and one for data.
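The loading sequence (header first, then one read for text and one for data) might be sketched as below. The header layout used here, three big-endian 32-bit sizes, is invented for the example and is not the real HP-UX a.out header; placing data immediately after text with no alignment is likewise an assumption, as are the function and parameter names.

```python
import struct

# Illustrative header only: text, data, and uninitialized data sizes.
HEADER = struct.Struct(">III")


def load_kernel(read, lowram, memory):
    """Load a boot file into memory starting at lowram.

    read(offset, size) stands in for the secondary loader's read() routine,
    which works the same way for local disc and remote (rbootd) sources.
    """
    # Step 1: read the header into a temporary area.
    text_size, data_size, bss_size = HEADER.unpack(read(0, HEADER.size))

    # Step 2: one read call for text.
    text_start = lowram
    memory[text_start:text_start + text_size] = read(HEADER.size, text_size)

    # Step 3: one read call for data (assumed to follow text directly).
    data_start = text_start + text_size
    memory[data_start:data_start + data_size] = read(
        HEADER.size + text_size, data_size)

    return {"text": text_start, "data": data_start, "bss_size": bss_size}


# A tiny fake boot file: header, 4 bytes of text, 3 bytes of data.
image = HEADER.pack(4, 3, 8) + b"TEXT" + b"DAT"
memory = bytearray(32)
layout = load_kernel(lambda off, n: image[off:off + n], 8, memory)
print(layout, bytes(memory[8:15]))
```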
When the operating system is loaded the close() routine is called. For the discless situation this results in a boot complete request being sent to rbootd. For the stand-alone situation the loader does some internal bookkeeping without calling the boot ROM. When the close operation is complete the secondary loader transfers control to the HP-UX kernel.
Kernel Debugger Considerations
The above process changes slightly if SYSDEBUG is chosen instead of SYSHPUX. The kernel debugger is loaded just like the HP-UX kernel. When the debugger is started, it opens the a.out file /SYSDEBUG to find its relocation information, then moves itself into high RAM, adjusting all of its jump points. It then adjusts the HIGHRAM boot ROM variable, effectively protecting itself from being overwritten.
The debugger uses the secondary loader open(), read(), and close() routines, which are left in high RAM. After the user selects the kernel to boot, the debugger loads the HP-UX kernel in the same way the secondary loader does.
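The debugger's self-relocation can be pictured as simple address arithmetic. This sketch assumes that HIGHRAM is an exclusive upper bound, that the debugger moves to the very top of memory, and that every jump point shifts by the same delta; the function name and all addresses are hypothetical.

```python
def relocate_debugger(highram, debugger_size, load_address, jump_points):
    """Return (new_base, relocated_jump_points, new_highram)."""
    new_base = highram - debugger_size       # move to the top of RAM
    delta = new_base - load_address          # how far the image moved
    relocated = [addr + delta for addr in jump_points]
    new_highram = new_base                   # memory above this is now reserved
    return new_base, relocated, new_highram


new_base, jumps, new_highram = relocate_debugger(
    highram=0x100000, debugger_size=0x8000,
    load_address=0x1000, jump_points=[0x1010, 0x1200])
print(hex(new_base), [hex(j) for j in jumps], hex(new_highram))
```

Lowering HIGHRAM means anything loaded later, including the kernel, sees a smaller memory range and cannot overwrite the debugger.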
HP-UX Discless Kernel Initialization
The HP-UX discless kernel finds its server's LAN card address in the boot ROM F_AREA. This value is used to initialize several discless kernel pointers, which effectively turns on the discless message interface. The discless message interface provides the protocol for communication between a discless workstation and the server. The discless message interface is described in detail in the article "The Design of Network Functions for Discless Clusters" on page 20. Once the discless message layer is operational the discless cnode sends a cluster request message to the server. The cluster message contains the discless cnode's LAN address, which is used for security purposes, and its kernel release number, which is used to prevent server or client kernel mismatch.
The server validates the discless cnode's request by comparing the cnode's LAN address against the list kept in /etc/clusterconf. If it is not there the request is rejected. Likewise, the request is rejected if the kernel release numbers do not match. Otherwise, the server broadcasts a message to the rest of the cluster and the discless cnode is admitted. The server then sends a message to the cnode that contains the current system time, a description of the rest of the discless cnodes in the cluster, and the ID of the cnode's root and swap servers. At this point, the discless cnode can use the root server's file system, and control is passed to the /etc/init program. The discless file system is used to execute programs started by /etc/init, and kernel initialization is complete.
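The server's admission check reduces to two comparisons. The sketch below models only those checks; the function name, the message fields, the release strings, and the representation of /etc/clusterconf as a set of addresses are assumptions, and the broadcast and reply messages are omitted.

```python
def validate_cluster_request(request, clusterconf_addresses, server_release):
    """Return (admitted, reason) for a cnode's cluster request."""
    # Security check: the cnode's LAN address must be listed for this cluster.
    if request["lan_address"] not in clusterconf_addresses:
        return (False, "LAN address not in clusterconf")
    # Compatibility check: kernel release numbers must match.
    if request["kernel_release"] != server_release:
        return (False, "kernel release mismatch")
    return (True, "admitted")


# Hypothetical cluster with two configured cnodes.
known = {"080009000001", "080009000002"}
print(validate_cluster_request(
    {"lan_address": "080009000001", "kernel_release": "6.0"}, known, "6.0"))
print(validate_cluster_request(
    {"lan_address": "0800090000ff", "kernel_release": "6.0"}, known, "6.0"))
```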
Acknowledgments
The authors would like to thank the following individuals who contributed to the discless boot mechanism: Anny Randel for her work on the original /etc/rbootd design and prototype, David O. Gutierrez for his patient explanation of the HP-UX LAN driver, discless messages, and kernel initialization, and Joe Cowan for project management in bringing together the resources to complete the discless boot mechanism.
COPYRIGHT 1988 Hewlett Packard Company