SUSE Linux Enterprise Server 10 SP1 EAL4 High-Level Design Version 1.2.
Version  Author  Date      Comments
1.0      EJR     3/15/07   First draft based on RHEL5 HLD
1.1      EJR     4/19/07   Updates based on comments from Stephan Mueller and Klaus Weidner
1.2      GCW     4/26/07   Incorporated Stephan's comment to remove racoon
1.2.1    GCW     10/27/08  Added legal matter missing from final draft

Novell, the Novell logo, the N logo, and SUSE are registered trademarks of Novell, Inc. in the United States and other countries.
Table of Contents

1 Introduction
2 System Overview
3 Hardware architecture
4 Software architecture
5 Functional descriptions
6 Mapping the TOE summary specification to the High-Level Design
1 Introduction
This document describes the High Level Design (HLD) for the SUSE® Linux® Enterprise Server 10 Service Pack 1 operating system. For ease of reading, this document uses the phrase SUSE Linux Enterprise Server and the abbreviation SLES as synonyms for SUSE Linux Enterprise Server 10 SP1. This document summarizes the design and Target of Evaluation Security Functions (TSF) of the SUSE Linux Enterprise Server (SLES) operating system.
2 System Overview
The Target of Evaluation (TOE) is SUSE Linux Enterprise Server (SLES) running on an IBM eServer host computer. The SLES product is available on a wide range of hardware platforms. This evaluation covers the SLES product on the IBM eServer System x™, System p™, System z™, and eServer 326 (Opteron) systems. (Throughout this document, SLES refers only to the specific evaluation platforms.) Multiple TOE systems can be connected via a physically protected Local Area Network (LAN).
The TOE system provides a user Identification and Authentication (I&A) mechanism by requiring each user to log in with a proper password at the local workstation, and also at any remote computer where the user can enter commands to a shell program (for example, remote ssh sessions). Each computer enforces a coherent Discretionary Access Control (DAC) policy, based on UNIX®-style mode bits and an optional Access Control List (ACL) for the named objects under its control.
The Common Criteria for Information Technology Security Evaluation [CC] and the Common Methodology for Information Technology Security Evaluation [CEM] demand breaking the TOE into logical subsystems that can be either (a) products, or (b) logical functions performed by the system.
The SLES kernel includes the base kernel and separately-loadable kernel modules and device drivers. (Note that a device driver can also be a kernel module.) The kernel consists of the bootable kernel image and its loadable modules. The kernel implements the system call interface, which provides system calls for file management, memory management, process management, networking, and the other functions of the TSF logical subsystems addressed in the Functional Descriptions chapter of this document.
2.2.2 eServer system structure
The system is an eServer computer, which permits one user at a time to log in to the computer console. Several virtual consoles can be mapped to a single physical console. Different users can log in through different virtual consoles simultaneously. The system can be connected to other computers via physically and logically protected LANs.
Figure 2-3: Local and network services provided by SLES

Network services, such as ssh or ftp, involve a client-server architecture and a network service-layer protocol. The client-server model splits the software that provides a service into a client portion that makes the request, and a server portion that carries out the request, usually on a different computer. The service protocol is the interface between the client and the server.
Objects are passive repositories of data. The TOE defines three types of objects: named objects, storage objects, and public objects. Named objects are resources, such as files and IPC objects, which can be manipulated by multiple users using a naming convention defined at the TSF interface. A storage object is an object that supports both read and write access by multiple non-trusted subjects.
The local TSF interfaces provided by an individual host computer include:
• Files that are part of the TSF database and define the configuration parameters used by the security functions.
• System calls made by trusted and untrusted programs to the privileged kernel-mode software. As described separately in this document, system calls are exported by the base SLES kernel and by kernel modules.
The SLES operating system is distributed as a collection of packages. A package can include programs, configuration data, and documentation for the package. Analysis is performed at the file level, except where a particular package can be treated collectively. A file is included in the TSF for one or more of the following reasons:
• It contains code, such as the kernel, kernel modules, and device drivers, that runs in a privileged hardware state.
• It enforces the security policy of the system.
3 Hardware architecture
The TOE includes the IBM System x, System p, System z, and eServer 326. This section describes the hardware architecture of these eServer systems. For more detailed information about Linux support and resources for the entire eServer line, refer to http://www.ibm.com/systems/browse/linux.

3.1 System x
IBM System x systems are Intel processor-based servers with X-architecture technology enhancements for reliability, performance, and manageability.
In this mode, applications may access:
• 64-bit flat linear addressing
• 8 new general-purpose registers (GPRs)
• 8 new registers for streaming Single Instruction/Multiple Data (SIMD) extensions (SSE, SSE2, and SSE3)
• 64-bit-wide GPRs and instruction pointers
• uniform byte-register addressing
• a fast interrupt-prioritization mechanism
• a new instruction-pointer relative-addressing mode
USB (except keyboard and mouse), PCMCIA, and IEEE 1394 (Firewire) devices are not supported in the evaluated configuration. 3.3 System z The IBM System z is designed and optimized for high-performance data and transaction serving requirements. On a System z system, Linux can run on native hardware, in a logical partition, or as a guest of the z/VM® operating system. SLES runs on System z as a guest of the z/VM Operating System.
Figure 3-1: z/VM as hypervisor

For more details about z/Architecture, refer to z/Architecture Principles of Operation at http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf. USB (except keyboard and mouse), PCMCIA, and IEEE 1394 (Firewire) devices are not supported in the evaluated configuration.

3.4 eServer 326
The IBM eServer 326 systems are AMD Opteron processor-based systems that provide high performance computing in both 32-bit and 64-bit environments.
processor extensions are activated, allowing the processor to operate in one of two sub-modes of LMA: 64-bit mode and compatibility mode.
• 64-bit mode: In 64-bit mode, the processor supports 64-bit virtual addresses, a 64-bit instruction pointer, 64-bit general-purpose registers, and eight additional general-purpose registers, for a total of 16 general-purpose registers.
4 Software architecture
This chapter summarizes the software structure and design of the SLES system and provides references to detailed design documentation. The following subsections describe the TOE Security Functions (TSF) software and the TSF databases for the SLES system. The descriptions are organized according to the structure of the system and describe the SLES kernel that controls access to shared resources from trusted (administrator) and untrusted (user) processes.
Figure 4-1: Levels of Privilege

System x: The System x servers are powered by Intel processors. Intel processors provide four execution modes, identified with processor privilege levels 0 through 3. The highest privilege level execution mode corresponds to processor privilege level 0; the lowest privilege level execution mode corresponds to processor privilege level 3. The SLES kernel, as with most other UNIX-variant kernels, utilizes only two of these execution modes.
When the processor is in kernel mode, the program has hardware privilege because it can execute certain privileged instructions that are not available in user mode. Thus, any code that runs in kernel mode executes with hardware privileges. Software that runs with hardware privileges includes:
• The base SLES kernel. This constitutes a large portion of software that performs memory management, file I/O, and process management.
• Separately loaded kernel modules, such as the ext3 device driver module.
4.1.2.1 DAC
The DAC model allows the owner of an object to decide who can access that object, and in what manner. Like any other access control model, the DAC implementation can be explained in terms of which subjects and objects are under the control of the model, the security attributes used by the model, the access control and attribute transition rules, and the override (software privilege) mechanism to bypass those rules.

4.1.2.1.1 Subjects and objects
Subjects in SLES are regular processes and kernel threads.
4.1.2.3 Programs with software privilege
Examples of programs running with software privilege are:
• Programs that are run by the system, such as the cron and init daemons.
• Programs that are run by trusted administrators to perform system administration.
• Programs that run with privileged identity by executing setuid programs.
All software that runs with hardware privileges or software privileges, and that implements security-enforcing functions, is part of the TOE Security Functions (TSF).
The concept of breaking the TOE product into logical subsystems is described in the Common Criteria. These logical subsystems are the building blocks of the TOE, and are described in the Functional Descriptions chapter of this document. They include logical subsystems and trusted processes that implement security functions. A logical subsystem can implement or support one or more functional components. For example, the File and I/O subsystem is partly implemented by functions of the Virtual Memory Manager.
4.2.1.1 Logical components The kernel consists of logical subsystems that provide different functionalities. Even though the kernel is a single executable program, the various services it provides can be broken into logical components. These components interact to provide specific functions. Figure 4-3 schematically describes logical kernel subsystems, their interactions with each other, and with the system call interface available from user space.
• Audit subsystem: This subsystem implements functions related to recording of security-critical events on the system. Implemented functions include those that trap each system call to record security critical events and those that implement the collection and recording of audit data. 4.2.1.2 Execution components The execution components of the kernel can be divided into three components: base kernel, kernel threads, and kernel modules depending on their execution perspective.
4.2.1.2.3 Kernel modules and device drivers Kernel modules are pieces of code that can be loaded and unloaded into and out of the kernel upon demand. They extend the functionality of the kernel without the need to reboot the system. Once loaded, the kernel module object code can access other kernel code and data in the same manner as statically-linked kernel object code. A device driver is a special type of kernel module that allows the kernel to access the hardware connected to the system.
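As a minimal illustration of the loadable-module interface (a sketch only; the module name and messages are hypothetical, and this module is not part of the evaluated configuration), a module needs little more than registered init and exit entry points:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

/* Runs when the module is loaded, for example via insmod or modprobe. */
static int __init example_init(void)
{
    printk(KERN_INFO "example: module loaded\n");
    return 0;    /* a nonzero return would abort the load */
}

/* Runs when the module is unloaded via rmmod. */
static void __exit example_exit(void)
{
    printk(KERN_INFO "example: module unloaded\n");
}

module_init(example_init);
module_exit(example_exit);

Once loaded, code registered this way runs with the same hardware privileges as the statically linked kernel, which is why modules fall within the TSF.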
• The crontab program is used to install, deinstall, or list the tables used to drive the cron daemon. Users can have their own crontab files that set up the time and frequency of execution, as well as the command or script to execute.
• The gpasswd command administers the /etc/group file, and the /etc/gshadow file if compiled with SHADOWGRP defined. The gpasswd command allows system administrators to designate group administrators for a particular group.
• The chfn command allows users to change their finger information. The finger command displays that information, which is stored in the /etc/passwd file.
• The date command is used to print or set the system date and time. Only an administrative user is allowed to set the system date and time.
• The groupadd, groupmod, and groupdel commands allow an administrator to add, modify, or delete a group, respectively. Refer to their respective man pages for more detailed information.
This section briefly describes the functional subsystems that implement the required security functionalities and the logical subsystems that are part of each of the functional subsystems. The subsystems are structured into those implemented within the SLES kernel, and those implemented as trusted processes. 4.4.1 Hardware The hardware consists of the physical resources such as CPU, main memory, registers, caches, and devices that effectively make up the computer system.
• gpasswd
• chage
• useradd, usermod, userdel
• groupadd, groupmod, groupdel
• chsh
• chfn
• openssl

4.4.5 User-level audit subsystem
This subsystem contains the portion of the audit system that lies outside the kernel. It includes the auditd trusted process, which reads audit records from the kernel buffer and transfers them to on-disk audit logs; the ausearch trusted search utility; the autrace trace utility; the audit configuration file; and the audit libraries.
5 Functional descriptions
The kernel structure, its trusted software, and its Target of Evaluation (TOE) Security Functions (TSF) databases provide the foundation for the descriptions in this chapter.

5.1 File and I/O management
The file and I/O subsystem is a management system for defining objects on secondary storage devices.
In order to shield user programs from the underlying details of different types of disk devices and disk-based file systems, the SLES kernel provides a software layer that handles all system calls related to a standard UNIX file system. This common interface layer, called the Virtual File System, interacts with disk-based file systems whose physical I/O devices are managed through device special files.
Figure 5-3: ext3 and CD-ROM file systems after mounting

The root directory is contained in the root file system, which is ext3 in this TOE. All other file systems can be mounted on subdirectories of the root file system. The VFS allows programs to perform operations on files without having to know the implementation of the underlying disk-based file system. The VFS layer redirects file operation requests to the appropriate file system-specific file operation. An example is in Figure 5-4.
inode: Stores general information about a specific file, such as file type and access rights, file owner, group owner, length in bytes, operations vector, time of last file access, time of last file write, and time of last inode change. An inode is associated with each file and is described in the kernel by a struct inode data structure.
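In outline, the security-relevant fields look as follows (a simplified stand-in for illustration; the kernel's real struct inode in include/linux/fs.h has many more fields):

#include <sys/types.h>
#include <time.h>

/* Stand-in mirroring the security-relevant fields of struct inode. */
struct inode_sketch {
    mode_t          i_mode;   /* file type and access rights */
    uid_t           i_uid;    /* file owner */
    gid_t           i_gid;    /* group owner */
    off_t           i_size;   /* length in bytes */
    struct timespec i_atime;  /* time of last file access */
    struct timespec i_mtime;  /* time of last file write */
    struct timespec i_ctime;  /* time of last inode change */
    const void     *i_op;     /* operations vector (inode_operations) */
};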
Figure 5-5: VFS pathname translation and access control checks
5.1.1.2 open()
The following describes the call sequence of an open() call to create a file:
1. Call the open() system call with a relative pathname and flags to create a file for read and write.
2. open() calls open_namei(), which ultimately derives the dentry for the directory in which the file is being created. If the pathname contains multiple directories, search permission for all directories in the path is required to get access to the file.
5.1.1.3 write()
Another example of a file system operation is a write() system call to write to a file that was opened for writing. The write() system call in VFS is very straightforward, because access checks have already been performed by open(). The following list shows the call sequence of a write() call:
1. Call the write() system call with the file descriptor that was returned by open().
2. Call fget() to get the file pointer corresponding to the file descriptor.
3.
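From user space, the kernel-side sequences above are triggered by ordinary calls such as these (a minimal sketch; the pathname is hypothetical):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* open() triggers the pathname lookup and DAC checks described above */
    int fd = open("data.txt", O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return 1;

    /* write() relies on the access checks already performed at open() time */
    if (write(fd, "hello\n", 6) != 6) {
        close(fd);
        return 1;
    }
    return close(fd) == 0 ? 0 : 1;
}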
• Unbindable Mount: This mount does not forward or receive propagation. This mount type cannot be bind-mounted, and it is not valid to move it under a shared mount.
• Slave Mount: A slave mount remains tied to its parent mount and receives new mount or unmount events from there. The mount or unmount events in a slave mount do not propagate elsewhere.
• Shared Mount: When this mount is used, all events generated are automatically propagated to the shared mount subtree.
5.1.2.1.1.1 Access Control Lists ACLs provide a way of extending directory and file access restrictions beyond the traditional owner, group, and world permission settings. For more details about the ACL format, refer to Discretionary Access Control, Section 5.1.5, of this document, and section 6.2.4.3 of the SLES Security Target document. EAs are stored on disk blocks allocated outside of an inode.
• ext3_group_desc: Disk blocks are partitioned into groups. Each group has its own group descriptor. ext3_group_desc stores information such as the block number of the inode bitmap and the block number of the block bitmap.
• ext3_inode: The on-disk counterpart of the inode structure of VFS. ext3_inode stores information such as file owner, file type and access rights, file length in bytes, time of last file access, number of data blocks, pointer to data blocks, and file access control list.
Figure 5-8: New data blocks are allocated and initialized for an ext3 file
Figure 5-9 shows how, for a file on the ext3 file system, inode_operations map to ext3_file_inode_operations.

Figure 5-9: Access control on ext3 file system

Similarly, for directory, symlink, and special-file types of objects, inode operations map to ext3_dir_inode_operations, ext3_symlink_inode_operations, and ext3_special_inode_operations, respectively. ext3_truncate() is the entry point for truncating a file.
starts from the s_root field of the superblock, and then invokes isofs_find_entry() to retrieve the object from the CD-ROM. On a CD-ROM file system, inode_operations map to isofs_dir_inode_operations.

Figure 5-10: File lookup on CD-ROM file system

5.1.3 Pseudo file systems

5.1.3.1 procfs
The proc file system is a special file system that allows system programs and administrators to manipulate the data structures of the kernel.
Since VM is volatile in nature, tmpfs data is not preserved between reboots. Hence, this file system is used to store short-lived temporary files. An administrator is allowed to specify the memory placement policies (the policy itself and the preferred nodes to be allocated) for this file system.

5.1.3.3 sysfs
sysfs is an in-memory file system that acts as a repository for system and device status information, providing a hierarchical view of the system device tree.
5.1.3.6 binfmt_misc
binfmt_misc provides the ability to register additional binary formats with the kernel without building an additional module or recompiling the kernel. To do this, binfmt_misc needs to know the magic number at the beginning of the file, or the filename extension of the binary. binfmt_misc works by maintaining a linked list of structs that contain a description of a binary format, including a magic number with size, or the filename extension, offset, and mask, and the interpreter name.
chown() system call. The owner and the root user are allowed to define and change access rights for an object. The following subsection looks at the kernel functions implementing the access checks. The function used depends on the file system; for example, vfs_permission() invokes permission(), which then calls specific *_permission() routines based on the inode's inode operation vector i_op. proc_permission() is called for files in procfs. ext3_permission() is called for the ext3 disk-based file system.
• If the process is neither the owner nor a member of an appropriate group, and the permission bits for world allow the type of access requested, then the subject is permitted access.
• If none of the conditions above are satisfied, and the effective UID of the process is not zero, then the access attempt is denied.

5.1.5.2 Access Control Lists
The ext3 file system supports Access Control Lists (ACLs) that offer more flexibility than the traditional permission bits.
5.1.5.2.3 ACL permissions An ACL entry can define separate permissions for read, write, and execute or search. 5.1.5.2.4 Relationship to file permission bits An ACL contains exactly one entry for each of the ACL_USER_OBJ, ACL_GROUP_OBJ, and ACL_OTHER types of tags, called the required ACL entries. An ACL can have between zero and a defined maximum number of entries of the ACL_GROUP and ACL_USER types. An ACL that has only the three required ACL entries is called a minimum ACL.
5.1.5.2.8 ACL enforcement
The ext3_permission() function uses ACLs to enforce DAC. The algorithm goes through the following steps:
1. Performs checks such as "no write access if read-only file system" and "no write access if the file is immutable."
2. For ext3 file systems, the kernel calls ext3_get_acl() to get the ACL corresponding to the object. ext3_get_acl() calls ext3_xattr_get(), which in turn calls ext3_acl_from_disk() to retrieve the extended attribute from the disk.
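For reference, the permission-bit algorithm of Section 5.1.5.1, which applies when no ACL entry overrides it, can be sketched as follows. This is a simplification that ignores capabilities, the ACL mask handling above, and the supplementary-group walk the kernel actually performs; inode_attrs is a stand-in structure invented for illustration:

#include <sys/types.h>

struct inode_attrs { uid_t uid; gid_t gid; mode_t mode; };

/* want is a 3-bit rwx mask: 4 = read, 2 = write, 1 = execute.
 * Returns 1 if access is granted, 0 otherwise. */
int dac_allows(const struct inode_attrs *ip, uid_t fsuid, gid_t fsgid, int want)
{
    mode_t mode = ip->mode;

    if (fsuid == 0)
        return 1;          /* effective UID zero bypasses the mode bits */
    if (fsuid == ip->uid)
        mode >>= 6;        /* owner class */
    else if (fsgid == ip->gid)
        mode >>= 3;        /* group class */
    /* otherwise the world (other) class applies */

    return (mode & want & 7) == (want & 7);
}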
file by adding ACLs with the setfacl command. For example, the following command allows a user named john read access to this file, even if john does not belong to the root group.

# setfacl -m user:john:4,mask::4 /aclfile

The ACL on the file will then look like:

# owner: root
# group: root
user::rw-
user:john:r--
group::r--
mask::r--
other::---

The mask field reflects the maximum permission that a user can get.
application, the I/O scheduler is considered an important kernel component in the I/O path. SLES includes four I/O scheduler options to optimize system performance.

5.1.7.1 Deadline I/O scheduler
The deadline I/O scheduler available in the Linux 2.6 kernel incorporates a per-request expiration-based approach, and operates on five I/O queues.
requests. This capability makes it behave similarly to the Anticipatory I/O scheduler. I/O priorities, which are derived from CPU priority, are also considered for the processes.

5.1.7.4 Noop I/O scheduler
The noop I/O scheduler is a minimal scheduler that provides only basic merging and sorting functionality.
5.1.8.4 Tasklets
Tasklets are dynamically linked and built on top of softirq mechanisms. Tasklets differ from softirqs in that a tasklet is always serialized with respect to itself. In other words, a tasklet cannot be executed by two CPUs at the same time. However, different tasklets can be executed concurrently on several CPUs.

5.1.8.5 Work queue
The work queue mechanism was introduced in the 2.6 Linux kernel.
5.2 Process control and management
A process is an instance of a program in execution. Process management consists of creating, manipulating, and terminating a process. Process management is handled by the process management subsystems of the kernel. The kernel interacts with the memory subsystem, the network subsystem, the file and I/O subsystem, and the inter-process communication (IPC) subsystem.
The SLES kernel maintains information about each process in a process descriptor of type task_struct. Each process descriptor contains information such as the run-state of the process, its address space, the list of open files, the process priority, which files the process is allowed to access, and security-relevant credentials fields, including the following:
• uid and gid, which describe the user ID and group ID of a process.
• euid and egid, which describe the effective user ID and effective group ID of a process.
Figure 5-12: The task structure

The kernel maintains a circular doubly-linked list of all existing process descriptors. The head of the list is the init_task descriptor referenced by the first element of the task array. The init_task descriptor belongs to process 0, or the swapper, the ancestor of all processes.

5.2.2 Process creation and destruction
The SLES kernel provides these system calls for creating a new process: clone(), fork(), and vfork().
5.2.2.2.4 setresuid() and setresgid()
These calls set the real user and group ID, the effective user and group ID, and the saved set-user and group ID of the current process. Normal user processes (that is, processes with real, effective, and saved user IDs that are nonzero) may change the real, effective, and saved user and group IDs to either the current uid and gid, the current effective uid and gid, or the current saved uid and gid.
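For example, a privileged program that wants to give up its privileges permanently might use these calls as follows (a sketch; error handling is abbreviated):

#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>

void drop_privileges(uid_t uid, gid_t gid)
{
    /* Set the real, effective, and saved IDs in one step so the
     * privileged IDs cannot be re-acquired later. Change the group
     * IDs first, while the process still has the privilege to do so. */
    if (setresgid(gid, gid, gid) != 0)
        exit(1);
    if (setresuid(uid, uid, uid) != 0)
        exit(1);
}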
5.2.5 Scheduling
Scheduling is one of the features that is highly improved in the SLES 2.6 kernel over the 2.4 kernel. The 2.6 kernel uses a new scheduler algorithm, called the O(1) algorithm, that provides greatly increased scheduling scalability. The O(1) algorithm achieves this by ensuring that the time taken to choose a process for placing into execution is constant, regardless of the number of processes.
Figure 5-14: Hyperthreaded scheduling

For more information about hyperthreading, refer to http://www.intel.com/technology/hyperthread/.

5.2.6 Kernel preemption
The kernel preemption feature has been implemented in the Linux 2.6 kernel. It should significantly lower latency times for user-interactive applications, multimedia applications, and the like. This feature is especially good for real-time systems and embedded devices.
The following code snippet demonstrates the per-CPU data structure problem in an SMP system:

int arr[NR_CPUS];

arr[smp_processor_id()] = i;
/* kernel preemption could happen here */
j = arr[smp_processor_id()];
/* i and j are not equal, as smp_processor_id() may not return the same value */

In this situation, if kernel preemption had happened at the specified point, the task would have been assigned to some other processor upon re-schedule, in which case smp_processor_id() would have returned a different value.
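The usual remedy is to pin the task to its CPU for the duration of the access by disabling preemption, for example with the get_cpu()/put_cpu() pair (a sketch of the corrected fragment):

int cpu;

cpu = get_cpu();   /* disables kernel preemption and returns the current CPU id */
arr[cpu] = i;
/* preemption cannot occur here, so the CPU id remains valid */
j = arr[cpu];
put_cpu();         /* re-enables kernel preemption */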
5.3.1 Pipes
Pipes allow the transfer of data in a FIFO manner. The pipe() system call creates unnamed pipes. Unnamed pipes are only accessible to the creating process and its descendants through file descriptors. Once a pipe is created, a process may use the read() and write() VFS system calls to access it. In order to allow access from the VFS layer, the kernel creates an inode object and two file objects for each pipe. One file object is used for reading (reader) and the other for writing (writer).
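From user space this looks as follows (a minimal sketch, with a single process writing to and reading from its own pipe):

#include <unistd.h>

int main(void)
{
    int fd[2];
    char buf[5];

    if (pipe(fd) != 0)     /* fd[0] is the read end, fd[1] the write end */
        return 1;

    write(fd[1], "hello", 5);
    read(fd[0], buf, 5);   /* data arrives in FIFO order */

    close(fd[0]);
    close(fd[1]);
    return 0;
}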
pipe_inode_info: Contains generic state information about the pipe, with fields such as base (which points to the kernel buffer), len (which represents the number of bytes written into the buffer and yet to be read), wait (which represents the wait queue), and start (which points to the read position in the kernel buffer).

do_pipe(): Invoked through the pipe() system call, do_pipe() creates a pipe by performing the following actions:
1. Allocates and initializes an inode.
2.
The inode allocation routine of the disk-based file system does the allocation and initialization of the inode object; thus, object reuse is handled by the disk-based file system.

5.3.2.2 FIFO open
A call to the open() VFS system call performs the same operation as it does for device special files. The regular DAC checks performed when the FIFO inode is read are identical to the access checks performed for other file system objects, such as files and directories.
• ipc_id: The ipc_id data structure describes the security credentials of an IPC resource with the p field, which is a pointer to the credential structure of the resource.
• kern_ipc_perm: The kern_ipc_perm data structure is a credential structure for an IPC resource, with fields such as key, uid, gid, cuid, cgid, mode, seq, and security. uid and cuid represent the owner and creator user IDs. gid and cgid represent the owner and creator group IDs.
5.3.3.3.3 msgget()
This function is invoked to create a new message queue, or to get a descriptor of an existing queue based on a key. The credentials of a newly created message queue are initialized from the credentials of the creating process.

5.3.3.3.4 msgsnd()
This function is invoked to send a message to a message queue. DAC is performed by invoking the ipcperms() function. A message is copied from the user buffer into the newly allocated msg_msg structure.
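Taken together, these calls are used from applications roughly as follows (a sketch; the message type and sizes are arbitrary examples):

#include <sys/ipc.h>
#include <sys/msg.h>
#include <string.h>

struct example_msg { long mtype; char mtext[64]; };

int main(void)
{
    struct example_msg msg = { .mtype = 1 };

    /* msgget() creates the queue; 0600 becomes its DAC mode */
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return 1;

    strcpy(msg.mtext, "hello");
    msgsnd(qid, &msg, strlen(msg.mtext) + 1, 0);   /* DAC via ipcperms() */
    msgrcv(qid, &msg, sizeof(msg.mtext), 1, 0);    /* receive a type-1 message */

    msgctl(qid, IPC_RMID, NULL);   /* remove the queue */
    return 0;
}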
5.3.3.4.4 semctl()
A function that is invoked to set attributes, query status, or delete a semaphore. A semaphore is not deleted until the process waiting for a semaphore has received it. DAC is performed by invoking the ipcperms() function.

5.3.3.5 Shared memory regions
Shared memory regions allow two or more processes to access common data by placing the processes in an IPC shared memory region.
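A typical use of a shared memory region looks like this (a sketch; the size and mode are illustrative):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <string.h>

int main(void)
{
    /* Create a 4 KB region; 0600 becomes its DAC mode */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0)
        return 1;

    char *p = shmat(shmid, NULL, 0);   /* attach to our address space */
    if (p == (char *)-1)
        return 1;

    strcpy(p, "shared data");          /* visible to every attached process */

    shmdt(p);                          /* detach */
    shmctl(shmid, IPC_RMID, NULL);     /* mark the region for deletion */
    return 0;
}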
5.3.4 Signals
Signals offer a means of delivering asynchronous events to processes. Processes can send signals to each other with the kill() system call, or the kernel can internally deliver the signals. Events that cause a signal to be generated include keyboard interrupts via the interrupt, stop, or quit keys, exceptions from invalid instructions, or termination of a process.
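A small example of this mechanism (a sketch; the process installs a handler and then signals itself with kill()):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

static void handler(int sig)
{
    got_signal = sig;   /* only async-signal-safe work belongs here */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGUSR1, &sa, NULL);   /* install the handler */

    kill(getpid(), SIGUSR1);         /* deliver SIGUSR1 to ourselves */
    return got_signal == SIGUSR1 ? 0 : 1;
}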
specifying the target address of the server. For an Internet domain socket, the address of the server is its IP address and its port number. Sockets are created using the socket() system call. Depending on the type of socket, either UNIX domain or internet domain, the socket family operations vector invokes either unix_create() or inet_create(). unix_create() and inet_create() invoke sk_alloc() to allocate the sock structure.
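The two domains are selected by the first argument to socket(), which determines whether the kernel routes the creation through unix_create() or inet_create() (a sketch):

#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Internet domain, TCP: handled by inet_create() in the kernel */
    int inet_fd = socket(AF_INET, SOCK_STREAM, 0);

    /* UNIX domain, stream: handled by unix_create() in the kernel */
    int unix_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (inet_fd < 0 || unix_fd < 0)
        return 1;
    close(inet_fd);
    close(unix_fd);
    return 0;
}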
• The protocol-independent interface module provides an interface that is independent of hardware devices and network protocols. This is the interface module that is used by other kernel subsystems to access the network without having a dependency on particular protocols or hardware.

Finally, the system call interface module restricts the exported routines that user processes can access.
The transport layer consists of TCP, UDP, and similar protocols. The application layer consists of all the various application clients and servers, such as the Samba file and print server, the Apache web server, and others. Some of the application-level protocols include Telnet, for remote login; FTP, for file transfer; and SMTP, for mail transfer. Network devices form the bottom layer of the protocol stack.
5.4.2 Transport layer protocols
The transport layer protocols supported by the SLES kernel are TCP and UDP.

5.4.2.1 TCP
TCP is a connection-oriented, end-to-end, reliable protocol designed to fit into a layered hierarchy of protocols that support multi-network applications. TCP provides for reliable IPC between pairs of processes in host computers attached to distinct but interconnected computer communication networks.
The following section introduces Internet Protocol Version 6 (IPv6). For additional information about referenced socket options and advanced IPv6 applications, see RFC 3542. Internet Protocol Version 6 (IPv6) was designed to improve upon and succeed Internet Protocol Version 4 (IPv4). IPv4 addresses consist of 32 bits. This accounts for about 4 billion available addresses. The growth of the Internet and the delegation of blocks of these addresses has consumed a large amount of the available address space.
5.4.3.2.3 Flow Labels
The IPv6 header has a field in which to enter a flow label. This provides the ability to identify packets for a connection or a traffic stream for special processing.

5.4.3.2.4 Security
The IPv6 specifications mandate IP security: IP security must be included as part of an IPv6 implementation. IP security provides authentication, data integrity, and data confidentiality to the network through the use of the Authentication and Encapsulating Security Payload extension headers.
The phrase data integrity implies that the data received is as it was when sent; it has not been tampered with, altered, or impaired in any way. Data authentication ensures that the sender of the data is really who you believe it to be. Without data authentication and integrity, someone can intercept a datagram and alter its contents to reflect something other than what was sent, as well as who sent it.
In tunnel mode, the entire IP datagram is encapsulated, and so the whole datagram is protected.

An IP packet with tunnel mode AH

5.4.3.4.1.2 Encapsulating Security Payload Protocol (ESP)
The Encapsulating Security Payload (ESP) header is defined in RFC 2406. Besides data confidentiality, ESP also provides authentication and integrity as an option. The encrypted datagram is contained in the Data section of the ESP header.
An IP packet with tunnel mode ESP

5.4.3.4.1.3 Security Associations
RFC 2401 defines a Security Association (SA) as a simplex or one-way connection that affords security services to the traffic it carries. Separate SAs must exist for each direction. IPSec stores the SAs in the Security Association Database (SAD), which resides in the Linux kernel.

5.4.3.4.1.4 Security Policy
A Security Policy is a general plan that guides the actions taken on an IP datagram.
5.4.3.4.1.8 Cryptographic subsystem
IPSec uses the cryptographic subsystem described in this section. The cryptographic subsystem performs several cryptography-related assignments, including Digital Signature Algorithm (DSA) signature verification, in-kernel key management, arbitrary-precision integer arithmetic, and verification of kernel module signatures.
5.4.4.1.1 Address Resolution Protocol (ARP)
Address Resolution Protocol (ARP) is a protocol for mapping an IP address to a physical machine address that is recognized in the local network. For example, in IP Version 4, the most common level of IP in use today, an address is 32 bits long. In an Ethernet local area network, however, addresses for attached devices are 48 bits long. (The physical machine address is also known as a Media Access Control [MAC] address.)
The following subsections describe access control and object reuse handling associated with establishing a communications channel.

5.4.5.1 socket()
socket() creates an endpoint of communication using the desired protocol type. Object reuse handling during socket creation is described in Section 5.3.5. socket() may perform additional access control checks by calling the security_socket_create() and security_socket_post_create() LSM hooks, but the SLES kernel does not use these LSM hooks.
Figure 5-21: bind() function for UNIX domain TCP socket

Similarly, for UNIX domain sockets, bind() invokes unix_bind(). unix_bind() creates an entry in the regular ext3 file system space. This process of creating an entry for a socket in the regular file system space has to undergo all file system access control restrictions. The socket exists in the regular ext3 file system space, and honors the DAC policies of the ext3 file system.
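This is why binding a UNIX domain socket resembles creating a file (a sketch; the path is hypothetical, and the bind fails if directory DAC forbids creating the entry):

#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/example.sock", sizeof(addr.sun_path) - 1);

    /* Creates /tmp/example.sock in the file system namespace, subject
     * to the ext3 DAC checks described above. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return 1;
    }
    close(fd);
    unlink("/tmp/example.sock");   /* remove the name when done */
    return 0;
}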
5.4.5.6 Generic calls
read(), write(), and close() are generic I/O system calls that operate on a file descriptor. Depending on the type of object, whether regular file, directory, or socket, appropriate object-specific functions are invoked.

5.4.5.7 Access control
DAC mediation is performed at bind() time.
• A system call interface is provided to give restricted access to user processes. This interface allows user processes to allocate and free storage, and also to perform memory-mapped file I/O.
5.5.1 Four-Level Page Tables
Before the current implementation of four-level page tables, the kernel implemented a three-level page table structure for all architectures. The three-level page table structure that previously existed consisted, from top to bottom, of the page global directory (PGD), the page middle directory (PMD), and the page table entry (PTE).
Figure 5-25: New page-table implementation: the four-level page-table architecture

The creation and insertion of a new level, the PUD, immediately below the top-level PGD directory aims to maintain portability and transparency: all architectures have an active PGD at the top of the hierarchy and an active PTE at the bottom. The PMD and PUD levels are only used in architectures that need them; on systems that do not use them, these levels are optimized out.
The larger kernel virtual address space allows the system to manage more physical memory. Up to 64 GB of main memory is supported by SLES on x86-compatible systems. The larger user virtual address space allows applications to use approximately 30% more memory (3.7 to 3.8 GB), improving performance for applications that take advantage of the feature. This means that x86-compatible systems can be expected to have a longer life-span and better performance.
5.5.2.1.1 Segmentation
The segmentation unit translates a logical address into a linear address. A logical address consists of two parts: a 16-bit segment identifier called the segment selector, and a 32-bit offset. For quick retrieval of the segment selector, the processor provides six segmentation registers whose purpose is to hold segment selectors. Three of these segmentation registers have specific purposes.
5.5.2.1.2 Paging The paging unit translates linear addresses into physical addresses. It checks the requested access type against the access rights of the linear address. Linear addresses are grouped in fixed-length intervals called pages. To allow the kernel to specify the physical address and access rights of a page instead of addresses and access rights of all the linear addresses in the page, continuous linear addresses within a page are mapped to continuous physical addresses.
Figure 5-30: Regular paging

In extended paging, the 32 bits of a linear address are divided into two fields:
• Directory: The most significant 10 bits represent the directory.
• Offset: The remaining 22 bits represent the offset.

Figure 5-31: Extended paging

Each entry of the page directory and of the page table is represented by the same data structure. This data structure includes fields that describe the page table or page entry, such as the accessed flag, dirty flag, and page size flag.
User-Supervisor flag: This flag contains the privilege level that is required for accessing the page or page table. The User-Supervisor flag is either 0, which indicates that the page can be accessed only in kernel mode, or 1, which indicates that it can always be accessed.

Figure 5-32: Access control through paging

5.5.2.1.2.1 Paging in the SLES kernel
The SLES kernel is based on Linux version 2.6.16, and implements three-level paging to support 64-bit architectures.
For more information about call gates, refer to http://www.csee.umbc.edu/~plusquel/310/slides/micro_arch4.html.

5.5.2.1.2.3 Translation lookaside buffers
The System x processor includes other caches, in addition to the hardware caches. These caches are called Translation Lookaside Buffers (TLBs), and they speed up linear-to-physical address translation. The TLB is built up as the kernel performs linear-to-physical translations.
Figure 5-33: Paging data structures

The PS flag in the page directory entry (PDE.PS) selects between 4 KB and 2 MB page sizes.

5.5.2.2 System p
Linux on POWER5 System p systems runs only in Logical Partitioning (LPAR) mode. The System p offers the ability to partition one system into several independent systems through LPAR. LPAR divides the processors, memory, and storage into multiple sets of resources, so each set of resources can be operated independently with its own OS instance and applications.
Figure 5-34: Logical partitions

On System p systems without logical partitions, the processor has two operating modes, user and supervisor. The user and supervisor modes are implemented using the PR bit of the Machine State Register (MSR). Logical partitions on System p systems necessitate a third mode of operation for the processor. This third mode, called the hypervisor mode, provides all the partition control and partition mediation in the system.
• 0: The processor is not in hypervisor state.
• 1: If MSR[PR] = 0, the processor is in hypervisor state; otherwise, the processor is not in hypervisor state.
The HV bit takes the value of 1 for hypervisor mode, and 0 for user and supervisor modes.
Figure 5-36: Determination of processor mode in LPAR

Just as certain memory areas are protected from access in user mode, some memory areas, such as hardware page tables, are accessible only in hypervisor mode. The PowerPC and POWER architecture provides only one system call instruction. This system call instruction, sc, is used to perform system calls from the user space intended for the SLES kernel, as well as hypervisor calls from the kernel space intended for the hypervisor.
hardware address of the memory. This translation is done by the hypervisor, which keeps a logical partition unaware of the existence of other logical partitions. 5.5.2.2.1 Address Translation on LPARs On System p systems running with logical partitions, the effective address, the virtual address, and the physical address format and meaning are identical to those of System p systems running in native mode.
5.5.2.2.4 Virtual mode addressing Operating systems use another type of addressing, virtual addressing, to give user applications an effective address space that exceeds the amount of physical memory installed in the system. The operating system does this by paging infrequently used programs and data from memory out to disk, and bringing them back into memory on demand.
Figure 5-38: DMA addressing

5.5.2.2.7 Run-Time Abstraction Services
System p hardware platforms provide a set of firmware Run-Time Abstraction Services (RTAS) calls. In LPAR, these calls perform additional validation checking and resource virtualization for the partitioned environment.
For further information about the PowerPC 64-bit processor, see PowerPC 64-bit Kernel Internals by David Engebretson, Mike Corrigan, and Peter Bergner at http://lwn.net/2001/features/OLS/pdf/pdf/ppc64.pdf. You can find further information about System p hardware at http://www-1.ibm.com/servers/eserver/pseries/linux/.

The following describes the four address types used in System p systems.
Figure 5-41: Block address

To access a particular memory location, the CPU transforms an effective address into a physical address using one of the following address translation mechanisms:
• Real mode address translation, where address translation is disabled. The physical address is the same as the effective address.
• Block address translation, which translates the effective address corresponding to a block of 128 KB to 256 MB in size.
• DR: Data Address Translation. The value of 0 disables translation, and the value of 1 enables translation.

5.5.2.3.2 Page descriptor
Pages are described by Page Table Entries (PTEs). The operating system generates and places PTEs in a page table in memory. A PTE on SLES is 128 bits in length. The bits relevant to access control are the page protection (PP) bits, which are used with the MSR and segment descriptor fields to implement access control.

Figure 5-43: Page table entry
Figure 5-45: Block Address Translation entry

• Vs: Supervisor mode valid bit. Used with MSR[PR] to restrict translation for some block addresses.
• Vp: User mode valid bit. Used with MSR[PR] to restrict translation for some block addresses.
• PP: Protection bits for the block.

5.5.2.3.5 Address translation mechanisms
The following simplified flowchart describes the process of selecting an address translation mechanism based on the MSR settings for instruction (IR) or data (DR) access.
Real Mode Address Translation: Real Mode Address Translation is not technically the translation of any addresses. Real Mode Address Translation signifies no translation. That is, the physical address is the same as the effective address. The operating system uses this mode during initialization and some interrupt processing. Because there is no translation, there is no access control implemented for this mode.
Page address translation begins with a check to see if the effective segment ID, corresponding to the effective address, exists in the Segment Lookaside Buffer (SLB). The SLB provides a mapping between Effective Segment IDs (ESIDs) and Virtual Segment IDs (VSIDs). If the SLB search fails, a segment fault occurs. This is an instruction segment exception or a data segment exception, depending on whether the effective address is for an instruction fetch or for a data access.
Figure 5-48: Page Address Translation and access control
5.5.2.4 System z
SLES on System z systems can run either in native mode or in LPAR. Additionally, it can run as a z/VM guest, which is specific to this series. This section briefly describes these three modes and how they address and protect memory. For more detailed information about the System z architecture, refer to z/Architecture Principles of Operation at http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf, or the System z hardware documents at http://www-1.ibm.com/servers/eserver/zseries/.
Absolute address: An absolute address is the address assigned to a main memory location. An absolute address is used for a memory access without any transformations performed on it. Effective address: An effective address is the address that exists before any transformation takes place by dynamic address translation or prefixing. An effective address is the result of the address arithmetic of adding the base register, the index register, and the displacement.
Figure 5-49: System z address types and their translation

5.5.2.4.7.1 Dynamic address translation
Bit 5 of the current PSW indicates whether a virtual address is to be translated using paging tables. If it is, bits 16 and 17 control which address space translation mode (primary, secondary, access-register, or home) is used for the translation.

Figure 5-50: Program Status Word

The following diagram illustrates the logic used to determine the translation mode.
Figure 5-51: Address translation modes

Each address-space translation mode translates virtual addresses corresponding to that address space. For example, primary address-space mode translates virtual addresses from the primary address space, and home address space mode translates virtual addresses belonging to the home address space. Each address space has an associated Address Space Control Element (ASCE).
Figure 5-52: 64-bit or 31-bit Dynamic Address Translation

5.5.2.4.7.2 Prefixing
Prefixing provides the ability to assign a range of real addresses to a different block in absolute memory for each CPU, thus permitting more than one CPU sharing main memory to operate concurrently with a minimum of interference. Prefixing is performed with the help of a prefix register. No access control is performed while translating a real address to an absolute address.
For a detailed description of prefixing as well as implementation details, see z/Architecture Principles of Operation at http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf. 5.5.2.4.8 Memory protection mechanisms In addition to separating the address space of user and supervisor states, the z/Architecture provides mechanisms to protect memory from unauthorized access.
5.5.2.4.8.2 Page table protection The page table protection mechanism is applied to virtual addresses during their translation to real addresses. The page table protection mechanism controls access to virtual storage by using the page protection bit in each page-table entry and segment-table entry. Protection can be applied to a single page or an entire segment (a collection of contiguous pages).
Figure 5-54: 31-bit Dynamic Address Translation with page table protection
Figure 5-55: 64-bit Dynamic Address Translation with page table protection
5.5.2.4.8.3 Key-controlled protection
When an access attempt is made to an absolute address, which refers to a memory location, key-controlled protection is applied. Each 4 KB page of real memory has a 7-bit storage key associated with it. These storage keys for pages can only be set when the processor is in the supervisor state. The Program Status Word contains an access key corresponding to the currently running program.
Figure 5-57: Fetch protection override for key-controlled protection

5.5.2.5 eServer 326
eServer 326 systems use AMD Opteron processors. The Opteron processors can operate either in legacy mode to support 32-bit operating systems, or in long mode to support 64-bit operating systems. Long mode has two possible sub-modes: 64-bit mode, which runs only 64-bit applications, and compatibility mode, which can run both 32-bit and 64-bit applications simultaneously.
The segment selector specifies an entry in either the global or local descriptor table. The specified descriptor-table entry describes the segment location in virtual-address space, its size, and other characteristics. The effective address is used as an offset into the segment specified by the selector.

5.5.2.5.2 Effective address
The offset into a memory segment is referred to as an effective address.
• Requestor Privilege Level (RPL): RPL represents the privilege level of the program that created the segment selector. The RPL is stored in the segment selector used to reference the segment descriptor.
• Descriptor Privilege Level (DPL): DPL is the privilege level that is associated with an individual segment. The system software assigns this DPL, and it is stored in the segment descriptor.
CPL, RPL, and DPL are used to implement access control on data accesses and control transfers as follows.
calls. If the code segment is non-conforming (with conforming bit C set to zero in the segment descriptor), then the processor first checks to ensure that CPL is equal to DPL. If CPL is equal to DPL, then the processor performs the next check to see if the RPL value is less than or equal to the CPL. A general protection exception occurs if either of the two checks fails.
Figure 5-60: Contiguous linear addresses map to contiguous physical addresses

The eServer 326 supports a four-level page table. The uppermost level is kept private to the architecture-specific code of SLES. The page-table setup supports up to 48 bits of address space. The x86-64 architecture supports page sizes of 4 KB and 2 MB. Figure 5-61 illustrates how paging is used to translate a 64-bit virtual address into a physical address for the 4 KB page size.
Figure 5-61: 4 KB page translation, virtual to physical address translation

When the page size is 2 MB, bits 0 to 20 represent the byte offset into the physical page. That is, the page table offset and byte offset of the 4 KB page translation are combined to provide a byte offset into the 2 MB physical page. Figure 5-62 illustrates how paging is used to translate a 64-bit linear address into a physical address for the 2 MB page size.
Figure 5-62: 2 MB page translation, virtual to physical address translation

Each entry of the page map level-4 table, the page-directory pointer table, the page-directory table, and the page table is represented by the same data structure. This data structure includes fields that interact in implementing access control during paging. These fields are the Read/Write (R/W) flag, the User/Supervisor (U/S) flag, and the No Execute (NX) flag.
• Read/Write flag: This flag contains the access rights of the physical pages mapped by the table entry. The R/W flag is either read/write or read. If set to 0, the corresponding page can only be read; otherwise, the corresponding page can be written to or read. The R/W flag affects all physical pages mapped by the table entry. That is, the R/W flag of the page map level-4 entry affects access to all 128M (512 x 512 x 512) physical pages it maps through the lower-level translation tables.
5.5.3.1 Support for NUMA servers
NUMA is an architecture wherein the memory access time for different regions of memory from a given processor varies according to the nodal distance of the memory region from the processor. Each region of memory, to which access times are the same from any CPU, is called a node. The NUMA architecture surpasses the scalability limits of the SMP (Symmetric Multi-Processing) architecture. With SMP, all memory accesses are posted to the same shared memory bus.
systems, this operation is unacceptably slow. With Rmap VM, additional memory management structures have been created that enable a physical address to be back-translated to its associated virtual address quickly and easily.

Figure 5-65: Rmap VM

For more information about Rmap VM, see http://lwn.net/Articles/23732/ and http://www-106.ibm.com/developerworks/linux/library/l-mem26/.

5.5.3.3 Huge Translation Lookaside Buffers
Huge TLB File system (hugetlbfs) is a pseudo file system, implemented in fs/hugetlbfs/inode.c. The basic idea behind the implementation is that large pages are being used to back up any file that exists in the file system. During initialization, init_hugetlbfs_fs() registers the file system and mounts it as an internal file system with kern_mount(). There are two ways that a process can access huge pages.
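The two access methods are typically a mapping of a file that lives in a hugetlbfs mount and a SysV shared memory segment requested with the SHM_HUGETLB flag. The following minimal user-space sketch shows the first method; the mount point /mnt/huge is an assumption, and any hugetlbfs mount works:

/* Map one 2 MB huge page through a file in a hugetlbfs mount. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LENGTH (2UL * 1024 * 1024)   /* one 2 MB huge page */

int main(void)
{
    int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    void *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    ((char *)p)[0] = 1;              /* touch the huge page */

    munmap(p, LENGTH);
    close(fd);
    return 0;
}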
5.5.3.4 Remap_file_pages

remap_file_pages() is another memory management feature that is suitable for large-memory and database applications. It is primarily useful on x86 systems that use the shared memory file system (shmemfs). A shmemfs memory segment requires kernel structures for control and mapping functions, and these structures can grow unacceptably large given a large enough segment and multiple sharers.
5.5.3.6 Memory area management

Memory areas are sequences of memory cells with contiguous physical addresses and an arbitrary length. The SLES kernel uses the buddy algorithm for dealing with relatively large memory requests, but in order to satisfy kernel needs for small memory areas, a different scheme, called the slab allocator, is used. The slab allocator views memory areas as objects with data and methods.
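A brief sketch of how kernel code uses the slab allocator follows. The kmem_cache_create() call shown matches the 2.6-era kernel interface (constructor and destructor omitted); the cache and object names are hypothetical:

#include <linux/slab.h>
#include <linux/errno.h>

struct my_object {                 /* hypothetical kernel object */
    int id;
    char name[32];
};

static kmem_cache_t *my_cache;

static int my_cache_setup(void)
{
    /* one cache per object type; objects come from per-cache slabs */
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object),
                                 0, 0, NULL, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static struct my_object *my_object_alloc(void)
{
    /* served from a slab, not the buddy allocator */
    return kmem_cache_alloc(my_cache, GFP_KERNEL);
}

static void my_object_free(struct my_object *obj)
{
    kmem_cache_free(my_cache, obj);
}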
address returned by arch_get_unmapped_area() to contain a linear address that is part of another process’s address space. In addition to this process compartmentalization, the do_mmap() routine also makes sure that when a new memory region is inserted it does not cause the size of the process address space to exceed the threshold set by the system parameter rlimit. The do_mmap() function only allocates a new valid linear address to the address space of a process.
5.5.5 Symmetric multiprocessing and synchronization

The SLES kernel allows multiple processes to execute in the kernel simultaneously (the kernel is re-entrant). It also supports symmetric multiprocessing (SMP), in which two or more processors share the same memory and have equal access to I/O devices. Re-entrancy and SMP give rise to synchronization issues. This section describes the various synchronization techniques used by the SLES kernel.
5.5.5.3 Spin locks

Spin locks provide an additional synchronization primitive for applications running on SMP systems. A spin lock is just a simple flag. When a kernel control path tries to claim a spin lock, it first checks whether or not the flag is already set. If not, then the flag is set, and the operation succeeds immediately. If it is not possible to claim a spin lock, then the current thread spins in a tight loop, repeatedly checking the flag until it is clear.
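The following kernel-code sketch illustrates the usage pattern just described, using the standard 2.6-era spin lock primitives; the protected counter is hypothetical:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(counter_lock);   /* statically initialized flag */
static unsigned long counter;

void counter_increment(void)
{
    spin_lock(&counter_lock);    /* spins until the flag can be claimed */
    counter++;                   /* critical section: keep it short,
                                    and never sleep while holding it */
    spin_unlock(&counter_lock);  /* clears the flag */
}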
Figure 5-69: Audit framework components

5.6.1.1 Audit kernel components

The Linux Audit support of the SLES kernel includes three kernel-side components relating to the audit functionality. The first component is a generic mechanism for creating audit records and communicating with user space. The communication is achieved via the netlink socket interface, which enables the transfer of information between kernel modules and user-space processes and provides bidirectional kernel/user-space communication links.
The kernel checks the effective capabilities of the sender process. If the sender does not possess the right capability, the netlink message is discarded.

5.6.1.1.2 Syscall auditing

The second component is a mechanism that addresses system call auditing. It uses the generic logging mechanism for creating audit records and communicating with user space.

5.6.1.1.3 Filesystem watches

The third component adds file system auditing support, based on file locations and names, to the audit subsystem.
Figure 5-71: Task Structure

5.6.1.1.5 Audit context fields

• Login ID: Login ID is the user ID of the logged-in user. It remains unchanged through the setuid() or seteuid() system calls. Login ID is required by the Controlled Access Protection Profile to irrefutably associate a user with that user's actions, even across su invocations or use of setuid binaries.
• serial: A unique number that helps identify a particular audit record. Along with ctime, it can determine which pieces belong to the same audit record. The (timestamp, serial) tuple is unique for each syscall and it lives from syscall entry to syscall exit. • ctime: Time at system call entry. • major: System call number. • argv array: The first 4 arguments of the system call. • name_count: Number of names. The maximum defined is 20.
When a filesystem object the audit subsystem is watching changes, the inotify subsystem calls the audit_handle_event() function. audit_handle_event() in turn updates the audit subsystem's watch data for the watched entity. This process is detailed in Section 5.6.3.1.3. 5.6.1.3 User space audit components The main user level audit components consist of a daemon (auditd), a control program (auditctl), a library (libaudit), a configuration file (auditd.conf), and an initial setup file (auditd.rules).
Figure 5-72: Audit User Space Components

5.6.2 Audit operation and configuration options

5.6.2.1 Configuration

There are many ways to control the operation of the audit subsystem. The controls are available at compilation time, boot time, daemon startup time, and while the daemon is running. At compilation time, the SLES kernel provides three kernel configuration options that control the level of audit support compiled into the kernel.
Option | Description | Possible values
log_file | Name of the log file. |
log_format | How to flush the data from auditd to the log. |
priority_boost | The nice value for auditd; used to run auditd at a certain priority. |
flush | Method of writing data to disk. | none, incremental, data, sync
freq | Used when flush is incremental; states how many records are written before a forced flush to disk. |
num_logs | Number of log files to use. |
max_log_file | Maximum log size in megabytes. |
Option | Description | Possible values
-b | Sets the maximum number of outstanding audit buffers allowed. If all buffers are exhausted, the failure flag is checked. | Default is 64
-e | Sets the enabled flag. | 0 or 1
-f | Sets the failure flag. | silent, printk, panic
-r | Sets the rate of messages per second. If 0, no rate is set; if greater than 0 and the rate is exceeded, the failure flag is checked. |

Table 5-3: auditctl control arguments

5.6.2.
7. If audit is enabled, the kernel intercepts the system calls and generates audit records according to the filter rules. Or, the kernel generates audit records for watches set on particular file system files or directories. 8. Trusted programs can also write audit records for security-relevant operations via the audit netlink, not directly to the audit log. 9. The auditctl -m option is another way to write messages to the log.
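As an illustration of item 8, a trusted program can emit such a record through libaudit rather than writing to the log directly. This is a minimal sketch; the message type and text are illustrative, and the caller needs the appropriate audit capability:

#include <libaudit.h>
#include <stdio.h>

int log_admin_action(const char *text)
{
    int fd = audit_open();              /* opens a NETLINK_AUDIT socket */
    if (fd < 0) {
        perror("audit_open");
        return -1;
    }
    /* AUDIT_USER: generic user-space message; result 1 = success */
    int rc = audit_log_user_message(fd, AUDIT_USER, text,
                                    NULL, NULL, NULL, 1);
    audit_close(fd);
    return rc;
}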
5.6.3 Audit record generation

Figure 5-73: Audit Record Generation

5.6.3.1.2 Syscall audit record generation

Once attached, every security-relevant system call performed by the process is evaluated in the kernel. The process's descendants maintain their attachment to the audit subsystem. 1. All security-relevant system calls made by the process are intercepted at the entry and at the exit of the system call code.
generates the audit record, and sends the record to the netlink socket. Both audit_syscall_entry() and audit_syscall_exit() call audit_filter_syscall() to apply the filter logic and check whether or not to audit the call.

Figure 5-74: Extension to system calls interface

The filtering logic allows an administrative user to filter out events based on the rule set, as described in the auditctl man page.

5.6.3.1.3 File system audit record generation

The file system code is hooked with the inotify subsystem.
5.6.3.1.4 Socket call and IPC audit record generation

Some system calls pass an argument to the kernel specifying which function the system call is requesting from the kernel. These system calls request multiple services from the kernel through a single entry point. For example, the first argument to the ipc() call specifies whether the request is for a semaphore operation, a shared memory operation, and so forth.
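The dispatch pattern can be modeled with a small C sketch. The constants mirror SEMOP, MSGSND, and SHMAT in <linux/ipc.h>; the handler stubs stand in for the kernel's actual semaphore, message queue, and shared memory code:

#include <errno.h>

/* Stubs standing in for the kernel's IPC handlers. */
static long do_semop(int id)  { (void)id; return 0; }
static long do_msgsnd(int id) { (void)id; return 0; }
static long do_shmat(int id)  { (void)id; return 0; }

enum { SEMOP = 1, MSGSND = 11, SHMAT = 21 };

/* One entry point, many services: dispatch on the first argument. */
long ipc_demux(unsigned int call, int id)
{
    switch (call) {
    case SEMOP:  return do_semop(id);   /* semaphore operation  */
    case MSGSND: return do_msgsnd(id);  /* message queue send   */
    case SHMAT:  return do_shmat(id);   /* shared memory attach */
    default:     return -EINVAL;        /* unknown service      */
    }
}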
timestamp of the record and the serial number are used by the user-space daemon to determine which pieces belong to the same audit record. The (timestamp, serial number) tuple is unique for each system call and lasts from system call entry to system call exit. Each audit record for a system call contains the system call return code, which indicates whether or not the call was successful. The following table lists security-relevant events for which an audit record is generated on the TOE.
Event | LAF audit events
Startup and shutdown of audit functions | DAEMON_START, DAEMON_END, generated by auditd
Modification of audit configuration files | DAEMON_CONFIG, DAEMON_RECONFIG, generated by auditd; syscalls open, link, unlink, rename, truncate, and write on the configuration files
Successful and unsuccessful file read/write | Syscall open
Audit storage space exceeds a threshold | space_left_action, admin_space_left_action configuration parameters for auditd
Event | LAF audit events
Execution of the test of the underlying machine | Audit message from the amtu utility with the result of the test; audit record type: USER
Changes to system time | Syscalls settimeofday, adjtimex
Setting up a trusted channel | Syscall exec (of the stunnel program)

Table 5-4: Audit Subsystem event codes

5.6.4 Audit tools

In addition to the main components, the user level provides a search utility, ausearch, and a trace utility, autrace.
Lower-layer functions, such as scheduling and interrupt management, cannot be modularized. Kernel modules can be used to add or replace system calls. The SLES kernel supports dynamically-loadable kernel modules that are loaded automatically on demand. Loading and unloading occurs as follows: 1. The kernel notices that a requested feature is not resident in the kernel. 2. The kernel executes the modprobe program to load a module that fits this symbolic description. 3.
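For illustration, the unit that modprobe loads and unloads is a loadable kernel module of the following minimal 2.6-era form; the module name and messages are hypothetical:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int __init example_init(void)
{
    printk(KERN_INFO "example module loaded\n");
    return 0;                    /* a nonzero return would abort the load */
}

static void __exit example_exit(void)
{
    printk(KERN_INFO "example module unloaded\n");
}

module_init(example_init);       /* called at insmod/modprobe time */
module_exit(example_exit);       /* called at rmmod time           */
MODULE_LICENSE("GPL");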
STRUCTURE | OBJECT
task_struct | Task (Process)
linux_binprm | Program
super_block | File system
inode | Pipe, File, or Socket
file | Open File
sk_buff | Network Buffer (Packet)
net_device | Network Device
kern_ipc_perm | Semaphore, Shared Memory Segment, or Message Queue
msg_msg | Individual Message

Table 5-5: Kernel data structures modified by the LSM kernel patch and the corresponding abstract objects

The security_operations structure is the main security structure that provides security hooks for the various kernel operations.
Figure 5-76: LSM hook architecture

LSM adds a general security system call that simply invokes the sys_security hook. This system call and hook permit security modules to implement new system calls for security-aware applications.

5.7.2 LSM capabilities module

The LSM kernel patch moves most of the existing POSIX.1e capabilities logic into an optional security module stored in the file security/capability.c.
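As a sketch of the hook mechanism, a security module fills a security_operations structure and registers it, as in the 2.6-era example below; the module name and the always-grant policy are illustrative only, and unset members fall back to the default logic:

#include <linux/init.h>
#include <linux/security.h>

static int example_inode_permission(struct inode *inode, int mask,
                                    struct nameidata *nd)
{
    /* module-specific policy decision: 0 grants, -EACCES would deny */
    return 0;
}

static struct security_operations example_ops = {
    .inode_permission = example_inode_permission,
};

static int __init example_lsm_init(void)
{
    /* only one primary security module may register at a time */
    return register_security(&example_ops);
}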
● Administrative utilities provide a mechanism for administrators to configure, query, and control AppArmor.

For background information on AppArmor, which was originally named SubDomain, see SubDomain: Parsimonious Server Security by Crispin Cowan, Steve Beattie, Greg Kroah-Hartman, Calton Pu, Perry Wagle, and Virgil Gligor at https://forgesvn1.novell.com/viewsvn/apparmor/trunk/docs/papers/subdomain lisa00.pdf?revision=3 [CRISP] and http://www.novell.
● px: discrete profile execute
● Px: discrete profile execute after scrubbing the environment
● ix: inherit execute
● m: allow PROT_EXEC with mmap(2) calls
● l: link

For the complete AppArmor profile syntax, see the apparmor.d man page. AppArmor profiles are loaded into the kernel by the apparmor_parser tool, which can load new profiles, replace profiles, and remove profiles.
5.9 Device drivers

A device driver is a software layer that makes a hardware device respond to a well-defined programming interface. The kernel interacts with the device only through these well-defined interfaces. For detailed information about device drivers, see Linux Device Drivers, 2nd Edition, by Alessandro Rubini and Jonathan Corbet. The TOE supports many different I/O devices, such as disk drives, tape drives, and network adapters.
guest program or interpreted machine. The interpreted and host machines execute guest and host programs, respectively. The interpretive-execution facility is invoked by executing the Start Interpretive Execution (SIE) processor instruction, which causes the CPU to enter the interpretive-execution mode and to begin execution of the guest program under control of the operand of the instruction, called the state description.
• Conditional interceptions refer to functions that are executed for the guest unless a specified condition is encountered, in which case control is returned to the host by interception.
This extra level of indirection is needed for character devices, but not for block devices, because of the large variety of character devices and the operations they support. The following diagram illustrates how the kernel maps the file operations vector of the device file object to the correct set of operations routines for that device.

Figure 5-77: Setup of f_op for character device specific file operations

5.9.
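A sketch of the character-device side of this mapping follows: the driver fills a file_operations vector and registers it under a major number, after which opening the device file makes the kernel point the file object's f_op at this vector. The 2.6-era register_chrdev() interface is shown; the device name and method bodies are illustrative:

#include <linux/fs.h>
#include <linux/init.h>
#include <linux/module.h>

static ssize_t example_read(struct file *filp, char __user *buf,
                            size_t count, loff_t *ppos)
{
    return 0;                          /* device-specific read logic */
}

static int example_open(struct inode *inode, struct file *filp)
{
    return 0;                          /* device-specific open logic */
}

static struct file_operations example_fops = {
    .owner = THIS_MODULE,
    .read  = example_read,
    .open  = example_open,
};

static int major;

static int __init example_init(void)
{
    /* 0 asks the kernel to assign a free major number */
    major = register_chrdev(0, "example", &example_fops);
    return (major < 0) ? major : 0;
}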
Figure 5-78: Setup of f_op for block device specific file operations

5.10 System initialization

When a computer with SLES is turned on, the operating system is loaded into memory by a special program called a boot loader. A boot loader usually exists on the system's primary hard drive, or other media device, and has the sole responsibility of loading the Linux kernel with its required files or, in some cases, other operating systems into memory.
the system runlevel by controlling PID 1. For more information on the /etc/inittab file, please see the inittab(5) man page. For more information on the init program, please see the init(8) man page. The init program generally follows these startup steps: 1. Gets its own name. 2. Sets its umask. 3. Checks for root identity. 4. Checks whether it is PID 1 (init, the daemon) or not PID 1 (telinit, the control program). 5. If it is telinit, runs telinit processing and exits; if it is init, continues. 6.
5.10.2.1 Boot methods SLES supports booting from a hard disk, a CD-ROM, or a floppy disk. CD-ROM and floppy disk boots are used for installation, and to perform diagnostics and maintenance. A typical boot is from a boot image on the local hard disk. 5.10.2.2 Boot loader A boot loader is a program that resides in the starting sectors of a disk, that is, the Master Boot Record (MBR) of the hard disk.
14. The boot loader sets the IDT with null interrupt handlers. It puts the system parameters obtained from the BIOS and the parameters passed to the operating system into the first page frame. 15. The boot loader identifies the model of the processor. It loads the gdtr and idtr registers with the addresses of the Global Descriptor Table and Interrupt Descriptor Table, and jumps to the start_kernel() function. 16.
Figure 5-79: System x SLES boot sequence
5.10.3 System p This section briefly describes the system initialization process for System p servers. 5.10.3.1 Boot methods SLES supports booting from a hard disk or from a CD-ROM. CD-ROM boots are used for installation and to perform diagnostics and maintenance. A typical boot is from a boot image on the local hard disk. 5.10.3.2 Boot loader A boot loader is the first program that is run after the system completes the hardware diagnostics setup in the firmware.
1. Yaboot allows an administrator to perform interactive debugging of the startup process by executing the /etc/sysconfig/init script. 2. Mounts the /proc special file system. 3. Mounts the /dev/pts special file system. 4. Executes /etc/rc.d/rc.local, which was set by an administrator to perform site-specific setup functions. 5. Performs run-level specific initialization by executing startup scripts defined in /etc/inittab. The scripts are named /etc/rc.d/rcX.d, where X is the default run level.
Figure 5-80: System p SLES boot sequence

5.10.4 System p in LPAR

SLES runs in a logical partition on a System p system. The hypervisor program creates logical partitions, interacts with the actual hardware, and provides virtual versions of hardware to the operating systems running in different logical partitions. As part of an Initial Program Load, the hypervisor performs certain initializations, listed below, before handing control over to the operating system.
5.10.4.1 Boot process

For an individual computer, the boot process consists of the following steps when the CPU is powered on or reset: 1. The hypervisor assigns memory to the partition as a 64 MB contiguous load area, with the balance in 256 KB chunks. 2. The boot loader loads the SLES kernel into the load area. 3. Provides system configuration data to the SLES kernel via several data areas provided within the kernel. 4.
• Starts the agetty program. For more details about services started at run level 3, see the scripts in /etc/rc.d/rc3.d on a SLES system. Figure 5-81 schematically describes the boot process of System p LPARs.
5.10.5 System z This section briefly describes the system initialization process for System z servers. 5.10.5.1 Boot methods Linux on System z supports three installation methods: native, LPAR, and z/VM guest installations. SLES only supports z/VM guest installation. The process described below corresponds to the z/VM guest mode.
4. Executes /etc/rc.d/rc.local, which was set by an administrator to perform site-specific setup functions. 5. Performs run-level specific initialization by executing startup scripts defined in /etc/inittab. The scripts are named /etc/rc.d/rcX.d, where X is the default run level. The default run level for a SLES system in the evaluated configuration is 3. The following lists some of the initializations performed at run level 3.
Figure 5-82: System z SLES boot sequence 5.10.6 eServer 326 This section briefly describes the system initialization process for eServer 326 servers. For detailed information on system initialization, see AMD64 Architecture, Programmer’s Manual Volume 2: System Programming, at http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf. 5.10.6.1 Boot methods SLES supports booting from a hard disk, a CD-ROM, or a floppy disk.
5.10.6.2 Boot loader After the system completes the hardware diagnostics setup in the firmware, the first program that runs is the boot loader. The boot loader is responsible for copying the boot image from hard disk and then transferring control to it. SLES supports GRUB, which lets you set pointers in the boot sector to the kernel image and to the RAM file system image. 5.10.6.3 Boot process The boot process consists of the following steps when the CPU is powered on or reset: 1.
17. x86_64_start_kernel() completes the kernel initialization by initializing page tables, memory-handling data structures, IDT tables, the slab allocator (described in Section 5.5.3.6), the system date, and the system time. 18. Uncompresses the initrd initial RAM file system, mounts it, and then executes /linuxrc. 19. Unmounts initrd, mounts the root file system, and executes /sbin/init. Resets the pid table to assign process ID 1 to the init process. 20.
Figure 5-83: eServer 326 SLES boot sequence

5.11 Identification and authentication

Identification is the presentation of a claimed identity to the system, in the form of a login ID. Identification establishes user accountability and access restrictions for actions on the system. Authentication is verification that the user's claimed identity is valid; it is implemented through a user password at login time.
provides a way to develop programs that are independent of the authentication scheme. These programs need authentication modules to be attached to them at run-time in order to work. Which authentication module is to be attached is dependent upon the local system setup and is at the discretion of the local system administrator.
6. Each authentication module performs its action and relays the result back to the application. 7. The PAM library is modified to create a USER_AUTH type of audit record to note the success or failure from the authentication module. 8. The application takes appropriate action based on the aggregate results from all authentication modules. 5.11.1.2 Configuration terminology PAM configuration files are stored in /etc/pam.d. Each application is configured with a file of its own in the /etc/pam.d directory.
• pam_passwdqc.so: Performs additional password strength checks. For example, it rejects passwords such as “1qaz2wsx” that follow a pattern on the keyboard. In addition to checking regular passwords it offers support for passphrases and can provide randomly generated passwords. • pam_env.so: Loads a configurable list of environment variables, and it is configured with the file /etc/security/pam_env.conf. • pam_shells.so: Authentication is granted if the user’s shell is listed in /etc/shells.
5.11.2 Protected databases The following databases are consulted by the identification and authentication subsystem during user session initiation: • /etc/passwd: For all system users, it stores the login name, user ID, primary group ID, real name, home directory, and shell. Each user’s entry occupies one line, and fields are separated by a colon (:). The file is owned by the root user and root group, and its mode is 644.
• /etc/ftpusers: The ftpusers text file contains a list of users who cannot log in using the File Transfer Protocol (FTP) server daemon. The file is owned by the root user and root group, and its mode is 644. • /etc/apparmor/* and /etc/apparmor.d/*: The directories /etc/apparmor and /etc/apparmor.d contain several configuration files that are used by the AppArmor LSM modules. Both directories are owned by the root user and root group, and their mode is 755.
6. Execs the login program.

The steps that are relevant to the identification and authentication subsystem are step 5, which prompts for the user's login name, and step 6, which executes the login program. The administrator can also use a command-line option to terminate the program if a user name is not entered within a specific amount of time.

5.11.3.2 gpasswd

The gpasswd program administers the /etc/group and /etc/gshadow files.
17. Sets effective, real, and saved user ID. 18. Changes directory to the user’s home directory. 19. Executes shell. 5.11.3.4 mingetty mingetty, the minimal Linux getty, is invoked from /sbin/init when the system transitions from single-user mode to multi-user mode. mingetty opens a pseudo tty port, prompts for a login name, and invokes /bin/login to authenticate. Refer to the mingetty man page for more detailed information. mingetty follows these steps: 1. Sets language. 2.
16. Sets up signals. 17. Forks a child. 18. Parent waits on the child's return; child continues: 19. Adds the new GID to the group list. 20. Sets the GID. 21. Logs an audit record. 22. Starts a shell if the -c flag was specified. 23. Looks for the SHELL environment variable or, if SHELL is not set, defaults to /bin/sh. 24. Gets the basename of the shell for argv[0]. 25. Closes the password and group files. 26. Changes to the home directory if doing a login. 27. Logs an audit record. 28. Execs a shell with a command.
4. Processes command-line arguments. 5. Sets up the environment variable array. 6. Invokes pam_start() to initialize the PAM library, and to identify the application with a particular service name. 7. Invokes pam_set_item() to record the tty and user name. 8. Validates the user that the application invoker is trying to become. 9. Invokes pam_authenticate() to authenticate the application user. Terminal echo is turned off while the user is typing his or her password.
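The sequence of PAM library calls used by programs such as su can be sketched as follows. This is a minimal example, not su's actual source; the service name "example" is an assumption naming a file in /etc/pam.d, and misc_conv is the stock text-mode conversation function from libpam_misc (link with -lpam -lpam_misc):

#include <security/pam_appl.h>
#include <security/pam_misc.h>
#include <stdio.h>

static struct pam_conv conv = { misc_conv, NULL };

int authenticate(const char *user)
{
    pam_handle_t *pamh = NULL;
    int rc = pam_start("example", user, &conv, &pamh);

    if (rc == PAM_SUCCESS)
        rc = pam_authenticate(pamh, 0);  /* runs the configured auth modules */
    if (rc == PAM_SUCCESS)
        rc = pam_acct_mgmt(pamh, 0);     /* account expiry and access checks */

    if (rc != PAM_SUCCESS)
        fprintf(stderr, "authentication failed: %s\n",
                pam_strerror(pamh, rc));

    pam_end(pamh, rc);
    return rc == PAM_SUCCESS ? 0 : -1;
}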
Cryptography can be used to neutralize some of these attacks and to ensure confidentiality and integrity of network traffic. Cryptography can also be used to implement authentication schemes using digital signatures. The TOE supports a technology based on cryptography called OpenSSL. OpenSSL is a cryptography toolkit implementing the Secure Sockets Layer (SSL) versions 2 and 3, and Transport Layer Security (TLS) version 1 network protocols and related cryptography standards required by them.
5.12.1.1 Concepts SSL is used to authenticate endpoints and to secure the contents of the application-level communication. An SSL-secured connection begins by establishing the identities of the peers, and establishing an encryption method and key in a secure way. Application-level communication can then begin. All incoming traffic is decrypted by the intermediate SSL layer and then forwarded on to the application; outgoing traffic is encrypted by the SSL layer before transmission.
Figure 5-87: Encryption Algorithm and Key Data confidentiality can be maintained by keeping the algorithm, the key, or both, secret from unauthorized people. In most cases, including OpenSSL, the algorithm used is well-known, but the key is protected from unauthorized people. 5.12.1.1.1.1 Encryption with symmetric keys A symmetric key, also known as a secret key, is a single key that is used for both encryption and decryption. For example, key = 2 used in the above illustration is a symmetric key.
Figure 5-88: Asymmetric keys If encryption is done with a public key, only the corresponding private key can be used for decryption. This allows a user to communicate confidentially with another user by encrypting messages with the intended receiver’s public key. Even if messages are intercepted by a third party, the third party cannot decrypt them. Only the intended receiver can decrypt messages with his or her private key.
5.12.1.1.2 Message digest

A message digest is text in the form of a single string of digits created with a one-way hash function. One-way hash functions are algorithms that transform a message of arbitrary length into a fixed-length tag called a message digest. A good hash function can detect even a small change in the original message and generate a different message digest. The hash function is one-way; it is not possible to deduce the original message from its message digest.
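As a brief illustration, a digest can be computed through the OpenSSL EVP interface (shown here with 0.9.8-era calls and SHA-1; the choice of algorithm and message is illustrative):

#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *msg = "attack at dawn";
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int md_len, i;
    EVP_MD_CTX ctx;

    EVP_MD_CTX_init(&ctx);
    EVP_DigestInit_ex(&ctx, EVP_sha1(), NULL);   /* 160-bit SHA-1 */
    EVP_DigestUpdate(&ctx, msg, strlen(msg));    /* feed the message */
    EVP_DigestFinal_ex(&ctx, md, &md_len);       /* fixed-length tag */
    EVP_MD_CTX_cleanup(&ctx);

    for (i = 0; i < md_len; i++)
        printf("%02x", md[i]);
    printf("\n");
    return 0;
}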
Figure 5-90: SSL Protocol The SSL architecture differentiates between an SSL session and an SSL connection. A connection is a transient transport device between peers. A session is an association between a client and a server. Sessions define a set of cryptographic security parameters, which can be shared among multiple connections. Sessions are used to avoid the expensive negotiation of security parameters for each new connection.
Figure 5-91: Handshake protocol (optional or content-dependent handshake messages are in italic type) 1. Client hello message: The CipherSuite list, passed from the client to the server in the client hello message, contains the combinations of cryptographic algorithms supported by the client in order of the client's preference (first choice first). Each CipherSuite defines both a key exchange algorithm and a CipherSpec.
For the list of Cipher suites supported, see FCS_COP.1(2) in the Security Target. 5. SSL Change cipher spec protocol: The SSL change cipher spec protocol signals transitions in the security parameters. The protocol consists of a single message, which is encrypted with the current security parameters. Using the change cipher spec message, security parameters can be changed by either the client or the server.
• Blowfish: Blowfish is a block cipher that operates on 64-bit blocks of data. It supports variable key sizes, but generally uses 128-bit keys. • Data Encryption Standard (DES): DES is a symmetric key cryptosystem derived from the Lucifer algorithm developed at IBM. DES describes the Data Encryption Algorithm (DEA). DEA operates on a 64-bit block size and uses a 56-bit key. • TDES (3DES): TDES, or Triple DES, encrypts a message three times using DES. This encryption can be accomplished in several ways.
MD2, MD4, and MD5 are cryptographic message-digest algorithms that take a message of arbitrary length and generate a 128-bit message digest. In MD5, the message is processed in 512-bit blocks in four distinct rounds. MDC2 is a method to construct hash functions with 128-bit output from block ciphers. These functions are an implementation of MDC2 with DES. RIPEMD is a cryptographic hash function with 160-bit output. The Secure Hash Algorithm (SHA) is a cryptographic hash function with 160-bit output.
mac = MAC (key, sequence_number || unencrypted_packet) where unencrypted_packet is the entire packet without MAC (the length fields, payload and padding), and sequence_number is an implicit packet sequence number represented as uint32. The sequence number is initialized to zero for the first packet, and is incremented after every packet, regardless of whether encryption or MAC is in use. It is never reset, even if keys or algorithms are renegotiated later. It wraps around to zero after every 2^32 packets.
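A sketch of this computation using OpenSSL's one-shot HMAC() call follows; HMAC-SHA1 stands in for whatever MAC algorithm the connection actually negotiated, and the fixed buffer size is illustrative:

#include <openssl/hmac.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

unsigned int ssh_mac(const unsigned char *key, int key_len,
                     uint32_t sequence_number,
                     const unsigned char *packet, size_t packet_len,
                     unsigned char *mac_out)
{
    unsigned char buf[4 + 35000];   /* sequence_number || unencrypted_packet */
    uint32_t seq = htonl(sequence_number);   /* uint32, network byte order */
    unsigned int mac_len = 0;

    if (packet_len > sizeof(buf) - 4)
        return 0;                   /* packet too large for this sketch */

    memcpy(buf, &seq, 4);
    memcpy(buf + 4, packet, packet_len);

    HMAC(EVP_sha1(), key, key_len, buf, 4 + packet_len, mac_out, &mac_len);
    return mac_len;                 /* 20 bytes for HMAC-SHA1 */
}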
5.12.3 Very Secure File Transfer Protocol daemon Very Secure File Transfer Protocol daemon (VSFTPD) provides a secure, fast, and stable file transfer service to and from a remote host. The behavior of VSFTPD can be controlled by its configuration file /etc/vsftpd/vsftpd.conf. The remainder of this section describes some of the security-relevant features of VSFTPD. For additional information, on SLES systems see the documents in /usr/share/doc/packages/vsftpd/SECURITY/*, and also http://vsftpd.beasts.org.
For background on CUPS labeled printing, please see: http://free.linux.hp.com/~mra/docs/. CUPS uses the Internet Printing Protocol (IPP), which was designed to replace the Line Printer Daemon (LPD) protocol, as the basis for managing print jobs. CUPS also supports the LPD, Server Message Block (SMB), and AppSocket protocols with reduced functionality. CUPS controls access to printers via its configuration file. For an overview of CUPS, refer to http://www.cups.org/documentation.php/overview.html or http://en.
24. Check for input or output requests with select(). 25. If select() fails, logs error messages, notifies clients, and exits the main loop for shutdown processing. 26. Gets the current time. 27. Checks print status of print jobs. 28. Updates CGI data. 29. Updates notifier messages. 30. Expires subscriptions and removes completed jobs. 31. Updates the browse list. 32. Checks for new incoming connections on listening sockets and accepts them. 33. Checks for new data on client sockets. 34.
cryptography standards that they require. The openssl command can be used by an administrative user for the following: • Creation of RSA, DH, and DSA parameters. • Generation of 1024-bit RSA keys. • Creation of X.509 certificates, CSRs, and CRLs. • Calculation of message digests. • Encryption and Decryption with ciphers. • SSL and TLS client and server tests. • Handling of S/MIME signed or encrypted mail. For detailed information about the openssl command and its usage, see: http://www.
# Service-level configuration
# --------------------------[ssmtp] accept = 465 connect = 25
The above configuration secures localhost SMTP when someone connects to it via port 465. The configuration tells stunnel to listen on the SSMTP port 465, and to send all traffic to the plain port 25 on localhost. For additional information about stunnel, refer to its man page as well as http://stunnel.mirt.net and http://www.stunnel.org. 5.12.4.
14. Invokes pam_chauthtok() to rejuvenate the user's authentication tokens. 15. Exits.

5.13.1.2 chfn

The chfn program allows users to change their finger information. The finger command displays the information stored in the /etc/passwd file. Refer to the chfn man page for detailed information. chfn generally follows these steps: 1. Sets language. 2. Gets the invoking user's ID. 3. Parses command-line arguments. 4. Performs a check that a non-root user is not trying to change the finger information of another user.
11. Invokes setpwnam() to update appropriate database files with the new shell. 12. Exits. 5.13.2 User management 5.13.2.1 useradd The useradd program allows an authorized user to create new user accounts on the system. Refer to the useradd man page for more information. useradd generally follows these steps: 1. Sets language. 2. Invokes getpwuid (getuid()) to obtain the application user’s passwd structure. 3.
6. Processes command-line arguments. 7. Ensures that the user account being modified exists. 8. Invokes open_files() to lock and open authentication database files. 9. Invokes usr_update() to update authentication database files with updated account information. 10. Generates audit record to log actions of the usermod command.
5.13.3 Group management 5.13.3.1 groupadd The groupadd program allows an administrator to create new groups on the system. Refer to the groupadd man page for more detailed information on usage of the command. groupadd generally follows these steps: 1. Sets language. 2. Invokes getpwuid (getuid()) to obtain an application user’s passwd structure. 3. Invokes pam_start() to initialize the PAM library, and to identify the application with a particular service name. 4.
5.13.3.2 groupmod The groupmod program allows an administrator to modify existing groups on the system. Refer to the groupmod man page for more information. groupmod generally follows these steps: 1. Sets language. 2. Invokes getpwuid (getuid()) to obtain application user’s passwd structure. 3. Invokes pam_start() to initialize the PAM library, and to identify the application with a particular service name. 4. Invokes pam_authenticate() to authenticate the application user.
5.13.4 System Time management

5.13.4.1 date

The date program, for a normal user, displays the current date and time. For an administrative user, date can also set the system date and time. Refer to the date man page for more information. date generally follows these steps: 1. Sets language. 2. Parses command-line arguments. 3. Validates command-line arguments. 4. If the command-line options indicate a system time set operation, invokes the stime() system call to set the system time.
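A minimal sketch of the underlying call is shown below; setting the time requires administrative privilege, so for a normal user the call fails with EPERM and date only displays the time:

#define _GNU_SOURCE
#include <time.h>
#include <stdio.h>

int main(void)
{
    time_t t = time(NULL);         /* current time; an admin would adjust it */

    if (stime(&t) != 0) {          /* same syscall date uses to set the time */
        perror("stime");           /* EPERM for an unprivileged caller */
        return 1;
    }
    return 0;
}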
This tool works on the premise that it is running on an abstract machine that provides functionality to the TSF. The test tool runs on all hardware architectures that are targets of evaluation, and reports problems with any underlying functionality. For more detailed information on the Abstract Machine Test, refer to Emily Ratliff, “Abstract Machine Testing: Requirements and Design.” The AMTU test tool performs the tests detailed in the subsections that follow. 5.13.5.1.
5.13.5.1.5.1 System p The instruction set for the PowerPC processor is given in the book at the following URL: http://www.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600682CC7/$file/booke _rm.pdf For each instruction, the description in the book lists whether it is available only in supervisor mode or not.
To test CPU control registers, use MOVL %cs, 28(%esp). This overwrites the value of the register that contains the code segment. The register that contains the address of the next instruction (eip) is not directly addressable. Note that in the Intel documentation of MOV it is explicitly stated that MOV cannot be used to set the CS register. Attempting to do so will cause an exception (SIGILL rather than SIGSEGV).
2. Gets its euid and uid. 3. Transforms old-style command line argument syntax into new-style syntax. 4. Processes the command line arguments. 5. Sets up signal handling. 6. Initializes the fifo. 7. Initializes any remote connection. 8. Sets back the real UID. 9. Opens the archive. 10. Initializes data structures. 11. Checks for multi-volume format; logs an error and exits if so because it is unsupported. 12. Initializes device macro handling. 13. Initializes extended headers buffer. 14.
5.13.6 I&A support 5.13.6.1 pam_tally The pam_tally utility allows administrative users to reset the failed login counter kept in the /var/log/faillog. Please see the /usr/share/doc/packages/pam/modules/README.pam_tally file on a SLES system for more information. 5.13.6.2 unix_chkpwd The unix_chkpwd helper program works with the pam_unix PAM module (Section 5.11.1.3). It is intended only to be executed by the pam_unix PAM module and logs an error if executed otherwise.
The crontab program is used to install, deinstall, or list the tables used to drive the cron daemon in Vixie Cron. The crontab program allows an administrator to perform specific tasks on a regularly-scheduled basis without logging in. Users can have their own crontabs that allow them to create jobs that will run at given times. A user can create a crontab file with the help of this command. The crontab command generally goes through these steps: 1.
commands that are to be executed. Information stored in this job file, along with its attributes, is used by the atd daemon to recreate the invocation of the user’s identity while performing tasks at the scheduled time. 5.14.2 Batch processing daemons 5.14.2.1 cron The cron daemon executes commands scheduled through crontab or listed in /etc/crontab for standard system cron jobs. The cron trusted process daemon processes users’ crontab files.
5.15 User-level audit subsystem The main user-level audit components consist of the auditd daemon, the auditctl control program, the libaudit library, the auditd.conf configuration file, and the auditd.rules initial setup file. There is also the /etc/init.d/auditd init script that is used to start and stop auditd.
2. Processes the command line arguments. 3. Attempts to raise its resource limits. 4. Sets its umask. 5. Resets its internal counters. 6. Emits a title. 7. Processes audit records from an audit log file or stdin, incrementing counters depending on audit record contents. 8. Prints a message and exits if there are no useful events. 9. Prints a summary report. 10. Destroys its data structures and frees memory. 11. Exits. 5.15.2.2 ausearch Only root has the ability to run this tool.
5.16 Supporting functions

Trusted programs and trusted processes in an SLES system use libraries. Libraries do not form a subsystem in the notion of the Common Criteria, but they provide supporting functions to trusted commands and processes. A library is an archive of link-edited objects and their export files. A shared library is an archive of objects that has been bound as a module with imports and exports, and is marked as a shared object.
Library | Description
/lib/libc.so.6 | C run-time library functions.
/lib/libcrypt.so.1 | Library that performs one-way encryption of user and group passwords.
/lib/libcrypt.so.0.9.8b | Replacement library for libcrypt.so.1; supports bigcrypt and blowfish password encryption.
/lib/security/pam_unix.so | Module that performs basic password-based authentication; configured with the MD5 hashing algorithm.
/lib/security/pam_passwdqc | Module that enforces additional, stricter password rules. For .
5.16.2 Library linking mechanism On SLES, a binary executable automatically causes the program loader /lib/ld-linux.so.2 to be loaded and run. This loader takes care of analyzing the library names in the executable file, locating the library in the system directory tree, and making requested code available to the executing process. The loader does not copy the library object code, but instead performs a memory mapping of the appropriate object code into the address space of the executing process.
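The same mechanism can also be requested explicitly at run time through the dlopen interface, as in the following sketch (link with -ldl; the choice of libm and the cos symbol is illustrative):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* map the shared object into the process address space */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    /* resolve one exported symbol from the mapped object */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0) = %f\n", cosine(0.0));

    dlclose(handle);               /* drop the mapping */
    return 0;
}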
system initialization, and sets the IDT entry corresponding to vector 128 (0x80) to invoke the system call exception handler. When compiling and linking a program that makes a system call, the libc library wrapper routine for that system call stores the appropriate system call number in the eax register, and executes the int 0x80 assembly language instruction to generate the hardware exception. The exception handler in the kernel for this vector is the system_call() system call handler.
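The mechanism can be illustrated with a short i386-only sketch that invokes getpid directly through int 0x80, just as the libc wrapper would; 20 is __NR_getpid on i386:

#include <stdio.h>

int main(void)
{
    long pid;
    long nr = 20;                     /* __NR_getpid on i386 */

    __asm__ volatile ("int $0x80"     /* vector 128: system_call() handler */
                      : "=a" (pid)    /* return value comes back in eax    */
                      : "a" (nr));    /* system call number goes in eax    */

    printf("getpid() via int 0x80: %ld\n", pid);
    return 0;
}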
passed as system-call parameters. For the sake of efficiency, and to satisfy the access control requirement, the SLES kernel performs validation in a two-step process, as follows: 1. Verifies that the linear address (virtual address for System p and System z) passed as a parameter does not fall within the range of addresses reserved for the kernel; that is, that the linear address is lower than PAGE_OFFSET. 2.
6 Mapping the TOE summary specification to the High-Level Design This chapter provides a mapping of the security functions of the TOE summary specification to the functions described in this High-Level Design document. 6.1 Identification and authentication Section 5.11 provides details of the SLES Identification and Authentication subsystem. 6.1.1 User identification and authentication data management (IA.1) Section 5.11.
6.2.3 Audit record format (AU.3)
Section 5.6.3.2 describes the information stored in each audit record.
6.2.4 Audit post-processing (AU.4)
Section 5.15.2 describes the audit subsystem utilities provided for post-processing of audit data.
6.3 Discretionary Access Control
Sections 5.1 and 5.2 provide details on Discretionary Access Control (DAC) on the SLES system.
6.3.1 General DAC policy (DA.1)
Sections 5.1 and 5.2.2 provide details on the functions that implement the general Discretionary Access Control policy.
6.5.1 Roles (SM.1) Section 5.13 provides details on various commands that support the notion of an administrator and a normal user. 6.5.2 Access control configuration and management (SM.2) Sections 5.1.1 and 5.1.2.1 provide details on the system calls of the file system that are used to set attributes on objects to configure access control. 6.5.3 Management of user, group and authentication data (SM.3) Sections 5.11.2 and 5.
6.7.4 Trusted processes (TP.4) Section 4.2.2 provides details on the non-kernel trusted process on the SLES system. 6.7.5 TSF Databases (TP.5) Section 4.3 provides details on the TSF databases on the SUSE Linux Enterprise Server system. 6.7.6 Internal TOE protection mechanisms (TP.6) Section 4.1.1 describes hardware privilege implementation for the System x, System p, System z and Opteron eServer 326. Section 5.5 describes memory management and protection. Section 5.
• Kernel Modules
• Device Drivers
Trusted process subsystems:
• System Initialization
• Identification and Authentication
• Network Applications
• System Management
• Batch Processing
• User-level audit subsystem

6.8.1 Summary of kernel subsystem interfaces

This section identifies the kernel subsystem interfaces and structures them per kernel subsystem into:
External Interfaces: System calls associated with the various kernel subsystems form the external interfaces.
6.8.1.1.2 Internal Interfaces

Internal function | Interfaces defined in
permission | This document, Section 5.1.1.1
vfs_permission | This document, Sections 5.1.1.1 and 5.1.5.1
get_empty_filp | This document, Section 5.1.1.1
fget | This document, Section 5.1.1.1
do_mount | This document, Section 5.1.2.1

6.8.1.1.3 Specific ext3 methods

Specific ext3 method | Interfaces defined in
ext3_create | This document, Section 5.1.2.1
ext3_lookup | This document, Section 5.1.2.1
ext3_get_block | This document, Section 5.1.2.
read_inode, read_inode2, dirty_inode, write_inode, put_inode, delete_inode, write_super, write_super_lockfs, unlockfs, statfs, remount_fs, clear_inode

Dentry operations (note that they are not used by other subsystems, so there is no subsystem interface):
• d_revalidate
• d_hash
• d_compare
• d_delete
• d_release
• d_iput

6.8.1.1.4 Data Structures

super_block | include/linux/fs.h
ext3_super_block | include/linux/ext3_fs.h
isofs_sb_info | include/linux/iso_fs_sb.h
inode | include/linux/fs.
System calls are listed in the Functional Specification mapping table.

6.8.1.2.2 Internal Interfaces

Internal function | Interfaces defined in
current | Understanding the LINUX KERNEL, Chapter 3, 2nd Edition, Daniel P.
6.8.1.3.1 External interfaces (system calls)
• TSFI system calls
• Non-TSFI system calls
System calls are listed in the Functional Specification mapping table.

6.8.1.3.2 Internal Interfaces

Internal function | Interfaces defined in
do_pipe | Understanding the LINUX KERNEL, Chapter 19, 2nd Edition, Daniel P. Bovet, Marco Cesati, ISBN 0-596-00213-0, and this document, Section 5.3.1.1
pipe_read | Understanding the LINUX KERNEL, Chapter 19, 2nd Edition, Daniel P.
6.8.1.4 Kernel subsystem networking This section lists external interfaces, internal interfaces and data structures of the networking subsystem. 6.8.1.4.1 External interfaces (system calls) • TSFI system calls • Non-TSFI system calls System calls are listed in the Functional Specification mapping table. 6.8.1.4.2 Internal interfaces Sockets are implemented within the inode structure as specific types of inodes. inode.u, in the case of an inode for a socket, points to a structure of type socket.
System calls are listed in the Functional Specification mapping table.

6.8.1.5.2 Internal interfaces

Internal interface | Interfaces defined in
get_zeroed_page | Linux Device Drivers, Chapter 7, 2nd Edition, June 2001, Alessandro Rubini (O'Reilly), and this document, Section 5.5.2.
• audit_sockaddr
• audit_ipc_perms

6.8.1.6.3 Data structures

• audit_sock: The netlink socket through which all user-space communication is done.
• audit_buffer: The audit buffer is used when formatting an audit record to send to user space. The audit subsystem pre-allocates audit buffers to enhance performance.
• audit_context: The audit subsystem extends the task structure to potentially include an audit_context.
driver methods for character device drivers and block device drivers, see [RUBN]: Chapter 3 describes the methods for character devices, and Chapter 6 describes the methods for block devices. 6.8.1.7.2.
6.8.1.7.3 Data structures

device_struct | fs/devices.c
file_operations | include/linux/fs.h
block_device_operations | include/linux/fs.h

6.8.1.8 Kernel subsystem kernel modules

This section lists external interfaces, internal interfaces, and data structures of the kernel modules subsystem.

6.8.1.8.1 External interfaces (system calls)
• TSFI system calls
• Non-TSFI system calls
System calls are listed in the Functional Specification mapping table. 6.8.1.8.
7 References [CC] Common Criteria for Information Technology Security Evaluation, CCIMB-99-031, Version 2.1, August 1999 [CEM] Common Methodology for Information Technology Security Evaluation, CEM-99/045, Part 2 – Evaluation Methodology, Version 1.0, 1999 [BOVT] Understanding the LINUX KERNEL, 2nd Edition, Daniel P.
[RSA] "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, v. 21, n. 2, Feb 1978, pp. 120-126, R. Rivest, A. Shamir, and L. M. Adleman, [DH1] "New Directions in Cryptography," IEEE Transactions on Information Theory, V.IT-22, n. 6, Jun 1977, pp. 74-84, W. Diffie and M. E. Hellman. [DSS] NIST FIPS PUB 186, "Digital Signature Standard," National Institute of Standards and Technology, U.S.Department of Commerce, 18 May 1994.
The following are trademarks or registered trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml: BladeCenter, eServer, POWER, Power Architecture, PowerPC, PR/SM, S/390, System p, System x, System z, z/VM, z/Architecture. SUSE is a registered trademark of Novell, Inc. Linux is a trademark of Linus Torvalds in the United States, other countries, or both.