Server Virtualization

Server virtualization is crucial to ensuring the success of server consolidation projects that maintain the isolation of separate systems. This topic addresses the principles of server virtualization and applicable business rules (BR). It identifies the design and implementation requirements for consistency with the CMS Technical Reference Architecture (CMS TRA) and describes implementation options.

Although virtualization may be generally applicable within the CMS Processing Environments, there are a few situations for which virtualization may not be appropriate:

Need for specialized hardware
Need for extreme performance
Need for higher security

These exceptions are becoming less common over time.

CMS Cloud Server Virtualization

PREFERRED

The CMS-preferred solution for virtual servers is to use AWS Elastic Compute Cloud (EC2) instances or Microsoft Azure VMs, running Windows or Linux. For single-purpose services, it is often more cost-effective and flexible to use AWS or MAG integrated services.

One CMS strategic solution is the CMS Cloud Gold Image. Gold Image AMIs are standard, baseline operating system (OS) images for the most common Linux and Windows versions. Updated monthly, these images contain:

Current approved patches to resolve vulnerabilities
Latest DISA STIG or CIS configurations to ensure configuration compliance
Pre-installed Shared Service applications and agents that are utilized in the CMS AWS environment

Terminology

Despite the large number of products available for virtualization, the underlying architectural and operational requirements for deploying server virtualization have much in common. This commonality stems from the goals and architectural elements that server virtualization implementations share. The terminology in the table below applies to all server virtualization implementations.

Table - Server Virtualization-Specific Terms
Term	Definition
Production	Refers to a CMS Authorization to Operate (ATO)(ed) environment, or to components in a CMS ATO(ed) environment.
Virtualization Implementation	A method for providing virtualized computing facilities to an application. For the purposes of this chapter, it means UNIX, Windows, or mainframe virtualization.
Virtualization Host Server	A single physical hardware instance that supports multiple virtual machines running independently under a hypervisor.
Container	A process running under the control of a container engine. Containers are not virtualized, but rather operate under strict control of the operating system (OS) (stricter than normal processes). Unlike virtual machines, containers run on a common kernel and operating system.
Container Engine	A control process that manages operating system resources to create, destroy, and supervise containers. The container engine performs analogously to a hypervisor, but without the use of Central Processing Unit (CPU) virtualization technology.
Resource	Architectural elements required by a virtualization host server. Resources can be divided into four basic types: Memory, Interfaces, Processors, and Storage. Memory – Volatile data storage that can be read, written, and erased by applications. Memory usage quotas can be enacted to limit memory usage on a per-application basis. Interfaces – Network Interface: An interface to an Internet Protocol (IP)-based data network, as opposed to an interface to a Storage Area Network (SAN) or directly connected peripheral. Network interfaces can be divided into virtual and physical interfaces. Virtual network interfaces share not only network interface cards (NIC) but also the IP stack, which communicates with an IP data network external to the virtualization server. This single stack supports network address translation (NAT) of virtual applications and the shared NIC and IP stack. The virtual interface may be associated with a NIC or a Virtual Local Area Network (VLAN) on a NIC. Physical network interfaces share only the NICs. Each application has its own IP stack and data link layer instance. The data link layer instance may be associated with a NIC or a VLAN on a NIC. Processors – This resource type includes shared CPUs and special-use processors. Storage – Permanent data storage that can be read, written, or erased via an OS, either directly through an OS interface or by other applications using an application programming interface (API) to the OS. Storage is divided into file systems and further subdivided into directory structures and their individual files. Access to these file systems, directories, or files can be controlled based on user or group. In addition, these file systems can have per-user or per-group usage quotas enacted to limit file system usage. Storage can be backed up and restored via networked storage or backup servers, e.g., tape backup devices.
In some instances, OS resources can also be a constraint, such as user IDs, devices, IPC semaphores, message queues, and other limited resources. These typically do not limit virtualization but may impact containerization.
Instance	An individual physical or virtual instantiation of a resource, e.g., a single CPU on a virtualization server or a single NIC.
Pool	A set of resource instances of similar type on a single virtualization server, e.g., a set of CPUs that can be used by an application.
OS Variant	A specific make and version of an operating system, e.g., Solaris 10.
Oversubscribed Pooled Resource	A pooled resource is oversubscribed when the sum of all minimum subscriptions is greater than 100 percent of the total available resources in the pool. For example, if five (5) virtual machines (VM) share a processor pool and their minimum processor utilization is set to 25 percent, then the processor pool is 25 percent oversubscribed.
Hypervisor (a.k.a. Virtualization Controller)	An OS variant that can be used to configure virtual instances of other OS variants and their global resource quotas, as well as to enforce these quotas. Global resource quotas pertain to all instances of all resources on the virtualization server. As a result, only one instance of the virtualization controller may be running on a single virtualization server.
Virtual Machine (a.k.a. Virtual Machine instance, Supervised OS, Guest OS)	An instance of an OS variant running under the control of a hypervisor.
Management Access	Access to a VM or hypervisor using an account with management privileges.
User Access	Access to a VM using an account with user privileges.
Management Application	An application that can only be run with management access. This includes management programs and OS.
User Application	An application that can be run with either user or management access. These applications include business and office automation applications.
Project	Work done to develop a single application. This may be done by multiple contractors; however, the project’s scope should be covered by a single statement of work.

Despite this commonality, there are significant differences between Solaris, Linux, Windows, and IBM mainframe virtualization implementations. Specifically, there are substantial differences in the hardware, software, and security architectures employed by these implementations. These architectural differences require identification of the hypervisor and network security boundaries with implementation-specific architectural elements. The table Virtualization Implementation-Specific Security Boundaries defines the terms for implementation-specific security boundaries.

Table - Virtualization Implementation-Specific Security Boundaries
Example Virtualization Implementation	Hypervisor Security Boundary	Network Security Boundary
AWS Elastic Compute Cloud (EC2), Microsoft Azure VM	Virtual Instance	User Data – Each virtual NIC is isolated by security group(s). Management Segment – Virtual NIC for a Management Zone is isolated by cloud-specific rules. Security Segment – Virtual NIC on Security Zone is isolated by cloud-specific rules.
VMWare vSphere, HP Virtualization Infrastructure, IBM z/OS, IBM z/VM	Blade Server or Mainframe	User Data – Virtual NIC on User VLAN mapped to physical cBlade NIC for inter-zone communications through firewalls. Management Segment – Virtual NIC on internal Management VLAN mapped to physical cBlade NIC in external Management VLAN. Security Segment – Virtual NIC on Security VLAN mapped to physical cBlade NIC in external Security VLAN.
Citrix XEN, VMware VM Workstation, Linux KVM, Oracle VirtualBox, Microsoft Hyper-V	Physical server	User Data – Virtual NIC on User VLAN mapped to physical NIC for inter-zone communications through firewalls. Management Segment – Virtual NIC on internal Management VLAN mapped to physical NIC in external Management VLAN. Security Segment – Virtual NIC on Security VLAN mapped to physical NIC in external Security VLAN.

Hardening

All CMS infrastructure must be hardened to CMS standards. The CMS policy on hardening is specified in CMS ARS Security Control CM-6.

Business Rules

Although server virtualization (SV) technologies are evolving at a rapid pace, CMS has established the following business rules (BR) and recommended practices (RP) so that implementations of these technologies are consistent and meet CMS’s needs.

BR-SV-1: Apply Separation of Duties to Virtualization Administration

CMS will maintain separation of administrative duties between the hypervisor administration and VM administration.

Related CMS ARS Security Controls include: AC-5 - Separation of Duties.

Rationale:

The responsibilities of the hypervisor administrator are different from the responsibilities of VM administration. Because the hypervisor administrator can create and destroy VM instances (and other virtual resources), their role is different and separate from the day-to-day administration of virtual machine instances.

In a Cloud Service Provider (CSP) environment, the CSP typically controls the hypervisor. In this case, the CSP application programming interface (API) allows authorized users to create and destroy virtual resources, effectively acquiring hypervisor administrator privileges.

BR-SV-2: Provide Hypervisor Root Access Only to Specific Administrative Accounts

CMS will not grant unrestricted root (or system administrator) access to production hypervisor servers to the developers or end-users. Instead, root access should be restricted to a specified list of administrative accounts.

Related CMS ARS Security Controls include: AC-6 - Least Privilege and AC-5 - Separation of Duties.

Rationale:

Giving out unrestricted full root access to hypervisors allows users to alter system configurations and system audit logging controls, which CMS ARS Security Control CM-6 specifically prohibits. Restricting access limits the risk that stems from the highly sensitive nature of these accounts. (Please refer to National Institute of Standards and Technology [NIST] Special Publication [SP] 800-125.)

BR-SV-3: Different Administration Account on Blade Controllers and Hypervisors

Administration accounts on the blade controllers and hypervisors must be different from those used to administer VM instances.

Rationale:

Using different accounts to administer blade controllers and hypervisors helps to enforce separation of duties.

Related CMS ARS Security Controls include: AC-6 - Least Privilege and AC-5 - Separation of Duties.

BR-SV-4: Configure UserIDs and GroupIDs to Be Unique across the Processing Environment

User groups and usernames must be unique across all file systems to prevent unintended access across VMs, hypervisors, and (when possible) hosts.

Rationale:

Network-based file servers use the UNIX userid and groupid when accessing files. It is possible for two different users to have the same userid number on two different machines. Files created by the first user on a network share would then be accessible by the second user. Ensuring that each user has a distinct userid avoids accidental disclosure through this mechanism. In addition, the use of consistent UNIX ids allows for more consistent audit log correlation.

When a userid is no longer needed (for example, because the user has left), it should be retired and not reused.
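
The collision described above can be detected with a simple cross-host comparison. The following is a minimal sketch, assuming the per-host account lists have already been exported into a dictionary; the host names, user names, and the function `find_id_conflicts` are hypothetical and only illustrate the check.

```python
from collections import defaultdict


def find_id_conflicts(hosts):
    """Return (uid_conflicts, name_conflicts) across all hosts.

    hosts: dict mapping host name -> dict of {username: uid}.
    uid_conflicts: uid -> sorted usernames, where one uid is used by several names.
    name_conflicts: username -> sorted uids, where one name has several uids.
    """
    names_by_uid = defaultdict(set)
    uids_by_name = defaultdict(set)
    for accounts in hosts.values():
        for username, uid in accounts.items():
            names_by_uid[uid].add(username)
            uids_by_name[username].add(uid)
    uid_conflicts = {u: sorted(n) for u, n in names_by_uid.items() if len(n) > 1}
    name_conflicts = {n: sorted(u) for n, u in uids_by_name.items() if len(u) > 1}
    return uid_conflicts, name_conflicts


# Hypothetical inventory: "alice" and "carol" share userid 1001 on different VMs.
inventory = {
    "vm-app-01": {"alice": 1001, "bob": 1002},
    "vm-app-02": {"carol": 1001, "bob": 1002},
}
uid_conflicts, name_conflicts = find_id_conflicts(inventory)
print(uid_conflicts)   # {1001: ['alice', 'carol']}
print(name_conflicts)  # {}
```

A non-empty `uid_conflicts` result corresponds to the disclosure risk in the rationale; a non-empty `name_conflicts` result would complicate audit log correlation.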

RP-SV-5: Maintenance Window Planning

Applications that have conflicting maintenance windows and uptime requirements should not be hosted on the same VM or hypervisor.

Rationale:

During planned maintenance, it may be necessary to reboot virtual machines. If applications have different maintenance windows, it may be very difficult to schedule maintenance. Similarly, if the hypervisor or VM software needs upgrades or patches (for example), it is important to determine the impact to applications before undertaking this maintenance. Testing and Training systems can have similar constraints, which makes sharing difficult.

RP-SV-6: Consider High-Availability Configuration

High-availability and disaster recovery services should be considered for all virtualization servers and VMs to mitigate failure and meet business owner-defined availability limits.

Rationale:

High-availability configurations, such as the use of clustering technology or redundant servers, should be used to meet business owner defined availability limits. If no such requirements exist or the limits are sufficiently low, a non-redundant configuration may be used.

BR-SV-7: No Co-Hosting on Production and Non-Production Hypervisors

In a physical data center, production and non-production environment virtualization servers, and their associated VMs, must reside on separate physical server hardware or logical partitions (LPAR).

Rationale:

CMS prohibits commingling production and non-production workloads because doing so jeopardizes production workloads. Instead, host non-production workloads on different hypervisors.
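
A placement inventory can be checked against this rule mechanically. The following is a minimal sketch, assuming placements are available as (host, VM, environment) tuples and that the environment label "production" marks production workloads; the names and the function `find_commingled_hosts` are hypothetical.

```python
def find_commingled_hosts(placements):
    """Return the sorted hosts that carry both production and non-production VMs.

    placements: iterable of (host, vm, environment) tuples. Any environment
    other than "production" is treated as non-production (an assumption).
    """
    kinds_by_host = {}
    for host, _vm, environment in placements:
        is_production = environment == "production"
        kinds_by_host.setdefault(host, set()).add(is_production)
    # A host is commingled when it has seen both True and False.
    return sorted(h for h, kinds in kinds_by_host.items() if len(kinds) == 2)


# Hypothetical placement inventory.
placements = [
    ("host-a", "vm1", "production"),
    ("host-a", "vm2", "test"),
    ("host-b", "vm3", "production"),
    ("host-c", "vm4", "dev"),
    ("host-c", "vm5", "test"),
]
print(find_commingled_hosts(placements))  # ['host-a']
```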

BR-SV-8: Do Not Oversubscribe ATO(ed) Environments and Management Zones

All VMs in ATO(ed) environments or in Management Zones must have adequate resource constraints (upper and lower bounds) to ensure that they do not adversely impact the performance of other VMs on the same virtualization server. Over-subscription of pooled or shared resources is not permitted.

Please refer to BR-SV-20 and BR-SV-21 for additional related guidance.

Rationale:

Oversubscription can jeopardize meeting Service Level Agreements (SLA). In Production applications, this can result in sub-par performance and potentially impact application security. This is critical because any system processing CMS data is a Production system.

Please refer to CMS TRA Foundation, CMS Processing Environments for a formal definition.
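
The oversubscription definition from the terminology table (the sum of all minimum subscriptions exceeding 100 percent of the pool) reduces to simple arithmetic. The following is a minimal sketch of that calculation; the function names and the boolean flag are hypothetical and are not part of any CMS tooling.

```python
def oversubscription_percent(minimum_shares):
    """Return how far the summed minimum subscriptions exceed 100 percent.

    minimum_shares: iterable of per-VM minimum subscriptions, each expressed
    as a percentage of the pool. Returns 0 when the pool is not oversubscribed.
    """
    total = sum(minimum_shares)
    return max(0, total - 100)


def violates_br_sv_8(minimum_shares, is_ato_or_management):
    """Flag oversubscribed pools in ATO(ed) environments or Management Zones."""
    return is_ato_or_management and oversubscription_percent(minimum_shares) > 0


# Five VMs with a 25 percent minimum each: 125 percent subscribed.
print(oversubscription_percent([25] * 5))   # 25
print(violates_br_sv_8([25] * 5, True))     # True
print(violates_br_sv_8([25] * 4, True))     # False
```

The first call reproduces the example in the terminology table: five VMs at a 25 percent minimum leave the pool 25 percent oversubscribed.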

BR-SV-9: Use Storage Quotas for Virtual Machines

Storage quotas must be enacted on a per-user and per-group basis.

Rationale:

Using quotas helps prevent filling file systems and object storage, which can lead to service failure and raise vulnerability to a denial-of-service attack.

BR-SV-10: ATO(ed) Environment VMs and Resource Pools Must Not Be Shared between Zones

Rationale:

A virtual machine can exist in only one zone; otherwise, compromise of one virtual machine can lead to compromise of multiple zones, nullifying the advantage of defense-in-depth architecture.

RP-SV-11: Collect Virtualization Performance Metrics

CMS must gather virtualization server and VM performance metrics on a consistent basis to ensure maintenance of proper resource allocation.

Rationale:

Along with traditional server performance monitoring, virtualization technology offers additional performance metrics that may be useful to monitor.

BR-SV-12: Perform Asset Management of Virtual Instances

All virtual instance resources must be tracked in an asset management database, provided either by CMS or the hosting provider, and the database must provide a cross-reference to the host server hardware and host operating system.

Rationale:

Virtual instances must be identified in a durable form (such as Object Identifier [OID]) to track the use of assets and perform crosswalks between security and audit logs and virtual instances.

Related CMS ARS Security Controls include: CM-8 - Information System Component Inventory.

BR-SV-13: Keep Forensic Evidence per CMS Security Rules

Before deleting or disposing of virtual machine instances, determine if image copies should be retained for forensic purposes.

Rationale:

A copy of the instance can be used for forensic analysis by CMS. CMS may require adherence to specific rules for evidence gathering. Consult with CMS security to determine proper chain of custody rules.

RP-SV-14: Use VM Configuration Templates

Rather than generating VM instances as one-off configurations, consider using virtualization templating technology to define and create instances from repeatable configurations.

Rationale:

Repeatability is desirable because it facilitates change and configuration management and allows for replication of configurations in other processing environments. Unlike Graphical User Interface (GUI) forms, templates can be stored in source control systems, allowing for better auditing and change control.
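
The idea of a repeatable, version-controlled configuration can be illustrated without any particular virtualization product. The following is a minimal sketch in which a template is a plain dictionary kept in source control; the field names, the values, and the function `render_vm_config` are hypothetical, and a real deployment would use the templating facility of the chosen platform.

```python
import copy
import json

# Hypothetical base template; in practice this would live in source control.
BASE_TEMPLATE = {
    "image": "gold-image-linux",
    "cpu_count": 2,
    "memory_gib": 8,
    "zone": "application",
}


def render_vm_config(name, **overrides):
    """Create a VM configuration from the base template plus explicit overrides."""
    unknown = set(overrides) - set(BASE_TEMPLATE)
    if unknown:
        # Reject fields the template does not define so that drift is visible.
        raise KeyError(f"unknown template fields: {sorted(unknown)}")
    config = copy.deepcopy(BASE_TEMPLATE)
    config.update(overrides)
    config["name"] = name
    return config


config = render_vm_config("vm-app-01", memory_gib=16)
# Sorted keys give a stable text form that diffs cleanly under change control.
print(json.dumps(config, sort_keys=True, indent=2))
```

Because the same inputs always produce the same configuration, the rendered output can be reviewed, audited, and replicated in another processing environment.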

BR-SV-15: Production Management Zone VMs May Not Use IP Multipathing

IP Multipathing (IPMP) on UNIX/Linux/Solaris and Multipath Input/Output (MPIO) on Windows host OS are not permitted in the CMS Production environment.

Rationale:

Multipathing is discouraged because Intrusion Detection Systems (IDS)/Intrusion Prevention Systems (IPS) are sometimes unable to reassemble a full multipath Transmission Control Protocol (TCP) session and, therefore, cannot properly inspect traffic. As a result, multipathing can be used to obfuscate attacks (see The Dangers and Promise of Multipath TCP).

BR-SV-16: Originate Administrator Access to Blade Controllers and Hypervisors from the Management Zone

All administration access to the Blade controllers and hypervisors must originate from the Management Zone only and must use a CMS-approved secure access method.

Rationale:

Ensuring that network access to blade controllers and hypervisors originates from the Management Zone helps thwart administrator access from within the Application Zone.

Related CMS ARS Security Controls include: AC-6 - Least Privilege and AC-5 - Separation of Duties.

BR-SV-17: All Management Traffic Must Originate or Terminate in the Management Zone and Use Only Isolated and Protected Interfaces

All management network traffic must originate or terminate in the Management Zone.

Management traffic must only use management interfaces and VLANs.

Management traffic must be protected from inspection and tampering by other users.

Production environment virtualization server NICs must be Media Access Control (MAC) address locked to prevent spoofing.

Rationale:

This rule ensures that management access is exclusively performed from the Management Zone. It also reduces the likelihood that unauthorized access could be performed on the business application network. Management protocols may be used between a Zone and the Management Zone only. For example, one cannot run a local syslog server in the Application Zone because syslog is, presumably, a management protocol.
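
The zone condition in this rule can be expressed as a one-line predicate. The following is a minimal sketch, assuming each management flow is described by the names of its source and destination zones; the zone labels and the function `management_flow_allowed` are hypothetical.

```python
MANAGEMENT_ZONE = "management"  # hypothetical zone label


def management_flow_allowed(source_zone, destination_zone):
    """A management flow is allowed only if one endpoint is in the Management Zone."""
    return MANAGEMENT_ZONE in (source_zone, destination_zone)


# A local syslog server inside the Application Zone is not allowed.
print(management_flow_allowed("application", "application"))  # False
# Syslog sent from the Application Zone to the Management Zone is allowed.
print(management_flow_allowed("application", "management"))   # True
```

This checks only the zone condition; the interface, VLAN, and MAC-locking requirements of the rule need separate controls.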

BR-SV-18: Hypervisor Access Is Permitted Only Via the Management Interface

Access to the hypervisor must be allowed only via the management interface on the hypervisor host.

Rationale:

Management of the hypervisor is the sole responsibility of the operations team, which uses the Management Zone to initiate access to hypervisors. Limiting access to the management interface, reached from the Management Zone, prevents malicious actors from accessing the hypervisor from within the Application Zone.

BR-SV-19: Separate Security Segment from All Other Management Zone Segments

Management interfaces serving the Security segment must be isolated from those serving all other Management segments, e.g., Backup segments.

When Application and Presentation Zone VMs use virtual network interfaces:

They must route to each other through the external switch, i.e., no loopback routing; and

They must have compatible ports, protocols, and services policies that can be enforced by an external switch and firewall.

Related CMS ARS Security Controls include: AC-06 - Least Privilege, SC-03 - Security Function Isolation (High), SC-03(02) - Supplemental: Access/Flow Control Functions, and SC-07 - Boundary Protection.

Rationale:

Network traffic between Zones must be available for inspection by security services not running on the hypervisor, such as Network-based Intrusion Detection System (NIDS), and must be filtered by a firewall or equivalent.

BR-SV-20: Oversubscription of Non-Production Instances Is Permitted

Oversubscription of non-production instances is permitted, with business owner approval.

Please refer to BR-SV-8 for additional related guidance.

Rationale:

Oversubscription of non-production instances is helpful in reducing the operational cost of lower environments. It is understood that concurrent use could result in sub-par performance.

BR-SV-21: No Business Applications May Run on the Hypervisor’s Host OS

The hypervisor must only be used for the operation and management of virtual machines.

Rationale:

The hypervisor is a high-value, high-impact information technology (IT) asset. Operating business applications on the hypervisor exposes the hypervisor to a larger attack surface and additional sources of instability. By focusing the hypervisor on a single task (virtualization management), it becomes more robust and secure.

BR-SV-22: Operate Applications under Application-Specific System Accounts

Each application running on an operating system instance must operate under an application-unique system account userid. In particular, applications may not operate under an end user ID or RACF ID. Likewise, end users must not log in using an application system account userid.

System accounts must have only the minimum necessary permissions and should not have permission to log in.

Related CMS ARS Security Controls include: AC-02(09) - Supplemental: Restrictions on Use of Shared Groups/Accounts, AU-10 - Non-Repudiation (High), AC-6 - Least Privilege (High and Moderate), and CM-7 - Least Functionality (High, Moderate, Low).

Rationale:

By employing application-specific system account userids, it is possible to use operating system controls to impose access control and have better specificity in audit logs. If users can log in using system IDs, this limits non-repudiation for actions taken while operating as that userid.
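
Both parts of this rule (one account per application, no interactive login) can be audited from an account inventory. The following is a minimal sketch for UNIX-like systems, assuming that a non-login shell is an adequate proxy for "no permission to log in"; the shell list, the record layout, and the function `audit_app_accounts` are hypothetical.

```python
from collections import Counter

# Assumption: these shells indicate that interactive login is disabled.
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


def audit_app_accounts(apps):
    """Return a list of (application, finding) tuples.

    apps: list of dicts with the keys "app", "account", and "shell".
    """
    usage = Counter(entry["account"] for entry in apps)
    findings = []
    for entry in apps:
        if usage[entry["account"]] > 1:
            findings.append((entry["app"], "shared account"))
        if entry["shell"] not in NOLOGIN_SHELLS:
            findings.append((entry["app"], "login permitted"))
    return findings


# Hypothetical inventory of application system accounts.
apps = [
    {"app": "billing", "account": "svc_billing", "shell": "/sbin/nologin"},
    {"app": "claims", "account": "svc_shared", "shell": "/sbin/nologin"},
    {"app": "reports", "account": "svc_shared", "shell": "/bin/bash"},
]
for finding in audit_app_accounts(apps):
    print(finding)
```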