Artificial Intelligence Guidance

Introduction

Artificial Intelligence (AI) tools and technology are rapidly evolving with the promise of enabling increased efficiency, insight, and analysis across a broad range of use cases. CMS is embracing AI to improve health care administration and delivery. At the same time, policies, processes, and mechanisms are being established to ensure that AI-enabled systems at CMS are developed, deployed, and used responsibly and ethically, mitigating risks and protecting sensitive information while also maximizing benefits for CMS and beneficiaries.

Aligned with federal guidelines (see ai.cms.gov), this section provides current AI-related policies and recommended practices at CMS. Given the dynamic nature of the AI domain, frequent updates and changes are expected. It is not intended as a primer on AI technologies or approaches. Additional information on those topics can be found within the CMS AI Playbook and Technical Application Resources.

More policy guidance can be found in the HHS Intranet: “Reminder of Existing HHS IT User Policies Relevant for Third-Party Generative AI Tools,” “HHS Policy for Securing Artificial Intelligence (AI) Technology,” and “HHS Policy for Rules of Behavior for Use of Information and IT Resources.”

Additional Considerations

Accountability and Oversight

All individuals and teams employing AI or Machine Learning (ML) at CMS are responsible for the system's output, regardless of the model or tools used. The “owner(s)” of this output are also required to use the appropriate mandated security controls for “sensitive” and especially “privacy data” assets. Continuous human oversight is required to ensure CMS and federal guidelines are met.

Risk Management and Security

AI Model Supply Chain and Provenance

In the modern information security landscape, AI-driven supply chain attacks are a concern. These can introduce malicious code through compromised AI models, data, or code repositories. Teams deploying AI models should ensure the provenance and integrity of AI components used in production.

Example: The “Model Namespace Reuse” attack, demonstrated against Google and Microsoft products, highlights the risk of trusting a model based on its name alone. This form of attack can enable a threat actor to gain code execution permissions and gain access to underlying infrastructure.

Recommendation: Implement Software Composition Analysis (SCA) to verify the origin and integrity of AI dependencies.
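For illustration, the integrity half of that recommendation can be sketched with a digest check against a pinned allow-list. This is a minimal Python sketch, not a CMS-mandated tool: the `APPROVED_DIGESTS` manifest and the function names are hypothetical, and in practice the pinned digests would come from a signed manifest or an SCA tool rather than a hard-coded dictionary.

```python
import hashlib
from pathlib import Path

# Hypothetical allow-list of approved model artifacts and their pinned digests.
# In practice this would come from a signed manifest or an SCA tool's output.
APPROVED_DIGESTS = {
    "example-model.bin": "sha256:<pinned-digest-here>",
}

def sha256_digest(path: Path) -> str:
    """Compute a sha256 digest string for a local model artifact."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def verify_artifact(path: Path) -> bool:
    """Return True only if the artifact's digest matches the pinned value."""
    expected = APPROVED_DIGESTS.get(path.name)
    return expected is not None and expected == sha256_digest(path)
```

A check like this guards against a swapped or tampered artifact, including the name-reuse scenario above, because trust attaches to the content digest rather than the model's name.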

Embracing a Zero-Trust Architecture for AI

Using AI technology that can perform outbound internet requests can introduce risk to CMS environments, including sensitive code and data that could be transmitted outside the network.

Recommendation: Teams should perform threat-modeling and risk assessments to ensure proper network segmentation and limit an AI tool’s access to only the necessary resources to contain the blast radius of any potential breach. Implement Data Minimization principles to ensure AI tools access only the information strictly necessary for their function. See The TRA section Zero Trust Architecture.
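As one small illustration of the Data Minimization principle, a system can filter each record down to only the fields an AI tool strictly needs before anything leaves the segment. This is a hedged sketch: the field names and the `SENSITIVE_FIELDS` list are hypothetical, and a real system should drive both from its data classification inventory, not a hard-coded set.

```python
# Hypothetical deny-list; real systems should derive this from their
# data classification inventory rather than a hard-coded set.
SENSITIVE_FIELDS = {"ssn", "medicare_id", "dob", "address"}

def minimize(record: dict, needed: set) -> dict:
    """Pass through only the fields an AI tool strictly needs,
    and never a field on the sensitive list."""
    return {
        k: v for k, v in record.items()
        if k in needed and k not in SENSITIVE_FIELDS
    }
```

Even if a caller mistakenly requests a sensitive field, the deny-list wins, which keeps the blast radius of a misconfigured integration small.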

Monitoring, Versioning and Observability for AI and Production Operations

System maintainers must implement observability to understand how their AI systems are behaving over time. This requirement supports debugging, security, and continuous improvement, and ensures alignment with OMB M-25-21 and M-25-22.

Recommendation: Track traces, evaluations (EVALs), prompt management or versioning, and key metrics for production systems using AI.

Recommendation: AI system prompts used in CMS production systems should be versioned (see BR-SCM-1). Store prompts (or reusable “recipes”), review them regularly, and measure their performance over time.
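One lightweight way to version prompts is to derive a version id from the prompt content itself, so a production system can reference an exact, reviewable revision. This is a minimal sketch under stated assumptions: the in-memory `store` dictionary and the `register_prompt` helper are illustrative, not a CMS standard; a real deployment would persist this in source control or a prompt-management service.

```python
import hashlib
from datetime import datetime, timezone

def register_prompt(store: dict, name: str, text: str) -> str:
    """Record a prompt under a content-derived version id so production
    systems can pin and audit the exact revision they run."""
    version = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    store.setdefault(name, {})[version] = {
        "text": text,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    return version
```

Because the id is content-derived, re-registering an unchanged prompt yields the same version, while any edit produces a new one, which makes drift visible in reviews.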

Data Handling, Data Retention and Privacy

Privacy-Preserving Techniques

When developing and testing AI models that handle sensitive data, teams should adopt privacy-preserving techniques like federated learning and homomorphic encryption. These methods allow for model training and inference on encrypted or decentralized data without direct sensitive data exposure.
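To make the federated idea concrete: each site trains on its own data and shares only model parameters, which a coordinator combines. The sketch below shows a FedAvg-style weighted average of per-site parameters; it is a toy illustration (plain Python lists, hypothetical function name), not a production federated learning framework.

```python
def federated_average(site_weights: list, site_sizes: list) -> list:
    """FedAvg-style weighted average of per-site model parameters.
    Only parameters leave each site; the underlying records never do."""
    total = sum(site_sizes)
    dims = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dims)
    ]
```

Sites with more records contribute proportionally more to the combined model, and no raw beneficiary data is ever centralized.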

Use of Synthetic Data for AI Initiatives

The use of synthetic data is highly recommended to avoid the use of sensitive Personally Identifiable Information (PII) or Protected Health Information (PHI) in development and lower environments entirely. This mitigates the risk of exposure and simplifies compliance.

Examples: AI techniques such as Generative AI (including GANs and VAEs), and initiatives such as SyntheticMass and Synthea™, can be used at CMS to keep sensitive PII and PHI out of development and lower environments entirely.
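At its simplest, synthetic data means records that are shaped like production data but contain no real person's information. The sketch below is a toy stand-in for purpose-built tools like Synthea: the field names and value ranges are invented for illustration, and the fixed seed makes test fixtures reproducible.

```python
import random

def synthetic_beneficiaries(n: int, seed: int = 0) -> list:
    """Generate toy beneficiary records containing no real PII/PHI.
    A minimal stand-in for purpose-built tools such as Synthea."""
    rng = random.Random(seed)  # seeded for reproducible test fixtures
    names = ["Alex", "Sam", "Jordan", "Casey"]
    plans = ["Part A", "Part B", "Part C", "Part D"]
    return [
        {
            "id": f"SYN-{i:06d}",
            "name": rng.choice(names),
            "age": rng.randint(65, 95),
            "plan": rng.choice(plans),
        }
        for i in range(n)
    ]
```

Because every value is generated, such fixtures can be shared freely across lower environments without triggering PII/PHI handling requirements.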

Review Data Rights, Retention Policies prior to the use of External AI Tools and Services

The use of AI tools, services or infrastructure external to CMS may introduce risks, including companies that train or fine-tune AI models using CMS non-public code, data or metadata.

Recommendation: Application teams, CMS system owners, or business owners introducing new AI technology into their systems should ensure compliance with CMS security and privacy requirements, and that appropriate data use agreements are in place, before incorporating external AI systems that can access non-public CMS data. Teams should be especially careful with Terms of Use and Agreements that permit the training on, sharing, or sale of CMS data. For inquiries, consult with the CMS Privacy Office.

AI Services and Data Retention: Content generated by your AI system may be subject to Data Retention policies and FOIA. For inquiries, consult Records_Retention@cms.hhs.gov.

Overall AI Business Rules

BR-AI-1: AI Tools and Services Must Meet Federal AI, Cybersecurity and Privacy Standards in the Handling of Sensitive Data

Sensitive data, including Protected Health Information (PHI), Sensitive Personally Identifiable Information (SPII), classified, export-controlled, trade-secret, and other confidential information, may only be used with AI tools and services that meet HHS and CMS Cybersecurity Standards.

To ensure compliance and protect privacy, consult with the CMS Privacy Office in relation to AI tools and Services that could handle sensitive data.

Rationale:

All HHS and CMS policies regarding AI use, Personally Identifiable Information (PII) protection, and data security (including storage, transmission, and sharing) must be adhered to in order to ensure proper implementation of AI risk management practices per the HHS AI Strategy and HHS Compliance Plan for OMB Memorandum M-25-21.

Also see in the TRA: BR-F-5: Any System That Processes CMS Data Must Be Covered by a CMS ATO; BR-SQ-6: De-Identification of Production Data Is Required in Non-Production Environments; NIST SP 800-122, “Guide to Protecting the Confidentiality of Personally Identifiable Information (PII);” and NIST SP 800-188, “De-Identifying Government Datasets: Techniques and Governance.”

BR-AI-2: High-Impact AI Use Cases Must Meet Minimum Risk Management Practices

A high-impact use case, as defined in the Office of Management and Budget (OMB) Memorandum M-25-21, Accelerating Federal Use of AI through Innovation, Governance, and Public Trust, is one where the AI’s output serves as a principal basis for decisions or actions that have a legal, material, binding, or significant effect on specific critical areas.

A CMS AI Governance risk assessment will determine if an AI use case is high-impact. All CMS high-impact use cases must apply minimum risk management practices in accordance with M-25-21.

Contact CMS IT_Governance for an AI Governance Risk Assessment.

Rationale:

Non-approved AI tools may retain or use submitted information in ways that could expose CMS sensitive data. Attackers may be able to steal sensitive data by manipulating prompts or exploiting vulnerabilities in AI systems. Non-approved AI tools may not be compliant with evolving federal regulations regarding data protection, records retention, consumer rights, and algorithmic transparency. Additionally, these tools may lack the security controls and oversight mechanisms required for handling federal government data within established security boundaries.

BR-AI-3: Foreign Entity AI Tools May Only Be Used if Deployed on CMS Infrastructure

With the rapid introduction of new AI capabilities into the market, it can be difficult to determine which are aligned to federal government requirements. AI technology from foreign entities can be used if it’s deployed on CMS infrastructure and does not send data to the internet. This aligns with the principle of CMS data staying within the United States (see BR-SAAS-8: CMS Data Must Always Reside in the U.S.). Such deployments must utilize normal CMS and AI governance processes, including the CMS Risk Management Framework and the NIST AI Risk Management Framework.

Rationale:

The use of unapproved or foreign AI tools and models could compromise CMS information. Foreign AI models, especially those from countries with broad government access to data, could be used for surveillance and the collection of sensitive personal information. Third-party AI models may lack sufficient security controls, potentially exposing the systems to data breaches.

BR-AI-4: Human Review Must Follow Use of AI Tools to Write CMS Policies

Individual CMS employees are accountable for official CMS policies. AI tools are only to be used in support of and not as a substitute for human decisions and oversight of CMS strategic or compliance-related activities. Humans must be in the loop.

Rationale:

AI tools lack the capacity for human judgment and critical thinking, which is crucial for interpreting complex situations, considering ethical implications, and crafting policies that fully align with CMS’ values and objectives. AI tools may generate false or misleading information as factual and may reflect biases present in their training data.

BR-AI-5: Do Not Rely on AI for Final Decisions for “High Impact” Cases

AI should provide advice or recommendations; final decisions must be made by qualified staff with documented oversight. AI use cases for these purposes are considered potentially “High-Impact AI,” and actions they support may risk compliance with privacy and civil rights requirements.

Rationale:

OMB memorandum M-25-21, Accelerating Federal Use of AI through Innovation, Governance, and Public Trust defines “High-Impact AI” as “AI with an output that serves as a principal basis for decisions or actions with legal, material, binding, or significant effect on:

  1. “an individual or entity’s civil rights, civil liberties, or privacy; or
  2. “an individual or entity’s access to education, housing, insurance, credit, employment, and other programs;
  3. “an individual or entity’s access to critical government resources or services;
  4. “human health and safety;
  5. “critical infrastructure or public safety; or
  6. “strategic assets or resources, including high-value property and information marked as sensitive or classified by the Federal Government.”

AI tools lack the capacity for human judgment and the nuanced perspective necessary to make complicated decisions. High-Impact AI risks need to be managed.

BR-AI-6: AI-Supported Official Actions Are Subject to Records Retention Requirements

When AI provides work products in support of official CMS actions that would be subject to records retention, those work products become part of the record and must be retained according to the applicable records retention schedule. Those AI-created work products must include the AI model version and the prompt(s) used.

Rationale:

“Adequate and proper documentation” (per 44 U.S.C. Chapter 31 §3101) supporting official CMS actions is part of the action. This supports transparency of AI use. For more information, consult with the OSORA Issuances, Records & Information Systems Group.
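One way for a team to satisfy BR-AI-6 systematically is to capture the retention metadata at the moment the work product is generated. The sketch below is illustrative only: the `AIWorkProductRecord` class and its field names are hypothetical, and a real system would write these records to its managed records store rather than keep them in memory.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AIWorkProductRecord:
    """Metadata to retain alongside an AI-generated work product:
    the model version and the prompt(s) used, per BR-AI-6."""
    model_version: str
    prompts: list
    output: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
```

Serializing such a record with `asdict` yields a plain dictionary that can be stored under the applicable records retention schedule together with the work product itself.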

AI-Assisted Coding Guidance

Recommended practices regarding AI-assisted coding are available within the TRA Application Development section on AI-Assisted Coding.

Threat Modeling for AI Systems

See TRA Recommended Practice AD-SS-8: Perform Threat Modeling During the Design Phase to Identify Potential System Threats. Additional recommended practices regarding performing Threat Modeling for AI systems can be found in A Practical Understanding of Threat Modeling for AI Systems.

AI Recommended Practices

This section provides Recommended Practices information for AI overall and for specific AI concepts.

RP-AI-1: Release and Maintain AI Code as Shareable Open Source Software

CMS projects must prioritize sharing AI code, models, and data government-wide, consistent with the Open, Public, Electronic and Necessary (OPEN) Government Data Act. Custom-developed AI code is also covered by the SHARE-IT Act and OMB M-25-21. More information is found in the TRA Application Development section on Open Source Software.

AI Prompt Crafting

Prompt Crafting is the practice of writing clear and helpful directions that a large language model (LLM) can use to generate more accurate, relevant, and useful outputs for a given task or question. This is like giving a person the right level of contextual detail and specific instructions to complete a task in the expected manner. For the best results with LLMs, build instructions and relevant details in the same way: provide the task to accomplish, the role the AI should take on to complete it, any relevant reference materials as context, and the desired response format.

Prompt Crafting is the bridge between human intent and LLM capability. It is the heart of what determines the quality and usefulness of AI-generated responses from LLMs. In the context of CMS Chat, effective prompting enables users to interact with documents, synthesize information across sources, and generate new content effectively.

For CMS, effective Prompt Crafting matters because it:

  • Improves the accuracy and reliability of AI-generated outputs
  • Increases consistency in AI interactions across the organization
  • Reduces the likelihood of AI hallucinations or incorrect responses
  • Enhances the user experience of staff using CMS Chat
  • Increases the efficiency of the AI and reduces time and compute costs

The following best practices can be applied to effectively use chatbots like CMS Chat to enhance the user experience and ensure more accurate and useful AI interactions.

RP-AI-2: Clearly define the context of the prompt

Be specific about which aspects the AI is to focus on and what type of analysis is needed. Provide relevant background information, specify limitations, include domain-specific terminology, and define the scope of the response.

Example: “Within the context of the 2024 Medicare Physician Fee Schedule final rule, focusing specifically on telehealth provisions, analyze the requirements for audio-only services.”

RP-AI-3: Clearly define the role the AI should adopt

This approach provides clear context for the AI’s responses, helps maintain consistent tone and expertise level, and enables more targeted and relevant outputs.

Example: Instead of “Tell me about Medicare Part B coverage,” use “Act as an experienced Medicare benefits counselor with 15 years of experience explaining coverage to beneficiaries. Explain Medicare Part B coverage in simple terms.”

RP-AI-4: Break down complex requests into clear, sequential steps

This approach helps AI systems provide more organized and concise responses. When analyzing documents or synthesizing information, structure prompts to guide the AI through the process.

Example: “First, analyze the key points of this policy document. Then, identify any changes from the previous version. Finally, summarize the potential operational impacts for the policy team.”

Sometimes, it’s best to break these steps into separate prompts. AI models have a maximum response length, so allowing them to focus on one step at a time, with a full response at each stage, may produce better end results. The same applies to large documents: the AI can be asked to review the first chunk of text, then the next, and so on until the needed details are derived from the entire document.

Example:

  • Prompt 1: “Analyze the key points of this policy document.”
  • Prompt 2: “Review the key points provided and identify any changes from the previous version.”
  • Prompt 3: “Based on the output in the previous response, summarize the potential operational impacts for the policy team.”
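The chunk-by-chunk review described above needs the document split into pieces that each fit a model’s context window. A minimal Python sketch (the `chunk_text` helper and the character budget are illustrative assumptions, not a CMS Chat feature) that splits on paragraph boundaries so no paragraph is cut mid-sentence:

```python
def chunk_text(text: str, max_chars: int = 2000) -> list:
    """Split a long document into paragraph-aligned chunks that each
    fit within a rough character budget, for step-by-step review."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would overflow.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own prompt, with a final prompt asking the AI to consolidate its per-chunk findings.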

RP-AI-5: Provide Guidelines

Outline any scoping, rules, or writing styles, and specify any format requirements for the response, such as headers, bullet points, or particular citation formats. Clearly communicate how the information should be presented.

Example: “Present your findings in a structured format using headers to break up the text, followed by a few sentences that explain each header, and then bullet points that break down the main points.”

Sometimes, more complex guidelines and response formats are necessary. It’s possible to instruct many AIs to respond in structured formats while also demanding it utilize a specific approach, style, or scope.

Example: “Present your findings in a structured table format. I’d like columns A, B, and C: in A, please create categories for the text, followed by a brief description of each in B, then bullet points that break down the main points in C. Write in the style of a CMS Policy Expert, but be concise and clear. Avoid jargon or complex topics. Instead, focus on clarity and simplicity.”

RP-AI-6: Create New Chats or Sessions When Switching Topics

It is best to create a new chat or discussion session when switching topics. The context of conversations in generative AI prompting sessions is typically stored and available throughout a single chat. Unless the context of the entire chat is needed for the next query, it is recommended to create a new conversation, so the AI does not hallucinate or conflate topics. For example, if the AI was asked in one chat to clean up a project presentation, then instead of asking how to write a policy document in the same chat, start a new one with the right level of context provided to ensure the AI has no distractions from the desired intent.

RP-AI-7: Use Structured Prompts

The best prompts are structured with all the details covered in the previous best practices in mind. A recommended structure template to get the best results is:

[Task] + [Role] + [Guidelines] + [Reference Context (Reference Materials)]

Each is further defined below:

  • [Task] – The action(s) you wish the AI to take, based on the [Role], [Guidelines], and [Reference Context (Reference Materials)]

  • [Role] – The role that the AI is to take on to complete the [Task]

  • [Guidelines] – Rules or guidance for the AI to ensure it accomplishes the task as expected. This may include the expected output structure/format or instructions on how to write (perspective, style, etc.), and other guidance for the AI

  • [Reference Context (Reference Materials)] – Relevant attachments, details, documentation, resource materials, etc. that can act as the source of truth for the AI to accomplish its [Task]

Example:

Task: “Please create a categorized list of Medicare Part B rules relevant to a retiree from Tennessee.”

Role: “Act as an experienced Medicare benefits counselor with 15 years of experience explaining coverage to beneficiaries. Explain Medicare Part B coverage in simple terms. You complete the [Task] based on the [Guidelines] and [Reference Context].”

Guidelines: Please respond in a bulleted or tabular format. Responses should be concise, but fully explained in a style most suitable to the intended audience. Note if any relevant details may be missed and what additional actions should be taken to ensure the [Task] can be completed.

Reference Context: *Attached Document* or [copy/paste text from the appropriate documentation]
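The template above can also be applied programmatically when a team builds prompts for a production system. This is a minimal sketch (the `build_prompt` function name is an illustrative assumption) that assembles the recommended [Task] + [Role] + [Guidelines] + [Reference Context] structure and omits any section left empty:

```python
def build_prompt(task: str, role: str, guidelines: str,
                 reference_context: str = "") -> str:
    """Assemble a prompt from the recommended
    [Task] + [Role] + [Guidelines] + [Reference Context] template."""
    sections = [
        ("Task", task),
        ("Role", role),
        ("Guidelines", guidelines),
        ("Reference Context", reference_context),
    ]
    # Skip empty sections so the prompt stays clean.
    return "\n\n".join(f"{label}: {body}" for label, body in sections if body)
```

Building prompts from a function like this, rather than ad hoc strings, also pairs naturally with the prompt-versioning recommendation earlier in this section.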

RP-AI-8: Use Iterative Refinement

Prompt Crafting is not a one-time process but rather an iterative approach that involves starting with a basic prompt, evaluating the response, refining the prompt based on the output, and then testing and validating any improvements. It is also a creative process — one that benefits from continuous experimentation and adjustment to ensure the AI’s responses align with evolving requirements and user needs. By reviewing each new output, identifying gaps or inaccuracies, and incorporating feedback into the next version of the prompt, steady improvements will be achieved in both the clarity and effectiveness of AI interactions.

RP-AI-9: Recommended AI-Assisted Development Methodologies

Recommended Workflow Steps:

  1. Specify: Create a formal, human-readable specification that defines the goals in accordance with best practices.
  2. Plan: Use AI to generate a technical plan based on the spec and enterprise standards.
  3. Implement: Use AI to break down the plan into small, testable tasks and generate code.
  4. Review and Verify: A developer reviews each code change for accuracy, security, and quality. Automated testing and security checks run in CI/CD.
  5. Iterate: If issues are found, refine the specification or prompts and re-run the cycle.

TRA 2025 Release 1 | General Distribution / Unclassified Information