Lab 7 · Agentic AI · Healthcare · Claude API

Build an Agentic AI Workflow:
From Single Answer to Autonomous Action

In this lab you will move from asking AI a question to building an AI that takes a sequence of actions — experiencing firsthand the difference between a chatbot and an agent, and designing the human-in-the-loop safeguards that make agentic AI safe enough to deploy.

Google Colab · Claude API · No prior coding required · ~90 minutes

What you will build

This lab has three parts that each deepen your understanding of what "agentic AI" actually means in practice — not just as a concept from the chapter, but as something you can observe, test, and evaluate yourself.

Part 1 (~20 min · No code): Map Epic's real agents (Art, Penny, the pre-visit assistant) to the agentic AI framework

Part 2 (~40 min · Python + Claude API): Build a Penny-style insurance appeal agent that takes a denial and writes an appeal, step by step

Part 3 (~30 min · Analysis): Design the human-in-the-loop governance framework for your agent, deciding when it should act and when it must stop
Before you start
Read Chapter 7 before doing this lab — especially Sections 8, 9, 11, and 12. The concepts of agentic AI, Penny, Art, prompt engineering, and human-in-the-loop are all used directly in Parts 2 and 3. The lab will make much more sense if the chapter is fresh.
Part 1: Mapping Epic's Agents · No code · ~20 min
Understanding what makes an AI "agentic" before you build one

What separates an agent from a chatbot?

Before building anything, you need a clear mental model of the difference. A chatbot answers a question — one input, one output, done. An agent pursues a goal — it takes a series of steps, uses tools, checks its own work, and adapts when something doesn't go as planned. The difference is not a matter of how smart the AI is. It is a matter of how it is designed.

The three defining features of an agent
1. Goal-directed: the agent is given an objective, not just a prompt. It figures out the steps needed to reach the goal.

2. Multi-step: the agent takes a sequence of actions, with each step potentially depending on the results of the previous one.

3. Tool use: the agent can call external tools — search a database, read a document, write to a system — rather than just generating text.
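The three features above can be sketched as a tiny loop. Everything here (`plan_steps`, `TOOLS`, `run_agent`, and the fixed plan) is an invented illustration of the pattern, not code from Epic or from this lab's required cells:

```python
def plan_steps(goal):
    """Goal-directed: turn an objective into an ordered list of steps.
    A real agent would ask the model to plan; this stub returns a fixed plan."""
    return ["analyze_denial", "match_evidence", "draft_appeal"]

# Tool use: the agent calls functions that read and write state,
# rather than only generating text.
TOOLS = {
    "analyze_denial": lambda state: {**state, "criteria": "policy 4.2.1"},
    "match_evidence": lambda state: {**state, "evidence": "documented hypoglycemia"},
    "draft_appeal":   lambda state: {**state, "letter": "Appeal citing " + state["criteria"]},
}

def run_agent(goal):
    state = {"goal": goal}
    for step in plan_steps(goal):   # multi-step: each step sees prior results
        state = TOOLS[step](state)
    return state

print(run_agent("overturn CGM denial")["letter"])  # → Appeal citing policy 4.2.1
```

The point of the sketch is the shape, not the stubs: a goal goes in, a sequence of state-changing steps runs, and each step can depend on what the previous one produced.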
Exercise 1: Complete the agent analysis table

For each of Epic's three agents from Chapter 7, fill in the table. Use your own words — the goal is to make these concepts concrete, not to reproduce the chapter.

| Agent | What goal is it pursuing? | What steps does it take? | What tools does it use? | Where is the human in the loop? |
|---|---|---|---|---|
| Art (clinical intelligence) | Your answer | Your answer | Your answer | Your answer |
| Penny (insurance appeals) | Your answer | Your answer | Your answer | Your answer |
| Pre-visit assistant (patient preparation) | Your answer | Your answer | Your answer | Your answer |
Exercise 2: The chatbot vs. agent comparison

Penny replaces a process where an administrator asked a chatbot "help me write an appeal for this denial" and got back a generic draft. How is the agentic version of Penny different from that? Write 3–4 sentences identifying the specific differences in what the system does — not just in how good the output is.

What you just did
You performed a task decomposition — the same analysis an engineer does before building an agent. Before you can build a system that pursues a goal autonomously, you have to know exactly what steps that goal requires, what information each step needs, and where a human needs to be in the loop. The table you just filled in is the specification for an agent.
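That specification can even be written down directly. As a purely illustrative sketch, the completed table for Penny might be encoded like this (the field names and values are assumptions for teaching purposes, not Epic's actual design):

```python
# One way to express the decomposition table as a machine-readable spec.
# Every value here is a placeholder — your table answers belong in these slots.
agent_spec = {
    "name": "Penny",
    "goal": "overturn incorrect insurance denials",
    "steps": ["analyze denial", "match clinical evidence", "draft appeal"],
    "tools": ["EHR record retrieval", "policy document lookup"],
    "human_in_loop": "administrator approves every letter before submission",
}

for field, value in agent_spec.items():
    print(f"{field}: {value}")
```

Notice that the spec has exactly the same columns as your table: goal, steps, tools, and the human checkpoint.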
Part 2: Build a Penny-Style Insurance Appeal Agent · Python · Claude API
~40 minutes · Claude writes the code, you run and interpret

What you are building

You are going to build a simplified version of Penny — the agent that reads an insurance denial, pulls together the relevant clinical context, and drafts an appeal letter. The real Penny connects to Epic's EHR and retrieves actual patient records. Your version will work with realistic fictional patient data you provide directly. The underlying logic — read the denial, reason about it, draft a response — is the same.

More importantly: you are going to build it in stages, deliberately. First as a simple single-prompt chatbot. Then as a multi-step agent. The difference between those two versions is the entire point of Chapter 7.

Setup
Open Google Colab and create a new notebook. You will also need a Claude API key — your instructor will provide one for this lab, or you can get a free-tier key at console.anthropic.com. Keep your API key private — do not share it or commit it to GitHub.

Step 1 — Install and set up ~3 min

Cell 1 — Install and configure
!pip install anthropic --quiet

import anthropic
import json

# Paste your API key here — keep this notebook private
API_KEY = "your-api-key-here"
client = anthropic.Anthropic(api_key=API_KEY)

print("✓ Anthropic client ready")

Step 2 — Version A: The chatbot approach ~8 min

First, build the naive version — one prompt, one response. This is what existed before agents. Notice what it can and can't do.

Cell 2 — Version A: single-prompt chatbot
# The insurance denial we need to appeal
denial = """
INSURANCE DENIAL NOTICE
Patient: Maria Chen, DOB 03/14/1958
Claim #: HC-2024-88421
Service: Continuous Glucose Monitor (CGM) device + supplies
Denial reason: Not medically necessary. Patient does not meet
  criteria for CGM coverage under policy section 4.2.1.
  Criteria requires: Type 1 diabetes OR insulin-dependent
  Type 2 diabetes with documented hypoglycemic episodes.
"""

# Patient clinical context (in a real system, Penny retrieves this from the EHR)
patient_context = """
Patient: Maria Chen
Diagnosis: Type 2 Diabetes Mellitus (E11.9), diagnosed 2019
Current medications: Metformin 1000mg BID, Glipizide 10mg daily
  (Glipizide is a sulfonylurea — a class known to cause hypoglycemia)
Recent labs: HbA1c 8.4% (elevated), last 3 months
Recent notes: Patient reported two episodes of dizziness and
  confusion in past month; blood glucose 58 mg/dL on one occasion.
  Physician ordered CGM to monitor for hypoglycemic patterns.
"""

# VERSION A: Single prompt — just ask the AI to write the appeal
response_a = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": f"""Write an insurance appeal letter for this denial.

Denial: {denial}

Patient context: {patient_context}

Write a professional appeal letter."""
    }]
)

print("=== VERSION A: CHATBOT APPROACH ===")
print(response_a.content[0].text)

Step 3 — Version B: The agentic approach ~15 min

Now build the same task as an agent — three distinct steps, each building on the last. This is how Penny actually works.

Cell 3 — Version B: multi-step agent
def call_claude(prompt, system=None):
    """Helper: send a prompt, return the text response."""
    messages = [{"role": "user", "content": prompt}]
    kwargs = {"model": "claude-sonnet-4-20250514", "max_tokens": 1000, "messages": messages}
    if system:
        kwargs["system"] = system
    response = client.messages.create(**kwargs)
    return response.content[0].text

print("=== VERSION B: AGENTIC APPROACH ===")
print("Running 3-step agent pipeline...\n")

# STEP 1: Analyze the denial — identify the specific criteria gap
print("Step 1: Analyzing the denial...")
denial_analysis = call_claude(
    f"""You are an expert in insurance policy and medical billing.

Analyze this insurance denial carefully:
{denial}

Identify:
1. The exact policy criteria that triggered the denial
2. The specific gap between what the insurer requires and what the denial indicates about the patient
3. What clinical evidence, if present, would override this denial

Be specific and cite the policy section mentioned.""",
    system="You are a medical billing specialist with deep expertise in insurance policy language."
)
print("✓ Denial analysis complete\n")
print(denial_analysis)
print("\n" + "="*50 + "\n")

# STEP 2: Match clinical evidence to policy requirements
print("Step 2: Matching clinical evidence to policy...")
evidence_match = call_claude(
    f"""You are a medical billing specialist.

The denial analysis identified these requirements:
{denial_analysis}

The patient's clinical record contains:
{patient_context}

Identify:
1. Which specific clinical facts in the record satisfy the insurer's criteria
2. Which policy language the clinical facts speak to directly
3. Any clinical details that strengthen the medical necessity argument

Be precise — quote the relevant clinical facts and map them to specific criteria.""",
    system="You are a medical billing specialist who builds airtight insurance appeals."
)
print("✓ Evidence mapping complete\n")
print(evidence_match)
print("\n" + "="*50 + "\n")

# STEP 3: Draft the appeal using the structured analysis
print("Step 3: Drafting the appeal letter...")
appeal_letter = call_claude(
    f"""You are drafting a formal insurance appeal letter.

Use this analysis of the denial:
{denial_analysis}

And this mapping of clinical evidence to policy criteria:
{evidence_match}

Write a formal, professional appeal letter that:
- Cites the specific policy section and explains why the denial is incorrect
- References the clinical evidence that satisfies the insurer's criteria
- Is factual, specific, and concise — no generic language
- Ends with a clear request for reconsideration

Format it as a real letter ready for submission.""",
    system="You are a medical billing specialist drafting a formal insurance appeal. Be precise, factual, and professional."
)
print("✓ Appeal letter drafted\n")
print("=== FINAL APPEAL LETTER ===")
print(appeal_letter)
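Once both versions run, it can help to see the whole Version B pipeline as one reusable function. This sketch (the function name and stub are my own, not part of the lab's required code) takes the model-calling function as a parameter, so the control flow can be smoke-tested offline without spending API calls:

```python
def run_appeal_pipeline(denial, patient_context, ask):
    """Run the 3-step agent. `ask` is any function(prompt) -> str —
    e.g. a thin wrapper around call_claude, or a stub for offline tests."""
    analysis = ask(f"Analyze this denial and identify the criteria gap:\n{denial}")
    evidence = ask(f"Requirements found:\n{analysis}\n\n"
                   f"Clinical record:\n{patient_context}\n\n"
                   "Map the clinical facts to the insurer's criteria.")
    letter = ask(f"Denial analysis:\n{analysis}\n\n"
                 f"Evidence mapping:\n{evidence}\n\n"
                 "Draft a formal appeal letter.")
    return {"analysis": analysis, "evidence": evidence, "letter": letter}

# Offline smoke test — a stub stands in for the model:
stub = lambda prompt: f"[stub reply to {len(prompt)}-char prompt]"
result = run_appeal_pipeline("fake denial", "fake record", stub)
print(sorted(result))  # → ['analysis', 'evidence', 'letter']
```

To run it for real, pass `call_claude` (or a wrapper around it) as `ask`. The design choice worth noticing: because the model call is injected, the pipeline's step-chaining logic can be tested separately from the model itself.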

Step 4 — Compare the two versions ~10 min

| Question | Version A (chatbot) | Version B (agent) |
|---|---|---|
| Does it identify the specific policy criteria? | Your observation | Your observation |
| Does it cite specific clinical facts from the record? | Your observation | Your observation |
| Would a billing specialist need to rewrite it before submitting? | Your observation | Your observation |
| Could it handle a more complex denial with multiple criteria? | Your observation | Your observation |
What the comparison shows
The chatbot approach generates text that looks like an appeal. The agentic approach generates an appeal grounded in the specific policy language and the specific clinical evidence. The difference isn't just quality — it's reliability. An administrator reviewing Version B has something they can verify. An administrator reviewing Version A has to essentially redo the analysis themselves to know if the draft is accurate. That's the difference between augmentation and a time sink.

Step 5 — Test your own denial ~5 min

Modify the denial and patient context and run the agent again. Try a different type of denial — a medication, a procedure, or a specialist referral. Observe how the agent adapts its analysis to the new scenario.

💬 Ask Claude for a test case
Tell Claude: "I'm building an insurance appeal agent for a class lab. Give me a realistic fictional insurance denial for a different type of claim — not a CGM device — and corresponding fictional patient clinical context that would support a strong appeal. Keep it medically realistic."
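If you would rather start from a ready-made scenario, here is one example you can paste in place of the original denial and patient context. Every detail below (names, claim numbers, policy sections, clinical facts) is invented for this lab and should be treated as illustrative fiction, not clinical guidance:

```python
# Fictional test case for Step 5 — replaces the CGM scenario.
denial = """
INSURANCE DENIAL NOTICE
Patient: James Okafor, DOB 07/02/1961
Claim #: HC-2024-91577
Service: Cardiac rehabilitation program (36 sessions)
Denial reason: Not medically necessary. Policy section 6.1.3 covers
  cardiac rehabilitation only following myocardial infarction, CABG,
  or coronary stent placement within the preceding 12 months.
"""

patient_context = """
Patient: James Okafor
Diagnosis: Coronary artery disease (I25.10)
Recent procedures: Drug-eluting stent placed 4 months ago
Recent notes: Cardiologist referred patient to cardiac rehabilitation
  following the stent placement; patient reports exertional fatigue.
"""

print("Test case loaded — re-run the Cell 3 pipeline with these values.")
```

Notice the deliberate structure: the clinical record contains a fact (stent placement 4 months ago) that directly satisfies the policy's 12-month criterion, so a well-built agent should find and cite it.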
Part 3: Design the Human-in-the-Loop Governance Framework · Analysis · ~30 min
When should the agent act — and when must it stop?

The governance problem

You have just built an agent that can analyze a denial, match evidence to policy, and draft an appeal — all without human involvement. The agent works. Now the harder question: should it work without human involvement? At every step? Or only some steps?

This is the human-in-the-loop question from Chapter 7, Section 12 — and it does not have one right answer. It depends on accuracy, stakes, accountability, and trust built over time. Your job in this part is to think through it seriously for the agent you just built.

Exercise 1: Map the risk at each step

For each of the three steps in your Version B agent, assess the risk of the AI getting it wrong — and the consequence if it does.

| Agent step | What could go wrong? | Consequence if wrong | Human review: required or optional? |
|---|---|---|---|
| Step 1: Analyze the denial | Your answer | Your answer | Your answer |
| Step 2: Match clinical evidence | Your answer | Your answer | Your answer |
| Step 3: Draft the appeal letter | Your answer | Your answer | Your answer |
| Final submission to insurer | Your answer | Your answer | Your answer |
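Once your table is filled in, its right-hand column amounts to a routing policy. As a hedged sketch, it could be encoded like this — the dictionary values below are placeholders, and your own answers from the table should replace them:

```python
# Placeholder policy — substitute YOUR answers from the risk table.
REVIEW_POLICY = {
    "analyze_denial":    "automatic",
    "match_evidence":    "automatic",
    "draft_appeal":      "human review optional",
    "submit_to_insurer": "human approval required",
}

def may_proceed(step, human_approved=False):
    """A step may run on its own unless the policy demands explicit approval."""
    return REVIEW_POLICY[step] != "human approval required" or human_approved

print(may_proceed("draft_appeal"))                            # → True
print(may_proceed("submit_to_insurer"))                       # → False
print(may_proceed("submit_to_insurer", human_approved=True))  # → True
```

The useful property of writing the policy down this way is that it becomes auditable: anyone can read exactly where the agent is allowed to act alone.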
Exercise 2: Add a self-check step to your agent

Real agents don't just act — they verify their own work before surfacing it to humans. Add a fourth step to your agent that reviews the appeal letter before it is presented for approval.

Cell 4 — Add a self-check step
# STEP 4: Self-check — the agent reviews its own output
print("Step 4: Agent self-check...")
self_check = call_claude(
    f"""You are a senior medical billing reviewer.

An AI agent has drafted this insurance appeal letter:
{appeal_letter}

The original denial was:
{denial}

The patient's clinical record contains:
{patient_context}

Review the appeal critically:
1. Does every factual claim in the letter match the clinical record exactly?
2. Does the letter address ALL the specific criteria in the denial, or does it miss any?
3. Is there any language that could weaken the appeal or give the insurer grounds to re-deny?
4. Rate the appeal: READY TO SUBMIT / NEEDS MINOR REVISION / NEEDS MAJOR REVISION

Be specific about any issues found.""",
    system="You are a senior medical billing reviewer. Be critical and precise."
)

print("=== AGENT SELF-CHECK ===")
print(self_check)

# In a real system, the agent would route the letter differently
# based on the self-check result before presenting to a human.
# Revision flags are checked first, because the review text may also
# quote the full rating rubric, and a naive "READY TO SUBMIT" check
# performed first would then misfire.
verdict = self_check.upper()
if "MAJOR" in verdict:
    print("\n→ Agent confidence: LOW — escalating to senior billing specialist")
elif "MINOR" in verdict:
    print("\n→ Agent confidence: MODERATE — flagging issues for administrator before review")
elif "READY TO SUBMIT" in verdict:
    print("\n→ Agent confidence: HIGH — presenting to administrator for final review")
else:
    print("\n→ No clear rating found — routing to administrator for full review")
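Substring checks on the whole review are still a heuristic, since the model may quote the rating rubric before stating its verdict. One sturdier (but still heuristic) approach is to pick whichever rating appears last in the response — `parse_rating` and `RATINGS` are invented names for this sketch, not part of the lab's required code:

```python
RATINGS = ["NEEDS MAJOR REVISION", "NEEDS MINOR REVISION", "READY TO SUBMIT"]

def parse_rating(review_text):
    """Return the rating that appears LAST in the review, on the theory
    that a verdict usually follows any restated rubric.
    Returns None if no rating is present at all."""
    text = review_text.upper()
    found, last_pos = None, -1
    for rating in RATINGS:
        pos = text.rfind(rating)
        if pos > last_pos:
            found, last_pos = rating, pos
    return found

sample = "Options: READY TO SUBMIT / NEEDS MINOR REVISION.\nVerdict: NEEDS MINOR REVISION"
print(parse_rating(sample))  # → NEEDS MINOR REVISION
```

You could then route on `parse_rating(self_check)` instead of scanning the full text, and treat a `None` result as an automatic escalation to a human.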
Written reflection — submit with your lab
  1. The agentic vs. chatbot difference. After running both versions, describe in your own words what specifically makes Version B agentic. Don't just say "it has more steps" — explain what each step does that a single prompt can't.
  2. Your governance framework. Based on your risk analysis in Step 1, describe where you would place human review in the Penny workflow. Which steps should be automatic? Which should require human approval before proceeding? Would your answer change if the agent had processed 10,000 appeals with 98% accuracy?
  3. Human-in-the-loop vs. human-on-the-loop. Chapter 7 distinguishes between a human who approves each output (in-the-loop) and a human who monitors a dashboard of autonomous actions (on-the-loop). For the Penny use case specifically, at what demonstrated accuracy rate and under what conditions would you be comfortable moving from in-the-loop to on-the-loop? What would you need to see before making that change?
  4. The liability question. If your agent drafts an appeal letter that contains a factual error — misreading the patient's lab value or misquoting the policy — and an administrator submits it without noticing, who is responsible: the hospital, the software vendor, the administrator, or the AI? How should responsibility be allocated, and does your answer change the governance framework you designed?
  5. Beyond billing. The chapter describes Art (clinical intelligence) and a pre-visit patient assistant alongside Penny. Pick one of those agents and apply the same governance analysis you did for Penny. Where is the risk higher? Where would you require stricter human review — and why?

Lab deliverables

The thread connecting all three parts
Part 1 asked you to define what an agent is by mapping Epic's real ones. Part 2 had you experience the difference by building both versions yourself. Part 3 asked you to confront the question the chapter ends with: the capability of the AI is not the binding constraint. The binding constraint is trust, governance, and accountability — and those are human decisions, not technical ones. That is the core lesson of every Epic AI deployment, and the core lesson of this lab.