Wednesday, June 17, 2026

“Disheveled,” “Not Coherent” — The Bias Is in the Notes, Not the AI

Khushbu Raval
Khushbu Raval is a Senior Correspondent and Content Strategist at Vibe Media Group, specializing in AI, Cybersecurity, Data, and Martech. A keen researcher in the tech domain, she transforms complex innovations into compelling narratives and optimizes content for maximum impact across platforms. She's always on the hunt for stories that spark curiosity and inspire.

Clinical notes are not the event itself — they are someone’s interpretation of it. That gap is where healthcare AI’s earliest bias problem begins.

Long before anyone worried about AI models becoming too powerful, Engy Ziedan was worried about something else: the data those models learn from in the first place.

An applied microeconomist by training and now Co-Founder and Chief Scientific Officer of Protege, Ziedan spends her time at the intersection of health economics, behavioral science, and causal inference — work that has, at various points, shaped CDC school-reopening guidance and forced her to confront how much bias gets quietly encoded into clinical records before a single algorithm ever sees them.

Her path to Protege began with a research problem that should have been simple: linking electronic medical records to mortality data during COVID. It took months longer than it should have. That frustration, shared with co-founder Bobby Samuels, became the seed of a company built to make healthcare data not just accessible, but trustworthy enough to build on.

In this conversation, Ziedan discusses the uncomfortable truths hidden in clinical notes, why dermatology data behaves very differently from radiology data, and what gets lost when the populations who need answers most are the ones least represented in the data.

Full interview:

What’s the most uncomfortable truth your data has ever revealed — something you found that you almost wished you hadn’t?

An uncomfortable reality is that healthcare’s earliest AI alignment challenge may have less to do with increasingly powerful models and more to do with the biased human narratives reflected in the data that trains them.

That was something I suspected, but it is different when you actually sit with the records. In healthcare, we often use data as a proxy for what is happening inside a person’s body, or what they are feeling, or what a physician observed – but the data is not the event itself. It is the residual of the event. Sometimes it feels like watching the last 30 seconds of a movie and pretending you understood the whole plot.

Clinical text can carry all kinds of hidden assumptions. The way a patient is described can vary by gender, age, race, or social context. A note may say someone is “disheveled” or “not coherent,” but those words are not neutral. They reflect a human judgment about someone, which may or may not be biased. If we train models on those narratives without understanding them, we risk encoding the bias as if it were clinical truth.

That is the part I wish were not so visible. Once you see enough records, you cannot unsee that the data is both incredibly valuable and deeply imperfect.
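
To make the point about descriptors concrete, here is a minimal sketch of how one might check whether loaded terms appear more often in notes about one group than another. It is an illustration only, not Protege's method: the function name `descriptor_rates`, the group labels, and the sample notes are all hypothetical, and the term list contains just the two words quoted above.

```python
from collections import defaultdict

# Hypothetical term list; the two descriptors are the ones quoted in the interview.
FLAGGED_TERMS = ("disheveled", "not coherent")


def descriptor_rates(notes):
    """Return, per group, the share of notes containing any flagged term.

    `notes` is an iterable of (group_label, note_text) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, text in notes:
        totals[group] += 1
        lowered = text.lower()
        if any(term in lowered for term in FLAGGED_TERMS):
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Made-up records, purely for illustration.
sample_notes = [
    ("group_a", "Patient appears disheveled, reports chest pain."),
    ("group_a", "Patient reports chest pain for two days."),
    ("group_b", "Patient reports chest pain for two days."),
    ("group_b", "Patient alert and oriented, reports chest pain."),
]

rates = descriptor_rates(sample_notes)
print(rates)
```

A gap between groups in a check like this would not prove bias on its own, since the underlying clinical presentations may differ, but it flags where the narratives deserve a closer read before they are used as training data.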

Take us back to the moment you decided to co-found Protege. What was happening in the research that made you realize it was needed?

The simplest answer is that it was just really hard to connect the different pieces of a patient journey. In 2020, I met my co-founder, Bobby Samuels, now Protege’s CEO, as a research customer at Datavant. I was trying to write what I thought would be a simple paper. I wanted to link electronic medical records to mortality data and study what happened when people had appointments canceled during COVID. The idea was that some patients had already scheduled care before COVID, and then, by random chance, their visits were displaced. I wanted to follow those patients over time and understand the value of the care they lost.

Bobby was incredibly helpful. He gave me compute, space to work with data, and access where he could, but it still took months. And this was for one research question, linking just a few things. At the same time, it was clear that AI would move much faster than the traditional data access world. We needed to link modalities across domains: clinical notes, imaging, claims, mortality, pathology, and more. The old way of doing that felt impossible.

Then, in February 2024, Bobby emailed me and said he had left his job. He shared his idea for an AI data marketplace. We had already been talking about academic access and a health consortium, and from that day forward, I basically never stopped working on what is now Protege.

The thing that made it feel necessary was not a grand theory. It was the daily reality of research. If the right data cannot be accessed, governed, and evaluated responsibly, then the research cannot happen. 
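
The linkage Ziedan describes, attaching mortality data to appointment records, can be sketched in a few lines once both sources share a common key. The sketch below assumes a de-identified `token` field present in both datasets. That field, the record layouts, and the function name `link_to_mortality` are hypothetical and are not drawn from the interview. In practice, producing and governing such a key is the hard part.

```python
from datetime import date


def link_to_mortality(appointments, deaths):
    """Attach a death date (or None) to each appointment by matching tokens."""
    death_by_token = {row["token"]: row["death_date"] for row in deaths}
    return [
        {**appt, "death_date": death_by_token.get(appt["token"])}
        for appt in appointments
    ]


# Made-up records, purely for illustration.
appointments = [
    {"token": "t1", "scheduled": date(2020, 4, 1), "canceled": True},
    {"token": "t2", "scheduled": date(2020, 4, 2), "canceled": False},
    {"token": "t3", "scheduled": date(2020, 4, 3), "canceled": True},
]
deaths = [{"token": "t3", "death_date": date(2021, 1, 15)}]

linked = link_to_mortality(appointments, deaths)

# Records with no match stay in the result with death_date set to None,
# so the analyst can see how much of the cohort the linkage covered.
unmatched = sum(1 for row in linked if row["death_date"] is None)
print(linked)
print(unmatched)
```

Keeping unmatched records visible, rather than dropping them, is a deliberate choice in this sketch: a missing death record can mean the patient is alive or simply that the linkage failed, and the two should not be confused.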

You work with large-scale, real-world healthcare data. Have you ever found a pattern in a dataset that changed how you think about a certain group of people?

One example I think about a lot, or at least have been studying a bit, is dermatology.

People are increasingly comfortable uploading images of their bodies to AI systems and asking questions they might feel embarrassed to ask a physician, a patient portal, or even a family member. That changed how I think about patients’ relationship to AI. For some people, a model may feel like a lower-friction place to ask private or embarrassing health questions before bringing them to another person. The first thing that struck me was how sensitive that data is. You have to be extremely careful with privacy and de-identification, because these are not abstract records. They are people’s bodies, people’s anxieties, and sometimes things they have not told anyone else.

The second thing I noticed was how dermatology data behaves differently from other kinds of medical data. It changed how I think about healthcare providers as data producers – their notes are not less complete, they are optimized for a different purpose depending on their specialty. Radiologists describe. A radiology report is usually written for another physician, so it may include very specific details, such as the size and location of a mass. Dermatologists often do not describe in the same way. They prescribe. The note may go straight to what should be done about the condition, because the image is sitting right there in front of them.

That matters for AI. If the text does not describe the image, or if it refers to events outside the visit, a model can learn the wrong relationship between the image and the language. It can hallucinate. So the pattern changed how I thought about both sides of the encounter. Patients may use AI to ask questions they are not ready to ask elsewhere, and clinicians document care in ways that reflect their specialty, workflow, and intended reader.

Your mission is to make healthcare data accessible. Can you share a time when someone, like a researcher, patient, or policymaker, could not get an answer they urgently needed because the data was not available?

Many models are not trained on very rare pediatric datasets, and there are good reasons for that. Children do not have full agency. Privacy standards are stricter. The same is true for other populations where the ethical bar is rightly higher. We should champion those privacy protections, not treat them as an inconvenience.

But the consequence is real. If someone is asking a model a question about a child with a rare condition, what is the quality of that judgment or inference if the model has seen very little relevant data? The cases with the highest potential benefit are often the least represented. In pediatrics, especially for rare diseases, the marginal value of better information can be enormous because the possible benefit is measured over a lifetime.

I do not think the answer is to weaken protections. The harder answer is that we need much more research on how to responsibly make these kinds of datasets useful while preserving privacy, agency, and scientific ethics. That is exactly where healthcare AI has to be careful. The urgency is real, but so are the constraints.

You teach at Tulane, run a data lab, and co-founded a company in New Orleans. The city has a unique relationship with healthcare inequality. Was your choice to work there intentional?

My academic work began with a question economists often ask: What is valuable medical care, and what is wasteful medical care?

Economists sometimes call it “flat-of-the-curve” medicine: more spending, but not much more health. When I moved to the U.S., it was shocking to see how much we spend on healthcare relative to other developed countries and how poorly we perform on many outcomes. That made me interested in whether we over-provision wasteful care and under-provision valuable care.

When AI entered healthcare, the promise was very compelling. If AI could reduce administrative burden, lower the marginal cost of delivering care, and expand access to valuable services, that could be meaningful. But many benchmarks and evaluations do not tell us that. A model improves by one percentage point on a benchmark, but what does that mean for added life years? For quality of life? For savings to a patient? For a clinician’s ability to make a better decision?

That is the health economist in me. I do not want to know only whether a model performed better on a test. I want to know what that improvement means in the real world, and for whom. 

Your work has been cited by the CDC. Can you walk us through what happened after that? Did anyone reach out to you? Did it lead to any policy changes?

That work emerged from the pandemic, when schools were closed, and everyone was talking about learning loss. I remember going for a walk with Doug Harris, who was the chair of my department at Tulane, and we were talking about what could actually be done. The question became: can we answer whether opening schools increases COVID transmission in the most methodologically rigorous way possible?

We designed it as a quasi-experiment. The goal was to bring causal inference to a question that was emotionally charged, urgent, and affecting every family. We published the paper, and the CDC cited it and used our threshold in school reopening guidelines.

After that, the New York Times cited the work several times, and then school districts began reaching out. Over time, schools began opening. To this day, I sometimes mention the paper to parents, and they remember it. They remember the guidelines because the decision directly affected their children. 

The way I think about it now is that the situation was risky, but it had to be guided by science at some point. Otherwise, it would never have ended. That experience shaped how I think about data and policy. The work does not have to be perfect to be useful, but it must be methodologically honest, transparent about uncertainty, and grounded in the best available evidence. This is a lot easier said than done. 
