Chapter 1: Knowledge Graphs & AI Prediction — EveryCure

Contents

1. Company Background
2. The Challenge: Too Many Possibilities
3. The EveryCure Approach
4. The Knowledge Graph
5. The AI Factory: Data to Value
6. Why This Problem Requires AI
7. Risks and Limitations
8. Summary Table & Discussion Questions

1 Company Background

Known rare diseases, most with no approved treatment: 7,000+
Approved drugs worldwide: 20,000+
Average cost to develop a new drug: $2.6B
Average drug development timeline: 10–15 years
Organization type: Nonprofit

EveryCure is a nonprofit organization founded on a deceptively simple premise: many patients suffer from diseases that could already be treated with existing drugs — but nobody has discovered the connection yet. The problem is not that medicine lacks the tools. The problem is that the space of possible drug-disease combinations is so large that humans cannot search it systematically.

There are more than 20,000 approved drugs in the world and thousands of known diseases. The number of possible combinations between them runs into the tens of millions. Traditional drug discovery addresses this by starting with one disease, identifying its biological mechanisms, and searching for a compound that might interfere with those mechanisms. This process typically takes 10 to 15 years and costs billions of dollars — and it focuses on new molecular entities, not existing drugs.

EveryCure takes a different approach entirely. Rather than developing new drugs from scratch, the organization searches for new uses for drugs that already exist. This is called drug repurposing — and it has a powerful economic logic: repurposed drugs already have known safety profiles, manufacturing processes, and regulatory histories. If a match can be found, the path to patient treatment is dramatically shorter and cheaper than developing a new compound.

Why this matters

EveryCure's founders believe that many patients suffer not because treatments do not exist, but because no one has systematically searched for them. AI makes that systematic search possible for the first time.

2 The Challenge: Too Many Possibilities

The core problem EveryCure faces is one of scale. There are thousands of approved drugs and thousands of diseases. The number of possible drug-disease pairs is enormous. Evaluating each possibility one at a time — even with a large team of researchers — would take centuries. And that assumes researchers know which combinations are worth investigating in the first place.
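A back-of-the-envelope calculation makes "centuries" concrete. Only the drug count comes from the text. The disease count, team size, and review rate are illustrative assumptions, not EveryCure figures.

```python
# Rough size of the drug-disease search space and the time needed to review it by hand.
# Every value except NUM_DRUGS is an illustrative assumption.
NUM_DRUGS = 20_000                 # "more than 20,000 approved drugs" (from the text)
NUM_DISEASES = 3_000               # assumption: "thousands of known diseases"
RESEARCHERS = 100                  # assumption: a large research team
PAIRS_PER_RESEARCHER_PER_DAY = 1   # assumption: one careful review per person per day
WORKING_DAYS_PER_YEAR = 250        # assumption

pairs = NUM_DRUGS * NUM_DISEASES
pairs_per_year = RESEARCHERS * PAIRS_PER_RESEARCHER_PER_DAY * WORKING_DAYS_PER_YEAR
years_needed = pairs / pairs_per_year

print(f"{pairs:,} drug-disease pairs")
print(f"{years_needed:,.0f} years to review them all by hand")
```

Under these assumptions the team would need about 2,400 years. Even if every assumption is off by a factor of ten in the team's favor, the answer is still measured in centuries.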

Traditional research is not designed for this kind of search. A scientist studying Parkinson's disease becomes deeply expert in the biology of Parkinson's. They are unlikely to notice that a drug approved for a gastrointestinal condition also affects a protein implicated in neurodegeneration — unless they happen to read the right paper at the right time. The knowledge exists. The connection is just invisible without a system designed to find it.

Key concept

The "needle in a haystack" problem in AI

Many of the most valuable applications of AI involve searching through a very large space of possibilities to find the few that matter. This is called a combinatorial search problem. The number of possible drug-disease combinations in EveryCure's system is far too large for humans to evaluate manually — but machine learning models can score all of them simultaneously, identifying the most promising candidates for human researchers to investigate. Recognizing when a problem has this structure — large possibility space, complex relationships, need for repeated evaluation — is one of the core skills in AI strategy.
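The sketch below shows the "score everything, then shortlist" pattern in miniature. The drugs, proteins, and diseases are hypothetical placeholders. The scorer is a simple overlap of protein sets (Jaccard similarity), which stands in for a trained model; it is not EveryCure's actual scoring method.

```python
import heapq
from itertools import product

# Hypothetical toy data: proteins each drug acts on, and proteins implicated in each disease.
drug_targets = {
    "drug_a": {"P1", "P2"},
    "drug_b": {"P3"},
    "drug_c": {"P2", "P4", "P5"},
}
disease_proteins = {
    "disease_x": {"P2", "P5"},
    "disease_y": {"P3", "P6"},
    "disease_z": {"P7"},
}

def score(drug: str, disease: str) -> float:
    """Placeholder scorer: Jaccard overlap between the two protein sets."""
    a, b = drug_targets[drug], disease_proteins[disease]
    return len(a & b) / len(a | b)

# Score every possible pair, then keep only the k most promising for human review.
all_scores = ((score(d, x), d, x) for d, x in product(drug_targets, disease_proteins))
top_k = heapq.nlargest(3, all_scores)

for s, drug, disease in top_k:
    print(f"{drug} -> {disease}: {s:.2f}")
```

`heapq.nlargest` keeps only k items in memory while it scans. The same shortlist step therefore works when the generator yields tens of millions of pairs instead of nine.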

The data EveryCure needs to solve this problem is not missing. It exists scattered across scientific research articles, clinical trial databases, drug registries, genetic databases, protein interaction data, and electronic health records. The challenge is connecting it — and then reasoning across those connections at scale.
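"Connecting" scattered sources mostly means agreeing on names and merging records into one shared form. The sketch below assumes three hypothetical sources, each with its own format. It uses naive lowercase matching in place of the ontology-based identifier mapping a real pipeline would need. Each merged (subject, relation, object) fact records which sources support it.

```python
from collections import defaultdict

# Hypothetical records from three sources, each with its own format and capitalization.
literature = [("Metformin", "affects", "AMPK")]
trials_db = [{"drug": "metformin", "condition": "Type 2 Diabetes", "relation": "treats"}]
protein_db = [("metformin", "affects", "ampk"), ("AMPK", "associated_with", "Disease Q")]

def norm(name: str) -> str:
    """Naive entity normalization: trim and case-fold (an assumption for illustration)."""
    return name.strip().lower()

# Unified store: (subject, relation, object) -> set of sources that support the fact.
graph = defaultdict(set)

for s, r, o in literature:
    graph[(norm(s), r, norm(o))].add("literature")
for rec in trials_db:
    graph[(norm(rec["drug"]), rec["relation"], norm(rec["condition"]))].add("trials")
for s, r, o in protein_db:
    graph[(norm(s), r, norm(o))].add("protein_db")

for triple, sources in sorted(graph.items()):
    print(triple, sorted(sources))
```

After normalization, "AMPK" from one source and "ampk" from another become the same node. The chain metformin → ampk → disease q then exists only in the merged store, not in any single source.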

3 The EveryCure Approach

EveryCure built an AI platform designed to search for possible connections between drugs and diseases. Rather than guessing randomly or relying on a researcher's intuition, the system analyzes large datasets and produces a score for each possible drug-disease pair — an estimate of how likely that drug is to help treat that condition.

Researchers can then focus their experimental work on the most promising possibilities: the pairs with the highest scores, the most supporting biological evidence, and the clearest mechanistic rationale. The AI does not replace the scientist. It tells the scientist where to look.

Key concept

Drug repurposing

Drug repurposing (also called drug repositioning) is the process of identifying new therapeutic uses for drugs that are already approved for other conditions. Because repurposed drugs have already passed safety trials, the path from discovery to patient treatment is much shorter than developing a new drug from scratch. Famous examples include aspirin (originally for pain, now used for cardiovascular prevention) and thalidomide (notorious for birth defects, now used to treat certain cancers under strict controls). AI makes repurposing vastly more systematic by allowing researchers to search the entire space of drug-disease combinations simultaneously.

Key concept

Human-in-the-loop AI

In many AI systems, particularly in high-stakes domains like healthcare, the model generates predictions but humans make the final decisions. This is called a human-in-the-loop design. EveryCure's system suggests which drug-disease combinations are worth investigating, but researchers decide what to actually test. This design reflects a fundamental principle: AI systems are most trustworthy when their outputs are treated as inputs to human judgment rather than final answers. The higher the stakes of a wrong decision, the more important it is to keep humans in the loop.
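The human-in-the-loop split can be made explicit in code. In this sketch, all names (`Candidate`, `propose`, `review`) and the 0.5 threshold are hypothetical. The model side can only create proposals. Only the human review step can mark a candidate as approved for testing.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    drug: str
    disease: str
    model_score: float
    status: str = "proposed"   # proposed -> approved / rejected, set only by a human
    reviewer_note: str = ""

def propose(scored_pairs, threshold=0.5):
    """Model side: turn scores above a (hypothetical) threshold into proposals, never approvals."""
    return [Candidate(d, x, s) for d, x, s in scored_pairs if s >= threshold]

def review(candidate: Candidate, approve: bool, note: str) -> Candidate:
    """Human side: the only place a candidate's status can become 'approved'."""
    candidate.status = "approved" if approve else "rejected"
    candidate.reviewer_note = note
    return candidate

queue = propose([("drug_c", "disease_x", 0.67), ("drug_a", "disease_z", 0.10)])
review(queue[0], approve=True, note="plausible mechanism; check dosing literature")
```

Keeping approval in a separate function makes the design principle easy to audit. No code path lets a model score alone move a candidate into testing.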

4 The Knowledge Graph

To organize the vast amount of biomedical information it uses, EveryCure represents data as a network of connected concepts called a knowledge graph. In this graph, different types of information are linked together: drugs, diseases, genes, proteins, symptoms, and biological pathways. Rather than storing these as separate tables of data, the knowledge graph stores the relationships between them.

Key concept

Knowledge graph

A knowledge graph is a data structure that stores information as a network of nodes (entities) and edges (relationships between them). Nodes might represent drugs, diseases, genes, or proteins. Edges represent relationships: "Drug A affects Protein B," "Protein B is associated with Disease C," "Disease C shares a biological pathway with Disease D." Knowledge graphs are powerful for AI because they let a model reason across many types of information at once. This reveals indirect connections that would be invisible if each dataset were stored separately. Google's search engine, Meta's social network, and biomedical platforms such as EveryCure's all rely on knowledge graph structures.
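The definition above fits in a few lines of Python. This is a minimal sketch, not EveryCure's implementation. Each node carries an entity type, and each edge carries a relation label. Edges are stored in one direction only; a symmetric relation such as "shares a pathway with" could simply be added in both directions.

```python
class KnowledgeGraph:
    """Minimal typed knowledge graph: nodes carry a type, edges carry a relation label."""

    def __init__(self):
        self.node_type = {}   # node name -> entity type ("drug", "protein", ...)
        self.edges = {}       # node name -> list of (relation, neighbor) pairs

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, name):
        return self.edges.get(name, [])

# The three example relationships from the definition above.
kg = KnowledgeGraph()
kg.add_node("Drug A", "drug")
kg.add_node("Protein B", "protein")
kg.add_node("Disease C", "disease")
kg.add_node("Disease D", "disease")
kg.add_edge("Drug A", "affects", "Protein B")
kg.add_edge("Protein B", "associated_with", "Disease C")
kg.add_edge("Disease C", "shares_pathway_with", "Disease D")

print(kg.neighbors("Drug A"))   # [('affects', 'Protein B')]
```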

For example, a drug may affect a protein, that protein may be associated with a disease, and two diseases may share similar biological mechanisms. By connecting these pieces of information, the system can identify relationships that are not obvious when looking at each dataset separately. A drug approved to treat diabetes might, through a chain of biological connections, be relevant to a rare neurological condition — a connection no human researcher would be likely to discover through traditional literature review.
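Finding such a "chain of biological connections" is a path search. The sketch below uses breadth-first search to list every simple path of at most `max_hops` edges between a drug and a disease. The graph is a hypothetical toy mirroring the diabetes example, and it is treated as undirected for simplicity.

```python
from collections import deque

# Hypothetical toy graph; node names are placeholders, not real biology.
edges = [
    ("Diabetes drug", "Protein P"),
    ("Protein P", "Pathway W"),
    ("Pathway W", "Rare neuro condition"),
    ("Diabetes drug", "Type 2 diabetes"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def find_paths(start, goal, max_hops=3):
    """Breadth-first search for all simple paths from start to goal with at most max_hops edges."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            paths.append(path)
            continue
        if len(path) - 1 == max_hops:
            continue                      # hop limit reached; stop extending this path
        for nxt in adj.get(path[-1], []):
            if nxt not in path:           # keep paths simple (no revisited nodes)
                queue.append(path + [nxt])
    return paths

for p in find_paths("Diabetes drug", "Rare neuro condition"):
    print(" -> ".join(p))
```

With `max_hops=2` the search finds nothing. The link only appears three steps away, which is why it is easy to miss when each dataset is read separately.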

Key concept

Graph-based machine learning

Traditional machine learning models work on tabular data: rows and columns. Graph-based ML models work on network data, learning patterns from the structure of connections between nodes rather than from individual data points in isolation. In EveryCure's system, a graph neural network can learn that "drugs connected to proteins that are connected to diseases via pathway X tend to be therapeutic candidates." A tabular model sees such patterns only if someone flattens them into hand-built columns in advance. A graph model captures them directly from the connections. Graph ML is increasingly important in drug discovery, fraud detection, social network analysis, and supply chain optimization.
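The core operation inside many graph neural networks is message passing. Each node updates its representation by combining its neighbors' representations. The sketch below is one untrained, GCN-style layer in plain Python. It averages each node's features with its neighbors', applies a weight matrix, then applies ReLU. The identity matrix stands in for weights that a real model would learn by gradient descent. Nodes and features are hypothetical.

```python
def gcn_layer(adj, features, weights):
    """One GCN-style message-passing layer: mean-aggregate, linear transform, ReLU."""
    out = {}
    for node, h in features.items():
        group = [node] + adj.get(node, [])   # the node itself plus its neighbors
        # Aggregate: element-wise mean of the group's feature vectors.
        mean = [sum(features[n][i] for n in group) / len(group) for i in range(len(h))]
        # Transform: multiply by the weight matrix (rows = input dims, cols = output dims).
        projected = [sum(mean[i] * weights[i][j] for i in range(len(mean)))
                     for j in range(len(weights[0]))]
        out[node] = [max(0.0, v) for v in projected]   # ReLU non-linearity
    return out

# Toy chain: drug -- protein -- disease (hypothetical nodes, undirected edges).
adj = {"drug": ["protein"], "protein": ["drug", "disease"], "disease": ["protein"]}
# Feature 0 = "is a drug", feature 1 = "is a protein" (illustrative one-hot features).
features = {"drug": [1.0, 0.0], "protein": [0.0, 1.0], "disease": [0.0, 0.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]   # stands in for learned weights

h1 = gcn_layer(adj, features, identity)
h2 = gcn_layer(adj, h1, identity)
print(h1["disease"])   # [0.0, 0.5]
print(h2["disease"])
```

After one layer, the disease node has absorbed information only from the protein. After two layers, the drug's signal reaches it too. Stacking layers is how a GNN captures the multi-hop drug → protein → disease structure.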

5 The AI Factory: Data to Value

EveryCure's approach follows the same underlying pattern that appears in nearly every powerful AI application. Understanding this pattern — the AI Factory — is the central skill this course is designed to build.

Data → Model → Prediction → Decision → Value

Key concept

The AI Factory model

The AI Factory is a framework for understanding how organizations convert raw data into business value using machine learning. It describes a five-step loop: Data (collect and structure relevant information), Model (train a system to learn patterns from that data), Prediction (use the trained model to generate outputs for new cases), Decision (translate those predictions into actions), and Value (measure the business or social outcome). The loop is continuous — every decision generates new data that improves the next model. Organizations that build this loop well develop compounding advantages over time.

In EveryCure's system, the loop works as follows:

Data: EveryCure collects biomedical data from many sources — research articles, clinical trials, drug databases, genetic data, protein interaction networks — and organizes it in a knowledge graph.
Model: Machine learning models analyze the graph, looking for patterns that link drugs, diseases, and biological mechanisms in ways that suggest therapeutic potential.
Prediction: The system produces a score for each drug-disease pair — an estimate of how likely that drug is to have therapeutic value for that condition.
Decision: Researchers review the highest-scoring predictions and decide which combinations to pursue in laboratory and clinical testing.
Value: If a new treatment is validated, patients benefit directly. Research costs decrease. And the results of testing feed back into the model, improving future predictions.
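The five steps and the feedback arrow can be compressed into one function. This is a deliberately tiny sketch, not EveryCure's pipeline. The "model" estimates a success rate per shared mechanism from pairs already tested. It ranks untested pairs, tests only as many as a lab budget allows, and returns the enlarged labeled set for the next cycle. The mechanism tags, the budget, and the `run_lab_test` oracle are all hypothetical.

```python
def factory_cycle(labeled, candidates, mechanism_of, run_lab_test, budget=2):
    """One turn of Data -> Model -> Prediction -> Decision -> Value, with feedback."""
    # Model: success rate per shared mechanism, learned from labeled pairs.
    stats = {}
    for pair, worked in labeled.items():
        wins, trials = stats.get(mechanism_of[pair], (0, 0))
        stats[mechanism_of[pair]] = (wins + int(worked), trials + 1)

    def score(pair):
        wins, trials = stats.get(mechanism_of[pair], (0, 0))
        return (wins + 1) / (trials + 2)   # Laplace smoothing: unseen mechanisms score 0.5

    # Prediction: rank every candidate that has not been tested yet.
    untested = [p for p in candidates if p not in labeled]
    ranked = sorted(untested, key=score, reverse=True)

    # Decision: the lab budget allows testing only the top few.
    chosen = ranked[:budget]

    # Value + feedback: lab outcomes become new labeled data for the next cycle.
    results = {p: run_lab_test(p) for p in chosen}
    return {**labeled, **results}, chosen

# Hypothetical example data.
mechanism_of = {("d1", "x"): "M1", ("d2", "y"): "M1", ("d3", "z"): "M2",
                ("d4", "x"): "M1", ("d5", "y"): "M2"}
labeled = {("d1", "x"): True, ("d3", "z"): False}
lab_truth = {("d2", "y"): True, ("d4", "x"): False, ("d5", "y"): False}

labeled, tested = factory_cycle(labeled, list(mechanism_of), mechanism_of, lab_truth.get)
print(tested)
```

Each call leaves the model with more labeled data than it started with. That accumulation is the compounding advantage the AI Factory description refers to.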

Key concept

Supervised learning in prediction systems

EveryCure's scoring model is an example of supervised learning: the system is trained on known drug-disease relationships (cases where a drug is already known to treat a disease) and learns to predict which unknown pairs are likely to have a similar relationship. The model learns from labeled examples — "this combination works, this one doesn't" — and generalizes that learning to combinations that have never been tested. The quality of the predictions depends directly on the quality and completeness of the labeled training data.
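The sketch below shows supervised learning on drug-disease pairs using logistic regression, trained by gradient descent in plain Python. Logistic regression is a simple stand-in chosen for readability, not the model EveryCure uses. It assumes each pair is described by two hypothetical graph-derived features: shared protein targets and short connecting paths. The training data are invented. Note the labeling assumption: pairs marked 0 are "no known treatment relationship," not proven failures.

```python
import math

# Hypothetical training pairs: (shared targets, short connecting paths) -> label.
# 1 = known treatment; 0 = no known relationship (an assumption, not a proven failure).
X = [(3, 5), (2, 4), (4, 6), (0, 1), (1, 0), (0, 0)]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Estimated probability that pair x is a treatment relationship."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

w, b = train_logistic(X, y)
print(predict(w, b, (3, 4)))   # untested pair with strong graph support: should be high
print(predict(w, b, (1, 1)))   # untested pair with weak graph support: should be lower
```

The model learns from labeled examples and then generalizes to pairs it has never seen. Its predictions can only be as reliable as those labels and features.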

6 Why This Problem Requires AI

Not every business problem needs machine learning. But EveryCure's problem has several characteristics that make it exceptionally well-suited for AI:

Very large amounts of data — millions of research papers, clinical records, and biological databases
Many possible decisions — tens of millions of drug-disease combinations to evaluate
Complex relationships between variables — indirect biological connections spanning multiple data types
Patterns that are difficult for humans to see — connections visible only at the level of the whole network
Decisions that must be repeated many times — every new drug or disease discovery creates new combinations to evaluate

Business lesson

AI systems are most valuable in situations where the number of possibilities is too large for people to evaluate manually, where patterns exist across multiple data sources, and where the same type of decision needs to be made repeatedly at scale. EveryCure checks every one of these boxes. Learning to recognize this pattern in other industries is one of the most transferable skills in this course.

7 Risks and Limitations

Although AI can help identify possible treatments, it does not guarantee correct answers. Predictions depend on the quality of the data and the assumptions built into the model. A knowledge graph built from published research inherits all the biases of published research — diseases that have received more funding and attention will have more data, making the model more accurate for those conditions and potentially less accurate for rare or understudied diseases.

Important limitation

In healthcare, mistakes can have serious consequences. A false positive — a drug incorrectly predicted to treat a disease — could waste years of research funding and, in the worst case, harm patients if it advances to clinical testing. This is why EveryCure's human-in-the-loop design is not just a technical choice — it is an ethical requirement. The model suggests. Humans decide. The higher the stakes, the more important that distinction becomes.

Key concept

Data bias in AI systems

An AI model is only as good as the data it was trained on. If the training data over-represents certain diseases, populations, or research traditions, the model will perform better in those areas and worse in others. In biomedical AI, this means conditions that affect wealthy populations or have received significant research funding tend to have richer data and produce more accurate models. Rare diseases, conditions affecting developing-world populations, and historically underfunded research areas may produce weaker predictions — even though these are often the areas where AI assistance is most needed. Recognizing data bias is a core responsibility of anyone deploying AI in a real-world setting.
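One practical first step toward recognizing data bias is a coverage audit. Count how much evidence the graph holds for each disease, and flag those too thinly covered for their scores to be trusted. In this sketch the edge list, disease names, and the `min_edges` cutoff are all hypothetical.

```python
from collections import Counter

# Hypothetical evidence links (drug or protein, disease) drawn from published research.
evidence_edges = [
    ("drug_1", "Common disease A"), ("drug_2", "Common disease A"),
    ("protein_1", "Common disease A"), ("protein_2", "Common disease A"),
    ("drug_3", "Common disease B"), ("protein_3", "Common disease B"),
    ("protein_4", "Rare disease R"),
]
all_diseases = ["Common disease A", "Common disease B", "Rare disease R", "Neglected disease N"]

# Number of evidence edges per disease; Counter returns 0 for diseases with no edges.
coverage = Counter(disease for _, disease in evidence_edges)

def low_coverage(diseases, counts, min_edges=2):
    """Flag diseases with fewer than min_edges evidence links (threshold is an assumption)."""
    return [d for d in diseases if counts[d] < min_edges]

print(low_coverage(all_diseases, coverage))
# ['Rare disease R', 'Neglected disease N']
```

Diseases with zero edges still appear in the report because the audit iterates over the full disease list, not just over diseases present in the data. Gaps in published research are exactly what this check is meant to surface.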

8 Summary Table & Discussion Questions

AI Factory model: EveryCure mapped

Step	EveryCure example	Business purpose	Key ML concept
Data	Research articles, clinical trials, drug databases, protein networks organized in a knowledge graph	Create a unified view of biomedical knowledge that no single dataset provides	Knowledge graphs; data integration
Model	Graph neural network trained on known drug-disease relationships	Learn patterns linking drugs, targets, and diseases	Graph-based ML; supervised learning
Prediction	Score for each drug-disease pair indicating therapeutic likelihood	Prioritize which combinations are worth investigating	Scoring models; combinatorial search
Decision	Researchers select top-scored pairs for laboratory and clinical testing	Focus expensive human research on highest-probability candidates	Human-in-the-loop AI
Value	New treatments discovered faster and cheaper than traditional drug development	Patient benefit; reduced research cost; scientific progress	AI governance; feedback loops

ML vocabulary introduced in this chapter

Drug repurposing: Finding new uses for already-approved drugs
Knowledge graph: A network of nodes (entities) and edges (relationships)
Graph-based ML: Learning patterns from network structure, not just tabular data
Supervised learning: Training on labeled examples to predict outcomes
Human-in-the-loop AI: AI suggests, humans make the final decision
Combinatorial search: Searching a very large space of possible combinations
Data bias: Model performance skewed by unrepresentative training data
The AI Factory: Data → Model → Prediction → Decision → Value loop

Discussion questions

What problem is EveryCure trying to solve? Why is this problem difficult for humans to solve without AI? What specific characteristics make it well-suited for machine learning?
What is a knowledge graph? In your own words, explain why organizing biomedical data as a graph is more powerful than storing it as separate spreadsheets or databases.
Map the AI Factory: Walk through each step — Data, Model, Prediction, Decision, Value — using EveryCure as your example. Which step do you think creates the most business value, and why?
Human-in-the-loop design: EveryCure's AI suggests treatments but humans decide what to test. Is this the right design? At what point, if ever, should AI be allowed to make the final call in a healthcare context?
Data bias in healthcare AI: EveryCure's model is trained on published research. What diseases or populations might be underrepresented in that data? What are the consequences of that bias for the model's predictions?
How is this similar to AI in business? EveryCure uses the same AI Factory pattern as Netflix, Spotify, and Amazon. Pick one of those companies and explain how their system mirrors EveryCure's — and where they differ.
The AI Factory in your industry: Choose an industry you are interested in and describe what a knowledge graph might look like in that space. What would the nodes be? What would the edges represent? What could the model predict?
