Chapter 1 · Healthcare AI · Knowledge Graphs

Knowledge Graphs & AI Prediction:
How EveryCure Finds Hidden Treatments

There may already be a drug that cures your disease. Nobody has found it yet. EveryCure is using AI to change that — and their approach reveals one of the most important patterns in all of machine learning.

Company: EveryCure
Industry: Healthcare / Biotech
Core concept: Knowledge graphs & AI prediction
Also in this chapter: Lab 1: Build a Knowledge Graph in Python →

Contents

  1. Company Background
  2. The Challenge: Too Many Possibilities
  3. The EveryCure Approach
  4. The Knowledge Graph
  5. The AI Factory: Data to Value
  6. Why This Problem Requires AI
  7. Risks and Limitations
  8. Summary Table & Discussion Questions

1 Company Background

Rare diseases with no treatment: 7,000+
Approved drugs worldwide: 20,000+
Average cost to develop a new drug: $2.6B
Average drug development timeline: 10–15 yrs
Organization type: Nonprofit

EveryCure is a nonprofit organization founded on a deceptively simple premise: many patients suffer from diseases that could already be treated with existing drugs — but nobody has discovered the connection yet. The problem is not that medicine lacks the tools. The problem is that the space of possible drug-disease combinations is so large that humans cannot search it systematically.

There are more than 20,000 approved drugs in the world and thousands of known diseases. The number of possible combinations between them runs into the tens of millions. Traditional drug discovery addresses this by starting with one disease, identifying its biological mechanisms, and searching for a compound that might interfere with those mechanisms. This process typically takes 10 to 15 years and costs billions of dollars — and it focuses on new molecular entities, not existing drugs.

EveryCure takes a different approach entirely. Rather than developing new drugs from scratch, the organization searches for new uses for drugs that already exist. This is called drug repurposing — and it has a powerful economic logic: repurposed drugs already have known safety profiles, manufacturing processes, and regulatory histories. If a match can be found, the path to patient treatment is dramatically shorter and cheaper than developing a new compound.

Why this matters
EveryCure's founders believe that many patients suffer not because treatments do not exist, but because no one has systematically searched for them. AI makes that systematic search possible for the first time.

2 The Challenge: Too Many Possibilities

The core problem EveryCure faces is one of scale. With more than 20,000 approved drugs and thousands of diseases, the number of possible drug-disease pairs runs into the tens of millions. Evaluating each possibility one at a time, even with a large team of researchers, would take centuries. And that assumes researchers know which combinations are worth investigating in the first place.

Traditional research is not designed for this kind of search. A scientist studying Parkinson's disease becomes deeply expert in the biology of Parkinson's. They are unlikely to notice that a drug approved for a gastrointestinal condition also affects a protein implicated in neurodegeneration — unless they happen to read the right paper at the right time. The knowledge exists. The connection is just invisible without a system designed to find it.

Key concept
The "needle in a haystack" problem in AI
Many of the most valuable applications of AI involve searching through a very large space of possibilities to find the few that matter. This is called a combinatorial search problem. The number of possible drug-disease combinations in EveryCure's system is far too large for humans to evaluate manually — but machine learning models can score all of them simultaneously, identifying the most promising candidates for human researchers to investigate. Recognizing when a problem has this structure — large possibility space, complex relationships, need for repeated evaluation — is one of the core skills in AI strategy.
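
To make the scale concrete, the short Python sketch below works through the arithmetic. The 20,000 drug count comes from this chapter; the disease count, team size, and review rate are illustrative assumptions, not EveryCure figures.

```python
# Illustrative assumptions only: these are not EveryCure's actual figures.
N_DRUGS = 20_000                 # "more than 20,000 approved drugs" (from the text)
N_DISEASES = 2_000               # assumed stand-in for "thousands of known diseases"
TEAM_SIZE = 100                  # assumed number of researchers
PAIRS_PER_PERSON_PER_DAY = 1     # assumed manual review rate
WORKING_DAYS_PER_YEAR = 250      # assumed


def count_pairs(n_drugs: int, n_diseases: int) -> int:
    """Every drug can be paired with every disease."""
    return n_drugs * n_diseases


def years_to_review(n_pairs: int) -> float:
    """Years needed for the whole team to look at each pair once."""
    pairs_per_year = TEAM_SIZE * PAIRS_PER_PERSON_PER_DAY * WORKING_DAYS_PER_YEAR
    return n_pairs / pairs_per_year


total_pairs = count_pairs(N_DRUGS, N_DISEASES)
print(f"{total_pairs:,} drug-disease pairs")
print(f"{years_to_review(total_pairs):,.0f} years of manual review")
```

Under these assumptions the team would need 1,600 years to look at every pair once. Changing the assumed numbers changes the result, but not the conclusion that manual review does not scale.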

The data EveryCure needs to solve this problem is not missing. It exists scattered across scientific research articles, clinical trial databases, drug registries, genetic databases, protein interaction data, and electronic health records. The challenge is connecting it — and then reasoning across those connections at scale.

3 The EveryCure Approach

EveryCure built an AI platform designed to search for possible connections between drugs and diseases. Rather than guessing randomly or relying on a researcher's intuition, the system analyzes large datasets and produces a score for each possible drug-disease pair — an estimate of how likely that drug is to help treat that condition.

Researchers can then focus their experimental work on the most promising possibilities: the pairs with the highest scores, the most supporting biological evidence, and the strongest structural logic. The AI does not replace the scientist. It tells the scientist where to look.
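
The "score every pair, then look at the best ones" idea can be sketched in a few lines of Python. The scores and the drug and disease names below are made-up placeholders; in a real system the scores would come from a trained model.

```python
import heapq

# Hypothetical scores in [0, 1]; names are placeholders, not real drugs or diseases.
scores = {
    ("drug_a", "disease_x"): 0.91,
    ("drug_a", "disease_y"): 0.12,
    ("drug_b", "disease_x"): 0.47,
    ("drug_b", "disease_y"): 0.78,
    ("drug_c", "disease_x"): 0.05,
    ("drug_c", "disease_y"): 0.66,
}


def top_candidates(pair_scores, k):
    """Return the k highest-scoring (pair, score) tuples, best first."""
    return heapq.nlargest(k, pair_scores.items(), key=lambda item: item[1])


shortlist = top_candidates(scores, k=3)
for (drug, disease), score in shortlist:
    print(f"{drug} -> {disease}: {score:.2f}")
```

The shortlist is what researchers would see. The other pairs are not discarded, only deprioritized.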

Key concept
Drug repurposing
Drug repurposing (also called drug repositioning) is the process of identifying new therapeutic uses for drugs that are already approved for other conditions. Because repurposed drugs have already passed safety trials, the path from discovery to patient treatment is much shorter than developing a new drug from scratch. Famous examples include aspirin (originally for pain, now used for cardiovascular prevention) and thalidomide (notorious for birth defects, now used to treat certain cancers under strict controls). AI makes repurposing vastly more systematic by allowing researchers to search the entire space of drug-disease combinations simultaneously.

Key concept
Human-in-the-loop AI
In many AI systems, particularly in high-stakes domains like healthcare, the model generates predictions but humans make the final decisions. This is called a human-in-the-loop design. EveryCure's system suggests which drug-disease combinations are worth investigating, but researchers decide what to actually test. This design reflects a fundamental principle: AI systems are most trustworthy when their outputs are treated as inputs to human judgment rather than final answers. The higher the stakes of a wrong decision, the more important it is to keep humans in the loop.
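
One way to picture a human-in-the-loop design in code is to give the model and the reviewer separate functions, so that only the human step can change a candidate's status. This is a minimal sketch; the class, function names, and threshold are hypothetical and do not describe EveryCure's actual software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    drug: str
    disease: str
    score: float
    status: str = "pending_review"   # the model step never changes this


def propose(pair_scores, threshold):
    """Model step: turn high scores into suggestions, nothing more."""
    return [
        Candidate(drug, disease, score)
        for (drug, disease), score in pair_scores.items()
        if score >= threshold
    ]


def review(candidate, reviewer_approves):
    """Human step: only a reviewer's decision changes the status."""
    candidate.status = "approved_for_testing" if reviewer_approves else "rejected"
    return candidate


# Hypothetical scores; the reviewer's answer would come from a person, not code.
suggestions = propose(
    {("drug_a", "disease_x"): 0.91, ("drug_b", "disease_y"): 0.30},
    threshold=0.5,
)
for candidate in suggestions:
    print(candidate.drug, candidate.disease, candidate.status)
```

Every suggestion starts as "pending_review", and nothing in `propose` can approve it. That separation is the design choice the callout above describes.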

4 The Knowledge Graph

To organize the vast amount of biomedical information it uses, EveryCure represents data as a network of connected concepts called a knowledge graph. In this graph, different types of information are linked together: drugs, diseases, genes, proteins, symptoms, and biological pathways. Rather than storing these as separate tables of data, the knowledge graph stores the relationships between them.

Key concept
Knowledge graph
A knowledge graph is a data structure that stores information as a network of nodes (entities) and edges (relationships between them). Nodes might represent drugs, diseases, genes, or proteins. Edges represent relationships: "Drug A affects Protein B," "Protein B is associated with Disease C," "Disease C shares a biological pathway with Disease D." Knowledge graphs are powerful for AI because they allow a model to reason across many types of information simultaneously — finding indirect connections that would be invisible if each dataset were stored separately. Google's search engine, Meta's social network, and medical AI systems like EveryCure all rely on knowledge graph structures.
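
A common way to store a small knowledge graph is as a list of (subject, relation, object) triples. The sketch below uses the three example relationships from the callout above; the relation labels such as `associated_with` are naming choices made for this illustration.

```python
from collections import defaultdict

# (subject, relation, object) triples taken from the examples in the text.
triples = [
    ("Drug A", "affects", "Protein B"),
    ("Protein B", "associated_with", "Disease C"),
    ("Disease C", "shares_pathway_with", "Disease D"),
]


def build_graph(edges):
    """Index triples by subject so each node's outgoing edges are easy to find."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph


graph = build_graph(triples)
for node, outgoing in graph.items():
    for relation, target in outgoing:
        print(f"{node} --{relation}--> {target}")
```

Each triple is one edge. Adding a new data source means appending more triples, without redesigning any tables.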

For example, a drug may affect a protein, that protein may be associated with a disease, and two diseases may share similar biological mechanisms. By connecting these pieces of information, the system can identify relationships that are not obvious when looking at each dataset separately. A drug approved to treat diabetes might, through a chain of biological connections, be relevant to a rare neurological condition — a connection no human researcher would be likely to discover through traditional literature review.
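
A "chain of biological connections" is a path through the graph. As a rough illustration, the sketch below uses breadth-first search, a standard graph algorithm, to find the shortest chain between a drug and a disease. The node names are placeholders, and real systems use far richer methods than a single shortest path.

```python
from collections import deque

# Undirected edges; node names are placeholders, not real biomedical entities.
edges = [
    ("diabetes_drug", "protein_1"),
    ("protein_1", "diabetes"),
    ("protein_1", "pathway_1"),
    ("pathway_1", "rare_neuro_condition"),
    ("other_drug", "protein_2"),
]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it is directly connected to."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(adjacency[node]):   # sorted for reproducible order
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
print(shortest_path(adjacency, "diabetes_drug", "rare_neuro_condition"))
print(shortest_path(adjacency, "other_drug", "rare_neuro_condition"))
```

The first call finds a three-step chain through a shared protein and pathway. The second returns `None` because no chain connects those two nodes in this toy graph.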

Key concept
Graph-based machine learning
Traditional machine learning models work on tabular data: rows and columns. Graph-based ML models work on network data, learning patterns from the structure of connections between nodes rather than from individual data points in isolation. In EveryCure's system, a graph neural network can learn that "drugs connected to proteins that are connected to diseases via pathway X tend to be therapeutic candidates." A tabular model sees such patterns only if someone hand-engineers them as features, whereas a graph model captures them naturally. Graph ML is increasingly important in drug discovery, fraud detection, social network analysis, and supply chain optimization.
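
To give a feel for how a graph model uses connections, the sketch below runs the basic aggregation step that graph neural networks build on: each node updates its value using its neighbors' values. Nothing is learned here, and the three-node graph and starting numbers are made up, so treat it as an illustration of the idea rather than a working graph neural network.

```python
# A tiny chain: drug - protein - disease. Names and values are placeholders.
neighbors = {
    "drug": ["protein"],
    "protein": ["drug", "disease"],
    "disease": ["protein"],
}
# Hypothetical one-number feature per node; only the drug starts with a signal.
features = {"drug": 1.0, "protein": 0.0, "disease": 0.0}


def aggregate(node_features, adjacency):
    """Replace each node's feature with the mean over itself and its neighbors."""
    updated = {}
    for node, value in node_features.items():
        values = [value] + [node_features[n] for n in adjacency[node]]
        updated[node] = sum(values) / len(values)
    return updated


round_1 = aggregate(features, neighbors)
round_2 = aggregate(round_1, neighbors)
print("after 1 round:", round_1)
print("after 2 rounds:", round_2)
```

After one round the disease node is still at zero, because it is two steps away from the drug. After two rounds the drug's signal has reached it through the protein. That is how information about indirect connections ends up in each node's representation.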

5 The AI Factory: Data to Value

EveryCure's approach follows the same underlying pattern that appears in nearly every powerful AI application. Understanding this pattern — the AI Factory — is the central skill this course is designed to build.

Data → Model → Prediction → Decision → Value

Key concept
The AI Factory model
The AI Factory is a framework for understanding how organizations convert raw data into business value using machine learning. It describes a five-step loop: Data (collect and structure relevant information), Model (train a system to learn patterns from that data), Prediction (use the trained model to generate outputs for new cases), Decision (translate those predictions into actions), and Value (measure the business or social outcome). The loop is continuous — every decision generates new data that improves the next model. Organizations that build this loop well develop compounding advantages over time.

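In EveryCure's system, the loop works as follows:

  1. Data: research articles, clinical trials, drug databases, and protein networks are organized into a knowledge graph.
  2. Model: a graph neural network is trained on known drug-disease relationships.
  3. Prediction: the model produces a score for each drug-disease pair indicating therapeutic likelihood.
  4. Decision: researchers select the top-scored pairs for laboratory and clinical testing.
  5. Value: new treatments are discovered faster and more cheaply than through traditional drug development.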

Key concept
Supervised learning in prediction systems
EveryCure's scoring model is an example of supervised learning: the system is trained on known drug-disease relationships (cases where a drug is already known to treat a disease) and learns to predict which unknown pairs are likely to have a similar relationship. The model learns from labeled examples — "this combination works, this one doesn't" — and generalizes that learning to combinations that have never been tested. The quality of the predictions depends directly on the quality and completeness of the labeled training data.
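
The sketch below shows supervised learning at its smallest: a logistic regression trained by gradient descent on six labeled drug-disease pairs. The two features (shared protein targets and shared pathways), the data, and the learning settings are all invented for illustration. EveryCure's real model works on a knowledge graph, not on a two-column table like this one.

```python
import math

# Hypothetical training data: each row describes one drug-disease pair.
# Features: (shared protein targets, shared pathways). Label: 1 = known treatment.
X = [(3, 2), (4, 1), (2, 3), (0, 0), (1, 0), (0, 1)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Score between 0 and 1: the model's estimate that the pair is a treatment."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train(rows, labels, lr=0.1, epochs=2000):
    """Fit weights with per-example gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict(weights, bias, row) - label
            weights = [w - lr * error * x for w, x in zip(weights, row)]
            bias -= lr * error
    return weights, bias


weights, bias = train(X, y)
untested_pair = (3, 3)   # a pair with no label, like an untested combination
print(f"score for untested pair: {predict(weights, bias, untested_pair):.2f}")
```

The model never saw the pair (3, 3), yet it can score it because it resembles the labeled positives. If the labels were wrong or incomplete, the score would be wrong in the same way, which is why label quality matters so much.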

6 Why This Problem Requires AI

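Not every business problem needs machine learning. But EveryCure's problem has three characteristics that make it exceptionally well suited to AI:

  1. The number of possible drug-disease pairs is too large for people to evaluate manually.
  2. The relevant patterns are spread across multiple data sources.
  3. The same type of decision, whether a pair is worth investigating, has to be made repeatedly at scale.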

Business lesson
AI systems are most valuable in situations where the number of possibilities is too large for people to evaluate manually, where patterns exist across multiple data sources, and where the same type of decision needs to be made repeatedly at scale. EveryCure checks every one of these boxes. Learning to recognize this pattern in other industries is one of the most transferable skills in this course.

7 Risks and Limitations

Although AI can help identify possible treatments, it does not guarantee correct answers. Predictions depend on the quality of the data and the assumptions built into the model. A knowledge graph built from published research inherits all the biases of published research — diseases that have received more funding and attention will have more data, making the model more accurate for those conditions and potentially less accurate for rare or understudied diseases.

Important limitation
In healthcare, mistakes can have serious consequences. A false positive — a drug incorrectly predicted to treat a disease — could waste years of research funding and, in the worst case, harm patients if it advances to clinical testing. This is why EveryCure's human-in-the-loop design is not just a technical choice — it is an ethical requirement. The model suggests. Humans decide. The higher the stakes, the more important that distinction becomes.

Key concept
Data bias in AI systems
An AI model is only as good as the data it was trained on. If the training data over-represents certain diseases, populations, or research traditions, the model will perform better in those areas and worse in others. In biomedical AI, this means conditions that affect wealthy populations or have received significant research funding tend to have richer data and produce more accurate models. Rare diseases, conditions affecting developing-world populations, and historically underfunded research areas may produce weaker predictions — even though these are often the areas where AI assistance is most needed. Recognizing data bias is a core responsibility of anyone deploying AI in a real-world setting.
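
A simple way to check for this kind of bias is to measure accuracy separately for each group instead of reporting one overall number. The evaluation records below are invented for illustration, and the group names are hypothetical.

```python
from collections import defaultdict

# Hypothetical evaluation records: (disease group, was the model's prediction correct?).
results = [
    ("well_studied", True), ("well_studied", True),
    ("well_studied", True), ("well_studied", False),
    ("rare", True), ("rare", False),
    ("rare", False), ("rare", False),
]


def accuracy_by_group(records):
    """Return the fraction of correct predictions for each group."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, is_correct in records:
        total[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / total[group] for group in total}


per_group = accuracy_by_group(results)
overall = sum(is_correct for _, is_correct in results) / len(results)
print(f"overall accuracy: {overall:.2f}")
for group, accuracy in per_group.items():
    print(f"{group}: {accuracy:.2f}")
```

In this toy data the overall accuracy of 0.50 hides a large gap: 0.75 for well-studied diseases and 0.25 for rare ones. A single headline number can look acceptable while the model fails the group that needs it most.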

8 Summary Table & Discussion Questions

AI Factory model: EveryCure mapped

Step | EveryCure example | Business purpose | Key ML concept
Data | Research articles, clinical trials, drug databases, protein networks organized in a knowledge graph | Create a unified view of biomedical knowledge that no single dataset provides | Knowledge graphs; data integration
Model | Graph neural network trained on known drug-disease relationships | Learn patterns linking drugs, targets, and diseases | Graph-based ML; supervised learning
Prediction | Score for each drug-disease pair indicating therapeutic likelihood | Prioritize which combinations are worth investigating | Scoring models; combinatorial search
Decision | Researchers select top-scored pairs for laboratory and clinical testing | Focus expensive human research on highest-probability candidates | Human-in-the-loop AI
Value | New treatments discovered faster and cheaper than traditional drug development | Patient benefit; reduced research cost; scientific progress | AI governance; feedback loops

ML vocabulary introduced in this chapter

Drug repurposing: Finding new uses for already-approved drugs
Knowledge graph: A network of nodes (entities) and edges (relationships)
Graph-based ML: Learning patterns from network structure, not just tabular data
Supervised learning: Training on labeled examples to predict outcomes
Human-in-the-loop AI: AI suggests, humans make the final decision
Combinatorial search: Searching a very large space of possible combinations
Data bias: Model performance skewed by unrepresentative training data
The AI Factory: Data → Model → Prediction → Decision → Value loop

Discussion questions

  1. What problem is EveryCure trying to solve? Why is this problem difficult for humans to solve without AI? What specific characteristics make it well-suited for machine learning?
  2. What is a knowledge graph? In your own words, explain why organizing biomedical data as a graph is more powerful than storing it as separate spreadsheets or databases.
  3. Map the AI Factory: Walk through each step — Data, Model, Prediction, Decision, Value — using EveryCure as your example. Which step do you think creates the most business value, and why?
  4. Human-in-the-loop design: EveryCure's AI suggests treatments but humans decide what to test. Is this the right design? At what point, if ever, should AI be allowed to make the final call in a healthcare context?
  5. Data bias in healthcare AI: EveryCure's model is trained on published research. What diseases or populations might be underrepresented in that data? What are the consequences of that bias for the model's predictions?
  6. How is this similar to AI in business? EveryCure uses the same AI Factory pattern as Netflix, Spotify, and Amazon. Pick one of those companies and explain how their system mirrors EveryCure's — and where they differ.
  7. The AI Factory in your industry: Choose an industry you are interested in and describe what a knowledge graph might look like in that space. What would the nodes be? What would the edges represent? What could the model predict?