Note 1: This post is part 1 of a three-part series on healthcare, knowledge graphs, and lessons for other industries
Note 2: All images by author
Abstract
It’s the first half of the nineteenth century, and you’re feeling an almost paralyzing pain in your stomach. You now have a choice. You learn to live with that pain for the rest of your life (which may be only weeks or months, depending on what’s causing the pain) or you venture to the doctor, a nightmarish experience likely involving torturous treatments like bloodletting, laxatives, induced vomiting, or downing vials of mercury (Hager 52).
There is no knowledge about how diseases spread, so going into a crowded hospital could mean exposure to smallpox and cholera (Kirsch and Ogas 80). If you’re unlucky enough to need surgery (or have a physician prescribe an unneeded one; again, there is almost no knowledge of disease pathways), there will be no anesthesia. Finding the best surgeon likely means finding the fastest one, who can work as quickly as possible to minimize the time orderlies have to restrain you while you’re shrieking and writhing on the table. If you survive the surgery, you still have a significant chance of dying of an infection, since there’s no knowledge of germ theory and so no aseptic techniques (Kirsch and Ogas 45). And if you’re a pregnant woman, you can expect the maternity ward to be even more fucked up. Nearly 15 percent of infants born in the UK in the mid-nineteenth century died in infancy.
Compare that with the medical care provided in any developed country today, and let’s just say we’ve come a long way. The infant mortality rate in developed countries is now less than 6 per 1,000 live births, or 0.6 percent. The average life expectancy in developed countries is typically above 80, compared with about 40 in the mid-nineteenth century. We have drugs or other treatments for almost all of the most common diseases, and humanity is curing more every day. The future looks even more promising, especially with the growing capabilities of AI and the funding behind them. The Chan Zuckerberg Initiative (CZI), for example, aims to help scientists cure, prevent, or manage all diseases by the end of the twenty-first century.
How has healthcare made this progress? And why does healthcare continue to attract disproportionate investment in AI today? It’s not merely better data; it’s better structure around knowledge. Long before computers, medicine began developing shared understandings of diseases and causal relationships, controlled vocabularies to catalog real-world entities, and data standards to ensure observations were empirical and replicable. Taken together, these frameworks form what we’d now recognize as a knowledge graph.
At a high level, knowledge graphs solve a recurring set of problems that become unavoidable as domains scale:
- Search and retrieval across fragmented systems, formats, and terminologies
- Discovery and design in complex, interconnected systems
- Reuse and repurposing of existing knowledge and assets
- Decision support under uncertainty, with explainable reasoning
- Recommendation and personalization grounded in domain semantics
- Governance, traceability, and regulatory compliance
Mature domain knowledge graphs in healthcare are the reason drugs can be designed to target specific diseases, why your doctor knows about the harmful side effects of a drug in Japan even if it goes by a different name there, and why physicians can aggregate and learn from observations from millions of clinical encounters and experiments, often in real time.
In this three-part series, I hope to provide some context and insights around how knowledge graphs (and their precedents) have worked in healthcare, explain how healthcare became the industry leader in knowledge graphs, and share some potential lessons for other industries grappling with similar challenges.
What is a knowledge graph?
A knowledge graph is a layered knowledge system in which ontologies define meaning, controlled vocabularies catalog entities, and observational data provides evidence, allowing knowledge to accumulate, evolve, and be reasoned over as understanding improves.
An ontology defines classes and the relationships between classes; it is the theory underpinning the knowledge graph. In medicine, classes are things like pathogens, diseases, and drugs. The ontology defines the constraints and causal assumptions for how these things relate. For example, pathogens are organisms and can cause diseases. Drugs are chemical substances that can target pathogens and, potentially, inhibit diseases. The ontology deals with classes rather than instances: it doesn’t tell you which pathogens cause which diseases or which drugs inhibit which pathogens.
The instances are defined in controlled vocabularies. Controlled vocabularies are catalogs of instances of the classes defined in the ontology. For example, there are thousands of known pathogens that can cause diseases in humans: everything from viruses to bacteria to parasites. There are also thousands of drugs and thousands of diseases. These instances of classes are cataloged and maintained by experts and are regularly updated as we learn more about them. Some controlled vocabularies in healthcare are called ‘omics’ because they are about fields that end with the suffix “omics,” such as genomics, proteomics, and metabolomics.
Note: I’m using the broad term “controlled vocabularies” here as an umbrella term that includes taxonomies, glossaries, dictionaries, reference data, and thesauri. There are differences between these, but for the purposes of this high-level article, we’re just going to use the term controlled vocabulary for all of them.

The way we learn more about the world is through observation, and in healthcare these observations are treated as evidence. Clinical trials and laboratory experiments produce observational data that justify, refine, or refute claims about how entities in our controlled vocabularies relate to each other. How do we know that the pathogen Treponema pallidum causes the disease syphilis? Because scientists ran an experiment, measured the outcome, and produced evidence. How do we know that Salvarsan targets and destroys Treponema pallidum and cures syphilis? Because scientists ran clinical studies and measured the effects of treating syphilis patients with Salvarsan.
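To make the three layers concrete, here is a minimal sketch in Python. The relation names (`causes`, `inhibits`), the `Assertion` type, and the `is_valid` helper are illustrative assumptions rather than any real standard or library, and the evidence strings are placeholders, not citations.

```python
from dataclasses import dataclass

# Layer 1: ontology. Each relation maps to (subject class, object class).
ONTOLOGY = {
    "causes": ("Pathogen", "Disease"),
    "inhibits": ("Drug", "Pathogen"),
}

# Layer 2: controlled vocabulary. Each known instance and its class.
VOCABULARY = {
    "Treponema pallidum": "Pathogen",
    "syphilis": "Disease",
    "Salvarsan": "Drug",
}


# Layer 3: evidence-backed assertions about how instances relate.
@dataclass(frozen=True)
class Assertion:
    subject: str
    relation: str
    obj: str
    evidence: str  # placeholder pointer to a study, not a real citation


def is_valid(a: Assertion) -> bool:
    """True if the assertion uses known terms and respects the ontology."""
    if a.relation not in ONTOLOGY:
        return False
    subject_class, object_class = ONTOLOGY[a.relation]
    return (
        VOCABULARY.get(a.subject) == subject_class
        and VOCABULARY.get(a.obj) == object_class
    )


facts = [
    Assertion("Treponema pallidum", "causes", "syphilis", "lab experiment (placeholder)"),
    Assertion("Salvarsan", "inhibits", "Treponema pallidum", "clinical study (placeholder)"),
]
# A disease cannot "inhibit" a drug under this ontology, so this is rejected.
bad = Assertion("syphilis", "inhibits", "Salvarsan", "none")

print([is_valid(f) for f in facts], is_valid(bad))
```

The point of the sketch is the separation of concerns: the ontology changes rarely, the vocabulary grows as new entities are cataloged, and assertions are added or revised as evidence comes in.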

Connecting entities like this creates a graph. Entities in a graph are known as nodes, and the connections are called edges. Graphs can contain millions of nodes and edges, and with this structure, patterns start to emerge. For example, you can identify the most important or impactful nodes in a graph, distinguish clusters of nodes that are densely connected, or find the shortest paths between different entities. These techniques (often called graph analytics) are widely used in medicine as part of what is known as network medicine to identify disease mechanisms and potential therapeutic targets (Barabási, Gulbahce, Loscalzo, 2011). This is all possible with a graph, but since we have an ontology, we have more than just a graph. We have a knowledge graph.
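As a rough illustration of two of the graph analytics mentioned above (finding the most connected node and the shortest path between two entities), here is a small sketch using only the Python standard library. The edges are illustrative, and "Drug B", "Pathogen C", and "Disease D" are hypothetical placeholders. Real network medicine work uses far larger graphs and more sophisticated measures.

```python
from collections import deque

# Illustrative edges only. "Drug B", "Pathogen C", and "Disease D" are
# hypothetical placeholders, not real entities.
EDGES = [
    ("Salvarsan", "Treponema pallidum"),
    ("Treponema pallidum", "syphilis"),
    ("Drug B", "Treponema pallidum"),
    ("Drug B", "Pathogen C"),
    ("Pathogen C", "Disease D"),
]

# Build an undirected adjacency map: node -> set of neighbours.
graph = {}
for a, b in EDGES:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Degree centrality in its simplest form: count each node's connections.
degree = {node: len(neighbours) for node, neighbours in graph.items()}
most_connected = max(degree, key=degree.get)


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(most_connected)
print(shortest_path(graph, "Salvarsan", "Disease D"))
```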
Connections in a knowledge graph represent explicit assertions about the world: facts. The knowledge graph isn’t just saying, “Salvarsan is connected to Treponema pallidum.” It’s saying, “Salvarsan inhibits Treponema pallidum.” It also states that “Treponema pallidum causes syphilis.” These two facts, combined with the logic encoded in the ontology, enable the knowledge graph to infer a new relationship or fact: namely, that Salvarsan may treat or cure syphilis. This is known as reasoning, or the ability to derive “logical consequences from a set of facts or axioms.” Knowledge graphs excel at this because they make both the facts and the rules for combining them explicit.
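The inference described above can be sketched as a single hand-written rule over subject-relation-object triples. This is a simplified illustration, not how a production reasoner works; the `may_treat` relation name and the `infer_may_treat` function are assumptions made for this example.

```python
# Facts as (subject, relation, object) triples, taken from the text above.
facts = {
    ("Salvarsan", "inhibits", "Treponema pallidum"),
    ("Treponema pallidum", "causes", "syphilis"),
}


def infer_may_treat(facts):
    """Rule: drug inhibits pathogen AND pathogen causes disease
    => drug may_treat disease.

    Returns a dict mapping each inferred triple to the two facts that
    support it, so the reasoning stays explainable.
    """
    inferred = {}
    for drug, rel1, pathogen in facts:
        if rel1 != "inhibits":
            continue
        for subject, rel2, disease in facts:
            if rel2 == "causes" and subject == pathogen:
                inferred[(drug, "may_treat", disease)] = [
                    (drug, rel1, pathogen),
                    (subject, rel2, disease),
                ]
    return inferred


for conclusion, support in infer_may_treat(facts).items():
    print(conclusion, "because", support)
```

Returning the supporting facts alongside each conclusion is a deliberate choice: it mirrors the claim that a knowledge graph keeps both the facts and the rules for combining them explicit.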
Medicine has been using this knowledge management structure for decades. Scientists run experiments and learn new things. The findings of these experiments lead to updates in the controlled vocabularies and/or the relationships between entities in the controlled vocabularies. Gene X is related to protein Y, which is involved in biological process Z. As the number of entities and relationships grows, so does our knowledge. Sometimes, but much less frequently, the ontology changes. A substantial change in an ontology is not just an incremental increase in knowledge, but often a change in the way we understand the world.
Healthcare is the leader in knowledge graphs because it excels in all three of these layers. It has spent decades refining causal models for how the natural world works; meticulously cataloging millions of diseases, drugs, proteins, and everything else relevant to medicine; and conducting empirical, replicable experiments with standardized data outputs. These foundations were reinforced by strong regulatory pressure that mandated standardization and comparability of evidence, widespread pre-competitive collaboration and public funding, and early adoption of open, vendor-neutral semantic standards. Combined, these factors created the conditions in which knowledge graphs could thrive as core infrastructure rather than experimental technology.
What problems do knowledge graphs solve?
Once you have entities mapped together, validated with real-world evidence, and grounded in causal pathways, you have a knowledge graph, and you can do all kinds of cool stuff. I’ll go through some of the most prominent use cases of knowledge graphs in healthcare today and how they might apply to other domains.
Search
Probably the most common use case for knowledge graphs is search. Modern healthcare requires the ability to retrieve relevant, connected context across heterogeneous and multimodal data. Suppose you work at a large pharmaceutical company and you want to know everything about a given drug. You might want to repurpose this drug, assess its safety risk, or compare it with a competitor. Or maybe the FDA asked you for information about it. You’d have to search relational databases for experimental data, content management systems for clinical trial reports, and multiple third-party databases for established public or commercial knowledge. Not only is the data scattered across disconnected systems and in different formats (relational, text, slides, audio), the drug may also go by different names. The company may have outsourced clinical trials to a UK company that called it by its generic name, for example.
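A minimal sketch of the naming problem described above: resolve whatever name the user types to one canonical identifier, then search every system for all of that identifier's synonyms. Everything here is hypothetical, including the drug names, the `DRUG-001` identifier, and the record contents, and naive substring matching stands in for real entity linking.

```python
# Hypothetical synonym catalog: canonical ID -> every name the drug goes by
# (brand name, generic name, internal compound code), all lowercased.
SYNONYMS = {
    "DRUG-001": {"examplex", "exampletinib", "cmp-123"},
}

# Hypothetical records pulled from disconnected systems.
RECORDS = [
    {"system": "relational_db", "text": "Assay results for CMP-123, batch 7"},
    {"system": "cms", "text": "UK trial report: exampletinib was well tolerated"},
    {"system": "third_party", "text": "Examplex prescribing information"},
    {"system": "cms", "text": "Quarterly facilities memo"},
]


def resolve(name):
    """Map any known name to its canonical ID, or None if unknown."""
    needle = name.lower()
    for canonical, names in SYNONYMS.items():
        if needle == canonical.lower() or needle in names:
            return canonical
    return None


def search(name):
    """Return every record that mentions any synonym of the named drug."""
    canonical = resolve(name)
    if canonical is None:
        return []
    terms = SYNONYMS[canonical] | {canonical.lower()}
    return [r for r in RECORDS if any(t in r["text"].lower() for t in terms)]


# Searching by brand name also finds records filed under the generic name
# and the internal compound code.
for record in search("Examplex"):
    print(record["system"], "->", record["text"])
```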
As generative AI has become more widely adopted, retrieval has emerged as a critical capability in every industry. Large Language Models (LLMs) have been trained on lots of data, but not your data, so the ability to retrieve relevant internal context is crucial when working with these models. We now call this context engineering: “the art and science of filling the context window with just the right information at each step of an agent’s trajectory,” as described by Lance Martin of LangChain.
Healthcare is uniquely well positioned to take advantage of this new era of AI because of its longstanding investment in knowledge graphs. Tasks like filing regulatory reports are a lot easier if you are able to retrieve the relevant internal context, evidence, and facts. There are companies, like Weave, that are using knowledge graphs to do exactly this. They use the power of the graph to retrieve the relevant information and an LLM to summarize and answer the regulatory questions, enabling automated report generation.
Large financial organizations like Morgan Stanley, Bloomberg, HSBC, and JPMorgan Chase are also using knowledge graphs to unify data silos and build research assistants and advanced search capabilities for their employees and clients.
Discovery and Design
By understanding the way different entities interact, both in theory and in the lab, scientists working in drug discovery can design drugs for a purpose. Rather than testing different compounds blindly, hoping they find something useful, drug hunters can now work backwards from a desired outcome (such as lowering blood pressure) to identify candidate compounds, while accounting for patient differences (genetics, age, sex), interconnected systems, and potential adverse effects, all while complying with regulatory constraints. Many of the world’s largest pharmaceutical companies, including AbbVie, AstraZeneca, GSK, Pfizer, Merck, Novartis, Novo Nordisk, Roche, and Sanofi, use knowledge graphs for drug discovery. There are also companies that focus entirely on curating healthcare knowledge graphs for drug discovery, like BioRelate and BenevolentAI.
This same type of problem appears in many other industries. Banks often need to create financial products (e.g., structured notes) that achieve a desired outcome (e.g., higher yield with limited downside) while accounting for interconnected systems, mitigating adverse effects, and complying with regulatory constraints. Likewise, public policy practitioners often need to create interventions that achieve a desired outcome (e.g., reducing poverty) while accounting for diverse local contexts (e.g., geography, culture, climate), interconnected systems, and potential adverse effects.
Repurposing
Rather than designing an entirely new drug to achieve an outcome, it’s sometimes easier to repurpose an existing drug. When Dr. David Fajgenbaum was diagnosed with a rare immune disorder while still in medical school, he was told he had weeks to live, and a priest was called in to read him his last rites. While there was not enough time to design a new drug, there was time to repurpose something off the shelf. That’s exactly what he did. He found a drug originally meant to prevent organ transplant rejection and used it on himself. His disease has been in remission for 11 years, he finished medical school, and he started the nonprofit Every Cure to “ensure that patients don’t suffer while potential treatments hide in plain sight.” Every Cure uses, among other techniques, knowledge graphs.
Drug repurposing is about taking an existing product, understanding its underlying structure, and safely applying it in a new context. Public policy follows the same pattern. Practitioners identify interventions that worked in one context, understand why they worked, and reapply them elsewhere. Likewise, many companies are sitting on a gold mine of data, collected for some purpose long forgotten. Once the meaning and context of that data are understood, it can be repackaged and reused for different purposes.
Decision support
Healthcare professionals often rely on decision support systems to assist in making decisions that involve many interconnected factors and incomplete data (Yang et al.; Al Khatib et al.; Zhang et al.). Every day, physicians need to make decisions about how to diagnose and treat their patients based on limited, evolving knowledge. An individual patient’s electronic health record (EHR) can be sparse and have limited predictive power (Yang et al.). Knowledge graphs give the physician the ability to connect EHRs with controlled vocabularies (diseases, symptoms, drugs), observational data from previous studies, and, increasingly, patient-generated data from wearables (Al Khatib et al.).
This helps the physician make more informed diagnoses and treatment recommendations by grounding decisions in what is known from related cases, populations, and clinical evidence, while still accounting for the specific context of the patient. These systems are especially valuable because the underlying reasoning can be made explicit and explainable, in contrast to many black-box AI solutions. Companies like Evidently are building decision support tools, powered by knowledge graphs and AI, to connect patient data across EHRs and surface clinical insights to help medical practitioners make better, more informed, and explainable decisions in real time.
Other industries are also using knowledge graphs to power decision support tools. The MITRE Corporation, the R&D organization, publishes MITRE ATT&CK, a knowledge graph of adversary tactics and techniques for decision support in cybersecurity operations. OpenCorporates is an open legal-entity knowledge graph that is used by companies like Embody for decision support relating to due diligence.
Recommender systems
While decision support focuses on diagnostic accuracy, safety, and adherence to clinical guidelines, recommender systems in healthcare focus on personalizing and prioritizing options for patients. These systems often rely on patient-centric knowledge graphs (sometimes called Individualized Knowledge Graphs or Personalized Health Knowledge Graphs) to integrate medical history, EHR data, reference knowledge, and data from wearables. Rather than determining whether a clinical decision is correct, recommender systems surface and rank relevant options, such as treatment plans, lifestyle interventions, follow-up actions, or care pathways, that are most appropriate for a specific patient at a given moment.
Other industries use recommender systems powered by knowledge graphs and semantic technology even more than healthcare. Almost everything you buy and everything you watch is fed to you via recommendation systems. Online retailers like Amazon use them to suggest things you might like to purchase, streaming services like Netflix use them to serve up your next binge-watch, and LinkedIn uses them to recommend jobs to candidates and candidates to recruiters.
Governance
Healthcare is a highly regulated industry. Drug companies need to comply with regulations to ensure they are monitoring and assessing any potential adverse effects of their drugs, something called pharmacovigilance. They also store individuals’ health data, which is extremely private and sensitive, and need to comply with regulations covering it, like the California Consumer Privacy Act (CCPA) or the General Data Protection Regulation (GDPR). To do this, they focus on something called data lineage: the systematic tracking of how data is generated, transformed, and used across systems. Knowledge graphs facilitate good data governance by connecting domain knowledge to information about the organization itself, such as business processes, org structure, ownership, roles, and policies. Organizations can then trace how data moves through systems, identify who is responsible for it, understand which teams are allowed to use it and for what purposes, and enforce governance rules (Oliveira et al.).
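A rough sketch of what tracing lineage through a graph can look like. The dataset names, owners, and allowed purposes below are all hypothetical, and the policy rule (a derived dataset inherits every restriction found upstream) is one possible design assumed for this example, not a statement of what any regulation requires.

```python
# Hypothetical lineage graph: dataset -> the datasets it was derived from.
DERIVED_FROM = {
    "safety_report": ["adverse_events_clean"],
    "adverse_events_clean": ["adverse_events_raw", "patient_registry"],
    "adverse_events_raw": [],
    "patient_registry": [],
}

# Hypothetical organizational metadata attached to source datasets.
OWNER = {
    "adverse_events_raw": "Safety Ops",
    "patient_registry": "Clinical Data Office",
}
ALLOWED_PURPOSES = {
    "adverse_events_raw": {"pharmacovigilance", "research"},
    "patient_registry": {"pharmacovigilance"},
}


def upstream(dataset):
    """Return every dataset that the given dataset was derived from,
    directly or transitively."""
    found = set()
    stack = list(DERIVED_FROM.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(DERIVED_FROM.get(current, []))
    return found


def may_use(dataset, purpose):
    """Allow a purpose only if no dataset in the lineage (including the
    dataset itself) has a policy that excludes it."""
    for source in upstream(dataset) | {dataset}:
        allowed = ALLOWED_PURPOSES.get(source)
        if allowed is not None and purpose not in allowed:
            return False
    return True


def responsible_owners(dataset):
    """List the owners of every upstream dataset that has one recorded."""
    return sorted({OWNER[s] for s in upstream(dataset) if s in OWNER})


print(sorted(upstream("safety_report")))
print(responsible_owners("safety_report"))
print(may_use("safety_report", "pharmacovigilance"), may_use("safety_report", "research"))
```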
Financial services firms, like those in healthcare, rely on knowledge graph approaches to support enterprise data governance. Recent research proposes extending these same foundations to AI governance by linking data, policies, and decisions in a unified semantic layer. In regulated environments, governance is not a secondary concern; it is the mechanism by which trust, accountability, and explainability are enforced at scale.
Conclusion
Knowledge graphs are not a recent invention, nor are they a side effect of modern AI. They are a way of organizing knowledge that allows meaning to be shared, evidence to accumulate, and reasoning to remain explicit as understanding evolves. By separating theory (ontologies), instances (controlled vocabularies), and evidence (observational data), knowledge graphs make it possible to build systems that do more than store facts: they support discovery, explanation, reuse, and trust.
Long before large language models, healthcare invested heavily in defining shared concepts, cataloging the natural world, and standardizing how observations are documented and evaluated. Over time, these practices created dense, interconnected knowledge structures that could be extended, queried, and reasoned over as new discoveries emerged. Modern AI systems are powerful precisely because they are now being layered on top of this foundation, not because they replace it.
In the next part of this series, I’ll look more closely at how healthcare became the global leader in knowledge graph maturity. That story includes regulatory pressure, pre-competitive collaboration, public funding of shared knowledge, and early commitment to open standards. In the final part, I’ll step back from healthcare entirely and explore what other industries (finance, policy, manufacturing, energy, and others) can learn from this trajectory as they attempt to build AI-ready systems of their own.
The central claim is simple: progress at scale depends less on smarter models than on better structure. Healthcare learned this lesson early. Others are now being forced to learn it quickly.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
Bibliography
Al Khatib, Hassan S., et al. “Patient-centric knowledge graphs: a survey of current methods, challenges, and applications.” Frontiers in Artificial Intelligence 7 (2024): 1388479.
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011 Jan;12(1):56-68. doi: 10.1038/nrg2918. PMID: 21164525; PMCID: PMC3140052.
Hager, Thomas. Ten Drugs: How Plants, Powders, and Pills Have Shaped the History of Medicine. Harry N. Abrams, 2019.
Isaacson, Walter. The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race. Simon & Schuster, 2021.
Kirsch, Donald R., and Ogi Ogas. The Drug Hunters: The Improbable Quest to Discover New Medicines. Arcade, 2017.
Oliveira, Miguel AP, et al. “Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0–Application to a Unified Clinical Data Model.” arXiv preprint arXiv:2311.02082 (2023).
Rajabi, E.; Kafaie, S. Knowledge Graphs and Explainable AI in Healthcare. Information 2022, 13, 459. https://doi.org/10.3390/info13100459
Yang, Carl, et al. “A review on knowledge graphs for healthcare: Resources, applications, and promises.” arXiv preprint arXiv:2306.04802 (2023).
Yong Zhang, Ming Sheng, Rui Zhou, Ye Wang, Guangjie Han, Han Zhang, Chunxiao Xing, Jing Dong. “HKGB: An Inclusive, Extensible, Intelligent, Semi-auto-constructed Knowledge Graph Framework for Healthcare with Clinicians’ Expertise Incorporated.” Information Processing & Management (2020). https://doi.org/10.1016/j.ipm.2020.102324.
