Understanding Alzheimer’s: Building Knowledge Graphs from Unstructured Data with Gemini

16 min read · Feb 1, 2025

Alzheimer’s disease (AD) is the most common cause of dementia, accounting for 60 to 80% of cases. It is a progressive neurodegenerative disorder that primarily affects memory, thinking, and behavior.

Given recent advances in technology, protein folding, medicine, and pharmacology, it is reasonable to expect cures for some diseases in our lifetime. We are also expected to live longer than our recent ancestors, so staying healthy as we age matters. As you will see below, some risk factors for Alzheimer’s cannot be changed, such as genetics and DNA aging, but many can be controlled, such as cardiovascular disease, smoking, alcohol abuse, and obesity.

Here, I will use 5 PDFs (unstructured data: four technical articles about Alzheimer’s and one about CRISPR) to build a Knowledge Graph with the help of Google’s Gemini and Neo4j, to better understand the disease, its causes, its effects, and its possible treatments (if they exist) at the gene and protein level.

As you will see later, querying the graph showed that one possible cause of Alzheimer’s disease is mutation of the following genes:

APP gene on chromosome 21
Presenilin 1 (PSEN1) gene on chromosome 14
Presenilin 2 (PSEN2) gene on chromosome 1
ε4 allele (gene variation) of Apolipoprotein E (APOE)

These mutations make it easier for the amyloid-beta (Aβ) protein to accumulate in the brain. This is the main reason I added a CRISPR document to the PDFs folder.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a revolutionary gene-editing technology that allows scientists to make precise changes to DNA. It works like molecular scissors that can cut specific sections of genetic code, allowing researchers to remove, add, or alter genes in organisms.

By the end of this article, you will have learned how to build a Knowledge Graph and a GraphRAG pipeline from unstructured documents, and you will better understand the etiopathology of Alzheimer’s disease.

First, let’s review key epidemiological aspects of AD: the distribution, determinants, and control of the disease in the population. This will help us identify risk factors for AD, track disease patterns, and develop strategies for prevention and control by running suitable queries against the Knowledge Graph.

Prevalence

Global Prevalence: Approximately 55 million people worldwide live with dementia, and Alzheimer’s disease is the leading cause.

Age-Related Risk: Incidence increases exponentially with age. Around 5 to 10% of people over 65 years are affected, and 30 to 50% of those over 85 years have Alzheimer’s disease.

Gender Differences: Women are more likely to develop Alzheimer’s than men, partly due to longer life expectancy.

Regional Variations: Higher prevalence is reported in high-income countries, but increasing trends are observed in low and middle-income countries.

Incidence

The incidence rate roughly doubles every 5 years after the age of 65. It is approximately 10 per 1,000 person-years for people aged 65–69 (about 1% per year), and around 80–90 per 1,000 person-years for those aged 85 and older (about 8 to 9% per year).

Risk Factors

Non-modifiable: age (strongest risk factor), genetics (e.g., APOE-ε4 allele), family history (first-degree relatives have higher risk) and sex (higher prevalence in women).

Modifiable: cardiovascular disease (hypertension, diabetes, obesity), smoking (highly oxidative), alcohol use, physical inactivity, social isolation and depression, traumatic brain injury and poor diet (high saturated fats, low antioxidants).

Mortality

Alzheimer’s disease is among the top 10 leading causes of death worldwide. It is the 6th leading cause of death in the U.S.

Economic Impact

The global cost of dementia care has surpassed $1 trillion per year and is expected to rise as populations continue to age. This economic burden also affects quality of life: longer lifespans often mean greater reliance on Social Security (which may become a major problem for governments in the future) while facing the growing expenses of elderly care.

Now, let’s code the solution:

First, you will need to create a Neo4j instance on Aura. Please refer to my other article to do so. If you already have tabular data, you can read this other article of mine instead. Here, we will use unstructured data: PDFs of technical articles from NIH.gov.

Create your Aura instance, and note the username, the password, and the instance address (URI).

Next, create a Python environment. I suggest not using an Anaconda environment, especially if you are using VS Code, because its pre-installed libraries may conflict with the ones we need. Use a clean environment and activate it.

python3 -m venv neo4j-env
. neo4j-env/bin/activate

Now let’s install the necessary libraries:

pip install fsspec langchain-text-splitters tiktoken numpy torch vertexai
pip install "neo4j-graphrag[google]"
pip install google-cloud google-cloud-aiplatform

Import the libraries and add the Neo4j credentials in the notebook:

import os  # needed below to set GOOGLE_APPLICATION_CREDENTIALS
import json
import asyncio
from typing import Any

import neo4j
import vertexai
from vertexai.generative_models import GenerationConfig
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel
from neo4j_graphrag.embeddings.base import Embedder
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.generation import RagTemplate
from neo4j_graphrag.generation.graphrag import GraphRAG
from neo4j_graphrag.indexes import create_vector_index
from neo4j_graphrag.llm import VertexAILLM
from neo4j_graphrag.retrievers import VectorCypherRetriever, VectorRetriever

NEO4J_URI = "neo4j+s://642bhudyg.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "g8ftf6a8vgw87vg8gwv8g7v8ag8v"

vertexai.init(project="your-project", location="us-central1")

Then, we define the Gemini 1.5 Flash LLM that will create the JSON structure used to build the Knowledge Graph, and an embedding model (text-embedding-004) that will create the embedding of each text chunk, stored on the Chunk nodes. Here, I wrote a custom VertexAIEmbeddings class because I was getting an error from the neo4j_graphrag.embeddings.vertexai module. Add your Google Vertex AI credentials.

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="service_account_key.json"

generation_config = GenerationConfig(temperature=0.8)

llm = VertexAILLM(
    model_name="gemini-1.5-flash", generation_config=generation_config
)


class VertexAIEmbeddings(Embedder):
    def __init__(self, model: str = "text-embedding-004") -> None:
        self.vertexai_model = TextEmbeddingModel.from_pretrained(model)

    def embed_query(
        self,
        text: str,
        task_type: str = "RETRIEVAL_QUERY",
        **kwargs: Any
    ) -> list[float]:
        inputs = [TextEmbeddingInput(text, task_type)]
        embeddings = self.vertexai_model.get_embeddings(inputs, **kwargs)
        return embeddings[0].values

embedder = VertexAIEmbeddings()

We define the Neo4j driver:

driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

Then, we will create the node labels (basic, academic, and medical) and define the relationship types allowed between these nodes.

basic_node_labels = ["Object", "Entity", "Group", "Person", "Organization", "Place"]

academic_node_labels = ["ArticleOrPaper", "PublicationOrJournal"]

medical_node_labels = ["Anatomy", "BiologicalProcess", "Cell", "CellularComponent",
                       "CellType", "Condition", "Disease", "Drug",
                       "EffectOrPhenotype", "Exposure", "GeneOrProtein", "Molecule",
                       "MolecularFunction", "Pathway"]

node_labels = basic_node_labels + academic_node_labels + medical_node_labels

# define relationship types
rel_types = ["ACTIVATES", "AFFECTS", "ASSESSES", "ASSOCIATED_WITH", "AUTHORED",
    "BIOMARKER_FOR", "CAUSES", "CITES", "CONTRIBUTES_TO", "DESCRIBES", "EXPRESSES",
    "HAS_REACTION", "HAS_SYMPTOM", "INCLUDES", "INTERACTS_WITH", "PRESCRIBED",
    "PRODUCES", "RECEIVED", "RESULTS_IN", "TREATS", "USED_FOR"]

Now we will write a prompt that makes the LLM return valid JSON (not an easy task for any LLM), so that the output can be used to build our Knowledge Graph. This is the most important piece of the notebook: the quality of the responses depends on the quality of the entities and relationships extracted, and on the JSON being valid.

It’s quite a long prompt, but it works like a charm:

prompt_template = '''
You are a medical researcher whose task is to extract information from medical papers
and structure it in a property graph to inform further medical and research Q&A.

You will be given medical texts about Alzheimer disease and you will:
- extract the entities (nodes) and specify their type
- extract the relationships between these nodes (the relationship direction goes from the start node to the end node)

Assign a unique ID (string) to each node, and reuse it to define relationships.
Do respect the source and target node types for relationship and
the relationship direction.

Use the following node labels and relationships:

basic_node_labels = ["Object", "Entity", "Group", "Person", "Organization", "Place"]

academic_node_labels = ["ArticleOrPaper", "PublicationOrJournal"]

medical_node_labels = ["Anatomy", "BiologicalProcess", "Cell", "CellularComponent",
                       "CellType", "Condition", "Disease", "Drug",
                       "EffectOrPhenotype", "Exposure", "GeneOrProtein", "Molecule",
                       "MolecularFunction", "Pathway"]


relationship types = ["ACTIVATES", "AFFECTS", "ASSESSES", "ASSOCIATED_WITH", "AUTHORED",
    "BIOMARKER_FOR", "CAUSES", "CITES", "CONTRIBUTES_TO", "DESCRIBES", "EXPRESSES",
    "HAS_REACTION", "HAS_SYMPTOM", "INCLUDES", "INTERACTS_WITH", "PRESCRIBED",
    "PRODUCES", "RECEIVED", "RESULTS_IN", "TREATS", "USED_FOR"]


- Use only the information from the Input text below.  Do not add any additional information you may have.
- If the input text is empty, return empty Json.
- Make sure to create as many nodes and relationships as needed to offer rich medical context for further research.
- An AI knowledge assistant must be able to read this graph and immediately understand the context to inform detailed research questions.
- Multiple documents will be ingested from different sources and we are using this property graph to connect information,
so make sure entity types are fairly general.

Do not return any additional information other than the VALID JSON in it.

IMPORTANT FORMAT RULES:
1. Return ONLY valid JSON - no other text before or after
2. All strings must use double quotes, not single quotes
3. The response must contain both "nodes" and "relationships" arrays, even if empty
4. IDs must be strings, not numbers (e.g., "0" not 0)
5. Every node must have id, label, and properties with a name
6. Every relationship must have type, start_node_id, end_node_id, and properties


**Strictly return valid JSON output following this format:**

{{
  "nodes": [
    {{
      "id": "0",
      "label": "EntityType",
      "properties": {{
        "name": "EntityName"
      }}
    }},
    {{
      "id": "1",
      "label": "AnotherEntityType",
      "properties": {{
        "name": "AnotherEntityName"
      }}
    }}
  ],
  "relationships": [
    {{
      "type": "TYPE_OF_RELATIONSHIP",
      "start_node_id": "0",
      "end_node_id": "1",
      "properties": {{
        "details": "Description of the relationship"
      }}
    }}
  ]
}}

Use only the following nodes and relationships (if provided):
{schema}

Assign a unique ID (string) to each node, and reuse it to define relationships.
Do respect the source and target node types for relationship and
the relationship direction.

Do not return any additional information other than the JSON in it.

Examples:
{examples}


Now, do your task. This is the Input text:

{text}

'''
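The format rules in the prompt can also be checked after the fact. The sketch below is one possible validator for an LLM response, assuming the response arrives as a raw string. The function name `validate_graph_json` and its error messages are hypothetical and not part of neo4j_graphrag, which does its own parsing inside the pipeline.

```python
import json


def validate_graph_json(raw: str, node_labels: set[str], rel_types: set[str]) -> list[str]:
    """Return the problems found in a response; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    nodes, rels = data.get("nodes"), data.get("relationships")
    if not isinstance(nodes, list) or not isinstance(rels, list):
        return ['both "nodes" and "relationships" arrays are required']

    problems: list[str] = []
    node_ids: set[str] = set()
    for node in nodes:
        if not isinstance(node, dict):
            problems.append(f"node is not an object: {node!r}")
            continue
        node_id = node.get("id")
        if isinstance(node_id, str):
            node_ids.add(node_id)
        else:
            problems.append(f"node id must be a string: {node_id!r}")
        if node.get("label") not in node_labels:
            problems.append(f"unknown node label: {node.get('label')!r}")
        props = node.get("properties")
        if not isinstance(props, dict) or "name" not in props:
            problems.append(f"node {node_id!r} has no name property")

    for rel in rels:
        if not isinstance(rel, dict):
            problems.append(f"relationship is not an object: {rel!r}")
            continue
        if rel.get("type") not in rel_types:
            problems.append(f"unknown relationship type: {rel.get('type')!r}")
        for key in ("start_node_id", "end_node_id"):
            ref = rel.get(key)
            if not (isinstance(ref, str) and ref in node_ids):
                problems.append(f"{key} does not match a node id: {ref!r}")
        if "properties" not in rel:
            problems.append("relationship has no properties")
    return problems
```

Calling it with `set(node_labels)` and `set(rel_types)` from the lists defined earlier would flag responses that break rules 3 to 6 before they reach the graph.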

We now build the KG pipeline, which combines the core LLM, the embedder, our prompt, a text splitter that produces the text chunks, our node labels and relationship types, and the driver:

kg_builder_pdf = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    text_splitter=FixedSizeSplitter(chunk_size=1000, chunk_overlap=100),
    embedder=embedder,
    entities=node_labels,
    relations=rel_types,
    prompt_template=prompt_template,
    from_pdf=True
)
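To make the `chunk_size=1000, chunk_overlap=100` setting concrete, here is a minimal sketch of fixed-size splitting with overlap. It illustrates the idea only and is not the library's `FixedSizeSplitter` implementation; the function name `split_fixed_size` is hypothetical.

```python
def split_fixed_size(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With these defaults each window starts 900 characters after the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.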

Then we run this pipeline on the 5 PDF documents to build our Knowledge Graph and store it in our Neo4j database. This will take some time. The top-level await below works as written in a notebook.

pdf_file_paths = ['pdfs/Alzheimers Disease _Etiopathology_ NHI.pdf',
             'pdfs/Alzheimers Disease _Facts_ NHI.pdf',
             'pdfs/Alzheimers Disease _Pharmacology_ NHI.pdf',
             'pdfs/Antioxidant Therapy in Alzheimer.pdf',
             'pdfs/CRISPR.pdf']

for path in pdf_file_paths:
    print(f"Processing : {path}")
    pdf_result = await kg_builder_pdf.run_async(file_path=path)
    print(f"Result: {pdf_result}")
    await asyncio.sleep(2)

This will create the nodes, relationships and node embeddings:

Nodes and relationships creation in Neo4j

It’s done. This is the database we created:

Now we create a vector index over the chunk embeddings and a retriever that queries it. Note that the index dimensions must match the embedder’s output dimension (768 here). If you get this wrong, you will have to drop and recreate the index or, in the worst case, reset your Neo4j instance and start all over again.

create_vector_index(driver, name="text_embeddings", label="Chunk",
                    embedding_property="embedding", dimensions=768, similarity_fn="cosine")

vector_retriever = VectorRetriever(
    driver,
    index_name="text_embeddings",
    embedder=embedder,
    return_properties=["text"],
)
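Since a dimension mismatch is costly, it can help to probe the embedder before creating the index. The sketch below is an assumption-laden helper, not part of neo4j_graphrag: `assert_index_dimension` is a hypothetical name, and `FakeEmbedder` is a stand-in used only so the example runs without Vertex AI.

```python
def embedding_dimension(embedder, probe_text: str = "dimension probe") -> int:
    """Embed a short probe string and report the length of the resulting vector."""
    return len(embedder.embed_query(probe_text))


def assert_index_dimension(embedder, index_dimensions: int) -> int:
    """Raise if the embedder's output size differs from the planned index size."""
    actual = embedding_dimension(embedder)
    if actual != index_dimensions:
        raise ValueError(
            f"embedder returns {actual} dimensions, index expects {index_dimensions}"
        )
    return actual


class FakeEmbedder:
    """Stand-in for the real embedder: always returns a 768-dimensional zero vector."""

    def embed_query(self, text: str) -> list[float]:
        return [0.0] * 768


print(assert_index_dimension(FakeEmbedder(), 768))
```

In the notebook, the same check would be `assert_index_dimension(embedder, 768)` with the real `embedder`, run once before `create_vector_index`.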

After that, we will use Cypher (Neo4j’s query language, comparable to SQL) to define the query scope in our Knowledge Graph, that is, the logic for traversing the graph.

Keeping it simple, we’ll traverse up to 2-3 hops out from each Chunk (one hop to the entities extracted from it, then one or two more between entities), capture the relationships encountered, and include them in the response alongside our text chunks.

vc_retriever = VectorCypherRetriever(
    driver,
    index_name="text_embeddings",
    embedder=embedder,
    retrieval_query="""
//1) Go out 2-3 hops in the entity graph and get relationships
WITH node AS chunk
MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}()
UNWIND relList AS rel

//2) collect relationships and text chunks
WITH collect(DISTINCT chunk) AS chunks,
  collect(DISTINCT rel) AS rels

//3) format and return context
RETURN '=== text ===\n' + apoc.text.join([c in chunks | c.text], '\n---\n') + '\n\n=== kg_rels ===\n' +
  apoc.text.join([r in rels | startNode(r).name + ' - ' + type(r) + '(' + coalesce(r.details, '') + ')' +  ' -> ' + endNode(r).name ], '\n---\n') AS info
"""
)

The great advantage here is that you write the Cypher query once. Since Cypher requires specialized knowledge, this makes it easier to query the graph in natural language from now on.

You can inspect the text chunks and relationships this Cypher query returns by running the following in Python:

vc_res = vc_retriever.get_search_results(query_text = "What are the probable causes and treatments to Alzheimer?", top_k=3)

kg_rel_pos = vc_res.records[0]['info'].find('\n\n=== kg_rels ===\n')
print("# Text Chunk Context:")
print(vc_res.records[0]['info'][:kg_rel_pos])
print("# KG Context From Relationships:")
print(vc_res.records[0]['info'][kg_rel_pos:])
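The slicing above relies on the layout the retrieval query builds: a `=== text ===` section, then a `=== kg_rels ===` section, with entries separated by `---`. As a sketch, the same layout can be split into two Python lists. The helper name `split_context` is hypothetical, and it assumes the `info` string uses real newlines, as it does in the raw records.

```python
TEXT_HEADER = "=== text ===\n"
RELS_HEADER = "\n\n=== kg_rels ===\n"
SEPARATOR = "\n---\n"


def split_context(info: str) -> tuple[list[str], list[str]]:
    """Split the retriever's info string into text chunks and relationship lines."""
    text_part, _, rels_part = info.partition(RELS_HEADER)
    if text_part.startswith(TEXT_HEADER):
        text_part = text_part[len(TEXT_HEADER):]
    chunks = [c for c in text_part.split(SEPARATOR) if c]
    rels = [r for r in rels_part.split(SEPARATOR) if r]
    return chunks, rels
```

For example, `chunks, rels = split_context(vc_res.records[0]['info'])` would give the chunk texts and the relationship lines as separate lists, which is easier to count or filter than the raw string.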

With a simple Cypher query, we can see part of the Knowledge Graph:

MATCH (chunk:Chunk)
MATCH path = (chunk)<-[:FROM_CHUNK]-()-[r1]->(n:Anatomy)
RETURN path
LIMIT 5

By increasing the complexity of the query we have more relationships:

MATCH (chunk:Chunk)
MATCH path = (chunk)<-[:FROM_CHUNK]-()-[r1]->(n)
WHERE any(label IN ['Anatomy', 'Exposure', 'Technology', 'GeneOrProtein',
                   'Molecule', 'Disease', 'CellularComponent', 'BiologicalProcess'] 
          WHERE label IN labels(n))
RETURN path
LIMIT 50

This is what we get in a closer look:

Now, our database is ready and populated. Note that these queries were made using Cypher in a Neo4j Aura instance.

Unfortunately, on the free tier we cannot run a Cypher query over the full graph, because the instance doesn’t have enough memory. Let’s try another query restricted to some selected entity types:

MATCH (n1)-[r]-(n2)
WHERE any(label IN ['Anatomy', 'Exposure', 'Technology', 'GeneOrProtein',
                   'Molecule', 'Disease', 'CellularComponent', 'BiologicalProcess'] 
          WHERE label IN labels(n1))
AND any(label IN ['Anatomy', 'Exposure', 'Technology', 'GeneOrProtein',
                  'Molecule', 'Disease', 'CellularComponent', 'BiologicalProcess'] 
         WHERE label IN labels(n2))
RETURN *
LIMIT 50

Each Chunk node stores its own embedding. To see it, run the query below and click any node (the properties appear at the right of the picture):

MATCH (chunk:Chunk)
WITH chunk, chunk.embedding as emb
RETURN chunk
LIMIT 3

We populated the graph properly with nodes and relationships, and the Chunk nodes contain embeddings. At this point, you could plug in a Graph Neural Network, since you have the whole structure of the graph to work with, including the embeddings.

However, here we will build a GraphRAG pipeline from the LLM and a retriever, using a customized prompt template. We will compare a pure Vector Search with Vector Search plus the Cypher traversal. You will see that the Vector Search plus Cypher response is richer and more detailed. Given the prompt, the LLM will only answer with what is inside the retrieved context, which for the second retriever is the Cypher query scope. Let’s see some examples:

rag_template = RagTemplate(template='''Answer the Question using the following 
Context. Only respond with information mentioned in the Context. 
Do not inject any speculative information not mentioned.

# Question:
{query_text}

# Context:
{context}

# Answer:
''', expected_inputs=['query_text', 'context'])

v_rag  = GraphRAG(llm=llm, retriever=vector_retriever, prompt_template=rag_template)
vc_rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)

q = "What are the probable causes and treatments to Alzheimer? provide in list format."
print(f"Vector Response: \n{v_rag.search(q, retriever_config={'top_k':5}).answer}")
print("\n===========================\n")
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k':5}).answer}")
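Conceptually, each `search` call retrieves context, fills the two placeholders of the template, and sends the result to the LLM. The sketch below mimics only the template-filling step with plain `str.format`, as a stand-in for `RagTemplate`. The function `build_prompt` is hypothetical, and joining context items with newlines is an assumption rather than a description of the library's internals.

```python
RAG_TEMPLATE = """Answer the Question using the following Context.
Only respond with information mentioned in the Context.
Do not inject any speculative information not mentioned.

# Question:
{query_text}

# Context:
{context}

# Answer:
"""


def build_prompt(query_text: str, context_items: list[str]) -> str:
    """Fill the template with the question and the retrieved context items."""
    context = "\n".join(context_items)  # assumption: one retrieved item per line
    return RAG_TEMPLATE.format(query_text=query_text, context=context)


print(build_prompt(
    "What are the probable causes of Alzheimer?",
    ["Chunk about amyloid-beta toxicity.", "Chunk about tauopathy."],
))
```

Seen this way, the only difference between `v_rag` and `vc_rag` is what ends up in `{context}`: text chunks alone, or text chunks plus the relationships collected by the Cypher traversal.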

Vector response:

Vector Response: 
- **Probable Causes of Alzheimer’s:**
  - Amyloid-beta (Aβ) toxicity
  - Tauopathy
  - Inflammation
  - Oxidative stress
  - Combination of genetic, environmental, and lifestyle factors

- **Treatments for Alzheimer’s:**
  - Cholinesterase inhibitors
  - Partial N-methyl D-aspartate (NMDA) antagonists
  - Antioxidants to reduce oxidative stress

Vector + Cypher Response:

Vector + Cypher Response: 
**Probable Causes of Alzheimer's Disease:**
1. Amyloid-beta (Aβ) toxicity
2. Tauopathy
3. Inflammation
4. Oxidative stress
5. Genetic factors
6. Environmental factors
7. Lifestyle factors

**Treatments for Alzheimer's Disease:**
1. Cholinesterase inhibitors
2. Partial N-methyl D-aspartate (NMDA) antagonists
3. Antioxidant therapy
4. Anti-inflammatory drugs
5. Estrogen therapy
6. Vitamin E
7. Red wine (in moderate amounts)
8. Gene editing strategies
9. Various medications including Memantine, Donepezil, Galantamine, Rivastigmine, Aducanumab, Lecanemab, Donanemab, and others.

q = "Can you summarize Alzheimer? including common symptoms, effects, and drug treatments? Provide in detailed list format."

vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 5}, return_context=True)

print(f"Vector + Cypher Response: \n{vc_rag_result.answer}")

**Alzheimer's Disease Summary:**

**Common Symptoms:**
1. Memory Impairment
2. Cognitive Decline
3. Behavioral Changes
4. Sleeplessness
5. Depression
6. Anxiety
7. Agitation
8. Neuropsychiatric Symptoms
9. Inability to carry out multistep tasks
10. Problems recognizing family and friends
11. Confusion
12. Impaired Judgment
13. Visuospatial Functions Impairment
14. Paranoia
15. Delusions
16. Hallucinations

**Effects:**
1. Destruction of memory and thinking skills
2. Inability to carry out simplest tasks
3. Complete dependence on others for care
4. Shrinkage and atrophy of the brain
5. Loss of cognitive functioning
6. Behavioral and psychological symptoms
7. Emotional, physical, and financial costs for caregivers

**Drug Treatments:**
1. Cholinesterase Inhibitors (e.g., Donepezil, Rivastigmine, Galantamine)
2. N-methyl D-aspartate (NMDA) Antagonists (e.g., Memantine)
3. Monoclonal Antibodies (e.g., Aducanumab, Lecanemab, Donanemab)
4. Anti-inflammatory drugs
5. Antioxidant Therapy (e.g., Vitamin E)
6. Estrogen Replacement Therapy
7. Sodium Oligomannate (GV-971)
8. Sembragiline
9. Resveratrol
10. Anti-neuroinflammation drugs
11. Glutaminyl Cyclase Inhibitors (e.g., PQ912)
12. BACE Inhibitors (e.g., Verubecestat, Lanabecestat, Atabecestat)
13. Tau-aggregation Inhibitors
14. Immunotherapy

**Note:** There is no cure for Alzheimer's disease, and treatments focus on managing symptoms and slowing progression.

Running the snippet below prints the retrieved context entries that mention Alzheimer’s disease treatment (there are a lot of them):

vc_ls = vc_rag_result.retriever_result.items[0].content.split('\\n---\\n')
for i in vc_ls:
    if "treat" in i: print(i)

Other queries for our GraphRAG tool:

q = "What are the most promising treatments for Alzheimer? Which drug treatments? Give the names of researchers. Provide in detailed list format."
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k': 5}).answer}")

Vector + Cypher Response: 
1. **Promising Treatments for Alzheimer's Disease:**
   - **Anti-Aβ Vaccine:** Showed promising results with no toxicity and clinical improvements.
   - **BACE Inhibitor:** Demonstrated promising results with no toxicity and clinical improvements.
   - **Anti-Neuroinflammation Drugs:** Indicated promising results with no toxicity and clinical improvements.

2. **Drug Treatments:**
   - **Cholinesterase Inhibitors:** Includes drugs like rivastigmine, galantamine, and donepezil.
   - **Partial N-methyl D-aspartate (NMDA) Antagonists:** Includes memantine.
   - **Aducanumab:** Approved by the FDA in 2021, it is a monoclonal antibody targeting amyloid-β.
   - **Lecanemab:** Received accelerated approval from the FDA.
   - **Donanemab:** Expected to receive FDA approval.

3. **Researchers:**
   - Carlos Elias Conti Filho
   - Lairane Bridi Loss
   - Clairton Marcolongo-Pereira
   - Joamyr Victor Rossoni Junior
   - Rafael Mazioli Barcelos
   - Orlando Chiarelli-Neto
   - Bruno Spalenza da Silva
   - Roberta Passamani Ambrosio
   - Fernanda Cristina de Abreu Quintela Castro
   - Sarah Fernandes Teixeira
   - Nathana Jamille Mezzomo

These researchers are associated with the Faculty of Medicine, University Center of Espirito Santo, Colatina, Brazil, and have contributed to the study of advances in Alzheimer's disease pharmacological treatment.

q = "Which molecular function should be fixed to reverse the symptioms of Alzheimer? How the most promising drug treatment work on it? Provide in detailed list format."
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k': 5}).answer}")

Vector + Cypher Response: 
1. **Molecular Function to be Fixed:**
   - The molecular functions that should be targeted to reverse the symptoms of Alzheimer's disease include:
     - Amyloid-beta (Aβ) aggregation
     - Tau protein aggregation
     - BACE-1 activity
     - Neuroinflammation
     - Excitotoxicity
     - Cholinergic impairment

2. **Most Promising Drug Treatments and Their Mechanisms:**
   - **Anti-Aβ Vaccine:**
     - Targets amyloid-beta aggregation.
     - Shows promising results in clinical improvements without toxicity.
   
   - **BACE Inhibitor:**
     - Targets BACE-1 activity to reduce amyloid-beta production.
     - Demonstrates clinical improvements without toxicity.
   
   - **Anti-Neuroinflammation Drugs:**
     - Target neuroinflammation pathways.
     - Show promising results in clinical improvements without toxicity.
   
   - **Cholinesterase Inhibitors:**
     - Increase levels of acetylcholine to address cholinergic impairment.
     - Approved for symptomatic treatment of Alzheimer's disease.
   
   - **Partial NMDA Antagonists:**
     - Address excitotoxicity by modulating NMDA receptor activity.
     - Approved for symptomatic treatment of Alzheimer's disease.

q = "What is the etiopathology of Alzheimer? How does the disease appear ? Which proteins are affected? How the disease progress? Provide in detailed list format."
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k': 5}).answer}")

Vector + Cypher Response: 
- **Etiopathology of Alzheimer's Disease:**
  - Characterized by the accumulation of abnormal neuritic plaques and neurofibrillary tangles in the brain.
  - Loss of neurons, particularly cholinergic neurons in the basal forebrain and the neocortex.
  - Two prominent pathophysiological hypotheses:
    - Cholinergic Hypothesis: Reduced levels of acetylcholine (ACh) due to neuronal loss in the Nucleus Basalis of Meynert.
    - Other theories include amyloid-beta (Aβ) toxicity, tauopathy, inflammation, and oxidative stress.

- **Appearance of the Disease:**
  - Distinguished impairment of thought, memory, and language abilities.

- **Proteins Affected:**
  - Amyloid-beta (Aβ) and tau proteins are central to the disease's pathogenesis.
  - Hyperphosphorylation of tau protein, making it resistant to proteolytic degradation, plays a key role in neurofibrillary degeneration.

- **Progression of the Disease:**
  - On average, patients live about 8 years after initial diagnosis, but the disease can last as long as 20 years.
  - The disease progresses with cognitive decline, leading to impaired quality of life, functional decline, and eventually death.
  - Pathological changes include the formation of neuritic plaques and neurofibrillary tangles, leading to neuronal loss and brain atrophy.

q = "Given that probably we can do CRISPR on the Amyloid Precursor Protein (APP) Gene, how does this overcome the weaknesses of the Cholinergic and Amyloid Hypotheses? Provide in detailed list format."
print(f"Vector + Cypher Response: \n{vc_rag.search(q, retriever_config={'top_k': 5}).answer}")

**Vector + Cypher Response:**
- CRISPR/Cas9 technology allows for the knockout of APP alleles, which has been shown to decrease the expression of Aβ protein. This directly addresses the Amyloid Hypothesis by reducing the levels of amyloid beta, which is believed to cause neuronal toxicity and contribute to Alzheimer's disease (AD).
- The insertion of protective mutations, such as the A673T mutation, using CRISPR/Cas9 can reduce β-secretase cleavage by 40%, potentially slowing down or hindering the progression of AD. This provides a targeted approach to mitigate the effects of amyloid beta accumulation, a central aspect of the Amyloid Hypothesis.
- By targeting the APP gene, CRISPR/Cas9 can directly influence the production of amyloid beta, offering a more precise intervention compared to the Cholinergic Hypothesis, which focuses on the downstream effects of amyloid beta on cholinergic neurons.
- The ability to delete specific regions of the APP gene, such as the 3′-UTR, has been shown to drastically reduce Aβ accumulation, providing a potential therapeutic strategy that directly addresses the root cause of amyloid-related pathology in AD.

Finally, a query in plain Cypher that pulls the chunks and entities relevant to the same question, namely how CRISPR on the Amyloid Precursor Protein gene could overcome the weaknesses of the Cholinergic Hypothesis:

MATCH (chunk:Chunk)
WHERE chunk.text CONTAINS 'CRISPR' 
   OR chunk.text CONTAINS 'Amyloid' 
   OR chunk.text CONTAINS 'APP' 
   OR chunk.text CONTAINS 'Cholinergic'
MATCH path = (chunk)<-[:FROM_CHUNK]-(entity)
RETURN path
LIMIT 50

Without the memory limitations of the Neo4j free tier, one can build a far more complete and complex GraphRAG with hundreds of documents. By relaxing the GraphRAG prompt template, it is also possible to find new avenues of research and surface new ideas and relationships between concepts, which can make Alzheimer’s disease research more fruitful.

👏👏👏 if you liked ☺️

Acknowledgements

✨ Google ML Developer Programs and Google Cloud Champion Innovators Program supported this work by providing Google Cloud Credits ✨

🔗 https://developers.google.com/machine-learning

🔗 https://cloud.google.com/innovators/champions?hl=en


Written by Rubens Zimbres
