@puririshi98 The Feature, Motivation, and Pitch:
Modern data warehouses pose several hard problems: tracing data lineage, identifying data silos, and interpreting complex transformations in ETL processes. Existing systems, including those built on LLMs, handle these poorly because they are not grounded in the structured relationships inherent in a data warehouse.
I propose a feature that integrates Graph Neural Networks (GNNs) and Large Language Models (LLMs) to model data warehouses as graphs, allowing for improved reasoning about and understanding of:
Data Lineage: By representing transformations, dependencies, and data flow as graph structures, a hybrid GNN+LLM system can analyze and explain lineage paths.
Data Silos: Detecting disconnected components in the data warehouse graph to suggest potential integrations.
ETL Transformations: Showing how raw data evolves through complex transformations into analysis-ready outputs.
Schema and Query Understanding: Modeling schemas as graphs can improve LLM capabilities to generate and interpret queries based on relationships between tables.
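To make the lineage and silo items above concrete, here is a minimal sketch in plain Python (no GNN or LLM involved yet). It assumes the warehouse has already been reduced to a list of (upstream table, downstream table) edges; the table names and the helper names `upstream_lineage` and `find_silos` are hypothetical and only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream table, downstream table).
EDGES = [
    ("raw_orders", "stg_orders"),
    ("raw_customers", "stg_customers"),
    ("stg_orders", "fct_sales"),
    ("stg_customers", "fct_sales"),
    ("raw_hr_export", "hr_report"),
]


def upstream_lineage(edges, target):
    """Return every table that `target` depends on, directly or transitively."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def find_silos(edges):
    """Group tables into weakly connected components (edge direction ignored)."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    components, visited = [], set()
    for start in sorted(neighbors):
        if start in visited:
            continue
        component, queue = set(), deque([start])
        visited.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


if __name__ == "__main__":
    print(sorted(upstream_lineage(EDGES, "fct_sales")))
    print(find_silos(EDGES))
```

In this toy graph the HR tables form their own component, which is the kind of disconnected region the proposal would flag as a candidate silo. A learned model would presumably go beyond this purely structural check, but the graph representation is the same.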
The integration would involve using PyG (PyTorch Geometric) for GNN modeling and extending existing libraries for training hybrid GNN+LLM architectures. This would give data warehouse systems both structural awareness (from GNNs) and semantic reasoning (from LLMs), with the aim of reducing hallucinations and improving the interpretability of predictions and queries.
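As a rough illustration of the first step toward GNN modeling, the sketch below converts named lineage edges into integer node ids and a two-row edge list, which is the `[2, num_edges]` layout PyG uses for `edge_index`. It is written in plain Python so it runs without PyTorch installed; the table names and the helper `build_edge_index` are assumptions made for this example, not part of any existing API.

```python
def build_edge_index(edges):
    """Map table names to integer ids and return (node_ids, edge_index).

    edge_index is [sources, targets], i.e. shape [2, num_edges].
    """
    node_ids = {}
    sources, targets = [], []
    for src, dst in edges:
        for name in (src, dst):
            # Assign the next free id the first time a table is seen.
            node_ids.setdefault(name, len(node_ids))
        sources.append(node_ids[src])
        targets.append(node_ids[dst])
    return node_ids, [sources, targets]


# Hypothetical lineage edges: (upstream table, downstream table).
example_edges = [
    ("raw_orders", "stg_orders"),
    ("stg_orders", "fct_sales"),
]

node_ids, edge_index = build_edge_index(example_edges)
print(node_ids)
print(edge_index)
```

With PyTorch and PyG installed, `torch.tensor(edge_index, dtype=torch.long)` could then be passed as the `edge_index` argument of `torch_geometric.data.Data`, alongside node features of the caller's choosing (for example, embeddings of table or column descriptions).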
Alternatives
Existing LLM solutions provide semantic reasoning but often hallucinate without structured context.
Pure GNN solutions focus on structural reasoning but lack the language capabilities needed for intuitive query interaction.
Ensemble systems attempt to combine these capabilities but lack a unified framework for data warehouse tasks.
Additional context
Data lineage tools (e.g., Neo4j integrations or metadata graphing systems) could serve as a starting point for the graph representation of data warehouses.
Recent work on combining GNNs with LLMs for question answering and recommendation systems could provide foundational knowledge for this hybrid architecture.
This feature would enable the PyTorch Geometric community to explore real-world applications in data science, bridging the gap between NLP and data engineering.