Query Workflow
Detailed Query Workflow
- Query Reception:
  - Queries are broadcast via pub/sub channels to all relevant nodes.
  - Each node decides whether it holds the matching vector slice and, if so, responds.
- Vector Search and Model Inference:
  - Nodes execute vector similarity searches on Vulkan-accelerated GPUs using ggml libraries.
  - MNN models are loaded dynamically based on the type of query.
- Response Generation and Aggregation:
  - Partial responses from each node are aggregated into a final response by a designated aggregator node or function.
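The search and aggregation steps above can be illustrated with a small CPU-only sketch. It is not the Vulkan/ggml path: cosine similarity in plain Python stands in for the GPU search. It also assumes that each partial response is a list of scored hits and that the aggregator merges them into a global top-k. The names `search_slice` and `aggregate` and the sample vectors are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_slice(query, slice_vectors, k):
    """Return the k best (score, doc_id) pairs from one node's vector slice."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in slice_vectors.items())
    return heapq.nlargest(k, scored)


def aggregate(partials, k):
    """Merge the per-node partial results into a global top-k (assumed strategy)."""
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))


# Hypothetical two-node deployment; each node holds one slice of the vectors.
node1_slice = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
node2_slice = {"c": [1.0, 1.0], "d": [-1.0, 0.0]}

query = [1.0, 0.0]
partials = [search_slice(query, s, k=2) for s in (node1_slice, node2_slice)]
final = aggregate(partials, k=2)
```

Because each node returns only its local top-k, the aggregator handles at most k hits per node, regardless of slice size.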
```mermaid
sequenceDiagram
    User ->> PubSub: Sends Query
    PubSub ->> Node1: Broadcast Query
    PubSub ->> Node2: Broadcast Query
    Node1 ->> Node1: Search Vectors (Vulkan + ggml)
    Node2 ->> Node2: Inference (MNN Model)
    Node1 -->> Aggregator: Partial Response 1
    Node2 -->> Aggregator: Partial Response 2
    Aggregator ->> User: Final Response
```
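The message flow in the diagram can be sketched as a minimal in-process simulation. Everything here is an assumption made for illustration: the `PubSub`, `Node`, and `Aggregator` classes, the `"queries"` channel name, and the `slice` field on the query are hypothetical stand-ins for the real transport and message format. The actual vector search or model inference is replaced by a placeholder string.

```python
from collections import defaultdict


class PubSub:
    """Hypothetical in-process stand-in for the real pub/sub layer."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def publish(self, channel, message):
        # Broadcast: every subscriber on the channel sees the message.
        for handler in self._handlers[channel]:
            handler(message)


class Aggregator:
    """Collects partial responses and builds the final response."""

    def __init__(self):
        self._partials = {}

    def collect(self, node_id, partial):
        self._partials[node_id] = partial

    def final_response(self):
        # Sort by node id so the result does not depend on arrival order.
        return [self._partials[n] for n in sorted(self._partials)]


class Node:
    def __init__(self, node_id, slices, aggregator):
        self.node_id = node_id
        self.slices = set(slices)
        self.aggregator = aggregator

    def on_query(self, query):
        # Each node decides locally whether it holds the matching slice.
        if query["slice"] not in self.slices:
            return
        # Placeholder for the real work (vector search or model inference).
        partial = f"{self.node_id}: result for {query['text']!r}"
        self.aggregator.collect(self.node_id, partial)


bus = PubSub()
aggregator = Aggregator()
layout = [("node1", ["docs"]), ("node2", ["docs", "code"]), ("node3", ["code"])]
for node_id, slices in layout:
    bus.subscribe("queries", Node(node_id, slices, aggregator).on_query)

bus.publish("queries", {"slice": "docs", "text": "vector search"})
response = aggregator.final_response()
```

In this sketch `node3` receives the broadcast but stays silent because it does not hold the `docs` slice, so only `node1` and `node2` contribute partial responses.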