Future Roadmap Proposal for subagent-fleet (Generation 4)
Updated: Early 2027 Projections
With Generation 3 successfully released (KV Cache Sharing, Hybrid Routing, Zero-Trust A2A Security, and Namespaced Memory), subagent-fleet has resolved the major bottlenecks of structured, hierarchical agent workflows.
However, looking ahead at bleeding-edge research and power-user discussions across the r/LocalLLaMA and r/LLMDevs communities, the paradigm is shifting again. The rigid "Supervisor -> Worker" model is giving way to decentralized, peer-to-peer swarm intelligence.
Here is the proposed "Generation 4" roadmap for subagent-fleet.
1. Decentralized "Blackboard" Architecture (High Priority)
The Demand: In complex local setups, the "Supervisor" agent becomes a massive bottleneck and a Single Point of Failure (SPOF). Small local models (e.g., 8B parameters) struggle to synthesize outputs from 5+ specialized subagents, leading to "computational traffic jams" [1]. Users want to move to a peer-to-peer "Blackboard" pattern where agents broadcast intermediate results to a shared pool, and other agents react autonomously based on local triggers [2].
The Solution:
- Deprecate rigid point-to-point delegation chains.
- Introduce a subagent-fleet blackboard daemon. Agents write structured outputs (Markdown/JSON) to this shared state. Other agents subscribe to specific triggers (e.g., the reviewer agent automatically wakes up when a coder agent posts a "PR_READY" event to the blackboard).
- Citations:
- [1] Community discussions on Supervisor bottlenecks in Swarm frameworks.
- [2] Emerging multi-agent patterns in frameworks like SwarmSys (Explorers, Workers, Validators).
2. GPU Microscheduling for Parallel Swarms (High Priority)
The Demand: While swarms are theoretically parallel, local hardware (like a single RTX 4090 or Mac Studio) forces these requests to process serially. Running 4 agents simultaneously causes VRAM out-of-memory (OOM) errors or massive latency spikes [3].
The Solution:
- Build a native GPU Microscheduler into the subagent-fleet proxy layer.
- Instead of raw pass-through to LiteLLM, subagent-fleet queues inter-agent requests, monitors live VRAM via ollama ps or vllm metrics, and dispatches agent inferences only when compute cycles are available, dynamically adjusting max_parallel limits on the fly.
- Citations:
- [3] r/LocalLLaMA feature requests for handling parallel inference batching in local swarms.
3. Generative UI / Dynamic Rendering (Medium Priority)
The Demand: Terminal logs (even with rich tracing) are becoming insufficient to monitor 10+ autonomous agents. Users want the UI to be as dynamic as the swarm itself [4]. If a researcher agent finds tabular data, the UI should render a table; if a coder agent writes a patch, it should render a diff view.
The Solution:
- Expand the subagent-fleet dashboard to support Generative UI.
- Intercept agent outputs via Server-Sent Events (SSE) and stream them to the local React dashboard. Use a lightweight parsing model to dynamically render React components (Charts, Tables, Markdown, Diffs) based on the agent's current intent.
- Citations:
- [4] Discussions on Vercel AI SDK and the shift toward Generative UI in multi-agent systems.
4. Markdown-Based State Persistence (Medium Priority)
The Demand: Relying on Redis or SQLite for state management is often seen as overkill or "too opaque" for local developers. If the orchestration layer crashes, the state is locked in a database [5]. Developers strongly prefer "Markdown-based state" that survives ephemeral session restarts and can be version-controlled in Git.
The Solution:
- Introduce state_driver: markdown in fleet.yaml.
- Instead of memory databases, subagent-fleet manages state by continuously updating TASKS.md, DECISIONS.md, and CONTEXT.md in a .fleet_state/ directory. Agents read these files before acting, ensuring a "shared view of reality" that developers can easily read and edit by hand.
- Citations:
- [5] Community consensus on Aider/Claude Code workflows favoring human-readable context files over hidden databases.
5. Dynamic Role Switching (Long-Term)
The Demand: Defining 30+ static agents in YAML is cumbersome. Users want agents that can monitor the environment and change their own system prompts [6]. For example, a "Coding Agent" that notices a backlog of untested code dynamically switches to a "Testing Agent" persona.
The Solution:
- Allow agents to mutate their own configurations via a specialized MCP tool (fleet-config-editor).
- This allows a swarm to self-balance—if research is done, 4 researcher agents can rewrite their own prompts to become implementer agents to help clear the coding backlog.
- Citations:
- [6] Research into Active Inference and self-balancing agent clusters.
✅ Completed Milestones (Generations 1, 2 & 3)
The following features have been fully implemented (up to v0.1.2):
- Gen 1: Discovery, routing, LiteLLM/Claude Code generation, unified observability.
- Gen 2: Aider support, Wake-on-LAN power management, Sandboxed Workspaces, HITL Middleware, dynamic routing hooks, and state fallbacks.
- Gen 3: Cross-Agent KV Cache Sharing (vLLM prefix caching), Hybrid Cloud Tiered Routing, Zero-Trust Agent-to-Agent Security, and Namespaced Agent Memory (SQLite).