skyflo

Architecture

Skyflo is a self-hosted AI operations agent for Kubernetes and CI/CD automation with native Jenkins support. The Engine runs a LangGraph workflow with an approval gate for every mutating operation, executes typed tools via the MCP server, and streams real-time SSE updates to the Command Center.

Components

Engine (/engine)

Core backend that turns natural language into safe, auditable infrastructure operations using a LangGraph workflow.

MCP Server (/mcp)

Model Context Protocol (MCP) server exposing schema-validated infrastructure tools for the Engine.
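
As a rough illustration of what "schema-validated" means here, the sketch below validates raw arguments before turning them into a command. It is not the Skyflo MCP server's code: the stack lists FastMCP and Pydantic for this layer, while this example uses only standard-library dataclasses as a stand-in, and the names `ScaleDeploymentParams` and `build_scale_command` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleDeploymentParams:
    """Hypothetical parameter schema for an illustrative 'scale deployment' tool."""

    name: str
    namespace: str
    replicas: int

    def __post_init__(self) -> None:
        # Reject malformed input before any command is built.
        if not self.name or not self.namespace:
            raise ValueError("name and namespace must be non-empty")
        if isinstance(self.replicas, bool) or not isinstance(self.replicas, int):
            raise ValueError("replicas must be an integer")
        if self.replicas < 0:
            raise ValueError("replicas must be >= 0")


def build_scale_command(raw: dict) -> list[str]:
    """Validate raw arguments, then build an argv list (no shell interpolation)."""
    params = ScaleDeploymentParams(**raw)
    return [
        "kubectl", "scale", f"deployment/{params.name}",
        f"--replicas={params.replicas}", "-n", params.namespace,
    ]
```

Building an argv list instead of a shell string means a validated value can never be reinterpreted as extra shell syntax, which is one reason typed tools are safer than free-form commands.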

Command Center (/ui)

Next.js command interface for real-time operations visibility and control.

Kubernetes Controller (/kubernetes-controller)

Go-based Kubernetes operator managing Skyflo component lifecycle via CRD.

Execution Workflow

Skyflo uses a graph-based workflow powered by LangGraph. The workflow enforces a deterministic loop:

  1. Plan
    • Analyzes the user’s natural language to determine intent
    • Performs lightweight discovery when needed to ground the plan
    • Produces structured tool calls for the next phase
  2. Approve and Execute
    • Executes MCP tools (kubectl, argo, helm, jenkins) with validated parameters
    • Requires explicit approval for every mutating tool call
    • Resolves dynamic parameters from previous steps and supports recursive operations
    • Streams progress and results back to the model and UI
  3. Verify
    • Evaluates outcomes against the original intent and summarizes results
    • Decides whether to auto-continue, request approval, or stop
    • Routes context back to the Plan phase for refinement if issues are detected
  4. Persist
    • Stores tool calls, parameters, and results
    • Supports audit and replay
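
The approve, execute, and persist steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Engine's LangGraph implementation: planning and verification (both model-driven) are left out, and `MUTATING_TOOLS`, `ToolCall`, `AuditLog`, and `run_turn` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of tool names treated as mutating in this sketch.
MUTATING_TOOLS = {"kubectl_apply", "helm_upgrade"}


@dataclass
class ToolCall:
    tool: str
    params: dict


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def persist(self, call: ToolCall, status: str, result: object) -> None:
        # Store the call, its parameters, and its outcome for audit and replay.
        self.records.append(
            {"tool": call.tool, "params": call.params, "status": status, "result": result}
        )


def run_turn(
    plan: list[ToolCall],
    execute: Callable[[ToolCall], object],
    approve: Callable[[ToolCall], bool],
    log: AuditLog,
) -> list[str]:
    """Run planned tool calls in order, gating every mutating call on approval."""
    statuses = []
    for call in plan:
        if call.tool in MUTATING_TOOLS and not approve(call):
            log.persist(call, "denied", None)
            statuses.append("denied")
            break  # stop so later steps never run after a denied mutation
        result = execute(call)
        log.persist(call, "executed", result)
        statuses.append("executed")
    return statuses
```

Read-only calls pass straight through, while a denied mutating call is still recorded, so the audit trail shows what was proposed as well as what ran.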

Technical Stack

Layer            Technologies
Backend          Python 3.11+, FastAPI, LangGraph, LiteLLM, Tortoise ORM, Aerich, Redis, PostgreSQL
MCP              FastMCP, httpx, Pydantic, Streamable HTTP transport
Frontend         React, Next.js 14, TypeScript, Tailwind CSS, Radix UI + shadcn/ui, Zustand, framer-motion
AI/ML            LangGraph, multi-provider LLM integration via LiteLLM
Infrastructure   Kubernetes, Argo Rollouts, Helm
Communication    Server-Sent Events (SSE), Redis pub/sub
Security         JWT + refresh token rotation via fastapi-users, role-based access, HttpOnly cookies
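
For readers unfamiliar with SSE, the sketch below shows how a single update could be framed in the text/event-stream wire format before being sent to the Command Center. The framing itself follows the SSE standard; the function name `format_sse`, the event name, and the payload shape are assumptions for illustration and are not taken from the Skyflo codebase.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Encode one event in the text/event-stream wire format."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, separators=(",", ":"))
    # Each event is an "event:" line, a "data:" line, and a blank-line terminator.
    return f"event: {event}\ndata: {payload}\n\n"
```

A call such as `format_sse("tool.result", {"ok": True})` yields one complete frame that a browser `EventSource` client can dispatch by event name.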