Architecture
Introduction
The platform is an enterprise-grade, Kubernetes-based system that enables organizations to build, deploy, and manage applications consistently across hybrid and multi-cloud environments. It integrates core Kubernetes capabilities with enhanced management, observability, and security services, offering a unified control plane and flexible workload clusters.
The architecture follows a hub-and-spoke model, consisting of a global cluster and multiple workload clusters. This design provides centralized governance while allowing independent workload execution and scalability.
For canonical definitions of platform-wide terms such as global cluster, workload cluster, and cluster plugin, see Glossary.
Core Architectural Components
Global Cluster
The global cluster serves as the centralized management and control hub of the platform. It provides platform-wide services such as authentication, policy management, cluster lifecycle operations, and observability. It is also the central point for multi-cluster management and cross-cluster functionality.
Key components include:
- Gateway: Acts as the main entry point to the platform. It manages API requests from the UI, CLI (kubectl), and automation tools, routing them to the appropriate backend services.
- Authentication and Authorization (Auth): Integrates with external Identity Providers (IdPs) to provide Single Sign-On (SSO) and RBAC-based access control.
- Web Console: Provides a web-based interface for the platform. It interfaces with platform APIs through the gateway.
- Cluster Management: Handles the registration, provisioning, and lifecycle management of workload clusters.
- Services
- Operator Lifecycle Manager (OLM) and Cluster Plugins: Manage the installation, updates, and lifecycle of operators and cluster extensions.
- Internal Image Registry: Offers an out-of-the-box integrated container image repository with role-based access.
- Observability: Provides centralized logging, metrics, and tracing for both the global and workload clusters.
- Cluster Proxy: Enables secure communication between the global cluster and workload clusters.
Workload Cluster
Workload clusters are Kubernetes-based environments managed by the global cluster. Each workload cluster runs isolated application workloads and inherits governance and configuration from the central control plane.
External Integrations
- Identity Provider (IdP): Supports federated authentication via standard protocols (OIDC, SAML) for unified user management.
- API and CLI Access: Users can interact with the platform through RESTful APIs, the web console, or command-line tools such as kubectl and ac.
- Load Balancer (VIP/DNS/SLB): Provides high availability and traffic distribution to the Gateway and ingress endpoints of the global and workload clusters.
Scalability and High Availability
The platform is designed for horizontal scalability and high availability:
- Each component can be deployed redundantly to eliminate single points of failure.
- The global cluster supports managing dozens to hundreds of workload clusters.
- Workload clusters can scale independently according to workload demand.
- The use of VIP/DNS/Ingress ensures seamless routing and failover.
Functional Perspective
The platform's complete functionality consists of the Core and extensions built on two technical stacks: Operator and Cluster Plugin.
- Core: The minimal deliverable unit of the platform, providing core capabilities such as cluster management, container orchestration, projects, and user administration.
  - Meets the highest security standards
  - Delivers maximum stability
  - Offers the longest support lifecycle
- Extensions: Extensions in both the Operator and Cluster Plugin stacks can be classified into:
  - Aligned – Lifecycle strategy consisting of multiple maintenance streams, aligned with the platform.
  - Agnostic – Lifecycle strategy consisting of multiple maintenance streams, released independently of the platform.
For more details about extensions, see Extend.
Technical Perspective
Platform Component Runtime
All platform components run as containers within a Kubernetes management cluster (the global cluster).
High Availability Architecture
- The global cluster typically consists of at least three control plane nodes and multiple worker nodes
- High availability of etcd is central to cluster HA; see Key Component High Availability Mechanisms for details
- Load balancing can be provided by an external load balancer or a self-built VIP inside the cluster
Request Routing
- Client requests first pass through the load balancer or self-built VIP
- Requests are forwarded to ALB (the platform's default Kubernetes Ingress Gateway) running on designated ingress nodes (or control-plane nodes if configured)
- ALB routes traffic to the target component pods according to configured rules
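The last routing step can be pictured with a small sketch. It assumes a prefix-based rule table in which the most specific matching rule wins; this is an illustration only, not ALB's actual matching algorithm, and the `Rule` type, the `route` function, and the backend names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """A hypothetical ingress rule: requests under `path_prefix` go to `backend`."""
    path_prefix: str
    backend: str


def route(path: str, rules: Sequence[Rule]) -> Optional[str]:
    """Return the backend of the longest matching prefix rule, or None if no rule matches."""
    matches = [r for r in rules if path.startswith(r.path_prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.path_prefix)).backend


# Hypothetical rule table; the backend names are placeholders, not real component names.
RULES = [
    Rule("/", "web-console"),
    Rule("/api", "platform-api"),
]

print(route("/api/v1/clusters", RULES))
print(route("/index.html", RULES))
```

Both rules match the first request, but the longer prefix `/api` is selected, so the catch-all `/` rule only handles paths that nothing more specific claims.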
Replica Strategy
- Core components run with at least two replicas
- Key components (such as registry, MinIO, ALB) run with three replicas
Fault Tolerance & Self-healing
- Achieved through cooperation between kubelet, kube-controller-manager, kube-scheduler, kube-proxy, ALB, and other components
- Includes health checks, failover, and traffic redirection
Data Storage & Recovery
- Control-plane configuration and platform state are stored in etcd as Kubernetes resources
- In catastrophic failures, recovery can be performed from etcd snapshots
Primary / Standby Disaster Recovery
- Two separate global clusters: Primary Cluster and Standby Cluster
- The disaster recovery mechanism is based on real-time synchronization of etcd data from the Primary Cluster to the Standby Cluster.
- If the Primary Cluster becomes unavailable due to a failure, services can quickly switch to the Standby Cluster.
Key Component High Availability Mechanisms
etcd
- Deployed on three (or five) control plane nodes
- Uses the Raft protocol for leader election and data replication
- Three-node deployments tolerate up to one node failure; five-node deployments tolerate up to two
- Supports local and remote S3 snapshot backups
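The failure-tolerance figures above follow from Raft's majority rule: a cluster stays writable only while more than half of its members are reachable. A short worked example of that arithmetic (the function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

A four-member cluster tolerates only one failure, the same as three members, which is why odd sizes such as three or five are the usual choice.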
Monitoring Components
- Prometheus: Multiple instances, deduplication with Thanos Query, and cross-region redundancy
- VictoriaMetrics: Cluster mode with distributed VMStorage, VMInsert, and VMSelect components
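To show what deduplication across redundant Prometheus instances means, here is a simplified sketch. It assumes each instance tags its series with a replica label (named `replica` here as a placeholder; Thanos Query is told which label to use via its `--query.replica-label` flag). Series that differ only in that label are merged by timestamp. Thanos's real algorithm is more involved, so treat this as a model of the idea rather than its implementation.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]                    # (timestamp, value)
Series = Tuple[Dict[str, str], List[Sample]]  # (labels, samples)


def deduplicate(series: List[Series], replica_label: str = "replica"):
    """Merge series that are identical except for the replica label."""
    merged: Dict[tuple, Dict[int, float]] = {}
    for labels, samples in series:
        # Identity of a series once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for ts, value in samples:
            # The first replica seen for a timestamp wins; later duplicates are dropped.
            bucket.setdefault(ts, value)
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}


# Two replicas scraping the same target; each one missed a different scrape.
data = [
    ({"__name__": "up", "job": "api", "replica": "a"}, [(10, 1.0), (20, 1.0)]),
    ({"__name__": "up", "job": "api", "replica": "b"}, [(20, 1.0), (30, 1.0)]),
]
print(deduplicate(data))
```

The merged result is a single series covering timestamps 10, 20, and 30, so a gap in one instance is filled by the other.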
Logging Components
- Nevermore collects logs and audit data
- Kafka / Elasticsearch / Razor / Lanaya are deployed in distributed and multi-replica modes
Networking Components (CNI)
- Kube-OVN / Calico / Flannel: Achieve HA via stateless DaemonSets or triple-replica control plane components
ALB
- Operator deployed with three replicas, leader election enabled
- Instance-level health checks and load balancing
Self-built VIP
- High-availability virtual IP based on Keepalived
- Supports heartbeat detection and active-standby failover
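The active-standby behavior can be sketched as a priority election among nodes whose heartbeat is still being received, which is the general idea behind the VRRP protocol that Keepalived implements. The node names, priorities, and name-based tie-break below are assumptions made for the example, not Keepalived configuration.

```python
from typing import Dict, Optional, Tuple


def elect_vip_holder(nodes: Dict[str, Tuple[int, bool]]) -> Optional[str]:
    """Pick the healthy node with the highest priority to hold the virtual IP.

    `nodes` maps a node name to (priority, heartbeat_ok).
    Returns None when no node is healthy.
    """
    healthy = {name: prio for name, (prio, alive) in nodes.items() if alive}
    if not healthy:
        return None
    # Ties on priority are broken by name, an arbitrary choice for this sketch.
    return max(healthy, key=lambda name: (healthy[name], name))


# Hypothetical three-node setup.
nodes = {"node-a": (150, True), "node-b": (120, True), "node-c": (100, True)}
print(elect_vip_holder(nodes))   # node-a is active while healthy

nodes["node-a"] = (150, False)   # heartbeat from node-a is lost
print(elect_vip_holder(nodes))   # the VIP moves to the next-highest priority node
```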
Harbor
- ALB-based load balancing
- PostgreSQL with Patroni HA
- Redis Sentinel mode
- Stateless services deployed in multiple replicas
Registry and MinIO
- Registry deployed with three replicas
- MinIO in distributed mode with erasure coding, data redundancy, and automatic recovery
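The recovery property of erasure coding can be illustrated with the simplest possible scheme: a single XOR parity shard, which can rebuild any one lost data shard. MinIO itself uses Reed-Solomon codes, which tolerate the loss of several shards, so the code below is a deliberately reduced stand-in and not MinIO's algorithm. All function names are illustrative.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards: List[bytes]) -> bytes:
    """Compute one parity shard as the XOR of all data shards."""
    return reduce(xor_bytes, shards)


def recover_missing(shards: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing shard (marked as None) from the survivors and the parity."""
    if sum(s is None for s in shards) != 1:
        raise ValueError("single-parity XOR can rebuild exactly one missing shard")
    survivors = [s for s in shards if s is not None]
    return reduce(xor_bytes, survivors, parity)


data_shards = [b"abcd", b"efgh", b"ijkl"]   # equal-length shards of one object
parity = make_parity(data_shards)

# Simulate losing the drive that held the second shard.
damaged = [data_shards[0], None, data_shards[2]]
print(recover_missing(damaged, parity))
```

Because XOR is its own inverse, combining the parity with the surviving shards cancels them out and leaves the missing shard.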