AI Infrastructure and Kubernetes Cluster Project

Designed and deployed a highly available AI infrastructure environment using virtualization, distributed storage, and container orchestration to support scalable AI model hosting and workload management.

Project Description

Built a 3-node Proxmox VE virtualization cluster to provide a resilient and scalable compute environment for hosting virtual machines and containerized services. Configured Ceph distributed storage to ensure high availability, fault tolerance, and shared storage across all cluster nodes.
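A rough illustration of why three nodes is a common minimum for this kind of cluster: the sketch below computes the majority quorum, the number of node failures the cluster can absorb, and the usable capacity of replicated storage. The replica count of 3 and the 2 TB per-node figure are illustrative assumptions, not values from this project, and the function names are hypothetical.

```python
def quorum(nodes: int) -> int:
    """Votes needed for a strict majority of cluster nodes."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while the remainder still holds quorum."""
    return nodes - quorum(nodes)


def usable_capacity_tb(raw_tb_per_node: float, nodes: int, replicas: int) -> float:
    """Approximate usable space when every object is stored `replicas` times.

    Ignores filesystem overhead and free-space headroom (simplifying assumption).
    """
    return raw_tb_per_node * nodes / replicas


# Assumed example values: 3 nodes, 2 TB raw per node, 3 replicas.
print(quorum(3), tolerated_failures(3), usable_capacity_tb(2.0, 3, 3))
```

With these assumed inputs, a 3-node cluster needs 2 votes for quorum and can lose 1 node, while 6 TB of raw disk yields roughly 2 TB of usable replicated storage.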

Deployed Linux-based virtual machines to support infrastructure and application workloads. Implemented a MicroK8s Kubernetes cluster to orchestrate containers and handle scaling and service deployment across the environment.

Deployed and managed AI services (Ollama, LobeChat, OpenClaw) and models (DeepSeek, Qwen, Llama), enabling local AI inference and chatbot capabilities within the cluster environment.
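As a minimal sketch of what local inference against Ollama can look like, the snippet below builds a request to Ollama's HTTP `/api/generate` endpoint and returns the generated text. The host, the `llama3` model tag, and the helper names are assumptions for illustration; the actual endpoint address and model tags used in this project are not documented here.

```python
import json
import urllib.request

# Ollama listens on port 11434 by default; the host is an assumption.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="llama3", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", base_url=OLLAMA_URL, timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    req = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama instance with the model pulled):
# print(generate("Summarize what a Ceph OSD does in one sentence."))
```

Separating request construction from the network call keeps the payload easy to inspect without a live server.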

Configured Docker containers and Kubernetes workloads using YAML manifests to automate deployment, scaling, and service management. Implemented shared storage integration and validated automated failover to ensure service continuity during node failures.
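To illustrate the kind of manifest involved, the sketch below generates a Kubernetes Deployment for an Ollama container. It emits JSON rather than YAML so that it needs only the standard library; kubectl accepts either format. The resource name, labels, replica count, and image tag are illustrative assumptions, not the project's actual configuration.

```python
import json


def ollama_deployment(name="ollama", replicas=2, image="ollama/ollama:latest"):
    """Return a Deployment manifest as a plain dict (all values are assumed)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # 11434 is Ollama's default API port.
                            "ports": [{"containerPort": 11434}],
                        }
                    ]
                },
            },
        },
    }


manifest = ollama_deployment()
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with `microk8s kubectl apply -f <file>`. Running more than one replica is what lets the scheduler keep the service reachable when a node goes down, provided the pods land on different nodes.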

Tested cluster resiliency, workload distribution, and service availability under node-failure conditions to validate high availability and performance.

Technologies: Proxmox VE, Ceph, Linux Mint, Docker, Kubernetes (MicroK8s), Ollama, LobeChat, OpenClaw, DeepSeek, Qwen, Llama, YAML, Virtualization, Distributed Storage

Project Documentation