Building "Friday": A Home Lab with Kubernetes and AI

This spring break, I built "Friday" (named after Iron Man's OTHER AI), a production home lab where I practice industry-standard skills: container orchestration, CI/CD pipelines, automation, and observability.

What it does

VM 1: core services
- Self-hosted services: Immich for photos, Nextcloud for files, Vaultwarden for passwords
- A full monitoring stack with Grafana, InfluxDB, and Telegraf to track system metrics

VM 2: locally hosted LLM with Ollama
- phi3:mini powers AI-driven alerting: an n8n pipeline sends AI-interpreted incidents to a Discord bot on my phone

VM 3: k3s cluster
- Hosts my portfolio website on k3s, with a GitHub Actions CI/CD pipeline that deploys my edits and a Cloudflare Tunnel so I don't need port forwarding

Everything runs on a refurbished OptiPlex 3080 Micro with Proxmox.

I learned that infrastructure comes together in iterations, one piece at a time, rather than all at once. Breaking VMs and losing track of Compose files taught me more than any tutorial: iteration and failure are the best teachers.

Read the full breakdown: markcalip.com

#DevOps #CloudEngineer #SoftwareEngineering #MLOps #Kubernetes #Docker #CICD #AI #SelfHosted #Homelab #LearnInPublic #CloudComputing
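Friday's alerting runs through an n8n pipeline, and the author's workflow is not shown here. As a rough, hypothetical illustration of the same flow (alert, then phi3:mini, then Discord), here is a plain-Python sketch. It assumes Ollama's HTTP API on its default port 11434 with a non-streaming `/api/generate` call, and a standard Discord webhook URL. The function names, prompt wording, and User-Agent string are invented for the example.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port (11434) on the same host.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "phi3:mini"
DISCORD_LIMIT = 2000  # Discord rejects webhook messages longer than 2000 characters


def _post_json(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Hypothetical User-Agent; some endpoints reject urllib's default one.
        headers={"Content-Type": "application/json", "User-Agent": "friday-alerts/0.1"},
        method="POST",
    )


def build_ollama_request(alert: dict, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Ask the local model to explain a raw alert (prompt wording is hypothetical)."""
    prompt = (
        "Explain this home-lab alert in two sentences and suggest one first check:\n"
        + json.dumps(alert, sort_keys=True)
    )
    # stream=False makes /api/generate return one JSON object with a "response" field.
    return _post_json(url, {"model": MODEL, "prompt": prompt, "stream": False})


def build_discord_request(webhook_url: str, summary: str) -> urllib.request.Request:
    """Discord webhooks accept a JSON body with a "content" field."""
    return _post_json(webhook_url, {"content": summary[:DISCORD_LIMIT]})


def interpret_and_notify(alert: dict, webhook_url: str) -> str:
    """Send the alert to the LLM, then forward its explanation to Discord."""
    with urllib.request.urlopen(build_ollama_request(alert), timeout=120) as resp:
        summary = json.loads(resp.read())["response"]
    with urllib.request.urlopen(build_discord_request(webhook_url, summary), timeout=30):
        pass  # success is a 2xx reply; HTTP errors raise urllib.error.HTTPError
    return summary
```

Keeping request construction separate from sending makes the payloads easy to inspect without a running model or a real webhook. A tool like n8n provides the retries, scheduling, and credential handling that this sketch leaves out.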


More Relevant Posts

FAUN.dev()

3w
Kubernetes keeps quietly moving the goalposts: v1.36 is already teeing up removals, security tweaks, and a few sacred cows headed for the exit. At the same time, observability is getting more portable, and AI infra is hardening from hype into real projects and safer runtimes.

🧱 Broadcom Makes Its Pitch To Run Kubernetes On VMware VCF
🧭 How OpenTelemetry Helps Teams Change Observability Backends Without Re-Instrumenting Everything
🧨 Kubernetes v1.36 Sneak Peek
🧩 llm-d officially a CNCF Sandbox project
☁️ Offload GA: The Full Power of Docker, for Every Developer, Everywhere.
🛡️ Sandboxes: Run Agents in YOLO Mode, Safely

Steal the ideas, skip the scars.

Cheers!
FAUN.dev() Team

KubernetesLinks is out. Read this issue: http://from.faun.to/r/RblR
Subscribe to get the next one: https://faun.dev/join/
Have a great week!

#kubernetes #k8s #cloudnative #devops #platformengineering #observability #opentelemetry #cncf #containers #security #ai #llmops
Anna Maresova

Linux Roots, AI Horizons | Director, SUSE Labs Essentials
1mo Edited
Last week, I noticed my Claude Max usage behaving unexpectedly. Same workflows as usual, no obvious change in usage patterns, but the $200/month plan (with weekly limits) was running out much faster than expected. With two days still to go before the weekly reset, I was already near 90%.

Luckily, two things surfaced at the same time:
- The Claude Code source code had leaked
- Token caching in Claude Code had been broken for about a week, making resumed sessions significantly more expensive

If you rely on persistent context, this has a real impact. Resuming sessions is not an edge case. It is the default way to work if you want continuity and a reasonable cognitive load. Reconstructing context repeatedly is both inefficient and error-prone.

The community identified the issue quickly and produced a fix. Having access to the code made it possible to review and validate the fix before applying it.

This highlights a familiar tension. We are building systems that are increasingly positioned around safety and ethics. At the same time, key parts of the stack remain closed, which makes it harder to understand failure modes when something goes wrong.

From an open source perspective, this is not a new problem. Linux became foundational infrastructure largely because it allowed inspection, modification, and shared debugging at scale. Modern AI is built on top of that same ecosystem: open infrastructure, open tooling, and large amounts of publicly available code. It is reasonable to expect similar properties higher up the stack.

Right now, the practical workaround is to reconstruct enough of the system's behavior to keep workflows stable. When transparency is limited, reverse engineering becomes part of normal operation. We have seen this pattern before.

#AIEngineering #DeveloperProductivity #OpenSource #ProductionSystems #AIInfrastructure

Ivan Vokhmin
3w
Distributed AI Memory: Why File-Based Vector Storage Makes Sync Trivial

When building MadCat (my AI memory system), I made a deliberate choice: every memory is a JSON file with a globally unique name (generated with UUIDs). Why? Because it turns distributed sync from a complex database problem into a simple folder merge.

For example, this is my crontab-based sync strategy for 3+ laptops:

0 */3 * * * rclone copy -vv dropbox:madcat /path/to/memories && rclone copy -vv --min-age 10m /path/to/memories dropbox:madcat

Every three hours, it pulls new files from Dropbox, then pushes local files back. That's it. No conflict resolution. No merge logic. Just a folder merge. The `--min-age 10m` flag skips files modified in the last 10 minutes, so files that may not be fully written yet are not uploaded.

Benefits:
✅ Multi-device AI memory (your assistant remembers things across machines)
✅ Zero database locks
✅ Works with any cloud storage (Dropbox, S3, Google Drive via rclone)
✅ Human-readable backups (just JSON files)

Trade-offs:
⚠️ Not ideal for high-frequency writes
⚠️ Filesystem limits apply (though they should not be a problem on modern filesystems)

For an AI assistant that lives across multiple devices, this architecture gives us "good enough" consistency with minimal complexity. Sometimes the boring solution is the right one.

Full implementation: https://lnkd.in/dYNfFcqS

#AI #VectorSearch #DistributedSystems #RAG #Engineering #madcat_vector_db

Ivan Vokhmin / madcat · GitLab gitlab.com
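To see why UUID-named files reduce sync to a folder merge, here is a minimal sketch of the idea. It is not MadCat's code. The JSON schema (`id`, `text`) and the helper names are hypothetical, and a local copy-what's-missing function stands in for rclone. It assumes memories are write-once, so two machines never produce different files with the same name.

```python
import json
import shutil
import uuid
from pathlib import Path


def save_memory(folder: Path, text: str) -> Path:
    """Store one memory as its own JSON file, named by a random UUID (hypothetical schema)."""
    folder.mkdir(parents=True, exist_ok=True)
    memory_id = str(uuid.uuid4())
    path = folder / f"{memory_id}.json"
    path.write_text(json.dumps({"id": memory_id, "text": text}), encoding="utf-8")
    return path


def merge_folders(src: Path, dst: Path) -> int:
    """One-way merge: copy every memory file that dst does not have yet.

    Assumes memories are write-once. UUID names then practically never
    collide across machines, so merging both ways converges on the union.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.glob("*.json")):
        target = dst / path.name
        if not target.exists():
            shutil.copy2(path, target)
            copied += 1
    return copied


def load_texts(folder: Path) -> list[str]:
    """Read every memory back, sorted so folders can be compared."""
    return sorted(
        json.loads(p.read_text(encoding="utf-8"))["text"] for p in folder.glob("*.json")
    )
```

Running pull-then-push from each laptop against a shared "cloud" folder leaves every copy holding the same set of files, with no merge logic beyond "copy what's missing". If memories could be edited in place, this guarantee would no longer hold, and real conflict handling would be needed.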
Fahad Bilal S.
6d
Scaling AI systems with 16 MCP servers isn't easy. When building Atlas, I had to rethink server architecture from scratch. I chose a distributed approach, assigning specific tasks to each server to minimize bottlenecks. This allowed me to optimize resource allocation and improve overall system performance. Debugging was a challenge, but using semantic memory helped identify issues quickly. The outcome was worth it - a robust and efficient system. What's the most significant challenge you've faced when designing a distributed system? #AI #Automation #FullStack
Datadog Developer

6d
Datadog is using OpenAI Codex to bring system-level context into code review. Every PR is evaluated beyond the diff:
• cross-service interactions
• API contract changes with downstream impact
• gaps in test coverage across coupled systems

To validate it, #Datadog replayed real incidents against historical PRs.
→ Codex identified issues in ~22% of cases that had already passed review

"For me, a Codex comment feels like the smartest engineer I've worked with and who has infinite time to find bugs." - Bradley Carter, Engineering Manager at Datadog

What changes in practice:
• risks surface earlier, before production
• reviewers spend less time hunting for edge cases
• more attention goes to design and architecture

Read all about it: https://lnkd.in/ew6Xi63G

#codex #openai #datadog
Ajeet Yadav
1mo
🚨 Kubernetes is entering its AI-native era, and we're laying the foundation.

Introducing Podscape v2.3.0, now with native MCP support. This release is a big step toward making Kubernetes AI-integrated, while keeping everything fast, visual, and open.

⚡ What's new in v2.3.0?

🤖 MCP Server (AI-ready foundation)
Connect tools like Claude, Cursor, or other LLMs directly to your cluster.
→ Fetch logs
→ Inspect resources
→ Enable AI-assisted workflows (coming next)

🛡️ Security Hub
TLS tracking + vulnerability scans powered by Trivy, without leaving the UI or wiring tools together.

🛠️ Native Node Operations
Drain, cordon, uncordon: directly from the UI with safe handling.

💠 Open Source Core
Community-driven and central to everything we're building.

🧠 Where we're heading
Instead of switching between kubectl + Grafana + security tools…
we're building toward a single interface where humans + AI interact with Kubernetes together.

🙌 Support the project
If this direction resonates with you:
⭐ Star the repo: https://lnkd.in/ghfbRQ4i
📖 Release blog: https://lnkd.in/g2DPw6uC
Your support helps us push this further 🚀

#Kubernetes #OpenSource #DevOps #PlatformEngineering #SRE #AI #CloudNati
devtech.pro

1mo
n8n Docker Setup: Why It Breaks (And the Easier Alternative) - DEV Community

Navigating the complexities of self-hosting n8n with Docker can be a daunting journey, filled with common pitfalls such as SSL challenges, database persistence issues, and environment variable misconfigurations. While Docker brings undeniable benefits like portability and consistency, a production-ready setup requires careful handling of reverse proxies, TLS certificates, and mounted directories. These details often make the difference between smooth operation and constant troubleshooting.

For those who want to bypass the headaches, alternative deployment methods offer preconfigured setups that eliminate extensive debugging and manual configuration. Imagine launching n8n in under three minutes with managed solutions that include HTTPS, PostgreSQL as a production-ready database, and automated backups. This is precisely where Devtech.pro shines. With expertise in building tailored workflows and automating operations, we simplify n8n deployment so you can focus on innovation. Whether it's optimizing Docker or exploring hassle-free alternatives, our team ensures your workflows don't just run; they thrive securely and efficiently. Stop wrestling with infrastructure and start scaling smarter with solutions that fit your needs.

Learn more at: https://lnkd.in/gXsdA_mW

What are your thoughts on this? Share your ideas in the comments below. devtech.pro is always eager to hear from our community and learn about your experiences and perspectives. Looking forward to connecting with you!

#devtech.pro #AI #technology #trending #news #innovation #technology

This article was written and published by Doki, our AI agent for documentation and social media.
TheNextGenTechInsider.com

3w
Elastic Launches LLM-Powered Monitor to Detect Real-Time Supply Chain Compromises

📌 Elastic unveils a groundbreaking LLM-powered tool that scans top packages in PyPI and npm in real time, catching supply chain threats before they spread. Using AI to analyze code diffs for obfuscated payloads, persistence mechanisms, and network anomalies, it automatically alerts teams, outpacing static scanners. Open-source and ready for DevOps, it turns dependency audits into proactive defense.

🔗 Read more: https://lnkd.in/dESswj6N

#Elastic #Llm #Pypi #Npm #Supplychain
iTech22

3w
🚀 Are you treating your network infrastructure like code? If not, it's time to transform your NetDevOps toolkit.

A modern enterprise network isn't managed; it's orchestrated. Managing a mission-critical 500+ node cluster requires a sophisticated ecosystem where everything, from Zero-Touch Provisioning (ZTP) to predictive threat hunting, is automated and version-controlled.

I recently broke down the exact tech stack you need to bridge the gap between legacy hardware, modern cloud-native architecture, and specialized AI/ML workloads. My must-have list is split into six mission-critical categories:

Infrastructure as Code (IaC) & Configuration Management
This is about treating your network state as version-controlled code.
🛠️ Key Players: Ansible, Terraform, Arista CloudVision (CVP), Juniper Apstra

Network Testing & Validation (Pre/Post Change)
Prevent outages before they happen by modeling and simulating your network.
🛠️ Key Players: pyATS/Genie, Batfish, SuzieQ

Programmability & Scripting
Build bespoke automation when off-the-shelf tools fall short, powered by your Source of Truth (SoT).
🛠️ Key Players: Python (Netmiko/NAPALM/Nornir), Go (Golang), NetBox/Nautobot

Telemetry & Observability
Proactive threat hunting and predictive maintenance are the goals here. SNMP is dead; long live streaming telemetry.
🛠️ Key Players: gNMI/gRPC, Prometheus & Grafana, ThousandEyes

CI/CD & Pipeline Orchestration
The "glue" that triggers automated testing and configuration deployments on every git push.
🛠️ Key Players: GitLab CI/CD, GitHub Actions, Jenkins

AI/ML Performance Tuning (Specialized)
Crucial for high-throughput GPU-to-GPU communication and complex cluster management.
🛠️ Key Players: NVIDIA Unified Fabric Manager (UFM), Mellanox NEO

The future of network engineering is software-defined, automated, and observable.

Are any of these tools missing from your repertoire, or do you have another game-changer to suggest? Join the discussion in the comments. Let's build better, more resilient networks. 👇

#NetworkEngineer #NetworkAutomation #NetDevOps #IaC #CloudNetworking #DDI #Cisco #Arista #Juniper #AI #MachineLearning #TechStack #CareerGrowth
Maximiliano Pizarro
2w
MCP Gateway + OpenShift Lightspeed: AI-Powered Cluster Operations

I've been running some tests integrating MCP Gateway (a Kuadrant extension) with OpenShift Lightspeed, and the results are amazing.

The idea is simple yet powerful: federate multiple MCP servers (Kubernetes, OpenShift, ArgoCD, Developer Hub) behind a single authenticated gateway, so the AI assistant integrated into the OpenShift console can execute real cluster operations from natural-language requests.

55 tools. 5 MCP servers. A single entry point.

From listing pods and troubleshooting to syncing applications in ArgoCD or querying the Developer Hub catalog, all without leaving the chat.

The video showcases the 3D integration architecture I put together to visualize exactly how all these components connect.

🔗 MCP Gateway (Kuadrant): https://lnkd.in/dtQjuetj
🔗 OpenShift Lightspeed: https://lnkd.in/dvau65s6

#OpenShift #AI #MCP #ModelContextProtocol #Kuadrant #ArgoCD #DeveloperHub #PlatformEngineering #RedHat #CloudNative
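One way to picture this federation is a gateway that namespaces each server's tools and routes calls by prefix. The sketch below is hypothetical and is not Kuadrant's implementation. In-process functions stand in for real MCP servers, and the sketch skips MCP's `initialize` handshake, transport, and authentication. It only reuses the MCP method names `tools/list` and `tools/call` in JSON-RPC 2.0-shaped messages. The class name, the prefix scheme, and `fake_server` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-in for one MCP server: takes a JSON-RPC request dict, returns the response dict.
Backend = Callable[[dict], dict]


@dataclass
class ToolGateway:
    """Fans tools/list out to every backend and routes tools/call by name prefix."""

    backends: dict[str, Backend] = field(default_factory=dict)
    next_id: int = 0

    def register(self, prefix: str, backend: Backend) -> None:
        # The prefix must not contain "_" so it can be split off unambiguously.
        if "_" in prefix or prefix in self.backends:
            raise ValueError(f"invalid or duplicate prefix: {prefix!r}")
        self.backends[prefix] = backend

    def _rpc(self, backend: Backend, method: str, params: Optional[dict] = None) -> dict:
        self.next_id += 1
        request = {"jsonrpc": "2.0", "id": self.next_id, "method": method}
        if params is not None:
            request["params"] = params
        response = backend(request)
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]

    def list_tools(self) -> list[dict]:
        tools = []
        for prefix, backend in self.backends.items():
            for tool in self._rpc(backend, "tools/list")["tools"]:
                # Namespace each tool so two servers can expose the same tool name.
                tools.append({**tool, "name": f"{prefix}_{tool['name']}"})
        return tools

    def call_tool(self, name: str, arguments: dict) -> dict:
        prefix, _, tool_name = name.partition("_")
        backend = self.backends.get(prefix)
        if backend is None or not tool_name:
            raise KeyError(f"no backend for tool {name!r}")
        return self._rpc(backend, "tools/call", {"name": tool_name, "arguments": arguments})


def fake_server(tool_names: list[str]) -> Backend:
    """In-process fake MCP server, used only for illustration."""

    def handle(request: dict) -> dict:
        if request["method"] == "tools/list":
            tools = [{"name": n, "inputSchema": {"type": "object"}} for n in tool_names]
            result = {"tools": tools}
        elif request["method"] == "tools/call":
            called = request["params"]["name"]
            result = {"content": [{"type": "text", "text": f"{called} ok"}]}
        else:
            error = {"code": -32601, "message": "Method not found"}
            return {"jsonrpc": "2.0", "id": request["id"], "error": error}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}

    return handle
```

With two fake servers registered as `k8s` and `argocd`, the assistant sees one flat tool list such as `k8s_pods_list` and `argocd_app_sync`, and each call is forwarded to the server that owns the prefix. This is the "single entry point" property the post describes. A production gateway would also need to handle authentication and per-tool authorization.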
