---
title: Self-hosted AI
description: Architecture and configuration details for the self-hosted AI environment including Ollama, Continue, and Open WebUI.
draft: false
author: wompmacho
date: 2026-04-08T01:00:00-04:00
lastmod: 2026-04-08
tags:
  - ai
  - ollama
  - continue.dev
  - open-webui
  - self-hosted
  - gemma
---

# Homelab AI Infrastructure Overview

This document outlines the current self-hosted AI infrastructure, detailing how models are hosted, accessed, and used across the different interfaces in the homelab environment.

## Core Inference Engine: Ollama

The AI setup is built around Ollama, which handles model inference and API routing.

- **Host Environment:** Dedicated gaming PC. This machine provides the GPU compute and VRAM needed to run large language models efficiently.
- **Network Address:** `http://10.0.0.109:11434`
- **Active Models:**
  - `gemma4:26b` (heavy): the primary model for complex reasoning, comprehensive chat, and applying structured code edits.
  - `gemma4:e4b` (fast): a smaller, optimized model dedicated to low-latency tasks such as real-time code autocomplete.
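A quick way to verify the endpoint from any machine on the LAN is to hit Ollama's standard REST API (the prompt below is illustrative):

```shell
# List the models available on the remote Ollama instance
curl http://10.0.0.109:11434/api/tags

# One-off, non-streaming generation against the heavy model
curl http://10.0.0.109:11434/api/generate -d '{
  "model": "gemma4:26b",
  "prompt": "Summarize the homelab AI stack in one sentence.",
  "stream": false
}'
```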

## Developer Integration: Continue.dev

For software development, the AI is integrated directly into the coding environment, turning the IDE into an AI-powered workspace.

- **Environment:** VS Code running via code-server on the Linux host.
- **Extension:** Continue.dev
- **Routing:** The extension is configured via `~/.continue/config.yaml` to offload all inference to the remote Ollama instance on the gaming PC.
- **Autonomous Capabilities:**
  - The setup is enhanced with the Model Context Protocol (MCP).
  - A local `shell-mcp-server` runs on the code-server host, exposing an `execute_shell_command` tool.
  - Combined with aggressive system prompting, this allows the `gemma4:26b` model to autonomously execute Linux commands, read their output, and debug the local workspace without the developer having to run commands manually.
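A minimal sketch of what that `~/.continue/config.yaml` could look like, assuming Continue's YAML config format with the Ollama provider; the role assignments and the MCP launch command are illustrative, not copied from the actual config:

```yaml
name: Homelab Assistant
version: 0.0.1

models:
  # Heavy model for chat and edits, served by the remote Ollama instance
  - name: gemma4:26b
    provider: ollama
    model: gemma4:26b
    apiBase: http://10.0.0.109:11434
    roles: [chat, edit]

  # Fast model dedicated to inline autocomplete
  - name: gemma4:e4b
    provider: ollama
    model: gemma4:e4b
    apiBase: http://10.0.0.109:11434
    roles: [autocomplete]

mcpServers:
  # Hypothetical launch command for the local shell MCP server
  - name: shell
    command: shell-mcp-server
```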

## General User Interface: Open WebUI

For general-purpose queries, document analysis, and conversational AI outside the IDE, a dedicated web interface is used.

- **Hosting:** Deployed as a Docker container on the homelab server (data stored in `/srv/open-webui`).
- **Integration:** Connected natively to the Ollama API on the gaming PC over the local network (`10.0.0.109`).
- **Features:**
  - Provides a polished, ChatGPT-like experience.
  - Lets users interact seamlessly with the Gemma models.
  - Supports persistent chat histories and file uploads (via the `/srv/open-webui/uploads` and `vector_db` volumes) for Retrieval-Augmented Generation (RAG).
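A deployment along these lines could be reproduced with a single `docker run`; the published port and restart policy are assumptions, while the `OLLAMA_BASE_URL` variable, internal port 8080, and `/app/backend/data` volume follow the standard Open WebUI container conventions:

```shell
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://10.0.0.109:11434 \
  -v /srv/open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
```

Bind-mounting `/srv/open-webui` keeps chat history, uploads, and the vector database on the host, so the container can be recreated without data loss.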