---
title: Self-hosted AI
description: Architecture and configuration details for the self-hosted AI environment including Ollama, Continue, and Open WebUI.
draft: false
author: wompmacho
date: 2026-04-08T01:00:00-04:00
lastmod: 2026-04-08
---
# Homelab AI Infrastructure Overview
This document outlines the current self-hosted Artificial Intelligence infrastructure, detailing how models are hosted, accessed, and utilized across different interfaces within the homelab environment.
## Core Inference Engine: Ollama
The backbone of the AI setup is Ollama, which handles model inference and API routing.
- Host Environment: Dedicated Gaming PC. This machine provides the GPU compute power and VRAM needed to run large language models efficiently.
- Network Address: `http://10.0.0.109:11434`
- Active Models:
  - `gemma4:26b` (Heavy): The primary model used for complex reasoning, comprehensive chat, and applying structural code edits.
  - `gemma4:e4b` (Fast): A smaller, optimized model dedicated to low-latency tasks such as real-time code autocomplete.
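Any machine on the LAN can reach this instance over plain HTTP. A minimal sketch using only the standard library (the endpoint path and request fields follow Ollama's documented `/api/generate` API; the prompt text is illustrative):

```python
import json
import urllib.request

# Network address of the Ollama host, as listed above.
OLLAMA_URL = "http://10.0.0.109:11434"

def build_generate_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(model: str, prompt: str) -> str:
    """POST a one-shot completion request and return the generated text."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example payload (no network access needed just to inspect it):
print(build_generate_request("gemma4:e4b", "Complete this line of code"))
```

With `stream` left at `False`, Ollama returns a single JSON object instead of a stream of chunks, which keeps the client trivial.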
## Developer Integration: Continue.dev
For software development, the AI is integrated directly into the coding environment, turning the IDE into an AI-powered workspace.
- Environment: VS Code running via `code-server` on the Linux host.
- Extension: Continue.dev
- Routing: The extension is configured via `~/.continue/config.yaml` to offload all inference to the remote Ollama instance on the Gaming PC.
- Autonomous Capabilities:
  - The setup is enhanced with the Model Context Protocol (MCP).
  - A local `shell-mcp-server` runs on the `code-server` host, exposing the `execute_shell_command` tool.
  - Combined with aggressive system prompting, this allows the `gemma4:26b` model to autonomously execute Linux commands, read outputs, and debug the local workspace without requiring the developer to manually run commands.
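The routing above can be sketched in `~/.continue/config.yaml`. This is an approximation of Continue's YAML config schema, not a verified copy of the live file; the display names are illustrative, while the provider, model IDs, and `apiBase` come from this document:

```yaml
name: homelab-assistant   # illustrative assistant name
version: 1.0.0
schema: v1
models:
  - name: Gemma Heavy     # complex reasoning, chat, and code edits
    provider: ollama
    model: gemma4:26b
    apiBase: http://10.0.0.109:11434
    roles: [chat, edit]
  - name: Gemma Fast      # low-latency autocomplete
    provider: ollama
    model: gemma4:e4b
    apiBase: http://10.0.0.109:11434
    roles: [autocomplete]
```

Splitting roles this way keeps autocomplete snappy on the small model while chat and edit requests go to the heavy one.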
## General User Interface: Open WebUI
For general-purpose queries, document analysis, and conversational AI outside of the IDE, a dedicated web interface is utilized.
- Hosting: Deployed as a Docker container within the homelab server infrastructure (data stored in `/srv/open-webui`).
- Integration: Natively connected to the Ollama API on the Gaming PC over the local network (`10.0.0.109`).
- Features:
  - Provides a polished, ChatGPT-like experience.
  - Allows users to interact seamlessly with the Gemma models.
  - Supports persistent chat histories and file uploads (via the `/srv/open-webui/uploads` and `vector_db` volumes) for Retrieval-Augmented Generation (RAG).
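A deployment like this can be sketched as a Compose service. The image name, internal data path, and `OLLAMA_BASE_URL` variable follow Open WebUI's published Docker instructions; the host port mapping is an assumption, not taken from this document:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"        # host port 3000 is an assumption; the container listens on 8080
    environment:
      # Point the UI at the Ollama API on the Gaming PC
      - OLLAMA_BASE_URL=http://10.0.0.109:11434
    volumes:
      # The container keeps chat history, uploads, and the vector DB under
      # /app/backend/data, persisted here to the host's /srv/open-webui
      - /srv/open-webui:/app/backend/data
    restart: unless-stopped
```

Binding the data directory to `/srv/open-webui` is what makes the `uploads` and `vector_db` volumes mentioned above survive container upgrades.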