---
title: "Self-hosted AI"
description: "Architecture and configuration details for the self-hosted AI environment including Ollama, Continue, and Open WebUI."
draft: false
author: "wompmacho"
date: '2026-04-08T01:00:00-04:00'
lastmod: '2026-04-08'
tags: ["ai", "ollama", "continue.dev", "open-webui", "self-hosted", "gemma"]
---

# Homelab AI Infrastructure Overview

This document outlines the current self-hosted artificial intelligence (AI) infrastructure, detailing how models are hosted, accessed, and used across the different interfaces in the homelab environment.

## Core Inference Engine: Ollama

The backbone of the AI setup is **Ollama**, which handles model inference and serves the API.

* **Host Environment:** Dedicated Gaming PC. This machine provides the GPU compute and VRAM needed to run large language models efficiently.
* **Network Address:** `http://10.0.0.109:11434`
* **Active Models:**
    * `gemma4:26b` (heavy): The primary model, used for complex reasoning, general chat, and applying structural code edits.
    * `gemma4:e4b` (fast): A smaller, optimized model dedicated to low-latency tasks such as real-time code autocomplete.

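
Since everything runs over Ollama's HTTP API, any machine on the LAN can query the Gaming PC directly. Below is a minimal sketch using only the Python standard library; the address and model name come from the setup above, and `/api/generate` is Ollama's non-streaming completion endpoint (adjust if your Ollama version differs).

```python
import json
import urllib.request

# Endpoint from this document's setup; change to match your network.
OLLAMA_URL = "http://10.0.0.109:11434"


def build_generate_request(prompt: str, model: str = "gemma4:26b") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, model: str = "gemma4:26b") -> str:
    """Send a prompt to the remote Ollama instance and return the response text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires the Ollama host to be reachable on the LAN.
    print(generate("Explain VRAM in one sentence."))
```

The same pattern works for the fast model by passing `model="gemma4:e4b"`.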
## Developer Integration: Continue.dev

For software development, the AI is integrated directly into the coding environment, turning the IDE into an AI-powered workspace.

* **Environment:** VS Code running via `code-server` on the Linux host.
* **Extension:** Continue.dev
* **Routing:** The extension is configured via `~/.continue/config.yaml` to offload all inference to the remote Ollama instance on the Gaming PC.
* **Autonomous Capabilities:**
    * The setup is extended with the Model Context Protocol (MCP).
    * A local `shell-mcp-server` runs on the `code-server` host, exposing the `execute_shell_command` tool.
    * Combined with aggressive system prompting, this allows the `gemma4:26b` model to autonomously execute Linux commands, read their output, and debug the local workspace without the developer having to run commands manually.

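
For reference, a `~/.continue/config.yaml` along these lines would produce the routing described above. This is a sketch against Continue's YAML config schema, not a verbatim copy of the actual file: field names can vary between Continue versions, the model display names are made up, and the `shell-mcp-server` launch command is a placeholder.

```yaml
name: Homelab Assistant
version: 1.0.0
models:
  - name: gemma-heavy          # display name (illustrative)
    provider: ollama
    model: gemma4:26b
    apiBase: http://10.0.0.109:11434
    roles: [chat, edit]
  - name: gemma-fast
    provider: ollama
    model: gemma4:e4b
    apiBase: http://10.0.0.109:11434
    roles: [autocomplete]
mcpServers:
  - name: shell
    command: shell-mcp-server  # placeholder: actual launch command for the local MCP server
```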
## General User Interface: Open WebUI

For general-purpose queries, document analysis, and conversational AI outside the IDE, a dedicated web interface is used.

* **Hosting:** Deployed as a Docker container on the homelab server (data stored in `/srv/open-webui`).
* **Integration:** Natively connected to the Ollama API on the Gaming PC over the local network (`10.0.0.109`).
* **Features:**
    * Provides a polished, ChatGPT-like experience.
    * Allows users to interact seamlessly with the Gemma models.
    * Supports persistent chat histories and file uploads (via the `/srv/open-webui/uploads` and `vector_db` volumes) for Retrieval-Augmented Generation (RAG).
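
A Docker Compose definition along these lines matches the deployment described above. It is a sketch under stated assumptions: the image tag and host port are guesses, the single `/srv/open-webui` mount is assumed to hold the `uploads` and `vector_db` directories, and `OLLAMA_BASE_URL` is Open WebUI's environment variable for pointing at a remote Ollama instance.

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # host port 3000 (assumed) -> container port 8080
    environment:
      - OLLAMA_BASE_URL=http://10.0.0.109:11434
    volumes:
      # /app/backend/data holds uploads/ and vector_db/ inside the container
      - /srv/open-webui:/app/backend/data
    restart: unless-stopped
```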