You’ve got your workstation ready and your Ollama server humming in the background. But let’s be honest: copying and pasting code between a terminal and a text editor is a tedious, flow-killing process.
To truly unlock the power of local AI, you need a bridge. You need a way for the AI to “see” your files, understand your project structure, and write code directly into your editor.
For me, that bridge is Continue.
Why Continue? The Last Truly Open Assistant
If you search for AI extensions in VSCodium or VS Code, you’ll find dozens of options. But almost all of them come with a “catch”: they require an account, they collect telemetry, or they demand a monthly subscription.
Continue is different.
It is one of the few truly open-source AI assistants that respects the user. It doesn’t want my email address or my credit card. It simply acts as a professional interface that connects my editor to whatever AI “brain” I choose to use. Because it is open, I have total control over my data. When paired with Ollama, my code stays on my computer.
Connecting the Brain to the Body: The config.yaml
To make Continue talk to my Ollama server, I have to tell it where to look. This is done through a configuration file called config.yaml, which Continue keeps in the .continue folder in your home directory.
I think of the config.yaml as the “instruction manual” for my assistant. It tells Continue which model to use, where the server is located on my network, and what tasks the AI is allowed to perform.
If you are using the Server Strategy I mentioned in my last post (running Ollama on a separate machine), you must point the apiBase to the IP address of that server.
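Before editing the config, it’s worth confirming that your workstation can actually reach the server. Ollama exposes a small HTTP API, so a quick check from the terminal (substituting your own server’s IP for the placeholder below) both verifies connectivity and lists the models you’ve pulled:

```shell
# Replace 192.168.1.100 with your Ollama server's local IP.
# A successful response is JSON listing every model installed on the server.
curl http://192.168.1.100:11434/api/tags
```

If this times out, fix the network path (firewall, IP, or Ollama’s bind address) before touching Continue; no config.yaml tweak will help if the server is unreachable.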
Here is the configuration I use to get things moving:
```yaml
name: Local Config
version: 1.0.0
schema: v1

tabAutocompleteOptions:
  maxPromptTokens: 512

models:
  - name: qwen3.5:27b
    provider: ollama
    model: qwen3.5:27b
    apiBase: http://192.168.1.100:11434
    roles: [chat, edit, autocomplete, apply, summarize]
    capabilities: [tool_use]
    contextLength: 8192
    maxTokens: 2048
    timeout: 120000
```
Two critical notes for your setup:
- The IP Address: In the example above, 192.168.1.100 is a placeholder. You’ll need to replace this with the actual local IP address of your Ollama server.
- Model Flexibility: You aren’t limited to one model. Any model you’ve downloaded via Ollama (e.g., gemma4, qwen3-coder) can be added here. I often switch between a heavy-duty model for complex refactoring and a lightweight one for quick chats.
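As a sketch of that flexibility, here is how a second, lighter model could sit alongside the first in the same models list. The second entry’s name is only an example; use whatever you’ve actually pulled with ollama pull:

```yaml
models:
  - name: qwen3.5:27b
    provider: ollama
    model: qwen3.5:27b
    apiBase: http://192.168.1.100:11434
    roles: [chat, edit, autocomplete, apply, summarize]

  # A smaller model reserved for fast autocomplete and quick chats.
  # (Example name; replace with a model installed on your server.)
  - name: qwen3-coder
    provider: ollama
    model: qwen3-coder
    apiBase: http://192.168.1.100:11434
    roles: [chat, autocomplete]
```

Once both are listed, Continue lets you pick between them from the model dropdown, so switching “brains” mid-session takes one click rather than a config edit.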
The “Vibe Coding” Workflow
Once this is configured, the entire nature of the work changes. I no longer “write code” in the traditional, line-by-line sense; I orchestrate it.
I can highlight a block of messy, legacy code and ask Continue to “refactor this for readability,” or describe a new feature in plain English and watch the AI generate the boilerplate in real-time. Because the AI is running on my local network, the latency is incredibly low and the privacy is absolute.
When the AI Hits a Wall
Even the best local models have limits. Sometimes a prompt is too complex, or the model gets stuck in a logic loop.
When my local setup isn’t giving me the answer I need, I don’t fight it—I pivot. This is where the “Hybrid Approach” comes in. I use my local setup for 90% of my work to maintain speed and privacy, but I keep a browser tab open for the “heavy hitters.”
For those moments of total blockage, I consult the giants: ChatGPT, GitHub Copilot, or my personal favorite for research and factual accuracy, Perplexity.ai.
My Studio is Complete
I’ve now built the full stack:
- The OS: A creator-focused environment (MX Linux and macOS).
- The Engine: A private, local LLM server (Ollama).
- The Interface: A professional, open-source coding assistant (Continue).
The tools are ready. The friction is gone. Now, the only question left is: What am I going to build next?