The Secret Weapon: Transforming VSCodium with the Continue Extension

You’ve got your workstation ready and your Ollama server humming in the background. But let’s be honest: copying and pasting code between a terminal and a text editor is a tedious, flow-killing process.

To truly unlock the power of local AI, you need a bridge. You need a way for the AI to “see” your files, understand your project structure, and write code directly into your editor.

For me, that bridge is Continue.

Why Continue? The Last Truly Open Assistant

If you search for AI extensions in VSCodium or VS Code, you’ll find dozens of options. But almost all of them come with a “catch”: they require an account, they track your telemetry, or they demand a monthly subscription.

Continue is different.

It is one of the few truly open-source AI assistants that respects the user. It doesn’t want my email address or my credit card. It simply acts as a professional interface that connects my editor to whatever AI “brain” I choose to use. Because it is open, I have total control over my data. When paired with Ollama, my code stays on my computer.

Connecting the Brain to the Body: The config.yaml

To make Continue talk to my Ollama server, I have to tell it where to look. This is done through a configuration file called config.yaml, which Continue keeps in the .continue folder in your home directory.

I think of the config.yaml as the “instruction manual” for my assistant. It tells Continue which model to use, where the server is located on my network, and what tasks the AI is allowed to perform.

If you are using the Server Strategy I mentioned in my last post (running Ollama on a separate machine), you must point the apiBase to the IP address of that server. Keep in mind that Ollama only listens on localhost by default; on the server, set the OLLAMA_HOST environment variable to 0.0.0.0 so other machines on your network can reach it.

Here is the configuration I use to get things moving:

name: Local Config
version: 1.0.0
schema: v1
tabAutocompleteOptions:
  maxPromptTokens: 512
models:
  - name: qwen3.5:27b
    provider: ollama
    model: qwen3.5:27b
    apiBase: http://192.168.1.100:11434
    roles: [chat, edit, autocomplete, apply, summarize]
    capabilities: [tool_use]
    contextLength: 8192
    maxTokens: 2048
    timeout: 120000

Two critical notes for your setup:

  1. The IP Address: In the example above, 192.168.1.100 is a placeholder. You’ll need to replace this with the actual local IP address of your Ollama server.
  2. Model Flexibility: You aren’t limited to one model. Any model you’ve downloaded via Ollama (e.g., gemma3 or qwen3-coder) can be added here. I often switch between a heavy-duty model for complex refactoring and a lightweight one for quick chats.
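To illustrate the second point, here is a sketch of how a lighter model could sit alongside the heavy one in the same models list. The model tag qwen2.5-coder:1.5b is just an example of a small model you might have pulled; swap in whatever is on your server. Because the roles differ, Continue routes autocomplete requests to the small, fast model while the big one handles chat and edits:

models:
  - name: qwen3.5:27b
    provider: ollama
    model: qwen3.5:27b
    apiBase: http://192.168.1.100:11434
    roles: [chat, edit, apply, summarize]
  - name: qwen2.5-coder:1.5b
    provider: ollama
    model: qwen2.5-coder:1.5b
    apiBase: http://192.168.1.100:11434
    roles: [autocomplete]

Splitting roles this way keeps autocomplete snappy without giving up the larger model’s reasoning for bigger tasks.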

The “Vibe Coding” Workflow

Once this is configured, the entire nature of the work changes. I no longer “write code” in the traditional, line-by-line sense; I orchestrate it.

I can highlight a block of messy, legacy code and ask Continue to “refactor this for readability,” or describe a new feature in plain English and watch the AI generate the boilerplate in real-time. Because the AI is running on my local network, the latency is incredibly low and the privacy is absolute.

When the AI Hits a Wall

Even the best local models have limits. Sometimes a prompt is too complex, or the model gets stuck in a logic loop.

When my local setup isn’t giving me the answer I need, I don’t fight it—I pivot. This is where the “Hybrid Approach” comes in. I use my local setup for 90% of my work to maintain speed and privacy, but I keep a browser tab open for the “heavy hitters.”

For those moments of total blockage, I consult the giants: ChatGPT, GitHub Copilot, or my personal favorite for research and factual accuracy, Perplexity.ai.

My Studio is Complete

I’ve now built the full stack:

  • The OS: A creator-focused environment (MX Linux and macOS).
  • The Engine: A private, local LLM server (Ollama).
  • The Interface: A professional, open-source coding assistant (Continue).

The tools are ready. The friction is gone. Now, the only question left is: What am I going to build next?
