I was getting constant popup errors in Continue (VS Code extension) when using my Qwen3.6:27b model: “Error parsing Ollama response: expected element type <function> but have <parameter>”. Turns out it was a tool-calling template mismatch between Ollama and Continue, fixed by updating Ollama from 0.20.5 to 0.23.
Why I Updated Ollama (and How)
I run Ollama on my Mac Studio M4 Max (36GB unified memory) and access it over LAN from my CachyOS laptop running VS Code + Continue + Cline extensions. The GUI makes updates dead simple:
- Click the Ollama menu bar icon
- Choose “Restart to update”
- Done! (`ollama --version` confirmed 0.23)
Why it mattered: Ollama 0.23 includes updated qwen3.5 renderer/parser templates that fixed my Continue parsing errors completely. Older versions had mismatched tool-calling schemas.
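You can sanity-check the update from both machines, since Ollama exposes a small version endpoint over HTTP. A quick sketch, assuming the Mac's LAN address from my Continue config below:

# On the Mac: confirm the new version locally
ollama --version

# From the laptop: query the server's version endpoint over LAN
curl -s http://192.168.1.25:11434/api/version
# → {"version":"0.23"}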
Peeking Inside: ollama show
Curiosity led me to check what was actually happening with my model:
ollama show qwen3.6:27b --modelfile
Here’s the key section (ignoring license):
FROM /Users/me/.ollama/models/blobs/sha256-83c54730a5fea8a0958598c01617c1419c431e93b33bacf980b49a420c798926
TEMPLATE {{ .Prompt }}
RENDERER qwen3.5
PARSER qwen3.5
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER temperature 1
What this tells me:
- No `num_ctx` = uses the global default (I set 32k in the Ollama GUI; see the sketch after this list to pin it per model instead)
- `RENDERER qwen3.5` = latest tool templates (fixed my errors)
- `top_k 20`, `top_p 0.95`, `temperature 1` = model defaults Continue was overriding
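If you don't want to rely on the GUI's global default, you can bake the context length into a derived model. A minimal sketch (the Modelfile.32k filename and the qwen3.6-32k tag are my own examples, nothing Ollama ships):

# Write a tiny Modelfile that inherits the base model and pins num_ctx
cat > Modelfile.32k <<'EOF'
FROM qwen3.6:27b
PARAMETER num_ctx 32768
EOF

# Build the derived model and confirm the parameter stuck
ollama create qwen3.6-32k -f Modelfile.32k
ollama show qwen3.6-32k --modelfile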
My Continue Config That Actually Works
I access Ollama via `OLLAMA_HOST=0.0.0.0 ollama serve` on the Mac, then connect from CachyOS over LAN.
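That works when I start the server from a terminal; the menu bar app instead needs the variable set where launchd can see it. A sketch of both routes (the `launchctl` line is what the Ollama FAQ recommends on macOS; restart the app afterwards):

# One-off: serve on all interfaces from a terminal session
OLLAMA_HOST=0.0.0.0 ollama serve

# Persistent: let the menu bar app listen on the LAN, then restart it
launchctl setenv OLLAMA_HOST "0.0.0.0"

Here's my ~/.continue/config.yaml that's optimized for my 36GB Mac Studio: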
models:
  - name: qwen3.6:27b
    provider: ollama
    model: qwen3.6:27b
    apiBase: http://192.168.1.25:11434
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use]
    contextLength: 32768 # Matches my 32k GUI setting
    maxTokens: 4096
    timeout: 180000
    temperature: 0.2 # Override model's 1.0 (less random)
    top_p: 0.9
    top_k: 20 # Matches Modelfile (faster)
  - name: gemma4:31b
    provider: ollama
    model: gemma4:31b
    apiBase: http://192.168.1.25:11434
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use]
    contextLength: 32768
    maxTokens: 2048
    timeout: 180000
    temperature: 0.5
    top_p: 0.9
    top_k: 20
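Since the whole problem was tool calling, it's worth exercising it once outside Continue. A hedged sketch against Ollama's /api/chat endpoint; the get_weather tool is a throwaway I made up for the test, only the request shape comes from the Ollama API:

curl -s http://192.168.1.25:11434/api/chat -d '{
  "model": "qwen3.6:27b",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
# A healthy setup returns a message containing tool_calls instead of
# the "<function> but have <parameter>" template error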
Why These Specific Settings
- `contextLength: 32768`: My Ollama GUI global default is 32k. Requesting 65k was just wasting API negotiation time on every call.
- `top_k: 20`: Matches the Modelfile exactly. Higher values (40) slowed my token generation 20-30%; see the sketch after this list to measure it yourself.
- `temperature: 0.2`: The model default of 1.0 is too random for coding; 0.2 gives me more focused responses.
- 36GB Mac Studio: Handles qwen3.6:27b at 32k context using ~28-32GB total. Smooth, no swapping.
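To check the top_k claim on your own hardware: `ollama run --verbose` prints eval stats after each response, and `/set parameter` changes sampling inside the session. A rough A/B sketch (the prompt is arbitrary):

ollama run qwen3.6:27b --verbose
>>> /set parameter top_k 40
>>> write a quicksort function in Python
# compare the "eval rate" (tokens/s) printed after the response,
# then repeat with /set parameter top_k 20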
My Workflow
Mac Studio (Ollama 0.23) ←LAN→ CachyOS Laptop (VS Code + Continue + Cline)
↓
32k context, fast token generation
No more parsing popups, a smooth 32k context, and Continue actually works reliably. YouTubers show `ollama run model` demos, but for production workflows over LAN with Continue, these tweaks make all the difference.
Takeaway: Check your Modelfile, match your Continue settings, update Ollama regularly. Your 30GB+ Mac can do way more than the defaults suggest.