I was getting constant popup errors in Continue (VS Code extension) when using my Qwen3.6:27b model: “Error parsing Ollama response: expected element type <function> but have <parameter>”. Turns out it was a tool-calling template mismatch between Ollama and Continue, fixed by updating Ollama from 0.20.5 to 0.23.
Why I Updated Ollama (and How)
I run Ollama on my Mac Studio M4 Max (36GB unified memory) and access it over LAN from my CachyOS laptop running VS Code + Continue + Cline extensions. The GUI makes updates dead simple:
- Click the Ollama menu bar icon
- Choose “Restart to update”
- Done! (`ollama --version` confirmed 0.23)
Why it mattered: Ollama 0.23 includes updated qwen3.5 renderer/parser templates that fixed my Continue parsing errors completely. Older versions had mismatched tool-calling schemas.
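You can sanity-check the update from both machines, since Ollama exposes a small version endpoint over HTTP. A quick sketch, assuming the Mac's LAN address from my Continue config below:

# On the Mac: confirm the new version locally
ollama --version

# From the laptop: query the server's version endpoint over LAN
curl -s http://192.168.1.25:11434/api/version
# → {"version":"0.23"}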
Peeking Inside: ollama show
Curiosity led me to check what was actually happening with my model:
ollama show qwen3.6:27b --modelfile
Here’s the key section (ignoring license):
FROM /Users/me/.ollama/models/blobs/sha256-83c54730a5fea8a0958598c01617c1419c431e93b33bacf980b49a420c798926
TEMPLATE {{ .Prompt }}
RENDERER qwen3.5
PARSER qwen3.5
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER temperature 1
What this tells me:
- No `num_ctx` = uses the global default (I set 32k in the Ollama GUI; see the sketch after this list to pin it per model instead)
- `RENDERER qwen3.5` = latest tool templates (fixed my errors)
- `top_k 20`, `top_p 0.95`, `temperature 1` = model defaults Continue was overriding
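If you don't want to rely on the GUI's global default, you can bake the context length into a derived model. A minimal sketch (the Modelfile.32k filename and the qwen3.6-32k tag are my own examples, nothing Ollama ships):

# Write a tiny Modelfile that inherits the base model and pins num_ctx
cat > Modelfile.32k <<'EOF'
FROM qwen3.6:27b
PARAMETER num_ctx 32768
EOF

# Build the derived model and confirm the parameter stuck
ollama create qwen3.6-32k -f Modelfile.32k
ollama show qwen3.6-32k --modelfile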
My Continue Config That Actually Works
I access Ollama via `OLLAMA_HOST=0.0.0.0 ollama serve` on the Mac, then connect from CachyOS over LAN.
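That works when I start the server from a terminal; the menu bar app instead needs the variable set where launchd can see it. A sketch of both routes (the `launchctl` line is what the Ollama FAQ recommends on macOS; restart the app afterwards):

# One-off: serve on all interfaces from a terminal session
OLLAMA_HOST=0.0.0.0 ollama serve

# Persistent: let the menu bar app listen on the LAN, then restart it
launchctl setenv OLLAMA_HOST "0.0.0.0"

Here's my ~/.continue/config.yaml that's optimized for my 36GB Mac Studio: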
models:
  - name: qwen3.6:27b
    provider: ollama
    model: qwen3.6:27b
    apiBase: http://192.168.1.25:11434
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use]
    contextLength: 32768 # Matches my 32k GUI setting
    maxTokens: 4096
    timeout: 180000
    temperature: 0.2 # Override model's 1.0 (less random)
    top_p: 0.9
    top_k: 20 # Matches Modelfile (faster)
  - name: gemma4:31b
    provider: ollama
    model: gemma4:31b
    apiBase: http://192.168.1.25:11434
    roles: [chat, edit, apply, summarize]
    capabilities: [tool_use]
    contextLength: 32768
    maxTokens: 2048
    timeout: 180000
    temperature: 0.5
    top_p: 0.9
    top_k: 20
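Since the whole problem was tool calling, it's worth exercising it once outside Continue. A hedged sketch against Ollama's /api/chat endpoint; the get_weather tool is a throwaway I made up for the test, only the request shape comes from the Ollama API:

curl -s http://192.168.1.25:11434/api/chat -d '{
  "model": "qwen3.6:27b",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
# A healthy setup returns a message containing tool_calls instead of
# the "<function> but have <parameter>" template error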
Why These Specific Settings
- `contextLength: 32768`: My Ollama GUI global default is 32k. Requesting 65k was just wasting API negotiation time on every call.
- `top_k: 20`: Matches the Modelfile exactly. Higher values (40) slowed my token generation 20-30%; see the sketch after this list to measure it yourself.
- `temperature: 0.2`: The model default of 1.0 is too random for coding; 0.2 gives me more focused responses.
- 36GB Mac Studio: Handles qwen3.6:27b at 32k context using ~28-32GB total. Smooth, no swapping.
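To check the top_k claim on your own hardware: `ollama run --verbose` prints eval stats after each response, and `/set parameter` changes sampling inside the session. A rough A/B sketch (the prompt is arbitrary):

ollama run qwen3.6:27b --verbose
>>> /set parameter top_k 40
>>> write a quicksort function in Python
# compare the "eval rate" (tokens/s) printed after the response,
# then repeat with /set parameter top_k 20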
My Workflow
Mac Studio (Ollama 0.23) ←LAN→ CachyOS Laptop (VS Code + Continue + Cline)
↓
32k context, fast token generation
No more parsing popups, a smooth 32k context, and Continue actually works reliably. YouTubers show `ollama run model` demos, but for production workflows over LAN with Continue, these tweaks make all the difference.
Takeaway: Check your Modelfile, match your Continue settings, update Ollama regularly. Your 30GB+ Mac can do way more than the defaults suggest.