Right now, the AI world is split into two camps: those who pay a monthly subscription to a cloud provider, and those who run their own AI locally on their own hardware.
For me, running AI locally is the ultimate power move. My data never leaves my room, it works offline, and it’s completely free once the hardware is paid for. But when I first started exploring the local landscape, I found myself staring at two main contenders: LM Studio and Ollama.
Both are incredible pieces of software, but after rigorous testing in my Beginner Projects lab, I’ve settled on Ollama. Here is the rationale behind that choice and how I’ve integrated it into my workflow.
What is Ollama, exactly?
I like to think of Ollama as a “manager” for Large Language Models (LLMs).
Normally, running a local AI model is a technical headache involving Python environments, complex dependencies, and GPU configurations that can take an entire afternoon to debug. Ollama simplifies all of that into a single, lightweight application. It handles the downloading, loading, and execution of the models in the background, giving me a clean, streamlined way to interact with the AI.
Why Ollama over LM Studio?
LM Studio is fantastic—it has a beautiful visual interface that makes it feel like a finished product. But for my specific needs, Ollama wins on efficiency and flexibility.
Ollama runs as a background service. This means once it’s installed, it stays “alive” in the system. I don’t have to keep a heavy application window open just to keep the AI active. More importantly, Ollama is designed to be a server. This allows me to connect other tools—like VSCodium or custom web interfaces—directly to the AI engine via an API.
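To make that concrete, here is a minimal sketch of talking to the Ollama API directly. This assumes Ollama is running on its default port (11434) and that you have already pulled a model; the model name and prompt below are just examples.

```shell
# Query the local Ollama server through its REST API.
# Assumes Ollama is running on the default port (11434)
# and the gemma3:4b model has already been pulled.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Explain recursion in one sentence.",
  "stream": false
}'
```

Any tool that can make an HTTP request can use this same endpoint, which is exactly what editor extensions and custom web UIs do under the hood.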
Getting Started (The 3-Minute Setup)
Setting up Ollama is perhaps the easiest part of my entire coding journey. If you want to try it, the process is dead simple:
- Download: Head to Ollama.com and grab the version for your OS.
- Install: Run the installer.
- Run a Model: Open a terminal and type: ollama run gemma3:27b (or the 4b version, depending on your hardware specs).
That’s it. I’m now chatting with a world-class AI entirely offline.
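A typical first session looks something like this (assuming Ollama is installed and on your PATH; the 4b model is used here as a lightweight example):

```shell
# Download the model weights (one-time, can be several GB)
ollama pull gemma3:4b

# Verify the model is available locally
ollama list

# Start an interactive chat in the terminal
ollama run gemma3:4b
```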
My “Pro” Strategy: The Dedicated Server
As I got deeper into local AI, I noticed a recurring problem: LLMs are “resource hungry.” They want every single bit of RAM and GPU power my computer can provide.
If I run my IDE (VSCodium) on the same machine as the AI, my computer is constantly splitting resources between the “Thinking” (the AI) and the “Doing” (the Code Editor). This leads to lag and thermal throttling.
My Professional Setup: To solve this, I treat Ollama as a dedicated server.
Instead of running everything on one machine, I run Ollama on my powerful Mac Studio (the server) and run VSCodium on a modest Dell Precision PC on the same local network (the client).
By decoupling the two, the Mac Studio can dedicate 100% of its VRAM and CPU to the AI model, while my Dell PC remains snappy and responsive. It gives me the speed of a high-end workstation with the ergonomics of a dedicated coding station.
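A rough sketch of how this split can be wired up, assuming a default Ollama install (the hostname below is a placeholder for whatever your server is called on your network):

```shell
# On the Mac Studio (server): by default Ollama only listens on
# localhost, so bind it to all interfaces to expose it on the LAN.
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On the Dell PC (client): point any Ollama-compatible tool at the
# server's address instead of localhost. For example, list the
# models available on the server:
curl http://mac-studio.local:11434/api/tags
```

Editor extensions that speak the Ollama API generally let you change the base URL from http://localhost:11434 to the server's address, which is all the client side needs.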
What’s Next?
Now that I have the “brain” (Ollama) running on my system, I have to move past the terminal. Chatting in a command line is fine for testing, but it’s not how you actually build software.
The real magic happens when I integrate that local AI directly into my code editor so it can see my files and help me write functions in real-time.
In my next post, I’m going to show you how I use VSCodium and the “Continue” extension to turn a local Ollama server into a fully automated AI coding assistant.