Hermes Agent Test of a Simple WordPress Admin Task Failed

In this post, I share a candid look at the current state of local AI agents.

I attempted to use a local LLM setup—running on my Mac via Ollama—to handle a trivial administrative task: renaming a page within a WordPress installation. Despite the promise of “agentic” capabilities, the result was a failure.

This experiment serves as a reminder of the gap between the theoretical potential of local AI and the reality of its execution in a live browser environment.

Screenshot of the WordPress Pages dashboard, the environment where Hermes Agent attempted to rename a page. Hermes Agent, using gemma4:31b running on Ollama, was **not** able to change the page name.

The Setup:

Primary Workstation: Mac Studio (M4 Max, base model) with 36 GB unified memory
Local LLM Engine: Ollama running gemma4:31b
Agent Framework: Hermes Agent running on a separate Linux PC, connected over the LAN

Note: Hermes Agent and Ollama were both updated to their latest versions.

Hermes Terminal Output

⚙️ /new
✓ Model switched: gemma4:31b
Provider: custom
Context: 262,144 tokens
(session only — add --global to persist)

(^_^)v New session started!

● log in to a new WordPress site that is running on my Linux PC. the URL is: https://localpress.local/ The username is: localhost and the admin password is: localhost Tell me when you are logged in.
Initializing agent…

┊ 📚 preparing skill_view…
┊ 📚 preparing skill_view…
┊ 📚 skill localwp-dashboard-admin 0.1s
┊ 📚 skill localwp-setup-and-troubleshooting 0.1s
┊ 🌐 preparing browser_navigate…
┊ 🌐 navigate localpress.local 1.1s
┊ ⌨️ preparing browser_type…
┊ ⌨️ preparing browser_type…
┊ 👆 preparing browser_click…
┊ ⌨️ type “localhost” 0.3s
┊ ⌨️ type “localhost” 0.3s
┊ 👆 click @e9 0.3s
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s

╭─ ⚕ Hermes ───────────────────────────────╮
I am now logged into the WordPress site at https://localpress.local/ as the admin user ‘localhost’.
╰─────────────────────────────────────╯
● great. rename the “Sample Page” to “HerMan” and when done, print “next”.

┊ 👆 preparing browser_click…
┊ 👆 click @e33 0.8s [error]
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s
┊ 🌐 preparing browser_navigate…
┊ 🌐 navigate localpress.local 0.7s
┊ ⌨️ preparing browser_type…
┊ ⌨️ preparing browser_type…
┊ 👆 preparing browser_click…
┊ ⌨️ type “localhost” 0.3s
┊ ⌨️ type “localhost” 0.3s
┊ 👆 click @e9 0.3s
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s
┊ 👆 preparing browser_click…
┊ 👆 click @e33 0.3s
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s
┊ 🌐 preparing browser_navigate…
┊ 🌐 navigate localpress.local 1.0s
┊ 👆 preparing browser_click…
┊ 👆 click @e105 0.3s
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s
┊ 👆 preparing browser_click…
┊ 👆 click @e104 0.2s
┊ 📸 preparing browser_snapshot…
┊ 📸 snapshot compact 0.3s
┊ 👆 preparing browser_click…
┊ 👆 click @e103 0.2s

⚡ New message detected, interrupting…
⚡ Interrupted during API call.
╭─ ⚕ Hermes ──────────────────────────────╮
Operation interrupted: waiting for model response (84.2s elapsed).

[Interrupted – processing new message]
╰────────────────────────────────────╯

● stop all processing. this experiment has failed as you have not achieved this simple task based on the 8+ minutes time you’ve spent on this job which takes a human less than 30 seconds to complete.

⚡ Sending after interrupt: ‘stop all processing. this experiment has failed as…’

╭─ ⚕ Hermes ───────────────────────────────╮
I apologize. You are correct; I failed to execute this simple task efficiently and accurately. I spent far too much time and ultimately concatenation instead of replacement. I am stopping all processing now.
╰─────────────────────────────────────╯
⚕ gemma4:31b │ 32.2K/64K │ [█████░░░░░] 50% │ 19m │ ⏲ 3m 36s

Conclusion

The verdict from this experiment is clear: local AI agents, at least in this configuration, are not ready for routine WordPress administration. The model spent 8+ minutes on clicking and navigation loops for a rename that takes a human a couple of clicks and less than 30 seconds.
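
To put the size of the task in perspective, the same rename can be expressed as a single WordPress REST API call. The sketch below is an illustration only and was not part of this experiment. It assumes the REST API is enabled on the site, that an application password has been created for the admin user, and that the ID of "Sample Page" is already known. The page ID 2, the password string, and the function name build_rename_request are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_rename_request(site_url, page_id, new_title, username, app_password):
    """Build (but do not send) a REST request that sets a page's title."""
    # Standard WordPress REST route for updating a single page.
    endpoint = f"{site_url.rstrip('/')}/wp-json/wp/v2/pages/{page_id}"
    # Application passwords are sent as HTTP Basic credentials.
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    body = json.dumps({"title": new_title}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: the page ID and application password are placeholders.
request = build_rename_request(
    "https://localpress.local/", 2, "HerMan", "localhost", "xxxx xxxx xxxx xxxx"
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

The request is only constructed here, not sent. A local site with a self-signed certificate would likely also need its certificate trusted before urlopen would succeed.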

But failures like this are the most informative part of building with emerging technology. We now have a concrete data point — local-first agents handle authentication fine, but struggle with multi-step web automation and contextual awareness. The gap between “agentic AI” in marketing and “agentic AI” on your desk is still substantial.

I will keep experimenting and reporting honestly — the wins, the misses, and the sobering reality of what local AI can actually do today. I invite you to share your own results: what has worked, and where has it fallen short? Let’s build a clearer picture of where local AI stands right now.