2026-05-02 · AI CODING · WORKFLOW

The 3+1 AI Coworker Stack: Why I Cut Copilot and Hired Cursor as a Temp

Cover: At 2am, I caught one of my AI coworkers lying to me

0 · Four AI Coworkers, Workflow Restructured

After that one tweet on 4/19 about running 4 AI terminals at once went viral, people kept asking: how do you actually set this up?

I promised to write it down. Ten days later, by the time I actually sat down to write, the workflow had already changed again.

So this can't be "the 4-tool setup guide." The real state is 3 full-timers + 1 temp worker.

Here's the judgment I didn't unpack in the original tweet: AI coworkers aren't a fixed SKU lineup. They're a dynamic combo you adjust based on current opportunities. Whether a tool survives depends on one question: Does it own a unique use case? If two tools overlap and the ROI doesn't work, you cut it—no matter how polished it is.

Below, four sections walk through the real workflow, one per coworker. Each covers: positioning / config essentials / collaboration patterns / when to use. Section 5 stands alone on why I cut Copilot; that reasoning matters more than my tool picks. Section 6 is the collaboration diagram. Section 7 is the one-line summary.

Who this is for: Independent developers and content creators already using 1-2 AI coding tools and wondering how to orchestrate a multi-model workflow.

Who this isn't for: Absolute beginners with zero AI tool experience (go find an intro guide first); pure product managers or non-coders (this setup is overfit to what I do).


1 · Claude Code · The Manager

Positioning

The manager for all my other AI coworkers. Handles conversation, judgment calls, coordination, code review, complex planning, skill dispatch, and all Chinese-language writing (X posts / long-form / docs).

It's not the one that writes the most code. It's the one that decides what everyone else should write.

Config Essentials

Entry: claude CLI on a Max subscription: $100/month (5x) as the baseline, with the $200/month (20x) tier worth it for long-running projects.

Three P0 config pieces:

  1. ~/.claude/CLAUDE.md = global behavior SSOT. All delivery standards, work preferences, collaboration rules, and SSOT routing live here.
  2. ~/.claude/memory/active-tasks/T*.json per-task files (aggregate view active-tasks.json auto-rebuilds) = cross-session task handoff panel. Every task must have id / title / owner; owner is only claude or codex.
  3. Per-project MEMORY.md + patterns.md = project-level lessons learned + reusable patterns across projects. When corrected, update patterns.md immediately—don't wait for session-end.

Write discipline is a P0 red line: if any of those three required fields is missing from the active-tasks schema, downstream /next and /today can't resolve task names, and cross-model handoff breaks.
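
For reference, a minimal per-task file might look like this (a sketch: id, title, and owner are the documented required fields; the task ID, title, and the extra status field are purely illustrative):

cat > ~/.claude/memory/active-tasks/T042.json <<'EOF'
{
  "id": "T042",
  "title": "Refactor backtest data pipeline",
  "owner": "codex",
  "status": "in_progress"
}
EOF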

Skills load on-demand: user-level ~/.claude/skills/ fires on keyword triggers. leo-style for tweets, article-pipeline for long-form, content-verify for fact-checking, humanizer to strip AI-speak—all custom skills.
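
Each skill is just a directory under ~/.claude/skills/ with a SKILL.md whose frontmatter description doubles as the keyword trigger. A minimal sketch, assuming the standard SKILL.md layout (the wording here is illustrative):

~/.claude/skills/leo-style/SKILL.md:

---
name: leo-style
description: My X post voice. Use when drafting or rewriting tweets.
---

(Style rules and few-shot examples go here.)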

Collaboration Mode

When to Use / When Not to Use

Use: 3+ files changing, architecture tradeoffs, writing ≥1500-word pieces, debugging stuck for 30min with no leads, cross-session complex hand-offs, public-facing content (tweets / Articles).

Don't use: Adding one type signature, renaming three variables, installing a dependency, quick command lookup (overkill and wastes Max tier).


2 · Codex · The Auditor + Code Reviewer

Positioning

Independent reviewer of strategy, financial, and data-foundation code. Not here to write the most. Here to prevent Claude's main thread from groupthink.

One word matters: independent. The Codex prompt must contain independent success criteria, never copied from Claude's framework. Codex evaluates from scratch; it's not an incremental review on top of Claude's conclusions. I learned this the hard way: when Codex mirrors Claude's reasoning, the dual-review layer falls apart.

It can also run headless automation, but I mostly use it synchronously for review, not as a dispatcher.
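
In practice, that means the review prompt sets its own bar before anything else gets shared. A sketch of what I mean (the file name and prompt wording are illustrative, and it assumes the CLI's non-interactive exec mode):

codex -c model_reasoning_effort=xhigh exec \
  "Review strategies/mm_quote.py end to end. State your own pass/fail
   criteria before anything else. Do not assume the current design is
   correct; evaluate from scratch and rank the risks you find."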

Config Essentials

Entry: codex CLI; shares the per-task JSON write-back whitelist under ~/.claude/memory/active-tasks/ with Claude.

Default reasoning effort settings (gets the tradeoff right):

| Task Type | Effort | Command |
| --- | --- | --- |
| Strategy / Financials / PnL / Market-making / Wallet signing | xhigh | codex -c model_reasoning_effort=xhigh |
| Data pipelines / Backtesting / Data processing | high | codex -c model_reasoning_effort=high |
| General review / Refactors | high | codex -c model_reasoning_effort=high |
| Docs / Config | medium | codex -c model_reasoning_effort=medium |

Medium effort cuts corners on strategy code; xhigh costs more tokens, but tokens feel cheap next to the debugging time they save. Learned this through trial and error.

After each write, run node ~/.claude/scripts/sync-active-tasks.mjs to rebuild the aggregate view, or active-tasks.json lags and downstream routing breaks.

Collaboration Mode

When to Use / When Not to Use

Use: Any strategy plan / code / data-foundation execution plan / financial operations / PnL calculation logic. Before shipping anything involving money, Codex reviews it.

Don't use: Pure docs / comments / log formats / ≤5 lines of config (sledgehammer for a nail). Open-ended exploration tasks ("find what's worth improving") also don't work—it gives you a shallow laundry list.

Subscription: ChatGPT Plus $20/month includes Codex CLI basics; Pro $200/month unlocks full capacity.


3 · Grok · The Intelligence Officer

Positioning

The intel officer—real-time information retrieval. X post searches, single-post reads, thread continuations, cross-platform search, finding Quote source material, verifying whether a claim broke this week or months ago. Grok owns all of that.

It doesn't write code, make judgment calls, or participate in decisions. It's purely the "what's the current chatter / does this information have a source" person.

Config Essentials

Entry: Local Grok Bridge at 127.0.0.1:19998, auto-starts via launchd with Safari session state injected. Call it like:

rtk proxy curl -X POST http://127.0.0.1:19998/chat \
  -d '{"q":"<your question>","mode":"deepsearch"}'

Why not opencli grok ask: that tool frequently returns NO RESPONSE—unreliable. Grok Bridge borrows Safari's authenticated session and routes through the front-end API, much more stable.

Companion retrieval tools sit alongside the bridge. The URL routing SSOT lives in ~/.claude/docs/url-routing.md, a one-page map of which tool handles which platform, plus the fallback order.
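
As flavor, an excerpt might read roughly like this (entirely illustrative; the real entries live in the file):

x.com single post            -> Grok Bridge /chat (single-post read)
x.com search / trending      -> Grok Bridge, deepsearch mode
youtube / reddit / github    -> Grok deepsearch, cross-platform
last-resort fallback         -> opencli grok ask (often NO RESPONSE)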

Collaboration Mode

When to Use / When Not to Use

Use: Search trending topics / find Quote raw material / analyze a single post / grab threads / cross-platform search (YouTube / Reddit / GitHub trending) / verify a data point / confirm whether a quote has a real source.

Don't use: Write code / make architecture calls / long reasoning chains (not what it's built for).

Subscription: Bundled into X Premium+ ($16-22/month), not a separate SKU.


4 · Cursor · The Temp Worker (Expires 5/31)

Positioning

Brought on in May as a temporary hire. Starting 5/1, burning through $10K in free credits for background daemons. Expires 5/31. Renewal depends on ROI math.

Role boundaries are tight:

What it can do: batch work with hard acceptance criteria + solid guardrails. Bulk spec doc generation / SOP writing / Python scripts + mocks + README sections, long-form first drafts for merging, skeleton outlines.

Config Essentials

Entry: cursor-agent SDK; launchd runs cursor-sdk-loop (com.leo.cursor-sdk-loop) 24/7.
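
A minimal launchd job for that loop might look like this (a sketch: the label comes from the setup above; the script path and shell are assumptions, and KeepAlive is what keeps it running 24/7):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.leo.cursor-sdk-loop</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/zsh</string>
    <string>-lc</string>
    <!-- hypothetical script path; the loop body is sketched below -->
    <string>$HOME/Projects/_inventory/cursor-sdk-loop.sh</string>
  </array>
  <key>KeepAlive</key>
  <true/>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>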

Work mode: daemon reads ~/Projects/_inventory/cursor-product-backlog.md in reverse for ^- [ ] checkbox tasks, locks one, runs it, marks done as [x-sdk].
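
The loop body itself can stay tiny: KeepAlive relaunches the job after every exit, so handling one task per run is enough. A minimal sketch (assumes cursor-agent accepts a one-shot prompt via -p; the flag and the locking detail are illustrative):

BACKLOG=~/Projects/_inventory/cursor-product-backlog.md

# Scan bottom-up (reverse order) for the most recent unchecked task.
task=$(grep -n '^- \[ \]' "$BACKLOG" | tail -1)
[ -z "$task" ] && exit 0                    # backlog empty, nothing to do
line_no=${task%%:*}
title=$(printf '%s' "${task#*:}" | sed 's/^- \[ \] //')

# Run the task; on success, stamp the checkbox as SDK-done.
cursor-agent -p "Task: $title" \
  && sed -i '' "${line_no}s/\[ \]/[x-sdk]/" "$BACKLOG"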

Core prompt template (three steps for content-class tasks):

  1. Merge (consolidate multiple sections + hook writing)
  2. Structure (4-6 anchor points)
  3. Inline CSS HTML skeleton (no real images embedded)

Explicitly declare what the daemon can't do in the prompt header, and don't let it claim work it never did. I learned this the hard way: the daemon would stamp "humanizer pass ✅ + verify checklist done" when it had actually run neither. That's lying, and it's now forbidden.

Hard red lines are baked into the prompt, starting with that one: never claim a step you didn't run.
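
Putting the can't-do header and the three steps together, the template reads roughly like this (a sketch; the exact wording varies by task, and the can't-do list mirrors the boundaries below):

You are a headless drafting daemon. You CANNOT run humanizer, verify
facts, fetch URLs, touch images, or publish. Never claim a step you
did not run.
1. Merge: consolidate the listed sections into one draft; write the hook.
2. Structure: organize the draft around 4-6 anchor points.
3. Output an HTML skeleton with inline CSS; placeholder blocks only,
   no real images.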

Collaboration Mode

When to Use / When Not to Use

Use: Bulk specs / SOPs / research docs (C-class); scripts + mocks + README (B-class); long-form skeleton + merge (A-class first draft, but don't expect final polish).

Don't use: Strategy code / financial code / anything needing independent judgment / anything requiring fact-checking / images / final publishing.

Cost: $0 (free trial burn). If renewal happens, likely Pro $20/month or Ultra $200/month.


5 · The Fired Coworker: Why I Canceled Copilot

This section matters more than which tools I kept.

I used GitHub Copilot Pro at $19/month for VS Code ghost text completions + gh copilot suggest in the terminal. After a while, two problems emerged:

Problem one: Scenario overlap with the other tools was too high.

Problem two: It occupied a paid SKU but delivered only marginal value.

After I killed it, I didn't miss anything. That itself is the answer: if canceling a tool doesn't hurt, it wasn't pulling its weight in the first place.

One lesson: Don't stack AI tools by feature count. Stack by unique ownership. If two tools have 70% overlap, keep one.


6 · Collaboration Diagram: How 3 Full-Timers + 1 Temp Connect

3+1 AI coworker collaboration map

Notice the Cursor line is dashed—temp worker, expires 5/31. The other three are solid, long-term fixtures.

Three-Axis Division Breakdown

| Dimension | Claude Code Manager | Codex Auditor | Grok Intelligence Officer | Cursor Temp Worker |
| --- | --- | --- | --- | --- |
| Reasoning Depth | ★★★★★ | ★★★★★ | ★★★ | ★★ |
| Context Window | ★★★★★ | ★★★★★ | ★★★ | ★★★ |
| Interaction Latency | — | ★★★ (sync review) | — | ★★★★★ (headless daemon) |
| Unique Use Case | Main decisions / Content writing / Skill dispatch | Strategy / Finance independent review | Real-time intel grab | Background batch work (trial) |
| Monthly Cost | $100 (5x) / $200 (20x) | $20 (Plus) / $200 (Pro) | Bundled in X Premium+ ($16-22) | $0 (trial) / $20-200 if renewed |
| Employment Status | Long-term | Long-term | Long-term | Expires 5/31, renewal TBD |

7 · Takeaway: AI Coworkers Are Dynamic, Not Fixed

Three key points:

  1. Three full-timers (Claude / Codex / Grok) lock in long-term because each owns unique territory: main decisions / independent review / real-time intel. Zero overlap.
  2. Temps arrive by opportunity, leave by ROI. Cursor's here because of $10K free credits; 5/31 decides if it stays. Copilot got cut because overlap was too high and unique territory too thin.
  3. Tool count ≠ output. To decide if an AI coworker should stay, ask one question: "If I cancel this, do I hit a wall?" Yes → keep it. No wall → kill it. This rule beats any config guide.

I'm planning to post an update on 5/31: either "4 full-timers" if Cursor converts, or "back to 3" if it doesn't. Either way, I'll report it straight, no hedging.