I thought I'd document how I'm currently using AI tools, both because it's changed a lot in the past month and also in the hope that someone can look at this, be horrified, and tell me all the things I'm doing that could be better.

For context, I work as a researcher producing technical research memos and papers. Most projects involve sustained desk research, usually with calculations and data analysis in Python, written up as memos or other external outputs. My previous posts on far UVC and whole brain emulation give a sense of the kind of work.

It's hard to estimate the speed-up from AI exactly, but I'd guess these projects would have taken several times longer by hand: partly more background reading, partly slower calculations and coding, partly slower writing. (Note that the total productivity boost is smaller than that: in the past I would not have produced such detailed outputs, and I took on fewer tasks that required a lot of coding.)

Basic workflow

My core aim is to keep my attention focused on the most important task at hand while Claude agents work in parallel. I have a single Claude Code session I talk to—with WisprFlow, since dictation is faster than typing—and that main agent spawns sub-agents that run autonomously. These sub-agents perform tasks like doing research, writing code, creating drafts, reviewing comments, or red-teaming work. They produce outputs like research notes, Jupyter notebooks, or short drafts that I or the main Claude Code session can use. The ultimate product is usually a research report, which only I or the main Claude edit.

By having a single main Claude that I talk to, I can immediately get an overview of what's going on. I can quickly spin up tasks and kill them if they're no longer necessary without babysitting multiple Claudes. The main Claude helps keep me focused on the most important tasks, like planning or writing, while aggressively delegating to sub-agents so that it remains responsive.

The underlying principle is that my attention is expensive but Claude time is cheap. The aim is thus to minimize the friction and attention required to create sub-agents that can pull on threads and perform background tasks without requiring me to babysit them. It's fine if some of these agents get confused or produce something that I never read, as long as they don't take up too much of my time.

I rarely use planning mode or spend long setting Claude up, instead working iteratively to solve problems as they arise. Generally I've found that planning takes too long — by which point I'm either wasting my time waiting or have become distracted — and the plan often has too many subtle issues to be executed well anyway. Sometimes Claude keeps failing at a task and it takes me a few iterations to step back and help it. Still, iterating quickly beats planning more: it's better to let Claude fail a couple of times cheaply than to spend my attention up front on a plan that probably won't survive contact with the work.

My setup

Each project lives in its own folder, which is a git repo pushed to GitHub. Contents include:

  • memo.md — the primary document.
  • CLAUDE.md — project-level style guide and standing instructions.
  • TODO.md — state between sessions: what's done, what's in progress, open decisions I haven't made.
  • drafts/ — in-progress text not yet in the main doc. Sub-agents write here.
  • research/ — background notes, fact-checks, number verifications. Sub-agents also write here.
  • code/ — Python modules and Jupyter notebooks. Hard calculation lives in modules; notebooks call into them and are where I interact with data, produce plots, and see what's going on (see the sketch after this list).
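To make the module/notebook split concrete, here's a minimal sketch; the file names, numbers, and calculation are invented for illustration rather than taken from a real project. The load-bearing calculation sits in an importable module:

```python
# code/costs.py -- hypothetical module: the calculation behind a headline number in memo.md
def total_cost(unit_cost: float, units: int, overhead_fraction: float = 0.2) -> float:
    """Keep the calculation importable so notebooks, scripts, and sub-agents share one source of truth."""
    return unit_cost * units * (1 + overhead_fraction)
```

A notebook cell then just imports it to poke at the numbers and make plots:

```python
# notebook cell -- interactive layer on top of the module
import sys
sys.path.append("code")   # make the project's code/ importable from the notebook
from costs import total_cost

total_cost(unit_cost=120.0, units=5_000)   # -> 720000.0, the figure quoted in the memo
```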

I run both the Claude Code desktop app and the terminal client, the latter primarily for fast mode. Sub-agents write markdown files which I view in Obsidian, and Python scripts and notebooks that I view using Jupyter.

A couple of scripts move content between Markdown and Google Docs: one converts a Markdown file into a Google Doc; another pulls comments off a downloaded Google Doc and walks me through them.
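The upload direction needs Google API credentials, so I won't sketch it here, but the comment-pulling side can be small: a Google Doc downloaded as .docx is just a zip archive with comments stored in word/comments.xml. Something like the following (a sketch of the idea, not my exact script) is enough to step through them:

```python
# walk_comments.py -- sketch: step through comments in a Google Doc downloaded as .docx
import sys
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def iter_comments(docx_path: str):
    """Yield (author, date, text) for each comment in the .docx (raises KeyError if it has none)."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/comments.xml"))
    for c in root.iter(f"{W}comment"):
        text = "".join(t.text or "" for t in c.iter(f"{W}t"))
        yield c.get(f"{W}author"), c.get(f"{W}date"), text

if __name__ == "__main__":
    for author, date, text in iter_comments(sys.argv[1]):
        print(f"[{author} @ {date}] {text}")
        input("Enter for next comment...")
```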

On top of the project CLAUDE.md, I have a global config that enforces the workflow above across all projects, plus globally defined sub-agents for common research and drafting tasks. There's also a hook that runs a slop-check sub-agent on drafting tasks.

Claude's performance deteriorates once context gets too long, so I periodically retire the main Claude to avoid this. Before retirement I have it write up the TODO list, then spin up a new main Claude with that as its starting point. I keep the old one alive for a bit in case the new Claude needs anything more, and so any sub-agents still running can wrap up.

Issues to be aware of

Foreground sub-agents kill parallelism. Claude can spawn sub-agents in foreground or background mode. Foreground blocks the main Claude until the sub-agent returns; background runs in parallel. Claude defaults to foreground, so I've told it to always use background.

File collisions between parallel sub-agents. If two sub-agents edit the same file concurrently, they get confused. Make sure sub-agents are working on independent files.

Claude has no sense of time. It's funny when Claude claims some project extension would take 3 days of work and so probably isn't worth it, then completes the task in 5 minutes. More problematic are scripts that take a long time to run. Claude will happily sit there running a script for 20 minutes, make a minor change, run it for 20 minutes, and repeat, not bothered at all that hours of calendar time are being wasted making minor improvements.

Claude is not consistently candid. By default Claude is sycophantic: it'll guess what you want to hear, fit evidence to a conclusion it has decided you want, and second-guess itself when pushed. Very little context is needed for it to form a view of what I want, and it isn't truth-seeking by default. The sub-agent architecture helps a bit; spawning a cold sub-agent with a red-teaming brief gets you something closer to honest pushback.

Beyond sycophancy, Claude will sometimes just be weird — fabricate a citation, make up new numbers rather than using available datasets, claim to have read papers that it hadn’t, try to solve simple problems with increasingly elaborate bash commands, insist on “shipping” things that are broken, and otherwise make mistakes that no human would ever make. I basically agree with Ryan Greenblatt's post on this and have seen many of the pathological behaviors he describes.

Other tips

  • Dump context generously. Paste in whole documents, PDFs, prior drafts. Sometimes I give it directly to the main Claude; sometimes I make it available for sub-agents to consult.
  • WisprFlow + ramble. I don't bother pre-composing. Talk for 30 seconds, throw the mess at Claude, and let it clean up. You can include meta-commentary, high-level thoughts, or additional context in the ramble; I find that helps Claude produce better outputs.
  • Generate lots of outputs and archive them. Drafts, research notes, scratch notebooks—let them accumulate, then archive periodically. I find it's generally easier for sub-agents to create new outputs than to edit old ones.
  • Keep sub-agent outputs short. A couple of thousand tokens at most. Long docs are harder for Claude to read back. I have sub-agents put a TL;DR and a timestamp at the top of any output so the main Claude (or downstream sub-agents) can orient quickly.
  • Use scripts to verify numbers. Write Python scripts that do all the key calculations in a research report, so that numbers can easily be double-checked or updated (see the sketch after this list). I've also tried using Jupyter notebooks for this, but agents are less good at reading them, so notebooks are most valuable when you want something human-intelligible rather than just a way to double-check.
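Here's a minimal sketch of what such a number-checking script can look like; the module, claim, and values are invented for illustration (it reuses the hypothetical costs.py from the setup section):

```python
# code/check_numbers.py -- sketch: recompute figures quoted in the memo so they can be re-verified
from costs import total_cost   # hypothetical module holding the real calculation

CLAIMS = {
    # label in memo.md            (claimed value, how to recompute it)
    "Section 2 headline cost": (720_000.0, lambda: total_cost(unit_cost=120.0, units=5_000)),
}

def main() -> None:
    for label, (claimed, recompute) in CLAIMS.items():
        actual = recompute()
        status = "OK" if abs(actual - claimed) <= 1e-6 * abs(claimed) else "MISMATCH"
        print(f"{status}  {label}: memo says {claimed:,.0f}, script computes {actual:,.0f}")

if __name__ == "__main__":
    main()
```

Any Claude (or I) can rerun this after an edit and immediately see which quoted numbers have drifted.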

Reflections

My relationship to research output has shifted; it's often closer to managing someone else's work than producing my own. Claude handles bounded tasks well — coding, data wrangling, literature search — but the human still has to provide taste, conceptual structure, and the big picture, and check the work. What I spend my time on now is judgement and conceptual problems, rather than the routine work of trawling Google Scholar or grinding through derivations.

I find my new workflow more exhausting than the old one. Back-to-back judgement calls and rapid context switches tire me out fast. It can be hyperstimulating and addictive in the way a video game is, like that feeling you get when you're clicking the next-turn button in Civilization at 1am despite the pleasure having left you hours ago.
