Discussion
Anyone else using AI agents for automated code review? What's your setup?
Been experimenting with hooking up an AI agent to our GitHub repo so it can read PRs and post review comments automatically. It works surprisingly well for catching obvious issues like missing error handling and off-by-ones.
Curious what system prompts and tool setups others are using — seems like there's a huge range of approaches and the difference in output quality is massive.
We use a Claude-based agent hooked into GitHub Actions. The workflow triggers on every PR and gives the agent access to three tools:
`read_file`, `list_changed_files`, and `post_review_comment`. The system prompt is kept minimal — basically just tells it what repo style guide to enforce.

One thing I'd strongly recommend: constrain the tool definitions tightly. If you give it a `post_review_comment` with no limits on what it can say, you'll get some very... creative feedback. 😅

That matches what I'm seeing — the tool definitions are doing a lot of the heavy lifting. How long is your system prompt, roughly? Trying to figure out whether more detailed instructions actually help or just increase token usage for diminishing returns.
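To make the "constrain the tool tightly" point concrete, here's a rough sketch of what a constrained `post_review_comment` definition could look like in Anthropic-style tool-use JSON schema. The exact fields and limits (the `enum` values, the 500-character cap) are illustrative, not anyone's actual config:

```python
# Sketch of a tightly constrained post_review_comment tool definition.
# JSON-schema keywords like "enum" and "maxLength" bound what the model
# can emit, instead of allowing free-form commentary.
POST_REVIEW_COMMENT = {
    "name": "post_review_comment",
    "description": (
        "Post a single review comment on the current pull request. "
        "Only use this for issues covered by the repo style guide."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path in the repo"},
            "line": {"type": "integer", "minimum": 1},
            "severity": {
                "type": "string",
                "enum": ["nit", "warning", "blocking"],  # fixed severity scale
            },
            "body": {
                "type": "string",
                "maxLength": 500,  # caps rambling / 'creative' feedback
            },
        },
        "required": ["path", "line", "severity", "body"],
    },
}
```

The enum on `severity` in particular tends to do more work than prose instructions, since the model physically can't return anything outside the list.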
Ours is about 400 tokens — covers the code style guide rules, severity levels (nit / warning / blocking), and a note not to comment on whitespace. Anything longer and the model starts getting inconsistent. I think the tool descriptions actually carry more weight than the system prompt in practice.
We tried three different prompting strategies over about two months. The biggest improvement came from switching to a chain-of-thought style where we ask the model to first list all the changed functions, then evaluate each one before deciding whether to comment. Reduced false positives by almost 40%.
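The two-pass flow described above (enumerate changed functions first, evaluate second) can be sketched roughly like this. The prompt wording and the `call_model` callable are hypothetical stand-ins, not the poster's actual setup:

```python
# Two-pass chain-of-thought review sketch: first ask the model to list
# the changed functions, then ask it to evaluate each one before
# deciding whether to comment.

def build_pass_1(diff: str) -> str:
    """Prompt for the enumeration pass: list functions, no comments yet."""
    return (
        "List every function changed in this diff, one per line. "
        "Do not comment on anything yet.\n\n" + diff
    )

def build_pass_2(function_list: str) -> str:
    """Prompt for the evaluation pass: judge each listed function."""
    return (
        "For each function below, decide whether it needs a review comment. "
        "Only comment if you can name a concrete defect.\n\n" + function_list
    )

def review(diff: str, call_model) -> str:
    """Run both passes; call_model is any prompt -> text callable."""
    functions = call_model(build_pass_1(diff))   # pass 1: enumerate
    return call_model(build_pass_2(functions))   # pass 2: evaluate each
```

Splitting the work this way forces the enumeration step to happen before any judgment, which is presumably where the false-positive reduction comes from.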
Also: if anyone is storing their system prompt in a config file checked into the repo — stop. Keep it in secrets management. Had a junior dev accidentally expose ours in a public fork last quarter. Nothing catastrophic, but not great.
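One way to follow that advice: load the prompt from an environment variable populated by your CI secrets store rather than from a committed file. A minimal sketch (the variable name is made up):

```python
import os

def load_system_prompt(env_var: str = "REVIEW_SYSTEM_PROMPT") -> str:
    """Read the review system prompt from an environment variable set by
    the CI secrets manager, so it never lives in a file in the repo."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; check your secrets config")
    return value
```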
That's actually a really useful angle — hadn't considered that approach. Does the output format vary much between runs?
We actually went the opposite direction — minimal tooling, just let the LLM read the diff as plain text and return a Markdown review. No function calling at all. It's less structured but way faster to set up, and for a small team it's good enough.
The big limitation is you can't do anything automated with the output, but if you just want a second set of eyes on PRs it does the job.
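For comparison, the whole minimal-tooling variant is basically one prompt. Here's a sketch, with `call_model` again standing in for whichever LLM API you use:

```python
# Minimal no-function-calling review: diff in as plain text,
# Markdown review out. Nothing machine-parseable, but trivial to set up.

def markdown_review(diff: str, call_model) -> str:
    prompt = (
        "You are reviewing a pull request. Read the unified diff below and "
        "reply with a short Markdown review: a bullet list of issues, "
        "or 'LGTM' if there are none.\n\n"
        "```diff\n" + diff + "\n```"
    )
    return call_model(prompt)
```

The trade-off is exactly as described: the output is for humans only, so there's nothing downstream automation can act on.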
One thing no one mentions: latency. Our PR review agent takes 15–25 seconds per run, which is fine, but if you're triggering on every push you'll burn through API credits fast. We ended up gating it behind a `/ai-review` comment command so it only runs on demand.
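The gating check itself is small. A sketch of the decision logic, based on the shape of GitHub's `issue_comment` webhook payload (where PR comments are issue comments whose `issue` carries a `pull_request` key):

```python
# Decide whether the review agent should run for an incoming
# issue_comment event: only on PR comments starting with /ai-review.

def should_run_review(event: dict) -> bool:
    comment = event.get("comment", {}).get("body", "")
    is_pr_comment = "pull_request" in event.get("issue", {})
    return is_pr_comment and comment.strip().startswith("/ai-review")
```

In GitHub Actions the same idea is usually expressed as an `if:` condition on the job, but the logic is identical: check the event is a PR comment, then match the command prefix.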