AI-Powered Code Review: How to Set Up Automated PR Reviews
Automate pull request reviews with AI using Claude, GPT-4, or local models. Catch bugs, enforce standards, and speed up reviews with CI/CD integration.
The Code Review Bottleneck
Code review is essential but expensive. Senior engineers spend 4-8 hours per week reviewing PRs. Junior engineers wait hours or days for feedback. Critical bugs slip through when reviewers are fatigued or rushed.
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 200" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="200" rx="12" fill="#1a1a2e"/><text x="80" y="25" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Input</text><circle cx="80" cy="50" r="14" fill="none" stroke="#3b82f6" stroke-width="2"/><circle cx="80" cy="100" r="14" fill="none" stroke="#3b82f6" stroke-width="2"/><circle cx="80" cy="150" r="14" fill="none" stroke="#3b82f6" stroke-width="2"/><text x="230" y="25" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Hidden</text><circle cx="230" cy="45" r="14" fill="#6366f1" opacity="0.8"/><circle cx="230" cy="85" r="14" fill="#6366f1" opacity="0.8"/><circle cx="230" cy="125" r="14" fill="#6366f1" opacity="0.8"/><circle cx="230" cy="165" r="14" fill="#6366f1" opacity="0.8"/><text x="380" y="25" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Hidden</text><circle cx="380" cy="55" r="14" fill="#a855f7" opacity="0.8"/><circle cx="380" cy="100" r="14" fill="#a855f7" opacity="0.8"/><circle cx="380" cy="145" r="14" fill="#a855f7" opacity="0.8"/><text x="520" y="25" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Output</text><circle cx="520" cy="80" r="14" fill="none" stroke="#2dd4bf" stroke-width="2"/><circle cx="520" cy="130" r="14" fill="none" stroke="#2dd4bf" stroke-width="2"/><line x1="94" y1="50" x2="216" y2="45" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="50" x2="216" y2="85" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="50" x2="216" y2="125" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="50" x2="216" y2="165" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="100" x2="216" y2="45" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="100" x2="216" y2="85" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="100" x2="216" y2="125" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="100" x2="216" y2="165" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="150" x2="216" y2="45" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="150" x2="216" y2="85" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="150" x2="216" y2="125" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="94" y1="150" x2="216" y2="165" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="45" x2="366" y2="55" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="45" x2="366" y2="100" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="45" x2="366" y2="145" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="85" x2="366" y2="55" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="85" x2="366" y2="100" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="85" x2="366" y2="145" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="125" x2="366" y2="55" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="125" x2="366" y2="100" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="125" x2="366" y2="145" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="165" x2="366" y2="55" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="165" x2="366" y2="100" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="244" y1="165" x2="366" y2="145" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="55" x2="506" y2="80" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="55" x2="506" y2="130" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="100" x2="506" y2="80" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="100" x2="506" y2="130" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="145" x2="506" y2="80" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/><line x1="394" y1="145" x2="506" y2="130" stroke="#e2e8f0" stroke-width="0.5" opacity="0.3"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Neural network architecture: data flows through input, hidden, and output layers.</p></div>
AI-powered code review does not replace human reviewers. It augments them by catching the mechanical issues — style violations, potential bugs, security flaws, performance anti-patterns — so humans can focus on architecture, design, and business logic.
Approaches to AI Code Review
There are three tiers of AI code review, each with different trade-offs:
Tier 1: Cloud API (Easiest)
Use OpenAI, Anthropic, or Google APIs to analyze diffs. Fast to set up, but code leaves your network.
Tier 2: Self-Hosted Model (Private)
Run a code-specialized model like CodeLlama or DeepSeek Coder locally. Code stays internal but requires GPU hardware.
Tier 3: Fine-Tuned Model (Best Quality)
Fine-tune a model on your codebase and past review comments. Best results but requires ML expertise.
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 180" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="180" rx="12" fill="#1a1a2e"/><rect x="30" y="60" width="80" height="50" rx="25" fill="#3b82f6" opacity="0.85"/><text x="70" y="90" text-anchor="middle" fill="#ffffff" font-size="11" font-family="system-ui">Prompt</text><rect x="145" y="50" width="90" height="70" rx="8" fill="#6366f1" opacity="0.85"/><text x="190" y="80" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Embed</text><text x="190" y="95" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">[0.2, 0.8...]</text><rect x="270" y="50" width="90" height="70" rx="8" fill="#a855f7" opacity="0.85"/><text x="315" y="75" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Vector</text><text x="315" y="90" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Search</text><text x="315" y="105" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui" opacity="0.7">top-k=5</text><rect x="395" y="50" width="90" height="70" rx="8" fill="#2dd4bf" opacity="0.85"/><text x="440" y="80" text-anchor="middle" fill="#1a1a2e" font-size="11" font-family="system-ui" font-weight="bold">LLM</text><text x="440" y="95" text-anchor="middle" fill="#1a1a2e" font-size="9" font-family="system-ui">+ context</text><rect x="520" y="60" width="55" height="50" rx="25" fill="#f59e0b" opacity="0.85"/><text x="547" y="90" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Reply</text><defs><marker id="arrow4" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#e2e8f0"/></marker></defs><line x1="112" y1="85" x2="143" y2="85" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow4)"/><line x1="237" y1="85" x2="268" y2="85" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow4)"/><line x1="362" y1="85" x2="393" y2="85" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow4)"/><line x1="487" y1="85" x2="518" y2="85" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow4)"/><text x="300" y="155" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Retrieval-Augmented Generation (RAG) Flow</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">RAG architecture: user prompts are embedded, matched against a vector store, then fed to an LLM with retrieved context.</p></div>
Building a GitHub Actions AI Reviewer
Here is a complete GitHub Actions workflow that reviews every PR:
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
ai-review:
runs-on: ubuntu-latest
permissions:
pull-requests: write
contents: read
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get diff
id: diff
run: |
git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
echo "diff_size=$(wc -c < /tmp/pr.diff)" >> $GITHUB_OUTPUT
- name: AI Review
if: steps.diff.outputs.diff_size < 100000
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
DIFF=$(cat /tmp/pr.diff)
REVIEW=$(curl -s https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d "{
\"model\": \"claude-sonnet-4-20250514\",
\"max_tokens\": 4096,
\"messages\": [{
\"role\": \"user\",
\"content\": \"Review this code diff. Focus on bugs, security issues, performance problems, and style. Be concise. Format as markdown.\n\nDiff:\n$DIFF\"
}]
}" | jq -r '.content[0].text')
gh pr comment ${{ github.event.number }} \
--body "## AI Code Review\n\n$REVIEW\n\n---\n*Automated review by AI. Human review still required.*"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}Self-Hosted Alternative with Ollama
For teams that cannot send code to external APIs, use Ollama in your CI pipeline:
# In your Gitea Actions or self-hosted runner
- name: AI Review (Self-Hosted)
run: |
DIFF=$(git diff origin/main...HEAD)
REVIEW=$(curl -s http://ollama.internal:11434/api/generate \
-d "{
\"model\": \"deepseek-coder:6.7b\",
\"prompt\": \"Review this diff for bugs and issues:\n$DIFF\",
\"stream\": false
}" | jq -r '.response')
echo "$REVIEW" > review.mdThis is exactly how we handle it at TechSaaS — our Gitea Actions runner calls a local Ollama instance, keeping all code on our infrastructure.
What Good AI Reviews Catch
Based on our experience running AI reviews on hundreds of PRs:
Consistently catches:
Sometimes catches:
Rarely catches:
Reducing False Positives
AI reviewers are noisy by default. Reduce false positives with:
1. Custom system prompts: Tell the model your tech stack, conventions, and what to ignore 2. File filtering: Skip generated files, lock files, and vendor directories 3. Severity levels: Only comment on medium+ severity issues 4. Diff size limits: Skip massive PRs (AI context windows struggle with 5000+ line diffs)
# Example: Filter files before sending to AI
EXCLUDE_PATTERNS = [
"*.lock", "*.min.js", "*.generated.*",
"vendor/*", "node_modules/*", "__pycache__/*",
"*.pb.go", "*.swagger.json"
]Measuring Impact
Track these metrics before and after enabling AI review:
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 160" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="160" rx="12" fill="#1a1a2e"/><rect x="20" y="40" width="80" height="60" rx="6" fill="#3b82f6" opacity="0.85"/><text x="60" y="65" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Raw</text><text x="60" y="80" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Data</text><rect x="125" y="40" width="80" height="60" rx="6" fill="#6366f1" opacity="0.85"/><text x="165" y="65" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Pre-</text><text x="165" y="80" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">process</text><rect x="230" y="40" width="80" height="60" rx="6" fill="#a855f7" opacity="0.85"/><text x="270" y="65" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Train</text><text x="270" y="80" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Model</text><rect x="335" y="40" width="80" height="60" rx="6" fill="#2dd4bf" opacity="0.85"/><text x="375" y="65" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Evaluate</text><text x="375" y="80" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Metrics</text><rect x="440" y="40" width="80" height="60" rx="6" fill="#f59e0b" opacity="0.85"/><text x="480" y="65" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Deploy</text><text x="480" y="80" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Model</text><rect x="545" y="40" width="40" height="60" rx="6" fill="#6366f1" opacity="0.6"/><text x="565" y="75" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Mon</text><defs><marker id="arrow3" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#e2e8f0"/></marker></defs><line x1="102" y1="70" x2="123" y2="70" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow3)"/><line x1="207" y1="70" x2="228" y2="70" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow3)"/><line x1="312" y1="70" x2="333" y2="70" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow3)"/><line x1="417" y1="70" x2="438" y2="70" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow3)"/><line x1="522" y1="70" x2="543" y2="70" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow3)"/><path d="M375,102 L375,130 L270,130 L270,102" stroke="#f59e0b" stroke-width="1" stroke-dasharray="4,3" fill="none" marker-end="url(#arrow3b)"/><defs><marker id="arrow3b" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto-start-reverse"><path d="M0,0 L8,3 L0,6" fill="#f59e0b"/></marker></defs><text x="322" y="143" text-anchor="middle" fill="#f59e0b" font-size="9" font-family="system-ui">retrain loop</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">ML pipeline: from raw data collection through training, evaluation, deployment, and continuous monitoring.</p></div>
The Human-AI Review Workflow
The ideal workflow combines both:
1. PR is opened 2. AI reviews immediately (< 2 minutes) 3. Developer addresses AI feedback 4. Human reviewer focuses on design and logic 5. Approval and merge
This cuts total review cycle time by 40-60% while improving quality. At TechSaaS, we set up this pipeline as part of our DevOps consulting. Reach out at [email protected].
Need help with ai & machine learning?
TechSaaS provides expert consulting and managed services for cloud infrastructure, DevOps, and AI/ML operations.