Hire an AI Developer
The filter I use: scoping, skills checklist, sourcing, 5 interview questions with real answers, 2026 rate bands, and the 3 red flags that have burned every client.
TL;DR: Most "AI developers" on the market right now are prompt engineers with a job title upgrade. The hires that actually ship are builders who can write the unsexy plumbing: retries, idempotency, rate limits, eval harnesses, observability. This post is the filter I use: scoping, skills, sourcing, interview questions with real answers, 2026 rate bands, and the three red flags that have burned every client I've rescued.
I've shipped production AI systems for solo founders, funded startups, and enterprises — AppHandoff, MCP Beast, a pile of custom LLM pipelines. I've also been called in to fix hires that didn't work out. The pattern is the same every time: the team hired someone who could prompt, not someone who could build. This post is the playbook that keeps you out of that category.
| Situation | Hire | Budget signal |
|---|---|---|
| You have a bottleneck and one clear use case (support, docs, intake) | Freelance AI engineer, 30-day trial | $150–$300/hr |
| You raised seed, need a founding AI engineer | Full-time senior, generalist | $180k–$240k + equity |
| You're pre-PMF and just need to validate | Part-time freelance, 10–20 hrs/wk | $5k–$15k/mo |
| You have proprietary data + unique domain | Full-time senior + part-time researcher | $250k+ combined |
| You want "AI features" with no idea what | Do not hire yet. Scope first. | $0 — go read posts below |
See freelance vs agency for the engagement-model breakdown and AI developer for startups for stage-by-stage staffing.
Every failed AI hire I've diagnosed started the same way: the company opened a job posting before writing down what problem the hire would solve. You'll waste $200k and six months if you skip this.
Write this down, in order, before posting a job:
If you can't fill in those four lines, you're hiring for a vibe, not a job. Scope first.
A prompt engineer writes instructions. An AI engineer builds systems that keep working when the model misbehaves. The difference shows up in the same five places every time:
duration_ms and error_type per call, they're flying blind.
If none of that comes up naturally in the interview, you're talking to a prompt engineer.
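To make the observability point concrete, here is a minimal sketch of per-call telemetry. The field names duration_ms and error_type come from the text above; the wrapper name `call_with_telemetry`, the logger name, and the JSON record shape are assumptions for illustration, not a prescribed format.

```python
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("llm_calls")  # hypothetical logger name


def call_with_telemetry(name: str, fn: Callable[[], Any]) -> Any:
    """Run fn() and log one structured record per call, success or failure."""
    start = time.perf_counter()
    error_type = None
    try:
        return fn()
    except Exception as exc:
        # Record the exception class, then re-raise so callers still see it.
        error_type = type(exc).__name__
        raise
    finally:
        record = {
            "call": name,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            "error_type": error_type,
        }
        logger.info(json.dumps(record))
```

A candidate who has run systems in production will usually describe something like this unprompted: one record per model call, emitted whether the call succeeded or not.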
Ship-ready AI engineers in 2026 have:
Every item on this list should show up as a specific story in the interview. If they're vague on three or more, pass.
Not the LinkedIn job board. You'll drown in resume spam from people who took a Coursera class last month. Sources that actually work in 2026:
Skip: generalist recruiters ("we have AI developers!" — they don't), bootcamp grad pipelines (the curriculum is 18 months behind), and anyone whose LinkedIn changed from "blockchain consultant" to "AI consultant" in 2023.
Ask these verbatim. I've used them on dozens of candidates; the good ones light up, the fakes stall within 30 seconds.
1. "A user says your AI gave them wrong information. Walk me through how you'd debug it."
Weak: "I'd rewrite the prompt."
Strong: "Pull the trace: the exact tokens in and out. Check retrieval: did we fetch the right context? Check the eval suite: does this case match a known regression? Then decide if it's a prompt problem, a retrieval problem, or a model problem. They need different fixes."
2. "How do you stop hallucinations in a customer-facing bot?"
Weak: "Better prompts."
Strong: "You don't stop them. You constrain and detect. Constrain: RAG with citation-required prompting, function calling for anything structured, refuse-to-answer on low-confidence retrieval. Detect: eval set of known-hard questions run on every deploy, plus logging every 'I don't know' so we learn the coverage gaps."
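The "constrain and detect" answer can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: `retrieve` and `generate` are hypothetical callables standing in for your retrieval layer and model call, the 0.75 threshold is an arbitrary placeholder, and the eval check is a plain substring match.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

REFUSAL = "I don't know."
MIN_SCORE = 0.75  # assumed threshold; tune against your own retrieval scores


def answer_or_refuse(
    question: str,
    retrieve: Callable[[str], List[Tuple[str, float]]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Constrain: only answer when retrieval returned confident context."""
    hits = [(text, score) for text, score in retrieve(question) if score >= MIN_SCORE]
    if not hits:
        return REFUSAL  # in production, log these to find coverage gaps
    return generate(question, [text for text, _ in hits])


@dataclass
class EvalCase:
    question: str
    must_contain: str  # substring a correct answer has to include


def run_evals(cases: List[EvalCase], bot: Callable[[str], str]) -> List[str]:
    """Detect: return the questions that fail, so a deploy gate can block on them."""
    return [
        c.question
        for c in cases
        if c.must_contain.lower() not in bot(c.question).lower()
    ]
```

Wiring `run_evals` into the deploy pipeline so that a non-empty result fails the build is the "run on every deploy" part of the strong answer.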
3. "Walk me through how you'd structure rate-limit errors from OpenAI so the front end can recover gracefully."
Weak: "Just retry."
Strong: Talks about exponential backoff, surfacing retry_after_ms to the client, queueing at the server, and — ideally — mentions structured error payloads (this is what MCP got right; see MCP vs API).
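A strong answer to question 3 tends to look something like the sketch below. It assumes a hypothetical `RateLimited` exception standing in for the provider's HTTP 429 error (it is not the OpenAI SDK's exception class), and the payload shape is invented for illustration apart from the retry_after_ms field named above. Server-side queueing is left out to keep it short.

```python
import random
import time
from typing import Any, Callable, Dict, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""

    def __init__(self, retry_after_s: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after_s = retry_after_s


def call_with_backoff(
    fn: Callable[[], Any],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry on rate limits; on exhaustion, return an error the client can act on."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "data": fn()}
        except RateLimited as exc:
            if exc.retry_after_s is not None:
                wait = exc.retry_after_s  # prefer the provider's own hint
            else:
                # Exponential backoff with a little jitter.
                wait = base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            if attempt == max_attempts:
                return {
                    "ok": False,
                    "error": {
                        "type": "rate_limited",
                        "retryable": True,
                        "retry_after_ms": int(wait * 1000),
                    },
                }
            sleep(wait)
    return {"ok": False, "error": {"type": "no_attempts", "retryable": False}}
```

The front end can branch on `error.retryable` and schedule its own retry after `retry_after_ms` instead of showing a generic failure.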
4. "Explain fine-tuning vs RAG like I'm your CEO."
Weak: Uses the words "parameters" and "embeddings" in the first sentence.
Strong: "RAG is giving the model an open book at test time. Fine-tuning is teaching it the book once, permanently. RAG is cheap and updates instantly. Fine-tuning is expensive but makes the model faster and more consistent on narrow tasks. Most companies need RAG first and never need fine-tuning."
5. "Show me something you shipped that's running right now."
Weak: Shows you a Loom of a demo.
Strong: Gives you a URL or a repo and walks you through a real trace.
| Role | Full-time base | Freelance hourly | Reality check |
|---|---|---|---|
| Junior (1–2 yrs, API integrator) | $130k–$160k | $75–$125 | Can wire GPT to your app; will break on production edge cases |
| Senior (3–5 yrs, system builder) | $180k–$240k | $150–$250 | Ships evals, handles failure modes, owns a feature end-to-end |
| Lead / Staff (5+ yrs, architect) | $250k+ (often $300k+ with equity) | $250–$400 | Designs the platform; you hire one, not four |
| Specialist (researcher, eval infra, novel modality) | $300k+ | $400+ | Only hire when the senior hits a ceiling |
Same numbers appear in AI developer for startups and freelance vs agency — if you're comparing posts, they match on purpose.
The trap: paying senior rates for junior output because the candidate used the word "transformer" five times. Filter on shipped work, not vocabulary.
I've rescued enough of these to know the pattern:
If you see two of three, pivot within the 30-day trial. Don't sunk-cost-fallacy into a $200k hire.
Before the full-time offer, run a paid trial. Scope looks like this:
If weeks 1–2 slip more than 3 days, you're seeing the red flags early. That's the point of the trial.
I run this filter in practice. If you want help scoping, reviewing candidates, or just shipping the first version yourself with a builder who's done it before — describe what you're building and I'll tell you what the first 30 days should look like.