<h1>How to Hire an AI Developer: the Honest 2026 Guide</h1>

<p>Hiring the right AI developer in 2026 is a decision that depends far more on your situation than on any single candidate's CV. This guide gives you a complete framework — from working out whether you need an AI developer at all, through evaluating candidates, to rate expectations and engagement models — so you can make that decision with clarity rather than guesswork.</p>

<h2>TL;DR: how to hire an AI developer</h2> <p>An AI developer is a software engineer who designs, builds, and ships production applications that use large language models, diffusion models, or similar AI systems as core components — not as a bolt-on feature, but as the primary delivery mechanism. In 2026 the practical process looks like this: first, confirm you actually need AI-specific skills rather than a capable generalist (see the decision framework below); second, choose the right engagement model for your stage and budget (full-time, freelance, agency, or fractional); third, source candidates through the channels where real AI practitioners are active, not just general job boards; fourth, filter with concrete technical questions that separate production experience from demo experience; and fifth, verify with a paid, time-boxed technical task before committing to a longer engagement.</p>

<h2>What "AI developer" means in 2026</h2> <p>The label has been applied loosely since GPT-3 landed in developer hands, but by 2026 a working definition has emerged from the people actually hiring and being hired in this space. An AI developer is a software engineer whose core skill is integrating AI models into production systems — handling prompt engineering, context management, retrieval-augmented generation (RAG), output validation, cost control, observability, and the infrastructure that makes AI features reliable at scale.</p> <p>That definition intentionally excludes three adjacent roles that get conflated with it.</p> <p>An <strong>ML engineer</strong> trains, fine-tunes, and evaluates models. They work in PyTorch or JAX, manage GPU clusters, and optimise loss functions. Most product companies in 2026 do not need to train their own models — they consume foundation model APIs. Hiring an ML engineer to build a GPT-powered feature is like hiring a mechanical engineer to fit a pre-built engine. The skills barely overlap.</p> <p>An <strong>AI researcher</strong> publishes papers, advances the state of the art, and works in academic or frontier-lab settings. Occasionally the right hire for a company doing genuinely novel model architecture work. Almost never the right hire for a startup building a product on top of existing models.</p> <p>A <strong>prompt engineer</strong> writes and iterates on prompts, typically without writing production code. A legitimate and useful skill, but one that lives inside product and content roles, not engineering. 
If someone's primary credential is "I write great prompts", they are not an AI developer.</p> <p>The AI developer you are trying to hire does all of the following: writes production-grade TypeScript, Python, or Go; integrates LLM APIs with proper error handling, retries, and fallbacks; builds or extends retrieval systems and embeddings pipelines; instruments AI features for cost and latency monitoring; and ships features that stay within a defined operating budget. If you want the longer picture of what that looks like in a startup context, the post on <a href="/blog/ai-developer-for-startups">AI developers for startups</a> covers the domain in more detail.</p>
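<p>To make "proper error handling, retries, and fallbacks" concrete, here is a minimal Python sketch of the pattern a production-minded AI developer should be able to produce on a whiteboard. It is illustrative only: the exception class stands in for a provider's rate-limit error (it is not any specific SDK's type), and <code>call</code> and <code>fallback</code> are hypothetical wrappers around your own model calls.</p>

```python
import random
import time


class ProviderRateLimitError(Exception):
    """Stand-in for a provider 429 (rate limit) response -- not a real SDK type."""


def call_with_retries(call, *, max_attempts=4, base_delay=0.5, fallback=None):
    """Call an LLM provider, retrying rate-limit errors with exponential backoff.

    `call` is a zero-argument callable wrapping the provider request;
    `fallback` is invoked if every attempt fails (e.g. a cheaper model
    or a canned response), so the user-facing feature degrades gracefully.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ProviderRateLimitError:
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus noise,
            # so retrying clients do not all hit the provider at once.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    if fallback is not None:
        return fallback()
    raise ProviderRateLimitError("provider unavailable after retries")
```

<p>The structure matters more than the details: bounded attempts, growing delays with jitter, and an explicit decision about what the user sees when the provider stays down.</p>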

<h2>Decision framework: do you actually need an AI developer?</h2> <p>Before opening a job description, ask yourself what problem you are trying to solve. In practice, three scenarios exist where a traditional software developer — with no AI specialisation at all — is the better hire.</p> <p><strong>Scenario 1: AI is a thin layer on an otherwise standard product.</strong> If you are adding a chatbot widget to a customer support tool, a "summarise this document" button to an existing app, or a basic semantic search bar to a content site, you probably do not need an AI specialist. These integrations are well-documented, the libraries are mature, and a competent full-stack developer can implement them in a few days by following API documentation. Paying AI developer rates for this work is over-engineering the hire.</p> <p><strong>Scenario 2: The real bottleneck is data engineering or traditional software quality.</strong> Many founders believe they need AI capabilities when what they actually need is a well-structured database, reliable background processing, or a properly designed API. If your core workflow is messy, adding AI to it makes the mess faster, not better. A traditional developer who cleans up the foundation is usually the higher-ROI hire at this stage.</p> <p><strong>Scenario 3: You need a proof of concept, not production engineering.</strong> Plenty of good no-code and low-code AI tooling exists in 2026 for validating whether AI actually solves a user problem. If you have not yet proved that the AI feature is something users will pay for, spend £500 on a prototype rather than £15,000 on an engineer. Hire the AI developer after the experiment succeeds, not before.</p> <p>If none of those apply — if AI is central to your product's value proposition, if you are processing documents or conversations at volume, if cost control and reliability are critical — then an AI developer is genuinely the right hire. Continue with the framework below.</p>

<h2>The four engagement models</h2> <p>There is no universally correct way to engage an AI developer. The right model depends on your stage, budget, technical leadership capacity, and how central AI is to your long-term product roadmap.</p>

<table> <thead> <tr> <th>Model</th> <th>Pros</th> <th>Cons</th> <th>Cost signal (2026)</th> <th>When to use</th> </tr> </thead> <tbody> <tr> <td><strong>Full-time</strong></td> <td>Deep product context; long-term ownership; builds institutional knowledge</td> <td>High fixed cost; difficult to hire well; wrong hire is expensive to unwind</td> <td>£80,000–£160,000/yr base (UK); $120,000–$220,000 (US); senior AI engineers command top of range</td> <td>AI is core to the product and you have 12+ months of engineering work; Series A and beyond</td> </tr> <tr> <td><strong>Freelance</strong></td> <td>Fast to engage; specialised skills without overhead; flexible scope</td> <td>Higher day rate; knowledge leaves with them; availability risk on long projects</td> <td>£600–£1,400/day (UK); $900–$2,000/day (US); top-tier specialists bill at the high end</td> <td>Defined, time-boxed project; existing technical team that just needs AI expertise; pre-funding</td> </tr> <tr> <td><strong>Agency</strong></td> <td>Full team from day one; coordinated delivery across frontend, backend, AI layer; accountability</td> <td>Higher total cost; less flexibility to pivot; agency priorities may not match yours</td> <td>£800–£1,800/day equivalent; full AI product builds from £30,000; complex integrations from £60,000</td> <td>You need an end-to-end product built and maintained; limited internal technical capacity; Series A with a clear scope</td> </tr> <tr> <td><strong>Fractional</strong></td> <td>Strategic AI leadership without full-time cost; helps hire and manage junior engineers; architecture decisions made well</td> <td>Limited execution bandwidth; not a substitute for hands-on delivery capacity</td> <td>£3,000–£8,000/month for 1–2 days per week; depends heavily on seniority</td> <td>Pre-seed or seed stage; need AI architectural direction without full-time CTO; evaluating whether to build or buy</td> </tr> </tbody> </table>

<p>For a deeper look at how the freelance and agency options compare in practice, the post on <a href="/blog/freelance-ai-developer-vs-agency">freelance AI developer vs agency</a> works through the tradeoffs with real scenarios. If your need is more strategic than delivery-focused, a <a href="/fractional-cto">fractional CTO</a> may be the more precise solution.</p>

<h2>Where to actually find AI developers in 2026</h2> <p>The gap between where people post AI developer jobs and where AI developers actually are is wide. Here are the channels that produce real signal.</p> <p><strong>GitHub.</strong> The most reliable quality signal in software is shipped code. Search GitHub for repositories that implement RAG pipelines, OpenAI function calling, LangChain agents, or vector store integrations that have real commit history, tests, and documentation. Developers who maintain these repos are practitioners. Star count is not the signal; code quality and commit consistency are. A developer whose last repo was three years ago and has 14 stars is not the same as one with an actively maintained library that other engineers depend on.</p> <p><strong>AppHandoff.</strong> An increasingly useful source for vetted AI developers who have shipped production work. AppHandoff is specifically oriented around AI-augmented development, which means the practitioners listed there have real delivery context rather than just credentials.</p> <p><strong>Specific subreddits.</strong> The r/LocalLLaMA, r/MachineLearning (for applied posts), and r/SideProject communities surface practitioners building real things. The AI-adjacent posts in r/webdev and r/programming are also worth monitoring. These communities are useful for both finding people directly and asking for referrals from people who know the space.</p> <p><strong>Upwork with filters.</strong> Upwork is not reliable at the default search level, but the Top Rated Plus filter combined with specific skills (LangChain, OpenAI, RAG, vector databases, prompt engineering in a coding context) and a minimum earnings threshold substantially narrows the pool to people who have delivered real work. 
Review each portfolio item critically: look for shipped URLs, not mockups.</p> <p><strong>LinkedIn with boolean search.</strong> A search string like <code>("AI developer" OR "LLM engineer" OR "AI engineer") AND (OpenAI OR LangChain OR RAG OR "vector database") NOT (researcher OR PhD OR "machine learning engineer")</code> filtered to your geography and connection degree surfaces a more relevant pool than any job board. Second-degree connections are particularly valuable — warm introductions convert at dramatically higher rates than cold outreach.</p> <p><strong>Referrals from your technical network.</strong> The single most reliable channel. If you know any engineers who have built AI features in production, ask them who they would hire or work alongside. The AI developer community is smaller than the broader developer market, and reputation travels fast within it. One good referral is worth 50 inbound applications.</p> <p><strong>Specialist communities and events.</strong> London AI meetups, the Latent Space community, Weights &amp; Biases Discord, and similar spaces are where practitioners gather to compare notes. Attending or sponsoring these events — or simply being present in their online spaces — puts you in front of people who are actively building AI systems.</p>

<h2>The 7-question interview</h2> <p>These questions are designed to distinguish AI developers who have shipped production systems from those who have run demos or completed tutorials. The table below also shows which questions matter most by role type.</p> <ol> <li><strong>"Walk me through how you would add a rate limit to an OpenAI-backed endpoint."</strong> A real answer covers: token bucket or sliding window at the API layer, handling 429 responses from OpenAI with exponential backoff, per-user or per-organisation limits in your own system, and cost implications. A weak answer describes adding a simple request counter without thinking about distributed state, retries, or cost exposure.</li> <li><strong>"What is your approach to prompt regression testing?"</strong> A real answer describes some form of input/output snapshot testing, evaluation datasets, LLM-as-judge patterns, or structured comparison of prompt versions before deploying to production. A weak answer is "I test it manually" or "I check it looks right in the playground."</li> <li><strong>"How do you control costs on an AI feature that is embedded in a user workflow?"</strong> Real answer: prompt caching where the provider supports it, request deduplication, tiered model routing (cheaper model for simple tasks, expensive model for complex ones), hard per-user spending caps enforced server-side, and spend alerts. Weak answer: "I try to keep prompts short."</li> <li><strong>"Describe a time an AI feature you shipped behaved unexpectedly in production. What happened and what did you change?"</strong> This is a production-experience filter. Anyone who has shipped AI at scale has a story here. Vague or hypothetical answers suggest demo-level experience. 
Specific answers — with a real root cause, a real fix, and something learned — suggest genuine production exposure.</li> <li><strong>"How do you handle a situation where the LLM output needs to be structured and validated before use?"</strong> Real answer: output parsers with explicit schemas, retries with corrective prompts on parse failure, fallback to a default or human escalation path, and logging of all failures for pattern analysis. Weak answer: "I use JSON mode" without any mention of what happens when JSON mode still produces invalid output.</li> <li><strong>"What observability does your AI layer need that a standard API does not?"</strong> Real answer: token consumption per request (input and output separately), model version pinned and logged, latency at each pipeline stage (retrieval, generation, post-processing), failure mode classification (provider error vs. validation failure vs. timeout), and cost attribution by user or feature. Weak answer: "Logging the response."</li> <li><strong>"Show me the most technically interesting AI system you have shipped. What is the URL?"</strong> The only acceptable answer includes a live URL or a credible explanation of why it is not public (enterprise client, internal tool, under NDA with a named client). Screenshots and demos of things that were never deployed are not a substitute for shipped work.</li> </ol>
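<p>Question 5 has a recognisable shape in code, and a strong candidate can sketch it unprompted. The following is a hedged illustration of the parse, validate, retry-with-corrective-prompt, fallback loop; the schema in <code>validate_summary</code>, the feedback wording, and the <code>generate</code> wrapper around the actual model call are all hypothetical.</p>

```python
import json


def validate_summary(payload):
    """Check the model output against the (example) schema downstream code expects."""
    if not isinstance(payload, dict):
        return False
    return isinstance(payload.get("title"), str) and isinstance(payload.get("bullets"), list)


def structured_completion(generate, *, max_attempts=3, default=None):
    """Ask the model for JSON, validate it, and retry with a corrective prompt.

    `generate(feedback)` wraps the model call; `feedback` is None on the first
    attempt and an error description on retries. If nothing valid comes back,
    fall back to `default` rather than passing broken output downstream.
    """
    feedback = None
    failures = []
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"Your last reply was not valid JSON: {exc}. Reply with JSON only."
            failures.append(raw)
            continue
        if validate_summary(payload):
            return payload
        feedback = "Reply was valid JSON but 'title' must be a string and 'bullets' a list."
        failures.append(raw)
    # Keep the failed outputs for pattern analysis (question 6), then degrade.
    return default if default is not None else {"error": "validation_failed", "attempts": failures}
```

<p>Note that this handles the case the weak answer misses: JSON mode can still return valid JSON with the wrong shape, which is why validation and the corrective retry are separate steps.</p>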

<table> <thead> <tr> <th>Question</th> <th>Full-time hire</th> <th>Freelance specialist</th> <th>Agency evaluation</th> <th>Fractional</th> </tr> </thead> <tbody> <tr> <td>Rate limiting an AI endpoint</td> <td>Essential</td> <td>Essential</td> <td>Ask the lead engineer</td> <td>Useful</td> </tr> <tr> <td>Prompt regression testing</td> <td>Essential</td> <td>Essential</td> <td>Ask the lead engineer</td> <td>Essential</td> </tr> <tr> <td>Cost control in user workflows</td> <td>Essential</td> <td>Essential</td> <td>Ask account lead</td> <td>Essential</td> </tr> <tr> <td>Production failure story</td> <td>Essential</td> <td>Essential</td> <td>Ask the lead engineer</td> <td>Essential</td> </tr> <tr> <td>Structured output handling</td> <td>Essential</td> <td>Essential</td> <td>Ask the lead engineer</td> <td>Useful</td> </tr> <tr> <td>AI observability requirements</td> <td>Essential</td> <td>Useful</td> <td>Ask account lead</td> <td>Essential</td> </tr> <tr> <td>Shipped URL</td> <td>Essential</td> <td>Essential</td> <td>Ask for client references</td> <td>Essential</td> </tr> </tbody> </table>

<h2>Red flags in the first call</h2> <p>These are patterns that appear in the first conversation with a weak candidate. They are worth knowing in advance because they often present as enthusiasm or confidence rather than obvious weakness.</p> <p><strong>1. Vibes-based architecture.</strong> Listen for answers to technical questions that use language like "it felt right", "I try to keep it simple", or "I just go with what works" without any structural reasoning behind those choices. Architecture decisions in AI systems — model selection, context window management, chunking strategy, latency budgeting — have real tradeoffs that a practitioner will articulate explicitly. Someone who cannot explain why they made a decision cannot be trusted to make the next one well.</p> <p><strong>2. No production shipping evidence.</strong> If you ask for live examples and the answer is screenshots, a Loom walkthrough, a staging environment, or "it was a client project and I cannot share it" without a named client and reference contact — treat this as a yellow flag that becomes a red flag if it is the pattern across all examples rather than a single exception.</p> <p><strong>3. No testing plan for AI outputs.</strong> AI features fail in ways that are different from traditional software: they fail silently, probabilistically, and inconsistently. A developer who does not have a clear view of how they would catch and surface those failures before they reach users does not understand the risk profile of what they are building.</p> <p><strong>4. Over-indexed on one model vendor.</strong> "I only use OpenAI" or "I only use Claude" is not a problem per se — specialisation is fine. The problem is when the candidate shows no awareness of why they might need to change, how they would handle a vendor outage, or what the switching cost of their current architecture would be. 
The AI infrastructure landscape in 2026 is still volatile; a developer who has not thought about portability is building you into a corner.</p> <p><strong>5. Vague on cost control.</strong> AI APIs have variable per-token costs, and those costs can compound surprisingly quickly in user-facing products. A developer who waves away cost questions — "it should not be expensive", "we can deal with that later", "it is only a few pence per call" — has not shipped an AI feature at meaningful scale. Ask them what the monthly API cost would be at 1,000 active users. A practitioner can estimate this. Someone who has not thought about it cannot.</p>
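<p>The 1,000-user estimate is simple arithmetic once the assumptions are fixed, which is exactly why a practitioner can produce it on the spot. The figures below are placeholders, not current provider pricing; substitute your own token counts and per-token rates.</p>

```python
def monthly_api_cost(active_users, requests_per_user_per_day,
                     input_tokens, output_tokens,
                     input_price_per_1k, output_price_per_1k, days=30):
    """Back-of-envelope monthly API spend for a user-facing AI feature."""
    requests = active_users * requests_per_user_per_day * days
    per_request = ((input_tokens / 1000) * input_price_per_1k
                   + (output_tokens / 1000) * output_price_per_1k)
    return requests * per_request


# 1,000 active users, 5 requests/day, ~2,000 input + 500 output tokens per
# request, at an assumed $0.0025/1k input and $0.01/1k output:
cost = monthly_api_cost(1000, 5, 2000, 500, 0.0025, 0.01)
# roughly $1,500/month before caching, routing, or caps
```

<p>The point of the exercise is not the number itself but whether the candidate immediately asks for the inputs: requests per user, tokens per request, and the input/output price split.</p>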

<h2>What to ask for in their portfolio</h2> <p>Portfolio review is where many hiring managers are misled. Here is what to insist on and what to discount.</p> <p><strong>Insist on shipped URLs.</strong> Not staging environments, not Vercel preview links, not GitHub repos of side projects that never launched. A live product in production — something with real users and real traffic, something that would break if the engineer stopped maintaining it — is the only evidence of genuine end-to-end delivery. The question is simple: "Can I visit this in a browser right now?"</p> <p><strong>Ask about observability.</strong> Walk through what monitoring exists on an AI feature they have shipped. If they cannot tell you what their token consumption was last month, what their p95 latency is on the generation step, or how they are alerted when their error rate spikes, they are not operating a production system — they are running a demo on real infrastructure.</p> <p><strong>Look at auth architecture.</strong> How does the product handle authentication? Are JWTs validated server-side? Are AI endpoints protected against unauthenticated access? Is there rate limiting per authenticated user? Weak auth in AI systems is particularly costly because token consumption can be exploited by anonymous traffic. A developer who built something real will have thought about this.</p> <p><strong>Discount AI demos.</strong> A demo is a controlled environment with scripted inputs. It tells you that the developer can make the model produce the right output when they are in control of everything. It tells you nothing about how the system behaves when a real user asks something unexpected, submits a file that breaks the parser, or hits the endpoint at 3 AM when your context cache is cold. Be polite about the demo, then ask what happens when it goes wrong.</p> <p><strong>Ask about the cost and hosting stack.</strong> How is the app deployed? What does it cost to run per month? 
How does that scale with usage? A developer who has shipped a production AI system can answer these questions precisely. Someone who built a prototype can only answer them approximately, if at all.</p>
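<p>The observability questions above imply a concrete artefact: some per-request trace that records token counts, the pinned model version, per-stage latency, and a failure classification. A minimal sketch of what that might look like follows; the field names, stages, and model string are illustrative, not a prescribed schema.</p>

```python
import time
from dataclasses import dataclass, field


@dataclass
class AIRequestTrace:
    """Per-request record of what an AI layer needs beyond a standard API:
    token counts (input and output separately), pinned model version,
    per-stage latency, and a failure class for alerting and cost attribution."""
    user_id: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    stage_latency_ms: dict = field(default_factory=dict)
    failure: str = ""  # e.g. "provider_error", "validation_failed", "timeout"

    def time_stage(self, name):
        """Context manager that records wall-clock latency for one pipeline stage."""
        trace = self

        class _Timer:
            def __enter__(self):
                self.start = time.perf_counter()

            def __exit__(self, *exc):
                trace.stage_latency_ms[name] = (time.perf_counter() - self.start) * 1000

        return _Timer()


# Usage sketch: one trace per request, timed per stage, shipped to your
# metrics backend at the end of the request.
trace = AIRequestTrace(user_id="u_123", model="gpt-4o-2024-08-06")  # illustrative IDs
with trace.time_stage("retrieval"):
    pass  # fetch context / embeddings here
with trace.time_stage("generation"):
    pass  # model call here
```

<p>A candidate who operates production AI will recognise this shape immediately, whatever their actual stack; a candidate who has only run demos will not have needed it.</p>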

<h2>2026 rate expectations</h2> <p>AI developer rates in 2026 sit meaningfully above general software developer rates — typically 30 to 60 per cent higher at equivalent seniority levels — reflecting genuine scarcity of production-experienced practitioners. In the UK, freelance day rates for mid-level AI developers run from £600 to £900 per day; senior practitioners with strong RAG, agentic systems, and production reliability experience bill at £1,000 to £1,400 per day and sometimes beyond on specialist projects. Full-time salaries range from around £80,000 for a capable mid-level hire to £140,000 or more for someone who can own AI architecture at a scaling company. US rates are broadly 30 to 40 per cent higher in comparable markets. Expect to pay at the top of the range for candidates who can show live systems, strong observability practice, and experience managing AI costs in production. For a full breakdown of cost scenarios by project type and engagement model, see the companion post on <a href="/blog/how-much-does-it-cost-to-hire-an-ai-developer">how much it costs to hire an AI developer</a>.</p>

<h2>FAQs</h2>

<h3>What does an AI developer do?</h3> <p>An AI developer designs and builds software systems that use AI models — primarily large language models — as a core functional component. In practice that means writing the application code that calls model APIs, building retrieval pipelines that give the model relevant context, engineering the prompts and output validation logic that make the feature reliable, and instrumenting everything so that cost, latency, and failure rates are visible and controllable. They are full-stack software engineers with a specialisation in integrating AI systems into production products, not researchers or data scientists.</p>

<h3>How do I hire the right AI developer?</h3> <p>Filter on production evidence, not credentials or enthusiasm. Ask for live URLs of systems they have shipped, ask how they handle prompt regression testing and cost control, and ask them to describe a real production failure and how they resolved it. Candidates who have actually shipped AI systems at scale can answer all of these questions specifically and comfortably. Candidates who have not will give vague, hedged, or theoretical answers. Run a paid technical task — ideally a real, bounded problem from your own codebase — before making a full commitment. The guide above covers the full evaluation framework in detail.</p>

<h3>Are AI developers expensive?</h3> <p>Yes, relative to general software developers. Day rates for experienced freelance AI developers run from £600 to £1,400 in the UK (more in US markets), and full-time salaries for strong candidates typically start around £80,000 in London. That premium exists because production-experienced AI developers are genuinely scarce — most engineers who call themselves AI developers have not shipped AI features at scale. The right question is not whether AI developers are expensive in absolute terms, but whether the problem they are solving justifies the rate. For the full cost analysis, see the post on <a href="/blog/how-much-does-it-cost-to-hire-an-ai-developer">AI developer costs</a>.</p>

<h3>Full-time or freelance AI developer?</h3> <p>Full-time makes sense when AI is central to your long-term product and you have sustained engineering work — typically Series A and beyond. Freelance makes sense when you have a defined project scope, an existing technical team that just needs AI expertise, or a budget that does not yet support a full-time hire. Fractional is increasingly common at seed stage for companies that need architectural direction without hands-on delivery capacity. The engagement models table above covers the full tradeoff in one place. There is also a dedicated analysis in the post on <a href="/blog/freelance-ai-developer-vs-agency">freelance AI developer vs agency</a> if you are deciding between those two routes specifically.</p>

<h3>Can I hire an AI developer without being technical myself?</h3> <p>Yes, but you need to compensate for not being able to evaluate the technical work yourself. Three approaches help: first, engage a <a href="/fractional-cto">fractional CTO</a> to run the hiring process and review the technical output on your behalf; second, use the interview questions and red flags in this guide to filter out the obviously weak candidates before they reach the technical evaluation stage; third, insist on a paid technical task before committing, and get independent technical feedback on the output. Non-technical founders who hire AI developers well are typically those who are rigorous about evidence — shipped URLs, live systems, real references — rather than relying on how a candidate presents in conversation.</p>

<h2>How we work</h2> <p>We are a small, senior team that builds and ships AI-powered products for startups and scale-ups — end to end, from architecture decisions through to production deployment and ongoing maintenance. We have shipped RAG pipelines, document processing systems, LLM-backed API layers, and AI features embedded in user-facing products. If you are at the stage where this guide has given you a clearer sense of what you need, and you would like to talk about whether we are a good fit for it, the place to start is our <a href="/hire-ai-developer">hire an AI developer</a> page. If you are still working out whether AI development or a broader technical advisory is the right engagement, the <a href="/consultancy">consultancy</a> page covers that.</p>