How We'd Hire an AI Engineer in 2026
What 'AI engineer' actually means, what to test for in interviews, what to pay, and the red flags that distinguish real engineers from prompt-tinkerers.
"AI engineer" is the most-misused job title of 2026. It covers people doing fundamentally different work: prompt engineers writing system prompts for chatbots, ML engineers fine-tuning models on GPUs, full-stack engineers integrating LLM APIs into production products, research engineers training new models from scratch. Hiring well requires being explicit about which of these you actually need.
This piece is the working playbook we'd use to hire an AI engineer in 2026. It's based on real hiring at small and mid-sized teams, and it assumes you're not hiring at FAANG scale and can't pay $1M packages.
What "AI engineer" actually means
Five distinct roles get conflated:
Prompt engineer
Writes prompts and evaluates LLM outputs. Doesn't typically write production code. Best at iterating on prompt patterns and evaluating subtle quality differences.
Compensation: $80K-150K typically. Sometimes part of a content team rather than engineering.
When to hire: if your AI work is mostly tuning a chatbot's system prompt or building a content-generation workflow.
Application AI engineer
Full-stack engineer who specifically builds AI-integrated features. Knows how to call LLM APIs, structure RAG pipelines, write streaming UIs, evaluate LLM outputs, manage API costs.
Compensation: $200K-350K total comp at small-medium companies; $400K-600K at frontier labs and well-funded AI startups.
When to hire: this is the most common need. If you're shipping AI-shaped product features into a SaaS or consumer product, this is the role.
ML engineer / fine-tuning engineer
Trains models, fine-tunes existing models, manages training infrastructure. Knows PyTorch, distributed training, GPU economics, model architecture.
Compensation: $250K-500K total comp.
When to hire: if you have a specific need to fine-tune custom models. Most companies don't.
Research engineer
Implements novel research, builds new model architectures, runs experiments at frontier scale. Typically PhD-level; often on a dual track with researchers.
Compensation: $400K-1M+ total comp; mostly at frontier labs.
When to hire: if you're a frontier lab. Otherwise, you don't need this role.
AI infrastructure engineer
Manages GPU fleets, inference stacks, training infrastructure. Knows vLLM, SGLang, Kubernetes, distributed serving.
Compensation: $200K-400K total comp.
When to hire: if you self-host LLMs at scale or run training infrastructure.
For most teams, the role you actually need is application AI engineer. The rest of this article focuses on hiring for that role.
What to test for
Five specific skills, in priority order:
1. LLM API fluency
Concrete: can the candidate write code that calls an LLM API, handles streaming, manages errors, structures prompts properly? Do they understand the difference between system prompts, few-shot examples, and tool definitions? Do they know strict-mode JSON output and how to use it?
Test: have them implement a small feature end-to-end (e.g., "build a customer support classifier that calls an LLM and returns a structured JSON result") in a 90-minute pairing session. The right candidate will write reasonable code, handle the error cases, and discuss tradeoffs.
Red flag: candidates who've only used ChatGPT's web interface and have never actually called an API in code.
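To calibrate what "reasonable code" means in that pairing exercise, here is roughly the shape we'd expect from a strong candidate. This is a minimal sketch using the OpenAI Python SDK; the model name, category list, and prompt are illustrative placeholders, not a prescription:

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["billing", "bug_report", "feature_request", "account", "other"]

def classify_ticket(text: str, model: str = "gpt-4o-mini") -> dict:
    """Classify one support ticket into a category with a confidence score."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "You classify customer support tickets."},
            {"role": "user", "content": text},
        ],
        # Strict structured output: the API enforces the schema,
        # including the category enum.
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "ticket_classification",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "category": {"type": "string", "enum": CATEGORIES},
                        "confidence": {"type": "number"},
                    },
                    "required": ["category", "confidence"],
                    "additionalProperties": False,
                },
            },
        },
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        # Refusals or empty content can still slip through; fail safe.
        return {"category": "other", "confidence": 0.0}

if __name__ == "__main__":
    print(classify_ticket("I was charged twice for my subscription this month."))
```

The details matter less than the instincts on display: structured output instead of parsing free text, a deterministic temperature for a classifier, and a safe fallback when the response isn't what you expected.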
2. Eval discipline
Concrete: do they know how to evaluate whether an AI feature is working? Can they design a small eval set, run it against multiple models, and report results meaningfully? Do they understand the difference between human eval, LLM-as-judge, and automated metrics?
Test: ask them to walk through how they'd evaluate a hypothetical AI feature. The right candidate has structured ideas about what to measure, how to track regressions, and how to A/B test prompt changes.
Red flag: candidates who think "we'll just look at the outputs and see if they're good."
3. Cost and latency awareness
Concrete: do they understand input vs output token pricing? Per-million-token economics at production scale? P50 vs P99 latency? Streaming vs non-streaming UX implications?
Test: ask them to estimate the monthly cost of a hypothetical workload (e.g., "10K queries per day, average 5K input + 500 output tokens, what model would you pick and what would it cost?"). The right candidate will think about input/output asymmetry, model tiering, and rough math.
Red flag: candidates who think LLM costs are "negligible" or who can't estimate within an order of magnitude.
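The answer we want is back-of-envelope math, roughly like this (the per-token prices are illustrative placeholders, not anyone's current rates):

```python
# Back-of-envelope cost math for the hypothetical workload above.
QUERIES_PER_DAY = 10_000
INPUT_TOKENS = 5_000   # per query
OUTPUT_TOKENS = 500    # per query
PRICE_IN = 0.15        # $ per million input tokens (illustrative)
PRICE_OUT = 0.60       # $ per million output tokens (illustrative)

queries = QUERIES_PER_DAY * 30                            # 300K queries/month
input_cost = queries * INPUT_TOKENS / 1e6 * PRICE_IN      # 1.5B tokens -> $225
output_cost = queries * OUTPUT_TOKENS / 1e6 * PRICE_OUT   # 150M tokens -> $90
print(f"~${input_cost + output_cost:,.0f}/month")         # ~$315
```

Note the asymmetry: at a 10:1 input-to-output ratio, input cost dominates even though output tokens are priced several times higher. A strong candidate spots that without prompting.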
4. Production engineering instincts
Concrete: do they understand observability, error handling, rate limiting, retry patterns, security considerations (prompt injection, output sanitization)? Have they shipped AI features to real users and dealt with the operational fallout?
Test: ask about a specific production AI feature they've shipped. Probe on what broke, what they did about it, what they'd do differently.
Red flag: candidates who've only done greenfield work or demos.
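On retry patterns specifically, we'd expect something like jittered exponential backoff around the call. A minimal sketch against the OpenAI SDK's error classes, reusing the hypothetical classify_ticket from earlier; the attempt count and delays are illustrative:

```python
import random
import time

from openai import APIConnectionError, APITimeoutError, RateLimitError

def call_with_retries(fn, max_attempts: int = 5):
    """Retry a callable on transient API errors with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (RateLimitError, APITimeoutError, APIConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff with jitter: ~1s, 2s, 4s, 8s (+ noise).
            time.sleep(2 ** attempt + random.random())

result = call_with_retries(lambda: classify_ticket("Where is my invoice?"))
```

The key instinct is distinguishing transient failures (retry) from permanent ones like auth or validation errors (fail fast), and adding jitter so retries don't stampede.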
5. Model and provider knowledge
Concrete: do they know the current capabilities of major models? Can they explain when they'd pick GPT-5 vs Claude vs Gemini? Do they know the open-weight ecosystem? Are they keeping up with the field?
Test: 15 minutes of casual conversation about recent model releases. The right candidate will have opinions, will know the recent benchmarks, will reference specific tradeoffs they've made.
Red flag: candidates who haven't kept up with the field beyond GPT-4.
What not to test for
Three things that are common interview topics but don't predict success:
Deep ML theory. Application AI engineers don't need to derive backprop. They need to know enough ML to read papers and have an intuition for why models behave as they do. Detailed coursework-style ML quizzing is the wrong filter.
Frontier-lab familiarity. Knowledge of the latest research papers is nice but not necessary. Most AI engineering work is grinding through application-layer challenges (latency, cost, quality, observability), not implementing the latest paper.
Specific framework expertise. LangChain vs LlamaIndex vs raw API calls is a tooling preference, not a competence signal. A good engineer can pick up any of these in a week.
Compensation in 2026
Application AI engineer comp varies wildly by company and location. Rough ranges:
- Bootstrapped / small startup: $150K-220K base + small equity
- Series A-B startup: $180K-280K base + meaningful equity
- Series C+ AI-focused startup: $220K-350K base + strong equity
- Big tech (Google, Microsoft, Meta): $250K-450K total comp
- Frontier labs (OpenAI, Anthropic, Google DeepMind): $400K-700K+ total comp
The market for application AI engineers is genuinely hot. Expect to compete with multiple offers. Senior people can typically find work at >$300K in major tech hubs.
Where to find candidates
Three productive sources:
Public AI work. People who've shipped open-source AI tools, written technical blog posts, or contributed to LangChain/LlamaIndex/etc. are a visible signal of engagement with the field. GitHub and Twitter are reasonable hunting grounds.
AI-friendly companies' alumni networks. Engineers who've worked at AI-native startups (Cursor, Anthropic, Cohere, etc.) typically have the right skill mix. Recruiting from those pools is competitive, but the candidates will know the field.
Adjacent backgrounds. Strong full-stack engineers with curiosity about AI can often skill up rapidly. Don't hunt only for "AI engineers" specifically; hunt for strong engineers who are leaning into the space.
Red flags
Five red flags during interviews:
- Talks only about prompt engineering, never about evals. That's a prompt-tinkerer, not an engineer.
- No production deployment experience. Theoretical work doesn't transfer.
- Can't estimate costs within an order of magnitude. Will lose you money.
- Hasn't kept up with model releases beyond GPT-4. Will under-utilize what's available.
- Insists their preferred framework is the only right answer. Inflexibility is a real cost.
Concrete recommendation
If you're hiring an application AI engineer in 2026:
- Be clear about which role you actually need. Don't hire a research engineer when you need an application engineer or vice versa.
- Source candidates from public AI work and adjacent strong engineers. Don't only chase "AI engineer" titles.
- Test for API fluency, eval discipline, cost awareness, production instincts, and field knowledge. In that order.
- Pay competitively. The market is hot; underpaying loses you good candidates.
- Onboard with real AI-shipping work. A new AI engineer should ship a real feature in their first month, not learn theory.
The market is changing fast. Hire for the engineer who can keep up with the field, not the engineer who knew the right answer in 2024.
Further reading
- AI Safety in Production: A Builder's Checklist
Prompt injection, data leakage, hallucination, and the operational practices that keep AI products from blowing up in your face.
- What 'AI-Native' Actually Means in 2026
Every SaaS company claims to be AI-native. Most aren't. Here's how to tell, for hiring, for product strategy, for buying decisions.
- What We Got Wrong Building LLMDex (and What We'd Do Differently)
An honest postmortem from 18 months of building a programmatic SEO site for AI tools. The architectural mistakes, the editorial misjudgments, and what we'd do differently.