№ 52 ops · 12 min read

Hiring engineers in the age of AI

A concrete hiring rubric and 14 interview questions for engineers in 2026 — evaluating architectural judgment, AI-assisted execution, and culture, not raw coding ability.


Your engineering interview loop was designed for a world where the bottleneck was writing code. The best engineers in 2026 are not the fastest coders — they’re the ones with the best judgment about what to build, how to direct AI toward the right outcome, and when to override what it produced.

The job title still says “engineer,” but the role has fractured. The person you actually need is part architect, part product thinker, part operator, and part team multiplier: someone who designs systems before touching a keyboard, kills bad ideas before they become features, plans for the 3am page before it fires, and makes the people around them better. AI writes the code. The human decides whether the code should exist at all.

Augment Code put it well in their hiring framework: the human role has shifted from author to architect and editor. You define intent, make trade-off decisions, set guardrails, and serve as the last line of quality. Raw coding ability no longer separates exceptional engineers from competent ones — regardless of whether they’re writing backends, frontends, data pipelines, or infrastructure.

CoderPad’s 2026 State of Tech Hiring points the same way: technical assessments are up 48% globally, and 82% of developers find AI at least somewhat useful. Companies leading in AI are hiring more engineers, not fewer. The bottleneck isn’t code generation. It’s judgment, systems thinking, and the ability to work alongside AI without losing control of the outcome.

If your interview loop still centers on “can this person write a balanced binary tree on a whiteboard,” you’re selecting for a skill that AI does better than any human. Here’s what to select for instead.

The rubric

Six dimensions, each with sub-criteria you score independently. 3 = strong, 2 = adequate, 1 = weak. Open the scorecard (Google Sheet).

1. Architecture & design

Can the candidate think at the level of systems, not just functions? These apply whether someone is building APIs, frontends, data pipelines, or infrastructure.

  • Trade-off analysis under ambiguity — can they explain why one approach beats another in terms the business cares about, not just technical preference?
  • Failure mode awareness — do they design for the 3am outage, or only the happy path?
  • Scale reasoning — do they ask about constraints (current bottlenecks, data patterns, cost) before proposing solutions?
  • API and contract design — can they design interfaces that other teams can integrate safely?
  • Data modeling and storage decisions — do they choose the right storage for the access pattern, not just the storage they know?

2. AI-assisted execution

Can the candidate direct AI tools toward the right outcome — and catch it when the output is wrong?

  • Directs AI toward well-defined subtasks — uses AI for implementation while keeping control of the overall design. Treats AI as a tool, not an oracle.
  • Critically reviews and debugs AI output — can explain why AI-generated code is or isn’t correct. This is the single strongest signal.
  • Clear boundary between AI-delegated and human-owned work — knows which tasks to hand off and which to protect. No boundary means no judgment.
  • Understands AI limitations in production contexts — knows where models hallucinate, where context windows break, where AI-generated code introduces security or concurrency risks.

3. Systems & operations

Does the candidate understand what makes code production-worthy?

  • Observability — thinks about monitoring, alerting, and SLOs from day one. Monitors what the user cares about, not what the default dashboard provides.
  • Debugging methodology — has a systematic approach: logs, metrics, traces, reproduction. Asks what changed recently before guessing.
  • Cost awareness — considers compute, storage, and API call costs as design constraints, not afterthoughts.
  • Incident response and on-call mindset — has a mental model for what happens after the code is merged. Designs for operability.

4. Product & problem selection

Does the candidate solve the right problems, or just the ones assigned?

  • Asks about users and business context before building — wants to know the success metric, not just the spec.
  • Pushes back on problem framing when appropriate — would rather solve a simpler problem well than an interesting problem poorly.
  • Chooses simple solutions over interesting ones — the boring choice that saves the team real time or pain.

5. Learning velocity

How fast does the candidate adapt when tools and practices change underneath them?

  • Has adopted and discarded tools based on results — not just tried things, but evaluated and made deliberate choices.
  • Can articulate what changed in their workflow and why — adoption without displacement usually means the tool isn’t actually being used.
  • Stays current without hype-chasing — runs personal experiments but doesn’t chase every new release.

6. Culture & collaboration

AI amplifies individual output — but a 10x individual who can’t collaborate is a net negative. Score this separately from technical ability.

  • Curiosity — asks questions, explores unfamiliar territory, wants to understand the system beyond their immediate scope.
  • Navigates disagreement constructively — describes what they learned from technical disagreements, not just how they won them.
  • Communicates technical decisions to non-engineers — can explain trade-offs in terms a PM or exec can act on.
  • Shares context on AI-generated code with the team — flags AI-generated sections in PRs, documents reasoning, takes responsibility for the review burden.
  • Mentors or lifts others — impact isn’t just individual output. Improves the people around them.
  • Holds opinions loosely — has strong technical convictions but adapts to team norms and new information.
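If you keep scores outside the spreadsheet, tallying them takes only a few lines. The sketch below is a minimal illustration, not the scorecard’s actual logic. It assumes each dimension’s score is the unweighted mean of its sub-criteria and that any sub-criterion scored 1 is worth raising in the debrief. Neither rule comes from the rubric itself, and the function names and dict layout are hypothetical.

```python
from statistics import mean

# Rubric scale: 3 = strong, 2 = adequate, 1 = weak.
VALID_SCORES = {1, 2, 3}

# Hypothetical layout: {dimension: {sub_criterion: score}}.
Scorecard = dict[str, dict[str, int]]


def dimension_averages(scorecard: Scorecard) -> dict[str, float]:
    """Unweighted mean per dimension (an assumed aggregation rule)."""
    averages: dict[str, float] = {}
    for dimension, criteria in scorecard.items():
        # Reject anything outside the 1-3 scale before averaging.
        for criterion, score in criteria.items():
            if score not in VALID_SCORES:
                raise ValueError(
                    f"{dimension} / {criterion}: expected 1, 2, or 3, got {score!r}"
                )
        averages[dimension] = round(float(mean(criteria.values())), 2)
    return averages


def weak_spots(scorecard: Scorecard) -> list[tuple[str, str]]:
    """Every (dimension, sub-criterion) pair scored 1, listed for the debrief."""
    return [
        (dimension, criterion)
        for dimension, criteria in scorecard.items()
        for criterion, score in criteria.items()
        if score == 1
    ]


if __name__ == "__main__":
    # Partial, made-up scorecard for one candidate.
    candidate = {
        "AI-assisted execution": {
            "Critically reviews and debugs AI output": 3,
            "Directs AI toward well-defined subtasks": 2,
        },
        "Culture & collaboration": {
            "Curiosity": 3,
            "Mentors or lifts others": 1,
        },
    }
    print(dimension_averages(candidate))
    print(weak_spots(candidate))
```

The sketch deliberately stops at per-dimension numbers and never rolls them into one overall total, in keeping with scoring culture separately from technical ability. An average can also hide exactly the weakness you care most about, which is why weak spots are listed on their own.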

The questions

Fourteen questions, grouped by what they assess. Each one includes what you’re listening for and the red flag.

Architecture & design

1. “Walk me through a system you designed that had to handle a non-obvious failure mode. What was the failure, and how did your design account for it?”

Listen for: specifics about the failure mode, the trade-off they made, and whether they designed for it proactively or reactively. Strong candidates anticipated the failure before it happened, not after the first incident.

Red flag: only discusses happy-path architecture.

2. “You’re designing a service that needs to process 10x its current load within 6 months. Where do you start?”

Listen for: questions before answers. Strong candidates ask about current bottlenecks, data access patterns, and cost constraints before proposing solutions. They think about what not to change as much as what to change.

Red flag: immediately proposes a technology (“just use Kafka”) without understanding the constraints.

3. “Tell me about a time you chose a boring solution over an interesting one. Why?”

Listen for: a clear trade-off between technical elegance and operational simplicity. The best answer is a story where the boring choice saved the team real time or pain.

Red flag: can’t think of one. Engineers who always choose the interesting solution are expensive to operate alongside.

AI-assisted execution

4. “Walk me through your daily workflow. Where does AI fit in, and where doesn’t it?”

Listen for: specificity. Strong candidates can name exact tools, describe which tasks they delegate to AI, and — critically — which tasks they don’t. The boundary matters more than the tools.

Red flag: vague answers like “I use Copilot for everything.” No boundary means no judgment.

5. “Tell me about a time AI-generated code was wrong in a way that wasn’t obvious. How did you catch it?”

Listen for: a real story with a real failure mode. Strong candidates describe a subtle bug — a race condition, a missed edge case, a security hole — that they caught through review, testing, or domain knowledge. This is the single strongest signal for AI-assisted work: the ability to be the quality backstop.

Red flag: “that hasn’t really happened to me.” It has. They just didn’t notice.

6. “If I gave you a new codebase you’ve never seen and asked you to add a feature using AI tools, how would you approach it?”

Listen for: a process that starts with understanding, not prompting. Strong candidates talk about reading the existing code, understanding the architecture, then using AI for implementation — not pasting requirements into a chat window and hoping.

Red flag: leads with “I’d prompt the AI to…” without any mention of understanding the system first.

Systems & operations

7. “Your service is throwing 500s for 2% of requests. Walk me through your debugging process.”

Listen for: a systematic approach — logs, metrics, traces, reproduction. Strong candidates ask what changed recently, check deployment history, and think about partial failures. Let them use AI tools during the discussion. Watch whether they use AI to accelerate diagnosis or to replace thinking.

Red flag: guesses without data. “It’s probably the database” is not a debugging process.

8. “How do you decide what to monitor in a new service?”

Listen for: a framework, not a checklist. Strong candidates think about SLOs, user-facing metrics, and leading indicators of failure — not just CPU and memory. They monitor the thing the user cares about, not the thing the infrastructure provides by default.

Red flag: “whatever the default dashboard gives us.”

Product & problem selection

9. “Tell me about a feature you decided not to build. What was the reasoning?”

Listen for: business judgment. The feature was technically feasible but wrong for the user, the timeline, or the system’s current maturity. Strong candidates kill their own ideas.

Red flag: can’t think of a feature they chose not to build. This suggests they build whatever’s asked without filtering.

10. “A PM asks you to add a feature that you think is a bad idea. What do you do?”

Listen for: pushback with evidence, not ego. Strong candidates describe how they’d frame the concern — data, user impact, operational cost — and what they’d do if overruled. The answer to “what if they still want it” matters as much as the initial pushback.

Red flag: “I’d just build it, they’re the PM.” Or the opposite: “I’d refuse.” Neither is a real answer.

Learning velocity

11. “What AI tool or workflow did you adopt in the last 6 months that meaningfully changed how you work? What did it replace?”

Listen for: a concrete change in behavior, not a tool name. Strong candidates describe what they stopped doing when they adopted the new thing. Adoption without displacement usually means the tool isn’t actually being used.

Red flag: names a tool but can’t describe the workflow change. “I started using Cursor” is not an answer. “I stopped writing boilerplate tests by hand because Cursor generates them and I review them” is.

12. “What’s something AI tools are bad at today that people assume they’re good at?”

Listen for: nuance and firsthand experience. Strong candidates have bumped into real limitations — context window issues, hallucinated APIs, broken concurrency patterns — and can describe them specifically. This question reveals whether someone has used AI tools enough to know where they break.

Red flag: “AI is pretty good at everything now.” It is not.

Culture & collaboration

13. “Tell me about a technical disagreement you had with a teammate. How did it resolve, and would you do anything differently?”

Listen for: respect for the other person’s view, willingness to change their mind, and a resolution that wasn’t just “I was right.” Strong candidates describe what they learned, not just how they won.

Red flag: every story ends with them being right. Or they can’t recall a disagreement — which means they either avoid conflict or don’t notice it.

14. “How do you share context with your team when you’ve used AI to generate a significant chunk of code?”

Listen for: a process — PR descriptions that call out AI-generated sections, documentation of the reasoning, flagging areas that need extra review. Strong candidates recognize that AI-generated code shifts the review burden to the team and own that.

Red flag: “I just push it like any other code.” That’s how AI-generated bugs become the team’s problem.

Red flags to watch across the loop

A few patterns that span the whole loop:

Tool fluency without technical depth. The candidate is fast with AI tools but can’t explain the code they produce. This is the most dangerous hire in 2026: someone who ships faster but embeds hidden defects. Canva’s interview team reached a similar conclusion: the best candidates don’t just prompt; they ask clarifying questions, use AI for well-defined subtasks, and critically review output.

No boundary between AI tasks and human tasks. Every engineer needs a clear mental model of what they delegate and what they don’t. If someone uses AI for everything indiscriminately, they’ve abdicated judgment.

Resistance to AI as identity. Some candidates treat not using AI as a badge. In 2026, this is like refusing to use an IDE. It doesn’t demonstrate skill — it demonstrates rigidity.

Can’t debug AI output. If you ask them to review AI-generated code and they can’t identify issues, they’re a liability in a codebase where AI writes the first draft.

The heuristic

You’re not hiring a coder anymore. The engineer you need in 2026 is an architect who designs systems before writing them, a product thinker who kills bad ideas before they become tickets, an operator who plans for the 3am page, and a teammate who makes everyone around them better. AI handles the typing. You’re hiring for everything else. Interview for judgment, not keystrokes — the candidate who pushes back on AI output is worth more than the one who prompts their way to a solution.

tl;dr

The pattern. Engineering interviews still optimize for raw coding ability, a skill AI now handles, while ignoring the architect, product thinker, operator, and collaborator the role actually demands.

The fix. Score candidates across six dimensions (architecture & design, AI-assisted execution, systems & operations, product & problem selection, learning velocity, culture & collaboration) with specific sub-criteria, and use questions that reveal how they direct, evaluate, and override AI output.

The outcome. You hire engineers who think like architects, challenge like PMs, operate like SREs, and use AI as a tool, rather than engineers who either ignore it or depend on it blindly.

// co-written with ai · edited by humans
