
How to Tell If an Agency Actually Uses AI Well — or Is Just Charging You for the Buzzword

Last updated: Jun 2026

For buyers vetting software agencies who hear 'AI-powered' everywhere and can't tell genuine practice from marketing theatre.

Photo: Markus Winkler / Pexels

The short answer

Ask for specifics, not slogans. An agency that genuinely uses AI well can name its tools, show where a human reviews every AI-generated line, and point to automated testing, version control and small, reviewable changes — the practices DORA (2025) says make AI actually pay off. Buzzword-only shops answer in adjectives. Regulators now fine 'AI washing' (SEC and FTC, 2024), so treat unverifiable AI claims as a red flag, not a feature. The tell is process and proof, not vocabulary.

— Key takeaways

AI maturity is about the practices around the tools, not the tools themselves — Google's DORA research (2025) found AI 'amplifies what's already there', so weak teams get worse, not better.
Adoption is near-universal but trust is falling: 84% of developers use or plan to use AI tools, yet trust in AI accuracy dropped to ~33% (Stack Overflow Developer Survey, 2025).
'AI washing' is real and enforced: the SEC charged two firms over false AI claims, with combined penalties of $400,000 (March 2024), and the FTC ran 'Operation AI Comply' (September 2024). Treat unverifiable claims as a red flag.
The strongest green flag: a human owns the merge. GitHub (2025) and ThoughtWorks (2025) both stress AI output must pass review, testing and static analysis before it ships.
Using AI does not make code cheaper or safer by default — independent tests found large shares of AI-generated code carry security flaws (Veracode 2025; NYU 2021).
Ask vendors to show artifacts — tools named, review pipeline, test coverage, cycle-time data. Answers in adjectives ('cutting-edge', 'AI-driven') with no evidence are the tell.

— Compare your options

How each build option actually uses AI — and whether you can verify it

Option	How AI is typically used	Who reviews the AI output	Verifiable?	Best for
DIY AI / vibe-coding tools	You prompt; AI writes most of the code	You — if you can read it	Fully (you are the reviewer)	Prototypes, throwaway or internal tools
Freelancer	Varies wildly by individual	Usually only them	Hard — ask to see their workflow	Scoped tasks where you can check the output
In-house team	Whatever your team has adopted	Your own engineers	Yes — it's your process to govern	Long-term ownership you control
Agency (buzzword-only)	'AI-powered' in the pitch; unclear in practice	Often unspecified	No — answers in adjectives, not artifacts	Avoid until they can show proof
Agency that's AI-mature (us)	AI for drafts, tests and docs, inside a review pipeline	A senior engineer owns the merge	Yes: tools, tests, version control, cycle-time	Builds where speed and accountability both matter

What does it mean for an agency to 'use AI well'?

It doesn't mean owning the latest tools — almost everyone does now. Google's DORA programme (the long-running research on software delivery performance) put it bluntly in 2025: AI 'amplifies what's already there.' Drop AI into a team with strong testing, version control and review, and it gets faster. Drop it into a team without those, and it ships problems faster. The tools are a multiplier, not a fix.

So 'uses AI well' is a statement about practice, not purchasing. The mature version is AI for first drafts, test scaffolding, documentation and refactoring suggestions — always inside a pipeline where a human reviews, tests and is accountable for what merges. The opposite is 'vibe coding' (accepting AI output with minimal scrutiny) shipped straight to production. When you evaluate an agency, you're really evaluating the discipline wrapped around the AI, not the AI.

What questions should I ask to test an agency's AI claims?

Make them get specific. Which AI tools do you use, and for which parts of the work? Where in your workflow does a human review AI-generated code, and who signs off before it merges? How do you stop AI from introducing security flaws or duplicated code? Can you show test coverage, or cycle-time data before and after adopting AI? Have you ever rejected AI output — when, and why?

The point isn't any single 'correct' answer; it's whether they answer in artifacts or in adjectives. A genuine team reaches for examples, names tools, and describes a review pipeline without hesitation. A buzzword shop returns to marketing language — 'cutting-edge', 'AI-driven', 'next-generation' — and gets vague when you ask who reviews the output. Specificity under follow-up questions is the single best signal you have.
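To make the cycle-time question concrete, here is a minimal sketch of the kind of before/after comparison a vendor could produce from its own change history. It assumes cycle time is measured from a change being opened to it being merged, which is one common definition among several. The function name `median_cycle_time_hours` and all timestamps are made up for illustration and are not real data.

```python
from datetime import datetime
from statistics import median


def median_cycle_time_hours(changes):
    """Median hours from a change being opened to being merged.

    `changes` is a list of (opened, merged) datetime pairs.
    """
    durations = [
        (merged - opened).total_seconds() / 3600
        for opened, merged in changes
    ]
    return median(durations)


# Illustrative, invented timestamps: (opened, merged)
before_ai = [
    (datetime(2025, 1, 6, 9), datetime(2025, 1, 7, 9)),     # 24 h
    (datetime(2025, 1, 8, 9), datetime(2025, 1, 10, 9)),    # 48 h
    (datetime(2025, 1, 13, 9), datetime(2025, 1, 14, 21)),  # 36 h
]
after_ai = [
    (datetime(2025, 6, 2, 9), datetime(2025, 6, 2, 21)),    # 12 h
    (datetime(2025, 6, 4, 9), datetime(2025, 6, 5, 15)),    # 30 h
    (datetime(2025, 6, 9, 9), datetime(2025, 6, 10, 5)),    # 20 h
]

print("Median cycle time before AI:", median_cycle_time_hours(before_ai), "hours")
print("Median cycle time after AI:", median_cycle_time_hours(after_ai), "hours")
```

A vendor that tracks this can answer with a number and a date range. A faster median only counts as an improvement if review and test standards stayed the same, so ask about both together.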

What are the green flags of genuine AI maturity?

Look for the surrounding engineering practice. DORA's 2025 model ties real AI value to concrete capabilities: a clear, communicated AI policy; healthy access to internal data; strong automated testing; mature version control; small, reviewable batches of change; and a user-centred focus. An agency that can speak to those is using AI inside a system, not as a sticker.

The clearest single flag is human-in-the-loop review. GitHub framed it well in 2025 — 'developers will always own the merge button' — and ThoughtWorks went further, placing 'complacency with AI-generated code' firmly on its do-not-do list and prescribing test-driven development, static analysis and human review. So ask to see the review step. A team that treats AI output as a first draft to be tested and reviewed — never as finished work — is one using it well.
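As a rough illustration of what "a human owns the merge" can look like when written down as a rule, the sketch below models a merge gate: a change is blocked until tests pass, static analysis passes, and someone other than the author has approved it. The `ChangeRequest` structure and `merge_blockers` function are hypothetical names invented for this example. Real teams usually enforce this through their code-hosting platform's branch protection settings rather than custom code.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """A proposed change awaiting merge (hypothetical structure)."""
    author: str
    tests_passed: bool
    static_analysis_passed: bool
    human_approvers: tuple[str, ...] = ()


def merge_blockers(change: ChangeRequest) -> list[str]:
    """Return the reasons a change may not merge; an empty list means it may."""
    blockers = []
    if not change.tests_passed:
        blockers.append("automated tests have not passed")
    if not change.static_analysis_passed:
        blockers.append("static analysis has not passed")
    # Approval only counts if it comes from someone other than the author,
    # whether the author is a person or an AI tool.
    independent = [name for name in change.human_approvers if name != change.author]
    if not independent:
        blockers.append("no human reviewer other than the author has approved")
    return blockers


# An AI-drafted change with green checks but no human sign-off stays blocked.
draft = ChangeRequest(
    author="ai-assistant",
    tests_passed=True,
    static_analysis_passed=True,
)
print(merge_blockers(draft))
```

The point of the sketch is that the gate makes no exception for how the code was written. When you ask an agency to show its review step, you are asking to see where a rule like this is enforced and who the approvers are.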

What are the red flags of 'AI washing'?

'AI washing', meaning overstating AI capability for marketing, is common enough that regulators now act on it. In March 2024 the US SEC charged two investment advisers with making false AI claims, with combined penalties of $400,000; in September 2024 the FTC ran 'Operation AI Comply', a sweep of actions against deceptive AI marketing, with the blunt line that 'there is no AI exemption from the laws on the books.' If regulated firms get fined for it, a sales deck certainly isn't above scrutiny.

The practical red flags: AI as an adjective with no detail; an inability to name tools or describe where humans review output; AI pitched purely as a discount or a headcount cut with no quality story attached; and 'we let AI write it' with no mention of testing or accountability. None of these are illegal in an agency pitch — but each is a sign you're buying the buzzword, not the capability.

Does an agency using AI mean it's cheaper or safer?

No, and assuming so is how buyers get burned. On cost, AI's measured gains are real on small, bounded tasks but can evaporate on complex work, where independent research (METR, 2025) found experienced engineers were actually slower with AI, so 'we use AI' is not a reason to expect a discount. On safety, the evidence is pointed: Veracode's 2025 report found about 45% of AI-generated code samples failed security tests, and an earlier NYU study found roughly 40% of one tool's generated programs contained vulnerabilities. Stack Overflow's 2025 survey shows developers themselves now trust AI accuracy less, not more (around 33%).

The takeaway is not 'avoid AI' — it's that AI well-used is a quality safeguard, not a price cut. An agency that pitches AI as 'faster and cheaper' with no mention of review and testing is describing the exact path to insecure, throwaway code. One that pitches AI as 'faster, with the same review rigour' understands what it's doing.

When should you NOT hire an agency at all?

If your real need is a prototype, a demo, or an internal tool a few trusted people will use, you may not need an agency — AI tools or a capable freelancer can get you there, and you can review the output yourself. The AI-maturity question only earns its weight once something real depends on the build.

Hold off on an agency, too, when you intend to own the product in-house long term — then you should be hiring and building the practice internally — or when the project is genuinely throwaway and not worth the diligence. Use this checklist when you're commissioning software that has to ship, scale, handle real users or data, and be maintained for years. That's when how well a team actually uses AI — inside real engineering discipline — starts to decide your outcome.

— FAQ

Questions buyers ask before they decide.

Q: How can I tell if an agency really uses AI or is just marketing it?

Ask for specifics and watch whether they answer in artifacts or adjectives. A genuine team can name its tools, describe exactly where a human reviews AI-generated code, and point to testing, version control and cycle-time data. A buzzword-only shop returns to language like 'AI-driven' and gets vague about who reviews the output. Specificity under follow-up questions is the clearest signal.

Q: What questions should I ask an agency about its AI use?

Five good ones: Which AI tools do you use, and for what? Where does a human review AI-generated code before it merges? How do you prevent AI-introduced security flaws or duplicated code? Can you show test coverage or before/after cycle-time? Have you ever rejected AI output, and why? The answers matter less than whether they're concrete and confident or vague and promotional.

Q: Does an agency using AI mean it will be cheaper or faster?

Not by default. AI speeds isolated, low-risk tasks, but independent research (METR, 2025) found it can slow experienced engineers on complex work, and DORA found heavier AI use didn't reliably improve delivery. 'We use AI' is not a reason to expect a discount. Well-used AI buys speed on bounded tasks while preserving review rigour — it isn't an automatic price cut.

Q: Is AI-generated code safe to ship without review?

No. Veracode's 2025 study found about 45% of AI-generated code samples failed security tests, and an earlier NYU study found roughly 40% of one tool's generated programs were vulnerable. AI code should be treated as a first draft that still needs testing, static analysis and human review. An agency that ships AI output unreviewed is the risk you're trying to avoid.

Q: What are the red flags of 'AI washing' in a dev agency?

AI used only as an adjective with no detail; inability to name tools or describe a human review step; AI pitched purely as a discount or headcount cut with no quality story; and 'we let AI write it' with no mention of testing or accountability. AI washing is common enough that the SEC (March 2024) and FTC (September 2024) have taken enforcement action against exaggerated AI claims.

Q: When should I not hire an agency at all?

When you only need a prototype, demo or internal tool you can review yourself, AI tools or a freelancer are usually enough. Skip an agency too if you plan to own the product in-house long term, or if the work is genuinely throwaway. The AI-maturity question matters most when you're building something real — with users, revenue or data — that must scale and be maintained.

— Sources

U.S. SEC — Press Release 2024-36: charges against two advisers for false AI claims ($400k total, Mar 2024) · accessed Jun 2026
U.S. FTC — 'Operation AI Comply' crackdown on deceptive AI claims (Sep 2024) · accessed Jun 2026
Stack Overflow Developer Survey 2025 — AI adoption (84%) and falling trust (~33%) · accessed Jun 2026
Google DORA — 2025 report and AI Capabilities Model (practices that make AI pay off) · accessed Jun 2026
METR — study of early-2025 AI tools' impact on experienced developers (tasks took 19% longer with AI, Jul 2025) · accessed Jun 2026
ThoughtWorks Technology Radar Vol. 33 — 'Complacency with AI-generated code' on Hold (Nov 2025) · accessed Jun 2026
GitHub — Code review in the age of AI: developers will always own the merge button (2025) · accessed Jun 2026
Veracode — 2025 GenAI Code Security Report (~45% of AI code failed security tests) · accessed Jun 2026
Pearce et al., 'Asleep at the Keyboard?' (NYU) — ~40% of Copilot programs vulnerable (2021) · accessed Jun 2026
Hero image by Markus Winkler (Pexels License) · accessed Jun 2026
